{"id":562,"date":"2022-10-01T01:14:46","date_gmt":"2022-10-01T01:14:46","guid":{"rendered":"https:\/\/timallanwheeler.com\/blog\/?p=562"},"modified":"2022-10-05T19:26:10","modified_gmt":"2022-10-05T19:26:10","slug":"convex-duality","status":"publish","type":"post","link":"https:\/\/timallanwheeler.com\/blog\/2022\/10\/01\/convex-duality\/","title":{"rendered":"Convex Duality"},"content":{"rendered":"\n<p>Convex duality is the sort of thing where you see the symbolic derivation and it sort of makes sense, but then you forget the steps and a day or two later you just sort of have to trust yourself that the theorems and facts hold but forget where they really come from. I recently relearned convex duality from <a href=\"https:\/\/www.youtube.com\/watch?v=McLq1hEq3UY&amp;ab_channel=Stanford\">the great lectures of Stephen Boyd<\/a>,  and in so doing, also learned about a neat geometry interpretation that helps solidify things.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Working with Infinite Steps<\/h2>\n\n\n\n<p>Everything starts with an optimization problem in standard form:<\/p>\n\n\n\n<p>\\[\\begin{matrix}\\underset{\\boldsymbol{x}}{\\text{minimize}} &amp; f(\\boldsymbol{x}) \\\\ \\text{subject to} &amp; \\boldsymbol{g}(\\boldsymbol{x}) \\leq \\boldsymbol{0} \\\\ &amp; \\boldsymbol{h}(\\boldsymbol{x}) = \\boldsymbol{0} \\end{matrix}\\]\n\n\n\n<p>This problem has an objective \\(f(\\boldsymbol{x})\\), some number of inequality constraints  \\(\\boldsymbol{g}(\\boldsymbol{x}) \\leq \\boldsymbol{0}\\), and some number of equality constraints \\(\\boldsymbol{h}(\\boldsymbol{x}) = \\boldsymbol{0}\\).<\/p>\n\n\n\n<p>Solving an unconstrained optimization problem often involves sampling the objective and using its gradient, or several local values, to know which direction to head in to find a better design \\(\\boldsymbol{x}\\). 
Solving a constrained optimization problem isn&#8217;t as straightforward &#8211; we cannot accept a design that violates any constraints.<\/p>\n\n\n\n<p>We can, however, view our constrained optimization problem as an unconstrained optimization problem. We simply incorporate our constraints into our objective, setting them to infinity when the constraints are violated and setting them to zero when the constraints are met:<\/p>\n\n\n\n<p>\\[\\underset{\\boldsymbol{x}}{\\text{minimize}} \\enspace f(\\boldsymbol{x}) + \\sum_i \\infty \\cdot \\left( g_i(\\boldsymbol{x}) &gt; 0\\right) +  \\sum_j \\infty \\cdot \\left( h_j(\\boldsymbol{x}) \\neq 0\\right)\\]\n\n\n\n<p>For example, if our problem is<\/p>\n\n\n\n<p>\\[\\begin{matrix}\\underset{x}{\\text{minimize}} &amp; x^2 \\\\ \\text{subject to} &amp; x \\geq 1 \\end{matrix}\\]\n\n\n\n<p>which looks like this<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"311\" height=\"250\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_372-1.png\" alt=\"\" class=\"wp-image-573\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_372-1.png 311w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_372-1-300x241.png 300w\" sizes=\"auto, (max-width: 311px) 100vw, 311px\" \/><\/figure>\n<\/div>\n\n\n<p>then we can rewrite it as <\/p>\n\n\n\n<p>\\[\\underset{x}{\\text{minimize}} \\enspace x^2+\\infty \\cdot \\left( x &lt; 1\\right)\\]\n\n\n\n<p>which looks like<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"312\" height=\"248\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_373.png\" alt=\"\" class=\"wp-image-576\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_373.png 312w, 
https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_373-300x238.png 300w\" sizes=\"auto, (max-width: 312px) 100vw, 312px\" \/><\/figure>\n<\/div>\n\n\n<p>We&#8217;ve simply raised the value of infeasible points to infinity. Problem solved?<\/p>\n\n\n\n<p>Not quite. By doing this, we made our optimization problem much harder to solve with local methods. The objective is constant (infinite) across the infeasible region, so gradients there give us no information about which way to progress toward feasibility. The objective is also discontinuous at the constraint boundary. Furthermore, infinity is rather difficult to work with &#8211; it just produces more infinities when added to or multiplied by most numbers. This new optimization problem may look simpler, but we&#8217;ve lost a lot of information.<\/p>\n\n\n\n<p>What we can do instead is reinterpret our objective slightly. What if, instead of using this infinite step, we simply multiplied our inequality function \\(g(x) = 1 &#8211; x\\) by a scalar \\(\\lambda\\)?<\/p>\n\n\n\n<p>\\[\\underset{x}{\\text{minimize}} \\enspace x^2+ \\lambda \\left( 1 &#8211; x \\right)\\]\n\n\n\n<p>If we have a feasible \\(x\\), then \\(\\lambda = 0\\) gives us the original objective function. 
If we have an infeasible \\(x\\), then \\(\\lambda = \\infty\\) gives us the penalized objective function.<\/p>\n\n\n\n<p>We&#8217;re taking the objective and constraint, which are separate functions:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"630\" height=\"256\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_374.png\" alt=\"\" class=\"wp-image-578\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_374.png 630w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_374-300x122.png 300w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><\/figure>\n<\/div>\n\n\n<p> and combining them into a single objective based on \\(\\lambda\\):<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"399\" height=\"255\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_375.png\" alt=\"\" class=\"wp-image-579\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_375.png 399w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_375-300x192.png 300w\" sizes=\"auto, (max-width: 399px) 100vw, 399px\" \/><\/figure>\n<\/div>\n\n\n<p>Let&#8217;s call our modified objective function the <em>Lagrangian:<\/em><\/p>\n\n\n\n<p>\\[\\mathcal{L}(x, \\lambda) =  x^2+ \\lambda \\left( 1 &#8211; x \\right)\\]\n\n\n\n<p>If we fix  \\(\\lambda\\), we can minimize our objective over \\(x\\). 
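For our example this inner minimization has a closed form: setting \(\partial \mathcal{L} / \partial x = 2x - \lambda = 0\) gives \(x = \lambda / 2\). A quick sketch (the function names here are my own):

```python
# Minimize the Lagrangian L(x, lam) = x^2 + lam*(1 - x) over x for a fixed lam.
# Setting dL/dx = 2x - lam = 0 gives the minimizer x = lam / 2.

def lagrangian(x, lam):
    return x**2 + lam * (1.0 - x)

def minimize_lagrangian(lam):
    x_hat = lam / 2.0                    # closed-form minimizer over x
    return x_hat, lagrangian(x_hat, lam)

x_hat, value = minimize_lagrangian(2.0)  # -> (1.0, 1.0) for lam = 2
```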
I am going to call these <em>dual values:<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"443\" height=\"252\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_377.png\" alt=\"\" class=\"wp-image-581\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_377.png 443w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_377-300x171.png 300w\" sizes=\"auto, (max-width: 443px) 100vw, 443px\" \/><\/figure>\n<\/div>\n\n\n<p>We find that all of these dual values are below the optimal value of the original problem:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"473\" height=\"259\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_378.png\" alt=\"\" class=\"wp-image-582\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_378.png 473w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_378-300x164.png 300w\" sizes=\"auto, (max-width: 473px) 100vw, 473px\" \/><\/figure>\n<\/div>\n\n\n<p>This in itself is a big deal. We can produce a lower bound on the optimal value simply by minimizing the Lagrangian. Why is this?<\/p>\n\n\n\n<p>Well, it turns out that we are guaranteed to get a lower bound any time that \\(\\lambda \\geq 0\\). This is because a nonnegative \\(\\lambda\\) ensures that our penalized objective is at or below the original objective for all feasible points. Minimizing the Lagrangian is thus guaranteed to give us a value at or below the original optimal value. 
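We can sanity-check this lower-bound property numerically for our example, whose optimal value is \(f(1) = 1\). This is a rough sketch that approximates the inner minimum with a coarse grid search rather than anything clever:

```python
# Sketch: for any lam >= 0, min_x [x^2 + lam*(1 - x)] is at or below the
# optimal value f(1) = 1 of: minimize x^2 subject to x >= 1.
# The inner minimum is approximated with a coarse grid search.

def lagrangian(x, lam):
    return x**2 + lam * (1.0 - x)

def dual_value(lam, lo=-5.0, hi=5.0, n=20001):
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(lagrangian(x, lam) for x in xs)

optimal_value = 1.0
bounds = [dual_value(lam) for lam in (0.0, 0.5, 1.0, 2.0, 3.0)]
assert all(b <= optimal_value + 1e-6 for b in bounds)  # all are lower bounds
```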
<\/p>\n\n\n\n<p>This is such a big deal that we&#8217;ll call this minimization problem the <em>Lagrange dual function<\/em>, and give it a shorthand:<\/p>\n\n\n\n<p>\\[\\mathcal{D}(\\lambda) = \\underset{x}{\\text{minimize}} \\enspace \\mathcal{L}(x, \\lambda)\\]\n\n\n\n<p>which for our problem is<\/p>\n\n\n\n<p>\\[\\mathcal{D}(\\lambda) = \\underset{x}{\\text{minimize}} \\enspace x^2 + \\lambda \\left( 1 &#8211; x \\right)\\]\n\n\n\n<p>Okay, so we have a way to get a lower bound. How about we get the <em>best<\/em> lower bound? That gives rise to the <em>Lagrange dual problem<\/em>:<\/p>\n\n\n\n<p>\\[\\underset{\\lambda \\geq 0}{\\text{maximize}} \\enspace \\mathcal{D}(\\lambda)\\]\n\n\n\n<p>For our example problem, the best lower bound is given by \\(\\lambda = 2\\), which produces \\(x = 1\\), \\(f(x) = 1\\). This is the solution to the original problem.<\/p>\n\n\n\n<p>Having access to a lower bound can be incredibly useful. It isn&#8217;t guaranteed to be informative (the lower bound can be \\(-\\infty\\)), but it is often relatively close to optimal. For convex problems that satisfy a mild condition such as Slater&#8217;s condition, the bound is tight &#8211; the solution to the Lagrange dual problem has the same value as the solution to the original (primal) problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A Geometric Interpretation<\/h2>\n\n\n\n<p>We&#8217;ve been working with a lot of symbols. We have some plots, but I&#8217;d love to build further intuition with a geometric interpretation. 
This interpretation also comes from Boyd&#8217;s lectures and <a href=\"https:\/\/web.stanford.edu\/~boyd\/cvxbook\/\">textbook<\/a>.<\/p>\n\n\n\n<p>Let&#8217;s work with a slightly beefier optimization problem with a single inequality constraint:<\/p>\n\n\n\n<p>\\[\\begin{matrix}\\underset{\\boldsymbol{x}}{\\text{minimize}} &amp; \\|\\boldsymbol{A} \\boldsymbol{x} &#8211; \\boldsymbol{b} \\|_2^2 \\\\ \\text{subject to} &amp; x_1 \\geq c \\end{matrix}\\]\n\n\n\n<p>In this case, I&#8217;ll use<\/p>\n\n\n\n<p>\\[\\boldsymbol{A} = \\begin{bmatrix} 1 &amp; 0.3 \\\\ 0.3 &amp; 2\\end{bmatrix} \\qquad \\boldsymbol{b} = \\begin{bmatrix} 0.5 \\\\ 1.5\\end{bmatrix} \\qquad c = 1.5\\]\n\n\n\n<p>Let&#8217;s plot the objective, show the feasible set in blue, and show the optimal solution:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"373\" height=\"365\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_379.png\" alt=\"\" class=\"wp-image-589\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_379.png 373w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_379-300x294.png 300w\" sizes=\"auto, (max-width: 373px) 100vw, 373px\" \/><\/figure>\n<\/div>\n\n\n<p>Our Lagrangian again comprises two terms &#8211; the objective function \\( \\|\\boldsymbol{A} \\boldsymbol{x} &#8211; \\boldsymbol{b} \\|_2^2\\) and the constraint term \\(\\lambda (c &#8211; x_1)\\). 
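In code, the two ingredients of this Lagrangian look like the following (a sketch using plain Python lists; the function names are my own):

```python
# Objective f(x) = ||A x - b||^2 and constraint g(x) = c - x1 for the example,
# with A = [[1, 0.3], [0.3, 2]], b = [0.5, 1.5], c = 1.5.

A = [[1.0, 0.3], [0.3, 2.0]]
b = [0.5, 1.5]
c = 1.5

def f(x):
    r0 = A[0][0] * x[0] + A[0][1] * x[1] - b[0]   # residual A x - b
    r1 = A[1][0] * x[0] + A[1][1] * x[1] - b[1]
    return r0 * r0 + r1 * r1

def g(x):
    return c - x[0]          # feasible exactly when g(x) <= 0, i.e. x1 >= c

def lagrangian(x, lam):
    return f(x) + lam * g(x)
```

A design \(\boldsymbol{x}\) is feasible exactly when `g(x) <= 0`, matching the standard form above.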
We can build a neat geometric interpretation of convex duality by plotting (constraint, objective) pairs for various values of \\(\\boldsymbol{x}\\):<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"366\" height=\"312\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_380.png\" alt=\"\" class=\"wp-image-590\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_380.png 366w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_380-300x256.png 300w\" sizes=\"auto, (max-width: 366px) 100vw, 366px\" \/><\/figure>\n<\/div>\n\n\n<p>We get a parabolic region. This isn&#8217;t a huge surprise, given that we have a quadratic objective function. <\/p>\n\n\n\n<p>Note that the horizontal axis is the constraint value. We only have feasibility when \\(g(\\boldsymbol{x}) \\leq 0\\), so our feasible designs are those left of the vertical line.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"365\" height=\"316\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_381.png\" alt=\"\" class=\"wp-image-591\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_381.png 365w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_381-300x260.png 300w\" sizes=\"auto, (max-width: 365px) 100vw, 365px\" \/><\/figure>\n<\/div>\n\n\n<p>I&#8217;ve already plotted the optimal value &#8211; it is at the bottom of the feasible region. (Lower objective function values are better, and the bottom has the lowest value).<\/p>\n\n\n\n<p>Let&#8217;s see if we can include our Lagrangian. 
The Lagrangian is:<\/p>\n\n\n\n<p>\\[\\mathcal{L}(\\boldsymbol{x}, \\lambda) = f(\\boldsymbol{x}) + \\lambda g(\\boldsymbol{x})\\]\n\n\n\n<p>If we fix \\(\\boldsymbol{x}\\), it is linear in \\(\\lambda\\)!<\/p>\n\n\n\n<p>The Lagrange dual function is<\/p>\n\n\n\n<p>\\[\\mathcal{D}(\\lambda) = \\underset{\\boldsymbol{x}}{\\text{minimize}} \\enspace f(\\boldsymbol{x}) + \\lambda g(\\boldsymbol{x})\\]\n\n\n\n<p>If we evaluate \\(\\mathcal{D}\\) for a particular \\(\\lambda\\), we&#8217;ll minimize and get a design \\(\\hat{\\boldsymbol{x}}\\). The value of \\(\\mathcal{D}(\\lambda)\\) will be \\(f(\\hat{\\boldsymbol{x}}) + \\lambda g(\\hat{\\boldsymbol{x}})\\). Remember, for \\(\\lambda \\geq 0\\) this is a lower bound. Said lower bound is the y-intercept of the line with slope \\(-\\lambda\\) that passes through our (constraint, objective) pair at \\(\\hat{\\boldsymbol{x}}\\).<\/p>\n\n\n\n<p>For example, with \\(\\lambda = 0.75\\), we get:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"363\" height=\"309\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_383.png\" alt=\"\" class=\"wp-image-594\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_383.png 363w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_383-300x255.png 300w\" sizes=\"auto, (max-width: 363px) 100vw, 363px\" \/><\/figure>\n<\/div>\n\n\n<p>We got a lower bound, and it is indeed lower than the true optimum. 
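To make this concrete, here is a sketch that evaluates \(\mathcal{D}(\lambda)\) for this problem by minimizing the Lagrangian with plain gradient descent (the step size and iteration count are hand-picked; a real implementation would use the closed-form minimizer or a proper solver):

```python
# Sketch: evaluate the Lagrange dual function D(lam) = min_x f(x) + lam*g(x)
# for f(x) = ||A x - b||^2 and g(x) = c - x1, via simple gradient descent.

A = [[1.0, 0.3], [0.3, 2.0]]
b = [0.5, 1.5]
c = 1.5

def f(x):
    r = [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]
    return r[0]**2 + r[1]**2

def g(x):
    return c - x[0]

def dual(lam, step=0.05, iters=2000):
    x = [0.0, 0.0]
    for _ in range(iters):
        r = [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]
        # gradient of f is 2 A^T r; gradient of lam*g(x) = lam*(c - x1) is (-lam, 0)
        gx = [2.0 * (A[0][0] * r[0] + A[1][0] * r[1]) - lam,
              2.0 * (A[0][1] * r[0] + A[1][1] * r[1])]
        x = [x[0] - step * gx[0], x[1] - step * gx[1]]
    return f(x) + lam * g(x)  # the lower bound: y-intercept of the slope -lam line
```

With \(\lambda = 0.75\) this sketch returns roughly \(0.75\), which is indeed below the optimal value of roughly \(1.31\).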
Furthermore, that line is a supporting hyperplane of our feasible region.<\/p>\n\n\n\n<p>Let&#8217;s try it again with  \\(\\lambda = 3\\):<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"364\" height=\"315\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_384.png\" alt=\"\" class=\"wp-image-595\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_384.png 364w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_384-300x260.png 300w\" sizes=\"auto, (max-width: 364px) 100vw, 364px\" \/><\/figure>\n<\/div>\n\n\n<p>Our \\(\\lambda\\) is still non-negative, so we get a valid lower bound. It is below our optimal value.<\/p>\n\n\n\n<p>Now, in solving the Lagrange dual problem we get the best lower bound. This best lower bound just so happens to correspond to the \\(\\lambda\\) that produces a supporting hyperplane at our optimum (\\(\\lambda \\approx 2.15\\)):<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"366\" height=\"318\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_385.png\" alt=\"\" class=\"wp-image-596\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_385.png 366w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/09\/Selection_385-300x261.png 300w\" sizes=\"auto, (max-width: 366px) 100vw, 366px\" \/><\/figure>\n<\/div>\n\n\n<p>Our line is tangent at our optimum. This ties in nicely with the KKT conditions (out of scope for this blog post), which require that the gradient of the Lagrangian be zero at the optimum (for a problem with a differentiable objective and constraints), and thus the gradients of the objective and the constraints (weighted by \\(\\lambda\\)) balance out. 
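We can check this balance numerically at the optimum. In the sketch below I pin \(x_1\) at \(c = 1.5\) and solve \(\partial f / \partial x_2 = 0\) by hand to get \(x_2^\ast = 3.6 / 8.18\); treat that derivation as my own working, not part of the original post:

```python
# KKT stationarity sketch for: minimize ||A x - b||^2 subject to x1 >= c,
# with A = [[1, 0.3], [0.3, 2]], b = [0.5, 1.5], c = 1.5 and g(x) = c - x1.
# At the active optimum, grad f(x*) + lam * grad g(x*) = 0.

A = [[1.0, 0.3], [0.3, 2.0]]
b = [0.5, 1.5]

def grad_f(x):
    r = [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]
    return [2.0 * (A[0][0] * r[0] + A[1][0] * r[1]),   # 2 A^T r
            2.0 * (A[0][1] * r[0] + A[1][1] * r[1])]

grad_g = [-1.0, 0.0]                  # gradient of g(x) = c - x1

x_star = [1.5, 3.6 / 8.18]            # x2* solved by hand from df/dx2 = 0 at x1 = 1.5
lam = grad_f(x_star)[0]               # multiplier that balances the x1 component
residual = [grad_f(x_star)[i] + lam * grad_g[i] for i in range(2)]
# lam comes out close to the lambda of about 2.15 quoted above, and the
# stationarity residual is essentially zero
```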
<\/p>\n\n\n\n<p>Notice that in this case, the inequality constraint was active: our optimum lies on the constraint boundary, where \\(g(\\boldsymbol{x}) = 0\\). We needed a positive \\(\\lambda\\) to balance against the objective function gradient.<\/p>\n\n\n\n<p>If we change our problem such that the unconstrained optimum is feasible, then we do not need a positive dual value to balance against the objective function gradient. Let&#8217;s adjust our constraint to make our solution feasible:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"368\" height=\"359\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_387.png\" alt=\"\" class=\"wp-image-600\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_387.png 368w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_387-300x293.png 300w\" sizes=\"auto, (max-width: 368px) 100vw, 368px\" \/><\/figure>\n<\/div>\n\n\n<p>We now get an objective \/ constraint region with a minimum inside the feasible area:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"360\" height=\"309\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_388.png\" alt=\"\" class=\"wp-image-601\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_388.png 360w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_388-300x258.png 300w\" sizes=\"auto, (max-width: 360px) 100vw, 360px\" \/><\/figure>\n<\/div>\n\n\n<p>Our best lower bound is obtained with \\(\\lambda = 0\\):<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"366\" height=\"315\" 
src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_389.png\" alt=\"\" class=\"wp-image-602\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_389.png 366w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2022\/10\/Selection_389-300x258.png 300w\" sizes=\"auto, (max-width: 366px) 100vw, 366px\" \/><\/figure>\n<\/div>\n\n\n<p>This demonstrates <em>complementary slackness<\/em>, the idea that \\(\\lambda_i g_i(\\boldsymbol{x})\\) is always zero. If constraint \\(i\\) is active, \\(g_i(\\boldsymbol{x}) = 0\\);\\ and if constraint \\(i\\) is inactive, \\(\\lambda_i = 0\\).<\/p>\n\n\n\n<p>So with this geometric interpretation, we can think of choosing \\(\\lambda\\) as picking the slope of a line that supports this multidimensional volume from below and to the left. We want the line with the highest y-intercept.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Multiple Constraints (and Equality Constraints)<\/h2>\n\n\n\n<p>I started this post off with a problem with multiple inequality and equality constraints. In order to be thorough, I just wanted to mention that the Lagrangian and related concepts extend the way you&#8217;d expect:<\/p>\n\n\n\n<p>\\[\\mathcal{L}(\\boldsymbol{x}, \\boldsymbol{\\lambda}, \\boldsymbol{\\nu}) = f(\\boldsymbol{x}) + \\sum_i \\lambda_i g_i(\\boldsymbol{x}) + \\sum_j \\nu_j h_j(\\boldsymbol{x})\\]\n\n\n\n<p>and<\/p>\n\n\n\n<p>\\[\\mathcal{D}(\\boldsymbol{\\lambda}, \\boldsymbol{\\nu}) = \\underset{\\boldsymbol{x}}{\\text{maximize}} \\enspace \\mathcal{L}(\\boldsymbol{x}, \\boldsymbol{\\lambda}, \\boldsymbol{\\nu})\\]\n\n\n\n<p>We require that all \\(\\lambda\\) values be nonnegative, but do not need to restrict the \\(\\nu\\) values for the equality constraints. 
As such, our Lagrangian dual problem is:<\/p>\n\n\n\n<p>\\[\\underset{\\boldsymbol{\\lambda} \\geq \\boldsymbol{0}, \\&gt;\\boldsymbol{\\nu}}{\\text{maximize}} \\enspace \\mathcal{D}(\\boldsymbol{\\lambda}, \\boldsymbol{\\nu})\\]\n\n\n\n<p>The geometric interpretation still holds &#8211; it just has more dimensions. We&#8217;re just finding supporting hyperplanes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Convex duality is the sort of thing where you see the symbolic derivation and it sort of makes sense, but then you forget the steps and a day or two later you just sort of have to trust yourself that the theorems and facts hold but forget where they really come from. I recently relearned [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-562","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/562","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/comments?post=562"}],"version-history":[{"count":42,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/562\/revisions"}],"predecessor-version":[{"id":622,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/562\/revisions\/622"}],"wp:attachment":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/media?parent=562"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/w
p\/v2\/categories?post=562"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/tags?post=562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}