Convex Optimization Research Notebook

Accelerating von Neumann's Alternating Projections Algorithm for Convex Feasibility

The code for my experiments and algorithm is hosted on GitHub.

My research focuses on accelerating the alternating projections algorithm for finding a point in the intersection of convex sets. The algorithm, due to John von Neumann, is beautifully simple: alternately project the iterate onto your convex sets until it converges to an optimal point, where a point is optimal if and only if it lies in the intersection of the convex sets.
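To make the method concrete, here is a minimal sketch of alternating projections in NumPy. The choice of sets (an affine set $\{x : Ax = b\}$ and the nonnegative orthant), the problem data, and the tolerance are illustrative assumptions, not part of the algorithm itself.

```python
import numpy as np

def proj_affine(x, A, b):
    # Projection onto {x : Ax = b}: x - A^T (A A^T)^{-1} (A x - b).
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

def proj_orthant(x):
    # Projection onto the nonnegative orthant.
    return np.maximum(x, 0)

def alternating_projections(A, b, x0, tol=1e-8, max_iters=10000):
    x = x0
    for _ in range(max_iters):
        y = proj_affine(x, A, b)          # project onto the affine set
        x = proj_orthant(y)               # then project onto the cone
        if np.linalg.norm(x - y) <= tol:  # x is (nearly) in both sets
            break
    return x

# Toy instance: b is chosen so that a nonnegative solution exists.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))
b = A @ np.abs(rng.standard_normal(20))
x_star = alternating_projections(A, b, rng.standard_normal(20))
```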

Aside from aesthetics, I am interested in the work because it is practical: the majority of convex optimization problems can be reduced to solving this problem. As such, this method is relevant to control, finance, and machine learning, to name but a few fields.

Under the guidance of Professor Stephen Boyd, I have devised a class of modifications to the alternating projections method that is provably correct. These modifications are guaranteed to take at most as many iterations as the vanilla algorithm and, though I have not yet verified this theoretically, should take many fewer in most cases. The trade-off is that our version of the algorithm requires three projections per iteration instead of two; that said, the extra projection can be computed extremely cheaply.

In this post, I motivate the problem by illustrating a shortcoming of the alternating projections method (convergence rapidly slows down as the iterates approach optimality). I then propose an algorithm that takes inspiration from the alternating projections method and provably converges to an optimal point in the intersection of the convex sets in question; the high-level idea is to solve a small quadratic program after each pair of projections. There is much work left to be done, and I close by listing the tasks that remain.

Motivation

The appeal of the alternating projection algorithm lies in its simplicity. But, like other first-order methods, its convergence can be rather slow. Especially towards the tail end of the algorithm, you’ll find that the iterate often bounces back and forth between the convex sets without making much forward progress towards the intersection; if you come from the deep learning community, you can think of this problem as somewhat analogous to the vanishing gradient problem.

The problem is best illustrated through example. Below is a run of the alternating projections algorithm where the goal is to find a point in the intersection of two circles. The algorithm begins with an arbitrary initial iterate.

Notice how the iterate makes good progress at first but then gets stuck zigzagging between the two sets as it approaches the intersection. This slow convergence makes the algorithm impractical in many problem domains, especially where high accuracy is required.

So here’s the key question: Can we come up with a cheap modification to the alternating projection algorithm that prevents the sequence of iterates from degenerating into a zigzag trajectory?

The Algorithm

For the remainder of this post, I will assume that the inputs to my algorithm are an affine set $\mathcal{A}$ and a convex cone $\mathcal{K}$. This assumption is not really an important one – the algorithm is correct so long as both sets are convex. But it turns out that almost all practical convex optimization problems can be reduced to finding a point in the intersection of an affine set and a convex cone, hence the assumption. In full specificity, the problem set-up is:

Input: An affine set $\mathcal{A}$ and a convex cone $\mathcal{K}$.

Goal: Find a point $x \in \mathcal{A} \cap \mathcal{K}$.

The initial iterate $x_0$ may be chosen randomly. Let $\Pi_{\mathcal{A}}$ be the projection operator onto $\mathcal{A}$ and let $\Pi_{\mathcal{K}}$ be the projection operator onto $\mathcal{K}$. Let $x_k$ be the $k$-th iterate. After the $k$-th step of the algorithm, perform the following updates:

$$y_k = \Pi_{\mathcal{A}}(x_k), \qquad z_k = \Pi_{\mathcal{K}}(x_k), \qquad \tilde{x}_k = \tfrac{1}{2}(y_k + z_k).$$

In other words, $\tilde{x}_k$ is the iterate that would be obtained by averaged projections. (It is easy to show that averaged projections is equivalent to alternating projections.) In order to accelerate convergence, before performing the next averaged projection iteration, we will project onto the intersection of a number of halfspaces that contain $\mathcal{A} \cap \mathcal{K}$. More concretely, let $H_k^{\mathcal{A}} = \{w : (x_k - y_k)^T (w - y_k) \le 0\}$ be the halfspace containing $\mathcal{A}$ and let $H_k^{\mathcal{K}} = \{w : (x_k - z_k)^T (w - z_k) \le 0\}$ be the halfspace containing $\mathcal{K}$. After obtaining these halfspaces, we can generate the next iterate by projecting $\tilde{x}_k$ onto their intersection; that is, $x_{k+1} = \Pi_{H_k^{\mathcal{A}} \cap H_k^{\mathcal{K}}}(\tilde{x}_k)$.
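To make the update concrete, here is a sketch of a single iteration, with the small projection posed as a QP in CVXPY (for clarity, not for speed; as noted above, the extra projection can be computed much more cheaply). The helpers `proj_affine` and `proj_cone` stand in for whatever projections $\Pi_{\mathcal{A}}$ and $\Pi_{\mathcal{K}}$ are available and are assumptions of this sketch.

```python
import cvxpy as cp
import numpy as np

def accelerated_step(x, proj_affine, proj_cone):
    """One averaged-projections step, then a projection onto two halfspaces."""
    y = proj_affine(x)    # y_k = Pi_A(x_k)
    z = proj_cone(x)      # z_k = Pi_K(x_k)
    x_avg = (y + z) / 2   # the averaged-projections iterate

    # Supporting halfspaces at y and z; each contains the corresponding set.
    w = cp.Variable(x.size)
    constraints = [(x - y) @ (w - y) <= 0,  # halfspace containing the affine set
                   (x - z) @ (w - z) <= 0]  # halfspace containing the cone
    cp.Problem(cp.Minimize(cp.sum_squares(w - x_avg)), constraints).solve()
    return w.value
```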

It is not hard to prove that the above algorithm converges to an optimal solution of the feasibility problem; my previous post has a very rough proof sketch. I will post a complete proof sometime later.

There are many tweaks to the above algorithm that preserve convergence and may accelerate it; for example, we can choose to keep all the halfspaces generated so far and project onto the intersection of $2k$ halfspaces on the $k$-th iteration (see the sketch below). Experiments I’ve run suggest that doing so accelerates convergence. Doing so also increases the cost of each iteration, but I suspect the overhead is not prohibitive; for example, see this post, in which I discuss how to solve the projection cheaply.
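Under the same assumptions as the previous sketch, the memory variant simply accumulates every (normal, offset) pair seen so far and includes all of them in the QP:

```python
import cvxpy as cp

def accelerated_step_with_memory(x, proj_affine, proj_cone, halfspaces):
    """As above, but project onto every halfspace collected so far.

    `halfspaces` is a list of (g, h) pairs, each encoding {w : g @ w <= h}.
    """
    y, z = proj_affine(x), proj_cone(x)
    x_avg = (y + z) / 2
    halfspaces.append((x - y, (x - y) @ y))  # new halfspace containing A
    halfspaces.append((x - z, (x - z) @ z))  # new halfspace containing K

    w = cp.Variable(x.size)
    constraints = [g @ w <= h for (g, h) in halfspaces]
    cp.Problem(cp.Minimize(cp.sum_squares(w - x_avg)), constraints).solve()
    return w.value
```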

This algorithm improves upon the vanilla version in the two-dimensional case. Below is a run of my algorithm on the problem we saw in the previous section.

Results

The results of a toy experiment I’ve run are promising and are plotted below; the x-axis is the iteration number and the y-axis is the distance from the iterate to the intersection. Experiments were run on a 1000-dimensional affine space and a 1000-dimensional second-order cone. I compared the performance of my algorithm, both when only two halfspaces are retained and when all of the halfspaces are retained, with the alternating projections algorithm and the state-of-the-art Dykstra’s algorithm. It is evident that keeping more halfspaces helps, at least in this experiment. (You can ignore the “ip (1.00)” label in the legend – it stands for “include the next halfspace with probability 1.0” and is a remnant from experiments with a randomized version of the algorithm.)

I also want to answer the question of whether momentum updates can prevent the QP-based algorithm, which projects onto pairs of halfspaces, from encountering the same problems that vanilla averaged projections encounters. I hypothesize that momentum will help, and my results support my hypothesis.

Conclusion

My work is preliminary but promising: the algorithm I presented is but one example of an acceleration we can apply to the alternating projection method. There are many other things we could try, and their justifications fall out from the proof of my algorithm (I will post the proof soon!).

Future Work

There’s a ton of future work to be done. Here’s a partial list:

  • Analyze the algorithm’s rate of convergence.
  • Attempt to do a cheap line search or plane search to even further accelerate convergence.
  • Run experiments on larger problems.
  • Implement a production-quality version of my algorithm, roll it into SCS, and evaluate it on Mark’s test suite.

Unrelated to this work, I’m also helping Steven Diamond with the design and implementation of CVXPY 1.0.

Proof Sketch: Outer Approximation Acceleration of Averaged Projections

A simple but potentially powerful acceleration scheme: perform averaged projections and then solve a small QP in which you project the averaged iterate onto the intersection of the halfspaces that the projections yield. It is not difficult to show that this algorithm converges to an optimal solution. Below is a proof sketch.
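The crux, in the notation from above, is Fejér monotonicity with respect to the intersection; here is my rendering of the key inequality (the full argument needs more care). For any $s \in \mathcal{A} \cap \mathcal{K}$, $s$ lies in both supporting halfspaces, so

$$\|x_{k+1} - s\| = \|\Pi_{H_k^{\mathcal{A}} \cap H_k^{\mathcal{K}}}(\tilde{x}_k) - s\| \le \|\tilde{x}_k - s\| \le \tfrac{1}{2}\|y_k - s\| + \tfrac{1}{2}\|z_k - s\| \le \|x_k - s\|.$$

The first inequality holds because projecting onto a convex set that contains $s$ cannot increase the distance to $s$, the second is the triangle inequality, and the third holds because $\Pi_{\mathcal{A}}$ and $\Pi_{\mathcal{K}}$ likewise cannot increase the distance to $s \in \mathcal{A} \cap \mathcal{K}$. The distance to every point of the intersection is therefore nonincreasing, and convergence follows from standard Fejér monotonicity arguments.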

Key Steps in the Proof of Alternating Projections

An analysis of the proof of von Neumann’s alternating projections algorithm.

First results for alternating projection acceleration

  1. Plane search does not obey Fejér monotonicity
  2. Surprisingly, initial evidence suggests that the heavy-ball method only hurts. This is of course not a conclusion, but rather an anecdotal observation.
  3. QP (mem == 2) + overprojection seems to work decently well. Is the QP projection cheap? Probably. Closed form? Maybe.

The baseline algorithms:

  1. Alternating projections
  2. Alternating projections with heavy-ball momentum

The state-of-the-art:

  1. Dykstra

What did I try?

Problem set-up:

  • 1500-dimensional variable
  • Affine constraint $Ax = b$, where $A$ is a random matrix with 1% sparsity.
  • The variable $x$ is constrained to lie in the second-order cone. (A sketch of how such an instance might be generated follows this list.)
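Here is one way such an instance might be generated; the number of equality constraints and the use of SciPy’s sparse random matrices are my own assumptions, chosen only to match the bullets above.

```python
import numpy as np
import scipy.sparse as sp

n = 1500   # dimension of the variable
m = 300    # number of equality constraints (an assumed value)
rng = np.random.default_rng(0)

A = sp.random(m, n, density=0.01, format="csr", random_state=0)  # 1% sparsity
x_feas = rng.standard_normal(n)
x_feas[0] = np.linalg.norm(x_feas[1:]) + 1.0  # place x_feas inside the SOC
b = A @ x_feas  # choosing b this way guarantees the instance is feasible
```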

(1) A plane search:

Search over the plane spanned by $y_{k-m}, \ldots, y_k$ and $z_{k-m}, \ldots, z_k$, where the $y_i$ are a finite number of previous iterates that were on the affine set and the $z_i$ are those on the convex cone. It turns out that performing this plane search produces sequences that are not Fejér monotonic!

(2) Halfspace Outer Approximation (HOA)

I used the $m$ most recent halfspaces generated by projecting onto the convex sets, and obtained the next iterate by projecting onto the intersection of these halfspaces. This works “fairly well.” It works “better” than Dykstra’s if we only look at the number of iterations. Of course, the number of iterations isn’t a “good” metric, but it’s a start. Even small $m$ (e.g., $m = 2$) works well.

(3) HOA with randomly accepting new hyperplanes.

I don’t understand this well yet, but I suspect there is something interesting here.

(4) HOA with momentum

Anecdotally, momentum “helps” with respect to the number of iterations. Over-projecting seems to be more useful than interpolating between the previous velocity and the current one. Again, very anecdotal evidence.
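To pin down the terminology used here, the following is a sketch of the two update styles being compared; `T` stands for one full step of the base method, and `beta` and `gamma` are hypothetical tuning parameters.

```python
def heavy_ball_step(x, x_prev, T, beta=0.5):
    # Heavy-ball momentum: add a multiple of the previous velocity.
    return T(x) + beta * (x - x_prev)

def overprojection_step(x, T, gamma=1.5):
    # Over-projection (relaxation): step past the projected point, gamma in (1, 2).
    return x + gamma * (T(x) - x)
```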

Future work and experiments

(0) Write up what I’ve done and learned this quarter and send it to Prof. Boyd.

(1) Anneal the momentum (say by a factor that decays with $k$, the # of iterations)?

(2) Flip a p-biased coin to decide whether or not to overproject

(3) Flip a p-biased coin and overproject by a lot

(4) Take advantage of short-and-fat matrix structure, if it arises

(5) Project onto random subsets of equality constraints

(6) Test on the homogeneous self-dual embedding of a cone program (SCS).

Questions for Professor Boyd

(1) Does a closed-form projector onto the intersection of two halfspaces exist, and/or can it be computed cheaply by some means?
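One standard route, for what it’s worth: with only two constraints, the active set at the solution is one of $\emptyset$, $\{1\}$, $\{2\}$, or $\{1, 2\}$, so the projection reduces to checking at most four closed-form candidates. Below is a NumPy sketch of this active-set enumeration (my own, not a vetted implementation).

```python
import numpy as np

def proj_two_halfspaces(x, g1, h1, g2, h2, tol=1e-9):
    """Project x onto {w : g1 @ w <= h1, g2 @ w <= h2} by active-set enumeration."""
    candidates = [x]                                       # active set: empty
    candidates.append(x - (g1 @ x - h1) / (g1 @ g1) * g1)  # active set: {1}
    candidates.append(x - (g2 @ x - h2) / (g2 @ g2) * g2)  # active set: {2}
    # Active set {1, 2}: project onto the intersection of the two boundary
    # hyperplanes by solving the 2x2 system (G G^T) lam = G x - h.
    G = np.vstack([g1, g2])
    h = np.array([h1, h2])
    try:
        lam = np.linalg.solve(G @ G.T, G @ x - h)
        candidates.append(x - G.T @ lam)
    except np.linalg.LinAlgError:
        pass  # parallel normals; this case has no isolated intersection
    feasible = [w for w in candidates
                if g1 @ w <= h1 + tol and g2 @ w <= h2 + tol]
    return min(feasible, key=lambda w: np.linalg.norm(w - x))
```

The true projection is always among the candidates, and every feasible candidate is at least as far from $x$ as the true projection, so taking the closest feasible candidate is correct (assuming the intersection is nonempty).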

(2) Why is the optimal value returned by CVXPY inaccurate? (Roughly $10^{-2}$ precision.)

(3) Lots of stuff I want to try empirically. If anything stands out, will go forward with theoretical analysis. HOA + momentum seems the most promising right now. Does that sound good?

(4) Logistics for continuing the work next quarter? Is an independent study still possible? I think I have sufficient momentum now to carry me forward. Will you be reachable by email? (If not, that’s totally understandable!)

(5) I’m still very much interested in TA-ing convex in the spring, if the offer still stands.

Checklist

1. The Problems.

Each problem must eventually be cast as a feasibility problem. In particular, the feasibility problem must be cast as finding a point in the intersection of an affine set and a cone. Note, however, that the cone may be the Cartesian product of multiple different cones; this is benign, because projection onto a product cone decomposes blockwise, as sketched below.
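A minimal sketch of the blockwise projection, assuming (for illustration) a nonnegative-orthant block followed by a second-order-cone block:

```python
import numpy as np

def proj_soc(v):
    # Projection onto the second-order cone {(t, u) : ||u||_2 <= t}.
    t, u = v[0], v[1:]
    norm_u = np.linalg.norm(u)
    if norm_u <= t:
        return v
    if norm_u <= -t:
        return np.zeros_like(v)
    alpha = (t + norm_u) / 2
    return np.concatenate([[alpha], alpha * u / norm_u])

def proj_product_cone(v, orthant_dim):
    # Project each block independently; the concatenated results give the
    # projection onto the Cartesian product of the two cones.
    head = np.maximum(v[:orthant_dim], 0)
    tail = proj_soc(v[orthant_dim:])
    return np.concatenate([head, tail])
```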

2. + 3. The Algorithms


Alternating projections

Dykstra’s

ADMM

Halfspace Eliminating Alternating Projections (memory $= 2$)

Halfspace Eliminating Alternating Projections (memory $= \infty$)

Halfspace Eliminating Alternating Projections, with momentum

Halfspace Eliminating Alternating Projections, with plane search / momentum

Halfspace Eliminating Alternating Projections, with plane search / momentum and additional problem transformations

4. Theoretical Analyses

Halfspace Eliminating Alternating Projections (free, from Pang)

Halfspace Eliminating Alternating Projections, with plane search

Halfspace Eliminating Alternating Projections, with plane search and additional problem transformations

5. The Test Cases

2D feasibility problems, for intuition’s sake.

Intersection of second-order cone and affine set, randomly generated

Random cone problems of varying sizes.

  • NB: Can use the method described in the long SCS paper to generate these problems. Will need to take generated problem and form the self-dual homogeneous embedding in order to recover the feasibility problem, however.

Open test suite for feasibility problems?

  • To the best of my knowledge, none exist!