# Accelerating von Neumann's Alternating Projections Algorithm for Convex Feasibility

The code for my experiments and algorithm is hosted on github.

My research focuses on accelerating the alternating projections algorithm for finding a point in the intersection of convex sets. The algorithm, due to Jon von Neumann, is beautifully simple: just alternatingly project the iterate onto your convex sets until it converges to an optimal point, where optimality is obtained if and only if the iterate lies in the intersection of the convex sets.

Aside from aesthetics, I am interested in the work because it is practical: the majority of convex optimization problems can be reduced to solving this problem. As such, this method is relevant to control, finance, and machine learning, to name but a few fields.

Under the guidance of Professor Stephen Boyd, I have devised a class of modifications to the alternating projections method that is provably correct. These modifications are guaranteed to take at most the number of iterations that the vanilla algorithm takes and, though I have not yet theoretically verified this, should take many fewer in most cases. The trade-off is that our version of the algorithm requires us to solve three projections per iteration instead of two; that said, the extra projection can be computed extremely cheaply.

In this post, I motivate the problem by illustrating a shortcoming of the alternating projections method (convergence rapidly slows down as the iterates approach optimality). I then propose an algorithm that takes inspiration from the alternating projections method and provably converges to an optimal point in the intersection of the convex sets in question; the high-level idea is to solve a small quadratic program after each pair of projections. There is much work left to be done, and I close by listing the tasks that remain.

## Motivation

The appeal of the alternating projection algorithm lies in its simplicity. But, like other first-order methods, its convergence can be rather slow. Especially towards the tail end of the algorithm, you’ll find that the iterate often bounces back and forth between the convex sets without making much forward progress towards the intersection; if you come from the deep learning community, you can think of this problem as somewhat analogous to the vanishing gradient problem.

The problem is best illustrated through example. Below is a run of the alternating projections algorithm where the goal is to find a point in the intersection of two circles. The algorithm begins with an initial iterate at $(0, 10)$.

Notice how the iterate makes good progress at first but then gets stuck zigzagging between the two sets as it approaches the intersection. This slow convergence makes the algorithm impractical in many problem domains, especially where high accuracy is required.

So here’s the key question: Can we come up with a cheap modification to the alternating projection algorithm that prevents the sequence of iterates from degenerating into a zigzag trajectory?

## The Algorithm

For the remainder of this post, I will assume that the inputs to my algorithm are an affine set $\mathcal{A}$ and a convex cone $\mathcal{K}$. This assumption is not really an important one – the algorithm is correct so long as both sets are convex. But it turns out that almost all practical convex optimization problems can be reduced to finding a point in the intersection of an affine set and a convex cone, hence the assumption. In full specificity, the problem set-up is:

Input: An affine set $\mathcal{A}$ and a convex cone $\mathcal{K}$.

Goal: Find a point $x \in \mathcal{A} \cap \mathcal{K}$.

The initial iterate $z_0$ may be chosen randomly. Let $\Pi_{\mathcal{A}}$ be the projection operator onto $\mathcal{A}$ and let be $\Pi_\mathcal{K}$ the projection operator onto $\mathcal{K}$. Let $z_k$ be the $k$-th iterate. After the $k$-th step of the algorithm, perform the following updates:

In other words, $s_k$ is the iterate that would be obtained by averaged projections. (It is easy to show that averaged projections is equivalent to alternating projections.) In order to accelerate convergence, before performing the next averaged projection iteration, we will project $s_k$ onto the intersection of a number of halfspaces that contain $\mathcal{A} \cap \mathcal{K}$. More concretely, let $\tilde{A} = \{x \mid (z_k - q_k)^T(x - q_k) \leq 0\}$ be the halfspace containing $\mathcal{A}$ and let $\tilde{K} = \{x \mid (z_k - r_k)^T(x - r_k) \leq 0\}$ be the halfspace containing $\mathcal{K}$. After obtaining these halfspaces, we can generate the next iterate by projecting $s_k$ onto their intersection; that is, $z_{k+1} = \Pi_{\tilde{A} \cap \tilde{K}}(s_k)$.

It is not hard to prove that the above algorithm converges to an optimal solution of the feasibility problem; my previous post has a very rough proof sketch. I will post a complete proof sometime later.

There are many tweaks to the above algorithm that preserve convergence and may accelerate it; for example, we can choose to keep all the halfspaces and project onto the intersection of $2k$ halfspaces on the $k$-th iteration. Empirical results I’ve run suggest that doing so accelerates convergence. Doing so also increases the cost of each iteration, but I suspect the overhead is not sufficient. For example, see this post in which I discuss how to solve the projection cheaply.

This algorithm improves upon the vanilla version in the two-dimensional case. Below is a run of my algorithm on the problem we saw in the previous section.

## Results

The results of a toy experiment I’ve run are promising and are plotted below; the x-axis is the iteration number and the y-axis is the distance from the iterate to the intersection. Experiments were run on a 1000 dimensional affine space and a 1000 dimensional second order cone. I compared the performance of my algorithm, both when only two halfspaces are retained and when all of the halfspaces are retained, with the alternating projections algorithm and the state-of-the-art Dykstra’s algorithm. It is evident that keeping more halfspaces helps, at least in this experiment. (You can ignore the “ip (1.00)” label in the legend – it stands for “include the next halfspace with probability 1.0” and is a remnant from experiments with a randomized version of the algorithm.)

I also want to answer the question whether momentum updates may help prevent the QP algorithm that projections onto pairs of halfspaces from encountering the same problems that vanilla averaged projections encounters. I hypothesize that momentum will help, and my results support my hypothesis.

## Conclusion

My work is preliminary but promising: the algorithm I presented is but one example of an acceleration we can apply to the alternating projection method. There are many other things we could try, and their justifications fall out from the proof of my algorithm (I will post the proof soon!).

## Future Work

There’s a ton of future work to be done. Here’s a partial list:

• Analyze the algorithm’s rate of convergence.
• Attempt to do a cheap line search or plane search to even further accelerate convergence.
• Run experiments on larger problems.
• Implement a production-quality version of my algorithm, roll it into SCS, and evaluate it on Mark’s test suite.

Unrelated to this work, I’m also helping Steven Diamond with the design and implementation of CVXPY 1.0.