# Professor Boyd [10/24/2016]

## Notes from the meeting

Goal: See if projection methods can be accelerated by more intelligent selection of the iterates. In particular, make more use of the information that we collect along the way during projection: supporting hyperplanes and their associated halfspaces, and sequences of iterates upon which to do a line search.

### Projecting onto an affine set and a convex cone is general

Consider starting with an affine set and a convex cone. This is the most general setting – it captures any intersection of cones. To see this, say that we want to project a vector $x$ onto $C = C_1 \cap C_2 \cap \ldots \cap C_r$, with each $C_i \subseteq \mathbb{R}^n$. Form

$$\tilde{C} = C_1 \times C_2 \times \cdots \times C_r, \qquad \tilde{A} = \{(x_1, \ldots, x_r) \in \mathbb{R}^{rn} : x_1 = x_2 = \cdots = x_r\}.$$

Finding a point in $\underset{i}\cap C_i$ is equivalent to finding a point in $\tilde{C} \cap \tilde{A}$. Note that $\tilde{C}$ is a convex cone and $\tilde{A}$ is an affine set. Projecting a point $(x_1, \ldots, x_r)$ onto $\tilde{A}$ is easy: replace each block $x_i$ with the average $\bar{x} = \frac{1}{r}\sum_i x_i$.
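As a quick sketch of how this reformulation is used (the three sets below are made up for illustration), alternating projections on the product space applies each set's projection blockwise to handle $\tilde{C}$, then averages the blocks to handle $\tilde{A}$:

```python
import numpy as np

# Three illustrative convex sets in R^2: a halfspace, a box, and the
# nonnegative orthant (all choices here are made up for the example).
def proj_halfspace(x):          # {x : x[0] + x[1] <= 2}
    a, b = np.array([1.0, 1.0]), 2.0
    v = a @ x - b
    return x - (max(v, 0.0) / (a @ a)) * a

def proj_box(x):                # {x : 0.5 <= x_i <= 1.5}
    return np.clip(x, 0.5, 1.5)

def proj_orthant(x):            # {x : x >= 0}
    return np.maximum(x, 0.0)

projections = [proj_halfspace, proj_box, proj_orthant]

# The product-space iterate is r stacked copies ("blocks") of the variable.
blocks = [np.array([5.0, -3.0]) for _ in projections]
for _ in range(2000):
    # Project onto C~ = C_1 x ... x C_r: apply each projection blockwise.
    blocks = [P(xb) for P, xb in zip(projections, blocks)]
    # Project onto A~ = {(x_1, ..., x_r) : x_1 = ... = x_r}: average blocks.
    avg = sum(blocks) / len(blocks)
    blocks = [avg.copy() for _ in blocks]

x = blocks[0]   # a (near-)point of the intersection
```

The blockwise step and the averaging step are each cheap, which is the appeal of the product-space trick.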

### Projections can be cheap, even for extremely high dimensional data

Say we have some variable $x \in \mathbb{R}^{10^7}$ and a fixed $z$ in the same space, and say we want $x$ to be the projection of $z$ onto a polyhedron. Provided that the polyhedron is defined by a small number of halfspaces, we can compute this projection extremely cheaply. To be more concrete, our optimization problem is of the form

$$\begin{array}{ll} \text{minimize} & \frac{1}{2}\|x - z\|_2^2 \\ \text{subject to} & Fx \leq g, \end{array}$$

where $F \in \mathbb{R}^{10 \times 10^7}$. That is, $F$ is short and fat. The fact that $F$ is short is precisely what will let us compute the projection of $z$ onto the polyhedron efficiently.

The Lagrangian of the above problem is

$$L(x, \gamma) = \frac{1}{2}\|x - z\|_2^2 + \gamma^T(Fx - g).$$

Minimizing it over $x$ in the normal way (setting $\nabla_x L = 0$), we have

$$x - z + F^T\gamma = 0 \quad \Longrightarrow \quad x = z - F^T\gamma.$$

Substituting $x = z - F^T\gamma$ into the Lagrangian eliminates $x$, and $z$ now enters only through the small vector $Fz$; the dual function is

$$d(\gamma) = -\frac{1}{2}\gamma^T FF^T \gamma + \gamma^T(Fz - g).$$

Note that $FF^T \in \mathbb{R}^{10 \times 10}$! Our minimization problem has become

$$\begin{array}{ll} \text{minimize} & \frac{1}{2}\gamma^T FF^T \gamma - \gamma^T(Fz - g) \\ \text{subject to} & \gamma \geq 0, \end{array}$$

a tiny QP in $\gamma \in \mathbb{R}^{10}$, from whose solution we recover $x = z - F^T\gamma$.

We can solve the problem easily after paying the overhead to compute $Fz$ and $FF^T$.
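A minimal numerical sketch of this dual trick (dimensions shrunk, and projected gradient descent used as a stand-in for whatever small-QP solver one prefers):

```python
import numpy as np

def project_onto_polyhedron(z, F, g, iters=5000):
    """Project z onto {x : F x <= g} by solving the small dual QP
        minimize (1/2) gamma^T F F^T gamma - gamma^T (F z - g)
        subject to gamma >= 0
    with projected gradient descent, then recovering x = z - F^T gamma."""
    Q = F @ F.T                                # tiny m x m Gram matrix
    c = F @ z - g
    step = 1.0 / np.linalg.eigvalsh(Q).max()   # 1/L step size
    gamma = np.zeros(F.shape[0])
    for _ in range(iters):
        gamma = np.maximum(0.0, gamma - step * (Q @ gamma - c))
    return z - F.T @ gamma

# Example: project (2, 2) onto the halfspace {x : x_1 + x_2 <= 1}.
F = np.array([[1.0, 1.0]])
g = np.array([1.0])
x = project_onto_polyhedron(np.array([2.0, 2.0]), F, g)
# x is (0.5, 0.5), matching the closed-form halfspace projection
```

The expensive pieces, $Fz$ and $FF^T$, are computed once; every dual iteration then touches only 10-dimensional objects.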

### Generating a large, random cone program

Recall the KKT conditions for the problem in SCS.

Randomly generate $z \in \mathbb{R}^n$. Set $x = \Pi_C(z)$ and $s = \Pi_C(z) - z$. By the Moreau decomposition $z = \Pi_C(z) + \Pi_{-C^*}(z)$, we have $s \in C^*$, and so $x^Ts = 0, x \in C, s \in C^{*}$. From there, enough will be specified to generate the data matrix $A$ and the vector $b$. What’s more, you’ll have the solution $x$, too.
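A sanity check of the splitting step with $C$ taken to be the nonnegative orthant, where $\Pi_C(z) = \max(z, 0)$ and $C^* = C$; the construction of $A$ and $b$ at the end is just one illustrative way to finish generating the data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 4

# Split a random z by the Moreau decomposition for C = R^n_+ (self-dual).
z = rng.standard_normal(n)
x = np.maximum(z, 0.0)        # x = Pi_C(z), lies in C
s = x - z                     # s = Pi_C(z) - z = max(-z, 0), lies in C* = C

# x and s are complementary by construction: for each coordinate at least
# one of x_i, s_i is zero, so x^T s = 0 exactly.

# One way to finish generating problem data with known solution x
# (illustrative choice): random A, and b chosen so that A x = b holds.
A = rng.standard_normal((m, n))
b = A @ x
```

The same pattern works for any cone whose projection is available, which is what makes it a convenient generator of large random test instances.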

### Canonical sources and algorithms to read up on

1. Borwein’s survey on alternating projections
2. On the heavy ball method and momentum
3. “FCM Method,” from the 364B notes? Something about cutting planes with memory=2. I couldn’t find it.

### Ideas on accelerating alternating projections

1. Dumbest, simplest idea (that just might work well enough): project onto the intersection of the supporting halfspaces to form a new iterate.
2. Somewhat smarter: Keep more than 2 halfspaces. (Will this help? Certainly not in $\mathbb{R}^2$.)
3. Somewhat smarter still: Project onto the halfspaces as in (2), but also do a line/plane search over the previous $k$ iterates, with $k$ a parameter.

Suggestion (2):

• Say we let $\tilde{F}$ be the supporting hyperplanes we collect for the convex cone.
• And let $F$ be a 0-1 diagonal matrix whose $i$-th diagonal entry is 1 if and only if we’re picking up the $i$-th equality constraint.
• Then, at each step, we’re solving

Key question: Can we solve this efficiently? Will the trick above help? Will FA become bigger than A? (No.)

Suggestion (3):

The $a_i$ are a finite number of previous iterates that lay on the affine set; that is, we’re doing a plane search. Can we solve this cheaply? We don’t have to solve it exactly – we just need to do better than projecting the most recent iterate onto $\mathcal{C}$, because that’s what we do by default. (We’re assuming we know how to project onto $\mathcal{C}$.) Is the objective function differentiable? Does the cost still go down if we do this? Do this, then do the cutting-plane strategy, then project onto the affine set?
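One cheap instantiation of the search idea (a sketch; the cone, the two iterates, and the grid search are all made up for illustration): given the two most recent iterates on the affine set, search the segment between them for the point closest to $\mathcal{C}$ – we only need to beat the default choice, the most recent iterate.

```python
import numpy as np

def dist_to_orthant(x):
    # Distance from x to C = R^n_+ (whose projection is coordinatewise max).
    return np.linalg.norm(x - np.maximum(x, 0.0))

# Two made-up previous iterates: a1 (older) and a2 (most recent).
a1 = np.array([ 2.0, -1.0, 0.5])
a2 = np.array([-1.0,  2.0, 0.5])

# Grid line search over the segment (1 - t) a1 + t a2. The map
# t -> dist((1 - t) a1 + t a2, C) is convex, so a golden-section or
# finer search would also work; a coarse grid keeps the sketch simple.
ts = np.linspace(0.0, 1.0, 101)
dists = [dist_to_orthant((1 - t) * a1 + t * a2) for t in ts]
t_best = ts[int(np.argmin(dists))]
best = (1 - t_best) * a1 + t_best * a2

# By construction, best is at least as close to C as the default t = 1
# (the most recent iterate), so the subsequent projection cost cannot rise.
```

Here the search actually finds a point inside the cone, strictly beating the default choice $t = 1$.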

Boyd was under the impression / had the feeling that if we did all three “weird” things together, things might just work out well. One or two alone might not suffice, however.

### Papers to bring up:

1. Set Intersection Problems: Supporting Hyperplanes & Quadratic Programming
• Upshot: superlinear convergence in certain cases
• Work to be done: Numerical validations on real test suite
2. SuperMann: Special case → find a point in the intersection of (convex) sets

### Approaches to discuss for accelerating alternating projections:

1. Solve QP + over-project? (Note that projecting onto the intersection of two halfspaces is a simple cone projection problem.)
2. What if we just initialize our new iterate at the intersection of the two halfspaces (when finding the intersection of two sets)?
3. Randomness (Mert Pilanci’s work) (choose which halfspaces to use uniformly at random?)
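On points (1) and (2): projecting onto the intersection of two halfspaces can indeed be done in closed form, by enumerating the three possible active sets (a sketch with made-up notation; it assumes the normals are linearly independent when both constraints are active):

```python
import numpy as np

def project_two_halfspaces(z, a1, b1, a2, b2):
    """Project z onto {x : a1.x <= b1, a2.x <= b2} by active-set enumeration."""
    # Case 0: z is already feasible.
    if a1 @ z <= b1 and a2 @ z <= b2:
        return z
    # Cases 1 and 2: exactly one constraint active -> project onto that
    # hyperplane (multiplier > 0) and check the other constraint still holds.
    for (a, b, ao, bo) in ((a1, b1, a2, b2), (a2, b2, a1, b1)):
        x = z - ((a @ z - b) / (a @ a)) * a
        if a @ z > b and ao @ x <= bo:
            return x
    # Case 3: both constraints active -> project onto the intersection of the
    # two hyperplanes, solving the 2x2 Gram system for the multipliers.
    A = np.vstack([a1, a2])
    gamma = np.linalg.solve(A @ A.T, A @ z - np.array([b1, b2]))
    return z - A.T @ gamma

# Example: project (2, 2) onto {x : x_1 <= 1} ∩ {x : x_2 <= 1}.
x = project_two_halfspaces(np.array([2.0, 2.0]),
                           np.array([1.0, 0.0]), 1.0,
                           np.array([0.0, 1.0]), 1.0)
# x is (1, 1), the corner of the two halfspaces
```

Every branch costs a few inner products, which supports the note above that this is a simple cone-projection-style subproblem.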

### Questions:

1. Canonical sources?

### Action items:

1. Hook into / clone the test suite (Mark N…?)
2. Implement vs. theoretical guarantees – what’s the best way to go about getting a feel for how “good” an algorithm is? A bit of both simultaneously?
• prove convergence at the very least …?