28 Oct 2016
The heavy-ball method is a *multi-step* iterative method that exploits iterates
prior to the most recent one. It does so by maintaining the
**momentum** of the previous two iterates. If you imagine
each iterate as a point in the trajectory of a falling ball, then the points
carry with them a momentum or a velocity. The heavy-ball method exploits that
velocity:

$$x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1}),$$

where $\alpha$ and $\beta$ are parameters. The heavy-ball method is one
way of guarding against zigzagging trajectories in iterative descent and
alternating projection methods. (Scribed from Polyak’s book.)
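
A minimal sketch of the heavy-ball iteration in Python (the quadratic objective and the step sizes are illustrative choices of mine, not from the source):

```python
import numpy as np

def heavy_ball(grad, x0, alpha=0.1, beta=0.8, iters=100):
    """Heavy-ball iteration: x+ = x - alpha*grad(x) + beta*(x - x_prev)."""
    x_prev = x0
    x = x0
    for _ in range(iters):
        # The beta*(x - x_prev) term is the momentum carried by the
        # previous two iterates.
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x

# Example: minimize f(x) = 0.5 * x^T diag(1, 10) x, a badly
# conditioned quadratic where plain gradient descent zigzags.
grad = lambda v: np.array([1.0, 10.0]) * v
x_star = heavy_ball(grad, np.array([1.0, 1.0]), alpha=0.05, beta=0.5)
```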

I can’t find anything about accelerating alternating projections with the
heavy-ball method, though Escalante’s book *Alternating Projection Methods*
does describe how to perform a line search when all of the sets are
affine (I imagine the technique extends beyond the affine case).

28 Oct 2016
Steps:

1. Framework for **generating random cone programs** (see the extended SCS
   paper; look for code): generate the KKT matrix, and in particular the affine
   set and the cone whose intersection we wish to probe.
2. Base class for the **model**: a **feasibility problem**.
3. Base class for **algorithms: iterative (projection)**.
4. Inherited classes, descendants of (3), that **implement various projection
   algorithms**.
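
A hypothetical sketch of how the model and algorithm pieces might fit together (all class and method names here are my invention, not a real design):

```python
import numpy as np

class FeasibilityProblem:
    """Model: find a point in the intersection of a collection of sets."""
    def __init__(self, sets):
        self.sets = sets  # each set exposes a project() method

class IterativeProjectionAlgorithm:
    """Base class for iterative (projection) algorithms."""
    def solve(self, problem, x0, iters=100):
        x = x0
        for _ in range(iters):
            x = self.step(problem, x)
        return x

    def step(self, problem, x):
        raise NotImplementedError  # descendants implement this

class AlternatingProjections(IterativeProjectionAlgorithm):
    """Descendant: cycle through the sets' projections."""
    def step(self, problem, x):
        for s in problem.sets:
            x = s.project(x)
        return x
```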

One thing, though: I’d like to, at least initially, use CVXPY to
implement the more involved acceleration schemes (for example, the plane search
along the points in the affine set). Actually, I’d like to use CVXPY for *all*
of my algorithms, because I don’t actually know how to write algorithms to
project onto convex sets … (!). (Also: taking advantage of special cone
structures.)

Scratch that. I know how to write algorithms to project onto convex sets, or
I vaguely know how to (subgradient descent methods, for example). What I don’t
know, or don’t yet buy (it is important for me to buy this!), is whether it is
often the case that it is easier to project onto $\mathcal{C}$ and
$\mathcal{D}$ individually rather than onto $\mathcal{C} \cap \mathcal{D}$
directly.
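
For intuition on why projecting onto the sets individually can be easier: in the toy example below (sets of my choosing), each individual projection is closed-form, while projecting directly onto the intersection would mean solving a QP.

```python
import numpy as np

def proj_ball(x, r=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= r}: closed-form.
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

def proj_halfspace(x, a, b):
    # Euclidean projection onto {x : a^T x <= b}: closed-form.
    v = a @ x - b
    return x if v <= 0 else x - (v / (a @ a)) * a

def find_point(x, a, b, iters=200):
    # Alternating projections: each step is one cheap closed-form
    # projection; no QP solver is ever called.
    for _ in range(iters):
        x = proj_halfspace(proj_ball(x), a, b)
    return x
```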

24 Oct 2016
## Notes from the meeting

Goal: See if *projection methods* can be accelerated by more intelligent
selection of the iterates. In particular, make more use of the information
that we collect along the way during projection: supporting hyperplanes and
their associated halfspaces, and sequences of iterates upon which to do a
line search.

### Projecting onto an affine set and a convex cone is general

Consider starting with an *affine set and a convex cone*. This is the most
general setting – it generalizes to any intersection of cones. To see this,
say that we wanted to project a vector onto
$\mathcal{K}_1 \cap \cdots \cap \mathcal{K}_m$,
each $\mathcal{K}_i \subseteq \mathbf{R}^n$ a convex cone. Form

$$\mathcal{K} = \mathcal{K}_1 \times \cdots \times \mathcal{K}_m, \qquad
\mathcal{A} = \{(x_1, \ldots, x_m) : x_1 = x_2 = \cdots = x_m\}.$$

Finding a point in $\mathcal{K}_1 \cap \cdots \cap \mathcal{K}_m$ is
equivalent to finding a point
in $\mathcal{K} \cap \mathcal{A}$. Note that $\mathcal{K}$ is a convex cone and
that $\mathcal{A}$ is an affine set. To project onto $\mathcal{A}$ from
$(x_1, \ldots, x_m)$, we just take the average of the blocks: each block of
the projection is $\frac{1}{m} \sum_{i=1}^m x_i$.
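
A small numpy sketch of this product-space trick (the particular cones are illustrative): projection onto $\mathcal{K}$ acts blockwise, and projection onto $\mathcal{A}$ is just block averaging.

```python
import numpy as np

def proj_diagonal(blocks):
    # Projection onto A = {(x_1,...,x_m) : x_1 = ... = x_m}:
    # replace every block with the average of the blocks.
    avg = np.mean(blocks, axis=0)
    return np.tile(avg, (len(blocks), 1))

def proj_product(blocks, projections):
    # Projection onto K = K_1 x ... x K_m: project each block separately.
    return np.array([P(b) for P, b in zip(projections, blocks)])

def intersect(projections, x0, iters=100):
    # Alternating projections in the product space recovers a point
    # in the intersection K_1 ∩ ... ∩ K_m.
    blocks = np.tile(x0, (len(projections), 1))
    for _ in range(iters):
        blocks = proj_diagonal(proj_product(blocks, projections))
    return blocks[0]
```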

### Projections can be cheap, even for extremely high dimensional data

Say we have some variable $x \in \mathbf{R}^n$ and a fixed $z \in \mathbf{R}^n$
in the same
space, and say we want $x$ to be the projection of $z$ onto a polyhedron.
Provided that the polyhedron is defined by a small number of halfspaces, we can
compute this projection extremely cheaply. To be more concrete, our optimization
problem is of the form

$$\begin{array}{ll} \text{minimize} & \tfrac{1}{2}\|x - z\|_2^2 \\
\text{subject to} & Ax \leq b, \end{array}$$

where $A \in \mathbf{R}^{m \times n}$, $m \ll n$. That is, $A$ is short and
fat. The fact that $A$ is short is precisely what will let us compute the
projection of $z$ onto the polyhedron efficiently.

The Lagrangian of the above problem is

$$L(x, \lambda) = \tfrac{1}{2}\|x - z\|_2^2 + \lambda^T (Ax - b).$$

Minimizing it in the normal way (setting $\nabla_x L = 0$), we have

$$x = z - A^T \lambda.$$

Substituting into the Lagrangian eliminates $x$ and leaves a problem in
$\lambda$ alone:

$$g(\lambda) = -\tfrac{1}{2} \lambda^T A A^T \lambda + \lambda^T (Az - b).$$

Note that $A A^T \in \mathbf{R}^{m \times m}$! Our minimization problem has
become

$$\begin{array}{ll} \text{maximize} &
-\tfrac{1}{2} \lambda^T A A^T \lambda + \lambda^T (Az - b) \\
\text{subject to} & \lambda \geq 0, \end{array}$$

with $x$ recovered as $x = z - A^T \lambda$. We can solve the problem easily
after paying the overhead to compute $A A^T$ and $Az$.
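
A sketch of the dual trick in numpy: only the small $m \times m$ matrix $AA^T$ and the vector $Az$ are formed. The projected gradient ascent loop is my own stand-in for whatever QP solver one would actually use.

```python
import numpy as np

def proj_polyhedron(z, A, b, iters=5000):
    """Project z onto {x : Ax <= b} via the m-dimensional dual.

    Dual: maximize -0.5*lam^T (A A^T) lam + lam^T (A z - b), lam >= 0,
    then recover x = z - A^T lam.  Only AA^T (m x m) and A z are needed.
    """
    G = A @ A.T                      # small m x m Gram matrix
    q = A @ z - b
    lam = np.zeros(A.shape[0])
    t = 1.0 / np.linalg.norm(G, 2)   # step size 1/L for the concave dual
    for _ in range(iters):
        # projected gradient ascent on the dual, clipping lam at 0
        lam = np.maximum(lam + t * (q - G @ lam), 0)
    return z - A.T @ lam
```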

### Generating a large, random cone program

Recall the KKT conditions for the problem in SCS.

Randomly generate $z$. Generate $s = \Pi_{\mathcal{K}}(z)$.
Then generate $y = s - z$. We will have $s \in \mathcal{K}$,
$y \in \mathcal{K}^*$, and $s^T y = 0$ (by the Moreau decomposition).
From there, enough will be specified to generate the data matrix
$A$ and the vectors $b = Ax^\star + s$ and $c = -A^T y$ (for randomly chosen
$A$ and $x^\star$). What’s more, you’ll have the solution
$(x^\star, y, s)$, too.
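
A sketch of this generation recipe, assuming the cone is the nonnegative orthant (so $\mathcal{K}^* = \mathcal{K}$ and $\Pi_{\mathcal{K}}$ is a clamp). This follows the Moreau-decomposition trick described above but is my own code, not the SCS authors'.

```python
import numpy as np

def random_cone_program(m, n, seed=0):
    """Generate a random feasible cone program with known solution.

    The Moreau decomposition of z gives s = Pi_K(z) in K and
    y = s - z in K* with s^T y = 0, so (x_star, y, s) satisfies the
    KKT conditions of: minimize c^T x s.t. Ax + s = b, s in K.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(m)
    s = np.maximum(z, 0)      # s = Pi_K(z), K = nonnegative orthant
    y = s - z                 # y in K* (= K here), s^T y = 0 exactly
    A = rng.standard_normal((m, n))
    x_star = rng.standard_normal(n)
    b = A @ x_star + s        # primal feasibility holds by construction
    c = -A.T @ y              # dual feasibility holds by construction
    return A, b, c, x_star, y, s
```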

### Canonical sources and algorithms to read up on

- Borwein’s survey on alternating projections
- On the heavy ball method and momentum
- “FCM Method,” from the 364B notes? Something about cutting planes with
memory=2. I couldn’t find it.

### Ideas on accelerating alternating projections

- Dumbest, simplest idea (that just might work well enough): project onto
  the intersection of the supporting halfspaces to form a new iterate.
- Somewhat smarter: Keep more than 2 halfspaces. (Will this help? Certainly
  not in $\mathbf{R}^2$.)
- Somewhat smarter still: Project onto more of the halfspaces, but also do a
  line/plane search over the previous $k$ iterates, $k$ a parameter.

Suggestion (2):

- Say we let $a_i^T x \leq b_i$, $i = 1, \ldots, k$, be the supporting
  hyperplanes we collect for the convex cone.
- And let $F$ be a 0-1 diagonal matrix whose $i$-th diagonal entry
  is 1 if and only if we’re picking up the $i$-th equality constraint.
- Then, at each step, we’re solving a QP: project the current iterate onto
  the selected equality constraints $FAx = Fb$ together with the collected
  halfspaces.

Key question: Can we solve this efficiently? Will the trick above help?
Will $FA$ become bigger than $A$? (No.)

Suggestion (3):

$x_1, \ldots, x_k$ are a finite number of previous iterates that were on the
affine set, and we search over their combinations for the point closest to the
cone. That is, we’re doing a
plane search. Can we solve this cheaply? We don’t have to solve it exactly, just
need to do better than projecting the last of the iterates (the most recent
one) onto the cone, because that’s what we do by default.
We’re assuming we know how to project onto the cone.
Is the objective function differentiable? Does the cost still go down if we
do this? Do this, then do the cutting plane strategy, then project onto the
affine set?

Boyd was under the impression / had the feeling that if we did three “weird” things,
things might just work out well. One or two might not suffice, however.

### Papers to bring up:

- Set Intersection Problems: Supporting Hyperplanes & Quadratic Programming
- Upshot: superlinear convergence in certain cases
- Work to be done: Numerical validations on real test suite

- SuperMann: Special case –> find point in intersection of (convex) sets

### Approaches to discuss for accelerating alternating projections:

- Solve QP + over-project? (Note that projecting onto the intersection of
two halfspaces is a simple cone projection problem.)
- What if we just initialize our new iterate at the intersection of the
two halfspaces (when finding the intersection of two sets)?
- Randomness (Mert Pilanci’s work) (choose which halfspaces to use uniformly
at random?)

### Questions:

- Canonical sources?

### Action items:

- Hook into / clone the test suite (Mark N…?)
- Implement vs. theoretical guarantees – what’s the best way to go about
getting a feel for how “good” an algorithm is? A bit of both simultaneously?
- prove convergence at the very least …?

21 Oct 2016
Key Idea: SCS reduces the problem of solving a convex cone program to the
problem of finding a nonzero point in the intersection of a subspace and a
cone.

Key Idea: Can take many convex problems and convert them into feasibility
problems by forming the KKT system (i.e., system of equations and inequalities
that constitute the KKT optimality conditions)!

primal-dual pair –> homogeneous self-dual embedding = convex feasibility
problem –> solve with ADMM

nonzero solution to embedding –> solution to original; else certificate of
infeasibility

Homogeneous self-dual embeddings traditionally used in interior-point methods.

### Primal-Dual Pair

The primal-dual pair of cone programs considered in SCS is

$$\begin{array}{llll} \text{minimize} & c^T x & \qquad \text{maximize} & -b^T y \\
\text{subject to} & Ax + s = b & \qquad \text{subject to} & A^T y + c = 0 \\
& s \in \mathcal{K} & & y \in \mathcal{K}^*. \end{array}$$

### KKT Conditions

The KKT conditions are given by

$$Ax + s = b, \quad s \in \mathcal{K}, \quad A^T y + c = 0, \quad
y \in \mathcal{K}^*, \quad s^T y = 0.$$

The KKT conditions can be embedded into a system of equations and inclusions
to obtain a feasibility problem:

$$\begin{array}{ll} \text{find} & (x, s, y) \\
\text{subject to} & Ax + s = b, \quad A^T y + c = 0, \quad c^T x + b^T y = 0, \\
& s \in \mathcal{K}, \quad y \in \mathcal{K}^*, \end{array}$$

where the last equality implicitly encodes complementary slackness (under
feasibility, $c^T x + b^T y = s^T y$) and
explicitly enforces the duality gap to be zero.

### Homogeneous Self-Dual Embedding

The above KKT system has no solution if the original problem is either
primal or dual infeasible. We can get around this problem by adding two
non-negative variables $\tau$ and $\kappa$ to the embedding:

$$\begin{bmatrix} 0 \\ s \\ \kappa \end{bmatrix} =
\begin{bmatrix} 0 & A^T & c \\ -A & 0 & b \\ -c^T & -b^T & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ \tau \end{bmatrix}, \qquad
(x, y, \tau) \in \mathbf{R}^n \times \mathcal{K}^* \times \mathbf{R}_+, \quad
(0, s, \kappa) \in \{0\}^n \times \mathcal{K} \times \mathbf{R}_+.$$

A special feature of this problem is that at most one of $\tau$ and
$\kappa$ is nonzero, because $s^T y + \tau \kappa$ equals zero at every
solution (due to skew symmetry), and each component of the inner product
is non-negative.

Possible cases:

- $\tau > 0$, $\kappa = 0$: Then the original problem has a solution,
  namely $(x/\tau, y/\tau, s/\tau)$ ($\tau$ is a
  scaling factor).
- $\kappa > 0$, $\tau = 0$: Then the duality gap $c^T x + b^T y$ is negative
  (primal or dual is infeasible, and we obtain a certificate).
- $\tau = \kappa = 0$ provides us with the least information.

The problem can be expressed as

$$\text{find } (u, v) \text{ such that } Qu = v, \quad u \in \mathcal{C},
\quad v \in \mathcal{C}^*.$$

Let $u = (x, y, \tau)$,
$v = (0, s, \kappa)$, let $Q$ be the big matrix
above, and let $\mathcal{C} = \mathbf{R}^n \times \mathcal{K}^* \times
\mathbf{R}_+$, $\mathcal{C}^* = \{0\}^n \times \mathcal{K} \times
\mathbf{R}_+$. Then the modified KKT problem becomes

$$\begin{array}{ll} \text{find} & (u, v) \\
\text{subject to} & Qu = v, \quad (u, v) \in \mathcal{C} \times \mathcal{C}^*.
\end{array}$$

Note that the equality constraint can be rewritten as

$$\begin{bmatrix} Q & -I \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}
= 0,$$

i.e., $(u, v)$ lies in a subspace.
Written this way, it becomes evident that the SCS problem can be solved with
the alternating projections method.
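
A toy illustration of the "subspace ∩ cone" structure (not SCS's actual algorithm): alternate projections between a null space and the nonnegative orthant, with a renormalization heuristic of my own to steer away from the trivial zero point.

```python
import numpy as np

def subspace_cone_point(M, x0, iters=500):
    """Find a nonzero point in null(M) ∩ nonnegative orthant.

    null(M) plays the role of the subspace {(u, v) : Qu = v};
    the orthant plays the role of the cone.
    """
    # Orthogonal projector onto null(M), built from the pseudoinverse.
    P = np.eye(M.shape[1]) - np.linalg.pinv(M) @ M
    x = x0
    for _ in range(iters):
        x = np.maximum(P @ x, 0)   # project onto subspace, then cone
        n = np.linalg.norm(x)
        if n > 0:
            x = x / n              # heuristic: avoid collapsing to zero
    return x
```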

21 Oct 2016
A coordinate system is a way of translating from a list of numbers to a vector
in Euclidean space.

Say you have a set of basis vectors $b_1, \ldots, b_n$ which differ from your
basis vectors $e_1, \ldots, e_n$.

A matrix $B = [b_1 \; \cdots \; b_n]$ can be thought of as
a transformation that moves $e_i$ to $b_i$. $B$ expresses the
actual coordinates of a vector expressed in the foreign basis of
$b_1, \ldots, b_n$
(from her language to our language). The inverse $B^{-1}$ takes a vector
written in our language to a vector written in her language.

*Change of coordinates*

Say $x = x_1 e_1 + \cdots + x_n e_n$. The $x_i$ are the coordinates
of $x$ in the standard basis. If $\{b_1, \ldots, b_n\}$ is another basis for
$\mathbf{R}^n$, then

$$x = B x',$$

where $x' = B^{-1} x$, and $x_1'$ is the
first coordinate of $x$ in the basis $\{b_1, \ldots, b_n\}$. This
is apparent because $x = B x' = x_1' b_1 + \cdots + x_n' b_n$.
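
The change of coordinates in numpy (the basis here is an arbitrary example of mine):

```python
import numpy as np

# Basis vectors b_1, b_2 as the columns of B.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

x = np.array([3.0, 2.0])         # coordinates in the standard basis
x_prime = np.linalg.solve(B, x)  # x' = B^{-1} x: coordinates in {b_1, b_2}

# x is recovered as the combination x'_1 b_1 + x'_2 b_2.
assert np.allclose(B @ x_prime, x)
```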

Read more