Longest increasing subsequence

The Longest Increasing Subsequence problem is to find the longest increasing subsequence of a given sequence. It also reduces to a graph theory problem of finding the longest path in a directed acyclic graph.

Overview
Formally, the problem is as follows:

Given a sequence $$a_1, a_2, \ldots, a_n$$, find the longest subsequence $$a_{i_1}, a_{i_2}, \ldots, a_{i_k}$$ with $$i_1 < i_2 < \cdots < i_k$$ such that $$a_{i_1} < a_{i_2} < \cdots < a_{i_k}$$.

Longest Common Subsequence
A simple way of finding the longest increasing subsequence is to use the Longest Common Subsequence (Dynamic Programming) algorithm.


 * 1) Make a sorted copy of the sequence $$A$$, denoted as $$B$$, dropping duplicate values so that $$B$$ is strictly increasing. $$O(n \log(n))$$ time.
 * 2) Run Longest Common Subsequence on $$A$$ and $$B$$. $$O(n^2)$$ time.
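The two steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `lcs` and `lis_via_lcs` are chosen here for the example, and the sorted copy is de-duplicated with `set` on the assumption that a strictly increasing subsequence is wanted.

```python
def lcs(x, y):
    """Return one longest common subsequence of the lists x and y."""
    m, n = len(x), len(y)
    # table[i][j] = length of the LCS of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


def lis_via_lcs(a):
    """Longest strictly increasing subsequence of a, via LCS with a sorted copy."""
    # Step 1: sorted copy B; removing duplicates keeps B strictly increasing.
    b = sorted(set(a))
    # Step 2: any common subsequence of A and B is increasing, so the LCS is an LIS.
    return lcs(list(a), b)
```

For example, `lis_via_lcs([1, 2, 5, 3, 7, 3, 8, 5])` returns an increasing subsequence of length 5.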

Dynamic Programming
There is a straightforward Dynamic Programming solution in $$O(n^2)$$ time. Though this is asymptotically equivalent to the Longest Common Subsequence version of the solution, the constant is lower, as there is less overhead.

Let A be our sequence $$a_1,a_2,\ldots,a_n$$. Define $$q_k$$ as the length of the longest increasing subsequence of A, subject to the constraint that the subsequence must end on the element $$a_k$$. The longest increasing subsequence of A must end on some element of A, so that we can find its length by searching for the maximum value of q. All that remains is to find out the values $$q_k$$.

But $$q_k$$ can be found recursively, as follows: consider the set $$S_k$$ of all $$i < k$$ such that $$a_i < a_k$$. If this set is empty, then none of the elements that come before $$a_k$$ is smaller than it, which forces $$q_k = 1$$. Otherwise, each $$j \in S_k$$ has a value $$q_j$$. By the definition of q, maximizing q over $$S_k$$ gives the length of the longest increasing subsequence that ends on an element $$a_j$$ with $$j \in S_k$$; we can append $$a_k$$ to that subsequence, which gives:


 * $$q_k = \max\{q_j \mid j \in S_k\} + 1$$

If the actual subsequence is desired, it can be found in $$O(n)$$ further steps by moving backward through the q-array, or else by implementing the q-array as a set of stacks, so that the above "+ 1" is accomplished by "pushing" $$a_k$$ into a copy of the maximum-length stack seen so far.

Pseudocode
 function lis_length( a )
     n := a.length
     q := new Array(n)
     for k from 0 to n - 1:
         max := 0
         for j from 0 to k - 1:
             if a[k] > a[j] and q[j] > max:
                 max := q[j]
         q[k] := max + 1
     max := 0
     for i from 0 to n - 1:
         if q[i] > max:
             max := q[i]
     return max
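A runnable Python sketch of the same recurrence, extended to return the subsequence itself, might look like this. It records a back-pointer for each element instead of re-scanning the q-array, which is one of several ways to do the $$O(n)$$ recovery step; the names `lis_quadratic` and `prev` are assumptions made for this example.

```python
def lis_quadratic(a):
    """Return one longest strictly increasing subsequence of a in O(n^2) time."""
    n = len(a)
    if n == 0:
        return []
    q = [1] * n       # q[k]: length of the longest increasing subsequence ending at a[k]
    prev = [-1] * n   # prev[k]: index of the element before a[k] in that subsequence
    for k in range(n):
        for j in range(k):
            # a[j] can precede a[k]; keep the predecessor giving the longest chain.
            if a[j] < a[k] and q[j] + 1 > q[k]:
                q[k] = q[j] + 1
                prev[k] = j
    # Start at an index where q is maximal and follow the back-pointers.
    k = max(range(n), key=q.__getitem__)
    out = []
    while k != -1:
        out.append(a[k])
        k = prev[k]
    out.reverse()
    return out
```

With the input `[1, 2, 5, 3, 7, 3, 8, 5]` the q-array comes out as `[1, 2, 3, 3, 4, 3, 5, 4]`, and the function returns `[1, 2, 5, 7, 8]`.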

Faster Algorithm
Let $$A_{i,j}$$ be the smallest last element among all increasing subsequences of length $$j$$ that use only the elements $$a_1, a_2, a_3, \ldots, a_i$$ (or $$\infty$$ if no such subsequence exists).

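Given $$a_i$$, you can naively linear search $$A_{i-1,j}$$ over increasing $$j$$ to find the first $$j$$ where $$a_i \leq A_{i-1,j}$$, and store $$a_i$$ there. The choice is greedy in a sense: for each length, only the smallest known ending value is kept.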

Example
We start with:

 A_initial = [∞, ∞, ∞, ∞, ∞, .. , ∞]
 a = [1, 2, 5, 3, 7, 3, 8, 5]

First, we take $$a_0 = 1$$. As $$a_0$$ is smaller than the $$\infty$$ stored at position 1, we put it there. (Position 0 is unused, so that position $$j$$ corresponds to subsequences of length $$j$$.)

 A_0 = [∞, 1, ∞, ∞, ∞, ∞, .. , ∞]

which represents that we can have a LIS of 1 element, with that subsequence ending with 1.

Similarly,

 A_1 = [∞, 1, 2, ∞, ∞, ∞, .. , ∞]

We first ask if we can construct a new LIS of 1 element, ending with 2. The answer is yes, but the previous best 1 is smaller, and thus we greedily ignore it; intuitively we choose the smaller number because it will always give us more numbers to append. Because 2 is bigger than 1, or $$a_1 > A_{0,1}$$, we now know that we can construct a new LIS of 2 elements ending with 2, by appending it to the increasing subsequence with one element.

We fast-forward to $$a_3 = 3$$, with

 A_2 = [∞, 1, 2, 5, ∞, ∞, ∞, .. , ∞]

reached by similar logic. Now we linearly ask ourselves the same question for $$A_{2,1}$$. Then at $$A_{2,2}$$, we notice that because $$a_3 > A_{2,2}$$, we can use $$a_3$$ to create a new increasing subsequence of 3 elements, combining the previous subsequence of 2 elements (ending in 2) and appending 3. The last element of this new subsequence of length 3 is smaller than that of the previous best subsequence of length 3 (which ends in 5), so this is now our new best subsequence of length 3:

 A_3 = [∞, 1, 2, 3, ∞, ∞, ∞, .. , ∞]

Note that since $$a_3 \leq A_{2,3}$$, you cannot construct a longer subsequence ending with $$a_3$$ (prove by contradiction: if you could, $$A_{2,3}$$ would be smaller than $$a_3$$), and thus we can terminate early.

This is an output-sensitive $$O(kN)$$ algorithm, where $$k$$ is the length of the output: for each of the $$N$$ elements, we make at most $$k + 1$$ comparisons.

Notice that when $$a_i > A_{i-1,j}$$ and $$a_i > A_{i-1,j+1}$$, we do not update entry $$j + 1$$, as we already have a "better" subsequence of length $$j + 1$$. When $$a_i \leq A_{i-1,j}$$, we trivially cannot update entry $$j + 1$$, as $$a_i$$ does not extend any subsequence of length $$j$$. We only update entry $$j + 1$$ when $$A_{i-1,j} < a_i \leq A_{i-1,j+1}$$ (with equality, the entry keeps the same value).
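The linear-scan update can be sketched in Python as below. This is an illustrative sketch rather than a canonical implementation: the names `lis_length_linear_scan` and `best` are made up for the example, only the current row of the table is stored, and position 0 is left unused as in the worked example.

```python
import math


def lis_length_linear_scan(a):
    """Length of the longest strictly increasing subsequence, O(kN) time."""
    n = len(a)
    # best[j]: smallest last element of any increasing subsequence of length j
    # found so far (infinity if none exists yet). Position 0 is unused.
    best = [math.inf] * (n + 1)
    length = 0
    for x in a:
        # Scan for the first j with x <= best[j]. The scan always stops,
        # because after i elements at most i entries are finite.
        j = 1
        while best[j] < x:
            j += 1
        best[j] = x  # x is now the best known ending for length j
        length = max(length, j)
    return length
```

Each element triggers at most $$k + 1$$ comparisons, because the scan stops at or before the first position still holding $$\infty$$.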

Even Faster Algorithm
Building on the $$O(kN)$$ algorithm, observe that for any particular $$i$$, $$A_{i,1} < A_{i,2} < \ldots < A_{i,j}$$. This suggests that if we want the longest subsequence that ends with $$a_{i+1}$$, we only need to look for the $$j$$ such that $$A_{i,j} < a_{i+1} \leq A_{i,j+1}$$, and the length will be $$j+1$$.

Since the entries of $$A$$ are always in increasing order, and the update does not change this ordering, we can do a binary search for each of $$a_1, a_2, \ldots, a_n$$. Given $$N$$ elements and a longest subsequence of length $$k$$, each element takes $$O(\log k)$$ time to search, giving a final complexity of $$O(N \log k)$$.
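A compact Python sketch of the binary-search version, using the standard library's `bisect_left`, is shown below. The list `tails` plays the role of the current row of $$A$$, shifted so that `tails[j - 1]` corresponds to length $$j$$; the names `lis_length` and `tails` are assumptions for this example.

```python
from bisect import bisect_left


def lis_length(a):
    """Length of the longest strictly increasing subsequence, O(N log k) time."""
    # tails[j - 1]: smallest last element of any increasing subsequence of
    # length j found so far. The list is always strictly increasing.
    tails = []
    for x in a:
        # First position whose value is >= x, found by binary search.
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[pos] = x    # x is a smaller ending for length pos + 1
    return len(tails)
```

Using `bisect_left` rather than `bisect_right` keeps the subsequence strictly increasing, since an element equal to an existing entry overwrites that entry instead of extending past it.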

Implementation

 * C
 * C++ (output-sensitive $$O(n \log k)$$ algorithm, which is $$O(n \log n)$$ in the worst case)
 * Python ($$O(n^2)$$)