Dynamic programming

[Unfinished]

There are two types of Dynamic Programming: Top-Down and Bottom-Up. The Top-Down method is also called Memoization.

Consider the Fibonacci sequence: 0,1,1,2,3,5,8,13,21...

The first two terms are 0 and 1, and every later term is the sum of the two previous terms.

Let fib(n) denote the nth term of the Fibonacci sequence. So,

fib(n) = 0,                   if n=0
       = 1,                   if n=1
       = fib(n-2)+fib(n-1),   if n>=2

So a recursive algorithm can be easily developed:

int badfib(int n){
    if (n<0)          return -1;   // undefined for negative n
    else if (n==0)    return 0;
    else if (n==1)    return 1;
    else              return badfib(n-2)+badfib(n-1);
}

Notice that this recursive function has a complexity of O(2^n). A function whose running time grows exponentially becomes impractical even for moderate values of n.

How is badfib inefficient?

Let's do an example. Calculate badfib(5), indenting each nested call and marking with * every place where badfib(2) is expanded:

badfib(5)=badfib(3)+badfib(4)
    badfib(3)=badfib(1)+badfib(2)
        badfib(1)=1
      * badfib(2)=badfib(1)+badfib(0)
            badfib(1)=1
            badfib(0)=0
        badfib(2)=1+0=1
    badfib(3)=1+1=2
    badfib(4)=badfib(3)+badfib(2)
        badfib(3)=badfib(1)+badfib(2)
            badfib(1)=1
          * badfib(2)=badfib(1)+badfib(0)
                badfib(1)=1
                badfib(0)=0
            badfib(2)=1+0=1
        badfib(3)=1+1=2
      * badfib(2)=badfib(1)+badfib(0)
            badfib(1)=1
            badfib(0)=0
        badfib(2)=1+0=1
    badfib(4)=2+1=3
badfib(5)=2+3=5

As you can see from the lines marked *, badfib(2) is calculated three separate times, and badfib(3) is calculated twice. If we attempt to trace badfib(30), then badfib(2), badfib(3), badfib(4), ... will each be recalculated many more times, hence the inefficiency.

The recursive function badfib is inefficient because it calculates the same subproblem more than once.
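One way to see the repeated work concretely is to count how often each argument is visited. The sketch below is only an illustration: countedBadfib and callCount are hypothetical names introduced here, and the function is simply badfib with one counting line added.

```cpp
#include <map>

// Hypothetical bookkeeping: callCount[k] is the number of times the
// function was entered with argument k.
std::map<int, long long> callCount;

// Same recursion as badfib, plus one line that records the call.
int countedBadfib(int n) {
    ++callCount[n];
    if (n < 0)  return -1;
    if (n == 0) return 0;
    if (n == 1) return 1;
    return countedBadfib(n - 2) + countedBadfib(n - 1);
}
```

Starting from an empty callCount, calling countedBadfib(5) should leave callCount[2] at 3 and callCount[3] at 2, for 15 calls in total.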

In Dynamic Programming (Bottom Up), we start from the smallest cases and store each calculated value in a table for future use, an effective strategy for most problems whose answers depend on smaller subproblems. This avoids calculating any subproblem more than once.

The code is relatively simple:

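#include <vector>

int fib(int n){
    if (n<0) return -1;                 // undefined for negative n, as in badfib
    if (n<2) return n;                  // base cases: fib(0)=0, fib(1)=1
    std::vector<int> fibArray(n+1);     // fibArray[i] will hold fib(i)
    fibArray[0]=0;
    fibArray[1]=1;
    for (int i=2;i<=n;i++)
        fibArray[i]=fibArray[i-2]+fibArray[i-1];
    return fibArray[n];
}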

We start from the base cases, the 0th and 1st Fibonacci numbers, and work our way up to the bigger cases and eventually to the nth Fibonacci number. The complexity of the function is only O(n). For n=30, the Dynamic Programming technique needs only about 30 additions, whereas badfib makes about 2.7 million recursive calls (the O(2^n) bound allows up to roughly 1 billion).
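For comparison, the Top-Down (Memoization) approach mentioned at the start keeps the recursive shape of badfib but remembers each answer the first time it is computed. The following is a minimal sketch under stated assumptions: memoFib and memoFibHelper are hypothetical names, -1 is used as a "not yet computed" marker, and long long is chosen only to delay overflow.

```cpp
#include <vector>

// Hypothetical helper: memo[k] holds fib(k) once computed, or -1 if unknown.
long long memoFibHelper(int n, std::vector<long long>& memo) {
    if (n < 2) return n;                 // base cases: fib(0)=0, fib(1)=1
    if (memo[n] != -1) return memo[n];   // already solved, so reuse the value
    memo[n] = memoFibHelper(n - 2, memo) + memoFibHelper(n - 1, memo);
    return memo[n];
}

long long memoFib(int n) {
    if (n < 0) return -1;                // same error convention as badfib
    std::vector<long long> memo(n + 1, -1);
    return memoFibHelper(n, memo);
}
```

Each value from 2 to n is computed once, so memoFib(30) does work proportional to n rather than exponential in n, at the cost of the recursion stack and the memo table.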

Dynamic Programming (DP) solves every smaller subproblem first, building up through the larger cases until it reaches the final case of size n. In the Fibonacci example, DP generated all Fibonacci numbers up to the nth.

Once you are given a problem, it is usually a good idea to check first whether DP is applicable to it. The second step in solving a problem with DP is to recognize the recursive relationship. The relationship may be straightforward or even pointed out, or it may be hidden so that you have to find it. In any case, since you have already determined that it is indeed a DP problem, you should at least have a pretty good idea of the relationship.

Before you start coding, ask yourself one last question: Will this method solve the question?

Another example:

Given two sequences of characters:

ACATGGA

BCTGA

Find the longest common subsequence of these two sequences.

For example, CTA is a subsequence of both sequences, since a subsequence does NOT have to be consecutive.

A[C]A[T]GG[A]

B[C][T]G[A]

However, CAT is a subsequence of ONLY the first sequence (order does matter); the second sequence has no A between its C and its T.

A[C][A][T]GGA

BCTGA -- no match
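The definition of a subsequence used above can be checked mechanically with a single left-to-right scan. This is a small sketch for illustration only; isSubsequence is a hypothetical helper name, not something the rest of this page depends on.

```cpp
#include <cstddef>
#include <string>

// Returns true if every character of `candidate` appears in `text`
// in the same order, though not necessarily consecutively.
bool isSubsequence(const std::string& candidate, const std::string& text) {
    std::size_t matched = 0;   // number of candidate characters found so far
    for (char c : text) {
        if (matched < candidate.size() && c == candidate[matched])
            ++matched;
    }
    return matched == candidate.size();
}
```

With the two sequences above, isSubsequence("CTA", "ACATGGA") and isSubsequence("CTA", "BCTGA") are both true, while isSubsequence("CAT", "BCTGA") is false.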

So how do we approach the problem?

If one of the sequences has no elements, then the longest common subsequence has no elements either.

Assume the longest common subsequence of the first i-1 elements of the first sequence and the first j-1 elements of the second sequence has length n.

If the i-th element of the first sequence and the j-th element of the second sequence are equal, then it can be easily proven that this element extends that subsequence, giving a longest common subsequence of length n+1 for the first i and first j elements. If these two elements are not equal, then the longest common subsequence (LSQ) is the longer of two candidates: the LSQ of the first i elements of the first sequence and the first j-1 elements of the second sequence, and the LSQ of the first i-1 elements of the first sequence and the first j elements of the second sequence.

Let LSQ(i,j) represent the length of the LSQ of the first i elements of the first sequence and the first j elements of the second sequence. It is defined only for 0<=i<=sequence1.length and 0<=j<=sequence2.length. Taking the first case that applies:

LSQ(i,j) = 0,                 if (i==0 || j==0)
         = LSQ(i-1,j-1)+1,    if (sequence1[i-1]==sequence2[j-1]) *
         = LSQ(i,j-1),        if (LSQ(i,j-1)>=LSQ(i-1,j))
         = LSQ(i-1,j),        if (LSQ(i,j-1)<LSQ(i-1,j))


 * Notice that in C++, array indices start at 0, so x[i] really means the (i+1)-th element of x.

We construct an 8x6 table by adding an extra row and column to a 7x5 table (7 is the length of the first string, 5 is the length of the second string), so that entry table[i][j] holds LSQ(i,j). Fill in the table according to the formula above (make sure you fill in the zeros first).
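The table-filling step can be sketched as follows. This is one possible rendering of the formula above, not a definitive implementation: buildLsqTable is a hypothetical name, and the table is stored as a vector of vectors so that its size can depend on the input lengths.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns a (len1+1) x (len2+1) table in which table[i][j] is the length of
// the longest common subsequence of the first i characters of sequence1
// and the first j characters of sequence2.
std::vector<std::vector<int>> buildLsqTable(const std::string& sequence1,
                                            const std::string& sequence2) {
    // Row 0 and column 0 are the "zeros first" part: an empty prefix
    // shares nothing with anything.
    std::vector<std::vector<int>> table(
        sequence1.size() + 1, std::vector<int>(sequence2.size() + 1, 0));

    for (std::size_t i = 1; i <= sequence1.size(); ++i) {
        for (std::size_t j = 1; j <= sequence2.size(); ++j) {
            if (sequence1[i - 1] == sequence2[j - 1])
                table[i][j] = table[i - 1][j - 1] + 1;   // characters match
            else
                table[i][j] = std::max(table[i][j - 1], table[i - 1][j]);
        }
    }
    return table;
}
```

For ACATGGA and BCTGA this produces an 8x6 table whose bottom-right entry, table[7][5], is 4.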

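To find the exact subsequence, we trace back through the table constructed above to reconstruct it. Let LSD_S(i,j) represent the longest common subsequence in string format. Taking the first case that applies:

LSD_S(i,j) = "",                                if (i==0 || j==0)
           = LSD_S(i-1,j-1)+sequence1[i-1],     if (sequence1[i-1]==sequence2[j-1])
           = LSD_S(i,j-1),                      if (table[i][j]==table[i][j-1])
           = LSD_S(i-1,j),                      if (table[i][j]==table[i-1][j])

The second case must compare the characters themselves. Testing only whether table[i][j]-table[i-1][j-1]==1 is not enough, because that difference can be 1 even when the two characters differ (for example, with the sequences ABC and CAB at i=3, j=3).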
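Putting the table and the trace-back together, a self-contained sketch might look like the following. The name longestCommonSubsequence is hypothetical, the trace-back is written as a loop rather than a recursion, and the first case is decided by comparing the characters directly; when several longest common subsequences exist, this returns just one of them.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

std::string longestCommonSubsequence(const std::string& sequence1,
                                     const std::string& sequence2) {
    // Step 1: fill the length table (row 0 and column 0 stay zero).
    std::vector<std::vector<int>> table(
        sequence1.size() + 1, std::vector<int>(sequence2.size() + 1, 0));
    for (std::size_t i = 1; i <= sequence1.size(); ++i)
        for (std::size_t j = 1; j <= sequence2.size(); ++j)
            table[i][j] = (sequence1[i - 1] == sequence2[j - 1])
                              ? table[i - 1][j - 1] + 1
                              : std::max(table[i][j - 1], table[i - 1][j]);

    // Step 2: walk back from the bottom-right corner, collecting matches.
    std::string result;
    std::size_t i = sequence1.size(), j = sequence2.size();
    while (i > 0 && j > 0) {
        if (sequence1[i - 1] == sequence2[j - 1]) {
            result.push_back(sequence1[i - 1]);   // part of the subsequence
            --i;
            --j;
        } else if (table[i][j] == table[i][j - 1]) {
            --j;                                  // answer came from the left
        } else {
            --i;                                  // answer came from above
        }
    }
    // Characters were collected last-to-first, so reverse them.
    std::reverse(result.begin(), result.end());
    return result;
}
```

For the sequences ACATGGA and BCTGA, this returns CTGA.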