10900 - So you want to be a 2^n-aire?
You are a player on a quiz show. In the beginning, you have $1. With each correct answer this prize doubles, but once you give a wrong answer, you lose everything. The game is over either after you decide to stop, or after you have answered all n questions.
Each time you are given a question, you think for a while, and come up with a possible answer. You can also estimate the probability p that your answer is correct. Based on this you can decide whether to stop playing (and take the current prize), or to try to answer the question.
What is your expected prize, if you use an optimal strategy?
Assume that p is a random variable uniformly distributed over the interval [t, 1].
First of all, we will explain the last sentence. Take a lot of questions, try to answer each of them, and each time write down the value p. That sentence says that the numbers you have written down will all come from the interval [t, 1] and they will be approximately uniformly distributed.
Now, what is the optimal strategy?
Suppose that your current prize is W, and that the probability of answering the current question right is p. If you keep the prize, you will gain W. If you answer the question, with probability p you will have 2W and you will still be in the game with one question less to answer; with probability 1 - p you gain nothing and you are out.
If this is the last question, the expected prize if you answer the question is 2pW. If this is more than W (that is, if p > 1/2), you should answer the question, otherwise you shouldn't.
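As a quick numeric illustration of this last-question rule (the dollar figures below are made up for the example):

```python
# Expected prize with one question left: keep W, or answer for 2*p*W.
def last_question_value(W, p):
    """Optimal expected prize with one question left (illustrative helper)."""
    return max(W, 2 * p * W)

# With a current prize of W = 8: at p = 0.7 answering is worth 11.2 > 8,
# while at p = 0.4 it is worth only 6.4, so you should stop and keep the 8.
print(last_question_value(8, 0.7))
print(last_question_value(8, 0.4))
```

Note that the threshold p > 1/2 does not depend on W.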
Now consider the general case. Let E_n be the answer we seek – the expected prize, if you have n questions left and play optimally, starting from $1. (Expected prizes scale linearly with the current prize, so we may normalize W = 1.) How to compute it? If you don't answer, you will keep your 1 dollar. If you do, with probability p you get the prize 2 and you are still in the game with n - 1 questions left, which is worth 2p · E_{n-1} in expectation.
When you are in this situation, you will know the exact value p, thus your expected prize will be max(1, 2p · E_{n-1}).
But when we compute our answer, we don't know the question you are going to get. What now? We will simply take an "average over all possible values of p". (As there is an infinite number of possible values, the "average" will actually be an integral.)
We get the following recurrence:

E_0 = 1
E_n = (1/(1 - t)) · ∫_t^1 max(1, 2p · E_{n-1}) dp

(The factor 1/(1 - t) turns the integral over [t, 1] into an average.) Using it, we can compute the values E_1, E_2, ..., and output the value E_n.
An easy way of computing the integral: answering is better than stopping exactly when 2p · E_{n-1} > 1, so let p* = min(1, max(t, 1/(2E_{n-1}))). Split the integral at p*: on [t, p*] integrate the constant function 1, on [p*, 1] integrate the linear function 2p · E_{n-1}.
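Putting the recurrence and the split integral together, a minimal Python sketch could look like this (the function and variable names are ours, not from the problem statement):

```python
def expected_prize(n, t):
    """Expected prize, starting from $1, with n questions left and the
    answer probability p uniform on [t, 1] (assumes t < 1)."""
    E = 1.0  # E_0 = 1: no questions left, you keep your dollar
    for _ in range(n):
        # Answering beats stopping when 2*p*E > 1; clamp the threshold to [t, 1].
        pstar = min(1.0, max(t, 1.0 / (2.0 * E)))
        # Integrate 1 over [t, p*] and the linear function 2*p*E over [p*, 1],
        # then divide by the interval length (1 - t) to average.
        E = ((pstar - t) + E * (1.0 - pstar * pstar)) / (1.0 - t)
    return E

for n, t in [(1, 0.5), (1, 0.3), (2, 0.6), (24, 0.25), (30, 0.8)]:
    print(f"{expected_prize(n, t):.3f}")
```

This should reproduce the sample output listed at the end of this page; for instance, n = 1, t = 0.3 gives (0.2 + 0.75)/0.7 ≈ 1.357.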
A More Formal Approach
First we define a couple of random variables. Let X_n denote the amount that you win when you have n questions left, and let P_n denote the probability that you'll answer the next question correctly when you have n questions left.
We want to determine E[X_n], which is the expected prize when there are n questions left.
We use E[X_n | P_n = p] to denote the conditional expected value of X_n given that P_n = p. This is just the expression max(1, 2p · E[X_{n-1}]) that was derived above.
Now, as explained above, we don't know the value of . The solution above was to "average over all possible values of p". This is the correct thing to do, but here's the formal justification for this step:
There is a theorem from probability theory which says that

E[X_n] = ∫_{-∞}^{∞} E[X_n | P_n = p] · f(p) dp,

where f(p) is the density function of P_n. (See, for example, section 7.5.2, "Computing Expectations by Conditioning", in _A First Course in Probability_ by Sheldon Ross, to see why this theorem holds.) Since P_n is uniformly distributed over [t, 1], it follows that
- f(p) = 1/(1 - t) for t <= p <= 1 and
- f(p) = 0 for p < t or p > 1
Plugging the density function and the conditional expected value into the expression for E[X_n] yields the same recurrence as above:

E[X_n] = (1/(1 - t)) · ∫_t^1 max(1, 2p · E[X_{n-1}]) dp
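As a sanity check on this conditioning step, one can approximate the integral numerically and compare it with a value computed by hand. The helper below (our own, using a simple midpoint rule rather than any library integrator) averages max(1, 2p · E[X_{n-1}]) against the uniform density on [t, 1]:

```python
def avg_conditional(E_prev, t, steps=200_000):
    """Midpoint-rule approximation of the integral of
    E[X_n | P_n = p] * f(p) over [t, 1], where
    E[X_n | P_n = p] = max(1, 2 * p * E_prev) and f(p) = 1 / (1 - t)."""
    width = (1.0 - t) / steps
    total = 0.0
    for i in range(steps):
        p = t + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += max(1.0, 2.0 * p * E_prev) * width
    return total / (1.0 - t)

# One question left (E[X_0] = 1) and t = 0.3: the exact value is
# (0.2 + 0.75) / 0.7 = 0.95 / 0.7 ≈ 1.357, the second sample answer.
print(avg_conditional(1.0, 0.3))
```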
Sample input:

1 0.5
1 0.3
2 0.6
24 0.25
30 0.8
0 0

Sample output:

1.500
1.357
2.560
230.138
45517159.608