Welcome to the MATH 100-COSC 49/149 Web Page

Math 100/COSC 149 and COSC 49

Topics in Probability:
Randomness and Randomized Algorithms

Instructor: Prof. Peter Winkler (peter.winkler at dartmouth.edu)

### News

The final exam has been graded; everyone passed. (The last question was treated as extra credit, thus the scores are out of 56; range, 18 to 58, median 38.) Grad students all earned P's; all students can get their exam score and final grade by email to the instructor. Check this site when you return to campus in early January: a time and place will be announced where the answers to the final exam questions will be explained, and exams seen (but not kept).

The syllabus below has been changed to reflect actual coverage. Instructor's office hours this week (Nov 13-17) will be the usual, 1:15-2:45 Tues and 10:15-11:45 Thursday. In addition, our TA, Andrew Bohm, will host an office hour on Thursday morning 8-9 am in Vail 702 (highly recommended for homework issues).

### Abstract

What is randomness, and what is it good for? It was once thought by many that randomness is an invention of humans, needed to account for our limited ability to predict the future. Now it seems that randomness is the oldest concept in the universe, dating to the Big Bang.

For us mathematicians and computer scientists, randomness turns out to be a valuable commodity---providing in some cases the best, even the only, way to solve certain problems and perform certain computations. But it is elusive and hard to find in its pure form. Can we improve it? Can we measure it? Can we even define it? Once we have used it to help solve a problem, can we dispense with it?

In this course we will examine randomness, trying to find out how to best obtain it and best use it. Programming will not be required (but might be useful for some students in some special projects).

Prerequisites: Mathematical background of a senior mathematics major or a beginning graduate student in mathematics or theoretical computer science, including a course in probability (e.g., MATH 20 or Math 60). If in doubt, please see the instructor. Dist: QDS.

Here is what was covered in the course:

1. Randomized algorithms: Quicksort; mincut; Las Vegas v. Monte Carlo

2. Obtaining and measuring randomness: information, entropy, compression; Von Neumann's trick; Huffman code

3. The minimax principle for randomized algorithms: games and game trees, Yao's Theorem, equilibrium for 0-sum games

4. Tools from probability: linearity of expectation, Markov and Chebyshev inequalities

5. The probabilistic method: sum bound, balls in bins, coupon collecting, birthday problem

6. Simple random walk on graphs: hitting and cover time, Whirling tour

7. Markov chains: reducible, ergodic, reversible; eigenvalues; stationary distribution; detailed balance

8. Estimation by sampling: self-reducible problems; total variation distance; mixing time

9. Coupling and path coupling, Jerrum-Sinclair trick, activity parameter

### Classes

Room: Kemeny Hall 105
Lectures: "10" slot, in particular: Monday, Wednesday and Friday 10:10 - 11:15.
X-hour: Thursdays (same room) 12:15 pm - 1:05 pm. Will be used only when so announced in class.

### Staff

Instructor:
Peter Winkler -- Kemeny Hall 231
Office Hours: Tu 1:15 - 2:45; Th 10:15 - 11:45.

### Textbook

Rajeev Motwani and Prabhakar Raghavan, Randomized Algorithms Cambridge 1995.
Available online at https://rajsain.files.wordpress.com/2013/11/randomized-algorithms-motwani-and-raghavan.pdf
Supplementary materials (including recent advances) will be supplied in class.

Your grade will be based on homework, class participation, two hour exams, and a final exam. Homework and grading might be slightly different for enrollees of MATH 100/COSC 149 and enrollees of the simultaneous undergraduate course, COSC 49. The hour exams will be given in class on Monday, Oct 2 and Monday, Oct 23; let me know immediately if you have any possible conflict. The final exam will be in the regular slot for MWF10 courses, namely, 8am on Tuesday morning, Nov 21.

### Quizzes

There may be unannounced in-class quizzes, just to make sure everyone is keeping up.

### Homework

Homework will be assigned at each class period, due at the beginning of the next class.

### Assignments

Due Wednesday Sept 13: Read Chapter 1 of your text, and: (easy) compute the precise number of comparisons needed by Midsort to sort 255 records; (hard) compute the number of comparisons needed by Midsort to sort 2k - 1 records, as a function of k. In Midsort, you are miraculously provided at each stage with an element that has exactly half the remaining elements below it and half above. You still have to determine for each other element whether it is below or above. As a check, Midsorting 7 records takes 10 comparisons.

Due Friday Sept 15: In our ideal tennis tournament, 2n players are bracketed randomly and the outcome of each match is equally likely to be won by either player. As usual in an elimination tournament, all the players meet in pairs in Round 1, then the survivors meet in Round 2, etc., until one player is left. What is the probability that, at some point, Rafa plays Roger?

Due Monday Sept 18: Show that the min cut algorithm that contracts random pairs of vertices, instead of random edges, could (for some graphs) succeed with only exponentially small probability; that is, there are graphs on n vertices for which it works with probability less than cn, for some constant c. This is Exercise 1.2 in your text. Warning: this requires a bit more care than you might think.

Due Wednesday Sept 20: Given a coin whose probability of coming up heads is a positive number p other than 1/2, get an upper and lower bound for the minimum expected number of flips needed to make a fair binary decision. Von Neumann's trick already gives you an upper bound, namely 1/p(1-p) flips; can you do better? How could you go about getting a lower bound bigger than 2 flips?

Due Friday Sept 22: Derive a Huffman code for the Hawaiian alphabet, using the following letter frequencies: A 27%, O 13%, K 12%, N 9%, I 8%, E 7%, U 6%, L 5%, H 5%, P 4%, M 3%, W 1%. (You are encouraged to check Wikipedia or your favorite source to make sure you understand the Huffman code algorithm.) Then, compute your average code length (sum of code length times frequency).

Due Monday Sept 25: Suppose that you need lossless compression of 128-bit strings in which each bit is independently a 1 with probability 1%, and 0 with probability 99%. Design a code for this purpose. What is the average length of your code? Example: you could divide the string into 64 pairs of consecutive bits and code 00 as 0, 10 as 10, 01 as 110 and 11 as 111; this would give you average code-length 64(1x.99x.99 + 2x.01x.99 + 3x.99x.01 + 3x.01x.01) ~ 65.9, a considerable improvement from 128. But I bet you can do better.

Due Wednesday Sept 27: How many random bits do you need, on average, to do the following? (a) Generate a random integer between 1 and n (inclusive); (b) Shuffle a deck of n cards, that is, generate a uniformly random permutation of the integers 1 through n. In each case, you'll want to construct an algorithm and then compute the expected number of random bits it uses.

Due Friday Sept 29: Suppose you are creating a random integer between 1 and n by using your random bits as the binary expansion random real number r = .b1b2b3... in [0,1]. As soon as you know which interval [(j-1)/n, j/n) your r is in, you stop and output j. What's the expected number of bits required by this algorithm? HINT #1: Try this first for n = 3. HINT #2: You don't need to know the binary expansion of 1/n, 2/n, etc. to do this. The key is to observe that fewer than logn bits never suffice, and after that there are just n-1 "bad" strings of any given length that leave you undecided between two intervals.

Due Wednesday Oct 4: Show that for any n and any deterministic algorithm that evaluates complete binary game trees of depth n with binary inputs, there is an instance that forces the algorithm to read every one of the 2n input bits. (This is Problem 2.1 in your text.)

Due Friday Opt 6: What happens if you change the book's definition of ZPP to require that there is a polynomial P such that given input of size n, the Turing machine always stops in time bounded by P(n)?

Due Monday Oct 9: Let T be a complete ternary tree of depth d, that is, each node has three children until you get down to the leaves. At each non-leaf node is a majority gate that returns the majority value of its three input bits. Design a randomized algorithm to evaluate T, and determine its performance, i.e., the expected number of the 3d input bits that it reads assuming worst-case input.

Due Wednesday Oct 11: How big can n be so that a random graph on n vertices might not contain a clique or anticlique of size 5? (Use the sum bound.)

Due Friday Oct 13: Jake amasses a complete collection of Snapple caps, of which there are n types, each equally likely. Right after he completes his set he gives all of his duplicate caps to Sally, keeping only one of each type. What is the expected number of caps Sally needs to add, to complete her own collection?

Due Monday Oct 16: Show that if n is big enough (how big?) then with positive probability it will have the property that for any two distinct vertices u and v, there is a third vertex w that is adjacent to u but not to v.

Due Wednesday Oct 18: Roberta and Charles each write down a positive integer. The one who writes a smaller number earns \$1 from the other player, unless it is only smaller by 1, in which case the writer of the bigger number wins \$2 from the other player. If both players write down the same number, no money changes hands. For example, say Roberta writes "4". If Charles writes "1" or "2" he wins \$1; if "3", he loses \$2; if "4", they tie; if "5", he wins \$2; if more than 5, he loses \$1. What's the equilibrium strategy (same for both players)? HINT: The answer gives positive probability to the numbers 1 through k, for a certain value of k.

Due Friday Oct 20: Compute the expected hitting time h0i from vertex 0 to vertex i for simple random walk on the n-cycle Cn.

Due Wednesday Oct 25: Let G be the graph on n vertices consisting of a clique on k vertices and a path attached to vertex z of the clique, ending at vertex v. Let u be a vertex of the clique other than z; compute the expected hitting time from u to v. HINT: Do it in three stages: u to z, z to the first vertex on the path, and from there to v.

Due Friday Oct 27: Compute the expected number of coinflips needed to get n heads in a row. (You can think of this process as an n+1-state Markov chain with absorption at the last state.)

Due Monday Oct 30: Compute the stationary distribution of the Markov chain whose states are all possible left-to-right orderings of four tokens, labeled 1, 2, 3, and 4. A step of the chain is made by choosing token i with probability 0.i, then exchanging it with the token on its right. If the chosen token i is already in the rightmost position, the chain remains in its previous state (but time still advances by 1). ADVICE: Before struggling with a 4!x4! matrix, try detailed balance.

Due Wednesday Nov 1: Let Xt be a lazy random walk on the n-dimensional discrete hypercube, i.e., the set of binary sequences of length n. You start at (0,0,...,0) and at each tick of the clock, you roll your n-sided die (getting, say, the value i) and then flip your coin. If the coin comes up heads you flip the ith bit of your state, otherwise you do nothing. Let t(k) be the time it takes to get your kth new die value. (All rolls count, even if you didn't make the corresponding move.) By symmetry, you know that the stationary distribution of this Markov chain is uniform (each state having probability 1/2n). Compute the total variation distance between Xt(k) and the stationary distribution.

Due Friday Nov 3: Show that k-COLOR COUNT is self-reducible. In other words, show that the number of colorings of the vertices of a graph G by colors 1 through k is equal to the sum of those numbers for two other graphs---which will eventually be easy to count colorings of.

Due Monday Nov 6: Let P be a partially ordered set with n elements, and let M be the Markov chain on linear extensions of P that chooses a number k u.a.r. in {1,2,...,n-1} and interchanges the elements at levels k and k+1, if possible---that is, if those elements are incomparable in P. Show M is irreducible.

Due Wednesday Nov 8: Let P be a partially ordered set with n elements, and let S be the region of n-space consisting of all vectors (x1,...,xn) for which xixj whenever i < j in P, and 0 ≤ xi ≤ 1 for all i. Show that the problem of estimating the number of linear extensions of P can be reduced to estimating the volume of S.

Due Friday Nov 10: For n = 4, 5, 6 determine whether our Markov chain on 3-colorings of the n-cycle Cn is irreducible. Justify your answers.

Due Monday Nov 13: This is the final homework assignment, and will count a little more heavily than the others. Do one of the following:

Write a program that runs two coupled chains of 4-colorings of a 6-cycle, and run it until the chains merge. The coupling will be the natural one, where you choose a vertex and a color and in each chain, you use that color on that vertex when possible. Then, alter your program so that it applies the Jerrum-Sinclair switcheroo: if the chosen vertex v currently has the same color in both copies, but color i appears on a neighbor of v in only one chain while color j appears only in the other, then when you choose i in one chain you choose j in the other and vice-versa. (In all other cases, the coupling is as before.) Run again---as often as you like---and compare the merging time with what you got before.

Obtain (with justification) a closed-form expression for the number of k-colorings of an n-cycle.

### Honor Code

Students are encouraged to work together to do homework problems. What is important is a student's eventual understanding of homework problems, and not how that is achieved. The honor principle applies to homework in the following way. What a student turns in as a written homework solution is to be his or her own understanding of how to do the problem. Students must state what sources they have consulted, with whom they have collaborated, and from whom they have received help. Students are discouraged from using solutions to problems that may be posted on the web, and as just stated, must reference them if they use them. The solutions you submit must be written by you alone. Any copying (electronic or otherwise) of another person's solutions, in whole or in part, is a violation of the Honor Code.

If you have any questions as to whether some action would be acceptable under the Academic Honor Code, please speak to me, and I will be glad to help clarify things. It is always easier to ask beforehand than to have trouble later!

### Disabilities

I encourage any students with disabilities, including "invisible" disabilities such as chronic diseases and learning disabilities, to discuss appropriate accommodations that might help you with this class, either after class or during office hours. Dartmouth College has an active program to help students with disabilities, and I am happy to do whatever I can to help out, as appropriate.

The Student Disabilities Center is located at 318 Wilson Hall, ext. 6-9900, http://www.dartmouth.edu/~accessibility, if you have any questions. Any student with a documented disability requiring academic adjustments or accommodations is requested to speak with me by the end of the second week of the term. All discussions will remain confidential, although the Academic Skills Center may be consulted to verify the documentation of the disability and advise on an appropriate response to the need. It is important, however, that you talk to me soon, so that I can make whatever arrangements might be needed in a timely fashion.