Last week Warren Buffet offered up a $1 Billion prize for anyone picking a perfect NCAA tournament bracket this March. Just exactly how difficult could it be to pick a perfect bracket? I set to find out that answer mathematically. After a couple days of analysis and Monte-Carlo simulations I have come up with an answer:
1 in 500,000,000,000 (1 in 500 billion)
For comparison, the odds of winning powerball are about 1 in 175,000,000. So for every person out there with a perfect NCAA bracket there would be 2,860 lottery winners. Daunting, but do-able.
How could I come up with such a number? Read on.
To answer this problem you have to first know how the NCAA brackets work. Not counting the first 4 teams in (which usually don’t factor into the bracket anyways) 64 teams are slotted into a 6 round, single elimination tournament. In order to get from 64 teams down to 1, precisely 63 teams must lose 1 game. So, if you assume that eachgame is a 50/50 crapshoot, then the math is simple- the odds are 1 in 2^63, or about 1 in 9.2 quintillion (9.2 * 10^18). Now that indeed would be an impossible task. To put that number in perspective, for each correct bracket with those odds there would be 52 billion powerball winners. Another way to wrap your mind around that number is to visualize that there are 8*10^17 square inches on the surface of earth (including all oceans, etc), so if you placed 11 brackets on every square inch of earth and they were all unique (no repeated brackets), you would have precisely one winner. Odds are that one square inch of real estate is not in your yard.
But each game is not a 50/50 crapshoot, and there lies the tantalizing probability of bringing those odds way, way down in your favor. For instance a #16 seed has never beaten a #1 seed. They have been playing this bracket format since I was a kid and it has not happened yet. Similarity a #2 seed rarely loses to a #15 seed. And if a #15 seed wins a game, it rarely wins a 2nd game. So we can use the fact that some teams are better than others to weight down the probabilities around the whole bracket.
To model this I used a Monte-Carlo simulation. Label each team A1-A16 (first region) through D1-D16 (last region). Assign “ping-pong balls” to each team based on rank. A #1 seed gets 99 ping pong balls (or their virtual equivalent). #2 gets 95, #3 gets 90 all they way down to #16 getting 1 ball. When 2 teams play, assign a random number between 0 to 1 for each team and then multiply that number by their ping-pong ball rating. The team with the higher final number wins.
Using this method, a #1 seed will beat a #16 seed about 99 times out of 100. Similarly, a #1 seed will beat a #3 seed about 99 times out of 99+90=189 (just over 50%). We can actually adjust these weights based on historical brackets. Here is the actual perl script I used to do this simulation.
Now, getting a perfect bracket is hard enough. If this year a bunch of #16-#13 seeds make runs deep into the tournament that scenario would be impossible to predict with the weights I have assigned to the teams. My weights make it much more likely that the top teams will make the deepest tournament runs, which models reality. To test the model I used the 2008 bracket as the model bracket I want the computer to come up with. Why 2008? That was a less-madness year, where all 4 #1 seeds made the final four, and there were few upsets throughout the brackets.
My script simulates a whole tournament, multiple times. I wanted to see how many times the computer would have to simulate the tournament to get all 63 games right (compared to the 2008 perfect bracket).
The first time through, the computer got 34 games right (out of a possible 63). The second and third tries yielded less than 34 correct games, but the 4th try improved on the result and correctly predicted 45 games. From there it takes a lot of iterations to get more games accurately predicted. In fact, here are the numbers I got.
Number of correct picks (compared to 2008 perfect bracket)
Iterations the computer needed to achieve this result
34    1
45    4
47    236
48    6,144
49    9,688
50    19,770
51    101,360
52    212,544
53    351,630
54    2,162,574
55    33,794,131
56    43,477,602
63    500,000,000,000 (est)
As you can see, the numbers go up very slowly past 50 correctly predicted games.  My computer can crunch about 10,000 brackets a second, but that still takes a long long time to get to 500,000,000,000.  So then, how do I arrive at 500,000,000,000?  Graph the known data points on a logarithmic scale, and extrapolate where 63 lands. A decently straight line to me shows about 500,000,000,000.
So, tell me. What are you going to do with your billion?
 
		