Final CFB playoff predictions 2023

The games have been played up to conference championship weekend. The committee has spoken 5 times; on Sunday they speak for a 6th and final time under this 4-team format. The computer has churned out the answers, and the model likes Ohio State over Alabama or Texas if one or more teams slip up this weekend.

I don’t like it, but the committee has over-rated Ohio State all these weeks, putting them at #1 when the model usually has them #3 or #4. Even last week, after the Michigan defeat, they only dropped to #6 while the computer has them at #5 – a small negative bias, not enough to make up for weeks 1-4.

Right now the only path for Texas is Washington and Georgia winning and Michigan and FSU losing (about a 0.2% chance). The only path for Bama is Washington winning and Michigan and FSU losing (about a 0.4% chance).

If the committee ends up agreeing with the computer I’ll be happy they did the right thing, but mad that they were wrong in weeks 1-5. Let’s see what happens. https://playoffpredictor.com

Betting on the CFB 2023 season

I am going to use my model and track prediction accuracy against Vegas for 2023. Picks will be tracked at predictions.collegefootballdata.com. Three accounts will be used:

  • @PlPredict_all for all FBS games,
  • @playoffPredict for high-confidence games per the model, and
  • @cfb_vegas_line for the Tuesday (opening) Vegas line.

The definition of high confidence is a difference of >= 7.5 points between the Vegas line and the computer model. This only applies after week 2 (weeks 3 through the end of the season).
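As a sketch of that filter (a hypothetical helper of my own, not the site's actual code; lines are point spreads from the home team's perspective):

```python
def is_high_confidence(model_line: float, vegas_line: float, week: int) -> bool:
    """A pick is high-confidence when the model and the Tuesday Vegas line
    disagree by at least 7.5 points, and only from week 3 onward."""
    return week >= 3 and abs(model_line - vegas_line) >= 7.5

# Hypothetical example: model has the home team -14, Vegas has them -6.
print(is_high_confidence(-14.0, -6.0, 5))  # True
print(is_high_confidence(-14.0, -6.0, 2))  # False (weeks 1-2 excluded)
```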

The model will not be updated within a week (picks are locked on Tuesday).

u/nevilleaga is going to pick all games using the trailing 15 weeks of data (meaning if it is week 3, using week 1-2 data from 2023 and week 3-17 data from 2022).

I will use @CiscoNeville to experiment with personal ideas. One example is looking at the line and predicting something close to the line, skewed by the model. @CiscoNeville will do that by using actual 2023 game results plus one extra week of the line entered as results. So for example, if the OU-Iowa State line is OU –14, a game will be entered as OU 14, Iowa State 0, as if it actually happened.

Results published here as comments

update 9/28/23 – I realized that for weeks 2-4 I used an m of best fit for eta, not an m of best fit against the spread. For week 5 and forward I will be using m = ATS model 6, which goes from m = -0.8 for a 1-point win to m = +0.5 for a 35+ point win.

My SE internal links at Cisco

Below is a list of my most useful links on the Cisco intranet that would be handy for getting a generalist SE/SA up and productive. Of course most of these links will not work without access to the Cisco internal network via VPN, and they will not work without logging into Cisco (done at id.cisco.com).

Generate estimates, see quotes

Cisco Commerce Workspace – https://apps.cisco.com/Commerce/home – Used to build and share estimates and quotes.

Enterprise Agreement management portal – https://eaccw.cloudapps.cisco.com/app/#/ – Used to see the quotes your software sales specialist puts for EAs – knowledge worker counts, suites covered, etc.

Salesforce – https://ciscosales.lightning.force.com/lightning/ – Where to see deals and AMs push commits.

Connect with product management

topic – topic.cisco.com – Possibly the most important tool for any Cisco presales SE/SA. Look up every question ever asked and any answer ever given in any forum to product management or TMEs. Set data sources to “Newsgroup” to search PM/TME communications and/or “CSOne” to search all previous TAC cases. Leave the other sources unchecked.

Salesconnect – salesconnect.cisco.com – Get internal product TDM decks, see VT slides

Post-sales

TAC case query – https://cae-xmlkwery.cisco.com/main/casekwery.php – This is the one I like the most. There are several portals that will get the same TAC data, but this one organizes the information best.

CCW-R – https://ccrc.cisco.com/ccwr/? – where you can input a serial number and find info on ship date, who it was sold to, and if it is still under E-LLW or TAC support

CCO lookup tool – https://cdca.cloudapps.cisco.com/cdca/lookup.do – Look up your customer’s CCO status, see what contracts they are associated with, and even add contracts to their profile, self-service (sometimes works). Try searching not on email address (takes too long) but on customer name using exact (not contains). Searching is case-sensitive, so “university of Oklahoma” returns nothing, but “University of Oklahoma” returns the records. If you want to look yourself up, use it without the @cisco.com – for example “neaga”, not “neaga@cisco.com”.

Smart account info via Cisco Software Central – software.cisco.com – First get access to your customer’s smart accounts by using the link for “Request Access to an Existing Smart Account”, then use Smart Software Manager to see their licenses, including which products are registered and pulling licenses.

Smart account reporting – https://software.cisco.com/software/csws/smartaccount/internalReport – use this link when you don’t have access to their smart account

TAC automated scripts – scripts.cisco.com – link for the wireless analyzer

Compensation

visibility – https://visibility.cisco.com/ – See product quotas and attainment, also SPIFF information

sales compensation portal – https://sales-comp.cisco.com/employee/home – Show PE buckets, percentages, base/variable mix.

Numbers

My business reports – https://mbr.cloudapps.cisco.com – See product quotas and attainment for the region. See week to date or quarter to date numbers. Also hit dashboards->Bookings 360. Best way to see product bookings over the last 3 years by account. Hit advanced filters, change from “Sales Hierarchy” to either customer or BE.

Smartsheet – https://app.smartsheet.com/ – where we keep our weekly high/low, and account team members

Centro – https://centro.cisco.com/Americas – How an RM views the sales dashboard. Have to request access.

Installed base reporting

Tansel’s sheet – https://cisco.sharepoint.com/:x:/r/sites/SLEDArchitecture/_layouts/15/Doc.aspx – This gets its own category. Pulled monthly, I believe.

Cisco Ready – https://rewarddash.cloudapps.cisco.com/#/sales/analysis/asset – How to find out what is at a customer. Use “Install Base” “Total Asset View”. Change the filter to get one account, and then click on the right and download to email/teams. Once downloaded, open the Excel file, select the data, and move it to a pivot table.

HR links

BGP / DNS resources

Looking glass – lg.he.net – understand how your customer peers to the internet

DNS lookup tool – dnslytics.com – see your customers registered domains and contact info

CIDR report – cidr-report.org/as2.0/ – see the size and weekly changes of the global internet routing table

Cisco tools

RunOn – runon.cisco.com – Find cloud infrastructure tools that enable multi-cloud and hybrid cloud services.

Public Sector tools

Erate commitment letters – apps.usac.org/sl/tools/commitments-search/AdvancedNotification.aspx – See if your school is funded

Erate eligibility – ciscoerate.com – Is that DNA-A license 100% eligible?

Training

training hours towards 200 – https://wwss.cisco.com/

learn – learn.cisco.com

continuing education – ce.cisco.com

digital learning – digital-learning.cisco.com – Good courses to re-certify IE

Cloud portals

login.umbrella.com

console.amp.cisco.com

SecureX – visibility.amp.cisco.com

Webex control hub – admin.webex.com

Demo licenses and hardware

Demo licenses – ibpm.cisco.com – give it some time to come up

Demo hardware – dls.cisco.com – get your customers trial gear

Picklist – pklst.cloudapps.cisco.com – Get old, junked hardware. Sometimes get lucky!

Collab sandbox – collabtoolbox.cisco.com – Create a sandbox, register a DX

Other

Ariba buying – https://s1.ariba.com/gb/?realm=cisco-child&locale=en_US – purchase office supply stuff

Demo

dcloud – dcloud.cisco.com – demo Cisco solutions

Data

eDnA – edna.cisco.com – enterprise data analytics. Snowflake, BI on Cisco data sets

Internal Licenses

Toolbox – toolbox.cisco.com – VMware and Microsoft lab licenses

Meraki licenses – epp.meraki.net – buy Meraki internal licenses at 90% off

Internal Software

Internal builds – ftp://swds.cisco.com/swc/interim – Download gobs and gobs of internal and external software. FTP connection.

Smart account reporting – mce.cisco.com – my cisco entitlements

Best college football team of the cfp era

Now that I have settled on a mathematical model for my playoffPredictor.com computer ratings for the 2023 season, I wanted to look back and see what team has been the highest rated team since my website and the playoff started in 2014.

In the 9 years of the cfp era, only 4 teams have gone undefeated: three 15-0 teams (2018 Clemson, 2019 LSU, and 2022 Georgia), and one 13-0 team as a result of the COVID year (2020 Alabama).

Popular wisdom would tell you that the 2019 LSU team is the best of the last nine years, but put it in the computer and it spits out a surprising result – that LSU team is only 3rd best.

The #1 team of the era? 2020 Alabama. Here is the full list of every team in the playoff era that managed to get greater than a 1.0 playoffPredictor.com rating:

2020 Alabama was an incredible team that does not get its due because of COVID. No close games at all – the closest win was by 15 points over Ole Miss. 8 blowouts, including both playoff games. They did not even play anyone ranked lower than #82.

2018 Clemson ranks above 2019 LSU because they played only 2 close games (Texas A&M and Syracuse), only 2 games with a normal victory margin (South Carolina and Boston College, 20-21 points each, right on the edge of blowout), and every other game was a blowout of 28+ points, including against #3 Notre Dame and #2 Alabama. Who in the cfp era beats Alabama by 28 points? That list has one entry on it. In fact, only 2 other teams have beaten Bama by more than 7 points in the 9 years of the CFP era (2021 Georgia by 15 points and 2017 Auburn by 12).

The only teams to make the list outside of the Alabama/Georgia/LSU/Clemson/Ohio State leadership are Oregon, UCF, and Wisconsin.

So back to 2019 LSU: why are they relatively low? Because before blowing out Oklahoma and beating Clemson in the playoff, they played 3 games decided by 7 points or less (Texas, Auburn, Alabama). That lack of margin of victory hurts their overall resume. In fairness, this is a clear situation where margin of victory does not tell the correct story: in all three of those games LSU was in control and simply recovered an onside kick by the opposition after a late score turned a 2-score game into a 1-score game, then took knees to end those games. Maybe in another year I’ll switch the formula to time of victory and that will put 2019 LSU higher, but that is for another blog post. 2019 Ohio State was an incredible computer team that year, even with the single loss to Clemson – better than 6 of the last 9 national champions. It would have been something special to see that 2019 Ohio State team take on 2019 LSU if that wide receiver had not stopped his route short.

Probabilities of a 7-game series

Ever wondered how odds map from a single-game contest to a best-of-7 series? For example, say you knew that in any single game between team A and team B, team A wins 60% of the time. In a one-game series on a neutral floor, team A obviously advances 60% of the time. But what if they play a 7-game series (all on a neutral floor)? What do team A’s odds improve to?

This can be computed analytically and straightforwardly. ChatGPT says to use the binomial distribution, but that’s not 100% correct: the binomial will get you the probability team A gets exactly 5 wins in 7 trials, but of course that can’t happen, as after you win 4 the series is over. So, let’s just exhaust all possibilities between team H (favored) and team A (underdog):

For team H to win in 4 games there is one way: HHHH, with total probability p^4, where p is the probability team H wins any one game (60% from our original question). 0.6^4 = .1296 = 12.96% ≈ 13%, meaning there is a 13% chance team H will sweep.

Team H can win in 5 games 4 different ways: HHHAH, HHAHH, HAHHH, AHHHH. Each has a probability of p^4(1-p) = 0.05184 in our scenario. Multiplying by the 4 ways to achieve that result leaves a total probability of 4p^4(1-p) = 20.736%.

Team H can win in 6 games 10 different ways and in 7 games 20 different ways (the reader can check those for themselves), so the final probability for team H to win a 7-game series is:

P(H wins series) = p^4 + 4p^4·q + 10p^4·q^2 + 20p^4·q^3 = p^4(1 + 4q + 10q^2 + 20q^3)

where q is defined as (1-p).

Run the numbers and you find that 60% for a single game translates into 71% for a seven-game series (71.0208% to be exact). So a 7-game series draws out the better team, adding an 11-point benefit for a team that is already 10 points better than a 50/50 coin flip.
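The closed form is easy to check with a few lines of Python (a direct transcription of the formula above, not anything from the site):

```python
def series_win_prob(p: float) -> float:
    """Probability the favorite wins a best-of-7 series, given single-game
    win probability p: the sum of winning in 4, 5, 6, or 7 games."""
    q = 1 - p
    return p**4 * (1 + 4*q + 10*q**2 + 20*q**3)

print(round(series_win_prob(0.6), 6))  # 0.710208

# The same formula for a range of single-game probabilities:
for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"p = {p:.0%} per game -> {series_win_prob(p):.1%} for the series")
```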

Running the numbers for other values of p, the series gives roughly a 2x boost to the single-game edge initially, with diminishing returns as you get past 80-90%.

To me it’s compelling at p=80% or so: p=80% becomes 97%. p=80% means the team is a -400 favorite (bet $400 to receive $500). So unless your team is a -400 favorite in game 1, don’t be too confident they will win the 7-game series. In fact, you can extrapolate the series winning odds from the game 1 odds.

Now, how about when team H wins the first 2 games? I’ve heard it said that team A winning then becomes impossible, because they have to win 4 of 5 games. How likely is that? The formula for team H winning from a 2-0 lead becomes:

P(H wins series | up 2-0) = p^2(1 + 2q + 3q^2 + 4q^3)

For p=.5 that works out to 81%. Not so impossible at all for team A; in fact, the comeback will happen about 1 out of 5 times.

Well, then why in the NBA does a team that goes up 2-0 win the series 95% of the time? Because of course the team that goes up 2-0 is usually better than 50/50 against team B. More like p=0.66 – that gets you to 95% series wins after going up 2-0. Note that p=0.66 gets you only to 81% for the series before the start of game 1.

Summarizing the math:

If team A is 2x better (p(A)=.666, will win 2 times out of 3), then they win the series 81%.

If team A is 2x better (p(A)=.666, will win 2 times out of 3) and they win first 2, then they win the series 95%

If team A and team B are equal (50/50), then team A wins the series 50% (duh)

If team A and team B are equal (50/50) and team A wins the first 2, then team A wins the series 81%

If team A is worse (p(A)=.333) but wins first two games, they win the series 54%  — even though they are worse.

Bottom line – winning a 7-game series after going down 0-2 is not hard – as long as you are the better team! Be twice as good and you have roughly a 46% shot at pulling off the comeback.
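The summary numbers above can be reproduced with a small recursion over series states (my own sketch, not the author's spreadsheet; assumes p is fixed per game and games are independent):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def win_series(p: float, a_needs: int, b_needs: int) -> float:
    """Probability team A wins the series when A still needs `a_needs` wins
    and B still needs `b_needs` wins; p is A's single-game win probability."""
    if a_needs == 0:
        return 1.0
    if b_needs == 0:
        return 0.0
    return (p * win_series(p, a_needs - 1, b_needs)
            + (1 - p) * win_series(p, a_needs, b_needs - 1))

print(round(win_series(0.66, 4, 4), 3))  # 0.816 -> ~81% before game 1
print(round(win_series(0.66, 2, 4), 3))  # 0.951 -> ~95% once up 2-0
print(round(win_series(0.50, 2, 4), 3))  # 0.812 -> even teams, up 2-0
print(round(win_series(0.66, 4, 2), 3))  # 0.448 -> the 0-2 comeback for the better team
```

Note the comeback number is sensitive to whether "twice as good" means p=0.66 or exactly p=2/3, which is why it lands in the 45-46% neighborhood.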

Playoff probabilities via Monte-Carlo

I have added a product to the playoffPredictor.com site that visualizes the percentage chances a team has to make the college football playoff.

PlayoffPredictor.com was launched with the express belief that if you knew who was going to win future games, you could accurately predict the top 4 in the final poll. If you know (or can predict) that Alabama is going to beat Georgia in the SEC championship game, can you definitively predict whether Georgia will still make the college football playoff? Yes, you can answer that if you know how all the other games pan out (like Baylor beating Oklahoma State, for example, leaving that slot open for Georgia).

The next logical place to use that data would be to iterate through each future scheduled game and, using the probabilities of each team to win, exhaustively calculate the probability of each team to make the college football playoff. Unfortunately, exhaustive scenario modeling is a virtual computational impossibility. If you tried to enumerate every possibility (just for wins and losses, not even for margin of victory) from week 8 till week 14, you have about ~450 games to model. Given that a game has only two possible winners, this works out to 2^450 ≈ 2.9×10^135 scenarios to model. How big is 10^135? Well, I have seen estimates of the number of atoms in the known universe anywhere from 10^85 to 10^110, so 10^135 is many orders of magnitude larger than the number of atoms in the universe, and would take trillions of years to compute.

So do we give up modeling future probabilities? No, we introduce Monte-Carlo simulation. The idea is that if we know the percentage chance for an individual trial (say, Ohio State beats Penn State 85% of the time), we simulate that trial and the other 449 games a large number of times, and count who actually made the top 4. This is much simpler because you only have to compute 450 × 1,000 = 450,000 = 4.5×10^5 game results, and that can be done in a matter of minutes.
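A toy version of the idea (the ratings and round-robin schedule here are invented for illustration; the real site simulates the actual remaining schedule with its own win probabilities):

```python
import random

# Hypothetical ratings on a 0-1 scale, purely for illustration.
ratings = {"Ohio State": 0.90, "Clemson": 0.85, "Alabama": 0.80,
           "Georgia": 0.80, "TCU": 0.70, "Syracuse": 0.65}
schedule = [(a, b) for a in ratings for b in ratings if a < b]  # round-robin

def win_prob(r_a: float, r_b: float) -> float:
    # Elo-style conversion of a rating gap into a win probability
    # (base 1000, divisor 1, as discussed elsewhere on the site).
    return 1 / (1 + 1000 ** (r_b - r_a))

def top4_chances(n_sims: int = 1000, seed: int = 0) -> dict:
    """Simulate the slate n_sims times; return each team's top-4 frequency."""
    rng = random.Random(seed)
    counts = {t: 0 for t in ratings}
    for _ in range(n_sims):
        wins = {t: 0 for t in ratings}
        for a, b in schedule:
            winner = a if rng.random() < win_prob(ratings[a], ratings[b]) else b
            wins[winner] += 1
        for t in sorted(wins, key=wins.get, reverse=True)[:4]:
            counts[t] += 1
    return {t: c / n_sims for t, c in counts.items()}

print(top4_chances())
```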

So, the new product is on playoffPredictor.com. After week 6 I like how my calculated data matches reasonable expectations. We have Ohio State and Clemson most likely to reach the cfp at ~65% (because of the ease of their championship games), followed by Alabama and Georgia at ~50% and ~40% respectively. Same boring top 4. The challengers around the ~25% mark are Mississippi, TCU, and USC, and then 13 teams under 20%, including Syracuse at 14%. Could Syracuse make it? Sure – imagine this scenario:

  • Tennessee beats Alabama twice (week 7 and SEC championship)
  • Tennessee, Florida, and Mississippi State beat Georgia (weeks 9, 10 & 11)
  • Texas Tech and Baylor beat TCU (weeks 10 & 12)
  • UCLA and Oregon beat USC (week 12 and Pac12 championship)

You could end up with undefeated Syracuse, Ohio State, and Tennessee coupled with winners of the Big12 and Pac12 at 2-3 losses each, and a 2-loss Alabama. In that (unlikely) scenario, you put in Tennessee, Ohio State, Alabama, and Syracuse. And variations of that are exactly what the computer came up with 140 times out of the 1,000 simulated seasons.

Enjoy the new tool, compare it with ESPN’s probabilities through the season, and drop me a line at @CiscoNeville if you have any thoughts on this new visualization.

Elo predictions for college football – base and divisor

Ever thought about chess ratings? The highest chess rating ever for a human is held by Magnus Carlsen at 2882. The lowest is shared by many people at 100. The United States Chess Federation initially aimed for an average club player to have a rating of 1500. There is no theoretical highest or lowest possible Elo rating (the best computers are at ~3500, and negative ratings are theoretically possible, but those players would get kicked from tournaments, so 100 was arbitrarily set as the lowest possible rating).

This particular range of numbers from 100 to 3500 is a consequence of the base and divisor that Arpad Elo chose. The Elo rating system for chess uses a base of 10 and a divisor of 400. Why? According to Wikipedia, 400 was chosen because Elo wanted a 200-point rating difference to mean that the stronger player would win about 75% of the time, and people assume he used 10 as a base because we live in a base-10 world. Interestingly, if he had used base 9 then 200 points would be exactly a 75% chance of winning, whereas base 10 is more like 76%:

1 / (1 + 9^(200/400)) = .25

1 / (1 + 10^(200/400)) ≈ .2402

But enough on Chess. I like the playoffPredictor mathematical formula that puts the teams between roughly 0 and 1 instead of 100 and 3500. But what base and divisor do I use with that system?

Last year I went with a base equal to the week number (1-17) and a divisor of .4. The reason for the week number is that bases 2-3-4 don’t exponentiate out to crazy scenarios, so in week 2, when there is very little data, the model does not make such drastic judgements. The divisor I picked empirically; it seemed to fit the data and it was also a callout to Elo’s 400.

Really though I have been doing some fiddling and I think a base of 1000 and a divisor of 1 will work better. Here are some spreadsheet results.

Elo probabilities for different bases with constant divisor of 1

To read the above table, look at the row for 1000. It reads: if a team is better by .2 (like computer ratings of .9 and .7 for the two teams), then the .9 team has a 79.9% chance of winning. I need to model these against real life, but I feel this is a start.
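The table entries are just this formula evaluated at different bases; a quick sketch (function name mine):

```python
def win_probability(r_a: float, r_b: float, base: float = 1000.0,
                    divisor: float = 1.0) -> float:
    """Elo-style probability that the team rated r_a beats the team rated r_b."""
    return 1 / (1 + base ** (-(r_a - r_b) / divisor))

# The example from the text: ratings .9 vs .7 with base 1000, divisor 1.
print(round(win_probability(0.9, 0.7), 3))  # 0.799
```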

Further for choosing a base and divisor the following are all equivalent pairs:

equivalent pairs for bases/divisors

So 1000 and 1 give the same result as 10000 and 1.3333, or 100 and .6666. In the same way 1000 and 3 give the same result as 100 and 2, or 10 and 1. Log math.
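The equivalence follows because only log(base)/divisor enters the win-probability formula; a quick check of the pairs above:

```python
import math

# 1/(1 + base**(-diff/divisor)) depends only on diff * ln(base)/divisor,
# so any (base, divisor) pairs with equal ln(base)/divisor are equivalent.
def slope(base: float, divisor: float) -> float:
    return math.log(base) / divisor

print(math.isclose(slope(1000, 1), slope(10000, 4/3)))  # True
print(math.isclose(slope(1000, 1), slope(100, 2/3)))    # True
print(math.isclose(slope(1000, 3), slope(100, 2)))      # True
print(math.isclose(slope(1000, 3), slope(10, 1)))       # True
```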

So I’m going to move to a base of 1000 and a divisor of 1. The divisor of 1 makes sense: take that term out of the equation. Then what base should you use? Empirically, 1000 seems to fit well. I need to backtest with data from weeks 13-14, when ratings are pretty established, to see how the percentages mesh with Vegas. To do.

At a base of 1000 and divisor of 1, a team with a rating +.16 more than an opponent will have a 75% chance of winning. So in this sense +.16 corresponds to 200 points in chess Elo.

In my rating the teams will be normally distributed with a mean of 0.5 and a standard deviation around 0.25, meaning for teams separated by 1 standard deviation, the better team has an 85% chance of success. For example, the following teams are all about 1 sigma apart:

  • #1 Georgia (~1)
  • #20 BYU (~.75)
  • #70 Illinois (~.5)
  • #108 Tulane (~.25)
  • #129 1AA (FCS) (~0)

So Georgia has an 85% chance of beating BYU, BYU has an 85% chance of beating Illinois, Illinois an 85% chance of beating Tulane, and Tulane an 85% chance of beating an FCS school. Is that right? Not sure.

Keeping with the logic, Georgia would have a 97% chance of beating Illinois [1/(1+1000^(-.5))]. BYU would have a 97% chance of beating Tulane. Are those right? I think so. Would need to check against Vegas.

tldr; PlayoffPredictor.com used to use week number and .4 for the base and divisor, but now uses 1000 for the base and 1 for the divisor.

Just how fast was Secretariat?

I’m a fan of Secretariat. I’m not sure why, but a lot of people are fascinated by this horse. I think that when you see greatness – something that is just clearly apart from all others – it brings emotions out. Even Jack Nicklaus cried watching Secretariat win the Belmont in 1973; that should tell you something.

You can google quite a lot about how fast Secretariat was (37.7 mph / 2:24 flat for the Belmont), or even how big his heart was (22 pounds, when the average horse heart is about 9 pounds, and the next biggest horse heart on record is ~15 pounds), but those numbers, especially the speed numbers, are clinical. They don’t give you the context to appreciate them. Enter statistics:

It is a very easy statistical problem to look at all the Belmont winner times since 1925 (ever since the track has been at its current 1.5-mile length). Secretariat is the record holder at 2 minutes and 24 seconds flat. The next closest horse is at 2 minutes and 26 seconds flat. There are about 90 horses between 2:26 and 2:33. Here is the list:

 YEAR	HORSE	         time (seconds)	Z score	percentage
1973	Secretariat *	        144.00	-3.01	99.870%
1992	A.P. Indy	        146.00	-1.83	96.674%
1989	Easy Goer	        146.00	-1.83	96.674%
2001	Point Given	        146.40	-1.60	94.515%
1988	Risen Star	        146.40	-1.60	94.515%
1957	Gallant Man	        146.60	-1.48	93.080%
2015	American Pharoah *	146.70	-1.42	92.263%
1994	Tabasco Cat	        146.80	-1.36	91.373%
1978	Affirmed *	        146.80	-1.36	91.373%
1985	Creme Fraiche	        147.00	-1.25	89.370%
2021	Essential Quality	147.10	-1.19	88.250%
1990	Go And Go	        147.20	-1.13	87.049%
1984	Swale	                147.20	-1.13	87.049%
1968	Stage Door Johnny	147.20	-1.13	87.049%
2004	Birdstone	        147.40	-1.01	84.400%
2009	Summer Bird	        147.50	-0.95	82.950%
1999	Lemon Drop Kid	        147.80	-0.78	78.102%
1983	Caveat	               147.80	-0.78	78.102%
2006	Jazil	        	147.90	-0.72	76.325%
1991	Hansel	        	148.00	-0.66	74.472%
1972	Riva Ridge	       	148.00	-0.66	74.472%
2018	Justify *		148.20	-0.54	70.549%
2003	Empire Maker		148.20	-0.54	70.549%
1987	Bet Twice		148.20	-0.54	70.549%
1982	Conquistador Cielo	148.20	-0.54	70.549%
1948	Citation *		148.20	-0.54	70.549%
1943	Count Fleet *		148.20	-0.54	70.549%
1975	Avatar			148.20	-0.54	70.549%
2019	Sir Winston		148.30	-0.48	68.489%
1965	Hail To All		148.40	-0.42	66.370%
1964	Quadrangle		148.40	-0.42	66.370%
1959	Sword Dancer		148.40	-0.42	66.370%
2016	Creator	        	148.50	-0.36	64.197%
2014	Tonalist		148.50	-0.36	64.197%
2005	Afleet Alex		148.60	-0.30	61.977%
1979	Coastal	        	148.60	-0.30	61.977%
1953	Native Dancer		148.60	-0.30	61.977%
1950	Middleground		148.60	-0.30	61.977%
1937	War Admiral *		148.60	-0.30	61.977%
2007	Rags to Riches (f)	148.70	-0.25	59.717%
1997	Touch Gold		148.80	-0.19	57.424%
1996	Editor's Note		148.80	-0.19	57.424%
1969	Arts And Letters	148.80	-0.19	57.424%
1967	Damascus		148.80	-0.19	57.424%
1962	Jaipur	        	148.80	-0.19	57.424%
1998	Victory Gallop		149.00	-0.07	52.770%
1981	Summing	        	149.00	-0.07	52.770%
1976	Bold Forbes		149.00	-0.07	52.770%
1955	Nashua	        	149.00	-0.07	52.770%
1951	Counterpoint		149.00	-0.07	52.770%
1974	Little Current		149.20	0.05	48.078%
1961	Sherluck		149.20	0.05	48.078%
1942	Shut Out		149.20	0.05	48.078%
1934	Peace Chance		149.20	0.05	48.078%
1947	Phalanx	        	149.40	0.17	43.412%
1938	Pasteurized		149.40	0.17	43.412%
2002	Sarava	        	149.60	0.28	38.836%
1977	Seattle Slew *		149.60	0.28	38.836%
1966	Amberoid		149.60	0.28	38.836%
1960	Celtic Ash		149.60	0.28	38.836%
1940	Bimelech		149.60	0.28	38.836%
1939	Johnstown		149.60	0.28	38.836%
1931	Twenty Grand		149.60	0.28	38.836%
2008	Da' Tara		149.70	0.34	36.601%
1993	Colonial Affair		149.80	0.40	34.411%
1986	Danzig Connection	149.80	0.40	34.411%
1980	Temperence Hill		149.80	0.40	34.411%
1956	Needles	        	149.80	0.40	34.411%
2017	Tapwrit	        	150.00	0.52	30.189%
1936	Granville		150.00	0.52	30.189%
1963	Chateaugay		150.20	0.64	26.217%
1958	Cavan	        	150.20	0.64	26.217%
1952	One Count		150.20	0.64	26.217%
1949	Capot	        	150.20	0.64	26.217%
1945	Pavot	        	150.20	0.64	26.217%
2012	Union Rags		150.40	0.75	22.532%
1971	Pass Catcher		150.40	0.75	22.532%
1935	Omaha *	        	150.60	0.87	19.159%
2013	Palace Malice		150.70	0.93	17.595%
1954	High Gun		150.80	0.99	16.115%
1946	Assault *		150.80	0.99	16.115%
2011	Ruler On Ice		150.90	1.05	14.718%
2000	Commendable		151.00	1.11	13.405%
1941	Whirlaway *		151.00	1.11	13.405%
2010	Drosselmeyer		151.60	1.46	7.207%
1930	Gallant Fox *		151.60	1.46	7.207%
1995	Thunder Gulch		152.00	1.70	4.495%
1944	Bounding Home		152.20	1.81	3.487%
1926	Crusader		152.20	1.81	3.487%
1927	Chance Shot		152.40	1.93	2.672%
1933	Hurryoff		152.60	2.05	2.023%
1932	Faireno	        	152.80	2.17	1.513%
1929	Blue Larkspur		152.80	2.17	1.513%
1928	Vito	        	153.20	2.40	0.815%
1970	High Echelon		154.00		(mud)

It’s trivial in Excel to compute the mean of this data set (149.12 seconds) and the sample standard deviation (1.699 seconds). From there you can get a Z score for each winner. I left out 1970, as the track was filled with mud (you can see that race here). Leaving 1970 out moves Secretariat from a -2.93 to a -3.01, a true -3 Z-score event. How rare is that? Basic statistics says 99.7% of all data falls between -3 < Z < +3, leaving .3% split as .15% in each tail – so a Secretariat happens less than .15% of the time, and 99.87% of all Belmont winners will be slower. Put that in perspective in years: 1/0.13% is about 770, so it will take, on average, 770 years for a horse to eclipse Secretariat.
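The same calculation works in a few lines of Python, plugging in the mean and standard deviation computed from the table above:

```python
from statistics import NormalDist

mean_time = 149.12  # seconds: mean of Belmont winners' times (1970 excluded)
stdev = 1.699       # sample standard deviation of the same data
secretariat = 144.00

z = (secretariat - mean_time) / stdev
share_slower = NormalDist().cdf(-z)  # fraction of winners expected to be slower

print(round(z, 2))                   # -3.01
print(round(share_slower * 100, 2))  # 99.87
```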

Now this data is not perfect; normally you need 200 data points to have a good sample (something Carter Worth taught me). However, it is quite good. I’m sure we could bring in the 2nd and 3rd place finishers to get ~300 data points and still have about the same mean and standard deviation, but I’ll leave that exercise for someone else. Note this data is normally distributed, period. The central limit theorem says that no matter how horse-race speeds are distributed, samples pulled from them will be normally distributed.

For comparison, here is how the top 45 finishers fare – note the 2:26 horses are a 1-in-30-years event. We will see 3 of those in our lifetime. But unless you are sticking around for the year 2750, you are not going to see Secretariat’s record taken down.

Losing my pinball machine

My parents moved us in 1981 from Pittsburgh, PA to Birmingham, AL. I was not pleased at the time to move again and lose my friends, so my parents bought me this pinball machine, which I played for 40 years.

space odyssey pinball

It was finally time to let it go; I sold it in an estate sale for my parents last month. The player 1 side did not keep an accurate score, as the 1,000 wheel was broken, but the player 2 side did. Here is my last time flipping those flippers: 129,070. A very good score! At 150,000 it lit the special for a free extra play. In general, any time I played and got over 100,000 I was happy.

I loved this pinball machine, but it was time to let go. I paid $400 for it and sold it for $1,250 (I netted 70% of that via the estate sale).