Data

This site is built from the closing lines published at https://www.covers.com/. I scraped the data with Python and BeautifulSoup, did some initial processing in Python, then processed and formatted it further with PHP and MySQL.

NOTE: The current default setting covers ONLY regular season and conference tournament games (i.e. NOT March Madness). Since NCAA tournament results are excluded, the presence of Cinderella teams at the top of some of these rankings is much more interesting than it might otherwise be! I do have the March Madness data, and will eventually include it as an option.

The code I used for scraping, along with the website code, is available at https://github.com/oreagan/respectindex.

Basic Descriptive Statistics

Is this data any good? Let's check how it looks overall, in terms of how often teams beat the spread:


This looks basically like we might expect. For 2017-18, two standard deviations from the mean is a little under 10 (~9.8) wins or losses vs. the spread.

Nine teams (~3%) beat the spread 10 or more net times, and 10 teams (~3%) lost to it 10 or more net times.
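To make the two-standard-deviation cutoff concrete, here is a minimal sketch of how it could be computed. The team names and records below are hypothetical (they are not from the covers.com data), and treating "net" as beats minus losses, with the population standard deviation, is an assumption about the calculation.

```python
from statistics import mean, pstdev


def two_sigma_outliers(records):
    """Return (cutoff, high, low) for net results vs. the spread.

    records maps team -> (beats, losses). "Net" is assumed to mean
    beats minus losses; pushes are ignored.
    """
    net = {team: beats - losses for team, (beats, losses) in records.items()}
    mu = mean(net.values())
    cutoff = 2 * pstdev(net.values())  # two population standard deviations
    high = sorted(t for t, n in net.items() if n - mu >= cutoff)
    low = sorted(t for t, n in net.items() if mu - n >= cutoff)
    return cutoff, high, low


# Hypothetical sample: team -> (beats, losses)
sample = {
    "Team A": (20, 8), "Team B": (8, 20), "Team C": (14, 14),
    "Team D": (15, 14), "Team E": (14, 15), "Team F": (15, 13),
    "Team G": (13, 15), "Team H": (13, 13), "Team I": (14, 13),
    "Team J": (13, 14), "Team K": (12, 12), "Team L": (15, 15),
}

cutoff, high, low = two_sigma_outliers(sample)
print(cutoff, high, low)
```

In this made-up sample only Team A and Team B fall outside the cutoff, which mirrors the idea that just a few percent of teams land that far from the mean.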

Since 2011-2012, overall stats (with averages) include:

Games played w/ lines: 28,352
Home team beat the spread: 13,793
Home team lost to the spread: 13,913
Pushes: 646
Losses if bet $110 on every game: $151,130 ($21,590 per year)
Games both won & beat spread: 2,990 (38%)
Teams beating spread more than not: 1,051 (43.4%) (150 per year)
Teams losing to spread more than not: 1,083 (44.7%) (154 per year)
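The betting-loss figure can be reproduced from the home-team counts above. The sketch below assumes things the table does not state outright: that the $110 is always placed on the home team, that a winning bet pays $100 (standard -110 pricing), that pushes are refunded, and that the data spans seven seasons (2011-12 through 2017-18).

```python
STAKE = 110    # dollars risked per game (from the table)
PAYOUT = 100   # assumed profit on a winning bet at -110 odds

home_beat = 13_793   # home team beat the spread
home_lost = 13_913   # home team lost to the spread
pushes = 646         # assumed refunded, so they net to zero
seasons = 7          # assumed: 2011-12 through 2017-18


def net_result(wins, losses, stake=STAKE, payout=PAYOUT):
    """Net dollars after betting the same side of every game."""
    return wins * payout - losses * stake


total_games = home_beat + home_lost + pushes
net = net_result(home_beat, home_lost)

print(total_games)    # 28352
print(net)            # -151130
print(net / seasons)  # -21590.0
```

Under those assumptions the result matches the table: a loss of $151,130 overall, or $21,590 per season, even though the home team covers almost exactly half the time.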

Stand-outs

All-Time Best Against the Spread: the 2013-14 Wichita State Shockers, who also set any number of records with their 31-0 season. They beat the spread an astonishing 18 times in the regular season.

All-Time Worst Against the Spread: the 2011-12 Louisiana-Lafayette Ragin' Cajuns, who managed 16 losses vs. the spread in the regular season. They went 16-16 on the season and 10-6 in conference, so they weren't awful; they were just consistently worse than expected.

Statistical Significance

Is this just for fun, or does this metric actually tell us anything about NCAA tournament success? Surprisingly, it does: the percentage of net wins vs. the spread is correlated with tournament success in a statistically significant way!

If what we care about is NCAA tournament success (and that shouldn't be the only measure of a team's success!), then a relevant metric is the number of tournament games a team qualified to play. If you get into the tournament, you qualify for 1 game; if you win that game, 2 games; if you win the whole tournament, 7 games (though of course you're the only team left and won't actually play a game 7). We'll call this outcome games_played.
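As a small illustration of that definition, games_played is just one plus the number of tournament wins. The function below is a hypothetical helper, not the site's code; giving 0 to teams that miss the tournament is an assumption, and First Four games are not handled.

```python
def games_played(made_tournament: bool, tournament_wins: int = 0) -> int:
    """Number of NCAA tournament games a team qualified to play.

    Assumption: teams that missed the tournament get 0.
    """
    if tournament_wins < 0:
        raise ValueError("tournament_wins cannot be negative")
    if not made_tournament:
        return 0
    # Making the field qualifies you for 1 game; each win qualifies you
    # for one more, so a champion with 6 wins ends up at 7.
    return 1 + tournament_wins


print(games_played(True, 0))  # 1: lost the first game
print(games_played(True, 1))  # 2: won once, then lost
print(games_played(True, 6))  # 7: won the tournament
```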

Not all games get lines, meaning not all teams get an equal number of chances to beat the spread. We're interested in the ability to beat the spread, not in how high-profile a team is (and thus how often there's money actually at stake). So we care about the percentage of wins vs. the spread rather than the raw count. We'll call this variable perc_beat.
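The sketch below shows why a percentage puts teams with different numbers of lined games on the same footing. The exact denominator is not spelled out above, so dividing by all lined games (beats, losses, and pushes) is an assumption, and both example teams are hypothetical.

```python
def perc_beat(beats, losses, pushes=0):
    """Share of lined games in which a team beat the spread.

    Assumption: the denominator is every game that had a line,
    including pushes. Returns None if the team had no lined games.
    """
    lined_games = beats + losses + pushes
    if lined_games == 0:
        return None
    return beats / lined_games


# Hypothetical teams: one with many lined games, one with few.
high_profile = perc_beat(beats=18, losses=12)  # 30 lined games
low_profile = perc_beat(beats=6, losses=4)     # 10 lined games

print(high_profile, low_profile)
```

Both teams come out at the same perc_beat even though the first one beat the spread three times as often in absolute terms.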

Our simple question, then, is this: is the percentage of times beating the spread pre-tournament correlated with NCAA tournament games played/won?

We're looking at a count variable (you can only win a full game, not a percentage of one), so let's use a negative binomial regression with a simple model: games_played = perc_beat.
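For reference, a negative binomial regression of this kind is usually written with a log link, as below. This is the textbook form rather than something confirmed here: the log link, the coefficient names \beta_0 and \beta_1, and the dispersion parameter \alpha are all assumptions.

```latex
\begin{align}
\text{games\_played}_i &\sim \operatorname{NegBin}(\mu_i, \alpha), \\
\log \mu_i &= \beta_0 + \beta_1 \, \text{perc\_beat}_i, \\
\operatorname{Var}(\text{games\_played}_i) &= \mu_i + \alpha \, \mu_i^2 .
\end{align}
```

Under that form, the Estimate column in the results would be \beta_1, the change in the log of expected games played per unit change in perc_beat.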

Year Estimate ChiSq ProbChiSq StdErr LowerWaldCL UpperWaldCL
Overall 2.6123 263.29 <.0001 0.1610 2.2968 2.9279
2012 1.6707 21.47 <.0001 0.3606 0.9640 2.3774
2013 2.4975 36.80 <.0001 0.4117 1.6905 3.3044
2014 3.7635 57.38 <.0001 0.4968 2.7897 4.7372
2015 3.6797 54.06 <.0001 0.5005 2.6988 4.6606
2016 1.9941 26.78 <.0001 0.3853 1.2389 2.7493
2017 2.5976 39.86 <.0001 0.4115 1.7912 3.4041
2018 3.1457 47.86 <.0001 0.4547 2.2545 4.0370
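The columns in this table are related to each other, which gives a quick sanity check. The sketch below assumes the standard Wald formulas (chi-square as the squared ratio of estimate to standard error, and a 95% interval of the estimate plus or minus about 1.96 standard errors); the function name is hypothetical.

```python
Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_summary(estimate, std_err, z=Z_95):
    """Return (chi_square, lower_cl, upper_cl) from an estimate and its SE."""
    chi_square = (estimate / std_err) ** 2
    half_width = z * std_err
    return chi_square, estimate - half_width, estimate + half_width


# "Overall" row from the table: Estimate 2.6123, StdErr 0.1610
chi_sq, lower, upper = wald_summary(2.6123, 0.1610)
print(round(chi_sq, 2), round(lower, 4), round(upper, 4))
```

The recomputed values agree with the Overall row (263.29, 2.2968, 2.9279) to within the rounding of the printed estimate and standard error.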

So, how do we interpret these results? Obviously, beating the spread doesn't cause tournament wins. However, it does seem that some combination of factors has a meaningful effect on both how often you beat the spread and how often you win in the NCAA tournament. The effect is small, but you would expect it to be, if only because luck is such a large factor in a single-elimination tournament.

What is the Disrespect Index actually capturing? Probably lots of intangibles: pluck, determination, frustration at being underestimated, and media narratives and conference/team prejudices that keep fans (and gamblers) from re-evaluating teams more objectively.