Coders Bracket 2015
A year ago, I filled out my March Madness bracket using code: https://mediocre.com/forum/topics/code-bracket--because-science
The algorithm was based on the statistical winning percentage over the past 29 years of each seed broken down by round:
Seemed like a good idea in theory, but the results were just awful. I basically finished last in just about every pool I entered.
I still think the approach is valid, so I'm sticking with it this year but I'm making a small tweak that also takes into account the team's winning percentage (adjusted for strength of schedule using the Rating Percentage Index). Here's an example:
In the first round, 5th seeded Arkansas is playing 12th seeded Wofford. We know that in the first round the 5th seed wins 68% of the time. We also know that Arkansas has a winning percentage of 76.47% and an RPI of 0.6123. Wofford has a winning percentage of 81.25% and an RPI of 0.5744.
We calculate Arkansas' percentage chance of winning at:
68 + (((0.7647 * 0.6123) - (0.8125 * 0.5744)) * 100) = 68.152581%
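The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula as written in this post, not the code behind the linked bracket; the function name `adjusted_win_pct` and its parameter names are made up for the example.

```python
def adjusted_win_pct(seed_pct, win_a, rpi_a, win_b, rpi_b):
    """Seed-based win percentage plus the RPI-weighted record difference.

    seed_pct is in percentage points (68 means 68%); the win rates and RPI
    values are fractions between 0 and 1. All names here are hypothetical.
    """
    strength_a = win_a * rpi_a  # winning percentage scaled by RPI
    strength_b = win_b * rpi_b
    return seed_pct + (strength_a - strength_b) * 100


# Arkansas (5 seed) vs. Wofford (12 seed), using the numbers quoted above
arkansas = adjusted_win_pct(68, 0.7647, 0.6123, 0.8125, 0.5744)
print(f"{arkansas:.6f}%")  # prints 68.152581%
```

Note that nothing in this sketch clamps the result to the 0-100 range, since the post does not say how that case is handled.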
Let's see if this improves the results this year. Generated bracket and code available here: https://www.codersbracket.com/code_bracket/5506e5e468f310590410eca4
If you think you can code up a better bracket than me, you should probably apply for one of our jobs: https://meh.com/jobs
Never mind that, it's going to be UNF Ospreys! SWOOP!
so does the tournament selection process produce a consistent-enough result set? it's an interesting question. rpi is definitely a good add.
By strict definition your small tweak has a theoretical max possible adjustment of +/-1%, and that scenario is impossible because it would require one team to have a 100% winning percentage and a 1.000 RPI, and the other team to have a zero in either of the two (and somehow still make it to the tournament).
But extreme cases show us the flaws: your small tweak would only give a 1% bonus to an unbeaten team (who only played opponents with unbeaten records too, to get a perfect RPI) playing against a team that hasn't won a game all season but got into the tournament on some strange technicality....
I do like the approach, however. I would make a couple of changes to the math. I would change the Tweak % to be an estimate of a team winning based on the two teams' win% and RPI, such that
Tweak%=(TeamAWin%*TeamARPI)/(TeamAWin%*TeamARPI + TeamBWin%*TeamBRPI)
In your case above:
TweakTeamA%=(.7647*.6123)/(.7647*.6123+.8125*.5744)=50.08%
Giving a final solution of
TeamAOddsOfWinning = (SeedWin%*.5)+(TweakTeamA%*.5)
Which yields a winning % of 59.04% for the 5th seed...
Still not 100% happy with that math, will have to think about it a little more.
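The blended formula proposed in this comment can be sketched as follows. This is a rough illustration under stated assumptions: the function names `tweak_share` and `blended_odds` are invented here, and the `seed_weight` parameter is an added convenience, since the comment itself uses a fixed 50/50 split.

```python
def tweak_share(win_a, rpi_a, win_b, rpi_b):
    """Team A's share of the combined (win% * RPI) strength, as a fraction."""
    strength_a = win_a * rpi_a
    strength_b = win_b * rpi_b
    # Assumes at least one team has a nonzero strength; otherwise this divides by zero.
    return strength_a / (strength_a + strength_b)


def blended_odds(seed_pct, tweak_pct, seed_weight=0.5):
    """Weighted average of the seed-based percentage and the tweak percentage.

    seed_weight=0.5 reproduces the 50/50 split from the comment; other
    weights are a hypothetical extension.
    """
    return seed_pct * seed_weight + tweak_pct * (1 - seed_weight)


# Arkansas vs. Wofford, same inputs as the original example
tweak = tweak_share(0.7647, 0.6123, 0.8125, 0.5744) * 100
odds = blended_odds(68, tweak)
print(f"tweak: {tweak:.2f}%, blended: {odds:.2f}%")  # tweak: 50.08%, blended: 59.04%
```

Because the tweak is a ratio rather than a difference, it always lands between 0% and 100%, so the blended result does too as long as the seed percentage does.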
@fishzine if you take the Kentucky vs Manhattan matchup:
Kentucky has winning percentage of 100% and an RPI of 0.678. Manhattan has a winning percentage of 59.38% and an RPI of 0.5038. So the bonus would be:
(((1 * 0.678) - (0.5938 * 0.5038)) * 100) = 37.884356
So it works out in many cases to adjust things more than the +/- 1% you were suggesting, although I did find myself wishing the bonus was amplified a bit more when the teams were more equally matched.
You should totally make a code bracket so we can compare results.
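To put the two matchups mentioned in this thread side by side, here is a small sketch that computes only the bonus term for each. The function name `rpi_bonus` and the `matchups` dictionary are made up for the illustration; the numbers are the ones quoted in the posts above.

```python
def rpi_bonus(win_a, rpi_a, win_b, rpi_b):
    """Difference of (win% * RPI) between the two teams, in percentage points."""
    return (win_a * rpi_a - win_b * rpi_b) * 100


# (win% A, RPI A, win% B, RPI B) as fractions, taken from the thread
matchups = {
    "Kentucky vs. Manhattan": (1.0, 0.678, 0.5938, 0.5038),
    "Arkansas vs. Wofford": (0.7647, 0.6123, 0.8125, 0.5744),
}

for name, stats in matchups.items():
    print(f"{name}: {rpi_bonus(*stats):+.2f} points")
# Kentucky vs. Manhattan: +37.88 points
# Arkansas vs. Wofford: +0.15 points
```

The lopsided matchup moves the seed percentage by dozens of points, while the closely matched one barely moves it at all, which is the behavior described above.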
@shawn this is awesome, but fatally flawed. Your constant is the seed placement, but humans place those teams in those spots, and there is a massive amount of subjectivity to seeding. Is there a way to modify the algorithm to account for these stats independent of seed? possibly by schedule? disclaimer: I manage a team of business analysts, so I'm completely comfortable making declarations regarding how something should work without having any fucking idea whether it's going to work or not.
@marklog this, basically. it's a model based on another model. there's a certain amount of consistency (e.g. 1-seeds never lose to 16-seeds), but that gets muddy real quick.
@marklog makes an elegant statement of self-awareness. I've been on both sides of that fence and you are rare. Well done: "disclaimer: I manage a team of business analysts, so I'm completely comfortable making declarations regarding how something should work without having any fucking idea whether it's going to work or not."
@marklog I think I know where you work 'cause you just described our analysts!
What's annoying is 3 of the 4 leaders currently just used some example functions, and their random numbers just happened to work out so that they're in the lead.
I had a taco for dinner. That is all.
Um, I believe it was God Richard that said that for a successful technology, public relations cannot be substituted for facts. And sadly, the facts of the matter are that your code idea is just another dogmatic religion, much like Libertarianism, only stupider.