Predicting QB Success in the NFL

Last year I wrote and submitted a paper to the MIT Sloan Sports Analytics Conference. While my abstract was accepted, my paper was not. The paper was titled "Reducing Risk in the NFL Draft: Using Machine Learning Algorithms to Predict Success in the NFL." You can read the full paper here.

In it I describe a decision tree model that predicts a college QB's success in the NFL. To train the model I used more than 40 variables, including college stats, school competitiveness, combine performance, and text mining of pro scouting reports. Ultimately, the final model used four variables: college win %, body mass index (BMI), college games started per season, and age. The final model was 88% accurate in predicting whether a college player would be a success or a bust in the NFL. This model can be used to predict whether the top prospects in this year's draft will be successful in the NFL.
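As a rough illustration of the four inputs named above, here is a minimal sketch of how they might be derived from a prospect's raw numbers. The `QBProspect` record, its field names, and the definition of win % as wins divided by games with a decision are my assumptions for illustration, not the exact definitions from the paper. The BMI line uses the standard imperial formula (703 × pounds / inches²).

```python
from dataclasses import dataclass


@dataclass
class QBProspect:
    """Hypothetical raw record for one college quarterback."""
    wins: int
    losses: int
    games_started: int
    seasons: int
    height_in: float
    weight_lb: float
    age: float


def model_features(p: QBProspect) -> dict:
    """Derive the four variables the final model is described as using."""
    games = p.wins + p.losses
    return {
        # Assumed definition: share of decided games that were wins.
        "win_pct": p.wins / games if games else 0.0,
        # Standard imperial BMI formula.
        "bmi": 703.0 * p.weight_lb / p.height_in ** 2,
        "starts_per_season": p.games_started / p.seasons if p.seasons else 0.0,
        "age": p.age,
    }


if __name__ == "__main__":
    # Made-up numbers, not a real prospect.
    example = QBProspect(wins=30, losses=10, games_started=40, seasons=4,
                         height_in=75, weight_lb=225, age=22)
    print(model_features(example))
```

The resulting dictionary could be fed to any decision tree implementation. The tree's split thresholds are not reproduced here because they come from the trained model in the paper.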

Below is an interactive version of that final QB model.

NFL Combine & Triangle/Ternary Plot

Triangle/Ternary plots are a good way of displaying the relative positioning of points across three variables. This can be used to cluster and classify points based on these three variables. After the break I have outlined a way of creating an interactive version of a triangle/ternary plot in Tableau.

Here is an example of a triangle/ternary plot of the performance of collegiate players at the 2017 NFL combine across three variables: size (BMI), speed (40-yard dash time), and strength (bench press).
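The core of any ternary plot is turning three values into a single (x, y) point inside a triangle. Below is a minimal sketch of that conversion, assuming the three inputs have already been rescaled to comparable, non-negative scores (for example, percentiles for size, speed, and strength). The function name and corner layout are my own choices for illustration. The same arithmetic could be written as calculated fields in Tableau.

```python
import math


def ternary_to_xy(a: float, b: float, c: float) -> tuple:
    """Map three non-negative scores to a point in an equilateral triangle.

    Corner layout (side length 1):
      all weight on a -> (0, 0)            bottom left
      all weight on b -> (1, 0)            bottom right
      all weight on c -> (0.5, sqrt(3)/2)  top
    """
    if min(a, b, c) < 0:
        raise ValueError("scores must be non-negative")
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one score must be positive")
    # Convert to shares that sum to 1, so only relative size matters.
    b_share = b / total
    c_share = c / total
    x = b_share + 0.5 * c_share
    y = (math.sqrt(3) / 2) * c_share
    return x, y


if __name__ == "__main__":
    # Hypothetical scores: size, speed, strength.
    print(ternary_to_xy(40, 35, 25))
```

A player whose three scores are equal lands at the center of the triangle, and a player dominated by one score is pulled toward that corner.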


Spark Bar Chart

A spark bar chart, at least that is what I am calling it for now, combines a sparkline and a bar chart into one chart. The length of the bar corresponds to the final point of the sparkline, which represents the last period or current value. In the example below, the bar represents sales in December 2015 and the sparkline shows sales by month for the last 4 years.

Art & Political Entrenchment

I recently visited the Phillips Collection gallery here in DC and saw the work of one of my favorite artists: Camille Pissarro. In one of his paintings, The Seine Valley at Les Damps, he uses an impasto technique in the clouds with bold, hatched brushstrokes. I wanted to try to re-create this hatch effect in a viz.

This viz shows how every state has voted in each presidential election since 1964. Each mark is a state, and its angle shows the degree to which the state voted Democratic (left) or Republican (right). The sharper the angle, the more heavily the state voted for one party. The thickness of the mark is how many people voted, and the color is which party won the state. I didn't quite achieve the effect I wanted but am happy with the result nonetheless. See findings below.

As you can see, party shifts were much more common in the past. In 1964, 44 states and DC voted for Johnson (Democrat), and in 1972, 49 states voted for Nixon (Republican). Since 2000, however, party shifts have become increasingly rare. In the past five elections only six states have voted for each party more than once: Colorado, Florida, Iowa, Nevada, Ohio, and Virginia. Seemingly, political division and entrenchment are up.
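For anyone who wants to try the hatch encoding, here is a minimal sketch of one way to turn a state's vote into a tilted mark. The linear mapping from two-party margin to tilt, the 90 degree maximum, and the function names are assumptions for illustration, not necessarily the calculation behind the viz above.

```python
import math


def tilt_degrees(dem_votes: float, rep_votes: float, max_tilt: float = 90.0) -> float:
    """Tilt from vertical: negative leans left (Democratic), positive leans right."""
    total = dem_votes + rep_votes
    if total <= 0:
        raise ValueError("need at least one vote")
    margin = (rep_votes - dem_votes) / total  # ranges from -1 to 1
    return margin * max_tilt


def mark_endpoint(dem_votes: float, rep_votes: float, length: float = 1.0) -> tuple:
    """End point of a line mark that starts at (0, 0) and tilts by the margin."""
    angle = math.radians(tilt_degrees(dem_votes, rep_votes))
    return length * math.sin(angle), length * math.cos(angle)


if __name__ == "__main__":
    # Made-up vote counts for a single state.
    print(tilt_degrees(600_000, 400_000))
    print(mark_endpoint(600_000, 400_000))
```

An evenly split state stands straight up, and the mark leans further toward one side as the margin grows. Thickness and color would be driven separately by total votes and the winning party.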

Fan Gauge

The second biggest disaster on election night might have been the New York Times' jittery gauge for its live presidential forecast. Some people took issue with the random jitter effect used to display uncertainty, even calling it "irresponsible". Gregor Aisch, who works at the NYT and co-created the viz, explained their rationale. I appreciate their desire to explain uncertainty and generally love the work of the NYT Graphics Department, but I agree the randomness effect was confusing.

Displaying uncertainty is tricky but sometimes very important to a data visualization. Below is my proposed alternative: a fan gauge. This approach is like a typical gauge in that it displays a single point relative to a range of points or targets. The addition is the uncertainty displayed by the "fan" around the single point. The angle of the fan is the degree of uncertainty. The fan is like the jitter effect but static and easier to interpret. 
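The geometry of the fan described above can be sketched in a few lines. This is a minimal version that assumes a semicircular gauge running from 180 degrees (minimum, left) to 0 degrees (maximum, right) and a symmetric margin of error around the point estimate. The function names and the clamping behavior are my own choices for illustration.

```python
def gauge_angle(value: float, lo: float, hi: float) -> float:
    """Angle in degrees on a semicircular gauge: 180 at lo, 0 at hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)  # keep the angle on the dial
    return 180.0 * (1.0 - (clamped - lo) / (hi - lo))


def fan_gauge(value: float, margin: float, lo: float, hi: float) -> dict:
    """Needle angle plus the start and end angles of the uncertainty fan."""
    start = gauge_angle(value - margin, lo, hi)
    end = gauge_angle(value + margin, lo, hi)
    return {
        "needle": gauge_angle(value, lo, hi),
        "fan_start": start,
        "fan_end": end,
        "fan_width": start - end,  # wider fan means more uncertainty
    }


if __name__ == "__main__":
    # Hypothetical forecast: 50 on a 0 to 100 scale, plus or minus 10.
    print(fan_gauge(50, 10, 0, 100))
```

Because the fan is computed once from the margin rather than redrawn at random, the reader sees the full range of plausible values at a glance.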




Update: Here is another version in a bullet, non-gauge format. This has a better data-to-ink ratio but still displays a point relative to a range of targets as well as the degree of uncertainty. The point is the circle and the length of the "pill" is the degree of uncertainty. Let me know what you think.
