I wanted to start by discussing an article that came out a couple months ago in the Orthopaedic Journal of Sports Medicine entitled “Relationship Between Pitching a Complete Game and Spending Time on the Disabled List for Major League Baseball Pitchers.”
tl;dr: This article makes a claim that is completely unsupported by its analysis. Please don’t use it. Read on below for a deeper dive into (some of) the big reasons why.
Before I get into that, though, I just want to say that I’m only writing about this article in particular because it happened to slide across my Twitter timeline on a morning when I had some spare time and motivation to write about it. I do think the errors here are particularly egregious, but the same types of mistakes are extremely common in a lot of sports injury papers. I hate to feel like I’m singling out just one research group, so please understand that my criticisms will likely apply to a lot of what you read, not just these folks.
OK, onward!
Research Question: The authors sought “to determine the relationship between pitching a CG and time on the DL.” Reading into this a bit, they suspect that pitching a complete game puts greater stress on/creates more fatigue for pitchers, leaving them at higher risk for future injury.
There’s been a lot of research on how many and what kinds of pitches can be “safely” thrown, but a “complete game” (CG) is an odd proxy marker to use. Why would a CG in and of itself be dangerous? Wouldn’t the number of pitches or some measure of cumulative stress be better?
But let’s set that aside and say this is a fine question to investigate. The study is still not good.
Conclusions: In the abstract (which is all most people read, and what most news coverage of an article will be based on), the authors focus on one main result: “Overall, 74% of pitchers who threw a CG spent time on the DL, as compared with 20% of [matched] controls.”
Holy smokes, Batman! This statement implies that pitchers who throw a CG are nearly 4 times as likely to end up on the DL as matched controls (otherwise similar pitchers who did not toss a CG). That’s a huge effect…if true. Unfortunately, nothing in their paper supports this conclusion. At all.
Let’s dig into the meat of the paper (Methods and Results) and see where these numbers came from.
The problems:
1. Who was included in the study? This is rather hard to dig out of the Methods (it shouldn’t be), but as near as I can tell there were 1,016 pitcher-seasons included in the study. Of these, 501 were for 246 individual pitchers who threw a CG at some point from 2010-2016 (which I’ll term the “exposed” group of pitcher-seasons) and then, presumably, 515 from [an unknown number of] pitchers who did not throw a CG at any point from 2010-2016 (which the authors term “controls” but I’ll term “unexposed”). This is already a weird breakdown that leaves several elementary questions unaddressed, among them:
- Were only starters included in the non-CG group?
- How many non-CG pitchers were used?
2. “74% of pitchers who threw a CG spent time on the DL”: Where does this come from? Of the 501 pitcher-seasons from pitchers who pitched a CG at some point from 2010-16, 370 (74%) included time on the DL. This is very different from the statement provided in the abstract (and also, frankly, seems very high, but I’m not going to fact-check this right now).
3. The matched controls: The authors state they created a comparison group of pitchers who never threw a CG in their careers and matched them to pitchers who did and ended up on the DL in the same year they threw a CG (which they term the “CG/DL group”). Hold that thought. We are never told how many pitchers are in either group – just how many pitcher-seasons are in the CG/DL group (115). These pitchers were matched on age (cool), “year” (presumably the years they pitched, though this is never stated), and…IP during the “index season” (for the CG/DL pitchers, the year [or years? never stated] they threw a CG and went on the DL).
So. You’re selecting control pitchers who threw a comparable number of innings without getting a complete game. Which means they made up the innings somewhere else. Where’s the most likely place to make up that ground? How about not going on the DL?
This is selecting on the dependent variable. You’re choosing a control group that was by definition less likely to be hurt and then trying to turn around and use that as evidence that CGs are risky. Bad.
They also stated that they were able to create a matched control group for the CG/DL pitchers, but not for the full group of pitchers who threw a CG, because they couldn’t find appropriate matches. This is further evidence for selecting on the dependent variable: they could only find matches on IP for CG pitchers who also went on the DL. Without a DL trip, there were no matches.
Oh, and the matching on IP wasn’t even good! Controls pitched on average 38 fewer innings per year (136 vs. 174). So why even claim you matched? You didn’t do so successfully, anyway.
4. “Overall, 74% of pitchers who threw a CG spent time on the DL, as compared with 20% of [matched] controls”: This statement, which is repeated over and over as the article’s primary conclusion, is a false comparison totally unsupported by the article’s analysis and text. First off, the 20% figure is for a single season only (the “index season” described above). In fact 50%, not 20%, of controls spent time on the DL at some point during the study period.
Alright, so the 74% vs. 20% is already bust-o. But we’re not done yet.
The authors also never matched their controls to the full CG group. They matched them to the CG/DL group, where by definition 100% of the pitcher-seasons involved time on the DL. This is textbook selection on the dependent variable and makes any comparison with the CG/DL group worthless. 100% vs. 20% or 50% means nothing here.
But let’s assume the controls are somehow a decent comparison for the full group of CG pitchers. We’re told that at some point throughout the study period 50% of control pitchers spent time on the DL. What is that corresponding figure for the CG pitchers? Well, conveniently, we’re never given it. We’re only told 74% of pitcher-seasons involved time on the DL.
So the authors are trying to compare the % of control pitchers who spent time on the DL in a single season with the % of CG pitcher-seasons over 7 years involving time on the DL. And we don’t have the % of pitcher-seasons for the controls or the % of pitchers for the CG group. Oh, and the two groups aren’t actually matched and we have no idea whether they’re comparable (in fact, we have evidence from the IP figures that they are not). Am I missing anything?
Summary: This paper is a mess. It’s unclear, incomplete, lacks basic details, makes invalid comparisons left and right, and appears to repeatedly select on the dependent variable. If I had reviewed this paper I would have never recommended it be published in its current state. I urge you to disregard its findings. It provides zero evidence for CGs in and of themselves being dangerous. I am genuinely alarmed this was published in OJSM.
So here are the four simple but broad questions that I ask with any new technology, in the order in which I ask them. I’ll use a single example of a hypothetical portable tool that measures forces acting on the shoulder joint of baseball pitchers to illustrate my four questions.
1. “Does this technology measure anything?” This can be roughly mapped to the concept of reliability in psychometrics. Is your new tool consistently measuring something or just giving you noise? Ways to check this include repeated testing on the same player (e.g. for neurocognitive tools where the player’s underlying cognitive skills shouldn’t change massively day-to-day) or having multiple players run the same drill while checking for consistent results (e.g. a wearable that purports to measure steps, jumps, or other movements).
If we have a device that purports to measure shoulder force, we can first check if it gives reliable readings for a series of fastballs of similar velocities from the same pitcher. If the answer is YES, then you can ask…
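As a toy sketch of what that first check might look like (the device, readings, and numbers here are entirely hypothetical), we could compute the coefficient of variation across repeated throws:

```python
import statistics

def coefficient_of_variation(readings):
    """CV = standard deviation / mean; smaller means more consistent readings."""
    return statistics.stdev(readings) / statistics.mean(readings)

# Hypothetical shoulder-force readings (in newtons) from ten fastballs
# of similar velocity thrown by the same pitcher
readings = [1041, 1017, 1068, 1030, 1055, 1022, 1049, 1036, 1061, 1028]
cv = coefficient_of_variation(readings)
print(f"CV = {cv:.1%}")  # a low CV suggests the device is at least measuring *something* consistently
```

In practice you’d want a more formal reliability statistic (e.g. an intraclass correlation), but even this quick check can flag a device that’s producing pure noise.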
2. “Does this technology measure what you hope it’s measuring and/or what it purports to measure?” This roughly corresponds to the psychometric concept of validity. If you have a tool measuring reaction time or trajectory tracking, how can you make sure it’s measuring that and not something else? Can you check recorded jump heights from accelerometers against external height measures? Does an “energy expended” measure exhibit within-player correlation with ratings of perceived exertion (RPE)?
Sticking with our hypothetical shoulder tool, we could ask if its data correlates well with a gold standard measurement of shoulder forces such as a full biomechanical analysis. If the answer is YES…
3. “Can what this technology measures translate potentially to on-field value (i.e. wins)?” This is probably the most complex question and can require a lot of time and patience to properly test. Given the timeframe in which teams operate it may not be possible to get a clear answer to this question, but we should still do our best. What it boils down to is this: Devices give you data, but getting insights from this data is a totally separate question. Getting actionable insights is even harder. Making sure those actionable insights translate to more winning is harder still.
Say our new portable force measurement tool reliably and correctly measures the forces acting on a pitcher’s shoulder joint. That’s great. Now what can that information do for us? Could we potentially alter a pitcher’s mechanics to make him less injury prone? Can we catch small biomechanical changes to identify small health issues before they turn into big ones? Can we watch the forces build up in real time to better predict pitcher fatigue and on-field effectiveness, which might change our bullpen usage? If the answer is YES to at least something…
4. “Is your organization positioned to translate this potential into wins?” This is arguably the most important question and probably where I’ve seen teams make the most mistakes. Investing in a technology is one thing, but having the internal champions to use the technology regularly and the administrative and player buy-in to translate your insights to actual on-field improvements is critical. You need a plan for how you’re going to deploy any new tool: Who’s going to use it? Who’s going to be responsible for its regular and continued use? How will you translate its data into communicable and actionable insights? Will your organization actually be willing to act on those insights?
Sticking with our shoulder tool, let’s say the tool, remarkably, is reliable and accurate. Let’s say even more remarkably you’ve managed to develop a plan to use insights from the data to actually make your team win more. Unfortunately, that device still has no value for you if everyone involved in its success – in our case maybe the manager, pitching coach, pitchers, trainers, S&C staff, and analytics team – isn’t committed to making it happen.
These questions are not specific to the shoulder tool that I outlined. You can ask them of any sports performance technology. For example, a GPS chip: 1. Does it accurately measure and calculate a guy’s straight-line speed? Does it correctly identify sprints, decelerations, and directional changes? 2. Can you use the pre-calculated measures it spits out or the raw data underlying them to calculate a relevant workload or skill metric in your sport? 3. Can those metrics translate to on-field success? For example, can workload be modified to improve fitness and minimize fatigue and injuries? Or can you better identify players with the instincts to be where they need to be on the field above and beyond traditional scouting? 4. Will your organization actually do that?
Another example, cognitive testing tool: 1. Does it yield similar scores day after day in the same guy, or is it so sensitive to a hangover or how much sleep a guy got you can’t trust it? 2. If it purports to measure, say, reaction time, does it measure that in a way that’s relevant for your sport? 3. Could you use it to make decisions about which guys to draft or sign, or could you identify a weakness and design drills to improve a player’s skills? 4. Will you?
And if you ever need any help asking or answering these questions, I’m always happy to talk.
As I’m working on my dissertation literature review I figured I’d put a small piece of it to good use for someone other than my committee. Four people would normally read this, so I’m really hoping we can double that.
There is a widespread belief that the NFL has the highest injury rates of the major North American sports, but by how much? Because game injury rates (rather than injury rates in practices) tend to be easier to calculate (and more available across studies) than those incorporating practices, we’ll focus exclusively on those.
We’ll need to introduce one concept to make the analyses below make sense: the idea of an “athlete-exposure” (AE). This is simply defined as 1 athlete participating in a practice or, in the case of this article, a game. Thus a single NFL game in which every available player plays would count for 92 athlete-exposures – the 46 guys on the active game-day roster on each of the two teams. All the injury rates below are presented per 1,000 AEs.
The studies compared below all have differences in how exactly injuries are defined (namely how much practice/competition time a player has to miss for something to count as an “injury”) and the years they covered, but in an effort to ensure high-quality data I’ve limited the studies to official league injury surveillance systems wherever possible.
Among the “Big 4” North American sports, the NFL has by far the highest game injury rate at 75.4 per 1,000 AEs. This means for every 1,000 players playing in a single game, 75.4 will suffer some sort of injury. So in a single game with 92 AEs, you might expect about 75.4 x 92/1,000 = ~7 injuries.
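That back-of-the-envelope calculation, spelled out (using only the figures above):

```python
nfl_game_rate = 75.4    # injuries per 1,000 athlete-exposures (AEs)
aes_per_game = 46 * 2   # 46-man game-day roster on each of two teams

expected_injuries_per_game = nfl_game_rate * aes_per_game / 1000
print(f"~{expected_injuries_per_game:.1f} injuries per NFL game")
```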
Compared to the other Big 4 North American sports, the NFL thus has roughly 4-5 times the in-game injury rate of the NBA, MLB, and NHL.
What about more international sports (MLS statistics weren’t readily available, so soccer goes here, stop complaining)? The NFL also has roughly double the in-game injury rate of soccer and, interestingly, Australian rules football.
The only sport that comes close to touching the in-game injury rate of the NFL is rugby. The data are variable, but in general rugby seems to have an injury rate similar to or even higher than the NFL’s. I wanted to look into concussions specifically in the NFL vs. rugby, but I’ve found some conflicting data and I want to do more research before saying anything.
I’ll spare you the gory details of the data sources unless you want to click below the jump, where I’ve included everything you should need to evaluate these on your own if you want!
I purposefully simplified or ignored quite a few issues in writing this post to try to communicate my main point, but some good questions from friends have come up and I want to briefly address a couple shortcomings of this analysis:
In a nutshell: this is just one way of measuring which sport is more “dangerous,” and it’s probably the way that casts the NFL in the most negative light. Injury rates per player-hour or 1-season risks would likely make football look less dramatically bad.
The best available estimate of overall NFL injuries comes from a 2017 report from Harvard Law School that used NFL Injury Surveillance System (ISS) data. In 2014-15 (the most recent 2 years available for all injuries) there were 3,553 injuries in regular season games and 737 in regular season practices (82.8% of injuries in games). Assuming 92 athlete-exposures per game (46-man game-day roster x 2) and 256 games per regular season, this translates to 3,553/(256 x 2 x 92) x 1,000 = 75.4 injuries per 1,000 AEs in games in the NFL. For practices, if we assume 4 practices per week, 61 athlete-exposures per practice (53-man active roster + 8-man practice squad), 32 teams, 17 weeks, and 2 seasons, the 737 practice injuries translate to 737/(61 x 32 x 17 x 4 x 2) x 1,000 = 2.8 injuries per 1,000 AEs in regular season practices. The combined game and practice regular season injury rate would then be 13.7 per 1,000 AEs.
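For the curious, the arithmetic in this paragraph is easy to replicate (everything here uses the roster and schedule assumptions stated above):

```python
# NFL regular-season rates, 2014-15 combined (Harvard/ISS figures)
game_injuries = 3553
game_aes = 256 * 2 * 92              # 256 games/season x 2 seasons x 92 AEs/game
game_rate = game_injuries / game_aes * 1000

practice_injuries = 737
practice_aes = 61 * 4 * 17 * 32 * 2  # AEs/practice x practices/wk x weeks x teams x seasons
practice_rate = practice_injuries / practice_aes * 1000

# Combined regular-season rate (games + practices)
combined_rate = (game_injuries + practice_injuries) / (game_aes + practice_aes) * 1000
print(round(game_rate, 1), round(practice_rate, 1), round(combined_rate, 1))
```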
What about the other three of the “Big 4” North American sports? For MLB, the Harvard report began with data from MLB’s Health and Injury Tracking System (HITS) that showed 2,988 injuries in the 2011-12 seasons across spring training, the regular season, and the postseason. Since ~71.7% of those games would have been in the regular season, we can assume 2,142 regular season injuries. Other data reveal 138,085 regular season player-games in 2011-12, translating to a game injury rate of 15.5 per 1,000 AEs.
In the NBA, Drakos et al used data from the NBA Trainers Association database to estimate a rate of 19.1 injuries per 1,000 AEs in regular season games from the 1988-89 through 2004-05 seasons. Due to the age of these data and the fact that they did not come from a modern EMR system, this is likely something of an underestimate of the true NBA injury rate.
In the NHL, McKay et al used data from the Athlete Health and Management System (AHMS) to estimate a regular season game injury rate of 15.6 per 1,000 AEs from the 2006-07 to 2011-12 seasons.
For soccer, MLS data is unfortunately limited and old, but better data is available from the Union of European Football Associations (UEFA). From 2001-2008, Ekstrand et al estimated 27.5 injuries per 1,000 hours of match play. At 90 minutes (1.5 player-hours of exposure) per game for most players, this translates to approximately 27.5 x 1.5 = 41.3 injuries per 1,000 AEs in UEFA games. (More recent data from the 2014 UEFA report on injuries reported 23.2 injuries per 1,000 hours of match play in 2013-14, translating to 34.8 injuries per 1,000 AEs.)
For Australian Rules Football, Orchard et al estimated a game injury rate of 25.7 per 1,000 hours from 1997-2000. At 80 minutes (about 1.33 player-hours) per game, this translates to approximately 25.7 x 1.33 = 34.2 per 1,000 AEs. Due to its age, this number may also, like the NBA’s, be a substantial underestimate.
One study of the highest level of English rugby (Brooks et al) found a game injury rate of 91.4 per 1,000 player-hours from 1,534 injuries in 420 matches from the 2002-03 to 2003-04 seasons . In this league there are 15 players per team with up to 8 replacements; if we assume the maximum played in each game (23 x 2 = 46), this would translate to 1,534 injuries / (420 matches x 46 players per match) = 79.4 injuries per 1,000 AEs in English Premiership rugby games. A systematic review of 15 men’s senior rugby studies found an overall game injury rate of 81 per 1,000 player-hours (95% CI 63-105). If we take the same ratio of AEs to player-hours as in the English Premiership study, this would translate to 70.4 injuries per 1,000 AEs in games (95% CI: 54.7-91.2). Additionally, a comparison of football and club rugby teams at Ohio State University from 2012-2014 found game injury rates of 23.4 per 1,000 AEs for football and 39.6 per 1,000 AEs for club rugby – though this football rate was lower than the 35-40 per 1,000 AEs reported in other college studies. In the end, it seems likely that at similar levels of competition rugby has game injury rates at least similar to, if not higher than, the NFL.
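The rugby conversions in this paragraph follow directly from the counts (a quick sketch; the ~0.87 figure is just the average player-hours per athlete-exposure implied by the Brooks et al data):

```python
# English Premiership rugby (Brooks et al): direct count
injuries, matches, players_per_match = 1534, 420, 23 * 2
premiership_ae_rate = injuries / (matches * players_per_match) * 1000

# Scale the systematic review's per-hour rate (81 per 1,000 player-hours)
# by the same AE-to-player-hour ratio
hours_per_ae = premiership_ae_rate / 91.4   # ~0.87 player-hours per AE
review_ae_rate = 81 * hours_per_ae
print(round(premiership_ae_rate, 1), round(review_ae_rate, 1))
```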
Now that seemed like a huge drop to me. I did some quick calculations – Teddy Bridgewater (16 games) plus Tony Romo (10 games) plus Jay Cutler (11 games) alone gets us to 37 games missed by starting QBs in 2016. That’s already more than the number reported by King.
Epidemiologists spend a large chunk of our time just counting things. As it turns out, that’s not grade school math. It’s really, really hard, and a correct count relies on counting a.) the right things and b.) doing so in a consistent manner. So I wanted to go into my injury data, sourced from Pro-Football-Reference (pro-football-reference.com), and see if I could replicate the numbers reported by King (and, ostensibly, the NFL). Spoiler alert: I couldn’t really.
King’s column says the following about the numbers: “The NFL defines this statistical category as being games missed by the declared starting quarterback of a team. So even though, for example, Cody Kessler did not open 2016 as the starting quarterback, he was knocked out of two games that he started (concussions) and missed a total of four games because of them. Those count on this list.”
Well, “declared starting quarterback” is a little vague, but it seems clear that you don’t have to be the week 1 starter to count in these stats. I’m missing some information, but I’m going to now define what I think the goal of this statistic is:
A count of games missed due to injury by a quarterback who, but for their injury, would have started the game that they are counted as missing.
This leads me to count both in-season injuries to starters and certain pre-regular-season injuries to major QBs. Consider Tony Romo and Teddy Bridgewater this year.
Romo: Romo never started a game this year even though he was healthy late in the season, but it’s pretty clear that he would have been the starter for Dallas after his recovery but for the amazing performance of Dak Prescott, so his games missed due to injury (i.e. the games he was unavailable/inactive for because of his back) should probably be counted.
Bridgewater: He was the clear starter for this year before he suffered a horrifying knee injury in training camp. You’ve got to count him as missing 16 games unless you’re explicitly just counting in-season injuries, which it’s not clear the NFL was.
Every week there have to be 32 starting QBs in the NFL (not counting byes). So consider the Vikings this year: we’re already dinging Bridgewater with 16 missed games, but other Vikings QBs, such as Sam Bradford, were still at risk of missing games due to injury every week. Meanwhile, a team like the Titans that only lost Mariota late in the year didn’t have those added chances for injuries. In epidemiologic terms, the Titans had less “time at risk” for accruing starting QB injuries.
In layman’s terms, there’s the potential for a sort of multiplier effect on games missed from big injuries like Bridgewater’s and Romo’s, especially when they occur before the season begins. That’s a caveat we need to consider.
The best approach would be to standardize to an injury rate measure that accounts for this fact (e.g. missed games per 100 possible played games), but to maintain equality with the NFL’s simple counts I’m not going to do that here. Ultimately I think such an adjustment would have a relatively small effect, anyway: there are going to be 32 teams x 16 games per team = a minimum of 512 opportunities for a starting QB to get hurt, and we’re talking maybe 30-50 additional exposures from early season-ending injuries in an awful year.
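If you did want the rate version, it might look something like this (the counts below are purely illustrative, not real data):

```python
def missed_game_rate(games_missed, starter_game_slots):
    """Missed games per 100 possible starter games."""
    return games_missed / starter_game_slots * 100

# Baseline: 32 teams x 16 games = 512 starter-game slots per season,
# plus a hypothetical ~30 extra slots created when early season-ending
# injuries put replacement starters at risk for the rest of the year
baseline_slots = 32 * 16
rate = missed_game_rate(85, baseline_slots + 30)
print(f"{rate:.1f} missed games per 100 starter games")
```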
Using PFR’s team injury report pages (e.g. here), I calculated the number of games missed for starting QBs from 2013-2016. Because definitions are important, here’s my definition of a starting QB:
“A starting QB is any quarterback who started at least one game in the season for which we’re counting injuries. Likely starters who did not start a game the entire season as a result of their injury (2014: Sam Bradford; 2016: Tony Romo and Teddy Bridgewater) were also included.” This definition has limitations (see below), but here’s what I found:
Year | # Games Missed by Starting QBs – NFLInjuryAnalytics | # Games Missed by Starting QBs – MMQB/NFL |
--- | --- | --- |
2013 | 90 | 76 |
2014 | 98 | 77 |
2015 | 89 | 59 |
2016 | 85 | 35 |
Using my definition of a starting QB, we don’t see the dramatic decline reported by MMQB and PFT. Indeed, 2016 in our data is not out of line with any of the previous 3 seasons. (Ed. note: this post and its data were updated on 3/31/2017 to correct a coding error that listed Alex Smith the TE as Alex Smith the QB. Be careful, kids!) 2014 was the worst season of the last 4, when early major injuries to Sam Bradford, Matt Cassel, Carson Palmer, and Nick Foles ran up the score.
That brings up another important point: say we even take the MMQB/NFL data at face value. A couple of catastrophic injuries earlier in the season is all it would take to bring those numbers back in line with those from previous years. Imagine if Carr’s or Mariota’s freak injuries had come in week 2 instead of week 16. How would these numbers look? That said, there’s a reasonable case to be made that increased QB safety delays as well as prevents QB injuries, so I don’t think this particular argument undermines the assertion that the QB protection rules are working.
Regardless, I don’t find this to be particularly compelling evidence for the rules’ effectiveness. For one thing, why would they only really kick in in 2016?
A far more likely explanation for the MMQB/NFL data is a combination of random chance and some exclusion criteria that conspired to artificially depress this year’s stats. I don’t have access to the raw data, but if I had to guess I’d say the NFL data probably excluded Bridgewater, Romo, and maybe RGIII and Geno Smith? That would drop my figure to 39, close to their figure of 35.
My numbers are substantially higher across-the-board than the NFL’s, though the difference is starker in 2015 and especially 2016. I’m clearly tracking a somewhat different group than the NFL is in this stat, with the largest discrepancies (percentage-wise) in the last two years.
Who’s tracking the “right” group? To be honest, it’s a thorny issue and I’m not willing to stand here and say my numbers are right and the ones reported in MMQB are wrong. The advantage of my approach is it uses clean, simple criteria that can be evenly applied across years.
However, my numbers clearly include some backups (see table below). I couldn’t come up with a satisfactory, consistent criterion for inclusion beyond “started at least one game during the year or was an obvious starter but for a pre-season injury,” so I reluctantly left them in, with the exception of a few egregious long-term cases (e.g. Jacoby Brissett in 2016).
I am convinced that there’s some quirk in how the NFL tracks this stat that makes the 2016 drop look bigger than it is. They likely took a subtler, and perhaps a more judgement-dependent, approach than I did to classifying starting QBs. The advantage is you get to clear out the backups better, but are the criteria applied consistently overall? Do you accidentally catch some guys who should really be considered starters?
Regardless, there’s just no way that 35-game figure is accurate; it’s too low to pass even the weakest smell test. Something is off there.
I have major concerns about how “starting QB games missed due to injury” are counted in the MMQB and PFT reports; 35 games missed in 2016 is implausibly (impossibly?) low. I also don’t buy these numbers as strong evidence for the QB protection rules working for the reasons above as well as the unclear timing effect – why would they really kick in this year? That’s not to say they’re not working or I’m against the rules; I don’t know and I’m not, respectively.
But I am against extrapolating beyond what the data can tell us, and any differences between years here are more likely the combined effects of random variation and inconsistent counting criteria.
In the spirit of complete transparency, below is the table of QBs by year that I used to calculate the above figures. Play around with them! Add and delete players until you get to a group you’re comfortable with. I’m confident it would take quite a bit of torturing to get 35 games missed in 2016 but 70+ in 2013-14. As Jayne Cobb might say, “Best o’ luck, though.”
The table below is all QBs with at least one start in the indicated season. I have marked players that I explicitly excluded, but you can add them back in if you want!
QB | Year | Games Missed | Excluded from count? |
--- | --- | --- | --- |
Bradford,Sam | 2013 | 9 | |
Campbell,Jason | 2013 | 1 | |
Cousins,Kirk | 2013 | 1 | |
Cutler,Jay | 2013 | 5 | |
Flynn,Matt | 2013 | 1 | |
Foles,Nick | 2013 | 1 | |
Freeman,Josh | 2013 | 2 | |
Gabbert,Blaine | 2013 | 5 | |
Hoyer,Brian | 2013 | 11 | |
Keenum,Case | 2013 | 2 | |
Lewis,Thaddeus | 2013 | 2 | |
Locker,Jake | 2013 | 9 | |
Manuel,EJ | 2013 | 6 | |
Ponder,Christian | 2013 | 5 | |
Pryor,Terrelle | 2013 | 4 | |
Rodgers,Aaron | 2013 | 7 | |
Romo,Tony | 2013 | 1 | |
Schaub,Matt | 2013 | 2 | |
Vick,Michael | 2013 | 5 | |
Wallace,Seneca | 2013 | 7 | |
Weeden,Brandon | 2013 | 4 | |
Bradford,Sam | 2014 | 16 | |
Bridgewater,Teddy | 2014 | 1 | |
Cassel,Matt | 2014 | 2 | |
Clausen,Jimmy | 2014 | 1 | |
Fitzpatrick,Ryan | 2014 | 2 | |
Foles,Nick | 2014 | 8 | |
Griffin,Robert | 2014 | 6 | |
Hill,Shaun | 2014 | 5 | |
Hoyer,Brian | 2014 | 1 | |
Locker,Jake | 2014 | 6 | |
Mallett,Ryan | 2014 | 5 | |
Manziel,Johnny | 2014 | 1 | |
McCown,Josh | 2014 | 5 | |
McCoy,Colt | 2014 | 2 | |
Mettenberger,Zach | 2014 | 3 | |
Newton,Cam | 2014 | 2 | |
Palmer,Carson | 2014 | 10 | |
Romo,Tony | 2014 | 1 | |
Smith,Alex | 2014 | 1 | |
Smith,Geno | 2014 | 2 | |
Stanton,Drew | 2014 | 3 | |
Vick,Michael | 2014 | 1 | |
Whitehurst,Charlie | 2014 | 3 | |
Bradford,Sam | 2015 | 2 | |
Brees,Drew | 2015 | 1 | |
Clausen,Jimmy | 2015 | 2 | |
Cutler,Jay | 2015 | 1 | |
Dalton,Andy | 2015 | 3 | |
Flacco,Joe | 2015 | 6 | |
Hasselbeck,Matt | 2015 | 1 | |
Hoyer,Brian | 2015 | 3 | |
Jones,Landry | 2015 | 2 | |
Kaepernick,Colin | 2015 | 7 | |
Keenum,Case | 2015 | 2 | |
Luck,Andrew | 2015 | 9 | |
Manning,Peyton | 2015 | 6 | |
Manziel,Johnny | 2015 | 6 | |
Mariota,Marcus | 2015 | 4 | |
McCown,Josh | 2015 | 8 | |
McCown,Luke | 2015 | 8 | X |
Roethlisberger,Ben | 2015 | 4 | |
Romo,Tony | 2015 | 12 | |
Schaub,Matt | 2015 | 2 | |
Taylor,Tyrod | 2015 | 2 | |
Vick,Michael | 2015 | 3 | |
Weeden,Brandon | 2015 | 1 | |
Yates,T.J. | 2015 | 2 | |
Anderson,Derek | 2016 | 1 | |
Bridgewater,Teddy | 2016 | 16 | |
Brissett,Jacoby | 2016 | 12 | X |
Carr,Derek | 2016 | 1 | |
Cutler,Jay | 2016 | 11 | |
Garoppolo,Jimmy | 2016 | 2 | |
Griffin,Robert | 2016 | 11 | |
Hoyer,Brian | 2016 | 9 | |
Kessler,Cody | 2016 | 2 | |
Luck,Andrew | 2016 | 1 | |
Mariota,Marcus | 2016 | 1 | |
McCown,Josh | 2016 | 3 | |
Newton,Cam | 2016 | 1 | |
Palmer,Carson | 2016 | 1 | |
Petty,Bryce | 2016 | 6 | X |
Romo,Tony | 2016 | 8 | |
Savage,Tom | 2016 | 1 | |
Siemian,Trevor | 2016 | 2 | |
Smith,Alex | 2016 | 1 | |
Smith,Geno | 2016 | 9 | |
Tannehill,Ryan | 2016 | 2 |
I’ve been a bit distracted with side projects lately – buying a house, co-teaching a high-level statistics course, my dissertation…you know, little things – so sorry for not updating this blog that no one reads for a few months.
BUT! I’m back with a very exciting post: I’m updating my prior investigation into the effects of the NFL’s decision to remove “Probable” from its injury report this past season, now that we have a full season to see how teams adapted (the original analysis had only weeks 1-8). Let me tell you, it’s been miserable for NFL injury analysts and honestly…probably pretty much fine for everyone else.
Since my previous two posts lay out all the relevant background, methods, and data sources in detail, we’re gonna skip right to the results update!
The above chart is a stacked bar chart of the number of player-weeks on the injury report with each game status from 2013-2016. I’ve also added (exact Poisson) 95% confidence limits to each category to help give us an idea of whether year-to-year changes in counts are beyond what we might expect from random chance – by and large, they’re not.
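For reference, exact Poisson confidence limits like those on the chart can be computed from chi-square quantiles (the standard Garwood construction; this sketch assumes SciPy is available):

```python
from scipy.stats import chi2

def exact_poisson_ci(count, alpha=0.05):
    """Exact (Garwood) confidence interval for a Poisson count."""
    lower = chi2.ppf(alpha / 2, 2 * count) / 2 if count > 0 else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (count + 1)) / 2
    return lower, upper

# e.g. the 1,746 Questionable player-weeks observed in 2016
lo, hi = exact_poisson_ci(1746)
print(f"1,746 (95% CI: {lo:.0f}-{hi:.0f})")
```

For counts this large the exact limits are close to the usual normal approximation, but the exact version behaves better for the smaller categories (like Doubtful).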
Obviously the probable bar went to zero in 2016 as the designation was eliminated. In previous years, the total number of probable and questionable player-weeks was 3,767 (2013), 3,571 (2014), and 3,484 (2015). On average, these years saw 2,524 probable and 1,084 questionable player-weeks. In 2015 there were 2,424 probable and 1,060 questionable player-weeks.
In 2016, meanwhile, we saw just 1,746 questionable player-weeks. If we assume we would’ve seen the same numbers in 2016 as in 2015 had “Probable” not been removed (not a perfectly safe assumption, given random year-to-year variation), it looks like about 35-40% of “Probable” player-weeks went into the “Questionable” category and the remainder fell off the report. This is just a touch higher than our mid-season estimate of 30-35%.
I was also curious whether there would also be a tendency to push everyone down the game status list (e.g. 2015 questionable becomes 2016 doubtful). Doubtful designations are basically unchanged while Out designations are up somewhat (747, 877, 920, and 1,009 player-weeks from 2013-2016), but overall I’m not seeing a ton of evidence for this kind of trend.
The official League definition of “Questionable” now is “uncertain that the player will play.” Well, that’s…vague. Previously the League said 75% of Probable, 50% of Questionable, and 25% of Doubtful players should play, but these percentages – which weren’t all that accurate, anyway, to be fair – are now gone. So what does Questionable mean in 2016?
Turns out a “Questionable” designation used to mean that, on average, you had about a 65-70% chance of playing that week, though this varies a lot by team.^{1} In 2016, players had about a 75% chance of playing each week; that’s a substantial difference.
Given the relative numbers of Probable and Questionable designations in the past few seasons, this is just a tick lower than what we’d expect to see if 35-40% of Probables were now Questionable – but that makes sense since many of the Probables who became Questionables were probably “more hurt” Probables (i.e. their risk of missing a game was likely above the overall Probable average).
These numbers are also largely in line with our mid-season analysis, though the proportion of Questionables playing in 2015 is somewhat higher.
Regardless, the Questionable designation was already a tough-to-predict hodgepodge before this year, and it’s only gotten more heterogeneous.
Since the Questionable players are now a mix of players who were previously both “Questionable” and “Probable” a natural question is: can we predict which “Questionable” players in 2016 are going to play (we’ll ignore the question of whether they’ll play well, for now)? Let’s check a few different predictors: injury type, practice status, and the player’s team.
Injury Type:
Most injuries seem to be clustered right around the overall average of 75%, marked with a red line in the above chart (the mid-season average was 73%). Players with back injuries (85% active) and concussions (82% active) were the most likely to play. Concussions (which were also above-average in the mid-season analysis) in particular make sense to me: the return-to-play portion of the NFL’s concussion protocol has no set timeline, but it’s often hard to complete in 5 days and easier to complete in 7, leaving a lot of players as “Questionable” on the game status report but ready to suit up on game day.
Back injuries were basically average in our mid-season analysis, but they’re substantially higher now; quadriceps/thigh injuries followed a reverse pattern. I’m not sure what to make of either of these, but if any trainers or physical therapists or physicians want to weigh in in the comments, please do!
On the other end, players with groin injuries (70% active) were somewhat below average both at mid-season and now. I’m not sure I have an explanation for that but, again, I would value any input trainers or medical folks can provide down in the comments!
I’ve also included error bars, which represent 95% confidence intervals. The intervals mostly overlap and all but back injuries include the overall average, so it’s possible there are no true differences among injuries and all we’re seeing is random noise. But if I had to guess, I think we’re seeing a real higher active proportion among concussions and maybe back injuries (though that’s tempered by the fact that they were average at mid-season). I’m not so sure about groin injuries being truly lower, either.
Practice Status:
Details on the trajectories are available here. In a nutshell: I stratified by the player’s final practice report status (Full/FP, Limited/LP, or none/DNP) before their game each week and the trajectory (down/flat/up) their statuses followed throughout the week.
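For reference, here's the kind of trajectory rule I mean. The exact definition is in the linked write-up, so treat this first-status-versus-last-status comparison as an illustrative sketch rather than my precise method:

```python
def trajectory(statuses):
    """Classify a week's practice statuses as 'up', 'down', or 'flat'.

    statuses: daily practice codes in chronological order, where
    'DNP' < 'LP' < 'FP'. This sketch simply compares the final
    status to the first one.
    """
    rank = {"DNP": 0, "LP": 1, "FP": 2}
    levels = [rank[s] for s in statuses]
    if levels[-1] > levels[0]:
        return "up"
    if levels[-1] < levels[0]:
        return "down"
    return "flat"

# A player who went DNP on Wednesday, LP Thursday, FP Friday is trending up
status = trajectory(["DNP", "LP", "FP"])
```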
Overall, if your final practice status was FP/LP/DNP, you played 83/76/47% of the time (the DNP figure is a bit higher than it was at mid-season). We often hear fantasy experts saying they’re concerned if a player doesn’t practice late in the week; this data bears that out, but it’s also not a death sentence.
As far as the trajectories go there’s less, but still some, variation. Overall, Down/Flat/Up trajectories meant you played 72/70/82% of the time. So if you’re trending up (as I’ve defined it) you’re more likely to play, whereas flat and down trajectories are similar at a bit below the overall average.
When we stratify by both, we see that among player-weeks with full participation in their last practice, those who were trending up (86% active) were somewhat more likely to play than those who stayed flat (80%), though this difference is much smaller than it was at mid-season.
Among players with limited participation at their last practice report, there was virtually no variation by trajectory; all groups were 76-77% likely to play.
Among players who did not participate in their final practice, those who had declined from mid-week actually had a decent chance to still play: 61%. That’s kind of surprising to me – it may be semi-injured guys just being given Friday off? Meanwhile, if a guy doesn’t practice all week his chances of playing are pretty low at 32%. Honestly, that’s still higher than I was expecting!
Except as noted above, these figures are largely in line with our mid-season analysis.
Team:
Teams have historically exhibited quite a bit of variation in what proportion of their “Questionable” players suit up on Sunday. Did this continue in 2016?
Overall you see quite a bit of variation by team – from Cincinnati with 93% of questionable players suiting up to Tennessee with just 35%. This is about the same range we saw in the mid-season analysis, as well.
You see a pretty even distribution of proportions across teams – they follow a gradation rather than being in “high” or “low” clusters. This suggests to me teams have settled on a natural range of definitions for what constitutes a “Questionable” player – this is to be expected when the official League definition of “Questionable” is the vague “uncertain that the player will play.”
Notably, this is about the same pattern of Questionable usage we saw before Probable was eliminated, so this isn’t really a new phenomenon of confusion. Indeed, some teams at least seem to have stumbled onto a definition they like and stuck with it, particularly at the low end: the bottom 3 teams in questionable active percentage in 2016 (Titans, Seahawks, and Jaguars) were 1st, 3rd, and 9th lowest in 2015, as well (they were 5th, 9th, and 11th lowest in 2014). These teams could be thought of as resistant to reporting minor injuries as “Questionable,” upping the chance that someone who is listed as such won’t play.
The Steelers are an example of a team that seems especially tight with listing Questionable players: in 2016 they had the 8th lowest in percentage and listed the second fewest players as Questionable. In 2015 and 2014 they listed the 3rd fewest and fewest, respectively; their active percentage was lowest in both those years, too. So if you’ve got a Questionable Steeler in your fantasy lineup, check their status carefully.
As a sidenote, New England – where Bill Belichick has a reputation for…strategic use of the injury report – did list a league-high 113 player-weeks as Questionable in 2016. However, those players have suited up at a near-average clip, 73% of the time. I’m a Dolphins fan, and even I have to admit that doesn’t support the shady Belichick narrative.
Our numbers were largely in line with what we saw through the first 8 weeks of the season, though our larger sample size reduces some of the uncertainty we had at mid-season.
Overall, a little over a third of players previously marked as Probable appear to now be Questionable, with the remainder falling off the injury report. Questionable players are more likely to play now than they were in the past: the proportion has risen from the low-to-mid-60s to around 75% (which was, interestingly, always the stated proportion of Probables who should have played).
As far as which Questionables will suit up, the type of injury doesn’t help a whole lot. Certain patterns of practicing and not practicing during the week can certainly identify high- and low-risk players. Finally, teams exhibit a lot of variation in the severity of injuries that lead them to mark a player as Questionable, a fact that could be leveraged to make better predictions about which players will hit the field on game day (with the major limitation that we’re not predicting anything about how well they’ll perform if/when they get off the sideline).
Footnotes:
1. Full disclosure: this 67.3% figure might be somewhat overestimated (5-10% or so) due to some problems I had linking to the historical active status data.
]]>On Bill’s December 5th show the two were discussing Roberto Aguayo, the kicker the Tampa Bay Buccaneers drafted in the 2nd round this year. Due to his high draft position, Aguayo’s struggles – he is currently just 15/22 on field goals in the NFL – have been highly publicized. I’m heavily paraphrasing, but they basically came to the conclusion that it’s obviously far too soon to judge whether Roberto Aguayo is, in fact, a good, bad, or mediocre kicker.
Now Bill and Daniel are super smart guys, but I wondered if the statistics would bear them out…
Aguayo is currently 15/22 on field goal attempts, making him a 68.2% kicker overall. One (overly) simple way to estimate how good or bad he really is would be to basically treat each of Aguayo’s kicks as a weighted coinflip and see what range of weights (made field goals) could reasonably yield a 68.2% success rate. This is close to what I did with preseason concussion numbers in a previous post.
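That weighted-coinflip range is just an exact binomial (Clopper-Pearson) confidence interval, which can be written in terms of beta quantiles. A sketch, with function and variable names mine:

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) CI for k successes in n binomial trials."""
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lower, upper

# Aguayo: 15 makes on 22 attempts
lo, hi = clopper_pearson(15, 22)
```

With only 22 attempts the interval is huge – roughly from the mid-40s to the mid-80s in percentage terms – which is exactly why this naive approach alone can't settle the question.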
However, this approach presents a number of problems. First of all, Thomas Jefferson never said nothin’ ’bout no field goals, and every attempt is not created equal. The most obvious driver of field goal success is distance, so we need to account for that.
Second, this approach would put Aguayo in a vacuum and ignore everything we know from NFL history about true kicker success rates. It doesn’t make sense to say a priori that every possible field goal percentage is equally likely. We should find some way to incorporate that prior knowledge.
Third, it may be naive – particularly in a case like Aguayo’s – to treat each field goal like an independent coin flip. The poor guy has been pilloried for every single miss…you’ve got to imagine that has some sort of compounding psychological effect making another miss more likely. On the other side, Justin Tucker is feelin’ it this year, so every successful kick might be making his next kicks more likely.
The approach I take below corrects for the first two issues. I’ll also explain the likely effects of the third (independence) issue.
I scraped individual field goal data from 2011 through 2016 from Pro-Football-Reference using pages like this one. I only did the last 5 years because we know kickers have gotten better over time and I wanted an apples-to-apples comparison.
After excluding anybody with fewer than 20 attempts over this time period, we were left with 53 kickers including Aguayo.
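If you want to replicate the exclusion step, it's just a group-and-filter. Here's a sketch assuming a hypothetical one-row-per-attempt frame with "kicker" and "made" columns (the real scraped columns will differ):

```python
import pandas as pd

def kicker_summary(kicks, min_attempts=20):
    """Aggregate per-attempt data to one row per kicker, dropping
    anyone with fewer than min_attempts attempts.

    kicks: DataFrame with columns 'kicker' (str) and 'made' (bool).
    """
    g = kicks.groupby("kicker")["made"].agg(attempts="count", made="sum")
    g = g[g["attempts"] >= min_attempts]
    g["pct"] = g["made"] / g["attempts"]
    return g
```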
I took a Bayesian approach to our question of whether it’s too soon to judge Aguayo. To provide a conceptual Bayesian introduction I’m going to shamelessly steal from Nate Silver’s The Signal and the Noise.
Say you want to estimate the chance of your partner cheating on you. You probably have some thought about how likely that is right now; in Bayesian terms, this is your “prior.” Fortunately for me, I love my wife (hi Amy!) and my prior for her cheating is extremely low, but not zero because if it were zero this example wouldn’t work.
But now let’s say I come home after a hard day at the blog mines and in the bedroom I find a pair of underwear that very clearly aren’t mine. This is some new “data” or “evidence” that I now need to incorporate into how likely it is that my wife is cheating on me.
I could do that by the following: I take my “prior” probability of my wife cheating on me, multiply it by how much more likely it is for me to find the underwear if my wife is cheating on me than it is to just find the random underwear, and voila! I now have a “posterior” probability for how likely it is my wife is cheating on me given that I found the underwear. This is basically how humans reason, though not quite as explicitly or mathematically. The basic process is illustrated below:
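In code, that update is a one-liner. The numbers here are made up purely for illustration – a 4% prior, a 50% chance of finding the underwear if she's cheating, and a 5% chance of finding random underwear otherwise:

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis given one piece of evidence."""
    num = prior * p_evidence_if_true
    return num / (num + (1 - prior) * p_evidence_if_false)

# Illustrative (made-up) numbers for the underwear example
p = bayes_posterior(0.04, 0.50, 0.05)
```

Even strong-seeming evidence only moves a very low prior up to a modest posterior, which is the whole point of the framework: the evidence and the prior both matter.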
Now let’s apply this example to Aguayo. I want to estimate Aguayo’s true field goal “skill”. Before this season started I had the past 5 years of NFL data on kicker success rates – this is one reasonable “prior” for Aguayo’s skill.
Ah, but now I have some data (underwear) on how Aguayo has actually done. I can combine this data with our prior for his skill to get a new posterior estimate of Aguayo’s skill!
The first thing I looked at was the distribution of overall field goal percentages for 53 kickers with 20 or more kicks from 2011 through now. That’s the orange bars in the figure below:
Before comparing Aguayo with that distribution, though, we need to adjust for the fact that he has actually faced an easier set of kicks than the average kicker. Only 1 of his 22 attempts has come from 50 or more yards away, with 14 of them from under 40. The blue bars are a weighted average of what each of the other 52 kickers’ percentages would have been had they faced the same 0-39, 40-49, and 50+ yard distribution as Aguayo.
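The weighting itself is simple. Here's a sketch with made-up per-bucket percentages for a hypothetical kicker – only the attempt mix is Aguayo's real distribution (14 under 40, 1 from 50+, and the remaining 7 from 40-49, inferred from the other two counts):

```python
def adjusted_pct(pct_by_bucket, target_attempts):
    """Weighted average of a kicker's per-bucket FG% using another
    kicker's attempt distribution as the weights."""
    total = sum(target_attempts.values())
    return sum(pct_by_bucket[b] * n for b, n in target_attempts.items()) / total

# Aguayo's distance mix across his 22 attempts
aguayo_mix = {"0-39": 14, "40-49": 7, "50+": 1}

# Made-up per-bucket percentages for some other kicker
pct = adjusted_pct({"0-39": 0.92, "40-49": 0.80, "50+": 0.62}, aguayo_mix)
```

Because the mix is weighted so heavily toward short kicks, the adjusted percentage lands much closer to the kicker's short-range rate than to a raw all-distances average.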
By the way, that blue bar over there on the left by his lonesome, 10% lower than anyone else? Sadly, Aguayo. The bad news is…that’s historically bad. The good news is it’s unlikely to continue to be that bad unless he’s in an unbeatable mental funk of some kind.
This distribution doesn’t look quite binomial (notice there was no one with an adjusted field goal percent below 65%), so we’re going to use a slightly fancier distribution to describe kicker success percentages: the beta distribution. The beta distribution is useful for a couple reasons.
First, it’s super flexible – by just varying a couple of numbers that define the curve (alpha and beta), you can get a whole lot of different shapes (see the figure on the right). The orange curve looks kinda like what we want, except we want it on the right and not on the left.
Second, it’s super easy to incorporate new information to get a posterior distribution. I’ll spare you all the math, but if we take our prior – which it turns out is best fit by alpha = 69.3 and beta = 10.6 – then our posterior is also a beta distribution, with alpha = 69.3 plus Aguayo’s 15 makes and beta = 10.6 plus his 7 misses so far. Pretty badass, huh?
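Here's that conjugate update in code, using the numbers above (I'm treating the "best guess" as the posterior mode and the credible interval as the equal-tailed 95% interval):

```python
from scipy.stats import beta

# Prior fit to the distance-adjusted 2011-2016 kicker percentages
a0, b0 = 69.3, 10.6

# Conjugate update: add Aguayo's 15 makes and 7 misses
a1, b1 = a0 + 15, b0 + 7

mode = (a1 - 1) / (a1 + b1 - 2)        # posterior "best guess"
lo, hi = beta.interval(0.95, a1, b1)   # 95% credible interval
```

Adding more seasons of the same performance is just more additions to alpha and beta, which is exactly how the data gradually swamps the prior.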
Here are our prior and posterior estimates for what Roberto Aguayo’s “true” field goal percentage is on a set of kicks similar to the ones he has attempted so far. Importantly, this is not an estimate of what his field goal percentage will be moving forward because he could make very different kicks! (NOTE: This chart and the numbers below were updated roughly 3 hours after the original post due to a coding error I discovered, but the main thrust of the results is unchanged.)
The orange line is our prior; this represents our best estimate of Aguayo’s skill on kicks of the type he’s taken before he ever takes a kick in the NFL. The higher the line is, the more likely that the value on the x-axis is his “true” skill.
The gray line incorporates the 22 kicks we’ve seen from him in 12 games so far. The blue line is what our estimate of his skill would be if we saw 4 more games of exactly the kind of kicks and makes/misses he’s had so far – I just multiplied the number of kicks so far by 4/3. The yellow line is the gray line, but after 2 full seasons.
Here are our means and credible intervals (similar to confidence intervals, but with a different interpretation and philosophy) for all distributions:
Guess Type | “Best Guess” | 95% CI Lower | 95% CI Upper |
Prior (No Aguayo Kicks) | 87.6% | 78.5% | 93.2% |
Posterior, Current Aguayo Kicks | 83.4% | 74.8% | 89.4% |
Posterior, Full Season of Aguayo Kicks | 82.3% | 74.0% | 88.4% |
Posterior, 2 Full Seasons of Aguayo Kicks | 79.3% | 71.7% | 85.2% |
Basically, before the season started, our best guess was that – if Aguayo is basically like the group of kickers we included in our prior (everybody with >20 kicks from 2011 to now) – he would’ve nailed 87.6% of 22 kicks of this type. Instead, he’s made 68.2% of them – poor kid.
If we consider that we’ve only seen 22 of his kicks and have data on 5,653 other kicks since 2011, our best guess for how he does in the future on these types of kicks is 83.4%. Not as bad as he is now, but that still only puts him better than about 20-25% of NFL kickers. Yikes. Still, there’s a decent chance his skill turns out to be better than that – there’s a 95% chance his true skill on these kicks is between 74.8% and 89.4%. 89.4% would put him in the top fifth or so.
If we continue to see this same kind of performance from Aguayo through the rest of the year, our best guess for how he does in the future would drop to 82.3%. After another full season on top of that, 79.3%, with only a small chance that he’s truly even a League-average kicker. This is an example of the “data swamping the prior” – we put progressively more and more weight on what Aguayo actually does as he, well, does more and more of it.
I don’t yet disagree with Bill and Daniel that it’s too early to tell, but it’s not looking good. The information Aguayo has given us so far already suggests he’s more likely to be a below-average kicker than not, driven especially by how comparatively easy his kicks so far have been. On the plus side, though, it’s not likely he continues to be this bad, either. If we see this performance continue through a second season we will pretty much be able to say he’s a bad NFL kicker.
However, our analysis may be unduly cruel to Aguayo for a couple reasons. First, we used the information from all NFL kickers from 2011-2016 to create our prior. Aguayo – drafted in the second round – was supposed to be better than most kickers. If we’d used a prior distribution of just the top 20 or 30% of kickers – which may have been more appropriate – his posteriors would also look better now. Second, Jesus, the guy’s just had a rough year, and our model doesn’t account for the non-independence of his kicks (i.e. the compounding effects of pressure and missing kick after kick). If he can get out of any nasty mental spiral his percentage could increase a lot more than our model anticipates. That would be great! I’m certainly rooting for that.
One other limitation worth noting is the prior is only built on kickers who “survived” to make 20 NFL kicks. So we’re not including the possibility that Aguayo is actually bad enough to have never made it this far without having been drafted in the second round – something Bill and Daniel pointed out on the podcast.
So our analysis isn’t looking great for Aguayo, but it also represents a pretty pessimistic estimate of his future chances.
]]>“Out,” like it always did, means the player is certain to not play. Per our data, “Doubtful” continues to mean essentially the same thing.
“Questionable” is where things get interesting. According to my analysis, about 1/3 of players who would have previously been marked “Probable” in earlier years are now marked “Questionable,” while the other 2/3 simply aren’t listed (i.e. they’re considered “not injured”). This has altered what “Questionable” means in terms of how likely a player is to suit up on game day – in previous years 60-65% played in the next game, but so far in 2016 it’s 73%!
That means the “Questionable” players – already a hard-to-predict group – got even more heterogeneous. But can we look a little deeper and try and identify those more or less likely to suit up for their next game? I’m going to stratify by team, injury type, and practice status to try and find out!
Almost everything is explained in my previous post. The only new information is the practice reports, which came from weekly CBS practice report pages such as this one.
Let’s check on the percent of Questionables that play each week for a few of the major injury types. We’re only using information from the NFL injury reports as outlined in this post, so the categories are going to be very broad and a mix of many different kinds of injuries that may not belong together. My apologies for that.
Most injuries seem to be clustered right around the overall average of 73%. Players with quadricep/thigh injuries (82% active) and concussions (85% active) were the most likely to play. Concussions in particular make sense to me: the return-to-play portion of the NFL’s concussion protocol has no set timeline, but it’s often hard to complete in 5 days and easier to complete in 7, leaving a lot of players as “Questionable” on the game status report but ready to suit up on game day. I’m not sure what to make of quad injuries – if any trainers or physical therapists or physicians want to weigh in in the comments, please do!
On the other end, players with groin (67% active) and calf (65% active) injuries were the least likely to be active. I’m not sure I have an explanation for that but, again, I would value any input trainers or medical folks can provide down in the comments!
I’ve also included error bars, which represent 95% confidence intervals.^{1} The intervals all overlap and most of them include the overall average of 73%, so it’s possible there are no true differences among injuries and all we’re seeing is random noise. But if I had to guess, I think we’re seeing a real higher active proportion among concussions and maybe quad/thigh injuries, too. I’m not so sure about the lower ones.
As I outlined in the previous post, teams issue 3 “practice reports” during the week (only 2 if they play on Thursday) that tell us if injured players had “Full Participation” (FP), “Limited Participation” (LP), or that they “Did Not Participate” (DNP). If a player had only 1 or 2 of the 3 practice statuses we would expect for a non-Thursday game (for example, if a guy first strains his hamstring in a Thursday practice), we assumed the other practices were “Full Participation.”
I wanted to stratify these statuses in a couple ways:
*By the player’s final practice status (FP, LP, or DNP) before his game each week
*By the trajectory (down/flat/up) his statuses followed throughout the week
So, let’s take a look at the data!
Heeeeeyyyyy, now we’re cookin’! Overall, if your final practice status was FP/LP/DNP, you played 87/74/39% of the time. We often hear fantasy experts saying they’re concerned if a player doesn’t practice late in the week; this data bears that out, but it’s also not a death sentence.
As far as the trajectories go there’s a bit less variation. Overall, Down/Flat/Up trajectories meant you played 70/68/82% of the time. So if you’re trending up (as I’ve defined it) you’re more likely to play, whereas flat and down trajectories are similar at a bit below the overall average.
When we stratify by both, we see that among player-weeks with full participation in their last practice, those who were trending up (93% active) were way more likely to play than those who stayed flat (76%). That’s kind of surprising since players who were able to practice fully the whole week despite their injury seem like they should be good to go on Sunday, but the data shows that those whose practice status indicates a recovery over the course of the week are even more likely to suit up.
Among players with limited participation at their last practice report, there wasn’t a whole lot of variation by trajectory. Player-weeks with down/flat/up trajectories played 74/75/70% of the time. So it looks like if a player is limited at practice late in the week, they’re basically of average likelihood to play.
Among players who did not participate in their final practice, those who had declined from mid-week actually had a decent chance to still play: 66%. That’s kind of surprising to me – it may be semi-injured guys just being given Friday off? Meanwhile, if a guy doesn’t practice all week his chances of playing are pretty low at 26%. Honestly, that’s still higher than I was expecting!
Teams have historically exhibited quite a bit of variation in what proportion of their “Questionable” players suit up on Sunday. Did this continue through the first 8 weeks of 2016?
Apologies for the small text, but I thought it was still easiest to look on one chart. Overall you see quite a bit of variation by team – from Cincinnati with 100% of questionable players suiting up to Tennessee with just 33%. Not surprisingly, both of these extremes have extremely small sample sizes (9 and 8 player-weeks, respectively), so I imagine we can expect some regression to the mean as the season drags on.
You see a pretty even distribution of proportions across teams – they follow a gradation rather than being in “high” or “low” clusters. The teams don’t break into camps where some have 90% of the questionable players play and others 40%. They may very well all have a common understanding of “Questionable” but just be experiencing random variation. So I would caution against using this data – at least just this 8-week set – to make any reliable predictions about what teams will do next with their Questionable guys.
As a sidenote, New England – where Bill Belichick has a reputation for…strategic use of the injury report – has listed a league-high 69 (nice) player-weeks as Questionable through 8 weeks. However, those players have been active at a near-average clip, suiting up 70% of the time. I’m a Dolphins fan, and even I have to admit that doesn’t support the shady Belichick narrative.
We looked at the proportion of Questionable players that play each week by injury type, practice status, and team. While we saw variation by team, practice status seems by far the most useful for projecting actual game day status. If your final practice status was FP/LP/DNP, you played 87/74/39% of the time; if you followed a Down/Flat/Up trajectory (as I’ve defined it), you played 70/68/82% of the time.
Your best source of information is always going to be individual reports on game day about whether a guy is playing or not. Also, my data says nothing about whether a guy plays his regular number of snaps or not. But these data help provide some context that can inform rough guesses earlier in the week about whether a player is going to play, and that’s not not useful!
Footnotes:
1. For the stats nerds, all intervals in this article were calculated using the normal approximation to the binomial distribution.
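For the really curious, that normal-approximation (Wald) interval is just:

```python
import math

def wald_ci(k, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a binomial proportion,
    clipped to [0, 1]."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)

# e.g. a group of 100 player-weeks where 73 played
lo, hi = wald_ci(73, 100)
```

It's the quickest interval to compute, though for small groups an exact method would be a bit more trustworthy.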
]]>I wasn’t sure how this change would affect NFL injury reports, so I’ve been eagerly waiting to amass enough data to examine this rule change. Now that we’ve got a half season let’s take a look at the data!
To start, though, let me make sure everyone is on the same page about a few intricacies of the NFL’s injury report. The “NFL Injury Report” is actually three separate documents:
1. Practice Reports – these are reports given by teams on Wednesday, Thursday, and Friday (for teams with Sunday games)^{2} that list the practice status of all players with “significant or noteworthy injuries.” This language does give teams some wiggle room on exactly whom they put on their reports. Some teams, but not all (as far as I can tell), even regularly list players (mostly veterans) who just miss a practice for scheduled rest. Each injured player gets one of the following designations each day:
*Did Not Participate
*Limited Participation
*Full Participation
You could previously also have been listed as “Out,” but that was also eliminated in 2016 to avoid confusion with the Game Status Report.
2. Game Status Reports – these are reports given by teams on Friday for Sunday games (or Wednesday/Saturday for Thursday/Monday games). They list a projection for how likely an injured player is to play in the team’s upcoming game. Of note, a player listed on the practice report does not have to appear on the game status report if they are certain to play. The game status designations are:
*Questionable
*Doubtful
*Out
As noted above, through 2015 “Probable” was a fourth option, but the NFL eliminated that this year.
3. In-Game Injury Report – exactly what it sounds like. We won’t waste more time on it here as it’s not pertinent to our questions.
Impact of Change: There were a lot of prognostications about the effects of this change. I was…uncertain. The players previously named as “Probable” could have followed one of two paths: “Questionable” or off the game status report entirely.
Path 1: The new rule could result in a lot more “Questionable” tags, since the NFL can get a bit investigate-y if a player who isn’t on the game status report unexpectedly doesn’t play.
Path 2: On the other hand, 90% or more of players marked “Probable” in a given week did in fact play in the team’s next game, so maybe they’d fall off the injury report entirely without a “Probable” designation.
I really wasn’t sure what to expect, and I thought different teams would probably take different tacks. (Unfortunately, I don’t think we have enough data to stratify by teams yet).
I scraped injury report information from Fox Sports weekly pages like this one for the first 8 weeks of the 2013-2016 seasons.^{3} In addition to the Questionable/Doubtful/Out designations we would expect, Fox has a “Day-to-Day” listing which, from cross-referencing with other sources, appears to be a weird mix of “not badly hurt enough to appear on the game status report” and “on injured reserve” (IR) (a special list that teams use to stash severely hurt players they would like to keep but not spend an active roster spot on). Because we just wanted to focus on the three categories for the game status report, I excluded all Day-to-Day and IR listings from my analysis.
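The exclusion step looks something like this in pandas (the column name "status" is my assumption about the scraped frame, not the actual Fox markup):

```python
import pandas as pd

# The three true game-status designations we want to keep
GAME_STATUSES = {"Questionable", "Doubtful", "Out"}

def clean_report(report):
    """Drop Fox's 'Day-to-Day' catch-all and IR listings, keeping only
    rows with a real game status designation."""
    return report[report["status"].isin(GAME_STATUSES)].copy()
```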
I also wanted to calculate the percent of players with each designation who were active or inactive in a given week. The NFL has data on inactive players readily available for 2016, but historical inactive data is harder to come by. I settled on scraping the 2015 data from FFToolbox (for example, here). I did not pull data for 2013 or 2014. I know I’m breaking my own rule against presenting data from only two years, but it will still provide an instructive comparison between the old and new injury reports, and I don’t think we’ve seen a lot of year-to-year variation in the inactive percentages of each designation.
Obviously the probable bar went to zero in 2016 as the designation was eliminated. In previous years, the total number of probable and questionable player-weeks was 1,706 (2013), 1,592 (2014), and 1,427 (2015). On average, these years saw 1,083 probable and 492 questionable player-weeks. In 2015 there were 970 probable and 457 questionable player-weeks.
In 2016, meanwhile, we saw just 805 questionable player-weeks. If we assume we would’ve seen the same numbers in 2016 as in 2015 had “Probable” not been removed (not a perfectly safe assumption, given random year-to-year variation), it looks like about 30-35% of “Probable” player-weeks went into the “Questionable” category and the remainder fell off the report. Neat.
As a sidenote, you might notice that the number of player-weeks on the game status reports shrank each year from 2013-2015, especially 2014-2015. It’s a bit hard to see in this graph, but I’ve also added 95% confidence limits to each category to help give us an idea of whether this is beyond the random year-to-year variation we might expect.^{4} The biggest decrease was a 12.5% drop in “Probable” designations from 2014-15. This might mark a trend of less reliance on the Probable category before this season, but I’m still inclined to say it’s random variation.
I was also curious whether there would be a tendency to push everyone down the game status list (e.g. 2015 questionable becomes 2016 doubtful). Doubtful designations are basically unchanged, while Out designations are up somewhat (377, 452, 459, and 493 player-weeks from 2013-2016), but 2016’s numbers are within the confidence intervals for 2014-15. That doesn’t provide much evidence for this kind of trend.
The official League definition of “Questionable” now is “uncertain that the player will play.” Well, that’s…vague. Previously the League said 75% of Probable, 50% of Questionable, and 25% of Doubtful players should play, but these percentages – which weren’t all that accurate, anyway, to be fair – are now gone. So what does Questionable mean in 2016?
Turns out a “Questionable” designation used to mean that, on average, you had about a 60-65% chance of playing that week^{5}, though this varies a lot by team. Now it means, on average, you have about a 73% chance of playing each week. That’s a substantial difference. Given the relative numbers of Probable and Questionable designations in the past few seasons, this is just a tick lower than what we’d expect to see if 30-35% of Probables were now Questionable – but that makes sense, since many of the Probables who became Questionables were probably “more hurt” Probables (i.e. their risk of missing a game was likely above the overall Probable average).
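As a rough sanity check on that logic, here's a toy mixing calculation. The historical play rate for Probables isn't given anywhere in this post, so the 90% figure below is purely an illustrative assumption (as is benchmarking against 2015's counts); the point is just to show that blending old Questionables with migrated Probables lands you near the observed ~73%:

```python
# Quoted inputs plus one illustrative assumption
n_questionable_2015 = 457   # Questionable player-weeks in 2015 (quoted above)
n_migrated = 348            # 805 Questionables in 2016 minus 457 in 2015
p_questionable_old = 0.62   # historical ~60-65% play rate for Questionables
p_probable_assumed = 0.90   # NOT from this post - an assumed Probable play rate

blended = (n_questionable_2015 * p_questionable_old
           + n_migrated * p_probable_assumed) / (n_questionable_2015 + n_migrated)
print(f"{blended:.1%}")  # lands in the low-to-mid 70s
```

If the migrated Probables were the "more hurt" ones (playing less often than 90%), the blended rate drops a bit, which squares with the observed 73% being a tick below this naive expectation.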
Regardless, the Questionable designation was already a tough-to-predict hodgepodge before this year, and it’s only gotten more heterogeneous.
From the data so far, my best guess is about 30-35% of the player-weeks that would have previously been categorized as “Probable” went into the “Questionable” category and the remainder fell off the game status report. Furthermore, Questionable players now play closer to 75% of the time rather than the 55-65% we’ve seen historically.
There’s still a lot more we can do with this data! My next post, for example, will focus on whether, in this brave new world without a “Probable” category, we can differentiate between the “Questionable” players more and less likely to suit up on game day.
It would also be interesting to see how individual teams are reacting to these changes (in 2015, teams ranged from 31-85% of their Questionable designations ultimately playing, and that may vary even more now), but I’d really like a full season of data before doing that stratification.
Footnotes:
1. The NFL also eliminated the “Out” category from the practice status reports to avoid confusion with the game status reports. Additionally, they tweaked the rules for the injured reserve list: teams were previously allowed to recall one player from the list to the active roster each season, but they had to specify the player at the time they went on injured reserve. Now they can designate them for return when they actually want them back.
2. For Thursday games, these reports are issued Tuesday and Wednesday. For Monday games, Thursday, Friday, and Saturday.
3. I initially tried to pull data from Pro-Football-Reference (PFR) team pages (like this one) for 2009-2016. However, PFR’s injury data was a.) incomplete for 2016’s first 8 weeks and b.) looked a bit funky for 2014 and earlier. The data exhibited a huge sudden jump in overall player-weeks on the injury report between 2011 and 2012, which I’m concerned had to do with changes to PFR data collection/tracking procedures rather than a true change. Also, from 2009-2014 there are 50-100% more questionable than probable designations each year, which doesn’t seem right. There should be way more probable designations.
4. For the stats nerds, I considered each injury category as a count and calculated exact Poisson confidence intervals.
5. As noted above I didn’t pull similar data for 2013 and 2014, but cross-referencing with data at Football Outsiders (FO) and HSAC suggests that at least the Questionable and Probable percentages for 2015 were relatively in line with 2013 and 2014. The Questionable numbers from FO showed only 56% played in 2014, but that rebounded to the historical low-60s in 2015.
It’s an intriguing finding, but unfortunately the conclusions we can draw from it are limited. Specifically, the study cannot tell us whether professional football raises or lowers suicide rates. There are several reasons for this, but we’ll focus on a couple of the bigger ones below.
It’s always good to start analyzing a study^{1} by figuring out just what question the researchers were trying to answer. In their own words, the study’s purpose was “To compare the suicide mortality of a cohort of NFL players to what would be expected in the general population of the United States.” Notice their stated purpose was NOT to determine whether playing in the NFL causes or prevents suicides. This is a responsible omission because their study can’t answer that question.
They took a cohort of 3,439 retired NFL players who played 5 or more seasons from 1959-1988 using an NFL pension fund database. They watched these players through 2013 and counted 12 suicides.
They then asked how many suicides we would have expected among a sample of 3,439 people chosen from among all U.S. adult males, followed over the same time period, and having similar ages and races as these NFL retirees. Using national mortality rates and cause-of-death data^{2}, the researchers estimated they would have seen 25.3 suicides in this group.
12/25.3 = 0.47: that’s the standardized mortality ratio (SMR) reported in the study. It corresponds to a 53% lower suicide rate in the NFL players versus all U.S. men of similar ages and races. In epidemiology we refer to this whole process as indirect standardization.^{3} Got it? Cool.
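Mechanically, indirect standardization just multiplies the cohort's person-time in each age/race stratum by that stratum's general-population death rate, sums the expected deaths, and divides observed by expected. The strata and rates below are invented purely to show the mechanics; only the observed count of 12 comes from the study:

```python
# (person-years in the cohort, general-population suicide rate per 100k per year)
# These strata are made up for illustration - the real study used many
# age/race/calendar-period strata with national rates.
strata = [
    (40_000, 20.0),
    (55_000, 18.0),
    (30_000, 15.0),
]

# Expected deaths = sum over strata of person-years * rate
expected = sum(py * rate / 100_000 for py, rate in strata)
observed = 12
smr = observed / expected
print(f"expected={expected:.1f}, SMR={smr:.2f}")
```

With these toy numbers you'd expect about 22 suicides against 12 observed, giving an SMR near the study's 0.47; the real calculation is the same arithmetic with real strata.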
Now, it’s really important to understand what this SMR means and, even more importantly, what it DOESN’T mean. Specifically, in the context of this study this is NOT strong evidence for football preventing suicides. (To their great credit the authors do not overstate their conclusions in their paper. To my great surprise, neither did the majority of media outlets reporting on the study.) To understand why the study doesn’t allow us to draw this conclusion, let’s appeal to the great scientist Keanu Reeves.
I’m going to subject you all to a version of an exercise I love to run my students through.^{4} Say you wanted to know whether playing NFL football for 5+ years (versus no NFL football) caused or prevented suicides. In this study our “outcome” is suicide rate and “playing NFL football for 5+ years”^{5} is our “exposure” of interest (the thing whose effect on suicides we want to measure).
OK, so also say you had no financial, time, or even reality-based limitations on what you could do. How would you study this question? Go ahead, take a minute. I’ll wait.
…
…
…
OK, want my answer? Well you’re getting it anyway. I’d take the 3,439 retired NFL players from the pension database, watch them for suicides through 2013, then…
…hop in a time machine, go back to when each of them was born, and not change anything except to stop them from playing in the NFL. This would then become our unexposed group. I would then travel back to 2013 and check to see how many of them committed suicide under these new, no-NFL conditions.
This whole time-traveling scenario is something epidemiologists call a counterfactual. This is simply a scenario we might have liked to observe but didn’t. Here, I’ll let the Big Bang Theory explain. It’s useful when we’re thinking about how to measure the causal effect of an exposure, such as whether playing in the NFL raises or lowers suicide rates.
This study, setting aside all concerns about feasibility, would let us isolate the effect of “playing NFL football for 5+ years” (versus not playing in the NFL) on suicide rates. If we see more than 12 suicides in the unexposed group that was stopped from playing in the NFL, then playing NFL football for 5+ years prevents suicides; fewer than 12 suicides would suggest the opposite.
So why does this fantastical time-machine study give us the answer when a real-world comparison can’t? The simple answer is something epidemiologists call exchangeability. In layman’s terms, it means our unexposed group that was manipulated to not play in the NFL otherwise stands in perfectly for – “is exchangeable with” – our exposed group with respect to suicide rates. Surely the exact same men who were only stopped from playing in the NFL stand in well for the actual players from the pension database in every way except playing in the NFL for 5+ years, right? If this is the case, then we can be certain that any differences in suicides we observe were because of playing in the NFL for 5+ years!
If your exposed and unexposed groups are not exchangeable, however, you have what scientists call confounding. Simply put, this means the effect of one thing is mixed up with the effect of another thing(s).
NFL players are vastly different from the general U.S. male population in any number of ways. Although the NIOSH study matched this comparison group to the NFL players on age and race, the general-population men may be, for example, less wealthy on average. They’re certainly in much worse shape than these elite athletes (well, besides the arthritis and chronic pain in NFL retirees, maybe). They probably smoke more and have very different health habits and behavioral patterns than NFL players. For clear evidence of this, look no further than Table 1 in the paper (paywall): while suicide rates were 53% lower, overall death rates were also 40% lower in NFL retirees than would have been expected in the general population. Cancer deaths were 41% lower and cardiovascular deaths 25% lower; these all suggest better baseline health for the elite athletes. Assaults and homicides were a whopping 83% lower than expected, suggesting possible differences in socioeconomic status. Other data from this same cohort, interestingly, do show much higher death rates for neurodegenerative diseases, such as Alzheimer’s, among NFL retirees.
The point is any of these could be associated both with playing in the NFL and suicide. If these variables are also not the result of playing NFL football (perhaps income is, but lifelong healthy habits may not be), they aren’t something whose effects we want to measure. In the NIOSH study, the effects of these “extraneous” variables cannot be teased out from those of actually playing on a professional football team, severely limiting our ability to draw conclusions about professional football’s effects on suicide rates.^{6}
But it’s not as simple as all that. One important thing we waved our hands at above is the exact definition of our exposure (the thing whose effect on suicide rates we want to estimate). We can say it’s “playing in the NFL for 5+ years,” but what exactly does that entail? It includes a lot of bangs to the head, sure, but it also (at least today, less so from 1959-1988) includes a nice payday. These could have contradictory effects on suicide rates, but if we just compare a group of long-term NFL players with men who didn’t play football we can’t tease these out: any difference we see could be due to brain disease from football-related head trauma, or having a few hundred thousand dollars in the bank, or a combination of both.
Think about it like this: if you went back in time to alter something about the NFL retirees from the NIOSH study, whatever thing(s) you change are responsible for the differences you observe. If stopping someone from playing in the NFL prevents head trauma but also cuts their paycheck, you’re going to see the effects of both of those in 2013.
If what we really wanted to know was whether the head trauma associated with playing in the NFL for 5+ years influenced suicide risk, perhaps we could compare NFL retirees to other professional athletes, such as MLB or NBA retirees, who may look similar to (that is, are exchangeable with) NFL players on many health and socioeconomic factors but just haven’t had those hits to the head.^{7}
Of course, maybe we want to estimate the NFL’s total effect on suicide risk, instead. Then income, and perhaps some other health factors influenced by playing professional football, would be intermediate “steps” on the path between playing in the NFL and suicide. Anything that is a result of playing in the NFL cannot, in this case, be a confounder, because it’s part of the effect we said we wanted to measure – and maybe comparing NFL retirees to a general population group is more valid.
So the lesson here is always be very careful and explicit when defining the thing whose effects you want to measure, because even something as seemingly clear as “playing in the NFL for 5+ years” is, upon closer inspection, actually pretty muddy.
So was this study worthless? Not at all! It did exactly what it set out to do: it compared suicide rates in the NFL to a hypothetical cohort of all U.S. males with similar ages and races. This is a valuable data point that suggests maybe we’re biased by high-profile news reports of famous athletes killing themselves. It simply can’t tell us much, in my mind, about the effect of professional football on suicide rates. Better but still realistic studies might entail comparing NFL retirees to a more similar group, such as other elite professional retired athletes, or statistically adjusting for some of the confounders I outlined above.
Such a study was actually done for college athletes; it found that while NCAA athletes as a whole had much lower suicide rates than college students in general, NCAA football players had more than twice the suicide rate of other male college athletes. Though college football players are unpaid so income can’t be the reason, this data speaks strongly to the “double-edged sword” hypothesis above: football might have a bad effect on suicide rates, but the benefits of being an athlete overall may outweigh those, resulting in a net lower rate.
Whether playing football is good or bad for suicides, then, depends on how exactly you want to define “playing football.”
Footnotes:
1. The study was actually published online and received quite a bit of press back in May, but to my great shame I missed it then.
2. Cause of death, almost always retrieved from death certificates, can be unreliable (e.g. if someone died of an infection due to an underlying cancer weakening their immune system, did the cancer kill them or the infection?), but it’s the best we have.
3. This is a very common approach in occupational epidemiology (e.g. comparing the cancer mortality of factory workers exposed to asbestos with those of the general population), so I wasn’t surprised to see it used by NIOSH.
4. And taught to me first by Penny Howards, a wonderful epidemiology professor at Emory.
5. Don’t worry, for now, about what exactly “playing in the NFL” means. It’s an extremely important question that we’ll get to in a moment.
6. I stated this above but it’s worth re-emphasizing: the study authors and most of the media did not try and draw this conclusion from the study. I’m just explaining why they didn’t.
7. Another way might be to do what the authors of this study did and look at whether, within NFL players, suicide rates are lower among positions with lower concussion risk (in the study these were termed “speed” vs. “nonspeed” positions). However, this approach has two substantial drawbacks: by stratifying by position you get down to very small numbers of suicides (6 in each group in this study); and diagnosed concussions are far from the only source of head trauma in football, which involves hundreds of subconcussive hits that could be just as big a problem for long-term negative outcomes.
The headline figure is that concussions dropped from 83 in the 2015 preseason to 71 in the 2016 preseason. Let’s dig in a little more deeply and see what we think once we take our uncertainty into account.
Here are the actual League-reported concussion numbers^{1} for the 2012-2016 preseason, put together by a company called Quintiles that also tracks the NFL’s other injury data. These numbers break out the regular season and postseason for 2012-2015, too, but let’s just zero in on the preseason numbers for now.
We could approach quantifying our uncertainty in a statistical way or a “non-statistical” way. Let’s ease in by starting with the latter:
Just look at the numbers, reproduced below:
For overall concussions (games + practices), the 2016 number actually looks pretty good – lower than any previous year, and the numbers for 3 of the previous 4 years were remarkably consistent in the mid-80s. We’ll leave the question of “statistical significance” for the next section (sort of), but for now, just ask yourself: Is that an impressive drop? A meaningful one? I don’t know. I’m less impressed than I was when just looking at the 2015-2016 comparison, considering there were only 6 fewer concussions in 2016 than in 2013. Maybe 71 is within expected random historical variation and not indicative of a sustained or trending drop; maybe it’s an impressive 15% drop from 2012 and 2014-15. I’m not sure (you’ll read that a lot here).
If we stratify by concussions in games versus practices, things become a bit more interesting. Let’s start with practice concussions, which actually look substantially lower the last couple of years. Jeff Miller, the NFL’s senior vice president of health and safety policy, said “…it is a trend, we think, now after a couple of years that there are fewer concussions in practice.” OK, two sustained years of a 25% reduction in practice concussions is kind of compelling (though calling 29 to 26 from 2015-16 a “drop,” as John Clayton does here, is a stretch).
The game numbers are where I would have a problem with interpreting 2016’s preseason concussion figures as a drop. Look at those numbers; 45 concussions in 2016 is a 17% drop from 2015 (54), but it’s basically in line with the numbers from 2012-2014. Is that something to trumpet? Looking at those numbers, I’d say you could at best say that preseason game concussions are basically flat, with 2015 as a high outlier.
We can see the historical variation in these numbers, which we should expect a good bit of given, as Miller correctly points out, we’re dealing with a fairly small sample of concussions each preseason. Now let’s try and quantify how much variation we should be expecting…
Skip to the end of the box in this section if you don’t like statistical details. But I’m going to try and make this very accessible regardless of your math background, so stick around if you want to learn something!
We saw 71 concussions in the 2016 preseason, but that number is subject to something called random error that can cause the number of concussions to bounce around from year to year, even if the underlying rate of concussion (the thing the NFL has power over) isn't changing!^{2} Quantifying how much we might expect these numbers to bounce around year to year can help us figure out whether 6 or 12 or 30 fewer concussions is actually a meaningful drop or not. So let's get on that.

Coinflips and Concussions

The number of concussions we observe in a preseason is what we call a random variable. The statistical definition of this is quite dense, but what it boils down to is that it's a number we observed that could have been another number.

Think of it like this. Take a coin out of your pocket. Flip it 10 times. How many heads did you observe? Let's say you observed 5 heads, the most likely outcome for a fair coin. But you could have observed 6 or 8 or 1 by random chance - you just happened to observe 5. Flip it 10 more times and see if you get a different number of heads. The number of heads in 10 flips is a random variable!

The number of concussions we saw in the 2016 preseason can also be described as a random variable. Think of each player in a single game or practice as a coin flip. How many flips did we have in the 2016 preseason? Let's assume each team has 90 players in each of the first 3 preseason games and 75 in the last one due to roster cutdowns. Then you have (90*3 + 75)*32 = 11,040 "player-games" in the preseason. Let's also assume each team has 6 weeks of 5 practices per week, with the first 5 weeks featuring 90 players and the last week featuring 75 players. Then you have (90*5*5*32) + (75*5*1*32) = 84,000 "player-practices." Sum these two together, and we get 95,040 "athlete-exposures" (AEs), where one AE is one athlete participating in one game or practice.
This is equivalent to 95,040 flips of a coin which, if it comes up heads, means a concussion (thank you, I'm here all week). Fortunately for us, this is not a "fair" coin, but a coin weighted heavily against coming up heads, as demonstrated by the fact that we only saw 71 of 95,040 flips come up heads (concussion). In fact, we can infer that the most likely value for the probability of heads on this unfair coin is 71/95,040 = 0.0007. If we flip a coin so weighted 95,040 times, we are most likely to see 71 heads/concussions. But we might have seen 68, or 83, or 129 by simple random chance.

Binomial Distribution

But exactly how likely is each of these other values? Random variables such as the number of preseason concussions - where you have k "successes" (concussions) in n "trials" (athlete-exposures) - are described by a mathematical function known as the binomial distribution:

P(k) = [n! / (k! * (n-k)!)] * p^k * (1-p)^(n-k)

Don't get stuck on the equation. It just describes the probability of observing k concussions in n athlete-exposures where p is the proportion of athlete-exposures that result in a concussion (i.e. the probability of heads on our weighted coin). In 2016, our best guess for p is our observed proportion, or 71/95,040 = 0.0007.^{3} k can range from 0 to n concussions. The probability will be largest for k = 71 and will grow progressively smaller the further away you move on either side. The exact figures are plotted below (with k truncated at 150 because observing more concussions than that is so vanishingly unlikely):

Confidence Interval

A common way to define a plausible range of values for any quantity is a 95% confidence interval. In our case, we'll construct an exact binomial 95% confidence interval for the number of preseason concussions. To do this, we need to take the binomial distribution from above and figure out the range of values around our observed value of 71 that sums up to 95% of the total probability.
We do this by first finding a "lower bound" for the interval below 71, and then an "upper bound" for the interval above 71. The process is illustrated on the graph above.

For the lower bound, we start summing up the probability that we observed 0 concussions, then 1 concussion, then 2 concussions, and up and up and up until we reach a total of 2.5% probability. Why 2.5%? Well, we need 5% total probability outside our interval, so we want 2.5% on the low end and 2.5% on the high end. It turns out we reach 2.5% probability somewhere between 54 and 55 concussions - let's say 54 to be safe.

For the upper bound, we start summing up the probability that we observed 95,040 concussions, then 95,039, then 95,038, and down and down and down until we reach a total of 2.5% probability. It turns out we reach 2.5% probability way down around 87-88 concussions - let's say 88 to be safe.

So, we can say for the 2016 preseason we observed 71 concussions with a 95% confidence interval of 54-88 concussions. One proper interpretation of this is that if the "true" or "mean" number of concussions in the 2016 preseason was 71 (which it might not be because of random error from our finite sample!), then if we repeated the 2016 season a zillion times we would observe between 54 and 88 concussions 95% of the time.^{4}
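If you'd like to check the summing procedure for yourself, here's a standard-library-only sketch of it. It works in log space to avoid underflow (the raw binomial terms involve enormous factorials), but otherwise follows the steps described above literally:

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def central_interval(n: int, observed: int, alpha: float = 0.05):
    """Sum tail probabilities toward the observed count until each tail
    holds alpha/2, assuming the true proportion equals the observed one."""
    p = observed / n
    # Lower bound: accumulate probability upward from k = 0, stopping at
    # the first count whose inclusion would reach the 2.5% tail.
    cum, k = 0.0, 0
    while cum + math.exp(log_binom_pmf(k, n, p)) < alpha / 2:
        cum += math.exp(log_binom_pmf(k, n, p))
        k += 1
    lo = k
    # Upper bound: same idea, coming down from a count far above the mean
    # (counts out there carry essentially zero probability, so starting at
    # 10x the observed count stands in for starting at n).
    cum, k = 0.0, observed * 10
    while cum + math.exp(log_binom_pmf(k, n, p)) < alpha / 2:
        cum += math.exp(log_binom_pmf(k, n, p))
        k -= 1
    hi = k
    return lo, hi

lo, hi = central_interval(95_040, 71)
print(lo, hi)  # should land right around 54 and 88
```

This is my own bare-bones implementation rather than a stats-package routine, but it reproduces the 54-88 interval described above to within the one-concussion "to be safe" rounding.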
After incorporating random error to account for the fact that we only observed a finite number of concussions, games, and practices, we get the following numbers and 95% confidence intervals for the total preseason concussion numbers (all numbers rounded to nearest concussion). I could’ve done it for the game and practice numbers, but this post is so long already:
2012: 85 (67-103)
2013: 77 (59-95)
2014: 83 (65-101)
2015: 83 (65-101)
2016: 71 (54-88)^{5}
All of these numbers are contained within each other’s 95% confidence intervals, suggesting that maybe we could have expected 71 concussions this year due to random chance rather than a true change in the underlying concussion rate.^{6} ^{7}
Taking into account our confidence intervals and the observed historical variation in concussion numbers, I’m not convinced 83 to 71 is a meaningful drop. There’s a decent shot we just observed a bit of a low outlier year due to random chance without any change in the underlying concussion rate – which is what the NFL can change and what it should be concerned about. However, I’m not totally convinced it’s not a meaningful drop, either! If I saw preseason concussions stay steady or continue to drop in 2017, I’d be more convinced. But even that might not be enough.
The regular season numbers offer another instructive tale about the difficulty of figuring out whether there’s a true trend in concussion frequency or not:
2012: 173 (95% confidence interval: 147-199)
2013: 148 (124-172)
2014: 115 (94-136)
2015: 183 (156-210)
The NFL looked like it was on the right track from 2012-2014 with two consecutive large decreases before concussions spiked again in 2015. Was that a real downward trend in 2013-14 or were they (especially 2014) just low outlier years? Was 2015 a high outlier year, or did that spike reflect some new true spike in the underlying concussion rate? It’s so hard to know, and we need to acknowledge that uncertainty rather than just comparing the numbers from two years to see whether they went up or down.
Footnotes:
1. Keep in mind these are diagnosed or reported concussions. These numbers are therefore sensitive not only to changes in the “true” concussion rate and random variation but also to any changes in reporting practices or tendencies such as tweaks to the League’s concussion protocol.
2. Random error comes from the fact that we’re only looking at a finite sample of games, practices, and concussions. Some might argue that because we observed the entire NFL preseason in 2016, there is no random error in our measurement – we got the “true” number of concussions in the 2016 preseason, period. It’s a statistical philosophy debate with points on both sides, but I think about it like this: you know more about a QB’s true performance after 160 games than after 16 games than after 1 game, right? It doesn’t matter whether you’ve observed 100% of his games-to-date or not. There’s still random error from observing a finite sample of games regardless of the proportion that you measured, and that uncertainty needs to be accounted for! I feel more confident in my assessment of well-known fancy dog Tom Brady than I do Dak Prescott even though I’ve observed 100% of each of their games. I know more about Brady than I do about Dak and I need a way to quantify that lower uncertainty.
3. If we had some other outside information that p was actually a different value, such as 0.0001 or 0.1, we could then calculate the probability that we observed 71 concussions under those conditions. But here we’re assuming the true p is our observed proportion. You’ll see why shortly.
4. A very common misconception is the 95% confidence interval means you are 95% confident the “true” value lies somewhere in your interval. Technically, the correct interpretation is that if you measured your desired value over and over and over again and calculated 95% confidence intervals in the same way each time, 95% of those intervals would contain the “true” value you’re seeking to measure. An alternate interpretation – the one I used above – is that if you assume you have the “true” number of concussions, then you can say that in 95% of replications you would get a number within the 95% confidence interval (absent any other source of error or bias).
5. For the real stat nerds, I also tried treating concussion numbers as a count variable and constructed exact Poisson confidence intervals. They barely changed (shifted up by 1-2 concussions). Just wanted to cut that comment off at the knees. J/k nobody’s going to read this blog, I won’t get any comments.
6. For those who really want to know, the 2015 and 2016 numbers are not statistically significantly different at p=0.05. Hypothesis testing and p-values are the devil’s mathematics for many reasons I might expound upon in another post. You should always focus on proper estimation and uncertainty assessments, such as confidence intervals, rather than whether you get a p-value above or below an arbitrary threshold.
7. Keep in mind that these confidence intervals only account for random error. They do not account for additional uncertainty, such as whether concussion reporting or counting practices changed year-to-year, that could make our actual range of plausible values for each year even wider. Thus these intervals should be understood as the minimum uncertainty we have in these figures.
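As a coda for the real stat nerds, the exact Poisson intervals mentioned in footnote 5 can be reproduced without stats software via the classic chi-square connection. The chi-square quantiles below use the Wilson-Hilferty approximation rather than an exact routine - my shortcut, not what a stats package would do - but at counts this size it's accurate to well under one concussion:

```python
import math
from statistics import NormalDist

def poisson_ci(k: int, alpha: float = 0.05):
    """Exact Poisson CI for a count k via chi-square quantiles, with the
    chi-square quantiles computed by the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)

    def chi2_ppf(z_q: float, df: int) -> float:
        # Wilson-Hilferty: chi2 quantile ~ df*(1 - 2/(9df) + z*sqrt(2/(9df)))^3
        return df * (1 - 2 / (9 * df) + z_q * math.sqrt(2 / (9 * df))) ** 3

    lower = chi2_ppf(-z, 2 * k) / 2
    upper = chi2_ppf(z, 2 * (k + 1)) / 2
    return lower, upper

lo, hi = poisson_ci(71)
print(f"{lo:.1f}-{hi:.1f}")  # a shade higher than the binomial 54-88
```

Running this for the 71 concussions of 2016 gives an interval shifted up by a concussion or two relative to the exact binomial interval, consistent with the footnote's claim that the two approaches barely differ.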