Episode 10: Hit Me Baby One More Time
(You can hear us talk about this topic in our July 24 episode, at the 11:16 mark.)
When Hannah and I decided to talk about fighting and physical play on a hockey episode of our podcast, I knew that I wanted to look into “hits” as a statistic. As a statistic, hits don’t garner much respect (as a style of play, that’s a whole different story, depending on who you’re talking to), for two main reasons:
- People often consider hits to be a positive thing (such thoughts are usually accompanied by a lot of vague qualifiers like “physicality!” and “gritty, high-energy play!”), but that’s not really true. Teams often do better when they aren’t hitting as much, since by definition you don’t have possession of the puck if you’re the one doing the hitting. And as far as I’m aware, high hitting rates aren’t particularly associated with strong defense.
- The definition of a hit is fairly murky, so what gets recorded as a hit depends on the subjective judgment of the scorers at each NHL arena.
The timing could not have been better: just as I was thinking about this, I was working through my copy of Stat Shot (highly recommended, by the way!) and came across its chapter on hits. I really appreciated the authors’ process as they talked through hits and various ways to make them more useful as a statistic, and I decided to replicate their work for the most recent season, 2017-18. Since I’m a hockey analytics beginner, it was a useful exercise!
I’m going to walk through this process at a fairly high level; for the full details, you should definitely pick up a copy of Stat Shot. To see the data visualization, explore below or click here. (If it isn’t showing up at the bottom of this page, refresh!)
- As is customary, we’re looking at even-strength situations only. Power plays and penalty kills are different beasts, so we’re eliminating differences in play in those specialized situations.
- The first step is to tackle the possession issue. Some players may hit a lot, but that isn’t as valuable if they have low possession numbers. We can adjust the hit total with Fenwick (also known as unblocked shot attempts), one of the most common proxies for puck possession.
- We also need to address the potential sample size problem by taking time on ice into account. The cutoff used here was 800 even-strength minutes: anyone with fewer minutes had their hits replaced by a weighted average, based on their time on ice, of their own hits and the average number of hits for their position.
- In order to (partially) eliminate the subjectivity problem in actually recording the statistic, we can look at the number of hits registered for each team at home versus on the road for the past few years. Using average possession time, we can determine how the actual hits recorded at home compare to the expected values each year, and then use a weighted average to calculate an overall “bias factor” for each team that’s applied to its players’ hits.
- Lastly, we convert the adjusted hits to a per-60 rate.
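To make the steps above a little more concrete, here is a minimal Python sketch of how they could fit together. It is not the exact Stat Shot method: the possession scaling formula, the way the bias factor is applied (dividing all of a player’s hits by it), the season weights, and every function and parameter name are my own assumptions for illustration.

```python
CUTOFF_MINUTES = 800.0  # even-strength minutes cutoff mentioned in the text


def team_bias_factor(seasons):
    """Weighted average of actual/expected home hits across seasons.

    `seasons` is a list of (actual_home_hits, expected_home_hits, weight)
    tuples. The weights are hypothetical (e.g. recent seasons count more).
    """
    total_weight = sum(weight for _, _, weight in seasons)
    weighted_ratios = sum(
        weight * actual / expected for actual, expected, weight in seasons
    )
    return weighted_ratios / total_weight


def adjusted_hits_per60(hits, toi_minutes, fenwick_for_pct,
                        position_avg_per60, bias_factor=1.0):
    """Return an adjusted hits-per-60 rate for one player (a sketch)."""
    # Step 1 (assumed formula): scale hits to a neutral 50% possession share.
    # Less time without the puck means fewer chances to hit, so hits from
    # high-possession players are scaled up and vice versa.
    hits = hits * 0.5 / (1.0 - fenwick_for_pct)

    # Step 2: below the cutoff, blend the player's hits with what an average
    # player at the same position would record in the same ice time.
    if toi_minutes < CUTOFF_MINUTES:
        weight = toi_minutes / CUTOFF_MINUTES
        position_hits = position_avg_per60 * toi_minutes / 60.0
        hits = weight * hits + (1.0 - weight) * position_hits

    # Step 3 (assumed application): deflate by the team's scorer bias factor.
    hits = hits / bias_factor

    # Step 4: convert the adjusted total to a per-60 rate.
    return hits / toi_minutes * 60.0


# Made-up numbers: 40 hits in 400 minutes, 50% Fenwick, position average of 4.0.
bias = team_bias_factor([(1100, 1000, 1), (1200, 1000, 3)])
print(adjusted_hits_per60(40, 400, 0.50, 4.0, bias_factor=bias))
```

The blending weight in step 2 grows linearly with ice time, so a player right at the cutoff keeps their own total and a player with very few minutes ends up close to the position average.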
I was also curious what the scatterplot of this adjusted hitting rate against points per 60 looked like (points per 60 was also adjusted for TOI using the same procedure as above). In the visualization, I added dotted lines marking the top 20 percent for both adjusted hitting rate and points per 60. It’s fascinating to see who shows up in or near that quadrant (e.g., Evander Kane, Dustin Brown, David Backes).
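For anyone who wants to find that quadrant outside the visualization, here is a rough sketch of how the two top-20-percent cutoffs could be computed. The function names and the input layout (a dict of player name to a pair of rates) are hypothetical, and the exact cutoff depends on which percentile method you pick; this uses the default of Python’s `statistics.quantiles`.

```python
import statistics


def top_20_percent_cutoff(values):
    """Return the 80th-percentile value (the last of the four quintile cuts)."""
    return statistics.quantiles(values, n=5)[-1]


def upper_right_quadrant(players):
    """Return names at or above both cutoffs.

    `players` maps a name to (adjusted_hits_per60, points_per60).
    """
    hit_cutoff = top_20_percent_cutoff([hits for hits, _ in players.values()])
    point_cutoff = top_20_percent_cutoff([pts for _, pts in players.values()])
    return [
        name
        for name, (hits, pts) in players.items()
        if hits >= hit_cutoff and pts >= point_cutoff
    ]
```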
You can see all of these numbers in the visualization. The first tab shows the scatterplot (with the ability to find a particular player and choose how you want the data to be color-coded), and the second tab shows the summary per team. (For simplicity’s sake, I only included players with at least 30 games played. And due to my time constraints, the underlying data aren’t as accurate as they could be for players who spent time on multiple teams: each of those players was assigned to the team for which they played the majority of their games.)
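The team assignment rule is simple enough to sketch in a few lines of Python. The input layout (a list of player, team, games tuples) and the names below are hypothetical, and I’m assuming the 30-game minimum applies to a player’s total games across all teams.

```python
from collections import defaultdict

MIN_GAMES = 30  # games-played minimum mentioned in the text


def assign_primary_team(stints):
    """Map each qualifying player to the team they played the most games for.

    `stints` is a list of (player, team, games_played) tuples, one per team
    a player appeared for during the season.
    """
    games_by_player = defaultdict(lambda: defaultdict(int))
    for player, team, games in stints:
        games_by_player[player][team] += games

    primary_team = {}
    for player, team_games in games_by_player.items():
        # Assumption: the minimum is checked against total games played.
        if sum(team_games.values()) < MIN_GAMES:
            continue
        # Ties go to whichever team appears first in the input.
        primary_team[player] = max(team_games, key=team_games.get)
    return primary_team
```

A more accurate version would split each player’s hits and ice time by team instead of crediting everything to one club.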