Be sure to join the Discord channel to talk hockey, and everything else, with me and fellow subscribers.
Today’s post was written by C.J. Turtoro. You can find him on Twitter @CJTDevil.
Disclaimer: some of the links in this piece lead to sites that will not show data unless you are a subscriber. I personally recommend subscribing to both Evolving-Hockey and Hockeyviz, but if you don’t, the piece will still make sense from the text and images alone.
If you spend time on hockey Twitter, you probably see folks like me share images like these to express the overall quality of a player. Here’s an example for Damon Severson.
In the public hockey analytics world, player evaluation centers on the impact a player has on the frequency and danger of shot attempts for and against in all situations. Commonly, this is broken down into even-strength play (EVO and EVD) and special-teams play (PPO and SHD).
Depending on the model, penalties and conversion rate (i.e., shooting percentage) play different roles as well, but it’s not too difficult to do a one-to-one comparison of most models on these four basic features. Still, for a few reasons, you may find yourself confused about how to compare apples to apples, or doubting whether both of these things are even fruits.
For today’s article I’ll be referencing some frequently cited viz from Micah McCurdy’s Hockeyviz and Luke and Josh Younggren’s Evolving-Hockey -- particularly the even-strength components. I’ve chosen these because you can check any player yourself (if you subscribe), and because, unlike some visual aggregations of various stats you may have seen over the past few years, they are intended to be somewhat MECE (mutually exclusive, collectively exhaustive) and were created by the actual people who built the models, so they are more likely to have utilitarian integrity.
In order to compare apples to apples here, we have to do some minor unit conversions. First, the Evolving-Hockey metric is a z-score (standard deviations above/below the mean), so we have to go to their website to find the actual raw impact. Even then, we still have work to do, because Hockeyviz portrays impact as a percent increase/decrease. To find Severson’s percent impact on offense, we add his raw impact (+0.185 expected goals per hour) to the league-average xG rate (2.29) and see by what percent it increased: (2.29 + 0.185)/2.29 ≈ 1.08, an 8% increase. If we do this for offense and defense across the Devils’ entire defense corps, we get the following results.
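If you want to sanity-check that conversion yourself, here’s a minimal sketch in Python (the function name is mine; the inputs are the Severson numbers quoted above):

```python
# Minimal sketch of the unit conversion described above; percent_impact is
# my name for it, and the inputs are the figures quoted in the text.

def percent_impact(raw_impact_per60: float, league_avg_per60: float) -> float:
    """Convert a raw per-60 xG impact to a percent change vs. league average."""
    return (league_avg_per60 + raw_impact_per60) / league_avg_per60 - 1.0

print(f"{percent_impact(0.185, 2.29):+.0%}")  # -> +8%
```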
Some of these are big differences -- Ryan Murray goes from a good defensive player in Hockeyviz to a bad one in Evolving-Hockey, for instance, and Kulikov looks significantly better offensively in Evolving-Hockey. We’ve fixed the units, so why don’t the models match? Aren’t they both just context-adjusted estimates of a player’s impact on shots this season? Well, yes, but the way these models arrive at that estimate is VERY different.
Evolving-Hockey uses RAPM, a method borrowed from NBA analytics whose methodology is explained here. It operates on the philosophy that the more evidence we have on a player, the more confident we can be that they are significantly different from league average (0 impact). Estimates far from zero receive a “penalty” that pushes them back toward zero, and that penalty has more pull on players who have played fewer minutes. The model also adjusts for score, venue, zone, rest, teammates, and opponents at each strength state.
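To make that mechanism concrete, here’s a heavily simplified toy of the RAPM idea in Python -- my illustration on simulated data, not Evolving-Hockey’s actual code (the shift counts, penalty strength, and variable names are all invented):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n_shifts, n_players = 5000, 40

# Each row is a shift; each column marks whether a player was on the ice.
X = rng.binomial(1, 0.25, size=(n_shifts, n_players)).astype(float)
talent = rng.normal(0.0, 0.3, size=n_players)         # hidden "true" impacts
y = X @ talent + rng.normal(0.0, 1.0, size=n_shifts)  # noisy per-shift xG rate
minutes = rng.uniform(0.5, 2.0, size=n_shifts)        # shift lengths as weights

# The ridge (L2) penalty is the "push back toward 0-impact" described above:
# players with little weighted evidence are shrunk hardest toward average.
rapm = Ridge(alpha=100.0).fit(X, y, sample_weight=minutes)
print(rapm.coef_[:5])  # per-player impact estimates, in the units of y
```

The real model’s design matrix also carries the score, venue, zone, rest, teammate, and opponent adjustments mentioned above; the point here is just the penalty’s shrinkage behavior.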
Hockeyviz uses a model McCurdy calls Magnus 3 -- a customized version of the same modeling mechanism as RAPM (writeup here) but with a few notable differences. The first and by far most important difference is that players are not penalized for their departure from 0, but for their departure from their own history. This is called using a “prior”: if a player was really good last year, you expect them to be better this year than a player who was not. If you’re interested in some of the other, more minor differences between the models, I’ll include a bulleted list at the end, but I want to examine this point specifically because it is such a big change -- especially early in the season, when the “prior” is doing the most work.
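Mechanically, “penalize departure from a prior instead of from zero” is a small tweak to the ridge sketch above. Here’s what that looks like (again my illustration, not McCurdy’s implementation), using the standard trick that shrinking toward a prior vector is equivalent to fitting an ordinary ridge on the residuals left after subtracting out the prior’s predictions:

```python
import numpy as np
from sklearn.linear_model import Ridge

def prior_centered_ridge(X, y, beta_prior, alpha, sample_weight=None):
    """Solve min ||y - X @ b||^2 + alpha * ||b - beta_prior||^2.

    Substituting g = b - beta_prior turns this into a standard ridge fit
    on the residual, so estimates are shrunk toward each player's own
    history (beta_prior) instead of toward league average (zero).
    """
    fit = Ridge(alpha=alpha, fit_intercept=False)
    fit.fit(X, y - X @ beta_prior, sample_weight=sample_weight)
    return beta_prior + fit.coef_
```

For a first-year player the corresponding beta_prior entry is essentially zero, so the two approaches coincide -- which is exactly the pattern we’re about to see with Ty Smith.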
If you look at the Devils’ defenders listed above, you can see evidence of the two approaches at work. Ty Smith is a first-year player, so his prior is doing the least work, which is why his estimates are nearly identical: both models see a good-offense, bad-defense player. But his partner for most of the year, Kulikov, had been bad offensively over the past few seasons, so Magnus figures most of that pair’s offense is likely due to Smith. RAPM doesn’t know about Kulikov’s offensively inept past and so gives him credit for his results this year. We can see an even more salient depiction of this in Ryan Murray.
The RAPM model doesn’t know that Murray was really good defensively last year, so it sees someone with the second-highest xGA rate on the team and decides he must have played poorly. Magnus, however, thinks it’s unlikely he’d be bad this year given how good he’d been before, so it blames Murray’s main partner, Subban (who it knows has struggled defensively the last two seasons), for those results more than RAPM does.
To finish off, I’d like to clarify that I am not advocating for one model or the other, but merely explaining a key difference between them. You might prefer RAPM, Magnus, or a different one for each player depending on the “story” you think fits best.
The Devils have a new GM, new coach, new system, and several new players, so it’s fair to wonder how much credence you should give the prior at the moment. By both on-ice results and the eye test, Miles Wood has been one of the Devils’ best players and a natural fit for Ruff’s system -- RAPM agrees.
Magnus, however, thinks that’s a very unlikely description of what has happened, since Wood was so bad the last few years under Hynes and players don’t often “flip the switch” that fast.
Is one of those stories more “right” than the other? I’ll leave that up to you guys to decide. Merry modeling everyone!
I mentioned earlier that I’d give more of the differences at the end of the piece for anyone who is interested, so here are some (the list is not exhaustive):
Magnus includes all the RAPM variables plus a coaching variable (which also involves priors). Coaching impacts are available here.
RAPM directly predicts aggregate response variables like xGs, shot attempts, and goals, whereas Magnus creates a map of shot impacts at all locations and derives xG from that impact map.
Magnus uses a “weird” penalty to account for the fact that the NHL is an “apex” league: players way better than average may be common, but players way worse than average probably didn’t stay in the league. The distribution of “talent” is symmetrical in Evolving-Hockey’s RAPM but skewed-right in Magnus, so the model treats an extremely bad estimate as less plausible than an equally extreme good one (see the sketch below).
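As a rough illustration of that last point (toy numbers and distributions of my choosing, not Magnus’s actual penalty), you can compare the cost a symmetric penalty and a right-skewed penalty assign to equally extreme good and bad impact estimates:

```python
from scipy.stats import norm, skewnorm

# Toy penalties: the negative log-density of the assumed talent distribution.
# The symmetric (RAPM-style) penalty charges +0.5 and -0.5 equally; the
# right-skewed (Magnus-style) penalty charges the very-bad estimate far more.
for impact in (+0.5, -0.5):
    symmetric = -norm.logpdf(impact, loc=0.0, scale=0.2)
    skewed = -skewnorm.logpdf(impact, a=4.0, loc=-0.1, scale=0.25)
    print(f"impact {impact:+.1f}: symmetric {symmetric:.1f}, skewed {skewed:.1f}")
```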
If you enjoyed this post from C.J. Turtoro, why not share and help Infernal Access grow?
Nice write-up, CJ -- interesting things to consider. I thought some of these models were supposed to take into account who the defensive partner is and who else is on the ice with them, or is that WAR/GAR?
Anecdotally, while waiting for the real NHL season to start, I was playing a season with the Devils in EA Sports NHL 21 on the PlayStation. Wood was lighting it up for me, and I thought to myself, wouldn’t it be nice if he did this in real life? And now... he is!
I imagine that all of the data EA Sports collects, and the models they use to fine-tune the players and their individual performance, must be worth something. Has anyone ever looked into the ratings players get in the video game and how accurate their model is at predicting actual performance in real life?
This article is very informative... SOOOOOO over my head, but a helpful glimpse into a lesser known hockey world.