Quantifying the NBA’s Most Impactful Rebounders Using Tracking Data


This article is the second in a series on rebounding in the NBA.


Introduction

Who are the best rebounders in the NBA today? In Drummond, the Joker, and the Worm, I discussed the importance of collecting marginal rebounds and how rebounding numbers, even advanced figures that adjust for pace and shot volume (rebounding %, for instance), can be misleading. In simplest terms, there are two broad aspects to basketball: the efficiency game (averaging more points per shot than the opponent) and the possession game (attempting more shots than the opponent). As a key part of the possession game, the goal of a rebounder is to maximize their team’s possession of the ball by collecting rebounds that the opponent would have otherwise grabbed (which is why offensive rebounds are considered more valuable than defensive rebounds). A rebound that would have otherwise been collected by a teammate is far less valuable. In other words, the most effective rebounders are those that help their teams collect as many rebounds as possible, not necessarily just those with the most individual boards. Therefore, when comparing Rodman’s rebounding ability to current players, I leaned heavily on On/Off rebounding rate, which compares team rebounding rate when a given player is on the court vs on the bench. However, this metric is blind to the ability of a player’s teammates and replacements, making it extremely noisy and only suitable for use when evaluating a career, not a single season.
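For readers who want to see the On/Off idea in concrete terms, here is a minimal sketch. The function names and the rebound totals are invented for illustration and are not taken from any real player or from the data behind this article.

```python
def rebound_rate(team_rebounds: int, opponent_rebounds: int) -> float:
    """Share of all available rebounds collected by the team, as a percentage."""
    total = team_rebounds + opponent_rebounds
    if total == 0:
        raise ValueError("no rebound chances recorded")
    return 100.0 * team_rebounds / total


def on_off_rebound_rate(on_team: int, on_opp: int, off_team: int, off_opp: int) -> float:
    """Team rebounding rate with the player on the court minus the rate with him on the bench."""
    return rebound_rate(on_team, on_opp) - rebound_rate(off_team, off_opp)


# Hypothetical season totals, made up purely for this example.
diff = on_off_rebound_rate(on_team=1040, on_opp=960, off_team=490, off_opp=510)
print(f"On/Off rebounding rate: {diff:+.1f} percentage points")
```

Note that nothing in this calculation knows who else was on the floor, which is exactly the weakness described above.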

Thankfully, there is a solution that can reduce the variability of the On/Off method. Just as adjusted plus-minus adjusts a player’s raw plus-minus for the quality of their teammates and opponents, the same methodology can be used to calibrate a player’s rebounding impact for the rebounding abilities of their teammates. An implementation of this idea is ridge regressed rebounding rate, which can be found on nbashotcharts.com. However, despite these adjustments, ridge regressed statistics remain relatively noisy in a single season. Consider the weak year-over-year correlation of ridge regressed offensive rebounding rate, which has an R² of 0.16 among players with a minimum of 500 minutes played.
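As a rough illustration of the adjusted plus-minus style approach, the sketch below fits a ridge regression to a handful of lineup stints in plain Python. Everything in it is an assumption made for the example: the stint data, the penalty value `lam`, and the function names are invented, and this is not the nbashotcharts.com implementation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical stints: each row flags which of players A, B, C were on the court;
# y is the team's rebounding rate in that stint minus league average (invented values).
players = ["A", "B", "C"]
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 1]]
y = [3.0, 1.0, -2.0, 4.0, -1.0]

coefs = ridge(X, y, lam=1.0)
for name, value in zip(players, coefs):
    print(f"{name}: {value:+.2f}")
```

The penalty pulls every coefficient toward zero, which is what keeps players with few stints from receiving extreme estimates. A real implementation would also include opponents and weight stints by the number of rebound chances.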

The YoY stability of ridge regressed defensive rebounding is even worse, with an R² of 0.1 (minimum 500 minutes played).
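For anyone who wants to run this kind of stability check on their own data, here is a small sketch. It assumes the R² comes from a simple linear fit of one season on the previous one, in which case it equals the squared Pearson correlation. The player values, the function names, and the data layout are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy * sxy / (sxx * syy)


def yoy_r_squared(prev_season, next_season, min_minutes=500):
    """R² across players who met the minutes minimum in both seasons.

    Each season maps player name -> (minutes played, metric value).
    """
    shared = [p for p in prev_season
              if p in next_season
              and prev_season[p][0] >= min_minutes
              and next_season[p][0] >= min_minutes]
    return r_squared([prev_season[p][1] for p in shared],
                     [next_season[p][1] for p in shared])


# Invented values for illustration only. Player D misses the minutes cut in the first season.
season_1 = {"A": (1800, 2.0), "B": (900, 0.5), "C": (2100, -1.0), "D": (300, 5.0)}
season_2 = {"A": (1700, 1.5), "B": (1000, 0.0), "C": (2000, -1.5), "D": (1200, -4.0)}
r2 = yoy_r_squared(season_1, season_2)
print(f"YoY R²: {r2:.2f}")
```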

While ridge regressed rebounding does pick up some signal over a single season, there remains too much noise to be of significant use. Utilizing a multi-season sample is possible, but doing so makes a statistic far less nimble in evaluating changes to player performance from season to season (or from one part of the season to another). Therefore it is preferable to utilize another approach, used in a litany of statistical plus-minus techniques, which is to model adjusted rebounding rate using performance statistics, in this case NBA tracking data. This estimate can potentially be far more stable than ridge regressed rebounding rate, particularly in small samples. It also has the benefit of providing insight into which aspects of rebounding are most important, including those that don’t show up in the box score, such as boxing out. This statistic is called “Estimated Rebounding Impact” (ERI).


Estimated Rebounding Impact: How It’s Calculated

(This section is somewhat technical in nature. Feel free to skim it or skip to the Results section if that’s what you’re interested in)

Estimated Rebounding Impact was modeled on a dataset of ~180 players with >= 4000 minutes from 2020-2023 (the exact number varied slightly, as different approaches were used for offensive and defensive rebounding), with 3-year ridge regressed rebounding acting as the dependent variable. It appears that NBA.com/Second Spectrum dramatically changed their methodology for measuring box outs after the ’18-19 season, making it impossible to use a preferred 5-year sample. Advanced rebounding and box out data for all 3 seasons were acquired from the NBA stats API using the hoopR package. Player data for all 3 seasons were grouped and scaled on a per-36 minute basis, as per possession data was not available.
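The per-36 scaling step can be sketched as follows. This assumes that "grouped" means summing a player's totals and minutes across the three seasons before scaling, which is my reading rather than something spelled out above. The field names and numbers are hypothetical.

```python
def per_36(total: float, minutes: float) -> float:
    """Scale a counting stat to a per-36-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return 36.0 * total / minutes


def pooled_per_36(seasons, stat):
    """Sum a stat and minutes across seasons, then scale, so each season is weighted by playing time."""
    total = sum(season[stat] for season in seasons)
    minutes = sum(season["minutes"] for season in seasons)
    return per_36(total, minutes)


# Invented three-season line for one player.
seasons = [
    {"minutes": 1500, "contested_oreb": 120},
    {"minutes": 2000, "contested_oreb": 180},
    {"minutes": 1000, "contested_oreb": 60},
]
rate = pooled_per_36(seasons, "contested_oreb")
print(f"Contested O-Reb/36: {rate:.2f}")
```

Pooling totals first avoids giving a short, unusual season the same weight as a full one, which averaging three separate per-36 rates would do.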

For offensive rebounding, the model type that best balanced performance and interpretability was simple multiple linear regression. The initial regression model’s independent variables were contested offensive rebounds per 36 minutes, uncontested offensive rebounds per 36, offensive boxouts per 36, average offensive rebound distance, adjusted offensive rebound chance percentage (the share of offensive rebounds a player had a chance to possess that they actually collected), as well as the player’s position. Of these variables, only contested and uncontested rebounds per 36, as well as the player’s position, were of significant predictive value. In the final model, position was replaced by player height, which performed almost equally well and is less variable from source to source. The offensive rebounding model demonstrated excellent performance in fitting 3-year ridge regressed rebounding rate, in both in-sample and out-of-sample testing.

Estimated Offensive Rebounding Impact Model

The resulting equation used to calculate Offensive Estimated Rebounding Impact is: oERI = (0.89 * contested O-Reb/36) + (2.18 * uncontested O-Reb/36) - (0.13 * Height) + 7.74. Interestingly, the coefficient for uncontested rebounding is significantly higher than for contested boards. This is both because uncontested offensive rebounds are slightly rarer (~42% of all offensive rebounds) and because it is quite difficult to grab many of them. The sample leader in contested O-Reb/36 was Mitchell Robinson at 4.5, while Dwight Powell led the pack by grabbing 1.5 uncontested O-Reb/36. One more thing of note is that the coefficient of player height is negative, meaning that taller players are penalized relative to short ones. I tend to think of this as a measure of responsibility, with taller rebounding slackers, typically centers and power forwards, having a greater negative impact on their teams’ rebounding rate, relative to smaller guards.
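The oERI equation translates directly into a few lines of code. One caveat: the unit for height is not stated above, so the sketch assumes inches, which is a guess based on the size of the coefficient and intercept. The example player is hypothetical.

```python
def offensive_eri(contested_oreb_per36: float,
                  uncontested_oreb_per36: float,
                  height: float) -> float:
    """oERI from the linear model's published coefficients.

    `height` is ASSUMED to be in inches; the article does not state the unit.
    """
    return (0.89 * contested_oreb_per36
            + 2.18 * uncontested_oreb_per36
            - 0.13 * height
            + 7.74)


# Hypothetical big man: 3.0 contested and 1.0 uncontested O-Reb/36, 82 inches tall.
example = offensive_eri(3.0, 1.0, 82)
print(f"oERI: {example:+.2f}")

# Same production from a player one unit of height shorter is worth 0.13 more.
shorter = offensive_eri(3.0, 1.0, 81)
print(f"Height effect: {shorter - example:+.2f}")
```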

Modeling defensive ERI was much more of a challenge than offensive rebounding impact for a number of reasons, most of which are connected to the relative ease of collecting a defensive rebound vs an offensive one (76% of rebounds were grabbed by the defense in ’22-23). First of all, the range of ridge regressed rebounding rate values is significantly smaller for D-Reb (~5.6) vs O-Reb (~7.8). In other words, it is far more difficult to stand out as an impactful defensive rebounder than an offensive rebounder. Secondly, non-rebounding actions, like boxing out, are of greater importance on the defensive end than while offensive rebounding. It is far more effective to prevent an offensive player from rebounding the ball and allow a teammate to grab it than the reverse, given that there are typically more defenders near the basket. Last of all, defensive roles have a massive impact on rebounding duties, regardless of height or position.

To mitigate these issues, I used 2 additional techniques that improved model performance, though at the cost of increased complexity. The first was to approximate defensive role using k-means clustering. Players in the sample were grouped into one of 8 clusters, based on their performance in 6 defensive statistics: steals, blocks, defensive rebounds, defensive boxouts, contested 3-pointers, and contested 2-pointers. Here is how each cluster’s players performed in these metrics:

Defensive Clusters
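The clustering step can be sketched with a plain Python version of k-means (Lloyd's algorithm) on standardized per-36 stats. This is an illustration only: the six toy players and their numbers are invented, the example uses 2 clusters rather than 8 to stay readable, and the deterministic farthest-point initialization is my choice for reproducibility (libraries more commonly use k-means++ with random restarts).

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single stat dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(rows, k):
    """Start from the first row, then repeatedly add the row farthest from all chosen centroids."""
    centroids = [rows[0][:]]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(nxt[:])
    return centroids


def kmeans(rows, k, max_iter=100):
    """Assign each row to its nearest centroid, recompute centroids, repeat until stable."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical per-36 values: steals, blocks, D-Reb, D-boxouts, contested 3s, contested 2s.
stats = [
    [0.6, 1.8, 9.5, 3.0, 2.0, 9.0],   # interior big
    [0.7, 2.0, 10.0, 3.4, 2.2, 9.5],  # interior big
    [0.5, 1.6, 9.0, 2.8, 1.8, 8.5],   # interior big
    [1.6, 0.2, 3.5, 0.3, 4.5, 3.0],   # perimeter guard
    [1.8, 0.3, 4.0, 0.4, 4.8, 3.4],   # perimeter guard
    [1.5, 0.2, 3.2, 0.2, 4.2, 2.8],   # perimeter guard
]
labels = kmeans(standardize(stats), k=2)
print(labels)
```

The cluster labels themselves are arbitrary. What matters for the regression is which players end up grouped together.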

The players with the highest ridge regressed defensive rebounding rates were typically part of clusters 6 and 7, while the lowest RR D-Reb rates were typically from players in clusters 8 and 1. While not all the clusters made a statistically significant difference in multiple regression, overall model performance was improved with the addition of the player clusters, beyond rebounding and box out statistics. The final defensive ERI linear model was as follows:

Estimated Defensive Rebounding Impact Model

In contrast to oERI, defensive rebounding impact does not seem to be affected by uncontested rebounds. Also, boxing out is critical to impacting the defensive glass, unlike the offensive boards.

Because the interactions between role and defensive rebounding statistics seem to be relatively complex, I decided to build a second dERI model, using XGBoost, a popular gradient-boosted decision tree algorithm. The statistical inputs were identical to the previous models, with height and defensive clusters also included. After optimizing the model’s parameters, the root mean squared error (RMSE) on the test data (a randomly selected 20% of the total sample) was 0.81, compared with a standard deviation of 0.97. The model’s importance matrix shows which variables were most critical in estimating ridge regressed D-Reb rate:
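To put the RMSE figure in context, it helps to remember that a model which predicts the average for every player has an RMSE equal to the standard deviation of the target. The sketch below shows that relationship along with a simple 80/20 holdout split. The helper names and toy values are invented, and the "implied share of variance" is only a rough reading that assumes the 0.97 standard deviation was measured on the same test players.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def train_test_split(rows, test_share=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a share of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]


# Baseline check on made-up targets: always predicting the mean gives RMSE == SD.
actual = [1.0, 2.0, 3.0, 4.0]
baseline = rmse(actual, [mean(actual)] * len(actual))
print(f"baseline RMSE {baseline:.3f} vs SD {pstdev(actual):.3f}")

# Figures reported above for the XGBoost dERI model.
reported_rmse, reported_sd = 0.81, 0.97
implied_share = 1 - (reported_rmse / reported_sd) ** 2
print(f"implied share of variance explained: {implied_share:.0%}")

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```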

dERI XGBoost Model Importance Matrix

As with the linear defensive Estimated Rebounding Impact model, boxouts and contested rebounds were the most important variables. While the importance matrix shows the value of each variable in constructing the decision tree, it doesn’t show the directionality of each variable’s impact. Overall, boxouts/36 and contested rebounds/36 had a strong positive relationship with dERI, uncontested rebounds/36 a moderate positive relationship, adjusted D-Reb Chance% a small positive one, and average D-Reb distance a moderate negative one (meaning that more impactful rebounders typically grab rebounds closer to the basket).

Overall, both the linear and XGBoost models perform relatively well in sample and out of sample (though worse than the offensive rebounding model), with the XGBoost model doing slightly better. However, as the difference in performance isn’t large and their prediction “styles” vary somewhat (the XGBoost model tends to be a bit more conservative), both are averaged to create dERI.

To summarize, a single linear model utilizing contested O-Reb/36, uncontested O-Reb/36 and height is used for offensive Estimated Rebounding Impact, while the average of 2 models, a linear and XGBoost model, both using defensive role clusters, is used for dERI.
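The final blending step is just an equal-weight average of the two defensive models. A tiny sketch, with invented prediction values:

```python
def defensive_eri(linear_prediction: float, xgboost_prediction: float) -> float:
    """dERI as the simple average of the linear and XGBoost model outputs."""
    return (linear_prediction + xgboost_prediction) / 2.0


# Hypothetical outputs from the two models for one player.
blended = defensive_eri(linear_prediction=1.2, xgboost_prediction=0.8)
print(f"dERI: {blended:+.2f}")
```

Averaging is a reasonable default when two models perform similarly but err in different ways, since their errors partially offset.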


Estimated Rebounding Impact: Results

The primary objective of ERI was to create a rebounding value metric that is stable in just a single season, unlike ridge regressed rebounding rate. How does ERI fare in accomplishing this goal? Pretty darn well. First let’s take a look at offensive ERI. Using the identical benchmark of ’21-22 vs ’22-23 year-over-year correlation (minimum 500 minutes played), oERI boasts a coefficient of determination (R²) of 0.77, compared with 0.16 for single-year ridge regression.

Not only is oERI stable from year to year, so are both versions of dERI. While ridge regressed defensive rebounding rate has a YoY R² of just 0.1, the linear and XGBoost dERI models boast R² values of 0.6 and 0.64, respectively.

Perhaps even more impressively, oERI and dERI remain very stable YoY even for players who play sparingly. For players with between 150 and 1000 minutes played in both seasons (a 52-player sample), oERI retained a robust 0.58 R², while dERI (XGBoost) had a 0.5 R² and dERI (linear model) an R² of 0.4. Here’s a look at the 2022-2023 leaderboard (minimum 1000 minutes played):

The most valuable rebounder in the ’22-23 season per ERI was Steven Adams, a player known for his tremendous strength and toughness. Adams grabbed a remarkable 5.1 offensive rebounds per game this past season, the best mark in the NBA by over 1/2 a rebound (though he technically did not qualify for leaderboards, as he missed 40 games). Adams is also first in 3-year ridge regressed rebound rate. Indeed, most of the ’22-23 ERI leaders with significant minutes the past 3 years are present near the top of the 3-year leaderboard. This is good anecdotal support that ERI is capturing a player’s rebounding impact, just with a much smaller necessary performance sample.

Another thing to take note of is the magnitude of the effect that top rebounders can have. A team with Steven Adams on the floor is likely to grab an additional 5.8 offensive rebounds per 100 chances and another 0.7 defensive boards. Furthermore, looking at 3-year ridge regressed rebound rate (which ERI is attempting to map onto), Adams’s rebounding value is near 7% and 20 players are expected to add over 4% of total rebounding rate. These figures seem to be a bit larger than those I found when analyzing Rodman and other top rebounders. The reason for this is simple. When Rodman, Adams, or another top rebounder is removed from the game, another good rebounder typically replaces them. As a result, an elite rebounder’s On/Off rebounding rate slightly undersells their value relative to being replaced by an average player. In contrast, ridge regression, and by extension ERI, adjusts for this fact by design and therefore credits a top rebounder for their full potential On/Off value vs an average NBA player. For this reason, a top rebounder’s On/Off results will typically trail their ERI by 1-1.5%.

Lastly, it is interesting to note that dERI is impressed by Andre Drummond’s defensive rebounding (at least in recent seasons), despite his tiny On/Off D-Reb splits throughout his career. As I described, defensive ERI attempts to make adjustments for role and boxouts, but Drummond still manages to fare quite well. The Drummond defensive rebounding mystery lives on.


Estimated Rebounding Impact leaderboards for recent seasons can be found on the metrics page. Much of the code and data used to create ERI can be found on GitHub. The next and likely final article in the rebounding series will be on offensive rebounding value and strategy.
