What Statistics Can Tell Us About NBA Coaches
Who gets hired as an NBA coach? How long does a typical coach last? And does their coaching background play any part in predicting success?
This analysis was inspired by two theories. First, there has been a common criticism among casual NBA fans that teams overly prefer hiring candidates with previous NBA head coaching experience.
Consequently, this analysis aims to answer two related questions. First, is it true that NBA teams frequently re-hire candidates with previous head coaching experience? And second, is there any evidence that these candidates under-perform relative to other candidates?
The second theory is that internal candidates are often more successful than external candidates. This theory was derived from a pair of anecdotes: two of the most successful coaches in NBA history, Gregg Popovich of San Antonio and Erik Spoelstra of Miami, were both internal hires. However, rigorous quantitative evidence is needed to test if this relationship holds over a larger sample.
This analysis aims to explore these questions, and provide the code to reproduce the analysis in Python.
The Data
The code and dataset for this project are available on GitHub. The analysis was performed using Python in Google Colaboratory.
A prerequisite to this analysis was determining a way to measure coaching success quantitatively. I decided on a simple idea: the success of a coach would be best measured by the length of their tenure in that job. Tenure best represents the differing expectations that might be placed on a coach. A coach hired to a contending team would be expected to win games and generate deep playoff runs. A coach hired to a rebuilding team might be judged on the development of younger players and their ability to build a strong culture. If a coach meets expectations, the team will keep them around.
Since there was no existing dataset with all of the required data, I collected the data myself from Wikipedia. I recorded every off-season coaching change from 1990 through 2021. Because the primary outcome variable is tenure, in-season coaching changes were excluded: these coaches often carried an “interim” tag, meaning they were intended to be temporary until a permanent replacement could be found.
In addition, the following variables were collected:
Variable: Definition
Team: The NBA team the coach was hired for
Year: The year the coach was hired
Coach: The name of the coach
Internal?: An indicator if the coach was internal or not, meaning they worked for the organization in some capacity immediately prior to being hired as head coach
Type: The background of the coach. Categories are Previous HC, Previous AC, College, Player, Management, and Foreign.
Years: The number of years a coach was employed in the role. For coaches fired mid-season, the value was counted as 0.5.
First, the dataset is imported from its location in Google Drive. I also convert ‘Internal?’ into a dummy variable, replacing “Yes” with 1 and “No” with 0.
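from google.colab import drive
drive.mount(...)
import pandas as pd
pd.set_option(...)
#Bring in the dataset
coach = pd.read_csv(...).iloc[...]
#Convert 'Internal?' to a dummy variable (column names assumed from the variable table)
coach['Internal'] = coach['Internal?'].map(dict(Yes=1, No=0))
coach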
This prints a preview of what the dataset looks like:
In total, the dataset contains 221 coaching hires over this time.
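The Yes/No conversion described above can be tried on a few made-up rows. This is only a minimal sketch: the coaches and values are hypothetical, and the column names are assumed from the variable table rather than taken from the real file.

```python
import pandas as pd

# Hypothetical rows; the column names follow the variable table above.
toy = pd.DataFrame({
    "Coach": ["Coach A", "Coach B", "Coach C"],
    "Internal?": ["Yes", "No", "No"],
    "Years": [0.5, 3.0, 6.0],
})

# Map the Yes/No indicator onto a 1/0 dummy variable.
toy["Internal"] = toy["Internal?"].map({"Yes": 1, "No": 0})

# Any value other than Yes/No would become NaN, so check for that explicitly.
assert toy["Internal"].notna().all()

print(toy[["Coach", "Internal", "Years"]])
```

Checking for missing values after the mapping is a cheap way to catch typos such as "yes" or trailing spaces in hand-collected data.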
Descriptive Statistics
First, basic summary statistics are calculated and visualized to determine the backgrounds of NBA head coaches.
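#Create chart of coaching background
import matplotlib.pyplot as plt
#Count number of coaches per category (column name assumed from the variable table)
counts = coach['Type'].value_counts()
#Create chart
plt.bar(...)
plt.title(...)
plt.figtext(...)
plt.xticks(...)
plt.ylabel(...)
plt.gca().spines[...].set_visible(...)
plt.gca().spines[...].set_visible(...)
#Label each bar with its share of hires and its count
for i, value in enumerate(counts):
    plt.text(i, ..., str(round((value/len(coach))*100,1)) + '%' + ' (' + str(value) + ')', ha='center', fontsize=9)
plt.savefig(...)
print(str(round(((coach['Internal'] == 1).sum()/len(coach))*100,1)) + " percent of coaches are internal.")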
Over half of coaching hires previously served as an NBA head coach, and nearly 90% had NBA coaching experience of some kind. This answers the first question posed—NBA teams show a strong preference for experienced head coaches. If you get hired once as an NBA coach, your odds of being hired again are much higher. Additionally, 13.6% of hires are internal, confirming that teams do not frequently hire from their own ranks.
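The shares quoted above are simple proportions of hires per background. A minimal sketch of that calculation, using four hypothetical hires rather than the real dataset, might look like this:

```python
import pandas as pd

# Hypothetical hires; the category labels mirror the Type column described earlier.
types = pd.Series(
    ["Previous HC", "Previous HC", "Previous AC", "College"], name="Type"
)

counts = types.value_counts()                  # hires per background
shares = (counts / len(types) * 100).round(1)  # percent of all hires

for label in counts.index:
    print(f"{label}: {shares[label]}% ({counts[label]})")
```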
Second, I will explore the typical tenure of an NBA head coach. This can be visualized using a histogram.
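#Create histogram
plt.hist(...)
plt.title(...)
plt.figtext(...)
plt.annotate(..., xy=..., xytext=...,
             arrowprops=dict(...), fontsize=9, color='black')
plt.gca().spines[...].set_visible(...)
plt.gca().spines[...].set_visible(...)
plt.savefig(...)
plt.show()
coach.sort_values(...)
#Calculate some stats with the data (column name assumed from the variable table)
import numpy as np
print(str(np.median(coach['Years'])) + " years is the median coaching tenure length.")
print(str(round(((coach['Years'] <= 5).sum()/len(coach))*100,1)) + " percent of coaches last five years or less.")
print(str(round((coach['Years'] <= 1).sum()/len(coach)*100,1)) + " percent of coaches last a year or less.")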
Using tenure as an indicator of success, the data clearly shows that the large majority of coaches are unsuccessful. The median tenure is just 2.5 seasons. 18.1% of coaches last a single season or less, and barely 10% of coaches last more than 5 seasons.
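The three tenure statistics above (the median, the share lasting five years or less, and the share lasting a year or less) can be sketched on a small hypothetical sample. The tenures below are invented for illustration and do not come from the dataset.

```python
import numpy as np

# Hypothetical tenures in years (0.5 stands for a coach fired mid-season).
years = np.array([0.5, 1.0, 2.0, 2.5, 3.0, 4.0, 6.0, 10.0])

median_tenure = np.median(years)
# The mean of a boolean mask is the fraction of True values.
share_five_or_less = (years <= 5).mean() * 100
share_one_or_less = (years <= 1).mean() * 100

print(f"{median_tenure} years is the median tenure.")
print(f"{share_five_or_less:.1f} percent last five years or less.")
print(f"{share_one_or_less:.1f} percent last a year or less.")
```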
This can also be viewed as a survival analysis plot to see the drop-off at various points in time:
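#Survival analysis
import matplotlib.ticker as mtick
lst = np.arange(...)
surv = pd.DataFrame(...)
surv[...] = np.nan
#Fill in the share of coaches still in the job at each point in time
for i in range(len(...)):
    surv.iloc[...] = (...).sum()/len(...)
plt.step(...)
plt.title(...)
plt.xlabel(...)
plt.figtext(...)
plt.gca().yaxis.set_major_formatter(...)
plt.gca().spines[...].set_visible(...)
plt.gca().spines[...].set_visible(...)
plt.savefig(...)
plt.show()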
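A survival curve of this kind is, at each point in time, the share of coaches whose tenure exceeds that point. The sketch below computes it for a handful of hypothetical tenures. Whether the original used a strict "greater than" comparison and a one-year grid is an assumption here.

```python
import numpy as np

# Hypothetical tenures in years.
years = np.array([0.5, 1.0, 2.0, 3.0, 3.0, 5.0, 8.0])

# Evaluate the curve at whole-year marks from 0 through 8.
grid = np.arange(0, 9)

# Share of coaches whose tenure is strictly longer than each mark.
surv = np.array([(years > t).mean() for t in grid])

for t, s in zip(grid, surv):
    print(f"after {t} years: {s:.0%} still in the job")
```

This simple version treats every tenure as completed. Coaches who were still employed when the data was collected would need separate handling, which is not shown here.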
Lastly, a box plot can be generated to see if there are any obvious differences in tenure based on coaching type. Boxplots also display outliers for each group.
#Create a boxplot
import seaborn as sns
sns.boxplot(...)
plt.title(...)
plt.gca().spines[...].set_visible(...)
plt.gca().spines[...].set_visible(...)
plt.xlabel(...)
plt.xticks(...)
plt.figtext(...)
plt.savefig(...)
plt.show()
There are some differences between the groups. Aside from management hires, previous head coaches have the longest average tenure at 3.3 years. However, since many of the groups have small sample sizes, we need to use more advanced techniques to test if the differences are statistically significant.
Statistical Analysis
First, to test if either Type or Internal has a statistically significant difference among the group means, we can use ANOVA:
#ANOVA
import statsmodels.api as sm
from statsmodels.formula.api import ols
#Formula terms assumed from the text: tenure explained by Type and Internal
am = ols('Years ~ C(Type) + C(Internal)', data=coach).fit()
anova_table = sm.stats.anova_lm(...)
print(anova_table)
The results show high p-values and low F-stats, indicating no evidence of a statistically significant difference in means. Thus, the initial conclusion is that there is no evidence NBA teams are under-valuing internal candidates or over-valuing previous head coaching experience as initially hypothesized.
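For intuition about what an F-statistic measures, the sketch below computes it by hand for a single factor: it compares the variation between group means with the variation within groups. The tenures are hypothetical, and this is a simplified one-factor version rather than the two-factor model fitted above. Turning the statistic into a p-value requires the F distribution, which is not shown.

```python
import numpy as np

# Hypothetical tenures (years) for three coaching backgrounds.
groups = {
    "Previous HC": np.array([1.0, 2.0, 3.0]),
    "Previous AC": np.array([2.0, 3.0, 4.0]),
    "College": np.array([3.0, 4.0, 5.0]),
}

all_years = np.concatenate(list(groups.values()))
grand_mean = all_years.mean()

# Variation of group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Variation of individual tenures around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_between = len(groups) - 1
df_within = len(all_years) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A low F-statistic means the group means differ by little relative to the spread inside each group, which is the pattern the article reports.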
However, there is a possible distortion when comparing group averages. NBA coaches are signed to contracts that typically run between three and five years. Teams typically have to pay out the remainder of the contract even if coaches are dismissed early for poor performance. A coach that lasts two years may be no worse than one that lasts three or four years—the difference could simply be attributable to the length and terms of the initial contract, which is in turn impacted by the desirability of the coach in the job market. Since coaches with prior experience are highly coveted, they may use that leverage to negotiate longer contracts and/or higher salaries, both of which could deter teams from terminating their employment too early.
To account for this possibility, the outcome can be treated as binary rather than continuous. If a coach lasted more than 5 seasons, it is highly likely they completed at least their initial contract term and the team chose to extend or re-sign them. These coaches will be treated as successes, with those having a tenure of five years or less categorized as unsuccessful. To run this analysis, all coaching hires from 2020 and 2021 must be excluded, since they have not yet been able to eclipse 5 seasons.
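The binary outcome described above can be sketched as follows. The rows are hypothetical, and the `Success` column name is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hires: hiring year and tenure in years.
toy = pd.DataFrame({
    "Year": [2005, 2015, 2019, 2020, 2021],
    "Years": [7.0, 2.5, 5.0, 2.0, 1.0],
})

# Exclude 2020 and 2021 hires, who have not yet had the chance to pass 5 seasons.
eligible = toy[toy["Year"] < 2020].copy()

# Success means strictly more than 5 seasons; exactly 5 counts as unsuccessful.
eligible["Success"] = np.where(eligible["Years"] > 5, 1, 0)

print(eligible)
```

Taking a copy before adding the new column avoids modifying a view of the original frame.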
With a binary dependent variable, a logistic regression can be used to test if any of the variables predict coaching success. Internal and Type are both converted to dummy variables. Since previous head coaches represent the most common coaching hires, I set this as the “reference” category against which the others will be measured. Additionally, the dataset contains just one foreign-hired coach (David Blatt), so this observation is dropped from the analysis.
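#Logistic regression
#Column names below are assumed from the variable table
coach3 = coach[coach['Year'] < 2020]
coach3.loc[...] = np.where(...)
coach_type_dummies = pd.get_dummies(...).astype(...)
coach_type_dummies.drop(...)
coach3 = pd.concat(...)
#Drop foreign category / David Blatt since n = 1
coach3 = coach3.drop(...)
coach3 = coach3.loc[coach3['Coach'] != "David Blatt"]
print(...)
x = coach3[[...]]
x = sm.add_constant(x)
y = coach3[...]
logm = sm.Logit(y, x)
logm.r = logm.fit(...)
print(logm.r.summary())
#Convert coefficients to odds ratio (exponentiate each coefficient)
print(str(np.exp(...)) + " is the odds ratio for internal.") #Internal coefficient
print(np.exp(...)) #Management
print(np.exp(...)) #Player
print(np.exp(...)) #Previous AC
print(np.exp(...)) #College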
Consistent with ANOVA results, none of the variables are statistically significant under any conventional threshold. However, closer examination of the coefficients tells an interesting story.
The beta coefficients represent the change in the log-odds of the outcome. Since this is unintuitive to interpret, the coefficients can be converted to odds ratios by exponentiating them: odds ratio = exp(beta).
Internal has an odds ratio of 0.23, indicating that the odds of success for internal candidates are 77% lower than for external candidates. Management has an odds ratio of 2.725, indicating that the odds of success for these candidates are 172.5% higher. The odds ratio is effectively zero for players, 0.696 for previous assistant coaches, and 0.5 for college coaches. Since three of the four coaching type dummy variables have an odds ratio under one, only management hires were more likely to be successful than previous head coaches.
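As a small worked example, the sketch below converts a log-odds coefficient into an odds ratio and a percent change in odds. The coefficient is back-calculated from the 0.23 odds ratio quoted above, and the 20% baseline success probability is purely hypothetical. It is included to show that a 77% drop in odds is not the same as a 77% drop in probability.

```python
import math

# Coefficient implied by the odds ratio of 0.23 reported in the text.
beta_internal = math.log(0.23)

# Exponentiating the coefficient gives the odds ratio.
odds_ratio = math.exp(beta_internal)
pct_change_in_odds = (odds_ratio - 1) * 100

# Hypothetical baseline: an external hire with a 20% chance of success.
baseline_prob = 0.20
baseline_odds = baseline_prob / (1 - baseline_prob)

# Apply the odds ratio, then convert the odds back to a probability.
new_odds = odds_ratio * baseline_odds
new_prob = new_odds / (1 + new_odds)

print(f"odds ratio: {odds_ratio:.2f}")
print(f"change in odds: {pct_change_in_odds:.1f}%")
print(f"probability: {baseline_prob:.1%} -> {new_prob:.1%}")
```

Under this assumed baseline, the probability falls by less than 77% in relative terms, because odds and probabilities only move together closely when the probability is small.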
From a practical standpoint, these are large effect sizes. So why are the variables statistically insignificant?
The cause is the small number of successful coaches. Out of 202 coaches remaining in the sample, just 23 (11.4%) were successful. Regardless of the coach’s background, odds are low that they last more than a few seasons. If we look specifically at the one category able to outperform previous head coaches:
# Filter to management (dummy column name assumed from the Type categories)
manage = coach3[coach3['Management'] == 1]
print(...)
print(...)
The filtered dataset contains just 6 hires, of which just one is classified as a success. In other words, the entire effect was driven by a single successful observation. Thus, it would take a considerably larger sample size to be confident that differences exist.
With a p-value of 0.202, the Internal variable comes the closest to statistical significance. Notably, however, the direction of the effect is the opposite of what was hypothesized: internal hires are less likely to be successful than external hires. Out of 26 internal hires, just one met the criteria for success.
Conclusion
This analysis supports several key conclusions:
Regardless of background, being an NBA coach is typically a short-lived job. It’s rare for a coach to last more than a few seasons.
The common wisdom that NBA teams strongly prefer to hire previous head coaches holds true. More than half of hires already had NBA head coaching experience.
If teams don’t hire an experienced head coach, they’re likely to hire an NBA assistant coach. Hires outside of these two categories are especially uncommon.
Though they are frequently hired, there is no evidence to suggest NBA teams overly prioritize previous head coaches. To the contrary, previous head coaches stay in the job longer on average and are more likely to outlast their initial contract term, though neither of these differences is statistically significant.
Despite high-profile anecdotes, there is no evidence to suggest that internal hires are more successful than external hires either.
Note: All images were created by the author unless otherwise credited.
The post What Statistics Can Tell Us About NBA Coaches appeared first on Towards Data Science.