In Part 3, I’ll present some results and go over what can be done to expand on the idea.
Results and Findings
No network characteristic differed between winning teams and the rest of the league in a statistically significant way (threshold p-value of .05). A summary of the relevant p-values is provided below in Table 2. Power play and penalty kill success rates, which are well known to be associated with team success, are also included. For the analysis, winning teams, defined as Conference Champions, were set aside and compared against the rest of the league. For the statistical tests, the results were combined over the ten-year period.
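The text does not say which test produced the p-values, so the following is only a sketch of one way to run this kind of champions-versus-everyone-else comparison. It uses a two-sided permutation test on the difference in group means, chosen here because it needs nothing beyond the Python standard library. The function name, the sample values, and the number of permutations are all assumptions for illustration, not data or settings from the study.

```python
import random


def permutation_p_value(champions, others, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(champions) - mean(others))
    pooled = list(champions) + list(others)
    n_champions = len(champions)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the "champion" label and recompute the gap.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_champions]) - mean(pooled[n_champions:]))
        if diff >= observed:
            extreme += 1

    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical PageRank values, NOT taken from the study.
champion_pagerank = [0.028, 0.031, 0.030, 0.027]
other_pagerank = [0.034, 0.033, 0.036, 0.029, 0.035, 0.032]

p_value = permutation_p_value(champion_pagerank, other_pagerank)
significant = p_value < 0.05  # same .05 threshold as in the text
```

With only 18 championship teams in the sample, a resampling test like this avoids leaning on normality assumptions, though it will not rescue a comparison that is simply underpowered.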
Other relationships might be found if success were measured on a scale more granular than championships or points. The team statistics available now are much more advanced than the data available for this period in the 2000s. There is a good chance that the network characteristics here would correlate well with those modern metrics. Because those metrics correlate well with success, that type of analysis would add an intermediate step rather than look directly for a relationship between network centrality and championships. The correlation matrix in Figure 15 does show interesting correlations of this kind that are worth exploring. For example, network characteristics are moderately well correlated with Power Play and Penalty Kill success rates, which in turn are related to Points % and Team Ranking.
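To make the "intermediate step" idea concrete, here is a minimal sketch of how one might check both links in the chain: a network metric against Power Play %, and Power Play % against Points %. It computes a plain Pearson correlation by hand. The helper name and all numbers are hypothetical placeholders, not values from the correlation matrix.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-team values, NOT taken from the study.
reaching = [0.42, 0.55, 0.61, 0.48, 0.70]
power_play_pct = [19.5, 17.8, 16.9, 18.6, 15.2]
points_pct = [0.60, 0.55, 0.52, 0.57, 0.47]

# Step 1: network metric vs. an intermediate measure of success.
r_network_vs_pp = pearson_r(reaching, power_play_pct)
# Step 2: the intermediate measure vs. the final outcome.
r_pp_vs_points = pearson_r(power_play_pct, points_pct)
```

Two moderate correlations in a row do not guarantee a relationship between the first and last variables, which is why the text treats this as something to explore rather than a result.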
PageRank had a p-value of .216, which is not statistically significant, but its boxplot showed that winning teams generally have lower PageRanks. This is supported by the regression plot shown in Figure 16 below. There is a negative relationship between PageRank and Points %, where a higher Points % is desirable. This indicates that a team might want to have a lower PageRank.
Also interesting in Figure 16 on the right is the cluster of teams in the bottom right corner when PageRank was plotted against Pts %. Each team is colored according to its geographic preference in trading partners. This plot shows clearly that teams with no geographic preference tend to outperform teams that are biased.
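The negative relationship between PageRank and Points % described above is the kind of thing a simple least-squares line captures. The sketch below fits one by hand. The function name and the data points are hypothetical and only illustrate the direction of the slope, not the fit behind Figure 16.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical team values, NOT taken from the study.
pagerank = [0.026, 0.029, 0.031, 0.034, 0.037]
points_pct = [0.62, 0.58, 0.55, 0.51, 0.47]

slope, intercept = fit_line(pagerank, points_pct)
# A negative slope means higher PageRank goes with lower Points %.
```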
Local Reaching Centrality had a p-value on the lower end of the range, so it’s worth exploring further. Figure 17 shows a boxplot where we see that the interquartile range, median, and mean of the Championship teams are lower than those of all other teams. It also shows that the minimum and maximum values of both groups are similar.
Local Reaching was also fairly well correlated with Power Play %, Penalty Kill %, and Average Age. All three of those relate well to success so secondary or tertiary network effects are worth investigating. Figure 18 shows Reaching Centrality plotted against Points %. This plot indicates that a successful team would want to have a lower Reach. This confirms what was seen with other metrics.
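For readers unfamiliar with the metric, local reaching centrality in an unweighted directed graph is commonly defined as the share of other nodes that can be reached from a given node. The sketch below computes it with a breadth-first search. How the trade network is actually built in this study is not restated here, so the edge direction (an edge from A to B meaning A sent a player to B) and the four-team graph are assumptions for illustration only.

```python
from collections import deque


def local_reaching_centrality(graph, source):
    """Share of other nodes reachable from `source` along directed edges.

    `graph` maps each node to a list of nodes it has an edge to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if len(nodes) < 2:
        return 0.0

    # Breadth-first search from the source node.
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)

    # Exclude the source itself from both the count and the denominator.
    return (len(seen) - 1) / (len(nodes) - 1)


# Hypothetical trade graph: an edge A -> B means A sent a player to B.
trades = {"A": ["B"], "B": ["C"], "C": [], "D": ["A"]}

reach = {team: local_reaching_centrality(trades, team) for team in trades}
```

Under this definition, a team with low reach is one whose trades lead to only a small part of the league, directly or through chains of trading partners.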
Net Players In/Out
Net Players In/Out was the attribute that was closest to being statistically significant, with a p-value of 0.07. The boxplot in Figure 19 is consistent with this, showing all summary statistics lower for the Champion teams. The same pattern appears when we look at the strategies of winners. Selling is far more popular among winners, which drags down the average.
This preference for selling among winners is really counterintuitive and goes against the narrative about transactions at the Trade Deadline each year. Typically, teams who think they have a chance at winning the Stanley Cup will trade away prospects or draft picks in exchange for fully developed players who can contribute immediately to their effort.
Figure 20 shows the same relationship. This analysis suggests a successful team would want to have Net Players near zero or even potentially be a seller. One outlier on the championship side had close to 15 players net out, which is quite a turnover for a championship team.
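As a rough illustration of the attribute, the sketch below tallies Net Players In/Out from a list of player movements, assuming the sign convention implied above: players acquired minus players traded away, so sellers come out negative. The record layout, the team codes, and the sign-based buyer/seller labels are all hypothetical. The cutoffs used in the study may differ.

```python
from collections import Counter


def net_players(movements):
    """Players acquired minus players traded away, per team.

    Each movement is a (from_team, to_team, n_players) tuple.
    """
    net = Counter()
    for from_team, to_team, n_players in movements:
        net[to_team] += n_players
        net[from_team] -= n_players
    return dict(net)


def label(net_value):
    """Simplified sign-based label (an assumption, not the study's rule)."""
    if net_value > 0:
        return "Buyer"
    if net_value < 0:
        return "Seller"
    return "Balanced"


# Hypothetical movements between made-up teams.
movements = [("AAA", "BBB", 2), ("BBB", "CCC", 1), ("CCC", "AAA", 1)]

net = net_players(movements)
labels = {team: label(value) for team, value in net.items()}
```

Because every player who leaves one team arrives at another, the values sum to zero across the league, so a group average below zero means that group sold more than the rest.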
One third of all championship teams (6 of 18) were classified as National Sellers, one of 12 possible classifications. Unbiased Sellers account for another 3 of the 18 championship teams. There were only 3 buyers in the championship group.
This is an interesting result especially given that some plots seem to show that being unbiased, especially with respect to geographic preference, is the best strategy. This is apparently just the case for the regular season. National Sellers and Unbiased Sellers also seem to outperform in Power Play %, which is a metric correlated with success. This is demonstrated as a swarm plot in Figure 21 where we see National Seller outperforming National Buyer with a cluster above 20% which is considered a strong Power Play %.
Figure 22 below shows two plots of strategies and their prevalence in the league. A quick comparison of these two plots shows most noticeably that National Buyers are very underrepresented in the Championship group. What’s clear from looking at the lower plot is that the majority of teams transact on a national level. It’s also important to keep in mind that there are four possible designations for geographic preference and that no team across the 10 seasons preferred to transact locally (within the same division). This doesn’t mean that there was no trading between divisional rivals. It just means that no team traded in such a way that the preference threshold was met as laid out in Section 3B17.
Figure 23. Violin plot showing Grand Strategy and Regular Season Points %.
Figure 24. Violin plot showing Grand Strategy and Power Play success rates.
Figure 23 is a violin plot which shows strategies plotted against Points %. There is no obvious trend, and all strategies look to be about equally effective when measured by Points %. Figure 24 shows strategy again, this time plotted against Power Play %. Here there is more of a trend that might be interesting to investigate further. What’s noticeable is that National Buyer is once again underperforming. If no strategy had an advantage over another, the distributions would all be nearly identical.
Finally, Figure 25 displays a barplot with a side-by-side comparison of the expected number of Championships for each strategy and the actual number of Championships. The expected number of Championships is simply the proportion of each strategy throughout the entire league multiplied by the total number of Championships in the sample (18).
Figure 25. The Final Result. We see National Seller and Balanced exceeding expectations while National Buyer wins only 50% of the Championships that are expected.
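The expected-count calculation described above is simple enough to sketch directly: each strategy’s share of the league multiplied by the 18 Championships in the sample. The league-wide counts and the actual count in the example are hypothetical placeholders, not the numbers behind Figure 25, and the function name is my own.

```python
def expected_championships(strategy_counts, total_championships):
    """Expected Championships per strategy if wins were spread by prevalence."""
    total = sum(strategy_counts.values())
    return {
        strategy: total_championships * count / total
        for strategy, count in strategy_counts.items()
    }


# Hypothetical number of team-seasons per strategy, NOT from the study.
league_counts = {
    "National Seller": 60,
    "National Buyer": 90,
    "Unbiased Seller": 30,
    "Other": 120,
}

expected = expected_championships(league_counts, total_championships=18)

# Ratio of actual to expected; below 1.0 means underperforming.
hypothetical_actual_national_buyer = 3
ratio = hypothetical_actual_national_buyer / expected["National Buyer"]
```

A ratio of 0.5 corresponds to a strategy winning half of the Championships its prevalence would predict.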
The results of the analysis are encouraging and merit further research. It’s unclear whether there are strong correlations between network metrics and success in the regular season, but there are indications that combinations of metrics will be related to secondary measures of success. This is best exemplified here by the correlation between network metrics and NHL special teams statistics. Additionally, it’s clear that not all strategies are equally effective. Figure 25 shows that some strategies outperform or underperform relative to expectations.
The results here confirm what was found in other studies done in other sports and contexts. For example, it was found that the best strategy for the English Premier League is to have a small, tight-knit network of trading partners and to minimize centrality. This seems to be precisely what’s reflected here. It was also shown that successful soccer franchises will transact globally but maintain relatively low turnover in the roster. At a minimum, the results from the related papers combined with the results here show that optimal strategies do exist. Since these strategies are always defined by some network characteristics, it follows that network analysis can be a tool used to help optimize a trading strategy for a professional sports organization.
It’s not immediately clear from this study that network analysis would be useful in a machine learning model to predict the outcome of games or matches. Vaz de Melo et al. did have success in predicting winning teams over the course of a season. I don’t doubt that this analysis could eventually yield similar results, but I’m unsure about the usefulness of this application. Sports betting futures markets are considerably smaller than the markets for individual events or propositions, so while a model of this kind would be useful and potentially profitable, it wouldn’t be nearly as powerful as one that could be used every day.
Conclusion and Further Work
As noted throughout the paper, there are several factors that limit this study. Some of the most salient are listed below.
- The time period is limited to 2000-2010.
- Player acquisitions by means other than trade are not considered.
- Player details (position, skill, experience) are not considered.
Despite these limitations, the study was successful in showing that not all strategies are equal. This confirms what was found in other similar studies and justifies further exploration of the topic. In my view, the study failed to generate any useful features for a predictive machine learning model. The most promising use of the analysis seems to be for helping an organization strategize. Any improvements to the analysis made for the purpose of organizational strategy will almost certainly improve the prospects of a sports betting use case as well. Going forward, the focus will be on team strategy, with the hope of building an event prediction model along the way. A few ideas to expand on the organizational/strategic use are below:
- Increase the sample size by including more seasons
- Consider player movement via free agency, waivers, and two-way affiliate movement
- Include player details such as those described above
- Include analysis of any lasting impacts from previous transactions. If a team acquires a new player, does his impact on his new team change if he is coming from a team of high centrality vs one of low centrality?
- Examine other leagues such as MLB, NBA, and MLS
- Include modern sports metrics and team statistics such as Fenwick, Corsi, and Possession Quality