The Impact on Team Performance of Participation in Transaction Networks

 

 

David Van Anda

Indiana University at Bloomington, Luddy School of Engineering

Summer 2021 DSCI-D590  with Dr. YY Ahn

dvananda@iu.edu

 

Abstract

 

Throughout the sports world, some organizations have a reputation for growing and grooming players from within. In soccer, this can be through youth academies. In MLB or the NHL, this could be through minor league affiliations. In the NFL or NBA, this could be through NCAA pipelines. Is this an effective strategy? It seems at odds with the flurry of activity typically seen on trade and transfer deadline days, as teams jostle with each other to make adjustments before the playoffs or cup play. This project consists of a series of visualizations that show the National Hockey League (NHL) as a transaction network. I plot network characteristics of teams against team performance metrics in the hope of finding a relationship. I also classify each team as having one of twelve possible strategies and look for patterns among them. This is a novel approach to examining transactions in an American sports league, and it builds on work previously done with European soccer teams. Generally, the result here is that teams that are less connected and less involved in the network have better performance results. None of the metrics generated statistically significant p-values; however, there were trends that are worth exploring further.

This project is supported by an interactive network located at https://d141.github.io. The thickness of each edge corresponds to the weight of the link. Click on a node to see features of that team for the selected season. Recommended browsers: Safari and Firefox.

 

1. Introduction

 

 Considerable work has been done on the performance of individuals following a trade. Some work has also been done on team performance, but it primarily focuses on European football transfer networks.[1,2,3] There is negligible work on the team performance dynamics of American leagues.[4] Team performance in American leagues is a more interesting question because the leagues are closed. This leads to more interesting dynamics, as players are much more likely to play against former teams and teammates. It’s also quite common for groups of players to be traded together, so they may then play together against their former team.

 Definitive results from this study would be useful in a number of ways. The first and most obvious is that teams could use the information to trade more strategically and in a way that generates better team performance metrics. Team performance metrics are very closely related to measures of success such as Championships, League Ranking, and Win Percentage, so the motivation in this case is obvious. Another application would be in the sports betting industry. Sports betting is a $203 billion industry that employs close to 200,000 people worldwide.[5] There is significant interest in creating methods and strategies that can more accurately predict the outcome of events. There is obvious interest among bettors, but network analyses would be even more helpful to the sportsbooks, which have the advantage of being the party that establishes the betting line.

 With a detailed enough dataset, the network analysis could be made more granular and extended to include individual players. Examining individual players and their backgrounds has been done in European soccer leagues but not in any American sports league.[6] European soccer networks are much more complex than American sports networks because of how the leagues are structured and organized. The first difference is that European soccer leagues have a hierarchical structure within each national system. Leagues within these structures promote and relegate teams every year, which means that the teams within each league are constantly changing. It’s possible for two teams in different leagues to conduct a transaction and then be in the same league during a subsequent season, and the opposite is true as well. The other major difference is that there is competition between national systems. The most notable systems are in England, Germany, Italy, Spain, and France. The quality of teams in each of these systems is roughly the same when considered in the context of the game globally, so teams frequently transact with counterparties in other countries. The odds of later playing against a foreign counterpart are quite low and are limited to pre-season friendlies or continental tournaments. This severely limits the possibility of a player or group of players competing against a former team, which is a very interesting dynamic to study.

 American sports leagues do not promote or relegate between flights and have little competition with leagues in other countries. The closest exception is the KHL, the professional hockey league in Russia. Its quality of play is quite good, but very few transactions are conducted between the NHL and KHL. A total of three KHL vs. NHL games have been played in the last 30 years, and none since 2010, so these leagues exist in near isolation. As a result, the teams within these isolated leagues are much more connected than they would be otherwise. For example, in the NHL all teams are guaranteed to play against every other team at least twice per season, so any player who is traded or relocated for any reason within the NHL is guaranteed to play against their former team in the near future. This introduces many more opportunities for network effects and makes for a more interesting study, at least in establishing a baseline analysis that could later be extended to the more complex European networks.

 Several papers on individual performance conclude that being traded has a positive effect on a player. In one such study of NHL players, the examination of individual players is confounded by, among other things, the inability to account for the quality of the teams that players were traded from and to. The authors attribute the improved performance of traded players to increased motivation (due to one or several factors).[7]

 The same phenomenon was observed in MLB players, who showed increased batting percentages following a trade.[8] It appeared again in the NBA,[9] where players who were traded mid-season had significantly elevated performance when playing against their former team in their former arena.

 There is existing work that views sports leagues as transaction networks, but it primarily focuses on European football. These papers contain few compelling visualizations, but they reach interesting conclusions and provide a great starting point for thinking about this topic.

 Rossetti and Caproni conclude that in soccer competitions it’s better to trade globally, recruit locally, and minimize turnover by being less active in the transaction network.[10] The finding on minimizing turnover is consistent with what this study found for the NHL.

 The only paper analyzing an American sports league as a network or complex system came from a group in Brazil who looked at the NBA and tried to predict team success from network characteristics.[11] The paper is quite interesting, and they had some success predicting team performance from those characteristics. Their work uses individual player data to evaluate transactions and then predicts performance in the following season. They use some basic features similar to those used here, such as the number of nodes and measures of centrality. They also engineer new features such as roster volatility and experience metrics.

 

1.1. Contribution

 

 The contribution of this study will be to identify trading strategies that outperform or underperform what would be expected if all strategies were equally effective. This will be achieved first by constructing the NHL as a transaction network and then by establishing a series of metrics and characteristics that can be combined and evaluated as a strategy.

 A successful analysis will yield results that are informative to the management of sports teams that transact. The analysis can be informative in either direction. If a particular strategy clearly outperforms, it should be considered. Conversely, if a strategy dramatically underperforms, it should be avoided. The results will also be useful to both sides of sports betting and prediction markets. Results that can be used in predictive machine learning models will be useful to sportsbooks, which set lines. Models of this type would also be useful to bettors, who can use these features to exploit inefficiencies in the markets made by bookmakers.

 

2. Data and Methods

 

2.1. Data

 

The raw data is sourced from two websites:

 

 Hockeyreference.com was used for the team performance data, and prosportstransactions.com was used for the transaction data. The data from hockeyreference.com was generally quite clean and required few transformations. The performance data consists of 32 columns, two of which were engineered for this project (“Conference Champion” and “League Champion”). Hockeyreference.com provides options for downloading the relevant tables of statistics; the transaction data, however, was scraped using BeautifulSoup.

 Because the transaction data set was assembled specifically for this project, it required significantly more cleaning and transformation. The original version contained more than 85,000 rows, with transactions dating back to 1908.

 To turn this into useful information for network analysis, the first step was to extract ‘Team A’ and ‘Team B’ from each transaction. These correspond to two nodes, and the transaction they conduct is the edge that connects them. It was also important to account for the number of transactions conducted between two specific teams; this corresponds to the edge weight. The final version of the dataset also has columns for the number of players sent and received, from which each team’s preference for buying or selling was inferred. The date columns are also important: because each NHL season spans January 1, one cannot simply use the calendar year to determine the season in which a transaction took place. Free agency begins every year on July 1, and this was the cutoff date used to assign transactions to seasons. This makes the most sense because there is a trade deadline every year toward the end of the regular season, after which the roster is largely fixed. Conference champions therefore win their championships in June with the team they constructed between July 1 and early April. This is similar to the transfer windows in European soccer, where each window has more or less of an impact on either the current season or the subsequent one.
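The season-assignment rule and edge weighting described above can be sketched in Python as follows. This is a minimal illustration, not the project’s actual code: the `(team_a, team_b, date)` row layout, the function names `season_of` and `edge_weights`, and the team abbreviations in the example are all hypothetical.

```python
from collections import Counter
from datetime import date


def season_of(d: date) -> str:
    """Return the NHL season label (e.g. "2000-01") for a transaction date.

    July 1 is the cutoff: transactions on or after July 1 belong to the season
    that starts that calendar year; earlier ones belong to the prior season.
    """
    start = d.year if d >= date(d.year, 7, 1) else d.year - 1
    return f"{start}-{str(start + 1)[-2:]}"


def edge_weights(transactions):
    """Count transactions per (season, unordered team pair); the count is the edge weight."""
    weights = Counter()
    for team_a, team_b, when in transactions:
        pair = tuple(sorted((team_a, team_b)))  # undirected edge: A-B is the same as B-A
        weights[(season_of(when), pair)] += 1
    return weights


# Hypothetical rows: (team_a, team_b, transaction date).
rows = [
    ("TOR", "MTL", date(2000, 11, 3)),
    ("MTL", "TOR", date(2001, 3, 12)),  # same pair, still the 2000-01 season
    ("TOR", "MTL", date(2001, 7, 2)),   # after July 1, so it counts toward 2001-02
    ("DET", "COL", date(2001, 1, 20)),
]
weights = edge_weights(rows)
print(weights[("2000-01", ("MTL", "TOR"))])  # 2
```

Sorting each pair makes the edge undirected, which matches how the graphs in this study were constructed.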

 The time period of this project is limited to the seasons 2000-01 through 2009-10 inclusive. This choice was made because of the project’s time constraints. This 10-year period is the most recent period in which there were no major structural changes to the league such as divisional reorganization, franchise relocation, or league expansion. The same analysis could certainly be conducted on the complete dataset, but the time required to account for these factors might have prevented any conclusion from being reached. Divisional reorganizations are particularly challenging to manage because divisions are a critical component in determining a team’s strategy. The major drawback of this period is that the 2004-05 season was cancelled due to a collective bargaining dispute, so it had few transactions and no games played.

 

2.2. Definitions

 

 There are several methods, formulas, and definitions that should be established before proceeding further with the analysis. Most of them can be found as node attributes in the network. A summary of the attributes can be found in Table 1.

 

2.2.1. Node Authority Value

 

 An authoritative node is one that is linked with many hubs.[12] A hub is a node that is linked with many other nodes (that is, it has a high degree). Authority is calculated with the HITS (Hyperlink-Induced Topic Search) algorithm: a node’s raw authority score is the sum of the hub values of the nodes it is linked with.
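The authority computation can be illustrated with a short power-iteration sketch of HITS in plain Python. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets (an assumption; whether the project weighted these scores is not stated), and the function name `hits` and the four-team example are hypothetical. Scores are rescaled to unit length after each step, since only their relative sizes matter.

```python
import math


def hits(adj, iterations=100):
    """Hub and authority scores by power iteration (Kleinberg's HITS).

    adj maps each node to the set of nodes it is linked with. For an
    undirected graph the sets are symmetric, so "linked with" means "neighbor".
    """
    nodes = list(adj)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}

    def normalize(scores):
        norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
        return {v: s / norm for v, s in scores.items()}

    for _ in range(iterations):
        # Authority: sum of the hub values of the linked nodes.
        auth = normalize({v: sum(hub[u] for u in adj[v]) for v in nodes})
        # Hub: sum of the authority values of the linked nodes.
        hub = normalize({v: sum(auth[u] for u in adj[v]) for v in nodes})
    return hub, auth


# Hypothetical four-team graph: C traded with A, B, and D; A also traded with B.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
hub, auth = hits(adj)
print(max(auth, key=auth.get))  # C
```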

 

2.2.2. Node Betweenness Centrality

 

 The betweenness centrality of a node measures how frequently the node lies on the shortest paths between pairs of other nodes. It is often used as a measure of influence, since more information flows through nodes with high betweenness centrality.[13]
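One standard way to compute betweenness is Brandes’ algorithm. The sketch below is a plain-Python version for an unweighted, undirected graph. Treating every trade link as one hop is an assumption: transaction counts measure connection strength rather than distance, so they would need inverting before being used as path lengths. The function name and example are hypothetical, and scores are left unnormalized.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an unweighted, undirected graph.

    Brandes' algorithm: one breadth-first search per source node, then
    accumulate pair dependencies back up the shortest-path structure.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):            # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical path A - B - C: only B lies between two other teams.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(betweenness(path))  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
```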

 

2.2.3. League Conferences and Divisions

 

 The NHL has two conferences, East and West. The league has since expanded, but during the period in focus each conference had 15 teams, 8 of which went to the playoffs. Each conference produced one champion, who played the opposing conference champion in the Stanley Cup Final. For the purpose of this study, conference champions are examined rather than league champions. A conference championship is still quite an accomplishment, and because the cancelled 2004-05 season leaves nine completed seasons, using conference champions doubles the sample size of “winners” from 9 to 18.[14,15]

 The NHL during the period in focus for this study had six divisions, three in each conference. They’re organized regionally and each contained five teams. The Eastern Conference had the Atlantic, Northeast, and Southeast divisions. The Western Conference had the Pacific, Northwest, and Central divisions.[16,17]

 

2.2.4. Node Hub Value

 

 A hub in a network is a node that is linked with many other nodes. Its hub value is calculated with the HITS (Hyperlink-Induced Topic Search) algorithm: a node’s raw hub score is the sum of the authority values of the nodes it is linked with.[18]
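Written out, the hub and authority definitions form a mutual recursion. With $A$ the adjacency matrix, $\mathbf{a}$ the authority vector, and $\mathbf{h}$ the hub vector, one update step (each followed by normalization) and its fixed point are:

```latex
\mathbf{a} \leftarrow A^{\top}\mathbf{h}, \qquad
\mathbf{h} \leftarrow A\,\mathbf{a}
\quad\Longrightarrow\quad
\mathbf{a} \propto A^{\top}A\,\mathbf{a}, \qquad
\mathbf{h} \propto A A^{\top}\mathbf{h}
```

Because the graphs here are undirected, $A^{\top} = A$, so both vectors are driven toward the principal eigenvector of $A^2$. For a connected, non-bipartite undirected graph that is the principal eigenvector of $A$ itself, so in that setting hub and authority scores coincide up to scaling.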

2.2.5. Node Local Reaching Centrality

 

Local Reaching Centrality is defined as “the proportion of all nodes in the graph that can be reached from node i via outgoing edges.”[19] The method is primarily used for directed graphs; however, it can be generalized to undirected graphs, which is how the graphs in this study were constructed.
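The quoted definition translates directly into a breadth-first search. The sketch below assumes an unweighted graph stored as neighbor sets; the function name and example are hypothetical. One consequence worth noting: in a connected undirected graph every node can reach every other node, so this unweighted form equals 1 for all nodes, and any variation between teams would have to come from seasons whose trade graph is disconnected or from a weighted variant such as the one discussed by Mones et al.

```python
from collections import deque


def local_reaching_centrality(adj, node):
    """Share of the other nodes reachable from `node` (unweighted form).

    adj maps each node to the set of nodes it links to. For an undirected
    graph the links are symmetric, so "outgoing" edges are all incident edges.
    """
    seen = {node}
    queue = deque([node])
    while queue:  # breadth-first search from `node`
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return (len(seen) - 1) / (len(adj) - 1)


# Hypothetical season with two disconnected trading clusters.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
print(local_reaching_centrality(adj, "A"))  # 0.5
```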

 

2.2.6. Page Rank

 

 
 

 PageRank ranks each node by its importance, as determined by the number and quality of its incoming links.[20] It is intended for use with directed graphs; however, it can be generalized to undirected graphs, which is how the graphs were constructed for this project. The method and algorithm were popularized by the founders of Google, but I think it’s important to note that their original papers and filings cite the work of network science researchers Jon Kleinberg[21] and Massimo Marchiori[22], as well as the founder of Baidu, Robin Li.[23] PageRank and betweenness centrality are closely related, as shown in Figure 1 above.
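A compact power-iteration sketch of PageRank on a weighted, undirected graph is below. Treating each undirected edge as a pair of directed links, letting transaction counts set the link weights, and using the common damping factor of 0.85 are assumptions for illustration, not settings reported in this study; the function name and example graph are hypothetical.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected, weighted graph.

    adj maps node -> {neighbor: weight}; symmetric weights mean each undirected
    edge acts as a pair of directed links. A random walker follows a link with
    probability proportional to its weight, or jumps to a uniformly random node
    with probability 1 - damping.
    """
    nodes = list(adj)
    n = len(nodes)
    strength = {v: sum(adj[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes with no links (no trades that season) spread their rank evenly.
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] * w / strength[u] for u, w in adj[v].items())
                         + dangling / n)
            for v in nodes
        }
    return rank


# Hypothetical season: C traded twice with X and once each with Y and Z.
adj = {
    "C": {"X": 2, "Y": 1, "Z": 1},
    "X": {"C": 2},
    "Y": {"C": 1},
    "Z": {"C": 1},
}
rank = pagerank(adj)
print(max(rank, key=rank.get))  # C
```

The scores always sum to 1, so they can be compared across seasons with different numbers of transactions.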

 

2.2.7. Grand Strategy

 Each team in each season is classified as one of twelve possible strategies, each a combination of a preference for buying or selling and a preference for transacting nationally, regionally, or locally (three buying/selling options times four geographic options, counting the absence of a preference). No distinct preference results in a classification of “Unbiased” or “No Preference”. A preference for buying or selling was assigned to teams whose net number of players in minus players out fell outside the inclusive range [-1, 1]. Geographic preference was assigned by first calculating the standard deviation of the numbers of transactions conducted at the three possible geographic levels (local, regional, national). If the standard deviation was less than 1.5, the geographic preference was determined to be “Unbiased”. If it was greater than 1.5, the geographic preference was determined to be biased toward the level with the most trades.
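The classification rules above can be expressed as a few small Python functions. This is a hedged sketch: the function names are hypothetical, and three details the text leaves open are filled with labeled assumptions, namely the use of the population standard deviation, treating a standard deviation of exactly 1.5 as unbiased, and breaking ties for the most-traded level in the order local, regional, national.

```python
from statistics import pstdev


def side_preference(net_players: int) -> str:
    """Buying/selling preference from net players (players in minus players out)."""
    if net_players > 1:
        return "Buyer"
    if net_players < -1:
        return "Seller"
    return "No Preference"  # net players within the inclusive range [-1, 1]


def geographic_preference(local: int, regional: int, national: int,
                          threshold: float = 1.5) -> str:
    """Geographic preference from transaction counts at each level."""
    counts = {"Local": local, "Regional": regional, "National": national}
    # Assumption: population standard deviation, and exactly 1.5 counts as unbiased.
    if pstdev(list(counts.values())) <= threshold:
        return "Unbiased"
    # Biased toward the most-traded level; ties resolve to the earlier key (assumption).
    return max(counts, key=counts.get)


def grand_strategy(net_players: int, local: int, regional: int, national: int):
    """Return (geographic preference, buying/selling preference): 4 x 3 = 12 strategies."""
    return (geographic_preference(local, regional, national),
            side_preference(net_players))


print(grand_strategy(-3, 1, 2, 8))  # ('National', 'Seller')
```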

 No team in the study had a locally biased strategy. A national bias was the most common, and there was a slight preference for selling, as shown in Figure 10 above. A radar chart for one of the possible strategies (National Buyer) is shown in Figure 2.

 
 

3. Results

 

As noted in the abstract, no network characteristic differed between winning teams and the rest of the league in a statistically significant way (threshold p-value of .05). A summary of the relevant p-values is provided below in Table 2. Power play and penalty kill success percentages, which are well known to be associated with team success, are also included. In the analysis, winning teams, defined as Conference Champions, were set aside to compare against the rest of the league. For the statistical analyses, all of these results were combined over the ten-year period.

Other relationships might be found if success were measured on a scale more granular than championships or points. The team statistics available today are much more advanced than the data used for this period in the 2000s, and there is a good chance that the network characteristics here would correlate well with those modern metrics. Because the modern metrics correlate well with success, that type of analysis would add an intermediate step rather than looking directly for a relationship between network centrality and championships. There are indeed interesting correlations of this kind that are worth exploring. For example, network characteristics are moderately correlated with Power Play and Penalty Kill success rates, which in turn are related to Points % and Team Ranking.

 PageRank had a p-value of .216, which is not statistically significant, but its box plot showed that winning teams generally have lower PageRanks. There is also a negative relationship between PageRank and Points % (where a higher Points % is desirable). This suggests that a team might want to have a lower PageRank.

 It’s also interesting to note that teams with no geographic preference tended to outperform teams with a geographic bias.

 Local Reaching Centrality had a p-value on the lower end of the range, so it’s worth exploring further. Local Reaching Centrality was also fairly well correlated with Power Play %, Penalty Kill %, and Average Age. All three of those relate well to success, so secondary or tertiary network effects are worth investigating.

 Net Players In/Out was the attribute closest to being statistically significant, with a p-value of 0.07. This is reinforced when we look at the strategies of winners: selling is far more common among winners, which pulls their average Net Players In/Out down.

 This preference for selling among winners is counterintuitive and goes against the narrative about transactions at the Trade Deadline each year. Typically, teams that think they have a chance at winning the Stanley Cup will trade away prospects or draft picks in exchange for fully developed players who can contribute immediately. This analysis suggests that a successful team would want to have Net Players near zero or potentially even be a seller. One outlier on the championship side had close to 15 players net out, which is considerable turnover for a championship team.

 Of the 12 possible classifications, National Seller accounted for one third of all championship teams (6 of 18). Unbiased Sellers account for another 3 of the 18 championship teams. There were only 3 buyers in the championship group.

 This is an interesting result, especially given that some plots suggest that being unbiased, particularly with respect to geographic preference, is the best strategy. This apparently holds only for the regular season. National Sellers and Unbiased Sellers also seem to outperform in Power Play %, a metric correlated with success. This is shown as a swarm plot in Figure 3, where National Seller outperforms National Buyer with a cluster above 20%, which is considered a strong Power Play %.

Figure 3: Swarm plot showing trading strategies with power play success rates. The x-axis is a percentage.
 

 Buyers are heavily underrepresented in the Championship group. The lower plot makes clear that the majority of teams transact on a national level. It’s also important to keep in mind that there are four possible designations for geographic preference and that no team in the ten seasons preferred to transact locally (within the same division). This doesn’t mean that there was no trading between divisional rivals; it just means that no team traded in a way that met the preference threshold laid out in the definitions.

 Finally, Figure 4 displays a bar plot comparing, side by side, the expected number of Championships for each strategy with the actual number of Championships. The expected number of Championships is simply the proportion of each strategy throughout the entire league multiplied by the total number of Championships in the sample (18).

Figure 4: Side-by-side bar plots showing actual championships vs. expected championships for each trading strategy.
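The expected-championship baseline is a simple proportion. The sketch below computes it from a list of strategy labels, one per team-season; the function name and the toy counts in the usage example are hypothetical and are not the study’s data.

```python
from collections import Counter


def expected_championships(all_strategies, total_championships=18):
    """Expected championships per strategy if no strategy were better than another.

    all_strategies lists the strategy of every team-season in the sample; each
    strategy's expectation is its share of team-seasons times the total.
    """
    counts = Counter(all_strategies)
    n = len(all_strategies)
    return {s: c * total_championships / n for s, c in counts.items()}


# Toy example: 180 team-seasons (illustrative counts only).
labels = ["National Seller"] * 90 + ["National Buyer"] * 60 + ["Unbiased Seller"] * 30
print(expected_championships(labels))
# {'National Seller': 9.0, 'National Buyer': 6.0, 'Unbiased Seller': 3.0}
```

Comparing these values with the observed counts per strategy gives the side-by-side comparison described above.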
 
 

4. Conclusion

 

The results of the analysis are encouraging and merit further research. It is not clear that there are strong correlations between network metrics and success in the regular season, but there are indications that combinations of metrics are related to secondary measures of success. This is best exemplified here by the correlation between network metrics and NHL special teams statistics. Additionally, it’s clear that not all strategies are equally effective. Figure 25 shows that some strategies outperform or underperform relative to expectations.

The results here confirm what was found in other studies done in other sports and contexts. For example, it was found that the best strategy in the English Premier League is to have a small, tight-knit network of trading partners and to minimize centrality,[2] which seems to be precisely what is reflected here. It was also shown that successful soccer franchises transact globally but maintain relatively low roster turnover.[1] At a minimum, the results from the related papers combined with the results here show that some strategies are indeed better than others. Since these strategies are defined by network characteristics, it follows that network analysis can be a tool to help optimize a trading strategy for a professional sports organization.

It’s not immediately clear from this study that network analysis would be useful in a machine learning model to predict the outcome of games or matches. Vaz de Melo et al. did have success in predicting winning teams over the course of a season.[11] I don’t doubt that this analysis could eventually yield similar results, but I’m unsure about the usefulness of this application. Sports betting futures markets are considerably smaller than markets on individual events or propositions, so while a model of this kind would be useful and potentially profitable, it wouldn’t be nearly as powerful as one that could be used every day.

 As noted throughout the paper, there are several factors that limit this study. Some of the most salient are listed below.

 Despite these limitations, the study was successful in showing that not all strategies are equal. This confirms what was found in other similar studies and justifies further exploration of the topic. In my view, the study failed to generate any useful features for a predictive machine learning model. The most promising use of the analysis seems to be for helping an organization strategize. Any improvements in the analysis toward the end of organizational strategy will almost certainly improve the prospects of a sports betting use case as well. Going forward, the focus will be on team strategy with the hope of building an event prediction model along the way. A few ideas to expand on the organizational/strategic use are below:

 

10. References

 

 

[1] Rossetti, Giulio, and Vincenzo Caproni. "Football market strategies: Think locally, trade globally." 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, 2016.

 

[2] Coates, Dennis, Iuliia Naidenova, and Petr Parshakov. "Transfer Policy and Football Club Performance: Evidence from Network Analysis." International Journal of Sport Finance 15.3 (2020).

 

[3] Lee, Sangmin, Inho Hong, and Woo-Sung Jung. "A Network approach to the transfer market of European football leagues." New Physics: Sae Mulli 65.4 (2015): 402-409.

 

[4] Anderson, Neil Timothy. An analysis of the effect of mid-season trades on team performance in the National Basketball Association. Diss. 2016.

 

[5] Lock, S. "Key Data on the Global Sports Betting Industry 2020." Statista, published May 31. https://www.statista.com/statistics/1154681/key-data-global-sports-betting-industry/

 

[6] Rossetti, Giulio, and Vincenzo Caproni. "Football market strategies: Think locally, trade globally." 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, 2016.

 

[7] White, Philip G., William G. McTeer, and Anne B. Vagi. "The effect on performance of being traded during the season: The case of the National Hockey League." Journal of Sport Behavior 14.3 (1991): 201.

[8] Bateman, Thomas S., Kirk R. Karwan, and Thomas A. Kazee. "Getting a fresh start: A natural quasi-experimental test of the performance effects of moving to a new job." Journal of Applied Psychology 68.3 (1983): 517.

 

[9] Wanic, Rebekah A., Nadav Goldschmied, and Mairead Nolan. "“I’ll show them”: Assessing performance in recently traded NBA players facing their former team." Motivation Science 5.4 (2019): 357.

 

[10] Rossetti, Giulio, and Vincenzo Caproni. "Football market strategies: Think locally, trade globally." 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, 2016.

 

[11] Vaz de Melo, Pedro OS, Virgilio AF Almeida, and Antonio AF Loureiro. "Can complex network metrics predict the behavior of NBA teams?" Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008.

 

[12] Kleinberg, Jon M. "Authoritative sources in a hyperlinked environment." SODA. Vol. 98. 1998.

 

[13] Freeman, Linton C. "A set of measures of centrality based on betweenness." Sociometry (1977): 35-41.

 

[14] Wikipedia contributors. “Eastern Conference (NHL).” Wikipedia, 6 July 2021, en.wikipedia.org/wiki/Eastern_Conference_(NHL).

 

[15] Wikipedia contributors. “Western Conference (NHL).” Wikipedia, 8 July 2021, en.wikipedia.org/wiki/Western_Conference_(NHL).

 

[16] Wikipedia contributors. “2000–01 NHL Season.” Wikipedia, 14 July 2021, en.wikipedia.org/wiki/2000%E2%80%9301_NHL_season.

 

[17] Wikipedia contributors. “2000–01 NHL Season.” Wikipedia, 14 July 2021, en.wikipedia.org/wiki/2000%E2%80%9301_NHL_season.

 

[18] Kleinberg, Jon M. "Authoritative sources in a hyperlinked environment." SODA. Vol. 98. 1998.

 

[19] Mones, Enys, Lilla Vicsek, and Tamás Vicsek. "Hierarchy measure for complex networks." PloS one 7.3 (2012): e33799.

 

[20] Page, Lawrence, et al. The PageRank citation ranking: Bringing order to the web. Stanford InfoLab, 1999.

 

[21] Kleinberg, Jon M. "Authoritative sources in a hyperlinked environment." SODA. Vol. 98. 1998.

 

[22] Marchiori, Massimo. "The quest for correct information on the Web: Hyper search engines." Computer Networks and ISDN Systems 29.8-13 (1997): 1225-1235.

 

[23] http://www.rankdex.com/about.html