NHL data analysis
Mu-Tien, Lee 2020,09.04
Description
This project explores NHL data loaded from the records API
(https://gitlab.com/dword4/nhlapi/-/blob/master/records-api.md) and the
stats API (https://gitlab.com/dword4/nhlapi/-/blob/master/stats-api.md).
You can call the endpoints function with a valid endpoint name to get
the table you want. For some endpoints you can select a specific team,
either by name or by team ID/franchise ID. After that, I show some data
analysis across all active teams and for a few selected teams. I hope
you have some fun with this project!
Data cleaning
Building functions to reach statsAPI
# load the packages used throughout the report
library(httr)
library(jsonlite)
library(dplyr)
library(knitr)
# build the base table used to switch between team names and team IDs
text <- content(GET("https://statsapi.web.nhl.com/api/v1/teams?expand=team.roster"), "text")
base <- fromJSON(text, flatten = TRUE)
base <- as.data.frame(base) %>% select(teams.franchise.teamName, teams.franchiseId, teams.teamName, teams.id)
# function that lets the user reach the stats API
statsAPI <- function(x, teamID = NULL, season = NULL, ...){
  # base URL for the stats API
  base_url <- "https://statsapi.web.nhl.com/api/v1/teams"
  modifiers <- x
  full_url <- NULL
  # construct the full path
  if (x %in% "teamId"){
    # here teamID is a character string of IDs, e.g. "2,12,22"
    full_url <- paste0(base_url, "?", modifiers, "=", teamID)
  } else if (is.character(teamID)){
    # convert a team name into its team ID (a plain value, not a data frame)
    teamID <- base$teams.id[base$teams.teamName == teamID]
  }
  if (x %in% c("expand=team.roster","expand=person.names","expand=team.schedule.next","expand=team.schedule.previous","expand=team.stats","stats=statsSingleSeasonPlayoffs")){
    if (is.null(teamID)){
      full_url <- paste0(base_url, "?", modifiers)
    } else {
      full_url <- paste0(base_url, "/", teamID, "?", modifiers)
    }
  } else if (x %in% "expand=team.roster&season"){
    if (is.null(season)){
      stop("season is missing")
    } else if (length(teamID) > 1){
      stop("sorry, we can only show one team each time")
    } else if (is.null(teamID)){
      # use the requested season instead of a hard-coded one
      full_url <- paste0(base_url, "?", modifiers, "=", season)
    } else {
      full_url <- paste0(base_url, "/", teamID, "?", modifiers, "=", season)
    }
  }
  # fail early if no branch above recognised the endpoint
  if (is.null(full_url)) stop("unknown stats endpoint: ", x)
  # retrieve the response once and extract it as JSON text
  text <- content(GET(full_url), "text")
  # parse the JSON and convert it to a data frame
  mydata <- fromJSON(text, flatten = TRUE)
  mydata <- as.data.frame(mydata)
  return(mydata)
}
Building functions to reach recordAPI
The table below lists the endpoints you can request; pass one of these
options to the function.
NOTE: For the teamId endpoint, pass the team IDs as a single character
string, e.g. "2,12,22".
endpoints <- c("franchise","franchise-team-totals","franchise-season-records","franchise-goalie-records","franchise-skater-records","expand=team.roster","expand=person.names","expand=team.schedule.next","expand=team.schedule.previous","expand=team.stats","expand=team.roster&season","teamId","stats=statsSingleSeasonPlayoffs")
idOption <- c(rep("None",2),rep("franchiseID or teamName without location",3),rep("teamID or teamName without location",8))
seasonOption <- c(rep("No",10),"Yes", "No","No")
kable(cbind(endpoints,idOption,seasonOption), format = "html")
endpoints | idOption | seasonOption |
---|---|---|
franchise | None | No |
franchise-team-totals | None | No |
franchise-season-records | franchiseID or teamName without location | No |
franchise-goalie-records | franchiseID or teamName without location | No |
franchise-skater-records | franchiseID or teamName without location | No |
expand=team.roster | teamID or teamName without location | No |
expand=person.names | teamID or teamName without location | No |
expand=team.schedule.next | teamID or teamName without location | No |
expand=team.schedule.previous | teamID or teamName without location | No |
expand=team.stats | teamID or teamName without location | No |
expand=team.roster&season | teamID or teamName without location | Yes |
teamId | teamID or teamName without location | No |
stats=statsSingleSeasonPlayoffs | teamID or teamName without location | No |
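The endpoints wrapper later in this report calls a recordAPI function whose definition does not appear in the text. As a rough sketch only, the URLs for the franchise endpoints in the table above could be assembled as shown here. The helper name record_url is hypothetical, and the base URL and the cayenneExp=franchiseId filter are assumptions taken from the records API notes linked in the description, not from the author's code. The helper only builds the URL and sends no request.

```r
# Hypothetical helper: builds a records API URL without sending a request.
# Assumption: base URL and cayenneExp filter follow the linked records-api.md notes.
record_url <- function(x, franchiseID = NULL) {
  base_url <- "https://records.nhl.com/site/api"
  no_id <- c("franchise", "franchise-team-totals")
  by_id <- c("franchise-season-records", "franchise-goalie-records",
             "franchise-skater-records")
  if (x %in% no_id) {
    return(paste0(base_url, "/", x))
  }
  if (!(x %in% by_id)) {
    stop("unknown records endpoint: ", x)
  }
  if (is.null(franchiseID)) {
    # no filter: request the records of every franchise
    return(paste0(base_url, "/", x))
  }
  paste0(base_url, "/", x, "?cayenneExp=franchiseId=", franchiseID)
}

record_url("franchise")
record_url("franchise-goalie-records", franchiseID = 22)  # 22 = Islanders franchise ID
```

The resulting string could then be passed to GET() and fromJSON() in the same way statsAPI does.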
kable(base, caption = "ID table for teams", format = "html")
teams.franchise.teamName | teams.franchiseId | teams.teamName | teams.id |
---|---|---|---|
Devils | 23 | Devils | 1 |
Islanders | 22 | Islanders | 2 |
Rangers | 10 | Rangers | 3 |
Flyers | 16 | Flyers | 4 |
Penguins | 17 | Penguins | 5 |
Bruins | 6 | Bruins | 6 |
Sabres | 19 | Sabres | 7 |
Canadiens | 1 | Canadiens | 8 |
Senators | 30 | Senators | 9 |
Maple Leafs | 5 | Maple Leafs | 10 |
Hurricanes | 26 | Hurricanes | 12 |
Panthers | 33 | Panthers | 13 |
Lightning | 31 | Lightning | 14 |
Capitals | 24 | Capitals | 15 |
Blackhawks | 11 | Blackhawks | 16 |
Red Wings | 12 | Red Wings | 17 |
Predators | 34 | Predators | 18 |
Blues | 18 | Blues | 19 |
Flames | 21 | Flames | 20 |
Avalanche | 27 | Avalanche | 21 |
Oilers | 25 | Oilers | 22 |
Canucks | 20 | Canucks | 23 |
Ducks | 32 | Ducks | 24 |
Stars | 15 | Stars | 25 |
Kings | 14 | Kings | 26 |
Sharks | 29 | Sharks | 28 |
Blue Jackets | 36 | Blue Jackets | 29 |
Wild | 37 | Wild | 30 |
Jets | 35 | Jets | 52 |
Coyotes | 28 | Coyotes | 53 |
Golden Knights | 38 | Golden Knights | 54 |
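Because the team ID and the franchise ID of the same club usually differ, it helps to look both up from the team name. The sketch below does this on a four-row excerpt of the ID table above. The helper name lookup_id and the shortened column names are hypothetical and not part of the original code.

```r
# Excerpt of the ID table above (values copied from the table).
ids <- data.frame(
  teamName    = c("Islanders", "Lightning", "Stars", "Golden Knights"),
  franchiseId = c(22, 31, 15, 38),
  teamId      = c(2, 14, 25, 54),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the requested kind of ID for a team name.
lookup_id <- function(name, type = c("teamId", "franchiseId")) {
  type <- match.arg(type)
  hit <- ids[ids$teamName == name, type]
  if (length(hit) != 1) stop("team name not found: ", name)
  hit
}

lookup_id("Islanders")                 # team ID, for the stats API
lookup_id("Islanders", "franchiseId")  # franchise ID, for the records API
```

Stopping on an unknown name avoids building a URL with an empty ID.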
Accessory for API endpoint
#build a wrapper function to let user reach endpoints easily
endpoints <- function(x,...){
record <- c("franchise","franchise-team-totals","franchise-season-records", "franchise-goalie-records", "franchise-skater-records")
stats <- c("expand=team.roster","expand=person.names","expand=team.schedule.next","expand=team.schedule.previous","expand=team.stats","expand=team.roster&season","teamId","stats=statsSingleSeasonPlayoffs")
if (x %in% record) recordAPI(x,...)
else if (x %in% stats) statsAPI(x,...)
else stop("Please enter the correct name of your endpoint")
}
We’ll use this function to reach the API in the following report.
Data analysis
Numeric summaries
A contingency table (Division * First year of play)
First year of play | Atlantic | Central | Metropolitan | Pacific |
---|---|---|---|---|
1917-1942 | 4 | 1 | 1 | 0 |
1942-1967 | 0 | 1 | 2 | 1 |
1967-1992 | 3 | 0 | 3 | 4 |
1992-2017 | 1 | 5 | 2 | 3 |
This table shows that the Atlantic division skews toward older teams, while the Central and Pacific divisions have more younger teams.
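A table like this can be produced by binning each team's first year of play with cut() and cross-tabulating the bins against the division with table(). The sketch below uses a few made-up rows. The column names and the left-closed 25-year bins are assumptions, not the author's code.

```r
# Made-up example rows (not the real data)
teams <- data.frame(
  division  = c("Atlantic", "Atlantic", "Central", "Pacific", "Metropolitan"),
  firstYear = c(1917, 1924, 1926, 2017, 1972)
)

# Bin the first year into 25-year periods; right = FALSE makes each bin
# include its lower bound, and the last break is 2018 so that 2017 is kept.
teams$period <- cut(
  teams$firstYear,
  breaks = c(1917, 1942, 1967, 1992, 2018),
  labels = c("1917-1942", "1942-1967", "1967-1992", "1992-2017"),
  right = FALSE
)

# Cross-tabulate period (rows) against division (columns)
tab <- table(teams$period, teams$division)
tab
```

Empty periods still appear as rows of zeros because cut() returns a factor with all four levels.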
Numeric summary table
Regular season | winrate | winlossrate | overtimelossrate | goalpergame |
---|---|---|---|---|
Min. | 0.3958 | 0.8051 | 0.0546 | 2.4833 |
1st Qu. | 0.4413 | 1.0471 | 0.0890 | 2.8356 |
Median | 0.4605 | 1.1643 | 0.1252 | 3.0541 |
Mean | 0.4639 | 1.1710 | 0.1508 | 2.9862 |
3rd Qu. | 0.4837 | 1.2694 | 0.2157 | 3.1771 |
Max. | 0.5660 | 1.6625 | 0.2750 | 3.3322 |
Playoffs | winrate | winlossrate | overtimelossrate | goalpergame |
---|---|---|---|---|
Min. | 0.3506 | 0.5400 | 0.0000 | 2.2597 |
1st Qu. | 0.4536 | 0.8321 | 0.0000 | 2.5917 |
Median | 0.4891 | 0.9745 | 0.0000 | 2.7400 |
Mean | 0.4905 | 0.9962 | 0.0007 | 2.7761 |
3rd Qu. | 0.5354 | 1.1523 | 0.0000 | 2.9395 |
Max. | 0.5970 | 1.4815 | 0.0111 | 3.6791 |
These two tables show that the win rate and goals per game do not differ much between the regular season and the playoffs. However, the overtime loss rate is very different. In the playoffs, the mean overtime loss rate is higher than the 75th percentile, which means that only a small group of teams have any overtime losses in the playoffs.
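The last point is easy to check on a toy example: when most teams have a rate of zero and only a few are positive, the 75th percentile stays at zero while the mean is pulled above it. The values below are made up for illustration.

```r
# Made-up overtime-loss rates: eight teams at zero, two slightly above
otl_rate <- c(0, 0, 0, 0, 0, 0, 0, 0, 0.005, 0.011)

summary(otl_rate)

# Third quartile and mean, compared directly
q3 <- quantile(otl_rate, 0.75)
mean(otl_rate) > q3   # TRUE: the few non-zero teams lift the mean above Q3
```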
Statistic | Goals Against | Goals For | Points | Point Percentage | Wins |
---|---|---|---|---|---|
Min. | 4425.00 | 3955.00 | 1500.00 | 0.4960 | 660.00 |
1st Qu. | 7782.00 | 7669.00 | 2806.75 | 0.5122 | 1229.00 |
Median | 11584.50 | 11607.00 | 3803.50 | 0.5201 | 1660.00 |
Mean | 10889.50 | 10952.50 | 3717.75 | 0.5248 | 1615.25 |
3rd Qu. | 12513.75 | 13564.75 | 4382.25 | 0.5304 | 1913.00 |
Max. | 19863.00 | 19864.00 | 6667.00 | 0.5759 | 2856.00 |
Statistic | Goals Against | Goals For | Points | Point Percentage | Wins |
---|---|---|---|---|---|
Min. | 5969.00 | 5476.00 | 2049.00 | 0.4990 | 852.00 |
1st Qu. | 6471.75 | 6078.50 | 2170.50 | 0.5074 | 948.75 |
Median | 14929.50 | 15878.00 | 5382.50 | 0.5228 | 2314.00 |
Mean | 13279.12 | 13966.12 | 4890.75 | 0.5304 | 2113.25 |
3rd Qu. | 18782.75 | 20080.75 | 6865.25 | 0.5429 | 2956.00 |
Max. | 19805.00 | 21632.00 | 7899.00 | 0.5868 | 3449.00 |
Statistic | Goals Against | Goals For | Points | Point Percentage | Wins |
---|---|---|---|---|---|
Min. | 1997.000 | 2039.000 | 776.000 | 0.5040 | 352.000 |
1st Qu. | 4264.500 | 4279.500 | 1757.500 | 0.5401 | 772.500 |
Median | 5325.000 | 5660.000 | 2175.000 | 0.5561 | 968.000 |
Mean | 7617.857 | 7736.571 | 2830.143 | 0.5499 | 1230.857 |
3rd Qu. | 8986.500 | 9261.000 | 3394.500 | 0.5628 | 1481.500 |
Max. | 19501.000 | 19376.000 | 6556.000 | 0.5833 | 2788.000 |
Statistic | Goals Against | Goals For | Points | Point Percentage | Wins |
---|---|---|---|---|---|
Min. | 669.000 | 748.000 | 288.00 | 0.4521 | 133.000 |
1st Qu. | 4605.000 | 4473.250 | 1776.50 | 0.4908 | 777.250 |
Median | 8039.500 | 8220.000 | 2835.00 | 0.5268 | 1241.500 |
Mean | 7591.375 | 7429.625 | 2477.75 | 0.5227 | 1076.125 |
3rd Qu. | 11062.000 | 10941.500 | 3469.50 | 0.5414 | 1509.750 |
Max. | 13591.000 | 12910.000 | 4048.00 | 0.6128 | 1733.000 |
These tables show that the Atlantic division has the most wins of all the divisions. This seems reasonable, because the contingency table above shows that the Atlantic has the most older teams. The other figures also look reasonable: the Goals Against and Goals For summaries are similar across divisions, and only the Pacific and Central have lower minimums, perhaps because some younger teams have simply played fewer games than the others.
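Per-division summaries like these can be produced by splitting the team totals by division and applying summary() to each piece. The sketch below uses made-up numbers and assumed column names. It illustrates the approach and is not the author's code.

```r
# Made-up totals for illustration (not the real data)
totals <- data.frame(
  division = c("Atlantic", "Atlantic", "Atlantic", "Pacific", "Pacific"),
  wins     = c(3000, 2800, 900, 130, 1700),
  points   = c(7000, 6500, 2100, 290, 4000)
)

# One summary table per division
by_div <- lapply(split(totals[, c("wins", "points")], totals$division), summary)
names(by_div)

# Total wins per division, for a quick comparison
wins_by_div <- tapply(totals$wins, totals$division, sum)
wins_by_div
```

A single young team with few games, like the 130-win row here, is enough to pull a division's minimum far below the others.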
Plots
A bar plot of conference and division
This bar plot shows the number of teams in each division and conference. There are two conferences, each with two divisions, and each division has 7 or 8 teams.
Histogram of penalty times
These 2 plots suggest that the oldest teams have lower average penalty time in both the regular season and the playoffs, if we leave the recently established teams out of consideration.
Data analysis of 4 selected teams
I chose the New York Islanders, Tampa Bay Lightning, Vegas Golden Knights, and Dallas Stars, the four teams that reached the Conference Finals in the 2019-20 NHL season.
Numerical summary
Some data to compare these teams (regular season first, then playoffs)
Team | Games Played | Win Rate | Win on Road Rate | Loss at Home Rate |
---|---|---|---|---|
New York Islanders | 3732 | 0.4437299 | 0.4434783 | 0.4170792 |
Tampa Bay Lightning | 2138 | 0.4438728 | 0.4400428 | 0.4306878 |
Dallas Stars | 2053 | 0.5168047 | 0.5381166 | 0.3457207 |
Vegas Golden Knights | 235 | 0.5659574 | 0.5523810 | 0.3055556 |
Team | Games Played | Win Rate | Win on Road Rate | Loss at Home Rate |
---|---|---|---|---|
New York Islanders | 294 | 0.5476190 | 0.4610390 | 0.3571429 |
Tampa Bay Lightning | 162 | 0.5617284 | 0.6025641 | 0.4761905 |
Dallas Stars | 200 | 0.5250000 | 0.5100000 | 0.4600000 |
Vegas Golden Knights | 47 | 0.5957447 | 0.5217391 | 0.3333333 |
These tables show the data directly. The Vegas Golden Knights have not played many games, because the team only joined the NHL in 2017, so their numbers are not very comparable to those of the other three teams. Still, this young team has been less likely to lose at home than any of the others.
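For reference, rates of this kind can be computed from franchise totals roughly as follows. The column names, the made-up numbers, and the choice of denominators (road wins over road wins plus road losses, home losses over home wins plus home losses) are all assumptions. The report does not state how its rates were defined.

```r
# Made-up totals for two hypothetical teams
tot <- data.frame(
  team        = c("Team A", "Team B"),
  gamesPlayed = c(200, 100),
  wins        = c(110, 45),
  roadWins    = c(50, 20),
  roadLosses  = c(40, 30),
  homeWins    = c(60, 25),
  homeLosses  = c(30, 20)
)

# Assumed definitions of the three rates
rates <- data.frame(
  team         = tot$team,
  winRate      = tot$wins / tot$gamesPlayed,
  roadWinRate  = tot$roadWins / (tot$roadWins + tot$roadLosses),
  homeLossRate = tot$homeLosses / (tot$homeWins + tot$homeLosses)
)
rates
```

Rates from a team with few games played move a lot with each extra game, which is one reason to read the Vegas figures with caution.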
Distribution of skater position for each team
Position | Dallas Stars | New York Islanders | Tampa Bay Lightning | Vegas Golden Knights |
---|---|---|---|---|
C | 137 | 120 | 59 | 1 |
D | 179 | 161 | 126 | 1 |
L | 125 | 98 | 64 | 1 |
R | 115 | 78 | 69 | 1 |
Position | Dallas Stars | New York Islanders | Tampa Bay Lightning | Vegas Golden Knights |
---|---|---|---|---|
C | 15 | 15 | 22 | 14 |
D | 25 | 17 | 15 | 14 |
L | 9 | 10 | 5 | 10 |
R | 10 | 10 | 6 | 5 |
The distribution of positions is similar across the teams: about 15 skaters each at C and D, and about 10 each at L and R. The Vegas Golden Knights have fewer skaters at each position, perhaps because the team is only 3 years old.
Plots
Scatter and boxplot
The Vegas Golden Knights have a higher average number of points in a season, and the Tampa Bay Lightning a slightly lower one. There are also some outliers for the New York Islanders and the Tampa Bay Lightning. I personally think such seasons are rare, perhaps once in a century. Maybe after 50 years the Vegas Golden Knights will also have some skaters with more than 100 points in one season.
In this plot, the Tampa Bay Lightning’s goalies have the highest average win rate, and one of them even has a 100% win rate. I suspect that is because that goalie played fewer games.
In this plot, the Tampa Bay Lightning’s goalies are worse at making saves but better at scoring goals than the other teams’ goalies.
Thank you for reading my project.