thanks in advance or suggestions on this. here shortened example of dataframe working with.
boxscore_stats = structure(list(game_id = c(157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l), team_id = c(116975, 116975, 116975, 116975, 116975, 116975, 116975, 116975, 116975, 120310, 120310, 120310, 120310, 120310, 120310, 120310, 120310, 120310, 121910, 121910, 121910, 121910, 121910, 121910, 121910, 121910, 121910, 121910, 122072, 122072, 122072, 122072, 122072, 122072, 122072, 122072, 122072, 122072, 122072, 122072), minutes_played = c(18.76, 14.63, 8, 16.69, 24.62, 32, 12.79, 5.28, 3.22, 24.35, 10.18, 20.65, 9.59, 25.08, 14.12, 17.46, 23.15, 15.43, 22.84, 19.27, 21.31, 6.41, 17.57, 17.4, 17.29, 7.22, 12.09, 17.25, 2.28, 16.87, 6.6, 19.73, 6.31, 13.25, 26.25, 6.08, 28.71, 11.2, 17.54, 5.17), fieldgoalsmade = c(1l, 1l, 4l, 1l, 2l, 7l, 1l, 1l, 1l, 4l, 0l, 3l, 1l, 3l, 0l, 6l, 7l, 1l, 7l, 4l, 5l, 1l, 2l, 6l, 2l, 0l, 1l, 3l, 0l, 1l, 1l, 3l, 0l, 1l, 11l, 2l, 5l, 1l, 2l, 1l), fieldgoalattempts = c(8l, 6l, 7l, 2l, 9l, 16l, 3l, 1l, 2l, 12l, 4l, 12l, 3l, 11l, 4l, 9l, 13l, 6l, 12l, 10l, 14l, 2l, 6l, 11l, 6l, 2l, 2l, 6l, 0l, 5l, 3l, 10l, 2l, 3l, 21l, 3l, 17l, 4l, 9l, 2l)), .names = c("game_id", "team_id", "minutes_played", "fieldgoalsmade", "fieldgoalattempts"), row.names = c(na, 40l), class = "data.frame") head(boxscore_stats) game_id team_id minutes_played fieldgoalsmade fieldgoalattempts 1 157046 116975 18.76 1 8 2 157046 116975 14.63 1 6 3 157046 116975 8.00 4 7 4 157046 116975 16.69 1 2 5 157046 116975 24.62 2 9 6 157046 116975 32.00 7 16 7 157046 116975 12.79 1 3 8 157046 116975 5.28 1 1 9 157046 116975 3.22 1 2 10 157046 120310 24.35 4 12 11 157046 120310 10.18 0 4 12 157046 120310 20.65 3 12 13 157046 120310 9.59 1 3 14 157046 120310 25.08 3 11 15 157046 120310 14.12 0 4 16 157046 120310 17.46 6 9 17 157046 120310 23.15 7 13 18 157046 120310 15.43 1 6 19 159151 121910 22.84 7 12 20 159151 121910 19.27 4 10 21 159151 121910 21.31 5 14 22 159151 121910 6.41 1 2 23 159151 121910 17.57 2 6 24 159151 121910 17.40 6 11 25 159151 121910 17.29 2 6 26 159151 121910 7.22 0 2 27 159151 121910 12.09 1 2 28 159151 121910 17.25 3 6 29 159151 122072 2.28 0 0 30 159151 122072 16.87 1 5 31 159151 122072 6.60 1 3 32 159151 122072 19.73 3 10 33 159151 122072 6.31 0 2 34 159151 122072 13.25 1 3 35 159151 122072 26.25 11 21 36 159151 122072 6.08 2 3 37 159151 122072 28.71 5 17 38 159151 122072 11.20 1 4 39 159151 122072 17.54 2 9 40 159151 122072 5.17 1 2
the important things note dataframe each game_id corresponds 2 team_ids, 2 teams played in game. each game_id unique 1 game of basketball. each row corresponds stats player on team_ids team in game. example above has 2 games / 4 teams / 40 players, full dataframe has hundreds of games, each team_id showing many times.
my first aggregation, able do, aggregate team_id. code got job done me first aggregation:
boxscore_stats_aggregated = aggregate(boxscore_stats, = list(boxscore_stats[, 2]), fun = sum)
which straightforward. team_id, had aggregated of minutes played, of fieldgoalsmade, etc. next aggregation though, need aggregate team_id again instead of aggregating team own rows / stats, instead need aggregate rows / stats of opponents. answers question "for team, how many fieldsgoalsmade did allow in total opponents, etc." in case, team_id = 116975, want aggregate rows team_id 120310. of course next time team_id 116975 appears in dataframe in new game, playing different opponent, aggregation not simple aggregating team_id 120310.
i think should able use relationship between 2 team_ids being unique unique game_ids make aggregation possible, struggling how implemented.
thanks!
here approach using data.table:
(1) read in data:
# load package library(data.table) # load data boxscore_stats <- fread("row game_id team_id minutes_played fieldgoalsmade fieldgoalattempts 1 157046 116975 18.76 1 8 2 157046 116975 14.63 1 6 3 157046 116975 8.00 4 7 4 157046 116975 16.69 1 2 5 157046 116975 24.62 2 9 6 157046 116975 32.00 7 16 7 157046 116975 12.79 1 3 8 157046 116975 5.28 1 1 9 157046 116975 3.22 1 2 10 157046 120310 24.35 4 12 11 157046 120310 10.18 0 4 12 157046 120310 20.65 3 12 13 157046 120310 9.59 1 3 14 157046 120310 25.08 3 11 15 157046 120310 14.12 0 4 16 157046 120310 17.46 6 9 17 157046 120310 23.15 7 13 18 157046 120310 15.43 1 6 19 159151 121910 22.84 7 12 20 159151 121910 19.27 4 10 21 159151 121910 21.31 5 14 22 159151 121910 6.41 1 2 23 159151 121910 17.57 2 6 24 159151 121910 17.40 6 11 25 159151 121910 17.29 2 6 26 159151 121910 7.22 0 2 27 159151 121910 12.09 1 2 28 159151 121910 17.25 3 6 29 159151 122072 2.28 0 0 30 159151 122072 16.87 1 5 31 159151 122072 6.60 1 3 32 159151 122072 19.73 3 10 33 159151 122072 6.31 0 2 34 159151 122072 13.25 1 3 35 159151 122072 26.25 11 21 36 159151 122072 6.08 2 3 37 159151 122072 28.71 5 17 38 159151 122072 11.20 1 4 39 159151 122072 17.54 2 9 40 159151 122072 5.17 1 2 ")
(2) proceed actual calculations:
# aggregate on team-and game level (data.table style) boxscore_stats_aggregated <- boxscore_stats[, lapply(.sd, sum), = list(game_id, team_id)] # match every team each opponent, i.e. still 2 rows per game # columns opponent's performance added. # teams drops out in dummy data opponent data missing. merge(boxscore_stats_aggregated, boxscore_stats_aggregated, by="game_id", suffixes = c("", ".opponent"))[team_id!=team_id.opponent,]
output
looks that:
# > output # game_id team_id row minutes_played fieldgoalsmade fieldgoalattempts team_id.opponent row.opponent minutes_played.opponent fieldgoalsmade.opponent fieldgoalattempts.opponent # 1: 1413414 116975 45 135.99 19 54 120310 126 160.01 25 74 # 2: 1413414 120310 126 160.01 25 74 116975 45 135.99 19 54
Comments
Post a Comment