Difficult aggregation in R, grouping team basketball stats -


thanks in advance or suggestions on this. here shortened example of dataframe working with.

boxscore_stats = structure(list(game_id = c(157046l, 157046l, 157046l, 157046l,  157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l,  157046l, 157046l, 157046l, 157046l, 157046l, 157046l, 157046l,  159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l,  159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l,  159151l, 159151l, 159151l, 159151l, 159151l, 159151l, 159151l,  159151l), team_id = c(116975, 116975, 116975, 116975, 116975,  116975, 116975, 116975, 116975, 120310, 120310, 120310, 120310,  120310, 120310, 120310, 120310, 120310, 121910, 121910, 121910,  121910, 121910, 121910, 121910, 121910, 121910, 121910, 122072,  122072, 122072, 122072, 122072, 122072, 122072, 122072, 122072,  122072, 122072, 122072), minutes_played = c(18.76, 14.63, 8,  16.69, 24.62, 32, 12.79, 5.28, 3.22, 24.35, 10.18, 20.65, 9.59,  25.08, 14.12, 17.46, 23.15, 15.43, 22.84, 19.27, 21.31, 6.41,  17.57, 17.4, 17.29, 7.22, 12.09, 17.25, 2.28, 16.87, 6.6, 19.73,  6.31, 13.25, 26.25, 6.08, 28.71, 11.2, 17.54, 5.17), fieldgoalsmade = c(1l,  1l, 4l, 1l, 2l, 7l, 1l, 1l, 1l, 4l, 0l, 3l, 1l, 3l, 0l, 6l, 7l,  1l, 7l, 4l, 5l, 1l, 2l, 6l, 2l, 0l, 1l, 3l, 0l, 1l, 1l, 3l, 0l,  1l, 11l, 2l, 5l, 1l, 2l, 1l), fieldgoalattempts = c(8l, 6l, 7l,  2l, 9l, 16l, 3l, 1l, 2l, 12l, 4l, 12l, 3l, 11l, 4l, 9l, 13l,  6l, 12l, 10l, 14l, 2l, 6l, 11l, 6l, 2l, 2l, 6l, 0l, 5l, 3l, 10l,  2l, 3l, 21l, 3l, 17l, 4l, 9l, 2l)), .names = c("game_id", "team_id",  "minutes_played", "fieldgoalsmade", "fieldgoalattempts"), row.names = c(na,  40l), class = "data.frame")   head(boxscore_stats)     game_id team_id minutes_played fieldgoalsmade fieldgoalattempts 1   157046  116975          18.76              1                 8 2   157046  116975          14.63              1                 6 3   157046  116975           8.00              4                 7 4   157046  116975          16.69              1                 2 5   157046  116975          24.62              2                 9 6   157046  116975          32.00              7                16 7   157046  116975          12.79              1                 3 8   157046  116975           5.28              1                 1 9   157046  116975           3.22              1                 2 10  157046  120310          24.35              4                12 11  157046  120310          10.18              0                 4 12  157046  120310          20.65              3                12 13  157046  120310           9.59              1                 3 14  157046  120310          25.08              3                11 15  157046  120310          14.12              0                 4 16  157046  120310          17.46              6                 9 17  157046  120310          23.15              7                13 18  157046  120310          15.43              1                 6 19  159151  121910          22.84              7                12 20  159151  121910          19.27              4                10 21  159151  121910          21.31              5                14 22  159151  121910           6.41              1                 2 23  159151  121910          17.57              2                 6 24  159151  121910          17.40              6                11 25  159151  121910          17.29              2                 6 26  159151  121910           7.22              0                 2 27  159151  121910          12.09              1                 2 28  159151  121910          17.25              3                 6 29  159151  122072           2.28              0                 0 30  159151  122072          16.87              1                 5 31  159151  122072           6.60              1                 3 32  159151  122072          19.73              3                10 33  159151  122072           6.31              0                 2 34  159151  122072          13.25              1                 3 35  159151  122072          26.25             11                21 36  159151  122072           6.08              2                 3 37  159151  122072          28.71              5                17 38  159151  122072          11.20              1                 4 39  159151  122072          17.54              2                 9 40  159151  122072           5.17              1                 2 

the important things note dataframe each game_id corresponds 2 team_ids, 2 teams played in game. each game_id unique 1 game of basketball. each row corresponds stats player on team_ids team in game. example above has 2 games / 4 teams / 40 players, full dataframe has hundreds of games, each team_id showing many times.

my first aggregation, able do, aggregate team_id. code got job done me first aggregation:

boxscore_stats_aggregated = aggregate(boxscore_stats, = list(boxscore_stats[, 2]), fun = sum) 

which straightforward. team_id, had aggregated of minutes played, of fieldgoalsmade, etc. next aggregation though, need aggregate team_id again instead of aggregating team own rows / stats, instead need aggregate rows / stats of opponents. answers question "for team, how many fieldsgoalsmade did allow in total opponents, etc." in case, team_id = 116975, want aggregate rows team_id 120310. of course next time team_id 116975 appears in dataframe in new game, playing different opponent, aggregation not simple aggregating team_id 120310.

i think should able use relationship between 2 team_ids being unique unique game_ids make aggregation possible, struggling how implemented.

thanks!

here approach using data.table:

(1) read in data:

# load package library(data.table)  # load data boxscore_stats <- fread("row game_id team_id minutes_played fieldgoalsmade fieldgoalattempts 1   157046  116975          18.76              1                 8            2   157046  116975          14.63              1                 6            3   157046  116975           8.00              4                 7            4   157046  116975          16.69              1                 2            5   157046  116975          24.62              2                 9            6   157046  116975          32.00              7                16            7   157046  116975          12.79              1                 3            8   157046  116975           5.28              1                 1            9   157046  116975           3.22              1                 2            10  157046  120310          24.35              4                12            11  157046  120310          10.18              0                 4            12  157046  120310          20.65              3                12            13  157046  120310           9.59              1                 3            14  157046  120310          25.08              3                11            15  157046  120310          14.12              0                 4            16  157046  120310          17.46              6                 9            17  157046  120310          23.15              7                13            18  157046  120310          15.43              1                 6            19  159151  121910          22.84              7                12            20  159151  121910          19.27              4                10            21  159151  121910          21.31              5                14            22  159151  121910           6.41              1                 2            23  159151  121910          17.57              2                 6            24  159151  121910          17.40              6                11            25  159151  121910          17.29              2                 6            26  159151  121910           7.22              0                 2            27  159151  121910          12.09              1                 2            28  159151  121910          17.25              3                 6            29  159151  122072           2.28              0                 0            30  159151  122072          16.87              1                 5            31  159151  122072           6.60              1                 3            32  159151  122072          19.73              3                10            33  159151  122072           6.31              0                 2            34  159151  122072          13.25              1                 3            35  159151  122072          26.25             11                21            36  159151  122072           6.08              2                 3            37  159151  122072          28.71              5                17            38  159151  122072          11.20              1                 4            39  159151  122072          17.54              2                 9            40  159151  122072           5.17              1                 2            ") 

(2) proceed actual calculations:

# aggregate on team-and game level (data.table style) boxscore_stats_aggregated  <-  boxscore_stats[, lapply(.sd, sum), = list(game_id, team_id)]   # match every team each opponent, i.e. still 2 rows per game # columns opponent's performance added. # teams drops out in dummy data opponent data missing. merge(boxscore_stats_aggregated, boxscore_stats_aggregated,        by="game_id", suffixes = c("", ".opponent"))[team_id!=team_id.opponent,] 

output looks that:

# > output #    game_id team_id row minutes_played fieldgoalsmade fieldgoalattempts team_id.opponent row.opponent minutes_played.opponent fieldgoalsmade.opponent fieldgoalattempts.opponent # 1: 1413414  116975  45         135.99             19                54           120310          126                  160.01                      25                         74 # 2: 1413414  120310 126         160.01             25                74           116975           45                  135.99                      19                         54 

Comments