r - Filter within groups where x first exceeds y -


i have database consisting of

  • neighborhood ids home neighborhoods (id_h),
  • block ids home blocks (blk_h,
  • a sub-geography of neighborhoods),
  • work blocks (blk_w),
  • the flow of commuters between 2 (flow),
  • the median commuter per home neighborhood (med_c), and
  • the cumulative worker flow home neighborhood (cumflow).

the data sorted distance between blk_h , blk_w (descending), grouped id_h. need subset data extract case each home neighborhood cumflow first equals or exceeds med_c.

i've tried variety of dplyr functions , cannot work. here's example:

df <- data.frame(   id_h=c("a","a","a","a","b","b","b"),   blk_h=c("a1","a1","a2","a2","b1","b2","b2"),   blk_w=c("w1","w2","w3","w3","w1","w2","w2"),   dist=c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),   flow=c(3,6,3,7,5,4,2),   cumflow=c(3,9,12,19,5,9,11),   med_c=c(10,10,10,10,6,6,6) ) df 

i need return table this:

id_h  blk_h  blk_w  dist  flow  cumflow  med_c     a2     w3     7.0   3     12       10 b     b2     w2     6.5   4     9        6 

and here of things i've tried make happen: attempt #1

library(dplyr) df.g <- group_by(df, id_h)  df.g2 <- filter(df.g, cumflow == which.min(cumflow >= med_c)) 

attempt #2

library(data.table) setdt(df)[, .sd[which.min(cumcount >= med_c)], = id_h] 

attempt #3

library(dplyr) test <- df %>% group_by(id_h) %>% filter(min(cumflow) >= med_c) 

i think misunderstanding how use which.min function. advice appreciated.

two filter calls can solve this.

using group_by work within each id_h, first filter returns data.frame rows cumflow greater or equal med_c. second filter returns, within each id_h, row lowest cumflow. works because data sorted. make work more robust may consider adding call arrange after call group_by.

library(dplyr)  df <- data.frame(   id_h    = c("a","a","a","a","b","b","b"),   blk_h   = c("a1","a1","a2","a2","b1","b2","b2"),   blk_w   = c("w1","w2","w3","w3","w1","w2","w2"),   dist    = c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),   flow    = c(3,6,3,7,5,4,2),   cumflow = c(3,9,12,19,5,9,11),   med_c   = c(10,10,10,10,6,6,6) ) df  df %>% group_by(id_h) %>% filter(cumflow >= med_c) %>% filter(cumflow == min(cumflow)) 

Comments