I am analyzing clinical data and trying to filter out information in one dataframe based on information in another dataframe.
One of the dataframes lists the dates on which patients come in for treatment:
    import pandas as pd

    dftreatments = pd.DataFrame({'patientid': [4,4,4,9,9,9,11,11,11],
                                 'treatmentdate': ['2016-01-01', '2016-01-15', '2016-03-25',
                                                   '2016-01-01', '2016-01-15', '2016-01-29',
                                                   '2016-01-01', '2016-03-15', '2016-03-25']})
    dftreatments['treatmentdate'] = pd.to_datetime(dftreatments['treatmentdate'])

       patientid treatmentdate
    0          4    2016-01-01
    1          4    2016-01-15
    2          4    2016-03-25
    3          9    2016-01-01
    4          9    2016-01-15
    5          9    2016-01-29
    6         11    2016-01-01
    7         11    2016-03-15
    8         11    2016-03-25
The other dataframe lists the dates on which patients visit the hospital for a complication:
    dfhospitalvisits = pd.DataFrame({'patientid': [4,4,9,11],
                                     'hospitalvisitdate': ['2016-01-14', '2016-03-10',
                                                           '2016-01-28', '2016-01-03']})
    dfhospitalvisits['hospitalvisitdate'] = pd.to_datetime(dfhospitalvisits['hospitalvisitdate'])

      hospitalvisitdate  patientid
    0        2016-01-14          4
    1        2016-03-10          4
    2        2016-01-28          9
    3        2016-01-03         11
In our study, we want to exclude hospital visits from our analysis if the patient does not receive treatment for 20 days. We start excluding them at the last treatment before the 20-day gap. E.g.: exclude hospital visits for patient 4 after 2016-01-15.
In this example, patient 4's 2nd hospital visit and patient 11's hospital visit would be removed from dfhospitalvisits.
Edit: @Merlin, so far I have used

    dftreatments.groupby('patientid')['treatmentdate'].diff()

which gives me the gaps in treatment dates grouped by patient. The part where I am stuck is that I don't know how to use a difference in treatment dates of more than 20 days to filter out values in dfhospitalvisits.
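To make the intermediate step concrete, here is a minimal sketch of what the grouped `diff()` produces on the sample data and how the gaps can be turned into a boolean flag. The column name `long_gap` is introduced here purely for illustration; it does not appear in the original post.

```python
import pandas as pd

dftreatments = pd.DataFrame({
    'patientid': [4, 4, 4, 9, 9, 9, 11, 11, 11],
    'treatmentdate': pd.to_datetime([
        '2016-01-01', '2016-01-15', '2016-03-25',
        '2016-01-01', '2016-01-15', '2016-01-29',
        '2016-01-01', '2016-03-15', '2016-03-25',
    ]),
})

# Sort so that diff() compares each treatment with the previous one
# for the same patient (NaT for each patient's first treatment).
dftreatments = dftreatments.sort_values(['patientid', 'treatmentdate'])
gap = dftreatments.groupby('patientid')['treatmentdate'].diff()

# True where a treatment follows a gap of more than 20 days.
# 'long_gap' is a hypothetical column name used only in this sketch.
dftreatments['long_gap'] = gap > pd.Timedelta(days=20)
print(dftreatments)
```

Note that the flag marks the treatment that ends a long gap, whereas the exclusion rule in the question starts at the treatment that begins it, so the flag alone is not yet enough to filter dfhospitalvisits.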
I suggest the following:

    # make a sorted dataframe to calculate the time gap before the next treatment
    dftreatments_sorted = dftreatments.sort_values(['patientid', 'treatmentdate'], ascending=False)
    # calculate the time gap before the next treatment
    # (negative here, because the rows are sorted in descending order)
    df_diff = dftreatments_sorted.groupby('patientid').treatmentdate.diff(periods=1).rename('gap_before_next_treatment')
    # add the time gaps as a new column to the existing dftreatments dataframe
    dftreatments = pd.concat([dftreatments, -df_diff], axis=1, join='inner').sort_index()
    # join dftreatments and dfhospitalvisits as a new dataframe (df)
    df = dfhospitalvisits.set_index('patientid').join(dftreatments.set_index('patientid'))
    # select only the combinations with a treatmentdate before the corresponding hospitalvisitdate
    df = df[(df.hospitalvisitdate > df.treatmentdate)]
    # only the treatmentdate that is the latest before the hospitalvisitdate is important
    df = df.reset_index().groupby(['patientid', 'hospitalvisitdate']).max()
    # now we can filter the hospital visits given the calculated time gap
    df = df[df.gap_before_next_treatment < pd.Timedelta('20 days')].reset_index()[['patientid', 'hospitalvisitdate']]
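As a cross-check, the same rule can also be sketched by computing one cutoff date per patient and comparing each visit against it. This is an alternative formulation, not the approach above. The names `max_gap`, `gap_to_next`, `cutoff` and `result` are hypothetical, and the sketch assumes that a visit on the cutoff date itself is kept and that patients without any long gap keep all their visits.

```python
import pandas as pd

dftreatments = pd.DataFrame({
    'patientid': [4, 4, 4, 9, 9, 9, 11, 11, 11],
    'treatmentdate': pd.to_datetime([
        '2016-01-01', '2016-01-15', '2016-03-25',
        '2016-01-01', '2016-01-15', '2016-01-29',
        '2016-01-01', '2016-03-15', '2016-03-25',
    ]),
})
dfhospitalvisits = pd.DataFrame({
    'patientid': [4, 4, 9, 11],
    'hospitalvisitdate': pd.to_datetime(
        ['2016-01-14', '2016-03-10', '2016-01-28', '2016-01-03']),
})

max_gap = pd.Timedelta(days=20)  # threshold taken from the question

# Time from each treatment to the same patient's next treatment
# (NaT for each patient's last treatment).
tr = dftreatments.sort_values(['patientid', 'treatmentdate'])
next_date = tr.groupby('patientid')['treatmentdate'].shift(-1)
tr = tr.assign(gap_to_next=next_date - tr['treatmentdate'])

# Cutoff per patient: the earliest treatment followed by a gap longer than max_gap.
cutoff = tr[tr['gap_to_next'] > max_gap].groupby('patientid')['treatmentdate'].min()

# Look up each visit's cutoff; patients without a long gap get NaT.
visit_cutoff = dfhospitalvisits['patientid'].map(cutoff)

# Keep visits with no cutoff, or on/before the cutoff (boundary is an assumption).
keep = visit_cutoff.isna() | (dfhospitalvisits['hospitalvisitdate'] <= visit_cutoff)
result = dfhospitalvisits[keep].reset_index(drop=True)
print(result)
```

On the sample data this keeps patient 4's visit on 2016-01-14 and patient 9's visit on 2016-01-28, and drops patient 4's second visit and patient 11's visit, which matches the outcome described in the question.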