I have two data frames, as follows:
import pandas as pd

a = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                  "date": ["06/22/2014", "07/02/2014", "01/01/2015", "01/01/1991", "08/02/1999"]})
b = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                  "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
                  "value": ["3", "5", "1", "7", "8"]})
which look like the following:
>>> a
  id       date
0  a 2014-06-22
1  a 2014-07-02
2  c 2015-01-01
3  b 1991-01-01
4  b 1999-08-02
>>> b
  id       date value
0  a 2015-02-15     3
1  a 2014-06-30     5
2  c 1999-07-02     1
3  b 1990-10-05     7
4  b 2014-06-24     8
I want to merge the values of b into a using the nearest date. In this example, none of the dates match, but it could be the case that some do.
The output should look like this:
>>> c
  id        date value
0  a  06/22/2014     8
1  a  07/02/2014     5
2  c  01/01/2015     3
3  b  01/01/1991     7
4  b  08/02/1999     1
It seems to me there should be a native function in pandas that would allow this.
Note: a similar question has been asked here: pandas.merge: match nearest time stamp >= series of timestamps
You can use reindex with method='nearest' and then merge:
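# convert the date strings to datetimes so "nearest" is computed on real dates
a['date'] = pd.to_datetime(a.date)
b['date'] = pd.to_datetime(b.date)

# reindex with a method needs a sorted (monotonic) index
a.sort_values('date', inplace=True)
b.sort_values('date', inplace=True)

# for each date in a, pick the row of b whose date is nearest
b1 = b.set_index('date').reindex(a.set_index('date').index, method='nearest').reset_index()
print (b1)
print (pd.merge(a,b1, on='date'))

  id_x       date id_y value
0    b 1991-01-01    b     7
1    b 1999-08-02    c     1
2    a 2014-06-22    b     8
3    a 2014-07-02    a     5
4    c 2015-01-01    a     3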
You can also add the parameter suffixes to control how the two overlapping id columns are named:
print (pd.merge(a,b1, on='date', suffixes=('_', '')))

  id_       date id value
0   b 1991-01-01  b     7
1   b 1999-08-02  c     1
2   a 2014-06-22  b     8
3   a 2014-07-02  a     5
4   c 2015-01-01  a     3
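As for a native function: if your pandas version has pd.merge_asof with the direction argument, it may do the same job in one call. The following is only a sketch under that assumption, not part of the approach above. The names a_sorted, b_sorted and matched_date are made up for illustration; matched_date just keeps b's own date so you can see which row was picked.

```python
import pandas as pd

a = pd.DataFrame({
    "id": ["a", "a", "c", "b", "b"],
    "date": ["06/22/2014", "07/02/2014", "01/01/2015", "01/01/1991", "08/02/1999"],
})
b = pd.DataFrame({
    "id": ["a", "a", "c", "b", "b"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "value": ["3", "5", "1", "7", "8"],
})

# parse the month/day/year strings into real datetimes
a["date"] = pd.to_datetime(a["date"], format="%m/%d/%Y")
b["date"] = pd.to_datetime(b["date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by the key column
a_sorted = a.sort_values("date")
b_sorted = b.sort_values("date")

# hypothetical helper column: keep b's own date next to the matched value
b_sorted = b_sorted.assign(matched_date=b_sorted["date"])

# direction="nearest" matches each date in a to the closest date in b,
# whether it falls before or after
c = pd.merge_asof(a_sorted, b_sorted, on="date",
                  direction="nearest", suffixes=("", "_b"))
print(c)
```

The result is ordered by date rather than by the original row order of a, the same as in the reindex approach. Unlike reindex, merge_asof should not need the dates in b to be unique, but it does require both frames to be sorted on the key.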