python - Merge dataframes on nearest datetime / timestamp


I have two data frames, as follows:

import pandas as pd

a = pd.DataFrame({"id":["a", "a", "c" ,"b", "b"], "date":["06/22/2014","07/02/2014","01/01/2015","01/01/1991","08/02/1999"]})
b = pd.DataFrame({"id":["a", "a", "c" ,"b", "b"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"], "value": ["3","5","1","7","8"] })

which look like the following:

>>> a
  id       date
0  a 2014-06-22
1  a 2014-07-02
2  c 2015-01-01
3  b 1991-01-01
4  b 1999-08-02

>>> b
  id       date value
0  a 2015-02-15     3
1  a 2014-06-30     5
2  c 1999-07-02     1
3  b 1990-10-05     7
4  b 2014-06-24     8

I want to merge the values of b into a using the nearest date. In this example, none of the dates match, but it could be the case that some do.

The output should be something like this:

>>> c
  id        date value
0  a  06/22/2014     8
1  a  07/02/2014     5
2  c  01/01/2015     3
3  b  01/01/1991     7
4  b  08/02/1999     1

It seems to me that there should be a native function in pandas that would allow this.

Note: a similar question has been asked here: pandas.merge: match nearest time stamp >= series of timestamps

You can use reindex with method='nearest' and then merge:

a['date'] = pd.to_datetime(a.date)
b['date'] = pd.to_datetime(b.date)
# reindex with method='nearest' needs a sorted (monotonic) index
a.sort_values('date', inplace=True)
b.sort_values('date', inplace=True)

# for every date in a, pick the row of b whose date is closest
b1 = b.set_index('date').reindex(a.set_index('date').index, method='nearest').reset_index()
print (b1)

print (pd.merge(a,b1, on='date'))

  id_x       date id_y value
0    b 1991-01-01    b     7
1    b 1999-08-02    c     1
2    a 2014-06-22    b     8
3    a 2014-07-02    a     5
4    c 2015-01-01    a     3
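To see what the reindex step does on its own, here is a minimal sketch on a toy Series. The data and the names lookup, wanted and nearest are made up for illustration and are not part of the question. It assumes the index is sorted and has no duplicate dates, which reindex with a fill method requires.

```python
import pandas as pd

# Toy lookup table (hypothetical data): values indexed by sorted, unique dates.
lookup = pd.Series(
    ["x", "y", "z"],
    index=pd.to_datetime(["2014-01-01", "2014-06-01", "2015-01-01"]),
)

# Dates to look up; none of them is present in the index.
wanted = pd.to_datetime(["2014-02-01", "2014-05-20", "2014-12-31"])

# For each wanted date, take the value stored at the closest index label.
nearest = lookup.reindex(wanted, method="nearest")
print(nearest.tolist())
```

reindex also accepts a tolerance argument (for example a pd.Timedelta) if matches that are too far away should be left as NaN instead.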

You can also add the suffixes parameter to control the names of the overlapping id columns:

print (pd.merge(a,b1, on='date', suffixes=('_', '')))

  id_       date id value
0   b 1991-01-01  b     7
1   b 1999-08-02  c     1
2   a 2014-06-22  b     8
3   a 2014-07-02  a     5
4   c 2015-01-01  a     3
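As for a native function: newer pandas versions have pd.merge_asof, which can match on the nearest key with direction='nearest'. The following is only a sketch of that alternative, not the method used above. It assumes a pandas version that supports the direction argument, and the names a_sorted and b_sorted are introduced here for illustration. Only the value column of b is carried over, so no suffixes are needed.

```python
import pandas as pd

a = pd.DataFrame({
    "id": ["a", "a", "c", "b", "b"],
    "date": ["06/22/2014", "07/02/2014", "01/01/2015", "01/01/1991", "08/02/1999"],
})
b = pd.DataFrame({
    "id": ["a", "a", "c", "b", "b"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "value": ["3", "5", "1", "7", "8"],
})

a["date"] = pd.to_datetime(a["date"], format="%m/%d/%Y")
b["date"] = pd.to_datetime(b["date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by the key column.
a_sorted = a.sort_values("date")
b_sorted = b.sort_values("date")

# Match each row of a to the row of b with the closest date (earlier or later).
c = pd.merge_asof(
    a_sorted,
    b_sorted[["date", "value"]],
    on="date",
    direction="nearest",
)
print(c)
```

The rows of the result follow the sorted order of a, not the original order. merge_asof also takes a by argument if the nearest date should be searched only within rows sharing the same id, which is not what the expected output in the question asks for.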
