I have a pandas DataFrame with two date columns. I want to take the difference in days between them. The resulting difference looks like a string (e.g. '7 days'). Is there a way to change it to an integer number of days?
    y['datepulled'] = pd.to_datetime(y['datepulled'])
    y['dates'] = pd.to_datetime(y['dates'])
    y['datediff'] = y['datepulled'] - y['dates']

    y['datediff']
    0   7 days
    1   6 days
    2   5 days
    3   4 days
    4   3 days
    5   2 days
    6   1 days
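The column is not a string but a timedelta; '7 days' is only how it is displayed. To get the number of days you can use:

    (y['datediff'] / np.timedelta64(1, 'D')).astype(int)

Or:

    y['datediff'].dt.days

Sample: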
    import pandas as pd
    import numpy as np

    y = pd.DataFrame({
        'dates': ['2016-01-01', '2016-01-02'],
        'datepulled': ['2016-01-05', '2016-01-04']})

    y['datepulled'] = pd.to_datetime(y['datepulled'])
    y['dates'] = pd.to_datetime(y['dates'])
    y['datediff'] = y['datepulled'] - y['dates']
    print(y)

    # division returns float, so cast to int
    y['datediff1'] = (y['datediff'] / np.timedelta64(1, 'D')).astype(int)
    # .dt.days returns the whole-day component directly
    y['datediff2'] = y['datediff'].dt.days
    print(y)

           dates datepulled datediff  datediff1  datediff2
    0 2016-01-01 2016-01-05   4 days          4          4
    1 2016-01-02 2016-01-04   2 days          2          2
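The sample above only has whole-day differences and no missing values. As a rough sketch of how the two approaches behave otherwise, the snippet below uses made-up data (the 12:00 timestamp and the missing `datepulled` value are assumptions for illustration, not part of the question). Division keeps the fractional part of a day, while `.dt.days` returns only the whole-day component. With a missing value both results are float, so a plain `.astype(int)` would be expected to fail; the nullable `'Int64'` dtype is one way around that.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one difference includes half a day, one value is missing.
df = pd.DataFrame({
    'datepulled': pd.to_datetime(['2016-01-05 12:00', '2016-01-04 00:00', None]),
    'dates': pd.to_datetime(['2016-01-01 00:00', '2016-01-02 00:00', '2016-01-01 00:00']),
})
df['datediff'] = df['datepulled'] - df['dates']

# Division keeps fractions of a day; the missing difference becomes NaN.
fractional = df['datediff'] / np.timedelta64(1, 'D')

# .dt.days keeps only the whole-day component; with a missing value
# present the result is float, with NaN in the missing row.
whole = df['datediff'].dt.days

# The nullable integer dtype stores integers alongside missing values.
whole_int = whole.astype('Int64')

print(fractional)
print(whole_int)
```

Which one is appropriate depends on whether partial days matter for the analysis.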
In a larger DataFrame the first method was faster in these timings:

    y = pd.concat([y]*1000).reset_index(drop=True)

    In [236]: %timeit (y['datediff'] / np.timedelta64(1, 'D')).astype(int)
    1000 loops, best of 3: 789 µs per loop

    In [237]: %timeit y['datediff'].dt.days
    100 loops, best of 3: 15.3 ms per loop
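Timings like these depend on the pandas and NumPy versions and on the machine, so it may be worth re-measuring locally. A minimal sketch using the standard-library `timeit` module outside IPython follows; the helper names `by_division` and `by_accessor` are made up for this example.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the sample frame and enlarge it to 2000 rows, as in the answer.
y = pd.DataFrame({
    'datepulled': pd.to_datetime(['2016-01-05', '2016-01-04']),
    'dates': pd.to_datetime(['2016-01-01', '2016-01-02']),
})
y['datediff'] = y['datepulled'] - y['dates']
y = pd.concat([y] * 1000).reset_index(drop=True)


def by_division():
    # Divide by a one-day timedelta, then cast the float result to int.
    return (y['datediff'] / np.timedelta64(1, 'D')).astype(int)


def by_accessor():
    # Read the whole-day component through the .dt accessor.
    return y['datediff'].dt.days


runs = 100
t_div = timeit.timeit(by_division, number=runs)
t_acc = timeit.timeit(by_accessor, number=runs)

print(f"division: {t_div / runs * 1e6:.0f} µs per call")
print(f"dt.days:  {t_acc / runs * 1e6:.0f} µs per call")
```

Both functions return the same values for this data, so the choice can be made on measured speed and readability.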