Currently, I'm using the following loop, which uses an if condition to assign each month its numeric equivalent. It seems quite efficient in terms of runtime, but it is too manual and ugly for my preferences.
How could this be done better? I imagine it's possible to improve on it by simplifying or condensing the multiple if conditions somehow, or by using some sort of translator made for date conversions. Which of these would be preferable?
# make numeric month
combined = combined.sort_values('month')
combined.index = range(len(combined))
combined['month_numeric'] = None
for i in combined['month'].unique():
    first = combined['month'].searchsorted(i, side='left')
    last = combined['month'].searchsorted(i, side='right')
    first_num = list(first)[0] # gives first instance
    last_num = list(last)[0] # gives last instance
    if i == 'january':
        combined['month_numeric'][first_num:last_num] = "01"
    elif i == 'february':
        combined['month_numeric'][first_num:last_num] = "02"
    elif i == 'march':
        combined['month_numeric'][first_num:last_num] = "03"
    elif i == 'april':
        combined['month_numeric'][first_num:last_num] = "04"
    elif i == 'may':
        combined['month_numeric'][first_num:last_num] = "05"
    elif i == 'june':
        combined['month_numeric'][first_num:last_num] = "06"
    elif i == 'july':
        combined['month_numeric'][first_num:last_num] = "07"
    elif i == 'august':
        combined['month_numeric'][first_num:last_num] = "08"
    elif i == 'september':
        combined['month_numeric'][first_num:last_num] = "09"
    elif i == 'october':
        combined['month_numeric'][first_num:last_num] = "10"
    elif i == 'november':
        combined['month_numeric'][first_num:last_num] = "11"
    elif i == 'december':
        combined['month_numeric'][first_num:last_num] = "12"
You can use to_datetime with the full month name format %B, take the month with dt.month, convert it to string and pad it with zfill:
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
Sample:
import pandas as pd
df = pd.DataFrame({'month': ['january', 'february', 'december']})
print (df)
      month
0   january
1  february
2  december
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
0    01
1    02
2    12
Name: month, dtype: object
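A possible variation on the to_datetime approach, not part of the original answer: after parsing with %B, dt.strftime('%m') formats the month directly as a zero-padded two-digit string, which replaces the astype(str).str.zfill(2) step. A minimal sketch, assuming full English month names as in the sample:

```python
import pandas as pd

# Same sample data as above: full month names, lowercase.
df = pd.DataFrame({'month': ['january', 'february', 'december']})

# %B parses the full month name; %m writes the month back out as a
# zero-padded two-digit string, so no astype(str)/zfill step is needed.
month_numeric = pd.to_datetime(df['month'], format='%B').dt.strftime('%m')
print(month_numeric)
```

The parsed dates all fall in the default year 1900, which is harmless here because only the month is kept.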
Another solution is map with a dict d:
d = {'january':'01','february':'02','december':'12'}
print (df['month'].map(d))
0    01
1    02
2    12
Name: month, dtype: object
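The dict in the map solution only lists the three sample months. A minimal sketch that builds the full twelve-month dict instead of typing it out, assuming English month names (calendar.month_name follows the current locale, which is English unless the program changes it). The str.lower() call is an addition so that capitalized input such as 'February' also matches:

```python
import calendar
import pandas as pd

df = pd.DataFrame({'month': ['january', 'February', 'december']})

# calendar.month_name is ['', 'January', ..., 'December']; skip the empty
# first entry and build {'january': '01', ..., 'december': '12'}.
d = {name.lower(): f'{i:02d}' for i, name in enumerate(calendar.month_name) if name}

# Lowercase the column first so the lookup does not depend on capitalization.
result = df['month'].str.lower().map(d)
print(result)
```

Any value not found in the dict (for example a misspelled month) becomes NaN with map, so bad input can be found afterwards with result.isna().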
Timings:
df = pd.DataFrame({'month': ['january','february', 'december']})
print (df)
df = pd.concat([df]*1000).reset_index(drop=True)
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
print (df['month'].map({'january':'01','february':'02','december':'12'}))

In [200]: %timeit (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
100 loops, best of 3: 13.5 ms per loop

In [201]: %timeit (df['month'].map({'january':'01','february':'02','december':'12'}))
1000 loops, best of 3: 462 µs per loop
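The %timeit numbers above come from IPython. To rerun a similar comparison in a plain Python script, here is a minimal sketch using the standard timeit module. The function names with_to_datetime and with_map are hypothetical, and absolute timings will depend on the machine and pandas version:

```python
import timeit
import pandas as pd

# Same setup as the timings above: 3 month names repeated 1000 times.
df = pd.DataFrame({'month': ['january', 'february', 'december']})
df = pd.concat([df] * 1000).reset_index(drop=True)

d = {'january': '01', 'february': '02', 'december': '12'}

def with_to_datetime():
    return pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2)

def with_map():
    return df['month'].map(d)

# Check that both approaches give the same strings before comparing speed.
assert with_to_datetime().tolist() == with_map().tolist()

for name, func in [('to_datetime', with_to_datetime), ('map', with_map)]:
    # Best of 3 runs, each run calling the function 20 times.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f'{name}: {best * 1e3:.3f} ms per call')
```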