Python: How to extract a specified string within [ ] brackets in a pandas DataFrame and create new columns with boolean values
I'm new to programming and would appreciate any insights!
I have a DataFrame like this.
df:
                         info  price
0               [100:sailing]   $100
1  [150:boating, 100:sailing]   $200
2               [200:surfing]   $300
I want to create new columns named after the activities found in the info column, and put 1 in a new column if the corresponding name appears in the info column for that row (0 otherwise). The goal is the DataFrame below.
   price  sailing  boating  surfing
0   $100        1        0        0
1   $200        1        1        0
2   $300        0        0        1
I tried the code below, but it did not work (even though the same approach works on other columns).
df1 = df.info.str.extract(r'(boating|sailing|surfing)', expand=False)
df2 = pd.concat([df, pd.get_dummies(df1).astype(int)], axis=1)
I have over ten thousand rows, so ideally I would write code that automatically extracts a specified string (like surfing) from the info column, creates a new column named after the activity, and returns 1 or 0 as shown above. I thought the brackets in the data or the data type of the column might be causing the problem, but I'm not sure how to tackle this.
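Two things probably explain why the attempt above fails. First, `df.info` resolves to the built-in `DataFrame.info` method rather than the column named info, so `df.info.str` likely raises an AttributeError; `df['info']` avoids the clash. Second, `str.extract` keeps only the first match per row, so a row with two activities would still yield one. Here is a minimal sketch of a per-activity `str.contains` check, assuming the info values are plain strings and that the list of activity names is known up front (the `activities` list and the rebuilt sample data are assumptions for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question; info values are assumed to be strings.
df = pd.DataFrame({
    "info": ["[100:sailing]", "[150:boating, 100:sailing]", "[200:surfing]"],
    "price": ["$100", "$200", "$300"],
})

# df.info is the DataFrame.info method, not the column, so use df["info"].
assert callable(df.info)

# str.extract returns only the first match per row (row 1 gives "boating").
first_match = df["info"].str.extract(r"(boating|sailing|surfing)", expand=False)

# Assumed list of activity names to look for (taken from the question).
activities = ["sailing", "boating", "surfing"]

# One 0/1 column per activity: 1 if the name occurs anywhere in the string.
flags = pd.DataFrame(
    {name: df["info"].str.contains(name, regex=False).astype(int)
     for name in activities},
    index=df.index,
)
result = pd.concat([df[["price"]], flags], axis=1)
print(result)
```

Note that this is a plain substring test, so an activity name that is contained in another name would also be flagged.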
I assumed the values in the info column are strings formatted like a Python list.
# Strip the surrounding brackets and the spaces, then one-hot encode on commas
df1 = df['info'].str[1:-1].str.replace(' ', '').str.get_dummies(',')
# Keep only the activity name after the colon as the column name
df1.rename(columns=lambda x: x.rsplit(':')[-1], inplace=True)
df2 = pd.concat([df, df1.astype(int)], axis=1)
df2

Out:
                         info  price  sailing  boating  surfing
0               [100:sailing]   $100        1        0        0
1  [150:boating, 100:sailing]   $200        1        1        0
2               [200:surfing]   $300        0        0        1
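One caveat with the code above: the dummy columns are built from the full `price:activity` tokens, so if the same activity shows up with different numbers (say `100:sailing` and `120:sailing`), the rename step would likely leave two columns both called sailing. A possible variant, sketched under the assumption that every entry has the form `number:name`, pulls out only the names first with `str.findall` and then encodes them. The fourth row and its price are hypothetical, added only to show the duplicate case:

```python
import pandas as pd

df = pd.DataFrame({
    "info": [
        "[100:sailing]",
        "[150:boating, 100:sailing]",
        "[200:surfing]",
        "[120:sailing]",  # hypothetical row: same activity, different number
    ],
    "price": ["$100", "$200", "$300", "$400"],  # "$400" is hypothetical
})

# Capture the text after each colon, up to a comma, bracket, or whitespace.
names = df["info"].str.findall(r":\s*([^,\]\s]+)")

# Join each row's names with "|" and one-hot encode on that separator.
dummies = names.str.join("|").str.get_dummies(sep="|").astype(int)

df2 = pd.concat([df, dummies], axis=1)
print(df2)
```

No activity names are hard-coded here, so new activities in the data get their own columns; the columns come out in alphabetical order.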