python - Confusion re: pandas copy of slice of dataframe warning


I've looked through a bunch of questions and answers related to this issue, but I'm still finding that I'm getting the copy of slice warning in places where I don't expect it. Also, it's cropping up in code that was running fine for me previously, leading me to wonder if some sort of update may be the culprit.

For example, this is a set of code where all I'm doing is reading an Excel file into a pandas DataFrame and cutting down the set of columns included with the df[[]] syntax.

izmir = pd.read_excel(filepath)
izmir_lim = izmir[['gender','age','mc_old_m>=60','mc_old_f>=60','mc_old_m>18','mc_old_f>18','mc_old_18>m>5','mc_old_18>f>5',
                   'mc_old_m_child<5','mc_old_f_child<5','mc_old_m>0<=1','mc_old_f>0<=1','date delivery','date insert','date of entery']]

Now, any further changes I make to the izmir_lim DataFrame raise the copy of slice warning.

izmir_lim['age'] = izmir_lim.age.fillna(0)
izmir_lim['age'] = izmir_lim.age.astype(int)

/users/samlilienfeld/anaconda/lib/python3.5/site-packages/ipykernel/main.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

I'm confused because I thought that df[[]] column subsetting returned a copy by default. The only way I've found to suppress the warnings is by explicitly adding df[[]].copy(). I could have sworn that in the past I did not have to do that and it did not raise the copy of slice warning.

Similarly, I have some other code that runs a function on a dataframe to filter it in certain ways:

def lim(df):
    if (geography == "all"):
        df_geo = df
    else:
        df_geo = df[df.center_jo == geography]

    df_date = df_geo[(df_geo.date_survey >= start_date) & (df_geo.date_survey <= end_date)]

    return df_date

df_lim = lim(df)

From this point forward, any changes I make to any of the values of df_lim raise the copy of slice warning. The only way around it that I've found is to change the function call to:

df_lim = lim(df).copy() 

This just seems wrong to me. What am I missing? It seems like these use cases should return copies by default, and I could have sworn that the last time I ran these scripts I was not running into these warnings.
Do I just need to start adding .copy() all over the place? It seems like there should be a cleaner way to do this. Any insight or help is appreciated.

izmir = pd.read_excel(filepath)
izmir_lim = izmir[['gender','age','mc_old_m>=60','mc_old_f>=60',
                   'mc_old_m>18','mc_old_f>18','mc_old_18>m>5',
                   'mc_old_18>f>5','mc_old_m_child<5','mc_old_f_child<5',
                   'mc_old_m>0<=1','mc_old_f>0<=1','date delivery',
                   'date insert','date of entery']]

izmir_lim is a view/copy of izmir. You subsequently attempt to assign to it. This is what is triggering the warning. Use this instead:

izmir_lim = izmir[['gender','age','mc_old_m>=60','mc_old_f>=60',
                   'mc_old_m>18','mc_old_f>18','mc_old_18>m>5',
                   'mc_old_18>f>5','mc_old_m_child<5','mc_old_f_child<5',
                   'mc_old_m>0<=1','mc_old_f>0<=1','date delivery',
                   'date insert','date of entery']].copy()
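As a minimal sketch of what the explicit copy buys you, the snippet below uses a small hypothetical DataFrame in place of the Excel file (the values are made up for illustration) and applies the same two changes from the question to the copy. The original frame is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(filepath); the values are invented.
izmir = pd.DataFrame({
    "gender": ["f", "m", "f"],
    "age": [34.0, np.nan, 27.0],
    "date insert": ["2016-01-05", "2016-01-06", "2016-01-07"],
})

# Explicit copy: izmir_lim is now an independent DataFrame.
izmir_lim = izmir[["gender", "age"]].copy()

# Same changes as in the question, applied to the copy.
izmir_lim["age"] = izmir_lim["age"].fillna(0).astype(int)

print(izmir_lim["age"].tolist())   # cleaned ages in the copy
print(izmir["age"].isna().sum())   # the original still has its missing value
```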

Whenever you 'create' a new dataframe from another in the following fashion:

new_df = old_df[list_of_columns_names] 

new_df will have a truthy value in its is_copy attribute. When you attempt to assign to it, pandas emits the SettingWithCopyWarning.

new_df.iloc[0, 0] = 1  # should raise the warning
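Note that this is a warning rather than an exception, so the assignment still goes through. Here is a small sketch with made-up data that records whatever warnings are emitted. Whether a SettingWithCopyWarning actually appears depends on the pandas version in use, so the result is printed rather than assumed.

```python
import warnings

import pandas as pd

# Made-up data for illustration.
old_df = pd.DataFrame({"a": [10, 20], "b": [30, 40], "c": [50, 60]})
list_of_columns_names = ["a", "b"]

new_df = old_df[list_of_columns_names]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, including repeats
    new_df.iloc[0, 0] = 1

# Match on the category name so the check does not depend on
# where a given pandas version exposes the warning class.
warned = any("SettingWithCopy" in w.category.__name__ for w in caught)
print("SettingWithCopyWarning emitted:", warned)

print(new_df.iloc[0, 0])  # the assignment took effect on new_df
print(old_df.iloc[0, 0])  # old_df keeps its original value
```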

You can overcome this in several ways.

Option #1

new_df = old_df[list_of_columns_names].copy() 

Option #2 (as @ayhan suggested in the comments)

new_df = old_df[list_of_columns_names]
new_df.is_copy = None

Option #3

new_df = old_df.loc[:, list_of_columns_names] 
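The filtering function from the question can be handled along the lines of Option #1. The version below is only a sketch: it passes geography, start_date, and end_date as parameters instead of reading them from globals (an assumption about how the original script is set up), uses made-up data, and returns an explicit copy so the caller does not have to add .copy() at every call site.

```python
import pandas as pd


def lim(df, geography, start_date, end_date):
    """Filter df by center and survey date, returning an independent copy."""
    if geography == "all":
        df_geo = df
    else:
        df_geo = df[df["center_jo"] == geography]

    in_range = (df_geo["date_survey"] >= start_date) & (df_geo["date_survey"] <= end_date)
    # .copy() makes the result safe to modify without touching df.
    return df_geo[in_range].copy()


# Made-up data for illustration.
df = pd.DataFrame({
    "center_jo": ["amman", "irbid", "amman"],
    "date_survey": pd.to_datetime(["2016-01-10", "2016-02-15", "2016-03-20"]),
    "score": [1, 2, 3],
})

df_lim = lim(df, "amman", pd.Timestamp("2016-01-01"), pd.Timestamp("2016-02-28"))
df_lim["score"] = df_lim["score"] * 10  # modifies only the copy

print(df_lim)
print(df["score"].tolist())
```

Putting the copy inside the function is a design choice: it costs one extra allocation per call, but it makes the ownership of the returned frame unambiguous.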
