Given the following data frame:
    import pandas as pd

    df = pd.DataFrame({'a': [0, 4, 4, 4],
                       'b': [0, 4, 4, 0],
                       'c': [0, 4, 4, 4],
                       'd': [4, 0, 0, 4],
                       'e': [4, 0, 0, 0],
                       'name': ['a', 'a', 'b', 'c']})
    df

       a  b  c  d  e name
    0  0  0  0  4  4    a
    1  4  4  4  0  0    a
    2  4  4  4  0  0    b
    3  4  0  4  4  0    c
I'd like to add a new field called "match_flag" that labels unique combinations of rows if they have complementary 0 patterns (as rows 0, 1, and 2 do) and have the same name (just rows 0 and 1). It uses the name of the rows that match.
The desired result is as follows:
       a  b  c  d  e name match_flag
    0  0  0  0  4  4    a          a
    1  4  4  4  0  0    a          a
    2  4  4  4  0  0    b        NaN
    3  4  0  4  4  0    c        NaN
Caveat: the patterns may vary, but they should still be complementary.
Thanks in advance!
Update
Sorry for the confusion. Here is a clarification:
The reason rows 0 and 1 are "complementary" is that they have opposite patterns of zeros in their columns: 0,0,0,4,4 vs. 4,4,4,0,0. The number 4 is arbitrary; the rows could just as well be 0,0,0,4,2 and 65,770,23,0,0. If two such rows are indeed complementary and have the same name, I'd like them flagged with that same name under the "match_flag" column.
You can identify a complement if its dot product with the other row is 0 and its element-wise sum with that row is nowhere zero.
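As a minimal sketch of that test on a single pair of rows: the helper name `is_complement` is hypothetical and not part of the answer, and it checks the element-wise product rather than the dot product, so it does not rely on the values being non-negative.

```python
import numpy as np

def is_complement(x, y):
    """Return True if x and y have opposite zero patterns.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    no_overlap = np.all(x * y == 0)  # never both non-zero in the same column
    complete = np.all(x + y != 0)    # never both zero in the same column
    return bool(no_overlap and complete)

# The two example rows from the question's update
print(is_complement([0, 0, 0, 4, 2], [65, 770, 23, 0, 0]))
# Rows 2 and 3 of the sample frame overlap in columns a and c
print(is_complement([4, 4, 4, 0, 0], [4, 0, 4, 4, 0]))
```

The first call should report a complement and the second should not, since rows 2 and 3 are both non-zero in columns a and c.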
    import numpy as np

    def complements(df):
        v = df.drop('name', axis=1).values
        n = v.shape[0]
        # all pairs of distinct rows (upper triangle, excluding the diagonal)
        row, col = np.triu_indices(n, 1)

        # ensure the two rows are complete:
        # their sum contains no zeros
        c = ((v[row] + v[col]) != 0).all(1)
        complete = set(row[c]).union(col[c])

        # ensure the two rows do not overlap:
        # their product is 0 everywhere
        o = (v[row] * v[col] == 0).all(1)
        non_overlap = set(row[o]).union(col[o])

        # a row is a complement iff it
        # does not overlap and is complete
        complement = list(non_overlap.intersection(complete))

        # return the matching slice of names
        return df.name.iloc[complement]
Then groupby('name') and apply our function:
    df['match_flag'] = df.groupby('name', group_keys=False).apply(complements)
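One thing to watch: the function above collects "complete" rows and "non-overlapping" rows separately, so a row that is complete with one partner and non-overlapping with a different partner would still be flagged. A hedged variant that requires both conditions on the same pair might look like the sketch below. The name `match_flags` and its `name_col` parameter are hypothetical, and it loops over the positional group indices instead of using `groupby.apply`, on the assumption that this is less sensitive to how a given pandas version passes the grouping column to the applied function.

```python
import numpy as np
import pandas as pd

def match_flags(df, name_col='name'):
    """Flag rows that have a complementary partner sharing the same name.

    Hypothetical variant for illustration; not the original answer's code.
    """
    value_cols = [c for c in df.columns if c != name_col]
    values = df[value_cols].to_numpy()
    flags = pd.Series(np.nan, index=df.index, dtype=object)

    # .indices maps each name to the integer positions of its rows
    for name, pos in df.groupby(name_col).indices.items():
        v = values[pos]
        row, col = np.triu_indices(len(v), 1)
        # both conditions must hold for the same pair of rows
        ok = ((v[row] * v[col] == 0) & (v[row] + v[col] != 0)).all(axis=1)
        hits = np.unique(np.concatenate([row[ok], col[ok]]))
        if hits.size:
            flags.iloc[pos[hits]] = name
    return flags

df = pd.DataFrame({'a': [0, 4, 4, 4],
                   'b': [0, 4, 4, 0],
                   'c': [0, 4, 4, 4],
                   'd': [4, 0, 0, 4],
                   'e': [4, 0, 0, 0],
                   'name': ['a', 'a', 'b', 'c']})
df['match_flag'] = match_flags(df)
print(df)
```

On the sample frame this should flag rows 0 and 1 with 'a' and leave rows 2 and 3 as NaN, matching the desired result.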