Python 3.x - Pandas Flag Rows with Complementary Zeros


Given the following data frame:

    import pandas as pd
    df = pd.DataFrame({'a': [0, 4, 4, 4],
                       'b': [0, 4, 4, 0],
                       'c': [0, 4, 4, 4],
                       'd': [4, 0, 0, 4],
                       'e': [4, 0, 0, 0],
                       'name': ['a', 'a', 'b', 'c']})
    df

       a  b  c  d  e name
    0  0  0  0  4  4    a
    1  4  4  4  0  0    a
    2  4  4  4  0  0    b
    3  4  0  4  4  0    c

I'd like to add a new field called "match_flag" that labels unique combinations of rows if they have complementary zero patterns (as row 0 does with rows 1 and 2) and have the same name (just rows 0 and 1). The flag uses the name of the rows that match.

The desired result is as follows:

       a  b  c  d  e name match_flag
    0  0  0  0  4  4    a          a
    1  4  4  4  0  0    a          a
    2  4  4  4  0  0    b        NaN
    3  4  0  4  4  0    c        NaN

Caveat: the patterns may vary, but they should still be complementary.

Thanks in advance!

Update

Sorry for the confusion. Here is a clarification:

The reason rows 0 and 1 are "complementary" is that they have opposite patterns of zeros in their columns: 0,0,0,4,4 vs. 4,4,4,0,0. The number 4 is arbitrary; the rows could just as well be 0,0,0,4,2 and 65,770,23,0,0. If two such rows are indeed complementary and have the same name, I'd like them to be flagged with that name under the "match_flag" column.
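To make the definition concrete, here is a minimal sketch of the pairwise test, assuming NumPy is available. The helper name `is_complementary` is hypothetical (it does not appear in the question); it compares only the zero patterns, so the nonzero values can be anything.

```python
import numpy as np

def is_complementary(x, y):
    """True if, in every position, exactly one of x and y is zero."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Compare the zero masks: they must disagree in every column.
    return bool(((x == 0) != (y == 0)).all())

print(is_complementary([0, 0, 0, 4, 4], [4, 4, 4, 0, 0]))      # True  (rows 0 and 1)
print(is_complementary([0, 0, 0, 4, 2], [65, 770, 23, 0, 0]))  # True  (arbitrary values)
print(is_complementary([4, 4, 4, 0, 0], [4, 0, 4, 4, 0]))      # False (rows 2 and 3)
```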

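You can identify a complement if and only if the element-wise product of the two rows is zero everywhere (for non-negative values this is the same as a zero dot product) and their element-wise sum is nowhere zero.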

    import numpy as np

    def complements(df):
        v = df.drop('name', axis=1).values
        n = v.shape[0]
        # all unordered pairs of rows (i < j) within this group
        row, col = np.triu_indices(n, 1)

        # the two rows are complete:
        # their sum contains no zeros
        c = ((v[row] + v[col]) != 0).all(1)

        # the two rows do not overlap:
        # their product is zero everywhere
        o = (v[row] * v[col] == 0).all(1)

        # a pair is a complement iff it does not overlap
        # and is complete; test both on the same pair
        m = c & o
        complement = sorted(set(row[m]).union(col[m]))

        # return the names of the rows that have a complement
        # (df['name'] rather than df.name, which groupby.apply can shadow)
        return df['name'].iloc[complement]
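For readers unfamiliar with `np.triu_indices`, the following sketch shows how it enumerates every unordered pair of rows exactly once, and how the two conditions are then checked for all pairs at the same time. The three-row array is an illustrative assumption, not the grouped data from the question.

```python
import numpy as np

v = np.array([[0, 0, 0, 4, 4],
              [4, 4, 4, 0, 0],
              [4, 0, 4, 4, 0]])

# Indices of the strict upper triangle of a 3x3 matrix: pairs (i, j) with i < j.
row, col = np.triu_indices(v.shape[0], 1)
pairs = list(zip(row.tolist(), col.tolist()))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# v[row] and v[col] line the pairs up so each condition is one vectorized test.
complete = ((v[row] + v[col]) != 0).all(axis=1)
no_overlap = ((v[row] * v[col]) == 0).all(axis=1)
is_pair = (complete & no_overlap).tolist()
print(is_pair)  # [True, False, False]
```

Only the pair (0, 1) passes both tests; the other two pairs share a column in which both rows are zero.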

Then groupby('name') and apply our function:

    df['match_flag'] = df.groupby('name', group_keys=False).apply(complements)
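As an alternative to the `apply` call, here is a minimal, self-contained sketch that loops over the name groups explicitly, so it does not depend on how `apply` treats the grouping column. `flag_matches` is a hypothetical helper name, the zero-mask comparison is a swapped-in way of expressing the same pairwise test, and the sketch assumes the data frame has a unique index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 4, 4, 4],
                   'b': [0, 4, 4, 0],
                   'c': [0, 4, 4, 4],
                   'd': [4, 0, 0, 4],
                   'e': [4, 0, 0, 0],
                   'name': ['a', 'a', 'b', 'c']})

def flag_matches(frame, name_col='name'):
    """Return a Series holding the name for rows that have a complement."""
    flags = pd.Series(np.nan, index=frame.index, dtype=object)
    value_cols = frame.columns.drop(name_col)
    # .groups maps each name to the index labels of its rows.
    for name, labels in frame.groupby(name_col).groups.items():
        zeros = frame.loc[labels, value_cols].to_numpy() == 0
        row, col = np.triu_indices(len(labels), 1)
        # A pair matches when the zero masks disagree in every column.
        match = (zeros[row] != zeros[col]).all(axis=1)
        hit = np.union1d(row[match], col[match])
        flags.loc[labels[hit]] = name
    return flags

df['match_flag'] = flag_matches(df)
print(df)
```

With the sample data, rows 0 and 1 receive the flag 'a' and rows 2 and 3 stay NaN, which matches the desired result in the question.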
