I have two dataframes.

df1:

    Names
    one two three
    Sri is a good player
    Ravi is a mentor
    Kumar is a cricketer

df2:

    values
    sri
    NaN
    sri, is
    kumar,cricketer
I am trying to get the row in df1 that contains all the items in each row of df2.

My expected output is:

    values            Names
    sri               Sri is a good player
    NaN
    sri, is           Sri is a good player
    kumar,cricketer   Kumar is a cricketer
I tried

    df1["Names"].str.contains("|".join(df2["values"].values.tolist()))

but I cannot get my expected output because the values contain commas (","). Please help.
Using sets
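    import numpy as np
    import pandas as pd

    # Turn each name into a set of lowercase words
    s1 = df1.Names.dropna()
    s1.loc[:] = [set(x.lower().split()) for x in s1.values.tolist()]
    a1 = s1.values

    # Turn each comma-separated value into a set of lowercase tokens
    s2 = df2['values'].dropna()
    s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]
    a2 = s2.values

    # Superset test for every (value, name) pair; the extra True column
    # makes argmax point at the appended NaN when no name matches
    i = np.column_stack([a1 >= a2[:, None], [True] * len(a2)]).argmax(1)

    df2.assign(Names=pd.Series(
        np.append(df1.Names.values, np.nan)[i], s2.index
    ))

Output:

                values                 Names
    0              sri  Sri is a good player
    1              NaN                   NaN
    2          sri, is  Sri is a good player
    3  kumar,cricketer  Kumar is a cricketer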
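The broadcasting above is compact but dense. As a rough sketch of the same idea, the subset test can also be written as an explicit loop. The helper name `first_match` and the variable `name_sets` are made up for illustration, and the frames are rebuilt from the question's data, assuming the `NaN` in df2 is a real missing value rather than a string.

```python
import numpy as np
import pandas as pd

# Data from the question (assumption: the NaN in df2 is a true missing value)
df1 = pd.DataFrame({"Names": ["one two three", "Sri is a good player",
                              "Ravi is a mentor", "Kumar is a cricketer"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

# Pre-compute the set of lowercase words for each name
name_sets = [set(name.lower().split()) for name in df1["Names"]]

def first_match(value):
    """Hypothetical helper: first name whose words include every token."""
    if pd.isna(value):
        return np.nan
    tokens = {tok.strip().lower() for tok in value.split(",")}
    for name, words in zip(df1["Names"], name_sets):
        if tokens <= words:  # subset test: every token is a word of the name
            return name
    return np.nan  # no name contains all tokens

result = df2.assign(Names=df2["values"].map(first_match))
print(result)
```

This matches whole words only, so a token such as `is` would not match inside a longer word, and it is likely slower than the vectorized version on large frames.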
    import pandas as pd

    names = [
        'one two three',
        'Sri is a good player',
        'Ravi is a mentor',
        'Kumar is a cricketer'
    ]
    values = [
        'sri',
        'NaN',
        'sri, is',
        'kumar,cricketer',
    ]
    names = pd.Series(names)
    values = pd.DataFrame(values, columns=['values'])

    def foo(words):
        # Keep only the names that contain every comma-separated word
        names_copy = names.copy()
        for word in words.split(','):
            names_copy = names_copy[names_copy.str.contains(word, case=False)]
        return names_copy.values

    values['names'] = values['values'].map(foo)
    values

Output:

                values                   names
    0              sri  [Sri is a good player]
    1              NaN                      []
    2          sri, is  [Sri is a good player]
    3  kumar,cricketer  [Kumar is a cricketer]