
How To Group A Pandas Dataframe Which Has A List Of Combinations?

I have a pandas DataFrame that holds the results of a record-similarity check. For example, rowid 123 is similar to rowid 512, and rowid 123 is similar to rowid 681. Technically, all three rows are similar.

Solution 1:

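You could use networkx. Assuming each similar pair sits in columns A and B, treat every pair as an edge in an undirected graph; the groups you want are then the graph's connected components.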
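To make the "connected groups" idea concrete without any graph library, here is a minimal union-find (disjoint-set) sketch. This is a different technique from the networkx call below, shown only for illustration. It assumes the pairs live in columns named 'A' and 'B', as in the answer's code. The helper name `group_ids` is hypothetical. The pairs 536-412 and 412-919 are also hypothetical sample data: the answer shows those IDs in its output but not the pairs that link them.

```python
import pandas as pd


def group_ids(df: pd.DataFrame, a: str = "A", b: str = "B") -> dict:
    """Hypothetical helper: map every ID in columns a/b to a 1-based group number.

    IDs linked through any chain of pairs end up in the same group.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the sets of the two IDs in every similar pair.
    for x, y in zip(df[a], df[b]):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Number groups in order of the first ID seen from each set.
    labels = {}
    result = {}
    for x in parent:
        root = find(x)
        labels.setdefault(root, len(labels) + 1)
        result[x] = labels[root]
    return result


# The question's two pairs, plus a hypothetical chain 536-412-919.
pairs = pd.DataFrame({"A": [123, 123, 536, 412], "B": [512, 681, 412, 919]})
print(group_ids(pairs))
# {123: 1, 512: 1, 681: 1, 536: 2, 412: 2, 919: 2}
```

Note that 512 and 681 never appear together in a row. They land in the same group only because both are linked through 123, which is exactly the transitive grouping the question asks for.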

In [750]: import networkx as nx
     ...: import pandas as pd

In [751]: G = nx.from_pandas_edgelist(df, 'A', 'B')  # Create the graph (named from_pandas_dataframe before networkx 2.0)

In [752]: Gcc = nx.connected_components(G)  # Generator yielding one set of IDs per connected group

In [753]: pd.DataFrame([{'group': 'group%s' % (g+1), 'id': i}
     ...:               for g, ids in enumerate(Gcc) for i in ids])
Out[753]:
    group   id
0  group1  512
1  group1  681
2  group1  123
3  group2  536
4  group2  412
5  group2  919
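If you would rather attach the group label to each pair in the original frame than build a separate ID table, one option is to turn the components into an ID-to-group dictionary and map it onto column A. Both IDs of a pair are always in the same component, so one column is enough. This is a minimal sketch, assuming networkx 2.0 or later. The sample pairs beyond the question's 123/512/681 are hypothetical, as is the new `group` column name.

```python
import networkx as nx
import pandas as pd

# The question's pairs plus a hypothetical chain 536-412-919.
df = pd.DataFrame({"A": [123, 123, 536, 412], "B": [512, 681, 412, 919]})

# One undirected edge per similar pair.
G = nx.from_pandas_edgelist(df, "A", "B")

# Label each connected component group1, group2, ... and record every ID's label.
id_to_group = {
    node: "group%d" % (g + 1)
    for g, component in enumerate(nx.connected_components(G))
    for node in component
}

# Both ends of a pair share a component, so mapping column A suffices.
df["group"] = df["A"].map(id_to_group)
print(df)
```

The group numbering follows the order in which nodes were added to the graph, which here is row order. So the first pair's component becomes group1.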
