
Merging Two Dataframes In Pandas Based On Time-range Difference

I have these two dataframes, df1 and df2. df1:

dateTime             userId  session
2018-08-30 02:20:19    2233        1
2018-08-30 05:32:10    1933        1
2018-08-30 09:10:39 …

Solution 1:

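If I understand correctly, use pandas.merge_asof. With by='userId' and direction='nearest', each row of df1 is matched to the click in df2 for the same user whose clickTime is closest to its dateTime. Both frames must be sorted by their time columns (dateTime and clickTime), otherwise merge_asof raises a ValueError.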

import pandas as pd

pd.merge_asof(
    df1, df2,
    left_on='dateTime',
    right_on='clickTime',
    by='userId',
    direction='nearest'
)

             dateTime  userId  session           clickTime  clickId
0 2018-08-30 02:20:19    2233        1 2018-08-30 02:21:09     1987
1 2018-08-30 05:32:10    1933        1 2018-08-30 05:33:10     2009
2 2018-08-30 09:10:39    2233        2 2018-08-30 02:32:09     1990
3 2018-08-30 10:26:59    2233        3 2018-08-30 02:32:09     1990
4 2018-08-30 11:56:25    4459        1 2018-08-30 11:57:25     3012
5 2018-08-30 12:30:55    4459        1 2018-08-30 11:58:55     3013
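A minimal, self-contained sketch to reproduce the result above. The sample data are an assumption: df1 is rebuilt from the rows visible in the output, and df2 holds only the five clicks that appear there (the question's df2 is cut off and may contain more rows). Both frames are sorted first, since merge_asof requires sorted keys.

```python
import pandas as pd

# Hypothetical sample data reconstructed from the merge output shown above.
df1 = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2018-08-30 02:20:19", "2018-08-30 05:32:10", "2018-08-30 09:10:39",
        "2018-08-30 10:26:59", "2018-08-30 11:56:25", "2018-08-30 12:30:55",
    ]),
    "userId": [2233, 1933, 2233, 2233, 4459, 4459],
    "session": [1, 1, 2, 3, 1, 1],
})
# Deliberately unsorted, to show why the sort step below matters.
df2 = pd.DataFrame({
    "clickTime": pd.to_datetime([
        "2018-08-30 05:33:10", "2018-08-30 02:32:09", "2018-08-30 02:21:09",
        "2018-08-30 11:58:55", "2018-08-30 11:57:25",
    ]),
    "userId": [1933, 2233, 2233, 4459, 4459],
    "clickId": [2009, 1990, 1987, 3013, 3012],
})

# merge_asof raises ValueError unless both "on" keys are sorted.
df1 = df1.sort_values("dateTime")
df2 = df2.sort_values("clickTime")

# For each df1 row, take the df2 row with the same userId whose
# clickTime is closest (before or after) to dateTime.
result = pd.merge_asof(
    df1, df2,
    left_on="dateTime",
    right_on="clickTime",
    by="userId",
    direction="nearest",
)
print(result)
```

With this data, rows 2 and 3 still pick up the 02:32:09 click, hours away, because "nearest" places no limit on distance.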

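Without a limit, 'nearest' can match a click that is hours away (rows 2 and 3 above match the 02:32:09 click). You can pass a tolerance to limit how far away to look; rows with no click inside that window get NaT/NaN, which is also why clickId is upcast to float: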

pd.merge_asof(
    df1, df2,
    left_on='dateTime',
    right_on='clickTime',
    by='userId',
    direction='nearest',
    tolerance=pd.Timedelta(15, unit='m')
)

             dateTime  userId  session           clickTime  clickId
0 2018-08-30 02:20:19    2233        1 2018-08-30 02:21:09   1987.0
1 2018-08-30 05:32:10    1933        1 2018-08-30 05:33:10   2009.0
2 2018-08-30 09:10:39    2233        2                 NaT      NaN
3 2018-08-30 10:26:59    2233        3                 NaT      NaN
4 2018-08-30 11:56:25    4459        1 2018-08-30 11:57:25   3012.0
5 2018-08-30 12:30:55    4459        1                 NaT      NaN
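If the float clickId is a nuisance, one option (a sketch, not part of the original answer) is to cast it to pandas' nullable Int64 dtype and compute the time gap explicitly. The block uses hypothetical sample data reconstructed from the outputs shown here; the names matched, within and the gap column are illustrative only.

```python
import pandas as pd

# Hypothetical sample data reconstructed from the outputs above, already sorted.
df1 = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2018-08-30 02:20:19", "2018-08-30 05:32:10", "2018-08-30 09:10:39",
        "2018-08-30 10:26:59", "2018-08-30 11:56:25", "2018-08-30 12:30:55",
    ]),
    "userId": [2233, 1933, 2233, 2233, 4459, 4459],
    "session": [1, 1, 2, 3, 1, 1],
})
df2 = pd.DataFrame({
    "clickTime": pd.to_datetime([
        "2018-08-30 02:21:09", "2018-08-30 02:32:09", "2018-08-30 05:33:10",
        "2018-08-30 11:57:25", "2018-08-30 11:58:55",
    ]),
    "userId": [2233, 2233, 1933, 4459, 4459],
    "clickId": [1987, 1990, 2009, 3012, 3013],
})

matched = pd.merge_asof(
    df1, df2,
    left_on="dateTime",
    right_on="clickTime",
    by="userId",
    direction="nearest",
    tolerance=pd.Timedelta(minutes=15),
)

# NaN forced clickId to float64; nullable Int64 keeps integers with <NA>.
matched["clickId"] = matched["clickId"].astype("Int64")

# Absolute distance between session start and matched click (NaT if unmatched).
matched["gap"] = (matched["clickTime"] - matched["dateTime"]).abs()

# Keep only sessions that found a click within the 15-minute window.
within = matched.dropna(subset=["clickTime"])
print(within)
```

Use how you handle the unmatched rows (keep them with <NA>, or drop them as within does) to decide whether the result behaves like a left join or an inner join.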
