Using Csr_matrix Of Items Similarities To Get Most Similar Items To Item X Without Having To Transform Csr_matrix To Dense Matrix

I have purchase data (df_temp). I managed to switch from a Pandas DataFrame to a sparse csr_matrix because I have lots of products (89000) which I have to get their user-it

Solution 1:

I finally understood how to get the 5 most similar items for each product: convert the matrix with .tolil(), turn each row into a NumPy array, and use argsort to pick the 5 most similar items. I used the solution that @hpaulj suggested in this link.

def max_n(row_data, row_indices, n):
    # positions (within this row's stored entries) of the n largest values
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i

and then I applied it to one row for testing:

top_v, top_ind, ind = max_n(np.array(arr_ll.data[0]), np.array(arr_ll.rows[0]), 5)
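To make the test above reproducible, here is a minimal sketch on a tiny similarity matrix. The 4x4 matrix `sim` and its values are made up for illustration (they are not from my data), and n is 2 instead of 5 so the result is easy to check by hand.

```python
import numpy as np
from scipy.sparse import csr_matrix

def max_n(row_data, row_indices, n):
    # positions (within this row's stored entries) of the n largest values
    i = row_data.argsort()[-n:]
    return row_data[i], row_indices[i], i

# Hypothetical item-item similarity matrix; zeros are not stored.
sim = csr_matrix(np.array([
    [0.0, 0.9, 0.0, 0.2],
    [0.9, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.7],
    [0.2, 0.0, 0.7, 0.0],
]))

# LIL keeps, per row, a list of column indices (.rows) and values (.data),
# so a single row can be inspected without densifying the whole matrix.
arr_ll = sim.tolil()

top_v, top_ind, ind = max_n(np.array(arr_ll.data[0]), np.array(arr_ll.rows[0]), 2)
print(top_v, top_ind)
```

Note that argsort sorts ascending, so the most similar item comes last in the result; reverse with `[::-1]` if you want it first. If a row has fewer than n stored entries, the slice simply returns all of them.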

What I need is top_indices, the indices of the 5 most similar products, but those indices are not the real product_id values. I mapped the product_ids to matrix positions when I constructed the coo_matrix:

rows, r_pos = np.unique(ar1['product_id'], return_inverse=True)

But how do I get the real product_id back from the indices?

Now, for example, I have:

top_ind = [21349123]

How do I know which product_id 2 corresponds to, which one 1 corresponds to, and so on?
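One possible way to map back, sketched under an assumption: if the columns of the similarity matrix follow the same positions as r_pos, then the rows array returned by np.unique is itself the lookup table, because np.unique returns the sorted unique ids and r_pos holds each original id's position in that array. The product ids below are invented for illustration and stand in for ar1['product_id'].

```python
import numpy as np

# Hypothetical product ids, standing in for ar1['product_id'].
product_ids = np.array([5001, 1200, 5001, 9999, 1200, 3300])

# rows  : sorted unique ids
# r_pos : for each original id, its position in rows
rows, r_pos = np.unique(product_ids, return_inverse=True)

# Indexing rows with the positions rebuilds the original column,
# which is what makes rows usable as a position -> product_id lookup.
assert np.array_equal(rows[r_pos], product_ids)

# Matrix indices such as those returned by max_n (values made up here).
top_ind = np.array([2, 1])
top_product_ids = rows[top_ind]
print(top_product_ids)
```

So position 2 maps to rows[2], position 1 to rows[1], and so on. This only holds if the matrix was built with r_pos on the axis that top_ind refers to.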
