Using a csr_matrix of Item Similarities to Get the Most Similar Items to Item X Without Converting the csr_matrix to a Dense Matrix
I have purchase data (df_temp). I managed to replace the Pandas DataFrame with a sparse csr_matrix because I have lots of products (89,000) for which I have to get their user-it…
Solution 1:
I finally figured out how to get the 5 most similar items for each product: convert the matrix with .tolil(), then convert each row to a NumPy array and use argsort to get the 5 most similar items. I used the solution @hpaulj suggested in this link.
def max_n(row_data, row_indices, n):
    # argsort is ascending, so the last n positions hold the n largest values
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i
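The commented-out argpartition line points to a cheaper alternative: argpartition only guarantees that the n largest entries end up in the last n slots, not that they are ordered, so a second sort over just those n entries is needed if the order matters. A minimal sketch, assuming the same row_data/row_indices inputs as max_n; max_n_partition is a hypothetical name, and the clamping of n for short rows is an addition not present in the original.

```python
import numpy as np


def max_n_partition(row_data, row_indices, n):
    """Hypothetical argpartition variant of max_n.

    Returns the n largest stored values of one sparse row, their column
    indices, and their positions, ordered ascending like argsort()[-n:].
    """
    row_data = np.asarray(row_data)
    row_indices = np.asarray(row_indices)
    n = min(n, row_data.size)  # a row may have fewer than n stored entries
    if n == 0:
        # argpartition(..., -0)[-0:] would return every index, so bail out
        empty = np.array([], dtype=int)
        return row_data[empty], row_indices[empty], empty
    i = np.argpartition(row_data, -n)[-n:]  # n largest, in arbitrary order
    i = i[np.argsort(row_data[i])]          # sort only those n entries
    return row_data[i], row_indices[i], i


vals, cols, _ = max_n_partition([0.1, 0.9, 0.4, 0.7], [10, 11, 12, 13], 2)
```

For the toy row above, cols comes out as [13, 11] (largest similarity last), the same order max_n would give. For 89,000 columns per row the saving over a full argsort may be noticeable, but that is untested here.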
Then I applied it to one row as a test:
import numpy as np

# arr_ll is the similarity matrix after .tolil(); only row 0 is tested here
top_v, top_ind, ind = max_n(np.array(arr_ll.data[0]), np.array(arr_ll.rows[0]), 5)
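To go from row 0 to every product, the same call can be looped over the LIL matrix's .rows and .data lists, which hold one Python list per row, so no dense matrix is ever built. A minimal sketch, assuming a square item × item similarity matrix; top_n_per_row is a hypothetical helper, and the 4 × 4 values are made up (they are built from a small dense array only to keep the toy readable).

```python
import numpy as np
from scipy.sparse import csr_matrix


def max_n(row_data, row_indices, n):
    # Same per-row logic as max_n above: argsort the stored values
    i = row_data.argsort()[-n:]
    return row_data[i], row_indices[i], i


def top_n_per_row(sim, n=5):
    """Hypothetical helper: {row index: column indices of the n largest values}."""
    lil = sim.tolil()  # .rows / .data: one list of columns / values per row
    result = {}
    for r, (cols, vals) in enumerate(zip(lil.rows, lil.data)):
        if not cols:  # row with no stored similarities
            continue
        _, top_cols, _ = max_n(np.array(vals), np.array(cols), n)
        result[r] = top_cols[::-1]  # most similar first
    return result


# Toy similarity matrix (made-up values); zeros, including the diagonal,
# are not stored by csr_matrix
sim = csr_matrix(np.array([
    [0.0, 0.8, 0.1, 0.5],
    [0.8, 0.0, 0.3, 0.0],
    [0.1, 0.3, 0.0, 0.9],
    [0.5, 0.0, 0.9, 0.0],
]))
top = top_n_per_row(sim, n=2)
```

Here top[0] is [1, 3]: item 0 is most similar to item 1, then item 3. If the real matrix stores a self-similarity on the diagonal, each item would appear in its own top 5; one option is to zero the diagonal and call eliminate_zeros() before the loop.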
What I need is top_indices, the indices of the 5 most similar products, but those indices are not the real product_id values. I mapped them when I constructed the coo_matrix:
rows, r_pos = np.unique(ar1['product_id'], return_inverse=True)
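With return_inverse=True, np.unique returns the sorted unique ids (rows) and, for every original entry, its position inside rows (r_pos), so rows[r_pos] rebuilds the original column. Assuming r_pos was used as the item index when building the coo_matrix, matrix index k therefore stands for product rows[k]. A toy check with hypothetical product ids standing in for ar1['product_id']:

```python
import numpy as np

# Hypothetical ids, standing in for ar1['product_id']
product_ids = np.array([501, 730, 501, 120, 730])

rows, r_pos = np.unique(product_ids, return_inverse=True)
# rows  -> sorted unique ids:                  [120 501 730]
# r_pos -> position of each id inside rows:    [1 2 1 0 2]

# The invariant that makes the mapping reversible
assert (rows[r_pos] == product_ids).all()
```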
But how do I get the real product_id back from the indices?
For example, I now have:
top_ind = [21349123]
How do I know which product_id 2 corresponds to, which one 1 corresponds to, and so on?
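Since rows from np.unique(..., return_inverse=True) is sorted and r_pos indexes into it, indexing rows with the matrix indices from max_n should recover the real ids, and np.searchsorted goes the other way. A sketch under that assumption; the product ids and the top_ind values below are hypothetical.

```python
import numpy as np

# Hypothetical ids standing in for ar1['product_id']
product_ids = np.array([501, 730, 501, 120, 730, 999])
rows, r_pos = np.unique(product_ids, return_inverse=True)  # rows: [120 501 730 999]

# Hypothetical matrix indices, e.g. top_ind returned by max_n for one item
top_ind = np.array([3, 0, 2])

# Matrix index -> product_id: fancy indexing into rows
top_product_ids = rows[top_ind]  # [999 120 730]

# product_id -> matrix index: rows is sorted, so a binary search works
idx_of_730 = np.searchsorted(rows, 730)  # 2
```

Keeping rows around after building the matrix is enough; no dictionary or extra DataFrame lookup is required, although a dict built from enumerate(rows) would work too.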