Sklearn Multilabelbinarizer() Error When Using For Production

Edit: I have changed the code from mlb to TfidfVectorizer(), but I am still facing a problem. Please see my code below. from sklearn.externals import joblib from sklearn.preprocessin

Solution 1:

The issue is that you never save a fitted model to your path. Let's set the GridSearch aside for now and save a fitted pipeline together with the fitted MultiLabelBinarizer:

import joblib  # sklearn.externals.joblib was deprecated and later removed from scikit-learn
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

dataset = pd.DataFrame({'X': ['How to resent my Password',
                              'Where to See the next Road',
                              'What is my next topic',
                              'Can I move without pass']*10, 
                        'Y': [['Pass','ResetPass'], ['Direction','NaN'], ['Topic','Class'], ['Pass','MovePass']]*10})

mlb = MultiLabelBinarizer()
X, Y = dataset['X'], mlb.fit_transform(dataset['Y'])
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, random_state=0, test_size=0.33, shuffle=True)

clf = SGDClassifier(loss='hinge', penalty='l2', 
                    alpha=1e-3, random_state=42, 
                    max_iter=5, tol=None)
text_clf = Pipeline([('vect', TfidfVectorizer()), 
                     ('clf', OneVsRestClassifier(clf))])

text_clf.fit(X, Y)  # new line: fit the pipeline before predicting or saving it
predict = text_clf.predict(X_Test)
predict_label = mlb.inverse_transform(predict)

joblib.dump(text_clf, 'PATHTO/model_mlb.pkl')  # save the fitted pipeline
joblib.dump(mlb, 'PATHTO/mlb.pkl')  # save the fitted MultiLabelBinarizer

model = joblib.load('PATHTO/model_mlb.pkl')  # load the fitted pipeline
mlb = joblib.load('PATHTO/mlb.pkl')  # load the fitted MultiLabelBinarizer
new_input = 'How to resent my Password'
pred = model.predict([new_input])  # pass raw text; the TF-IDF step is inside the pipeline
pred = mlb.inverse_transform(pred)  # map the 0/1 indicator row back to label names

This returns

[('Pass', 'ResetPass')]

which matches the labels for that sentence in your training data.
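To see why the binarizer has to be saved alongside the model, here is a minimal sketch using the same label lists as above. It assumes a recent scikit-learn with the standalone joblib package installed, and it writes to a temporary directory rather than the PATHTO placeholder. The names fresh_failed, restored and decoded are only illustrative.

```python
import os
import tempfile

import joblib
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import MultiLabelBinarizer

labels = [['Pass', 'ResetPass'], ['Direction', 'NaN'],
          ['Topic', 'Class'], ['Pass', 'MovePass']]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # learns classes_ and returns a 0/1 indicator matrix

# A brand-new binarizer has not learned any classes, so it cannot decode a prediction.
try:
    MultiLabelBinarizer().inverse_transform(Y[:1])
    fresh_failed = False
except NotFittedError:
    fresh_failed = True

# Round-trip the fitted binarizer through joblib (temporary path, illustration only).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'mlb.pkl')
    joblib.dump(mlb, path)
    restored = joblib.load(path)

# The restored object carries the learned classes_ and decodes the first row.
decoded = restored.inverse_transform(Y[:1])
print(fresh_failed, decoded)
```

The mapping from columns to label names lives in the fitted object, which is why creating a new MultiLabelBinarizer() in production code is not a substitute for loading the saved one.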

If you want to save your grid search as well, save the fitted object, i.e. the grid after calling grid.fit().
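A rough sketch of that idea follows. The parameter grid, cv=2, the SGDClassifier defaults and the temporary file path are assumptions chosen only to keep the example small; they are not taken from the original question.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

texts = ['How to resent my Password',
         'Where to See the next Road',
         'What is my next topic',
         'Can I move without pass'] * 10
labels = [['Pass', 'ResetPass'], ['Direction', 'NaN'],
          ['Topic', 'Class'], ['Pass', 'MovePass']] * 10

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

pipeline = Pipeline([('vect', TfidfVectorizer()),
                     ('clf', OneVsRestClassifier(SGDClassifier(random_state=42)))])

# Hypothetical grid: 'clf__estimator__alpha' reaches the SGDClassifier inside OneVsRestClassifier.
param_grid = {'clf__estimator__alpha': [1e-4, 1e-3]}
grid = GridSearchCV(pipeline, param_grid, cv=2)
grid.fit(texts, Y)  # with the default refit=True, the best pipeline is refit on all the data

# Save the fitted search object, then load it back as production code would.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'grid.pkl')
    joblib.dump(grid, path)
    restored_grid = joblib.load(path)

pred = restored_grid.predict(['How to resent my Password'])
print(restored_grid.best_params_)
print(mlb.inverse_transform(pred))
```

If only predictions are needed in production, saving grid.best_estimator_ instead of the whole search object gives a smaller file. In both cases the fitted MultiLabelBinarizer still has to be saved separately.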
