Valueerror: Dimension Mismatch

October 27, 2023 Post a Comment

I use SciPy and scikit-learn to train and apply a Multinomial Naive Bayes Classifier for binary text classification. Precisely, I use the module sklearn.feature_extraction.text.Cou

Solution 1:

Sounds to me, like you just need to use vectorizer.transform for the test dataset, since the training dataset fixes the vocabulary (you cannot know the full vocabulary including the training set afterall). Just to be clear, thats vectorizer.transform instead of vectorizer.fit_transform.

Solution 2:

Another solution will be using vector.vocabulary

# after trainning the data
vector = CountVectorizer()
vector.fit(self.x_data)
training_data = vector.transform(self.x_data)
bayes = MultinomialNB()
bayes.fit(training_data, y_data)

# use vector.vocabulary for predict
vector = CountVectorizer(vocabulary=vector.vocabulary_) #vocabulary is a parameter, it should be vocabulary_ as it is an attribute.
text_vector = vector.transform(text)
trained_model.predict_prob(text_vector)

Introduction to Python Course

Valueerror: Dimension Mismatch

Solution 1:

Solution 2:

Post a Comment for "Valueerror: Dimension Mismatch"