
How To Optimize Scikit One-class Training Time?

Essentially my question is the same as SciKit One-class SVM classifier training time increases exponentially with size of training data, but no one has figured out the problem there.

Solution 1:

I'm very junior in this field, so take this with a grain of salt.

Isolation Forests appear to be an efficient solution for outlier detection; they have been shown to perform well against other popular algorithms [Liu, 2008]. Also, according to the scikit-learn documentation, One-class SVMs are somewhat sensitive to outliers in the training set. The anomalies in your Class 1 could overlap with Class 2 and cause data to be mislabeled. Taking subsets of your samples and training an ensemble of SVMs on them might avoid this (and still save you time, depending on the subset size), but Isolation Forests do this subsampling naturally.
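As a minimal sketch of swapping in an Isolation Forest for a One-class SVM (the synthetic data, shapes, and parameters below are illustrative assumptions, not tuned values):

```python
# Hedged sketch: Isolation Forest for outlier detection on synthetic data.
# Training cost grows roughly O(n log n), vs. the superlinear kernel costs
# that make OneClassSVM slow on large training sets.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train = rng.normal(loc=0.0, scale=1.0, size=(10000, 20))  # assumed inlier data

clf = IsolationForest(n_estimators=100, random_state=42)
clf.fit(X_train)

# Probe with 5 likely inliers and 5 points far from the training distribution.
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 20)),
    rng.normal(8.0, 1.0, size=(5, 20)),  # obvious outliers
])
pred = clf.predict(X_test)  # +1 = inlier, -1 = outlier
print(pred)
```

Each tree in the forest is fit on a small subsample (256 points by default), which is what keeps training time manageable as the dataset grows.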

For further reading, this seems like a good review paper on the topic: http://www.robots.ox.ac.uk/~davidc/pubs/NDreview2014.pdf

It mentions clustering and distance-based methods that may be applicable in your case. I think it's best to read widely and make sure you understand the strengths and weaknesses of the different algorithms, especially since I'm still in the process of doing that myself and can't give solid advice even if I knew the specifics of your problem.

A note on distance-based algorithms: I know some are optimized, but the general complaint is that they have high computational complexity. Many clustering-, distance-, and probability-based algorithms also struggle with high-dimensional data.
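If you do want to stay with One-class SVMs, the subset-ensemble idea mentioned earlier can be sketched as follows (a hypothetical illustration; the subset count, sizes, and SVM parameters are assumptions you would tune for your data):

```python
# Hedged sketch: train several One-class SVMs on random subsamples and
# combine them by majority vote, so no single SVM sees the full dataset.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.normal(size=(6000, 10))  # assumed training data

n_models, subset_size = 5, 1000  # each SVM trains on a small subsample
models = []
for _ in range(n_models):
    idx = rng.choice(len(X_train), size=subset_size, replace=False)
    models.append(
        OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train[idx])
    )

def ensemble_predict(X):
    # Majority vote over the individual +1/-1 predictions.
    votes = np.stack([m.predict(X) for m in models])
    return np.where(votes.sum(axis=0) >= 0, 1, -1)

# 3 likely inliers followed by 3 far-away outliers.
X_test = np.vstack([rng.normal(0, 1, (3, 10)), rng.normal(10, 1, (3, 10))])
print(ensemble_predict(X_test))
```

Because SVM training cost grows superlinearly with sample count, fitting five SVMs on 1,000 points each is typically much cheaper than one SVM on all 6,000, at the price of each model seeing less of the data.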
