Sklearn randomized logistic regression gives error “ValueError: The number of classes has to be greater than one”

I found what appears to be a bug in sklearn.linear_model.RandomizedLogisticRegression, and since it took me a long time to track down, I'm posting it here in case others run into the same problem.

What happens is: on perfectly well-formed data, sklearn.linear_model.RandomizedLogisticRegression complains “ValueError: The number of classes has to be greater than one”.

It turns out this happens when the input data has fewer than 10 training instances. In the session below, fitting on 10 rows succeeds and fitting on 9 rows fails:

    >>> sklearn.__version__
    '0.15-git'
    >>> randomized_logistic.fit(X[0:10, :], y[0:10])
    RandomizedLogisticRegression(C=1, fit_intercept=True,
            memory=Memory(cachedir=None), n_jobs=1, n_resampling=200,
            normalize=True, pre_dispatch='3*n_jobs', random_state=None,
            sample_fraction=0.75, scaling=0.5, selection_threshold=0.25,
            tol=0.001, verbose=False)
    >>> randomized_logistic.fit(X[0:9, :], y[0:9])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/randomized_l1.py", line 109, in fit
        sample_fraction=self.sample_fraction, **params)
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/externals/joblib/memory.py", line 281, in __call__
        return self.func(*args, **kwargs)
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/randomized_l1.py", line 51, in _resample_model
        for _ in range(n_resampling)):
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/externals/joblib/parallel.py", line 644, in __call__
        self.dispatch(function, args, kwargs)
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/externals/joblib/parallel.py", line 391, in dispatch
        job = ImmediateApply(func, args, kwargs)
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/externals/joblib/parallel.py", line 129, in __init__
        self.results = func(*args, **kwargs)
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/randomized_l1.py", line 355, in _randomized_logistic
        clf.fit(X, y)
      File "/Users/isaac/Library/Python/2.7/lib/python/site-packages/sklearn/svm/base.py", line 676, in fit
        raise ValueError("The number of classes has to be greater than"
    ValueError: The number of classes has to be greater than one.
    >>> X
    array([[1, 1, 1],
           [2, 1, 0],
           [3, 1, 1],
           [1, 2, 0],
           [2, 2, 1],
           [3, 2, 0],
           [1, 3, 1],
           [2, 3, 0],
           [3, 3, 1],
           [1, 4, 0],
           [2, 4, 1],
           [3, 4, 6]])
    >>> y
    array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
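
The traceback shows the error being raised inside one of the resampling runs (_resample_model calling _randomized_logistic), not on the full data. One possible explanation, and it is only a guess, is that some random subsample ends up containing rows from fewer than two classes. The NumPy-only sketch below estimates how often that would happen for the y above. It assumes each row is kept independently with probability sample_fraction; that is an assumption about the internals, not something the session confirms. The helper single_class_rate is hypothetical and is not part of sklearn.

```python
import numpy as np

# Labels from the post: three classes repeated in order.
y = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])


def single_class_rate(labels, sample_fraction=0.75, n_trials=20000, seed=0):
    """Estimate how often a random row mask leaves fewer than two classes.

    ASSUMPTION (not confirmed by the post): each row is kept independently
    with probability `sample_fraction`. This helper is illustrative only.
    """
    rng = np.random.RandomState(seed)
    failures = 0
    for _ in range(n_trials):
        # Keep each row with probability sample_fraction.
        mask = rng.rand(len(labels)) < sample_fraction
        # An empty or single-class subsample cannot be fitted by a classifier.
        if np.unique(labels[mask]).size < 2:
            failures += 1
    return failures / float(n_trials)


for n in (9, 10, 12):
    rate = single_class_rate(y[:n])
    # Chance that at least one of 200 resamplings (n_resampling=200) is degenerate.
    any_fail = 1.0 - (1.0 - rate) ** 200
    print("n=%d  per-resample rate=%.5f  any of 200=%.3f" % (n, rate, any_fail))
```

If this assumption holds, the failure would be intermittent and more likely on small inputs rather than tied to a strict row count, so setting random_state and re-running a few times would be a reasonable way to check it.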