Version 0.21.2
Changelog
sklearn.cross_decomposition
Fix Fixed a bug in cross_decomposition.CCA improving numerical stability when Y is close to zero.
sklearn.metrics
Fix Fixed a bug in metrics.pairwise.euclidean_distances where a part of the distance matrix was left uninstantiated for sufficiently large float32 datasets (regression introduced in 0.21).
sklearn.preprocessing
Fix Fixed a bug in preprocessing.OneHotEncoder where the new drop parameter was not reflected in get_feature_names.
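The behaviour this fix targets can be illustrated with a minimal sketch. The toy colour data is made up for illustration, and the `hasattr` fallback is an assumption to keep the snippet usable on later releases, where the method is named get_feature_names_out instead of get_feature_names.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column categorical data.
X = np.array([["red"], ["green"], ["blue"], ["green"]], dtype=object)

# drop="first" removes the first category of each feature (in sorted order),
# so "blue" is dropped here and two output columns remain.
encoder = OneHotEncoder(drop="first")
encoded = encoder.fit_transform(X).toarray()

# The 0.21 series exposes get_feature_names; later releases use
# get_feature_names_out, so prefer that name when it is available.
if hasattr(encoder, "get_feature_names_out"):
    names = list(encoder.get_feature_names_out())
else:
    names = list(encoder.get_feature_names())

print(encoded.shape)  # (4, 2)
print(names)          # ['x0_green', 'x0_red']
```

With the fix, the returned names line up one-to-one with the columns of the encoded matrix, with no entry for the dropped category.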
sklearn.utils.sparsefuncs
Fix Fixed a bug where min_max_axis would fail on 32-bit systems for certain large inputs. This affects preprocessing.MaxAbsScaler, preprocessing.normalize and preprocessing.LabelBinarizer.
Version 0.21.1
This is a bug-fix release, primarily to resolve some packaging issues in version 0.21.0. It also includes minor documentation improvements and some bug fixes.
Changelog
sklearn.metrics
Fix Fixed a bug in metrics.pairwise_distances where it would raise AttributeError for boolean metrics when X had a boolean dtype and Y was None.
Fix Fixed two bugs in metrics.pairwise_distances when n_jobs > 1. First, it used to return a distance matrix with the same dtype as the input, even for integer dtypes. Second, the diagonal was not zero for the euclidean metric when Y is X.
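A minimal sketch of the behaviour expected after the n_jobs fix, assuming a small made-up integer array: the result should have a floating-point dtype and a zero diagonal when distances are computed within X.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Integer-valued input; Y is left unset, so distances are computed within X.
X = np.array([[0, 0], [3, 4], [6, 8]], dtype=np.int64)

# n_jobs=2 exercises the parallel code path described in the entry above.
D = pairwise_distances(X, metric="euclidean", n_jobs=2)

print(D.dtype)     # a floating-point dtype, not int64
print(np.diag(D))  # all zeros
print(D[0, 1])     # distance between (0, 0) and (3, 4)
```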
sklearn.neighbors
Fix Fixed a bug in neighbors.KernelDensity which could not be restored from a pickle if sample_weight had been used.
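The round trip covered by this fix can be sketched as follows. The random data, weights and bandwidth are arbitrary choices for illustration, not values from the release notes.

```python
import pickle

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 2))
weights = rng.uniform(0.5, 1.5, size=50)  # arbitrary positive sample weights

# Fit a weighted density estimate.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(X, sample_weight=weights)

# Serialize and restore the fitted estimator.
restored = pickle.loads(pickle.dumps(kde))

# The restored estimator should give the same log-densities as the original.
original_scores = kde.score_samples(X[:5])
restored_scores = restored.score_samples(X[:5])
print(np.allclose(original_scores, restored_scores))
```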
Version 0.21.0
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
discriminant_analysis.LinearDiscriminantAnalysis for multiclass classification. Fix
discriminant_analysis.LinearDiscriminantAnalysis with ‘eigen’ solver. Fix
linear_model.BayesianRidge Fix
Decision trees and derived ensembles when both max_depth and max_leaf_nodes are set. Fix
linear_model.LogisticRegression and linear_model.LogisticRegressionCV with ‘saga’ solver. Fix
ensemble.GradientBoostingClassifier Fix
sklearn.feature_extraction.text.HashingVectorizer, sklearn.feature_extraction.text.TfidfVectorizer, and sklearn.feature_extraction.text.CountVectorizer Fix
neural_network.MLPClassifier Fix
svm.SVC.decision_function and multiclass.OneVsOneClassifier.decision_function. Fix
linear_model.SGDClassifier and any derived classifiers. Fix
Any model using the linear_model.sag.sag_solver function with a 0 seed, including linear_model.LogisticRegression, linear_model.LogisticRegressionCV, linear_model.Ridge, and linear_model.RidgeCV with ‘sag’ solver. Fix
linear_model.RidgeCV when using generalized cross-validation with sparse inputs
Highlights:
Missing values in features, represented by NaNs, are now accepted in
column-wise preprocessing such as scalers. Each feature is fitted
disregarding NaNs, and data containing NaNs can be transformed. The
new impute module provides estimators for learning despite missing
data.
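As a small illustration of the NaN handling described above, the sketch below scales a made-up array containing missing values and then fills them with SimpleImputer from the impute module. The choice of StandardScaler and of the "mean" strategy is an assumption made for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value in each column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan],
              [5.0, 30.0]])

# Per-feature statistics are computed while disregarding NaNs,
# and NaNs are passed through unchanged by transform.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(scaler.mean_)  # [ 3. 20.]

# Replace the remaining NaNs with the column mean of the scaled data.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X_scaled)
print(np.isnan(X_filled).any())  # False
```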
ColumnTransformer handles the case where different features or columns
of a pandas.DataFrame need different preprocessing. String or pandas
Categorical columns can now be encoded with OneHotEncoder or
OrdinalEncoder.
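A minimal sketch of per-column preprocessing with ColumnTransformer follows. The DataFrame, its column names and the transformer labels are hypothetical; sparse_threshold=0 is set only so that the output is a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type table: two numeric columns and one string column.
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0],
    "income": [30.0, 52.0, 81.0, 64.0],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

preprocess = ColumnTransformer(
    transformers=[
        # Standardize the numeric columns.
        ("numeric", StandardScaler(), ["age", "income"]),
        # One-hot encode the string column directly, without integer coding first.
        ("categorical", OneHotEncoder(), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_out = preprocess.fit_transform(df)
print(X_out.shape)  # (4, 5): 2 scaled columns + 3 one-hot columns
```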
TransformedTargetRegressor helps when the regression target needs to
be transformed to be modeled. PowerTransformer and KBinsDiscretizer
join QuantileTransformer as non-linear transformations.
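To illustrate the target transformation, here is a sketch that fits a linear model on the logarithm of an exponentially growing target. The synthetic data and the log/exp pair are assumptions chosen so that the transformed problem is exactly linear.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data: the target grows exponentially with the single feature.
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = np.exp(0.2 * X.ravel() + 1.0)

# The regressor is trained on log(y); predictions are mapped back with exp.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
y_pred = model.predict(X)

# The inner linear model recovers the slope used to generate log(y).
print(model.regressor_.coef_)
```

Predictions come back on the original scale of y, so callers do not need to apply the inverse transform themselves.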
Added sample_weight support to several estimators (including KMeans,
BayesianRidge and KernelDensity) and improved stopping criteria in
others (including MLPRegressor, GradientBoostingRegressor and
SGDRegressor).
This release is also the first to be accompanied by a Glossary of
Common Terms and API Elements.
Notable new features since 0.18.2:
- `neighbors.LocalOutlierFactor` for anomaly detection
- `preprocessing.QuantileTransformer` for robust feature transformation
- `multioutput.ClassifierChain` meta-estimator to simply account
  for dependencies between classes in multilabel problems
- multiplicative update in `decomposition.NMF`
- multinomial `linear_model.LogisticRegression` with L1 penalty
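As one example from the list above, the following sketch uses `neighbors.LocalOutlierFactor` to flag a single point placed far from a regular grid. The data and the n_neighbors value are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# A 5x5 grid of inliers plus one point placed far away from it.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])

# fit_predict labels each sample: 1 for inliers, -1 for outliers.
lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)

print(labels[-1])                       # -1: the distant point is flagged
print(lof.negative_outlier_factor_[-1]) # strongly negative score for that point
```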
Packaged by Filip Hajny and updated by Kamel Derouiche and me.
scikit-learn is a Python module integrating classic machine learning
algorithms in the tightly-knit scientific Python world (numpy, scipy,
matplotlib). It aims to provide simple and efficient solutions to
learning problems, accessible to everybody and reusable in various
contexts: machine-learning as a versatile tool for science and
engineering.