Highlights:
Missing values in features, represented by NaNs, are now accepted in
column-wise preprocessing such as scalers. Each feature is fitted
disregarding NaNs, and data containing NaNs can be transformed. The
new impute module provides estimators for learning despite missing
data.
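
A minimal sketch of the NaN handling described above, assuming StandardScaler as the scaler and SimpleImputer from the new impute module; the toy array is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Illustrative data with one missing value in each column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# The scaler fits each column on its non-NaN entries only.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)  # NaNs are passed through unchanged

# SimpleImputer fills each NaN with the mean of its column.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)
```

Scaling does not fill in the gaps; imputation is a separate step for estimators that cannot handle NaNs themselves.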
ColumnTransformer handles the case where different features or columns
of a pandas.DataFrame need different preprocessing. String or pandas
Categorical columns can now be encoded with OneHotEncoder or
OrdinalEncoder.
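
As a rough illustration of per-column preprocessing, the sketch below one-hot encodes a string column and scales a numeric one. The DataFrame, its column names and the step names are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical frame with one string column and one numeric column.
df = pd.DataFrame({
    "city": ["Paris", "Oslo", "Paris"],
    "temp": [21.0, 12.0, 24.0],
})

# Each entry is (step name, transformer, columns it applies to).
ct = ColumnTransformer([
    ("onehot", OneHotEncoder(), ["city"]),
    ("scale", StandardScaler(), ["temp"]),
])

# Two one-hot columns for "city" followed by one scaled "temp" column.
Xt = ct.fit_transform(df)
```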
TransformedTargetRegressor helps when the regression target must be
transformed before it can be modeled. PowerTransformer and
KBinsDiscretizer join QuantileTransformer as non-linear transformations.
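
A small sketch of the target transformation idea, assuming a log/exp pair and a plain LinearRegression as the wrapped regressor; the exponential toy target is invented for the example:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Toy target that grows exponentially, so log(y) is linear in X.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.exp(0.5 * X.ravel())

# The regressor is fitted on func(y); predictions are mapped back
# to the original scale with inverse_func.
reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
reg.fit(X, y)
pred = reg.predict(X)
```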
Added sample_weight support to several estimators (including KMeans,
BayesianRidge and KernelDensity) and improved stopping criteria in
others (including MLPRegressor, GradientBoostingRegressor and
SGDRegressor).
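
For the sample_weight part, a minimal sketch with KMeans; the three points and their weights are illustrative, and a single cluster is used only to make the effect of the weights easy to see:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three 1-D points; the middle one counts three times as much.
X = np.array([[0.0], [1.0], [10.0]])
weights = np.array([1.0, 3.0, 1.0])

# With one cluster, the fitted centre is pulled towards the heavily
# weighted point instead of sitting at the unweighted mean.
km = KMeans(n_clusters=1, n_init=1, random_state=0)
km.fit(X, sample_weight=weights)
center = km.cluster_centers_[0, 0]
```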
This release is also the first to be accompanied by a Glossary of
Common Terms and API Elements.
Notable new features since 0.18.2:
- `neighbors.LocalOutlierFactor` for anomaly detection
- `preprocessing.QuantileTransformer` for robust feature transformation
- `multioutput.ClassifierChain` meta-estimator to account for
dependencies between classes in multilabel problems
- multiplicative update in `decomposition.NMF`
- multinomial `linear_model.LogisticRegression` with L1 penalty
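
As one example from the list above, a minimal sketch of `neighbors.LocalOutlierFactor`; the data and the `n_neighbors` value are chosen only for illustration:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Five evenly spaced points and one point far away from them.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [100.0]])

# fit_predict returns 1 for inliers and -1 for outliers.
lof = LocalOutlierFactor(n_neighbors=3)
labels = lof.fit_predict(X)
```
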
Packaged by Filip Hajny and updated by Kamel Derouiche and me.
scikit-learn is a Python module integrating classic machine learning
algorithms in the tightly-knit scientific Python world (numpy, scipy,
matplotlib). It aims to provide simple and efficient solutions to
learning problems, accessible to everybody and reusable in various
contexts: machine-learning as a versatile tool for science and
engineering.