Changelog
Version 1.1.0 - 2022-10-01
Added
General:
Small-Text package is now available via conda-forge.
Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text):
from small_text import PoolBasedActiveLearner
Classification:
All classifiers now support weighting of training samples.
Early stopping has been reworked, improved, and documented (#18).
Model selection has been reworked and documented.
[!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 in order to match TransformerBasedClassification.
[!] Removed weight renormalization after gradient clipping.
Datasets:
The target_labels keyword argument in __init__() will now raise a warning if not passed.
Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.
Query Strategies:
New multi-label strategy: CategoryVectorInconsistencyAndRanking
Stopping Criteria:
New stopping criteria: ClassificationChange, OverallUncertainty, and MaxIterations.
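To illustrate the simplest of these criteria, here is a minimal sketch of an iteration-count criterion. The class name MaxIterationsSketch and its internals are assumptions made for illustration and are not the library's implementation; only the keyword arguments of stop() follow the StoppingCriterion signature listed under 1.0.0b3.

```python
class MaxIterationsSketch:
    """Hypothetical criterion: signals a stop after a fixed number of checks."""

    def __init__(self, max_iterations):
        if max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        self.max_iterations = max_iterations
        self.current_iteration = 0

    def stop(self, active_learner=None, predictions=None, proba=None,
             indices_stopping=None):
        # Each call is counted as one completed active learning iteration.
        self.current_iteration += 1
        return self.current_iteration >= self.max_iterations


criterion = MaxIterationsSketch(max_iterations=3)
results = [criterion.stop() for _ in range(3)]
```

In this sketch the criterion is checked once per iteration, so the third check is the first one that reports a stop.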
Deprecated
small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
TransformerBasedClassification and KimCNNClassifier: The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping keyword argument in the fit() method instead (#18).
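For readers unfamiliar with the two deprecated rules, the following sketch shows one plausible reading of them. It is not the library's code: the class PatienceSketch is hypothetical, and it assumes that early_stopping_no_improvement counts epochs without a lower validation loss and that a negative early_stopping_acc disables the accuracy rule.

```python
class PatienceSketch:
    """Hypothetical helper illustrating the two deprecated stopping rules."""

    def __init__(self, no_improvement=5, acc_threshold=-1):
        self.no_improvement = no_improvement
        self.acc_threshold = acc_threshold  # assumed: negative disables the rule
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def check(self, val_loss, train_acc):
        # Rule 1 (assumed): stop once the accuracy exceeds the threshold.
        if self.acc_threshold >= 0 and train_acc > self.acc_threshold:
            return True
        # Rule 2 (assumed): stop after too many epochs without a lower loss.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.no_improvement


stopper = PatienceSketch(no_improvement=2)
history = [(0.9, 0.60), (0.7, 0.75), (0.8, 0.99), (0.75, 0.99)]
decisions = [stopper.check(loss, acc) for loss, acc in history]
```

With the accuracy rule disabled, only the patience rule can trigger here, which happens after the second epoch without a new best loss.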
Fixed
Classification:
KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly process the scheduler keyword argument (#16).
Removed
Removed the strict check that every target label has to occur in the training data.
This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.
Version 1.0.1 - 2022-09-12
Minor bug fix release.
Fixed
Links to notebooks and code examples will now always point to the latest release instead of the latest main branch.
Version 1.0.0 - 2022-06-14
First stable release.
Changed
Datasets:
SklearnDataset now checks if the dimensions of the features and labels match.
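A check of this kind can be sketched as follows. The function name and error message are hypothetical, and the sketch assumes that "dimensions match" means one label entry per feature row; the library's actual check may differ.

```python
def check_features_and_labels(x, y):
    """Hypothetical check: one label entry is expected per feature row."""
    if len(x) != len(y):
        raise ValueError(
            f"Feature and label dimensions do not match: {len(x)} != {len(y)}"
        )


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
check_features_and_labels(features, [0, 1, 1])  # passes silently

try:
    check_features_and_labels(features, [0, 1])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```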
Query Strategies:
ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
Documentation:
The documentation is now available in full width.
Repository:
Versions in this repository can now be referenced using the respective Zenodo DOI.
[1.0.0b4] - 2022-05-04
Added
General:
We now have a concept for optional dependencies, which allows components to rely on soft dependencies, i.e., Python dependencies that can be installed on demand (and only when certain functionality is needed).
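The general idea behind soft dependencies can be sketched with the standard library alone. The decorator name requires and the error message are assumptions for illustration; this is not the mechanism the library itself uses.

```python
import functools
import importlib.util


def requires(module_name):
    """Hypothetical decorator that defers the dependency check to call time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Only top-level module names are expected here.
            if importlib.util.find_spec(module_name) is None:
                raise ImportError(
                    f"'{func.__name__}' needs the optional dependency "
                    f"'{module_name}', which is not installed."
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires("json")  # standard library module, always available
def available_feature():
    return "ok"


@requires("surely_not_installed_pkg_xyz")  # made-up name, assumed to be absent
def unavailable_feature():
    return "never reached"
```

Because the check runs when the function is called, importing the surrounding module never fails; only the feature that needs the missing package does.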
Datasets:
The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
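What such a method provides can be shown on a toy class. ListDatasetSketch is hypothetical and not one of the library's datasets, and the sketch assumes that "identical copy" means a deep copy that shares no mutable state with the source.

```python
import copy


class ListDatasetSketch:
    """Hypothetical in-memory dataset; not one of the library's classes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def clone(self):
        # Deep copy so that the clone shares no mutable state with the source.
        return ListDatasetSketch(copy.deepcopy(self.x), copy.deepcopy(self.y))


original = ListDatasetSketch([["a", "b"], ["c"]], [0, 1])
cloned = original.clone()
cloned.x[0].append("changed")  # modifies the clone only
```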
Query Strategies:
New strategies: DiscriminativeActiveLearning and SEALS.
Changed
Datasets:
Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
Added clone() method which creates an identical copy of the dataset.
Query Strategies:
EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
Code examples:
Code structure was unified.
Number of iterations can now be passed via a CLI argument.
small_text.integrations.pytorch.utils.data: Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
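The rescaling step can be illustrated in plain Python. The formula the library uses for the unscaled weights is not stated here, so the sketch assumes inverse-frequency weights; the function name scaled_class_weights is hypothetical.

```python
from collections import Counter


def scaled_class_weights(labels, num_classes):
    """Hypothetical helper: inverse-frequency weights, rescaled so min == 1.0."""
    counts = Counter(labels)
    # Assumption: every class occurs at least once in `labels`.
    raw = [len(labels) / counts[c] for c in range(num_classes)]
    smallest = min(raw)
    # Dividing by the smallest weight pins the most frequent class at 1.0.
    return [w / smallest for w in raw]


weights = scaled_class_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3)
# weights == [1.0, 2.0, 4.0]
```

Only the ratios between classes matter for weighting, so the rescaling changes the magnitude of the weights but not their relative proportions.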
[1.0.0b3] - 2022-03-06
Added
New query strategy: ContrastiveActiveLearning.
Added Reproducibility Notes.
Changed
Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous x_ notation for indices was a relic of earlier versions of this library and did not reflect the underlying object anymore.
PoolBasedActiveLearner:
attribute x_indices_labeled was renamed to indices_labeled
attribute x_indices_ignored was unified to indices_ignored
attribute queried_indices was unified to indices_queried
attribute _x_index_to_position was renamed to _index_to_position
arguments x_indices_initial, x_indices_ignored, and x_indices_validation were renamed to indices_initial, indices_ignored, and indices_validation. This affects most methods of the PoolBasedActiveLearner.
QueryStrategy:
old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
StoppingCriterion:
old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
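A custom strategy written against the new query() argument names might look like the following sketch. RandomSamplingSketch is a hypothetical class that does not inherit from the library's QueryStrategy; it only mirrors the renamed signature, and returning a plain list of indices is an assumption.

```python
import random


class RandomSamplingSketch:
    """Hypothetical strategy that follows the renamed query() arguments."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        if n > len(indices_unlabeled):
            raise ValueError("n exceeds the number of unlabeled instances")
        # Draw n distinct indices from the unlabeled pool only.
        return self._rng.sample(list(indices_unlabeled), n)


strategy = RandomSamplingSketch(seed=42)
queried = strategy.query(None, ["a", "b", "c", "d", "e"],
                         indices_unlabeled=[1, 3, 4], indices_labeled=[0, 2],
                         y=[0, 1], n=2)
```

Callers that passed x, x_indices_unlabeled, or x_indices_labeled by keyword need to switch to the new names; positional calls are unaffected by the renaming.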
Renamed environment variable which sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP.
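Scripts that must work on both sides of this rename could resolve the folder themselves, as in the sketch below. The function resolve_temp_dir and its default path are hypothetical, and the fallback to ALL_TMP is a convenience for user code, not something the library is stated to do.

```python
import os


def resolve_temp_dir(environ=None, default="/tmp/small-text"):
    """Hypothetical lookup that prefers the new variable name."""
    environ = os.environ if environ is None else environ
    if "SMALL_TEXT_TEMP" in environ:
        return environ["SMALL_TEXT_TEMP"]
    # Fallback for setups that still export the pre-1.0.0b3 name.
    if "ALL_TMP" in environ:
        return environ["ALL_TMP"]
    return default  # assumed placeholder, not the library's actual default
```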
[1.0.0b2] - 2022-02-22
Bugfix release.
Fixed
Fix links to the documentation in README.md and notebooks.
[1.0.0b1] - 2022-02-22
First beta release with multi-label functionality and stopping criteria.
Added
Added a changelog.
All provided classifiers are now capable of multi-label classification.
Changed
Documentation has been overhauled considerably.
PoolBasedActiveLearner: Renamed incremental_training kwarg to reuse_model.
SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_label=False).
SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
Refactored KimCNNClassifier and TransformerBasedClassification.
Removed
Removed device kwarg from PytorchDataset.__init__(), PytorchTextClassificationDataset.__init__(), and TransformersDataset.__init__().