Changelog
Version 1.4.1 - 2024-08-18
Fixed
Fixed an out-of-bounds error that occurred when DiscriminativeActiveLearning queries all remaining unlabeled data.
Fixed typos/wording in PoolBasedActiveLearner docstrings.
Pinned SetFit version in notebook example. (#64)
Fixed an out-of-bounds error that could occur in SetFitClassification on both 32-bit systems and Windows. (#66)
Fixed errors in notebook examples that occurred with more recent seaborn / matplotlib versions.
Changed
Documentation: added links to bibliography. (#65)
Version 1.4.0 - 2024-06-09
Added
New query strategy: AnchorSubsampling.
Fixed
Changed how the seed is controlled in SetFitClassification, since the seed was fixed unless explicitly set via the respective trainer keyword argument.
Changed
Documentation: added a section where compatible transformer models are listed.
Documentation: updated showcase section.
Version 1.3.3 - 2023-12-29
Changed
An errata section was added to the documentation.
Fixed
Version 1.3.2 - 2023-08-19
Fixed
Fixed a bug in TransformerBasedClassification, where validations_per_epoch >= 2 left the model in eval mode. (#40)
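The class of bug described in this entry can be illustrated with a toy model. This is a minimal sketch, not the library's code: `ToyModel`, `validate`, and `train_one_epoch` are hypothetical names, and the `train()` / `eval()` / `training` convention is borrowed from PyTorch modules. It shows that a validation pass run in the middle of an epoch should switch the model back to training mode before the next batch.

```python
class ToyModel:
    """Hypothetical stand-in that mimics the train()/eval() switch of a PyTorch module."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


def validate(model):
    """Run a (dummy) validation pass and restore the previous mode afterwards."""
    was_training = model.training
    model.eval()
    try:
        return 0.0  # placeholder for a validation loss
    finally:
        if was_training:
            model.train()


def train_one_epoch(model, num_batches, validations_per_epoch):
    """Return the model's mode at every training step; all entries should be True."""
    validate_every = max(1, num_batches // validations_per_epoch)
    modes = []
    for step in range(1, num_batches + 1):
        modes.append(model.training)  # a real loop would run forward/backward here
        if step % validate_every == 0:
            validate(model)
    return modes
```

Without the restore step in `validate`, every batch after the first validation would run with the model in eval mode, which is the symptom the entry describes.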
Version 1.3.1 - 2023-07-22
Fixed
Version 1.3.0 - 2023-02-21
Added
Added dropout sampling to SetFitClassification.
Fixed
Fixed broken link in README.md.
Fixed typo in README.md. (#26)
Changed
The ClassificationChange stopping criterion now supports multi-label classification.
Documentation:
Updated the active learning setup figure.
The documentation of integrations has been reorganized.
Version 1.2.0 - 2023-02-04
Added
Added new classifier: SetFitClassification which wraps huggingface/setfit.
Active Learner:
PoolBasedActiveLearner now handles keyword arguments passed to the classifier’s fit() during the update() step.
Query Strategies:
New strategy: BALD.
SubsamplingQueryStrategy now uses the remaining unlabeled pool when more samples are requested than are available.
Notebook Examples:
Revised both existing notebook examples.
Added a notebook example for active learning with SetFit classifiers.
Added a notebook example for cold start initialization with SetFit classifiers.
Documentation:
A showcase section has been added to the documentation.
Fixed
Distances in lightweight_coreset were not correctly projected onto the [0, 1] interval (but ranking was unaffected).
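Why a faulty projection can leave the ranking untouched is easiest to see with a small example. The sketch below assumes a plain min-max scaling, which may differ from the projection the library actually uses; `project_to_unit_interval` and `ranking` are hypothetical helper names. Any strictly increasing rescaling changes the values but not their order.

```python
def project_to_unit_interval(distances):
    """Min-max scale a list of distances onto [0, 1] (assumed projection)."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All distances are equal; map them to 0.0 to avoid division by zero.
        return [0.0 for _ in distances]
    return [(d - lo) / (hi - lo) for d in distances]


def ranking(values):
    """Indices sorted by descending value (ties keep their original order)."""
    return sorted(range(len(values)), key=lambda i: -values[i])
```

For example, `ranking([0.5, 3.0, 1.25, 2.0])` and the ranking of the scaled values are identical, so a query that selects the farthest points picks the same instances either way.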
Changed
Coreset implementations now use the distance-based (as opposed to the similarity-based) formulation.
Version 1.1.1 - 2022-10-14
Fixed
Model selection raised an error in cases where no model was available for selection (#21).
Version 1.1.0 - 2022-10-01
Added
General:
Small-Text package is now available via conda-forge.
Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text):
from small_text import PoolBasedActiveLearner
Classification:
All classifiers now support weighting of training samples.
Early stopping has been reworked, improved, and documented (#18).
Model selection has been reworked and documented.
[!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 in order to match TransformerBasedClassification.
[!] Removed weight renormalization after gradient clipping.
Datasets:
The target_labels keyword argument in __init__() will now raise a warning if not passed.
Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.
Query Strategies:
New multi-label strategy: CategoryVectorInconsistencyAndRanking.
Stopping Criteria:
New stopping criteria: ClassificationChange, OverallUncertainty, and MaxIterations.
Deprecated
small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
TransformerBasedClassification and KimCNNClassifier: The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping keyword argument in the fit() method instead (#18).
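The migration path for the early stopping arguments follows a common deprecation pattern, which the toy class below sketches. `ToyClassifier` is hypothetical and is not how the library's classifiers are implemented; only the argument names `early_stopping_acc` and `early_stopping` are taken from the entry above.

```python
import warnings


class ToyClassifier:
    """Hypothetical classifier that only illustrates the deprecation pattern."""

    def __init__(self, early_stopping_acc=None):
        if early_stopping_acc is not None:
            warnings.warn(
                'early_stopping_acc is deprecated; pass early_stopping to fit() instead.',
                DeprecationWarning,
                stacklevel=2,
            )
        self.early_stopping_acc = early_stopping_acc

    def fit(self, train_set, early_stopping=None):
        # A real implementation would train here and consult early_stopping.
        self.early_stopping = early_stopping
        return self
```

Code that still passes the constructor argument keeps working but emits a `DeprecationWarning`; new code hands the early stopping configuration to `fit()`.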
Fixed
Classification:
KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly process the scheduler keyword argument (#16).
Removed
Removed the strict check that every target label has to occur in the training data. (This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.)
Version 1.0.1 - 2022-09-12
Minor bug fix release.
Fixed
Links to notebooks and code examples will now always point to the latest release instead of the latest main branch.
Version 1.0.0 - 2022-06-14
First stable release.
Changed
Datasets:
SklearnDataset now checks if the dimensions of the features and labels match.
Query Strategies:
ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
Documentation:
The documentation is now available in full width.
Repository:
Versions of this repository can now be referenced using the respective Zenodo DOI.
[1.0.0b4] - 2022-05-04
Added
General:
We now have a concept for optional dependencies, which allows components to rely on soft dependencies, i.e. Python dependencies that can be installed on demand (and only when certain functionality is needed).
Datasets:
The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
Query Strategies:
New strategies: DiscriminativeActiveLearning and SEALS.
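The optional dependency concept mentioned under "General" can be sketched with the standard library alone. The helpers `is_available` and `require` are hypothetical and do not reflect the library's internal mechanism; they only show one common way to defer an import until the functionality that needs it is used.

```python
import importlib
import importlib.util


def is_available(module_name):
    """Return True if the (top-level) module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name, feature):
    """Import an optional dependency or fail with an actionable message."""
    if not is_available(module_name):
        raise ImportError(
            f'{feature} requires the optional dependency "{module_name}". '
            f'Install it to enable this functionality.'
        )
    return importlib.import_module(module_name)
```

A component would call `require(...)` at the point of use rather than at module import time, so users who never touch that component do not need the dependency installed.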
Changed
Datasets:
Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
Added clone() method which creates an identical copy of the dataset.
Query Strategies:
EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
Code examples:
Code structure was unified.
The number of iterations can now be passed via a CLI argument.
small_text.integrations.pytorch.utils.data: Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
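The rescaling described for get_class_weights() can be illustrated as follows. This is a sketch under assumptions: the inverse-frequency weighting and the function name `scaled_class_weights` are chosen for illustration and are not taken from the library; only the final step, dividing by the smallest weight so that it becomes 1.0, comes from the entry above.

```python
from collections import Counter


def scaled_class_weights(labels, num_classes):
    """Inverse-frequency class weights, rescaled so the smallest weight is 1.0."""
    counts = Counter(labels)
    if any(counts[c] == 0 for c in range(num_classes)):
        raise ValueError('every class needs at least one example in this sketch')
    raw = [len(labels) / counts[c] for c in range(num_classes)]
    smallest = min(raw)
    return [w / smallest for w in raw]
```

With this scaling the most frequent class always has weight 1.0, and every other weight states how much more strongly a rarer class is weighted relative to it.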
[1.0.0b3] - 2022-03-06
Added
New query strategy: ContrastiveActiveLearning.
Added Reproducibility Notes.
Changed
Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous x_ notation for indices was a relic of earlier versions of this library and no longer reflected the underlying object.
PoolBasedActiveLearner:
attribute x_indices_labeled was renamed to indices_labeled
attribute x_indices_ignored was unified to indices_ignored
attribute queried_indices was unified to indices_queried
attribute _x_index_to_position was renamed to _index_to_position
arguments x_indices_initial, x_indices_ignored, and x_indices_validation were renamed to indices_initial, indices_ignored, and indices_validation. This affects most methods of the PoolBasedActiveLearner.
QueryStrategy
old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
StoppingCriterion
old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
Renamed the environment variable which sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP.
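For custom components, the renamed query() and stop() signatures listed above translate into code like the following. Both classes are hypothetical toy examples that do not subclass the library's base classes; only the method signatures are taken from this changelog.

```python
import random


class ToyRandomQueryStrategy:
    """Hypothetical strategy that follows the renamed query() signature."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        # Never request more instances than the unlabeled pool contains.
        n = min(n, len(indices_unlabeled))
        return sorted(self._rng.sample(list(indices_unlabeled), n))


class ToyIterationLimit:
    """Hypothetical criterion that follows the renamed stop() signature."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.calls = 0

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        self.calls += 1
        return self.calls >= self.max_calls
```

Existing custom strategies and criteria written against the old signatures would need their `x`, `x_indices_*` parameters renamed accordingly, in particular where callers pass them as keyword arguments.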
[1.0.0b2] - 2022-02-22
Bugfix release.
Fixed
Fix links to the documentation in README.md and notebooks.
[1.0.0b1] - 2022-02-22
First beta release with multi-label functionality and stopping criteria.
Added
Added a changelog.
All provided classifiers are now capable of multi-label classification.
Changed
Documentation has been overhauled considerably.
PoolBasedActiveLearner: Renamed incremental_training kwarg to reuse_model.
SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_label=False).
SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
Refactored KimCNNClassifier and TransformerBasedClassification.
Removed
Removed device kwarg from PytorchDataset.__init__(), PytorchTextClassificationDataset.__init__() and TransformersDataset.__init__().