Changelog
[1.0.0rc1] - 2022-06-13
This release candidate precedes the upcoming 1.0.0 release and mainly consists of cleanup work.
Changed
Datasets:
SklearnDataset now checks if the dimensions of the features and labels match.
Query Strategies:
ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
Documentation:
The documentation is now available in full width.
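The consistency check described for SklearnDataset above can be sketched roughly as follows. This is only an illustration, not small-text's actual implementation: the helper name check_features_and_labels is hypothetical, and the sketch assumes that "matching dimensions" means one label entry per feature row.

```python
def check_features_and_labels(x, y):
    """Raise ValueError if features and labels differ in length.

    Hypothetical helper for illustration only; not part of small-text.
    """
    num_rows = len(x)
    num_labels = len(y)
    if num_rows != num_labels:
        raise ValueError(
            f"Feature and label dimensions do not match: "
            f"{num_rows} rows vs. {num_labels} labels"
        )


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [0, 1, 1]
check_features_and_labels(features, labels)  # matching lengths: no error

try:
    check_features_and_labels(features, labels[:2])  # one label is missing
except ValueError as e:
    error_message = str(e)
```

Failing early at construction time keeps a mismatch from surfacing later as a harder-to-trace error during training or querying.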
[1.0.0b4] - 2022-05-04
Added
General:
We now have a concept for optional dependencies, which allows components to rely on soft dependencies, i.e. Python dependencies that can be installed on demand (and only when certain functionality is needed).
Datasets:
The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
Query Strategies:
New strategies: DiscriminativeActiveLearning and SEALS.
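The idea of soft dependencies mentioned under "General" can be illustrated with the standard library alone. The sketch below is an assumption about how such a mechanism might look, not small-text's own code: the helper names is_available and require_optional are hypothetical, and only importlib calls from the Python standard library are used.

```python
import importlib
import importlib.util


def is_available(module_name):
    """Return True if the top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name, feature):
    """Import and return a soft dependency, or fail with a helpful message.

    Hypothetical helper for illustration only; not part of small-text.
    """
    if not is_available(module_name):
        raise ImportError(
            f"'{feature}' requires the optional dependency '{module_name}'. "
            f"Install it to use this functionality."
        )
    return importlib.import_module(module_name)


# "json" ships with Python, so this import always succeeds.
json_module = require_optional("json", "JSON export")

try:
    require_optional("surely_not_installed_pkg_xyz", "example feature")
except ImportError as e:
    message = str(e)
```

Deferring the import to the point of use means the base installation stays small, and users only see an error when they call functionality that needs the missing package.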
Changed
Datasets:
Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
Added clone() method which creates an identical copy of the dataset.
Query Strategies:
EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
Code examples:
Code structure was unified.
Number of iterations can now be passed via a CLI argument.
small_text.integrations.pytorch.utils.data: Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
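The scaling described for get_class_weights() above can be made concrete with a small sketch. The function name scaled_class_weights and the inverse-frequency base weighting are assumptions made for illustration; the library's actual weighting formula may differ. Only the final step, dividing by the smallest weight, reflects the changelog entry.

```python
from collections import Counter


def scaled_class_weights(labels, num_classes):
    """Inverse-frequency class weights, scaled so the smallest weight is 1.0.

    The inverse-frequency base weighting is an assumption for illustration;
    only the final scaling step mirrors the changelog entry.
    """
    counts = Counter(labels)
    if any(counts[c] == 0 for c in range(num_classes)):
        raise ValueError("Every class needs at least one example")
    # Rare classes receive larger raw weights than frequent ones.
    raw = [len(labels) / counts[c] for c in range(num_classes)]
    # Divide by the minimum so the most frequent class ends up at 1.0.
    smallest = min(raw)
    return [w / smallest for w in raw]


weights = scaled_class_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3)
print(weights)  # [1.0, 2.0, 4.0]
```

With this scaling, the most frequent class keeps a weight of 1.0 and all other weights express how much more each rarer class counts relative to it.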
[1.0.0b3] - 2022-03-06
Added
New query strategy: ContrastiveActiveLearning.
Added Reproducibility Notes.
Changed
Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous x_ notation for indices was a relic of earlier versions of this library and no longer reflected the underlying object.
PoolBasedActiveLearner:
attribute x_indices_labeled was renamed to indices_labeled
attribute x_indices_ignored was unified to indices_ignored
attribute queried_indices was unified to indices_queried
attribute _x_index_to_position was renamed to _index_to_position
arguments x_indices_initial, x_indices_ignored, and x_indices_validation were renamed to indices_initial, indices_ignored, and indices_validation. This affects most methods of the PoolBasedActiveLearner.
QueryStrategy:
old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
StoppingCriterion:
old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
Renamed environment variable which sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP.
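To show how the renamed query() arguments listed above fit together, here is a minimal random-sampling sketch that follows the new signature. The class RandomSamplingSketch is hypothetical and deliberately does not subclass the library's QueryStrategy, so the example runs without small-text installed; the plain list standing in for a dataset is likewise an assumption.

```python
import random


class RandomSamplingSketch:
    """Illustrative strategy following the new query() argument names.

    Not small-text's own implementation; it does not subclass the
    library's QueryStrategy so that the example stays self-contained.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        if n > len(indices_unlabeled):
            raise ValueError("n exceeds the number of unlabeled instances")
        # Sample n distinct indices from the unlabeled pool.
        return self._rng.sample(list(indices_unlabeled), n)


dataset = ["doc a", "doc b", "doc c", "doc d", "doc e"]  # stand-in dataset
indices_labeled = [0, 1]
indices_unlabeled = [2, 3, 4]
y = [1, 0]  # labels for the instances in indices_labeled

strategy = RandomSamplingSketch(seed=42)
indices_queried = strategy.query(None, dataset, indices_unlabeled,
                                 indices_labeled, y, n=2)
```

Code written against the old signature that passes x, x_indices_unlabeled, or x_indices_labeled as keyword arguments would need those keywords updated to the new names; positional calls are unaffected by the rename itself.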
[1.0.0b2] - 2022-02-22
Bugfix release.
Fixed
Fixed links to the documentation in README.md and notebooks.
[1.0.0b1] - 2022-02-22
First beta release with multi-label functionality and stopping criteria.
Added
Added a changelog.
All provided classifiers are now capable of multi-label classification.
Changed
Documentation has been overhauled considerably.
PoolBasedActiveLearner: Renamed incremental_training kwarg to reuse_model.
SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_Label=False).
SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
Refactored KimCNNClassifier and TransformerBasedClassification.
Removed
Removed device kwarg from PytorchDataset.__init__(), PytorchTextClassificationDataset.__init__() and TransformersDataset.__init__().