Changelog

[1.0.0rc1] - 2022-06-13

This is a release candidate before the upcoming 1.0.0 release, which mainly consists of cleanup work.

Datasets:
- SklearnDataset now checks if the dimensions of the features and labels match.
Query Strategies:
- ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
Documentation:
- The documentation is now available in full width.
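
The dimension check now performed by SklearnDataset can be illustrated with a minimal sketch. check_dimensions is a hypothetical helper written only for this example, not part of small-text. It assumes that features and labels can both be measured by len(), one entry per instance.

```python
from typing import Sequence


def check_dimensions(features: Sequence, labels: Sequence) -> None:
    """Raise ValueError if the number of feature rows and labels differ.

    Hypothetical helper: it mirrors the idea of the check, not
    SklearnDataset's actual implementation.
    """
    if len(features) != len(labels):
        raise ValueError(
            f"Feature/label mismatch: {len(features)} rows vs. {len(labels)} labels"
        )


# Three feature rows but only two labels: rejected up front.
try:
    check_dimensions([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0, 1])
except ValueError as err:
    print(err)
```

Failing at construction time surfaces the mismatch immediately, instead of during a later query or training step.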

General:
- We now have a concept for optional dependencies, which allows components to rely on soft dependencies, i.e., Python dependencies that can be installed on demand (and only when certain functionality is needed).
Datasets:
- The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
Query Strategies:
- New strategies: DiscriminativeActiveLearning and SEALS.
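
The following minimal sketch shows how a soft dependency can be resolved on demand using only the standard library. The helper require_optional_dependency and the exception MissingOptionalDependencyError are hypothetical names chosen for illustration. They do not reflect small-text's internal implementation.

```python
import importlib
import importlib.util
from types import ModuleType


class MissingOptionalDependencyError(ImportError):
    """Raised when a feature needs a package that is not installed."""


def require_optional_dependency(module_name: str) -> ModuleType:
    """Import a top-level module on demand, failing with a clear message.

    Hypothetical helper: the import only happens when the feature that
    needs the module is actually used.
    """
    # find_spec returns None for a top-level module that cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise MissingOptionalDependencyError(
            f"This functionality requires '{module_name}'. "
            f"Please install the package that provides it."
        )
    return importlib.import_module(module_name)


# A standard-library module is always available...
json_module = require_optional_dependency("json")
# ...whereas a missing package only raises once its feature is requested.
```

Because the check runs inside the function rather than at module import time, users who never touch the feature never need the extra package.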

Datasets:
- Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
- Added clone() method which creates an identical copy of the dataset.
Query Strategies:
- EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
Code examples:
- Code structure was unified.
- The number of iterations can now be passed via a CLI argument.
small_text.integrations.pytorch.utils.data:
- Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
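
The sketch below illustrates the new scaling of get_class_weights(). It computes inverse-frequency weights and divides them by their minimum. Both the inverse-frequency formula and the dict return type are assumptions made for readability. Only the property that the smallest weight equals 1.0 is taken from the entry above.

```python
from collections import Counter
from typing import Dict, Sequence


def scaled_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency class weights, rescaled so the smallest is 1.0.

    Illustrative only: not necessarily what get_class_weights() computes.
    """
    counts = Counter(labels)
    total = len(labels)
    # Rarer classes receive larger raw weights.
    raw = {cls: total / count for cls, count in counts.items()}
    # Divide by the minimum so the most frequent class gets weight 1.0.
    smallest = min(raw.values())
    return {cls: weight / smallest for cls, weight in raw.items()}


print(scaled_class_weights([0, 0, 1, 1, 1, 1]))  # {0: 2.0, 1: 1.0}
```

Rescaling leaves the weight ratios unchanged. It only fixes the scale, so the most frequent class keeps a weight of exactly 1.0.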

Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous x_ notation for indices was a relic of earlier versions of this library that no longer reflected the underlying object.
- PoolBasedActiveLearner:
  - attribute x_indices_labeled was renamed to indices_labeled
  - attribute x_indices_ignored was renamed to indices_ignored
  - attribute queried_indices was renamed to indices_queried
  - attribute _x_index_to_position was renamed to _index_to_position
  - arguments x_indices_initial, x_indices_ignored, and x_indices_validation were renamed to indices_initial, indices_ignored, and indices_validation. This affects most methods of the PoolBasedActiveLearner.
- QueryStrategy:
  - old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
  - new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
- StoppingCriterion:
  - old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
  - new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
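
The following is a toy query strategy written against the new query() signature. RandomQuery is a hypothetical class that does not subclass small-text's QueryStrategy, so it runs without the library installed. It only demonstrates the renamed arguments.

```python
import random
from typing import List, Optional, Sequence


class RandomQuery:
    """Toy query strategy using the renamed query() arguments.

    Hypothetical, self-contained illustration: clf, dataset,
    indices_labeled and y are accepted but not used.
    """

    def __init__(self, seed: Optional[int] = None):
        self._rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled: Sequence[int],
              indices_labeled: Sequence[int], y, n: int = 10) -> List[int]:
        # Sample n distinct indices from the unlabeled pool.
        if n > len(indices_unlabeled):
            raise ValueError("n exceeds the number of unlabeled instances")
        return self._rng.sample(list(indices_unlabeled), n)


strategy = RandomQuery(seed=0)
# Keyword call sites must use the new names (formerly x_indices_unlabeled, ...).
queried = strategy.query(clf=None, dataset=None, indices_unlabeled=range(5, 20),
                         indices_labeled=[0, 1, 2, 3, 4], y=[0, 1, 0, 1, 0], n=3)
```

Positional call sites keep working after the rename. Call sites that pass x_indices_unlabeled or x_indices_labeled by keyword must be updated.
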
Renamed the environment variable that sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP.
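
Here is a sketch of reading the renamed variable. resolve_temp_dir is a hypothetical helper. Falling back to the system temp directory when SMALL_TEXT_TEMP is unset is an assumption, not documented small-text behavior.

```python
import os
import tempfile


def resolve_temp_dir() -> str:
    """Return the folder configured via SMALL_TEXT_TEMP.

    Hypothetical helper: the fallback to the system temp directory is an
    assumption, not small-text's documented logic.
    """
    return os.environ.get("SMALL_TEXT_TEMP", tempfile.gettempdir())


# Set the renamed variable (formerly ALL_TMP) before small-text is used.
os.environ["SMALL_TEXT_TEMP"] = "/tmp/small-text"
print(resolve_temp_dir())  # /tmp/small-text
```

In a shell, the equivalent is export SMALL_TEXT_TEMP=/tmp/small-text before starting Python.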

Bugfix release.

First beta release with multi-label functionality and stopping criteria.

Documentation has been overhauled considerably.
PoolBasedActiveLearner: Renamed incremental_training kwarg to reuse_model.
SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_label=False).
SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
Refactored KimCNNClassifier and TransformerBasedClassification.

Removed device kwarg from PytorchDataset.__init__(), PytorchTextClassificationDataset.__init__() and TransformersDataset.__init__().