Changelog

Version 1.1.0 - 2022-10-01

General:
- Small-Text package is now available via conda-forge.
- Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text):
```
from small_text import PoolBasedActiveLearner
```
Classification:
- All classifiers now support weighting of training samples.
- Early stopping has been reworked, improved, and documented (#18).
- Model selection has been reworked and documented.
- [!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 in order to match TransformerBasedClassification.
- [!] Removed weight renormalization after gradient clipping.
Datasets:
- The target_labels keyword argument in __init__() will now raise a warning if not passed.
- Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.
Query Strategies:
- New multi-label strategy: CategoryVectorInconsistencyAndRanking
Stopping Criteria:
- New stopping criteria: ClassificationChange, OverallUncertainty, and MaxIterations.
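
As a rough illustration of an iteration-based stopping criterion, the following sketch counts calls to stop() and reports when a fixed budget is reached. The class name MaxIterationsSketch and its attributes are hypothetical and not the library's MaxIterations implementation; only the stop() signature is taken from this changelog.

```python
class MaxIterationsSketch:
    """Hypothetical stand-in; NOT the library's MaxIterations class."""

    def __init__(self, max_iterations):
        self.max_iterations = max_iterations
        self.current_iteration = 0

    def stop(self, active_learner=None, predictions=None, proba=None,
             indices_stopping=None):
        # Assumption: each call to stop() corresponds to one finished
        # active learning iteration.
        self.current_iteration += 1
        return self.current_iteration >= self.max_iterations


criterion = MaxIterationsSketch(max_iterations=3)
results = [criterion.stop() for _ in range(3)]
print(results)
```

The unused keyword arguments are kept so that the sketch matches the shape of the stop() interface.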

small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
TransformerBasedClassification and KimCNNClassifier: The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping keyword argument in the fit() method instead (#18).
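
For readers unfamiliar with the idea behind a "no improvement" early stopping rule, here is a minimal, library-independent sketch. It assumes a patience-style rule on the validation loss; the class NoImprovementStopper and its method are hypothetical and do not reflect the actual early stopping classes or the arguments accepted by fit().

```python
class NoImprovementStopper:
    """Hypothetical helper; NOT part of small-text."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float('inf')
        self.epochs_without_improvement = 0

    def check_early_stop(self, val_loss):
        # Reset the counter on improvement, otherwise count the epoch.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = NoImprovementStopper(patience=2)
stopped_at = None
for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.75, 0.6]):
    if stopper.check_early_stop(loss):
        stopped_at = epoch
        break
```

In this toy run the loop stops before the final, better loss is seen, which is the usual trade-off of a small patience value.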

Classification:
- KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly process the scheduler keyword argument (#16).

Removed the strict check that every target label has to occur in the training data.
- This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.
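
Since the check is no longer enforced, it may be useful to inspect label coverage manually. The helper below is a hypothetical, library-independent sketch that assumes multi-label targets are given as one list of integer class ids per sample.

```python
def missing_labels(label_sets, num_classes):
    """Return the sorted class ids in range(num_classes) that never occur."""
    seen = set()
    for labels in label_sets:
        seen.update(labels)
    return sorted(set(range(num_classes)) - seen)


# Toy multi-label training targets; the last sample has no label at all.
train_labels = [[0, 2], [2], [0, 4], []]
absent = missing_labels(train_labels, num_classes=5)
print(absent)
```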

Minor bug fix release.

Links to notebooks and code examples will now always point to the latest release instead of the latest main branch.

First stable release.

Datasets:
- SklearnDataset now checks if the dimensions of the features and labels match.
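
The kind of consistency check described here can be sketched as follows. This is an assumption about the general idea (one label entry per feature row), not the code used by SklearnDataset; the function name is hypothetical.

```python
def check_features_and_labels(x, y):
    """Raise a ValueError if x and y differ in their number of rows."""
    if len(x) != len(y):
        raise ValueError(
            f'Feature and label dimensions do not match: {len(x)} != {len(y)}'
        )


check_features_and_labels([[0.1, 0.2], [0.3, 0.4]], [0, 1])  # passes silently

try:
    check_features_and_labels([[0.1, 0.2], [0.3, 0.4]], [0])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```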
Query Strategies:
- ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
Documentation:
- The documentation is now available in full width.
Repository:
- Versions in this repository can now be referenced using the respective Zenodo DOI.

General:
- We now have a concept for optional dependencies which allows components to rely on soft dependencies, i.e. Python dependencies which can be installed on demand (and only when certain functionality is needed).
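
One common way to implement soft dependencies in Python is to check for a module at the point of use and fail with a helpful message if it is absent. The sketch below uses only the standard library; the helper names is_available and require are hypothetical and do not describe the mechanism used in this package.

```python
import importlib
import importlib.util


def is_available(module_name):
    """Check whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name):
    """Import and return the module, or raise an ImportError with a hint."""
    if not is_available(module_name):
        raise ImportError(
            f'The optional dependency "{module_name}" is required for this '
            f'functionality but is not installed.'
        )
    return importlib.import_module(module_name)


json_module = require('json')  # stdlib module, always present
```

A component would call require() only inside the code path that needs the dependency, so that users who never touch that functionality do not have to install it.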
Datasets:
- The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
Query Strategies:
- New strategies: DiscriminativeActiveLearning and SEALS.

Datasets:
- Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
- Added clone() method which creates an identical copy of the dataset.
Query Strategies:
- EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
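
The effect of this change can be illustrated with a small, library-independent sketch: only the union of labeled and unlabeled indices is passed to the embedding function. The function embed_pool_only and the toy embedding are hypothetical and not the EmbeddingBasedQueryStrategy implementation.

```python
def embed_pool_only(embed_fn, dataset, indices_unlabeled, indices_labeled):
    """Embed only instances from the two pools; return {index: embedding}."""
    indices = sorted(set(indices_unlabeled) | set(indices_labeled))
    embeddings = embed_fn([dataset[i] for i in indices])
    return dict(zip(indices, embeddings))


dataset = ['a', 'bb', 'ccc', 'dddd', 'eeeee']
embedded_texts = []


def toy_embed(texts):
    # Toy embedding: one dimension holding the text length.
    embedded_texts.extend(texts)
    return [[float(len(t))] for t in texts]


result = embed_pool_only(toy_embed, dataset,
                         indices_unlabeled=[3, 1], indices_labeled=[0])
```

Instances at indices 2 and 4 (for example, ignored ones) are never embedded, which saves computation when the pools cover only part of the dataset.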
Code examples:
- Code structure was unified.
- Number of iterations can now be passed via a CLI argument.
small_text.integrations.pytorch.utils.data:
- Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
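
To make the scaling step concrete, here is a minimal sketch. It assumes inverse-frequency weights as the starting point, which this changelog does not specify, and then divides by the smallest weight so that the minimum becomes 1.0. The function scaled_class_weights is hypothetical and is not get_class_weights().

```python
from collections import Counter


def scaled_class_weights(labels, num_classes):
    """Inverse-frequency weights, rescaled so the smallest weight is 1.0."""
    counts = Counter(labels)
    # Assumption: every class occurs at least once (otherwise this divides by zero).
    raw = [len(labels) / counts[c] for c in range(num_classes)]
    smallest = min(raw)
    return [w / smallest for w in raw]


weights = scaled_class_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3)
print(weights)
```

With this scaling, the most frequent class keeps a weight of 1.0 and rarer classes receive proportionally larger weights.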

Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous x_ notation for indices was a relic of earlier versions of this library and no longer reflected the underlying object.
- PoolBasedActiveLearner:
  - attribute x_indices_labeled was renamed to indices_labeled
  - attribute x_indices_ignored was renamed to indices_ignored
  - attribute queried_indices was renamed to indices_queried
  - attribute _x_index_to_position was renamed to _index_to_position
  - arguments x_indices_initial, x_indices_ignored, and x_indices_validation were renamed to indices_initial, indices_ignored, and indices_validation. This affects most methods of the PoolBasedActiveLearner.
- QueryStrategy
  - old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
  - new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
- StoppingCriterion
  - old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
  - new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
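
As an example of the new argument names, a custom query strategy written against the updated query() signature might look like the sketch below. RandomSamplingSketch is a hypothetical standalone class that does not inherit from the library's QueryStrategy; only the argument names are taken from the list above.

```python
import random


class RandomSamplingSketch:
    """Hypothetical strategy illustrating the renamed query() arguments."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        if n > len(indices_unlabeled):
            raise ValueError('n must not exceed the number of unlabeled instances')
        # Draw n distinct indices from the unlabeled pool.
        return self.rng.sample(list(indices_unlabeled), n)


strategy = RandomSamplingSketch(seed=42)
queried = strategy.query(clf=None,
                         dataset=['t0', 't1', 't2', 't3', 't4'],
                         indices_unlabeled=[1, 3, 4],
                         indices_labeled=[0, 2],
                         y=[1, 0],
                         n=2)
```

Code written against the old signature needs to rename x to dataset and drop the x_ prefix from the index arguments when they are passed by keyword.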
Renamed the environment variable which sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP.

Bugfix release.

First beta release with multi-label functionality and stopping criteria.

Documentation has been overhauled considerably.
PoolBasedActiveLearner: Renamed incremental_training kwarg to reuse_model.
SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_label=False).
SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
Refactored KimCNNClassifier and TransformerBasedClassification.

Removed device kwarg from PytorchDataset.__init__(), PytorchTextClassificationDataset.__init__() and TransformersDataset.__init__().