Stopping Criteria

Stopping criteria indicate when to exit the active learning loop.

Pre-implemented Stopping Criteria

DeltaFScore
ClassificationChange
KappaAverage
OverallUncertainty
MaxIterations

Interface

This interface is one of the trickiest, since a stopping criterion may rely on any information available within the active learning process (excluding, of course, experiment-only information such as the test set). Therefore, all arguments are optional and default to None, and the interface provides only a loose framework for how stopping criteria should be built.

from abc import ABC, abstractmethod


class StoppingCriterion(ABC):

    @abstractmethod
    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        """
        Parameters
        ----------
        active_learner : small_text.active_learner.PoolBasedActiveLearner
            An active learner instance.
        predictions : np.ndarray[int]
            Predictions for a fixed subset (usually the full train set).
        proba : np.ndarray[float]
            Probability distribution over the possible classes for a fixed subset. This is expected
            to have the same length as `predictions` unless one of `predictions` and `proba`
            is `None`.
        indices_stopping : np.ndarray[int]
            Uses the given indices to select a subset for stopping from either `predictions`
            or `proba` if not `None`. The indices are relative to `predictions` and `proba`.
        """
        pass

For an example, see KappaAverage, which stops when the change in the predictions over multiple iterations falls below a fixed threshold.
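As a further illustration of the interface, the following minimal sketch implements a custom criterion. `StoppingCriterion` is re-declared locally so that the snippet runs on its own (in practice you would subclass the library's class), and `MeanConfidence` with its `threshold` parameter is a hypothetical name that is not part of small-text.

```python
from abc import ABC, abstractmethod


class StoppingCriterion(ABC):
    """Local stand-in that mirrors the interface shown above."""

    @abstractmethod
    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        pass


class MeanConfidence(StoppingCriterion):
    """Hypothetical criterion: stop once the mean top-class probability is high enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        # All arguments are optional, so check the ones this criterion depends on.
        if proba is None:
            raise ValueError('MeanConfidence requires the proba argument')
        # Optionally restrict the check to a subset of rows.
        rows = proba if indices_stopping is None else [proba[i] for i in indices_stopping]
        confidences = [max(row) for row in rows]
        return sum(confidences) / len(confidences) >= self.threshold


criterion = MeanConfidence(threshold=0.9)
uncertain = [[0.6, 0.4], [0.5, 0.5]]
confident = [[0.95, 0.05], [0.02, 0.98]]
print(criterion.stop(proba=uncertain))
print(criterion.stop(proba=confident))
```

Because every argument defaults to None, a criterion should fail loudly when an argument it depends on is missing rather than silently returning False.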

Classes

class small_text.stopping_criteria.kappa.KappaAverage(num_classes, window_size=3, kappa=0.99)

A stopping criterion which measures the agreement between sets of predictions [BV09].

Changed in version 1.3.3: The previous implementation, which was flawed, has been corrected.

__init__(num_classes, window_size=3, kappa=0.99)

num_classes (int) – Number of classes.
window_size (int, default=3) – Defines the number of iterations for which the predictions are taken into account, i.e. this stopping criterion only sees the last window_size-many states of the prediction array passed to stop().
kappa (float, default=0.99) – The criterion stops when the agreement between consecutive predictions within the window exceeds this threshold.
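The agreement measure behind this criterion can be sketched with Cohen's kappa, computed between consecutive prediction vectors and averaged over the window. This is a dependency-free illustration, not the library's code: `cohens_kappa` and `mean_agreement` are hypothetical helper names, and averaging over consecutive pairs within the window is an assumption.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both sequences consist of one identical label: treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_agreement(history, window_size=3):
    """Average kappa over consecutive pairs in the last window_size states.

    Assumes the window holds at least two prediction vectors.
    """
    window = history[-window_size:]
    scores = [cohens_kappa(prev, cur) for prev, cur in zip(window, window[1:])]
    return sum(scores) / len(scores)


p1 = [0, 1, 1, 0, 1, 0]
p2 = [0, 1, 1, 0, 1, 1]
p3 = [0, 1, 1, 0, 1, 1]
print(cohens_kappa(p1, p2))
print(mean_agreement([p1, p2, p3], window_size=3))
```

The resulting mean agreement is the quantity that would be compared against the kappa threshold.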

stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)

Parameters:

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.

class small_text.stopping_criteria.base.DeltaFScore(num_classes, window_size=3, threshold=0.05)

A stopping criterion which stops if the predicted change of the F-score falls below a threshold [AB19].

Note

This criterion is only applicable for binary classification.

Changed in version 1.3.3: The implementation now correctly only considers the change in agreement of the predicted labels belonging to the positive class.

__init__(num_classes, window_size=3, threshold=0.05)

num_classes (int) – Number of classes.
window_size (int, default=3) – Defines the number of iterations for which the predictions are taken into account, i.e. this stopping criterion only sees the last window_size-many states of the prediction array passed to stop().
threshold (float, default=0.05) – The criterion stops when the predicted change of the F-score falls below this threshold.
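The predicted change of the F-score between two consecutive binary prediction vectors can be sketched as follows. The formula 1 - 2a / (2a + b + c), where a counts samples predicted positive in both rounds and b and c count samples predicted positive in only one of them, is an assumption based on how the method in [AB19] is commonly described; `delta_f` and `positive_label` are hypothetical names and not part of the library.

```python
def delta_f(previous, current, positive_label=1):
    """Predicted change of the F-score between two binary prediction vectors."""
    pairs = list(zip(previous, current))
    # a: positive in both rounds; b, c: positive in exactly one round.
    a = sum(1 for p, c in pairs if p == positive_label and c == positive_label)
    b = sum(1 for p, c in pairs if p == positive_label and c != positive_label)
    c_only = sum(1 for p, c in pairs if p != positive_label and c == positive_label)
    denominator = 2 * a + b + c_only
    if denominator == 0:
        # Assumption: neither round predicted any positives, so nothing changed.
        return 0.0
    return 1.0 - (2 * a) / denominator


previous = [1, 1, 0, 0, 1]
current = [1, 0, 0, 1, 1]
print(delta_f(previous, current))
```

Only the positive class contributes to the counts, which matches the note above that the criterion considers the agreement of labels belonging to the positive class.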

stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)

Parameters:

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.

class small_text.stopping_criteria.change.ClassificationChange(num_classes, threshold=0.0)

A stopping criterion which stops as soon as the predictions do not change during two subsequent checks [ZWH08].

Compared to the original paper, this implementation offers a threshold parameter which lessens the stopping requirement so that a percentage of samples are allowed to change. The default setting (threshold=0.0) will result in the original algorithm.

Added in version 1.1.0.

__init__(num_classes, threshold=0.0)

num_classes (int) – Number of classes.
threshold (float, default=0.0) – The percentage of samples that are allowed to change.
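The check described above can be sketched as a comparison of two consecutive prediction vectors. `predictions_stable` is a hypothetical helper, and expressing the threshold as a fraction between 0 and 1 is an assumption.

```python
def predictions_stable(previous, current, threshold=0.0):
    """Return True if the share of changed predictions is within the threshold."""
    changed = sum(1 for p, c in zip(previous, current) if p != c)
    return changed / len(current) <= threshold


previous = [0, 1, 1, 0]
current = [0, 1, 0, 0]
print(predictions_stable(previous, current))
print(predictions_stable(previous, current, threshold=0.25))
```

With threshold=0.0, a single changed prediction is enough to keep the loop running, which corresponds to the strict behaviour of the original algorithm.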

stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)

Parameters:

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.

class small_text.stopping_criteria.uncertainty.OverallUncertainty(num_classes, threshold=0.05)

A stopping criterion which stops as soon as the average overall uncertainty falls below a given threshold [ZWH08].

As a measure of uncertainty, normalized prediction entropy is used. To reproduce the original implementation, pass the indices of the unlabeled set as indices_stopping to the stop() method.

Added in version 1.1.0.

__init__(num_classes, threshold=0.05)

num_classes (int) – Number of classes.
threshold (float, default=0.05) – A normalized entropy value below which the criterion indicates to stop.
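Normalized prediction entropy divides the entropy of each probability row by its maximum possible value, log(number of classes), so that results lie between 0 and 1. The sketch below averages it over all rows or over the subset selected by `indices_stopping`. The helper names are hypothetical, and the snippet is an illustration rather than the library's implementation.

```python
import math


def normalized_entropy(row):
    """Entropy of one probability distribution, scaled to the range [0, 1]."""
    entropy = -sum(p * math.log(p) for p in row if p > 0)
    return entropy / math.log(len(row))


def overall_uncertainty(proba, indices_stopping=None):
    """Mean normalized entropy over all rows or over the selected subset."""
    rows = proba if indices_stopping is None else [proba[i] for i in indices_stopping]
    return sum(normalized_entropy(row) for row in rows) / len(rows)


proba = [[0.5, 0.5], [1.0, 0.0]]
print(overall_uncertainty(proba))
print(overall_uncertainty(proba, indices_stopping=[1]) < 0.05)
```

A uniform row yields the maximum uncertainty of 1, while a row that puts all probability on one class yields 0.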

stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)

Parameters:

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.

class small_text.stopping_criteria.utility.MaxIterations(max_iterations)

Stops after a fixed number of iterations.

Added in version 1.1.0.

__init__(max_iterations)

max_iterations (int) – Number of iterations after which the criterion will indicate to stop.
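How an iteration limit interacts with a loop can be sketched with a small stand-in class. `IterationLimit` is hypothetical and only mimics the behaviour described here, under the assumption that every call to stop() counts as one iteration. The loop body is a placeholder for the actual query, label, and update steps.

```python
class IterationLimit:
    """Hypothetical stand-in that counts calls to stop()."""

    def __init__(self, max_iterations):
        self.max_iterations = max_iterations
        self.current_iteration = 0

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        # Assumption: one call to stop() corresponds to one completed iteration.
        self.current_iteration += 1
        return self.current_iteration >= self.max_iterations


criterion = IterationLimit(max_iterations=3)
completed = 0
while True:
    completed += 1  # placeholder for: query samples, obtain labels, update the model
    if criterion.stop():
        break
print(completed)
```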

stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)

Parameters:

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.