Stopping Criteria

Stopping criteria indicate when to exit the active learning loop.

Pre-implemented Stopping Criteria

DeltaFScore
KappaAverage

Interface

This interface is one of the trickiest, since a stopping criterion may rely on any information available within the active learning process (excluding experiment-only information such as the test set, of course). Therefore, all arguments are optional and default to None, and the interface provides only a loose frame for how stopping criteria should be built.

from abc import ABC, abstractmethod


class StoppingCriterion(ABC):

    @abstractmethod
    def stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None):
        """
        Parameters
        ----------
        active_learner : small_text.active_learner.PoolBasedActiveLearner
            An active learner instance.
        predictions : np.ndarray[int]
            Predictions for a fixed subset (usually the full train set).
        proba : np.ndarray[float]
            Probability distribution over the possible classes for a fixed subset. This is expected
            to have the same length as `predictions` unless one of `predictions` and `proba`
            is `None`.
        x_indices_stopping : np.ndarray[int]
            Uses the given indices to select a subset for stopping from either `predictions`
            or `proba` if not `None`. The indices are relative to `predictions` and `proba`.
        """
        pass

For an example, see KappaAverage, which stops when the change in the predictions over multiple iterations falls below a fixed threshold.
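To make the loose frame described above more concrete, here is a minimal sketch of a custom criterion. It is an illustration only: `PredictionChangeCriterion` and its `max_changed_fraction` parameter are hypothetical names that are not part of the library, and the `StoppingCriterion` class below is a local stand-in that mirrors the interface so the snippet runs on its own. The sketch also assumes that `stop()` is called with predictions for the same fixed subset in every iteration.

```python
from abc import ABC, abstractmethod

import numpy as np


class StoppingCriterion(ABC):
    """Local stand-in mirroring the interface of the library's base class."""

    @abstractmethod
    def stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None):
        pass


class PredictionChangeCriterion(StoppingCriterion):
    """Hypothetical criterion: stop once few predictions change between two calls."""

    def __init__(self, max_changed_fraction=0.01):
        self.max_changed_fraction = max_changed_fraction
        self.last_predictions = None

    def stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None):
        if predictions is None:
            raise ValueError('This criterion requires predictions.')

        predictions = np.asarray(predictions)
        if x_indices_stopping is not None:
            # Restrict the comparison to the requested subset.
            predictions = predictions[x_indices_stopping]

        previous, self.last_predictions = self.last_predictions, predictions
        if previous is None:
            # Nothing to compare against on the first call.
            return False

        # Fraction of instances whose predicted label differs from the previous call.
        changed_fraction = np.mean(previous != predictions)
        return bool(changed_fraction <= self.max_changed_fraction)


criterion = PredictionChangeCriterion(max_changed_fraction=0.1)
first = criterion.stop(predictions=np.array([0, 1, 1, 0]))   # first call, no comparison yet
second = criterion.stop(predictions=np.array([0, 1, 0, 0]))  # one of four labels changed
third = criterion.stop(predictions=np.array([0, 1, 0, 0]))   # no label changed
```

Since every argument is optional, a criterion has to check for itself that the inputs it depends on were actually passed, which is why the sketch raises an error when `predictions` is missing.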

Classes

class small_text.stopping_criteria.kappa.KappaAverage(num_classes, window_size=3, kappa=0.99)

A stopping criterion which measures the agreement between sets of predictions [BV09].

References

BV09: M. Bloodgood and K. Vijay-Shanker. 2009. A method for stopping active learning based on stabilizing predictions and the need for user-adjustable stopping. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL '09). Association for Computational Linguistics, USA, 39–47.

__init__(num_classes, window_size=3, kappa=0.99)

num_classes (int) – Number of classes.
window_size (int, default=3) – Defines the number of iterations for which the predictions are taken into account, i.e. this stopping criterion only sees the last window_size-many states of the prediction array passed to stop().
kappa (float, default=0.99) – The criterion stops when the agreement between two consecutive predictions within the window falls below this threshold.

stop(active_learner=None, predictions=None, proba=None, x_indices_stopping=None)

Parameters

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
x_indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
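The agreement measure behind this criterion can be illustrated with a short sketch. It computes Cohen's kappa between consecutive prediction arrays and averages it over the last `window_size` states. This is not the library's implementation: the helper names `cohen_kappa` and `average_kappa` are hypothetical, and how the library compares the resulting value against `kappa` is deliberately left out.

```python
import numpy as np


def cohen_kappa(first, second, num_classes):
    """Cohen's kappa between two integer label arrays of equal length."""
    first = np.asarray(first)
    second = np.asarray(second)

    # Confusion matrix between the two prediction arrays.
    confusion = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(confusion, (first, second), 1)

    total = confusion.sum()
    observed = np.trace(confusion) / total
    # Agreement expected by chance, derived from the marginal label counts.
    expected = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / total ** 2

    if expected == 1.0:
        # Both arrays consist of one and the same class: treat as full agreement.
        return 1.0
    return float((observed - expected) / (1.0 - expected))


def average_kappa(prediction_history, num_classes, window_size=3):
    """Mean kappa over consecutive pairs within the last `window_size` prediction arrays.

    Returns None while fewer than `window_size` arrays are available.
    """
    window = prediction_history[-window_size:]
    if len(window) < window_size:
        return None
    kappas = [cohen_kappa(a, b, num_classes) for a, b in zip(window, window[1:])]
    return float(np.mean(kappas))


history = [
    np.array([0, 1, 1, 0, 2]),
    np.array([0, 1, 1, 0, 2]),
    np.array([0, 1, 1, 0, 2]),
]
stable_agreement = average_kappa(history, num_classes=3)
```

With three identical prediction arrays the two consecutive pairs agree perfectly, so the averaged agreement is at its maximum.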

class small_text.stopping_criteria.base.DeltaFScore(num_classes, window_size=3, threshold=0.05)

A stopping criterion which stops if the predicted change of the F-score falls below a threshold [AB19].

Note

This criterion is only applicable for binary classification.

References

AB19: Michael Altschuler and Michael Bloodgood. 2019. Stopping Active Learning based on Predicted Change of F Measure for Text Classification. In: International Conference on Semantic Computing (ICSC 2019).

__init__(num_classes, window_size=3, threshold=0.05)

num_classes (int) – Number of classes.
window_size (int, default=3) – Defines the number of iterations for which the predictions are taken into account, i.e. this stopping criterion only sees the last window_size-many states of the prediction array passed to stop().
threshold (float, default=0.05) – The criterion stops when the predicted change of the F-score falls below this threshold.

stop(active_learner=None, predictions=None, proba=None, x_indices_stopping=None)

Parameters

active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
x_indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
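As a rough illustration of what a "predicted change of the F-score" could look like for binary predictions, the sketch below treats the previous predictions as reference labels and reports one minus the F1 agreement with the current predictions. This is an assumption made for illustration: neither the formula nor the helper name `estimated_f_change` is taken from the library, and the actual computation in DeltaFScore may differ.

```python
import numpy as np


def estimated_f_change(previous, current, positive_label=1):
    """One minus the F1 agreement between two binary prediction arrays (assumed estimate)."""
    previous = np.asarray(previous) == positive_label
    current = np.asarray(current) == positive_label

    both_positive = np.sum(previous & current)
    only_previous = np.sum(previous & ~current)
    only_current = np.sum(~previous & current)

    denominator = 2 * both_positive + only_previous + only_current
    if denominator == 0:
        # Neither array predicts the positive class: no measurable change.
        return 0.0
    return float(1.0 - 2 * both_positive / denominator)


change = estimated_f_change([1, 1, 0, 0], [1, 0, 0, 1])
unchanged = estimated_f_change([1, 0, 1], [1, 0, 1])
```

Under this assumption, a value near zero means consecutive models agree on the positive class, which is the situation in which a threshold such as 0.05 would signal that further labeling is unlikely to change the F-score much.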