Stopping Criteria
Stopping criteria indicate when to exit the active learning loop.
Pre-implemented Stopping Criteria
Interface
This interface is one of the trickiest, since a stopping decision might be based on any information available within the active learning process (excluding experiment-only information such as the test set, of course). Therefore, all arguments are optional and default to None, and the interface only provides a very loose frame for how stopping criteria should be built.
```python
from abc import ABC, abstractmethod


class StoppingCriterion(ABC):

    @abstractmethod
    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        """
        Parameters
        ----------
        active_learner : small_text.active_learner.PoolBasedActiveLearner
            An active learner instance.
        predictions : np.ndarray[int]
            Predictions for a fixed subset (usually the full train set).
        proba : np.ndarray[float]
            Probability distribution over the possible classes for a fixed subset. This is expected
            to have the same length as `predictions` unless one of `predictions` and `proba`
            is `None`.
        indices_stopping : np.ndarray[int]
            Uses the given indices to select a subset for stopping from either `predictions`
            or `proba` if not `None`. The indices are relative to `predictions` and `proba`.
        """
        pass
```
For an example, see KappaAverage, which stops when the change in the predictions over multiple iterations falls below a fixed threshold.
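A custom criterion implements this interface by subclassing StoppingCriterion and overriding stop(). The following is a minimal sketch. The class name MinConfidence, its threshold parameter, and its stopping rule are hypothetical and only illustrate how the optional arguments might be handled. The rule stops once the mean top-class probability on the stopping subset reaches the threshold. To keep the snippet self-contained, it redefines StoppingCriterion locally instead of importing it from small_text.

```python
from abc import ABC, abstractmethod

import numpy as np


class StoppingCriterion(ABC):
    """Local stand-in for small-text's StoppingCriterion so this sketch runs on its own."""

    @abstractmethod
    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        pass


class MinConfidence(StoppingCriterion):
    """Hypothetical criterion: stop once the mean top-class probability is high enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        # Every argument is optional, so verify that the one this criterion needs was passed.
        if proba is None:
            raise ValueError('MinConfidence requires proba')
        proba = np.asarray(proba)
        # indices_stopping is relative to proba and selects the subset to decide on.
        if indices_stopping is not None:
            proba = proba[indices_stopping]
        mean_confidence = proba.max(axis=1).mean()
        return bool(mean_confidence >= self.threshold)


criterion = MinConfidence(threshold=0.9)
proba = np.array([[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]])
print(criterion.stop(proba=proba))                                     # → False
print(criterion.stop(proba=proba, indices_stopping=np.array([0, 2])))  # → True
```

Raising an error for a missing argument is one possible design choice. A criterion could instead return False whenever the information it needs is unavailable.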
Classes
- class small_text.stopping_criteria.kappa.KappaAverage(num_classes, window_size=3, kappa=0.99)
A stopping criterion which measures the agreement between sets of predictions [BV09].
Changed in version 1.3.3: The previous implementation, which was flawed, has been corrected.
- __init__(num_classes, window_size=3, kappa=0.99)
- num_classes : int
Number of classes.
- window_size : int, default=3
Defines the number of iterations for which the predictions are taken into account, i.e. this stopping criterion only sees the last window_size-many states of the prediction array passed to stop().
- kappa : float, default=0.99
The criterion stops once the agreement between consecutive predictions within the window reaches this threshold, i.e. once the predictions have stabilized.
- stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)
- Parameters
active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
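The idea of measuring agreement between consecutive prediction arrays [BV09] can be sketched as follows. This is a simplified illustration, not small-text's implementation, and it rests on two assumptions. First, Cohen's kappa is computed between each pair of consecutive predictions. Second, the criterion stops once the mean kappa over the last window_size comparisons reaches the kappa threshold. The names KappaAverageSketch and cohens_kappa are hypothetical.

```python
import numpy as np


def cohens_kappa(y1, y2, num_classes):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    y1, y2 = np.asarray(y1), np.asarray(y2)
    observed = np.mean(y1 == y2)
    # Chance agreement from the label frequencies of each array.
    p1 = np.bincount(y1, minlength=num_classes) / len(y1)
    p2 = np.bincount(y2, minlength=num_classes) / len(y2)
    expected = np.sum(p1 * p2)
    if expected == 1.0:
        # Both arrays consist of one and the same label; kappa is undefined, treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


class KappaAverageSketch:
    """Hypothetical sketch of a kappa-based stopping criterion."""

    def __init__(self, num_classes, window_size=3, kappa=0.99):
        self.num_classes = num_classes
        self.window_size = window_size
        self.kappa = kappa
        self.last_predictions = None
        self.kappas = []

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        predictions = np.asarray(predictions)
        if self.last_predictions is not None:
            self.kappas.append(cohens_kappa(self.last_predictions, predictions, self.num_classes))
        self.last_predictions = predictions
        # Wait until the window holds window_size kappa values.
        if len(self.kappas) < self.window_size:
            return False
        return bool(np.mean(self.kappas[-self.window_size:]) >= self.kappa)
```

For example, passing the same predictions four times with window_size=3 yields False three times and then True. The first call only stores the predictions, and the window then needs three kappa values before a decision is possible.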
- class small_text.stopping_criteria.base.DeltaFScore(num_classes, window_size=3, threshold=0.05)
A stopping criterion which stops if the predicted change of the F-score falls below a threshold [AB19].
Note
This criterion is only applicable for binary classification.
Changed in version 1.3.3: The implementation now correctly only considers the change in agreement of the predicted labels belonging to the positive class.
- __init__(num_classes, window_size=3, threshold=0.05)
- num_classes : int
Number of classes.
- window_size : int, default=3
Defines the number of iterations for which the predictions are taken into account, i.e. this stopping criterion only sees the last window_size-many states of the prediction array passed to stop().
- threshold : float, default=0.05
The criterion stops when the predicted change of the F-score falls below this threshold.
- stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)
- Parameters
active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
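A rough sketch of the predicted change in F-score for the binary case follows. It is not small-text's implementation and relies on two assumptions. The first is the estimate from [AB19] in the form ΔF = 1 - 2a / (2a + b + c). Here a counts the samples that two consecutive models both predict as positive, while b and c count the samples predicted positive by only one of them. Negatives predicted by both models are ignored. The second assumption is that the criterion stops once every estimate within the last window_size comparisons is below threshold. The names delta_f and DeltaFScoreSketch are hypothetical.

```python
import numpy as np


def delta_f(previous, current, positive_label=1):
    """Estimated change in F-score between two consecutive binary prediction arrays."""
    previous = np.asarray(previous) == positive_label
    current = np.asarray(current) == positive_label
    a = np.sum(previous & current)   # both models predict positive
    b = np.sum(~previous & current)  # only the current model predicts positive
    c = np.sum(previous & ~current)  # only the previous model predicts positive
    denominator = 2 * a + b + c
    if denominator == 0:
        # Neither model predicts any positives, so no change is observable.
        return 0.0
    return 1.0 - 2.0 * a / denominator


class DeltaFScoreSketch:
    """Hypothetical sketch of a delta-F stopping criterion for binary classification."""

    def __init__(self, window_size=3, threshold=0.05):
        self.window_size = window_size
        self.threshold = threshold
        self.last_predictions = None
        self.deltas = []

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        predictions = np.asarray(predictions)
        if self.last_predictions is not None:
            self.deltas.append(delta_f(self.last_predictions, predictions))
        self.last_predictions = predictions
        if len(self.deltas) < self.window_size:
            return False
        # Stop only if every estimate in the window is below the threshold.
        return bool(all(d < self.threshold for d in self.deltas[-self.window_size:]))
```

For instance, delta_f([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]) gives a = 2 and b = c = 1, so the estimate is 1 - 4/6 ≈ 0.33.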
- class small_text.stopping_criteria.change.ClassificationChange(num_classes, threshold=0.0)
A stopping criterion which stops as soon as the predictions do not change during two subsequent checks [ZWH08].
Compared to the original paper, this implementation offers a threshold parameter which relaxes the stopping requirement so that a percentage of samples is allowed to change. The default setting (threshold=0.0) results in the original algorithm.
New in version 1.1.0.
- __init__(num_classes, threshold=0.0)
- num_classes : int
Number of classes.
- threshold : float, default=0.0
A percentage threshold specifying how many samples are allowed to change.
- stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)
- Parameters
active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
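The check can be sketched as computing the fraction of samples whose predicted label differs between two consecutive calls. The criterion then stops once that fraction does not exceed threshold, and with threshold=0.0 this reduces to requiring identical predictions. The sketch assumes that threshold is given as a fraction between 0 and 1 and that the same indices_stopping are used on every call. ClassificationChangeSketch is a hypothetical name, and this is not small-text's code.

```python
import numpy as np


class ClassificationChangeSketch:
    """Hypothetical sketch: stop when (almost) no predicted labels change between checks."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.last_predictions = None

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        predictions = np.asarray(predictions)
        if indices_stopping is not None:
            predictions = predictions[indices_stopping]
        if self.last_predictions is None:
            # First check: there is nothing to compare against yet.
            self.last_predictions = predictions
            return False
        changed_fraction = np.mean(self.last_predictions != predictions)
        self.last_predictions = predictions
        return bool(changed_fraction <= self.threshold)
```

With the default threshold, a single changed label out of four (a fraction of 0.25) prevents stopping. With threshold=0.25, the same change would be tolerated.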
- class small_text.stopping_criteria.uncertainty.OverallUncertainty(num_classes, threshold=0.05)
A stopping criterion which stops as soon as the average overall uncertainty falls below a given threshold [ZWH08].
As a measure of uncertainty, normalized prediction entropy is used. To reproduce the original implementation, pass the indices of the unlabeled set as indices_stopping to the stop() method.
New in version 1.1.0.
- __init__(num_classes, threshold=0.05)
- num_classes : int
Number of classes.
- threshold : float, default=0.05
A normalized entropy value below which the criterion indicates to stop.
- stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)
- Parameters
active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
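Normalized prediction entropy divides the entropy of each predicted distribution by log(num_classes), the entropy of the uniform distribution. Each value therefore lies between 0 (fully confident) and 1 (maximally uncertain). The sketch below assumes that the criterion averages this value over the rows selected by indices_stopping and stops when the average is below threshold. It is not small-text's implementation, and the names normalized_entropy and OverallUncertaintySketch are hypothetical.

```python
import numpy as np


def normalized_entropy(proba, num_classes):
    """Entropy of each row of proba, scaled to [0, 1] by the maximum entropy log(num_classes)."""
    proba = np.asarray(proba, dtype=float)
    # Clip before the log so that zero probabilities contribute 0 * log(eps) = 0.
    log_p = np.log(np.clip(proba, 1e-12, 1.0))
    entropy = -np.sum(proba * log_p, axis=1)
    return entropy / np.log(num_classes)


class OverallUncertaintySketch:
    """Hypothetical sketch: stop once the mean normalized entropy is low enough."""

    def __init__(self, num_classes, threshold=0.05):
        self.num_classes = num_classes
        self.threshold = threshold

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        proba = np.asarray(proba)
        if indices_stopping is not None:
            # For example, the indices of the unlabeled pool, as in the original method.
            proba = proba[indices_stopping]
        return bool(np.mean(normalized_entropy(proba, self.num_classes)) < self.threshold)
```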
- class small_text.stopping_criteria.utility.MaxIterations(max_iterations)
Stops after a fixed number of iterations.
New in version 1.1.0.
- __init__(max_iterations)
- max_iterations : int
Number of iterations after which the criterion will indicate to stop.
- stop(active_learner=None, predictions=None, proba=None, indices_stopping=None)
- Parameters
active_learner (small_text.active_learner.PoolBasedActiveLearner) – An active learner instance.
predictions (np.ndarray[int]) – Predictions for a fixed subset (usually the full train set).
proba (np.ndarray[float]) – Probability distribution over the possible classes for a fixed subset. This is expected to have the same length as predictions unless one of predictions and proba is None.
indices_stopping (np.ndarray[int]) – Uses the given indices to select a subset for stopping from either predictions or proba if not None. The indices are relative to predictions and proba.
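This behavior can be sketched with a simple counter. The sketch assumes that every call to stop() counts as one iteration, i.e. that stop() is called exactly once per round of the active learning loop. MaxIterationsSketch is a hypothetical name, and the query, labeling, and retraining steps are omitted.

```python
class MaxIterationsSketch:
    """Hypothetical sketch: stop after a fixed number of calls to stop()."""

    def __init__(self, max_iterations):
        self.max_iterations = max_iterations
        self.current_iteration = 0

    def stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None):
        # Count this call as one completed iteration.
        self.current_iteration += 1
        return self.current_iteration >= self.max_iterations


criterion = MaxIterationsSketch(max_iterations=5)
rounds = 0
while True:
    rounds += 1
    # ... query, label, and retrain would happen here ...
    if criterion.stop():
        break
print(rounds)  # → 5
```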