Data Management

To decouple the PoolBasedActiveLearner from your application logic, most of its methods operate solely on indices. This also means that if your data changes, i.e., examples are added, replaced, or removed, or labels are changed, you need to keep the active learner's information up to date.

Note

The following methods are more relevant to Active Learning applications than to the experimental scenario.

Note

Whenever the labeled data changes, your current model might need retraining to reflect the updated data.

Updating Labels

If you want to revise or undo a past labeling, previous labels can be updated (update_label_at()) or removed (remove_label_at()).
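To illustrate what index-based label revision means, here is a minimal, self-contained sketch. It does not use the library: ToyLabelStore and its labels attribute are hypothetical stand-ins, and only the method names echo the ones mentioned above. The real methods may take additional arguments.

```python
class ToyLabelStore:
    """Hypothetical stand-in for index-based label bookkeeping (not the library's class)."""

    def __init__(self):
        # Maps a dataset index to the label assigned to that example.
        self.labels = {}

    def update_label_at(self, index, y):
        # Only an existing labeling can be revised.
        if index not in self.labels:
            raise KeyError(f"index {index} has not been labeled yet")
        self.labels[index] = y

    def remove_label_at(self, index):
        # Undo a past labeling; the example counts as unlabeled again.
        del self.labels[index]


store = ToyLabelStore()
store.labels.update({3: 0, 7: 1, 12: 1})

store.update_label_at(7, 0)   # revise the label of example 7
store.remove_label_at(12)     # undo the labeling of example 12

labeled_indices = sorted(store.labels)
```

In this sketch the dataset itself is never touched; only the mapping from indices to labels changes, which is why a model trained on the old labels may be out of date afterwards.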

Adding / Removing Data

Whenever the dataset changes, previous indices might be invalid, and as a consequence the active learner needs to be (re-)initialized:

initialize_data()

By default, this also triggers a re-training of the model, which can be suppressed by passing retrain=False.
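The following sketch shows why indices become stale when examples are removed, and one possible way to recompute them before re-initializing. The helper remap_after_removal is hypothetical and not part of the library; it assumes that removing rows shifts all later rows towards the front, as it would for a list or array.

```python
def remap_after_removal(labeled, removed):
    """Map old labeled indices to their positions after `removed` rows are dropped.

    labeled: dict of old index -> label
    removed: iterable of old indices that were deleted from the dataset
    """
    removed = sorted(set(removed))
    remapped = {}
    for old_index, y in labeled.items():
        if old_index in removed:
            continue  # the example no longer exists, so its label is dropped
        # Every removed row in front of this example shifts it one position down.
        shift = sum(1 for r in removed if r < old_index)
        remapped[old_index - shift] = y
    return remapped


old_labeled = {0: 1, 4: 0, 9: 1}
new_labeled = remap_after_removal(old_labeled, removed=[2, 4])
```

Here the example formerly at index 9 ends up at index 7, so reusing the old index would point at a different example. Remapped indices and labels like these are what you would hand to the active learner when re-initializing it.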

Ignoring Data

In real-world applications, there is often noisy data (e.g., after an OCR step). For this and other scenarios in which you don’t want to assign any labels (and also don’t want to see this sample again in the next query), you can ignore samples:

ignore_sample_at()
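Conceptually, an ignored sample is treated like a labeled one when candidates for the next query are selected: it is excluded from the pool, but it contributes no label. A minimal sketch of that idea, using a hypothetical helper query_candidates that is not part of the library:

```python
def query_candidates(num_samples, labeled, ignored):
    """Return the indices that are still eligible for the next query."""
    excluded = set(labeled) | set(ignored)
    return [i for i in range(num_samples) if i not in excluded]


# A pool of 8 examples: 0 and 3 are already labeled, 5 is noisy and ignored.
candidates = query_candidates(8, labeled=[0, 3], ignored=[5])
```

Index 5 never shows up among the candidates again, yet it is also absent from the labeled set, so it has no influence on training.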