Serialization

Functionality to save / load your active learner.

In some active learning applications, an active learner needs to outlive a single session. For example, you might want to save your active learner and resume your annotation process at a later time. Saving the current object to disk is called serialization.
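As a generic illustration of the idea (not small-text's actual implementation), the following sketch round-trips a plain Python object through disk with the standard library's pickle module. The state dictionary, the helper names, and the file name are hypothetical and only stand in for a long-lived object.

```python
import pickle
import tempfile
from pathlib import Path

STATE_FILE = 'state.pkl'  # hypothetical file name, chosen for this sketch


def save_state(state, folder):
    """Serialize the object to a file inside the given folder."""
    with open(Path(folder) / STATE_FILE, 'wb') as f:
        pickle.dump(state, f)


def load_state(folder):
    """Deserialize the object that save_state() wrote to the folder."""
    with open(Path(folder) / STATE_FILE, 'rb') as f:
        return pickle.load(f)


# Hypothetical state of an annotation process
state = {'labeled_indices': [3, 17, 42], 'iteration': 5}

with tempfile.TemporaryDirectory() as folder:
    save_state(state, folder)
    restored = load_state(folder)
```

The restored object is an equal but independent copy of the original, which is what allows a process to be stopped and resumed later.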

To allow saving and loading, PoolBasedActiveLearner provides save() and load(). The only mandatory argument is the folder to which the active learner is saved or from which it is loaded.

Usage

(De-)serialization is straightforward to use. Models on the GPU need to be transferred to the CPU before serialization to avoid errors during deserialization.
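The CPU transfer can be wrapped in a small guard so that the same code path works for classifiers with and without such a model. This is a minimal sketch: move_model_to_cpu is a hypothetical helper, not part of small-text, and it assumes that the model is stored in a `model` attribute and offers a PyTorch-style `.cpu()` method, as in the example on this page.

```python
def move_model_to_cpu(classifier):
    """Move classifier.model to the CPU if the classifier has such a model.

    Hypothetical helper: classifiers without a `model` attribute, or whose
    model has no `.cpu()` method, are returned unchanged.
    """
    model = getattr(classifier, 'model', None)
    if model is not None and callable(getattr(model, 'cpu', None)):
        # Assumes .cpu() returns the model on the CPU, as PyTorch modules do
        classifier.model = model.cpu()
    return classifier
```

Calling `move_model_to_cpu(active_learner.classifier)` right before `save()` would then replace the explicit assignment, under the assumptions stated above.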

Note

Serialization changed in v2.0.0 and is not backwards compatible with files saved using small-text v1.x.
To load a saved active learner, you likely need the same small-text version that was used to save it (and, depending on the model, similar versions of its dependencies). This might be improved in future releases.
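Because the version matters at load time, it can help to record the version that was used for saving. The following is a minimal sketch of one way to do this; the file name and both helpers are hypothetical and not part of small-text, and it has not been verified here whether load() tolerates additional files in the save folder, so a separate folder may be the safer choice.

```python
import tempfile
from pathlib import Path

VERSION_FILE = 'small_text_version.txt'  # hypothetical file name


def record_version(folder, version):
    """Write the library version used for saving into the given folder."""
    (Path(folder) / VERSION_FILE).write_text(version, encoding='utf-8')


def check_version(folder, current_version):
    """Return True if the recorded version equals the current version."""
    recorded = (Path(folder) / VERSION_FILE).read_text(encoding='utf-8').strip()
    return recorded == current_version


with tempfile.TemporaryDirectory() as folder:
    record_version(folder, '2.0.0')
    same = check_version(folder, '2.0.0')
    different = check_version(folder, '1.3.3')
```

In practice, the installed version string could be obtained with `importlib.metadata.version('small-text')` instead of being hard-coded.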

Example

from small_text import PoolBasedActiveLearner

active_learner = <...>   # placeholder: this line does not run as written;
                         # active_learner is assumed to be a trained active learner

# Only for models on the GPU: transfer them to the CPU before serialization
active_learner.classifier.model = active_learner.classifier.model.cpu()

# Save the active learner to a folder
project_folder = '/tmp/active-learning-project'
active_learner.save(project_folder)

# Later: restore the active learner from the same folder
active_learner = PoolBasedActiveLearner.load(project_folder)