Serialization
Functionality to save / load your active learner.
In some active learning applications, an active learner needs to outlive a single session. For example, you might want to save your active learner and resume your annotation process at a later time. This process, in which the current object is saved to disk, is called serialization.
To allow saving and loading, PoolBasedActiveLearner provides save() and load(). The only mandatory argument is a folder, to which the active learner is saved or from which it is loaded, respectively.
Usage
(De-)serialization is straightforward to use. Models on the GPU need to be transferred to the CPU before saving to avoid errors during deserialization.
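The CPU transfer can be wrapped in a small guard so that the same saving code works whether or not the classifier holds a GPU model. The following is a minimal sketch and not part of small-text: move_model_to_cpu is a hypothetical helper, and it assumes only that the classifier may expose a model attribute with a PyTorch-style cpu() method.

```python
def move_model_to_cpu(active_learner):
    """Move the classifier's model to the CPU if it supports this.

    Hypothetical helper (not part of small-text). Returns True if a
    transfer was performed and False otherwise.
    """
    classifier = getattr(active_learner, 'classifier', None)
    model = getattr(classifier, 'model', None)
    cpu = getattr(model, 'cpu', None)
    if not callable(cpu):
        # No classifier, no model attribute, or a model without cpu()
        return False
    # A PyTorch module's cpu() returns the module itself; reassigning
    # also covers objects whose cpu() returns a new object.
    classifier.model = cpu()
    return True
```

Calling move_model_to_cpu(active_learner) directly before active_learner.save(...) would then be a no-op for models that have no cpu() method.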
Note
Serialization has changed in v2.0.0 and is not backwards compatible with files saved using small-text v1.x.
To load saved files, you likely need the same small-text version that was used to save them (and, depending on the model, similar versions of its dependencies). This might be improved in future releases.
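Because saved files are likely tied to the small-text version that wrote them, it can help to record that version next to the saved folder and compare it before loading. This is a hedged sketch only: both helper functions and the ".version" sidecar file are hypothetical, and small-text itself does not necessarily perform such a check. The marker is written beside the folder rather than into it, so the files written by save() are left untouched.

```python
from pathlib import Path


def _marker_path(folder):
    # Sidecar file next to the folder, e.g. /tmp/project.version
    folder = Path(folder)
    return folder.with_name(folder.name + '.version')


def write_version_marker(folder, version):
    """Record which library version wrote the files in `folder`."""
    _marker_path(folder).write_text(version, encoding='utf-8')


def check_version_marker(folder, version):
    """Return True if the recorded version equals `version`.

    Returns False if no marker exists or the versions differ.
    """
    marker = _marker_path(folder)
    if not marker.is_file():
        return False
    return marker.read_text(encoding='utf-8').strip() == version
```

The version string is passed in by the caller. One way to obtain it might be importlib.metadata.version('small-text'), assuming the package is installed under that distribution name.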
Example
from small_text import PoolBasedActiveLearner
# Placeholder: this line does not run as-is.
# active_learner is assumed to be a trained active learner.
active_learner = <...>
# Only for models on the GPU: transfer them to the CPU before serialization
active_learner.classifier.model = active_learner.classifier.model.cpu()
project_folder = '/tmp/active-learning-project'
active_learner.save(project_folder)
active_learner = PoolBasedActiveLearner.load(project_folder)
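To resume an annotation process across sessions, saving and loading can be combined into a load-or-create step. The sketch below is an illustration under stated assumptions: load_or_create is a hypothetical helper, the load and create arguments are caller-supplied callables (for example PoolBasedActiveLearner.load and your own setup function), and treating a non-empty folder as "previously saved" is a simplification, since the files that save() writes are not described here.

```python
from pathlib import Path


def load_or_create(folder, load, create):
    """Load a saved object from `folder`, or build a new one.

    Hypothetical helper: `load` is called with the folder path if the
    folder exists and is non-empty; otherwise `create` is called with
    no arguments.
    """
    path = Path(folder)
    if path.is_dir() and any(path.iterdir()):
        return load(str(path))
    return create()
```

With this, a session could start with load_or_create(project_folder, PoolBasedActiveLearner.load, build_active_learner), where build_active_learner stands for your own (hypothetical) function that sets up a fresh active learner.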