Initialization

Initialization (sampling) strategies provide the initial set of labeled instances from which the first classifier is trained. Some of them require knowledge of the true labels and are therefore intended for experimental purposes only.

In an application setting, you must provide an initial set of labels yourself (or use a cold start approach, which is not yet supported).

Initialization Strategies

For the single-label scenario:

random_initialization()
random_initialization_balanced()

For single-label and multi-label scenarios:

random_initialization_stratified()

Methods

small_text.initialization.strategies.random_initialization(x, n_samples=10)

Randomly draws a subset from the given dataset.

Parameters

x (Dataset) – A dataset.
n_samples (int, default=10) – Number of samples to draw.

Returns

indices – Indices relative to x.

Return type

np.ndarray[int]
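
The idea behind a random draw can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the helper name random_indices_sketch, the seed parameter, and the assumption that indices are drawn without replacement are all introduced here for the example.

```python
import numpy as np


def random_indices_sketch(n_total, n_samples=10, seed=None):
    """Hypothetical helper: draw n_samples distinct indices from range(n_total)."""
    if n_samples > n_total:
        raise ValueError("n_samples must not exceed the dataset size")
    rng = np.random.default_rng(seed)
    # replace=False ensures no index is drawn twice (an assumption of this sketch)
    return rng.choice(n_total, size=n_samples, replace=False)


# Draw 10 out of 100 instances; the result indexes into the dataset.
indices = random_indices_sketch(100, n_samples=10, seed=42)
```

Because the draw ignores the labels, rare classes can be missing from a small initial set, which is the case the balanced and stratified variants are meant to address.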

small_text.initialization.strategies.random_initialization_stratified(y, n_samples=10, multilabel_strategy='labelsets')

Randomly draws a subset stratified by class labels.

Parameters

y (np.ndarray[int] or csr_matrix) – Labels to be used for stratification.
n_samples (int, default=10) – Number of samples to draw.
multilabel_strategy ({'labelsets'}, default='labelsets') – The strategy used to stratify multi-label data. This is only used if y is of type csr_matrix.

Returns

indices – Indices relative to y.

Return type

np.ndarray[int]
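
For the single-label case, stratification can be sketched as drawing from each class in proportion to its frequency. The following is only an illustration under stated assumptions: stratified_indices_sketch and its seed parameter are hypothetical, the largest-remainder rounding is one possible choice, and the library's actual allocation and its multi-label 'labelsets' handling may differ.

```python
import numpy as np


def stratified_indices_sketch(y, n_samples=10, seed=None):
    """Hypothetical helper: draw indices so that class proportions in y are kept."""
    y = np.asarray(y)
    if not 0 < n_samples <= y.shape[0]:
        raise ValueError("n_samples must be in [1, len(y)]")
    rng = np.random.default_rng(seed)

    classes, counts = np.unique(y, return_counts=True)
    # Ideal (fractional) number of draws per class.
    expected = n_samples * counts / counts.sum()
    per_class = np.floor(expected).astype(int)
    # Give the leftover draws to the classes with the largest fractional part.
    leftover = n_samples - per_class.sum()
    order = np.argsort(-(expected - per_class))
    per_class[order[:leftover]] += 1

    # Sample without replacement within each class, skipping empty allocations.
    picked = [
        rng.choice(np.flatnonzero(y == c), size=k, replace=False)
        for c, k in zip(classes, per_class)
        if k > 0
    ]
    # Shuffle so the result is not grouped by class.
    return rng.permutation(np.concatenate(picked))


# 100 labels with a 60/30/10 class distribution.
y = np.array([0] * 60 + [1] * 30 + [2] * 10)
indices = stratified_indices_sketch(y, n_samples=10, seed=0)
class_counts = np.bincount(y[indices])  # 6, 3 and 1 draws for classes 0, 1, 2
```

As with the function documented above, the returned indices are relative to y, so y[indices] yields the labels of the initial set.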