Initialization

Initialization strategies provide the initial labeled samples from which the first classifier is trained. They are intended for experimental purposes only, which is why some of them may require knowledge of the true labels.

Initialization Strategies

For single-label scenarios:

random_initialization()
random_initialization_balanced()

For single-label and multi-label scenarios:

random_initialization_stratified()

Methods

small_text.initialization.strategies.random_initialization(x, n_samples=10)

Randomly draws a subset from the given dataset.

Parameters

x (Dataset) – A dataset.
n_samples (int, default=10) – Number of samples to draw.

Returns

indices – Indices relative to x.

Return type

np.array[int]
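
The behavior described above amounts to drawing indices without replacement. The following is a minimal sketch in plain Python, not the library's implementation: `draw_random_indices` and its `seed` parameter are hypothetical names introduced for illustration, it takes the dataset size instead of a Dataset, and it returns a list rather than an np.array.

```python
import random


def draw_random_indices(n_total, n_samples=10, seed=None):
    """Hypothetical helper: draw n_samples distinct indices from range(n_total)."""
    if n_samples > n_total:
        raise ValueError("n_samples must not exceed the dataset size")
    rng = random.Random(seed)
    # Sampling without replacement, so no index is returned twice.
    return rng.sample(range(n_total), n_samples)


# Draw 10 indices from a (hypothetical) dataset of 100 items.
indices = draw_random_indices(100, n_samples=10, seed=42)
```

The returned indices are relative to the dataset they were drawn from, so they can be used directly to select the initially labeled subset.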

small_text.initialization.strategies.random_initialization_stratified(y, n_samples=10, multilabel_strategy='labelsets')

Randomly draws a subset stratified by class labels.

Parameters

y (np.ndarray[int] or csr_matrix) – Labels to be used for stratification.
n_samples (int, default=10) – Number of samples to draw.
multilabel_strategy ({'labelsets'}, default='labelsets') – The multi-label strategy to be used in case of a multi-label labeling. This is only used if y is of type csr_matrix.

Returns

indices – Indices relative to y.

Return type

np.array[int]
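
To illustrate what stratification by class labels means in the single-label case, the sketch below allocates the sample proportionally to the class frequencies in y and then draws at random within each class. It is an assumption-laden illustration, not the library's implementation: `draw_stratified_indices` and `seed` are hypothetical names, the largest-remainder rounding is one possible choice, and the multi-label 'labelsets' strategy is not covered.

```python
import random
from collections import defaultdict


def draw_stratified_indices(y, n_samples=10, seed=None):
    """Hypothetical helper: draw indices so class shares roughly match those in y."""
    n_total = len(y)
    if n_samples > n_total:
        raise ValueError("n_samples must not exceed the number of labels")
    rng = random.Random(seed)

    # Group the indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    # Proportional allocation, rounded down.
    quotas = {c: n_samples * len(idxs) // n_total for c, idxs in by_class.items()}

    # Distribute the leftover samples to the classes with the largest remainders.
    leftover = n_samples - sum(quotas.values())
    order = sorted(by_class,
                   key=lambda c: (n_samples * len(by_class[c])) % n_total,
                   reverse=True)
    for c in order[:leftover]:
        quotas[c] += 1

    # Draw without replacement within each class.
    indices = []
    for c, idxs in by_class.items():
        indices.extend(rng.sample(idxs, quotas[c]))
    return indices


# 60% class 0, 30% class 1, 10% class 2.
y = [0] * 60 + [1] * 30 + [2] * 10
indices = draw_stratified_indices(y, n_samples=10, seed=0)
class_counts = {c: sum(1 for i in indices if y[i] == c) for c in set(y)}
```

With these class frequencies and n_samples=10, the proportional quotas are exact, so the drawn subset contains six, three, and one sample of classes 0, 1, and 2 respectively.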