Initialization
Initialization strategies provide the initial labelings from which the first classifier is created. They are intended for experimental purposes only, so some of them may require knowledge of the true labels.
Initialization Strategies
For single-label scenarios:
- random_initialization_balanced
For single-label and multi-label scenarios:
- random_initialization
- random_initialization_stratified
Methods
- small_text.initialization.strategies.random_initialization(x, n_samples=10)
Randomly draws a subset from the given dataset.
- Parameters
x (Dataset) – A dataset.
n_samples (int, default=10) – Number of samples to draw.
- Returns
indices – Indices relative to x.
- Return type
np.ndarray[int]
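The following is a minimal sketch of the idea behind a random draw, not the library's implementation. The helper name `random_indices_sketch` and its `seed` argument are hypothetical, and it assumes that indices are drawn without replacement and that the dataset is represented only by its length.

```python
import numpy as np


def random_indices_sketch(n_total, n_samples=10, seed=None):
    """Hypothetical helper: draw n_samples distinct indices from range(n_total)."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no index is drawn twice (an assumption here).
    return rng.choice(n_total, size=n_samples, replace=False)


# Draw 10 indices from a dataset of 100 examples.
indices = random_indices_sketch(100, n_samples=10, seed=0)
```

The returned indices are relative to the dataset, so they can be used directly to select the initially labeled examples.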
- small_text.initialization.strategies.random_initialization_stratified(y, n_samples=10, multilabel_strategy='labelsets')
Randomly draws a subset stratified by class labels.
- Parameters
y (np.ndarray[int] or csr_matrix) – Labels to be used for stratification.
n_samples (int, default=10) – Number of samples to draw.
multilabel_strategy ({'labelsets'}, default='labelsets') – The multi-label strategy to be used in case of a multi-label labeling. This is only used if y is of type csr_matrix.
- Returns
indices – Indices relative to y.
- Return type
np.ndarray[int]
See also
small_text.data.sampling.multilabel_stratified_subsets_sampling
Details on the labelsets multi-label strategy.
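To illustrate what stratification means in the single-label case, here is a minimal sketch that allocates the samples proportionally to the class frequencies in y. It is not the library's implementation: the helper name `stratified_indices_sketch`, the `seed` argument, and the largest-remainder rounding are assumptions made for this example, and the multi-label labelsets strategy is not covered.

```python
import numpy as np


def stratified_indices_sketch(y, n_samples=10, seed=None):
    """Hypothetical helper: sample indices so that class proportions follow y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)

    # Ideal (possibly fractional) number of samples per class.
    exact = n_samples * counts / counts.sum()
    quota = np.floor(exact).astype(int)

    # Hand the remaining slots to the classes with the largest fractional part.
    remainder = n_samples - quota.sum()
    order = np.argsort(-(exact - quota))
    quota[order[:remainder]] += 1

    # Draw the per-class quota without replacement from each class.
    picked = [
        rng.choice(np.flatnonzero(y == c), size=q, replace=False)
        for c, q in zip(classes, quota)
    ]
    return np.concatenate(picked)


# 80 examples of class 0 and 20 of class 1: the draw keeps the 80/20 ratio.
y = np.array([0] * 80 + [1] * 20)
indices = stratified_indices_sketch(y, n_samples=10, seed=0)
class_counts = np.bincount(y[indices])  # 8 samples of class 0, 2 of class 1
```

Because the class distribution of the sample mirrors that of y, rare classes stay rare in the initial labeling.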
- small_text.initialization.strategies.random_initialization_balanced(y, n_samples=10)
Randomly draws a subset which is (approximately) balanced in the distribution of its class labels.
- Parameters
y (np.ndarray[int] or csr_matrix) – Labels to be used for balanced sampling.
n_samples (int, default=10) – Number of samples to draw.
- Returns
indices – Indices relative to y.
- Return type
np.ndarray[int]
Notes
This is only applicable to single-label classification.
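Balanced sampling can be sketched as follows for single-label data. This is an illustration under stated assumptions, not the library's implementation: the helper name `balanced_indices_sketch` and the `seed` argument are hypothetical, every class is assumed to contain enough examples to fill its share, and leftover slots (when n_samples is not divisible by the number of classes) are simply given to the first classes, which is why the result is only approximately balanced.

```python
import numpy as np


def balanced_indices_sketch(y, n_samples=10, seed=None):
    """Hypothetical helper: sample (approximately) the same number per class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes = np.unique(y)

    # Equal share per class; the first few classes absorb any leftover slots.
    quota = np.full(len(classes), n_samples // len(classes))
    quota[: n_samples % len(classes)] += 1

    # Assumes each class has at least as many examples as its quota.
    picked = [
        rng.choice(np.flatnonzero(y == c), size=q, replace=False)
        for c, q in zip(classes, quota)
    ]
    return np.concatenate(picked)


# Even with a 90/10 class imbalance, each class contributes 5 samples.
y = np.array([0] * 90 + [1] * 10)
indices = balanced_indices_sketch(y, n_samples=10, seed=0)
class_counts = np.bincount(y[indices])  # 5 samples of class 0, 5 of class 1
```

In contrast to stratified sampling, the class distribution of y is deliberately ignored, so minority classes are overrepresented in the initial labeling.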