Dataset API¶
All dataset implementations inherit from the abstract class Dataset.
Several such implementations are available, depending on the choice of classifier (and on the installed optional dependencies).
Core¶
- class small_text.data.datasets.Dataset¶
Abstract class for all datasets.
- property x¶
Returns the features.
- Returns
x – Feature representation.
- Return type
object
- property y¶
Returns the labels.
- Returns
y – Label representation.
- Return type
object
- property target_labels¶
Returns a list of possible labels.
- Returns
target_labels – List of possible labels.
- Return type
numpy.ndarray
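The interface above can be pictured with a small stand-in. This is a minimal sketch, not the library's implementation: `DatasetSketch` and `ListDataset` are hypothetical names, and plain Python lists stand in for the feature and label representations.

```python
from abc import ABC, abstractmethod


class DatasetSketch(ABC):
    """Hypothetical stand-in that mirrors the three documented properties."""

    @property
    @abstractmethod
    def x(self):
        """Feature representation."""

    @property
    @abstractmethod
    def y(self):
        """Label representation."""

    @property
    @abstractmethod
    def target_labels(self):
        """Possible labels."""


class ListDataset(DatasetSketch):
    """Hypothetical concrete dataset backed by plain Python lists."""

    def __init__(self, x, y, target_labels):
        self._x = x
        self._y = y
        self._target_labels = target_labels

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def target_labels(self):
        return self._target_labels


dataset = ListDataset(x=[[0.1, 0.9], [0.8, 0.2]], y=[1, 0], target_labels=[0, 1])
print(dataset.x, dataset.y, dataset.target_labels)
```

Because every property is abstract, the stand-in base class cannot be instantiated directly; only subclasses that provide all three properties can.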
- class small_text.data.datasets.SklearnDataset(x, y, target_labels=None)¶
A dataset representation that can be used with scikit-learn classifiers.
- Parameters
x (numpy.ndarray or scipy.sparse.csr_matrix) – Dense or sparse feature matrix.
y (list of int) – List of labels where each label belongs to the features of the respective row.
target_labels (list of int or None) – List of possible labels. Will be inferred from y if None is passed.
- __init__(x, y, target_labels=None)¶
- property x¶
Returns the features.
- Returns
x – Dense or sparse feature matrix.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- property y¶
Returns the labels.
- Returns
y – List of labels.
- Return type
numpy.ndarray
- property target_labels¶
Returns a list of possible labels.
- Returns
target_labels – List of possible labels.
- Return type
numpy.ndarray
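The handling of `target_labels=None` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `SklearnDatasetSketch` and `infer_target_labels` are hypothetical names, plain lists replace the numpy and scipy types, and the inference rule (the sorted distinct values of `y`) is an assumption about what "inferred from y" means.

```python
def infer_target_labels(y):
    """Return the sorted distinct labels found in y (assumed inference rule)."""
    return sorted(set(y))


class SklearnDatasetSketch:
    """Hypothetical stand-in with the documented constructor signature."""

    def __init__(self, x, y, target_labels=None):
        self.x = x
        self.y = y
        # Fall back to the labels seen in y when none are passed explicitly.
        if target_labels is None:
            self.target_labels = infer_target_labels(y)
        else:
            self.target_labels = list(target_labels)


features = [[1, 0], [0, 1], [1, 1]]
labels = [2, 0, 2]

inferred = SklearnDatasetSketch(features, labels)
explicit = SklearnDatasetSketch(features, labels, target_labels=[0, 1, 2])

print(inferred.target_labels)
print(explicit.target_labels)
```

Under this assumed rule, label 1 is missing from the inferred list because it never occurs in `y`. Passing `target_labels` explicitly avoids that when only a few instances are labeled so far.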
Pytorch Integration¶
- class small_text.integrations.pytorch.datasets.PytorchTextClassificationDataset(data, vocab, target_labels=None, device=None)¶
Dataset class for classifiers from Pytorch Integration.
- __init__(data, vocab, target_labels=None, device=None)¶
- Parameters
data (list of tuples (text data [Tensor], label)) – Data set.
vocab (torchtext.vocab.Vocab) – Vocabulary object.
- property x¶
Returns the features.
- Returns
x – Feature representation.
- Return type
object
- property y¶
Returns the labels.
- Returns
y – Label representation.
- Return type
object
- property data¶
Returns the internal list of tuples storing the data.
- Returns
data – List of tuples storing the data.
- Return type
list of tuples (text data [Tensor], label)
- property vocab¶
Returns the vocab.
- Returns
vocab – Vocab object.
- Return type
torchtext.vocab.Vocab
- property target_labels¶
Returns a list of possible labels.
- Returns
target_labels – List of possible labels.
- Return type
numpy.ndarray
- to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format)¶
Calls torch.Tensor.to on all Tensors in data.
- Returns
self – This object, after to has been called on all Tensors in data.
- Return type
PytorchTextClassificationDataset
See also
The PyTorch documentation for torch.Tensor.to.
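The behaviour described for `to` (call `to` on every Tensor in `data`, then return the object itself) can be sketched without PyTorch. Everything here is a hypothetical stand-in: `FakeTensor` only records a device name, `TextDatasetSketch` is not the library class, and only the `device` argument is modelled.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only tracks its device."""

    def __init__(self, values, device="cpu"):
        self.values = list(values)
        self.device = device

    def to(self, device=None):
        # Return a tensor on the requested device, or on the current one.
        return FakeTensor(self.values, device or self.device)


class TextDatasetSketch:
    """Hypothetical dataset holding a list of (text tensor, label) tuples."""

    def __init__(self, data):
        self.data = data

    def to(self, device=None):
        # Move each text tensor, keep each label unchanged, then return self.
        self.data = [(text.to(device), label) for text, label in self.data]
        return self


dataset = TextDatasetSketch([
    (FakeTensor([4, 8, 15]), 0),
    (FakeTensor([16, 23]), 1),
])
moved = dataset.to("cuda:0")  # the device name is only a string here

print([text.device for text, _ in moved.data])
```

Returning the object itself lets the call be chained or assigned in a single expression, as in the last lines of the sketch.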
Transformers Integration¶
- class small_text.integrations.transformers.datasets.TransformersDataset(data, target_labels=None, device=None)¶
Dataset class for classifiers from Transformers Integration.
- __init__(data, target_labels=None, device=None)¶
- Parameters
data (list of 3-tuples (text data [Tensor], mask [Tensor], label [int])) – Data set.
- property x¶
Returns the features.
- Returns
x – Feature representation.
- Return type
object
- property y¶
Returns the labels.
- Returns
y – Label representation.
- Return type
object
- property target_labels¶
Returns a list of possible labels.
- Returns
target_labels – List of possible labels.
- Return type
numpy.ndarray
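The layout of `data` as 3-tuples of (text data, mask, label) can be illustrated with plain lists. This is a sketch only: the real class expects Tensors for the first two elements, and `build_entry`, `max_len`, and `pad_id` are hypothetical names. The mask convention used here (1 for a real token, 0 for padding) is the common attention-mask convention and is an assumption, not something this page states.

```python
def build_entry(token_ids, label, max_len, pad_id=0):
    """Truncate or pad token_ids to max_len and return (text, mask, label)."""
    token_ids = list(token_ids[:max_len])
    padding = max_len - len(token_ids)
    text = token_ids + [pad_id] * padding
    # 1 marks a real token, 0 marks padding (assumed convention).
    mask = [1] * len(token_ids) + [0] * padding
    return text, mask, label


data = [
    build_entry([101, 7592, 102], label=1, max_len=5),
    build_entry([101, 2088, 2003, 2307, 999, 102], label=0, max_len=5),
]

for text, mask, label in data:
    print(text, mask, label)
```

In this sketch, the first entry is padded with two zeros and the second is truncated to five tokens, so every text and mask has the same length.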