Pytorch Integration Classes
Dataset Classes
- class small_text.integrations.pytorch.datasets.PytorchTextClassificationDataset(data, vocab, multi_label=False, target_labels=None)[source]
Dataset class for classifiers from Pytorch Integration.
- __init__(data, vocab, multi_label=False, target_labels=None)
- Parameters
data (list of tuples (text data [Tensor], labels [int or list of int])) – The single items constituting the dataset. For single-label datasets, the label of unlabeled instances should be set to small_text.base.LABEL_UNLABELED, and for multi-label datasets to an empty list.
vocab (torchtext.vocab.Vocab) – Vocabulary object.
multi_label (bool) – Indicates if this is a multi-label dataset.
target_labels (list of int or None) – A list of the (integer) labels that can occur within this dataset. Setting this is important if your data does not contain some labels, e.g. due to dataset splits, but those labels should still be considered by entities such as the classifier. If None, the target labels are inferred from the labels encountered in self.data.
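The expected shape of the data argument can be sketched as follows. Plain Python lists stand in for torch Tensors here so the snippet stays dependency-free; in real use, each text entry is a Tensor of token ids, and the unlabeled sentinel is imported from small_text.base.

```python
LABEL_UNLABELED = -1  # stand-in for small_text.base.LABEL_UNLABELED

# single-label dataset: one (token_ids, label) tuple per instance
single_label_data = [
    ([2, 7, 4], 0),             # labeled instance, class 0
    ([5, 3], LABEL_UNLABELED),  # unlabeled instance
]

# multi-label dataset: the label part is a (possibly empty) list of ints
multi_label_data = [
    ([2, 7, 4], [0, 2]),  # instance belonging to classes 0 and 2
    ([5, 3], []),         # unlabeled instance: empty list
]
```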
- property x
Returns the features.
- Returns
x
- Return type
list of Tensor
- property data
Returns the internal list of tuples storing the data.
- Returns
data – The internal list of tuples.
- Return type
list of tuples (text data [Tensor], label)
- property vocab
Returns the vocab.
- Returns
vocab – Vocab object.
- Return type
torchtext.vocab.Vocab
- property target_labels
Returns the target labels.
- Returns
target_labels – List of target labels.
- Return type
list of int
- to(other, non_blocking=False, copy=False)
Calls torch.Tensor.to on all Tensors in data.
- Returns
self – The object with to having been called on all Tensors in data.
- Return type
PytorchTextClassificationDataset
See also
torch.Tensor.to
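The contract of to() can be sketched as follows: the move is applied to every text Tensor while the labels are left untouched, and the dataset itself is returned so calls can be chained. A hypothetical stand-in class and a plain list copy replace torch.Tensor.to here to keep the snippet dependency-free.

```python
class _SketchDataset:
    """Illustrative stand-in for PytorchTextClassificationDataset."""

    def __init__(self, data):
        self._data = data

    def to(self, other, non_blocking=False, copy=False):
        # stand-in for text.to(other, non_blocking=non_blocking, copy=copy)
        self._data = [(list(text), label) for text, label in self._data]
        return self  # returns self, matching the documented behavior

ds = _SketchDataset([([2, 7, 4], 0), ([5, 3], 1)])
result = ds.to('cuda')  # labels are untouched, the dataset is returned
```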
Models
- class small_text.integrations.pytorch.models.kimcnn.KimCNN(vocabulary_size, max_seq_length, num_classes=2, out_channels=100, embed_dim=300, padding_idx=0, kernel_heights=[3, 4, 5], dropout=0.5, embedding_matrix=None, freeze_embedding_layer=False)[source]
- forward(x)
- Parameters
x (torch.LongTensor or torch.cuda.LongTensor) – Input tensor of shape (batch_size, max_sequence_length) containing padded sequences of word ids.
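Building the padded input that forward expects can be sketched as below. Plain lists stand in for torch.LongTensor, and the padding value matches the default padding_idx=0 from the constructor; pad_batch is a hypothetical helper, not part of the library.

```python
PAD_IDX = 0  # matches KimCNN's default padding_idx

def pad_batch(sequences, max_seq_length, padding_idx=PAD_IDX):
    """Truncate/pad each sequence of word ids to max_seq_length."""
    return [seq[:max_seq_length] + [padding_idx] * (max_seq_length - len(seq))
            for seq in sequences]

batch = pad_batch([[2, 7, 4], [5, 3]], max_seq_length=5)
# batch == [[2, 7, 4, 0, 0], [5, 3, 0, 0, 0]]
```

In real use, the resulting batch would be wrapped as a torch.LongTensor before being passed to forward.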