PyTorch Integration Classes

Dataset Classes

class small_text.integrations.pytorch.datasets.PytorchTextClassificationDataset(data, vocab, target_labels=None, device=None)

Dataset class for classifiers from the PyTorch integration.

__init__(data, vocab, target_labels=None, device=None)

Parameters

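data (list of tuples (text data [Tensor], label)) – The data set, with one (text data, label) tuple per instance.
vocab (torchtext.vocab.Vocab) – Vocabulary object.
target_labels (numpy.ndarray, optional) – List of possible labels.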
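
The layout expected for data can be sketched without the library. The following is a minimal illustration, assuming a toy vocabulary held in a plain dict and plain Python lists standing in for the Tensors. The names toy_vocab and encode are hypothetical and not part of small_text or torchtext.

```python
# Hypothetical stand-ins: a dict plays the role of the vocabulary and
# plain lists play the role of the token-id Tensors.
toy_vocab = {"<pad>": 0, "<unk>": 1, "good": 2, "bad": 3, "movie": 4}


def encode(text, vocab):
    """Map whitespace-separated tokens to ids, falling back to <unk>."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.lower().split()]


texts = ["Good movie", "Bad movie", "Terrible movie"]
labels = [1, 0, 0]

# One (token ids, label) tuple per instance, mirroring the documented layout.
data = [(encode(text, toy_vocab), label) for text, label in zip(texts, labels)]

# The labels that occur in the data, in sorted order.
possible_labels = sorted({label for _, label in data})
```

In a real setting each list of ids would be a Tensor and the vocabulary a torchtext.vocab.Vocab instance, as stated in the parameter list.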

property x

Returns the features.

Returns: x – Feature representation.
Return type: object

property y

Returns the labels.

Returns: y – Label representation.
Return type: object

property data

Returns the internal list of tuples storing the data.

Returns: data – List of tuples storing the data.
Return type: list of tuples (text data [Tensor], label)

property vocab

Returns the vocab.

Returns: vocab – Vocab object.
Return type: torchtext.vocab.Vocab

property target_labels

Returns a list of possible labels.

Returns: target_labels – List of possible labels.
Return type: numpy.ndarray

to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format)

Calls torch.Tensor.to on all Tensors in data.

Returns: self – This dataset, after to() has been called on every Tensor in data.
Return type: PytorchTextClassificationDataset

Models

class small_text.integrations.pytorch.models.kimcnn.KimCNN(vocabulary_size, max_seq_length, num_classes=2, out_channels=100, embed_dim=300, padding_idx=0, kernel_heights=[3, 4, 5], dropout=0.5, embedding_matrix=None, freeze_embedding_layer=False)

Parameters

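vocabulary_size (int) – Number of tokens in the vocabulary.
max_seq_length (int) – Maximum sequence length.
num_classes (int) – Number of output classes.
embedding_matrix (2D FloatTensor) – Embedding matrix used to initialize the embedding layer.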
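
How these parameters interact can be sketched with some shape arithmetic. This assumes the usual Kim (2014) layout, which the listing above does not spell out: one convolution per kernel height spanning the full embedding width, no padding, stride 1, max-over-time pooling, and concatenation of the pooled features. The helper kimcnn_shapes is hypothetical and not part of small_text.

```python
def kimcnn_shapes(max_seq_length, out_channels=100, kernel_heights=(3, 4, 5)):
    """Return per-kernel feature-map lengths and the concatenated feature size.

    Assumes unpadded, stride-1 convolutions followed by max-over-time pooling.
    """
    # A kernel of height h slides over max_seq_length positions and yields
    # max_seq_length - h + 1 outputs per channel.
    conv_lengths = {h: max_seq_length - h + 1 for h in kernel_heights}
    if min(conv_lengths.values()) < 1:
        raise ValueError("max_seq_length must be at least the largest kernel height")
    # Pooling keeps one value per channel and per kernel height.
    pooled_features = out_channels * len(kernel_heights)
    return conv_lengths, pooled_features


conv_lengths, pooled_features = kimcnn_shapes(max_seq_length=60)
```

Under these assumptions, the default settings with max_seq_length=60 give feature maps of length 58, 57, and 56, and a 300-dimensional vector that feeds the final classification layer.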

forward(x)

Parameters: x (torch.LongTensor or torch.cuda.LongTensor) – Input tensor of shape (batch_size, max_seq_length) containing padded sequences of word ids.
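
The input layout can be illustrated with plain Python lists. This sketch pads or truncates sequences of word ids to a fixed length. pad_batch is a hypothetical helper, not part of small_text, and the padding id 0 mirrors the padding_idx default in the constructor signature.

```python
def pad_batch(sequences, max_seq_length, padding_idx=0):
    """Right-pad (or truncate) each sequence of word ids to max_seq_length."""
    batch = []
    for seq in sequences:
        clipped = list(seq[:max_seq_length])
        batch.append(clipped + [padding_idx] * (max_seq_length - len(clipped)))
    return batch


# Three sequences of different lengths become a 3 x 5 batch.
batch = pad_batch([[5, 8, 2], [7], [4, 4, 9, 1, 3, 6]], max_seq_length=5)
```

In practice the resulting nested list would be converted to a torch.LongTensor before being passed to forward.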
