PyTorch Integration Classes

Dataset Classes

class small_text.integrations.pytorch.datasets.PytorchTextClassificationDataset(data, vocab, target_labels=None, device=None)

Dataset class for classifiers from the PyTorch integration.

__init__(data, vocab, target_labels=None, device=None)
Parameters
  • data (list of tuples (text data [Tensor], label)) – Data set.

  • vocab (torchtext.vocab.Vocab) – Vocabulary object.

  • target_labels (numpy.ndarray, optional) – List of possible labels.
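The expected shape of data can be illustrated with a minimal sketch. The word_to_id dictionary and the encode helper are hypothetical stand-ins introduced only for this example; in practice the ids would come from the torchtext Vocab object passed as vocab.

```python
import torch

# Hypothetical word-to-id mapping standing in for a torchtext Vocab object.
word_to_id = {"<pad>": 0, "<unk>": 1, "good": 2, "bad": 3, "movie": 4}


def encode(text):
    """Map whitespace-separated tokens to a 1D LongTensor of word ids."""
    unk = word_to_id["<unk>"]
    ids = [word_to_id.get(token, unk) for token in text.lower().split()]
    return torch.tensor(ids, dtype=torch.long)


raw = [("good movie", 1), ("bad movie", 0), ("terrible movie", 0)]

# One (text tensor, label) tuple per instance, matching the format
# described for the data parameter.
data = [(encode(text), label) for text, label in raw]
```

A list built this way, together with the matching vocabulary, is what the constructor's data and vocab parameters describe.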

property x

Returns the features.

Returns

x – Feature representation.

Return type

object

property y

Returns the labels.

Returns

y – Label representation.

Return type

object

property data

Returns the internal list of tuples storing the data.

Returns

data – List of tuples storing the data.

Return type

list of tuples (text data [Tensor], label)
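The relationship between data, x, and y can be sketched as follows. This assumes that the features are the first element of each tuple and the labels the second, which matches the documented tuple layout but is not confirmed here as the library's exact implementation.

```python
import torch

# Data in the documented format: (text data [Tensor], label).
data = [
    (torch.tensor([2, 4]), 1),
    (torch.tensor([3, 4]), 0),
]

# Assumption: x collects the text tensors, y collects the labels.
x = [text for text, _ in data]
y = [label for _, label in data]
```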

property vocab

Returns the vocab.

Returns

vocab – Vocab object.

Return type

torchtext.vocab.Vocab

property target_labels

Returns a list of possible labels.

Returns

target_labels – List of possible labels.

Return type

numpy.ndarray

to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format)

Calls torch.Tensor.to on all Tensors in data.

Returns

self – This object, after torch.Tensor.to has been called on all Tensors in data.

Return type

PytorchTextClassificationDataset
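A minimal sketch of this behaviour, using a hypothetical stand-in class (TupleDataset is not part of the library): every text tensor is passed through torch.Tensor.to with the same arguments, and the object returns itself so that calls can be chained.

```python
import torch


class TupleDataset:
    """Hypothetical stand-in holding (text tensor, label) tuples."""

    def __init__(self, data):
        self.data = data

    def to(self, *args, **kwargs):
        # Forward all arguments unchanged to torch.Tensor.to for each tensor.
        self.data = [(text.to(*args, **kwargs), label) for text, label in self.data]
        return self  # returning self allows call chaining


dataset = TupleDataset([(torch.tensor([2, 4]), 1), (torch.tensor([3, 4]), 0)])

# Use a GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
moved = dataset.to(device=device)
```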

Models

class small_text.integrations.pytorch.models.kimcnn.KimCNN(vocabulary_size, max_seq_length, num_classes=2, out_channels=100, embed_dim=300, padding_idx=0, kernel_heights=[3, 4, 5], dropout=0.5, embedding_matrix=None, freeze_embedding_layer=False)
Parameters
  • vocabulary_size (int) – Number of tokens in the vocabulary.

  • max_seq_length (int) – Maximum sequence length.

  • num_classes (int) – Number of output classes.

  • embedding_matrix (2D FloatTensor) – Embedding matrix used to initialize the embedding layer.
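To show how parameters such as kernel_heights, out_channels, embed_dim, and dropout typically interact in a Kim-style CNN, here is a simplified sketch. TinyKimCNN is a hypothetical class written for illustration only; it is not the library's implementation, it leaves out max_seq_length, embedding_matrix, and freeze_embedding_layer, and the layer layout is an assumption based on the general architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyKimCNN(nn.Module):
    """Hypothetical, simplified Kim-style CNN text classifier."""

    def __init__(self, vocabulary_size, num_classes=2, out_channels=100,
                 embed_dim=300, padding_idx=0, kernel_heights=(3, 4, 5),
                 dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocabulary_size, embed_dim,
                                      padding_idx=padding_idx)
        # One convolution per kernel height, each spanning the full embedding width.
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, out_channels, (height, embed_dim))
             for height in kernel_heights]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(len(kernel_heights) * out_channels, num_classes)

    def forward(self, x):
        # x: (batch_size, seq_length) word ids
        embedded = self.embedding(x).unsqueeze(1)        # (B, 1, L, E)
        pooled = []
        for conv in self.convs:
            features = F.relu(conv(embedded)).squeeze(3)  # (B, C, L - h + 1)
            pooled.append(features.max(dim=2).values)     # max over time: (B, C)
        combined = self.dropout(torch.cat(pooled, dim=1))  # (B, C * num_kernels)
        return self.fc(combined)                           # (B, num_classes)


model = TinyKimCNN(vocabulary_size=50, num_classes=2, out_channels=8, embed_dim=16)
model.eval()  # disable dropout for a deterministic forward pass
with torch.no_grad():
    logits = model(torch.randint(0, 50, (4, 10)))
```

In this sketch the input sequences must be at least as long as the largest kernel height, which is one reason fixed-length padded input is convenient.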

forward(x)
Parameters

x (torch.LongTensor or torch.cuda.LongTensor) – Input tensor of shape (batch_size, max_seq_length) containing padded sequences of word ids.
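One way to prepare such an input is sketched below. The pad_batch helper is hypothetical and not part of the library; it assumes that sequences longer than max_seq_length are truncated and that padding uses the index passed as padding_idx.

```python
import torch


def pad_batch(sequences, max_seq_length, padding_idx=0):
    """Pad or truncate lists of word ids to shape (batch_size, max_seq_length)."""
    batch = torch.full((len(sequences), max_seq_length), padding_idx,
                       dtype=torch.long)
    for row, ids in enumerate(sequences):
        ids = ids[:max_seq_length]  # truncate overly long sequences
        batch[row, :len(ids)] = torch.tensor(ids, dtype=torch.long)
    return batch


x = pad_batch([[2, 4], [3, 4, 5, 6, 7, 8]], max_seq_length=5)
```

The resulting tensor has dtype torch.long, as the parameter description for x requires.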

Classification