meerqat.models.qa module#

Utility functions specific to Question Answering.

meerqat.models.qa.get_best_spans(start_probs, end_probs, weights=None, cannot_be_first_token=True)[source]#

Get the best scoring spans from start and end probabilities

notations:
  • N - number of distinct questions

  • M - number of passages per question in a batch

  • L - sequence length

Parameters:
  • start_probs (Tensor) – shape (N, M, L)

  • end_probs (Tensor) – shape (N, M, L)

  • weights (Tensor, optional) – shape (N, M). Used to weight the span scores, e.g. with BM25 scores from the retriever.

  • cannot_be_first_token (bool, optional) – If True (the default), null out the scores of spans that start or end on the first token (e.g. “[CLS]”, which is used during training for irrelevant passages).

Returns:

  • passage_indices (Tensor) – shape (N, )

  • start_indices, end_indices (Tensor) – shape (N, ) start (inclusive) and end (exclusive) index of each span
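A minimal usage sketch based on the shapes documented above; the random tensors and BM25 weights are illustrative, and the return order is assumed to match the list above:

```python
import torch
from meerqat.models.qa import get_best_spans

N, M, L = 2, 4, 16  # 2 questions, 4 passages per question, 16 tokens per passage
start_probs = torch.rand(N, M, L).softmax(-1)
end_probs = torch.rand(N, M, L).softmax(-1)
bm25_scores = torch.rand(N, M)  # optional retriever scores used to weight span scores

passage_indices, start_indices, end_indices = get_best_spans(
    start_probs, end_probs, weights=bm25_scores
)
# passage_indices[i] is the passage holding the best span for question i;
# the answer spans tokens [start_indices[i], end_indices[i]) of that passage.
```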

class meerqat.models.qa.MultiPassageBERT(*args, fuse_ir_score=False, **kwargs)[source]#

Bases: BertForQuestionAnswering

PyTorch/Transformers implementation of Multi-passage BERT [1] (based on global normalization [2]), i.e. it groups passages per question before computing the softmax (and the NLL loss) so that span scores are comparable across passages.

Code based on transformers.BertForQuestionAnswering, dpr.models.Reader and allenai/document-qa
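The grouping can be illustrated with a short sketch (this is not the class’s actual code, only the normalization idea): logits of shape (N * M, L) are viewed as (N, M * L) so that the softmax is taken jointly over all passages of a question.

```python
import torch
import torch.nn.functional as F

N, M, L = 2, 4, 16  # questions, passages per question, sequence length
start_logits = torch.randn(N * M, L)  # as produced by a QA head

# Global normalization: softmax over all M passages of a question at once,
# so span scores are comparable across passages of the same question.
global_probs = F.softmax(start_logits.view(N, M * L), dim=-1).view(N, M, L)

# Per-passage normalization (what vanilla BertForQuestionAnswering does):
per_passage_probs = F.softmax(start_logits, dim=-1)
```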

References

[1] Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, and Bing Xiang. 2019. Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering. In EMNLP-IJCNLP.

[2] Christopher Clark and Matt Gardner. 2018. Simple and Effective Multi-Paragraph Reading Comprehension. In ACL.

forward(input_ids, passage_scores=None, start_positions=None, end_positions=None, answer_mask=None, return_dict=None, **kwargs)[source]#
notations:
  • N - number of distinct questions

  • M - number of passages per question in a batch

  • L - sequence length

Parameters:
  • input_ids (Tensor[int]) – shape (N * M, L). There should always be a constant number of passages (relevant or not) per question.

  • passage_scores (FloatTensor, optional) – shape (N * M, ). If self.fuse_ir_score, these are fused with start_logits and end_logits before computing the loss.

  • start_positions (Tensor[int], optional) – shape (N, M, max_n_answers). The answer might be found several times in the same passage, up to max_n_answers times. Defaults to None (i.e. don’t compute the loss).

  • end_positions (Tensor[int], optional) – shape (N, M, max_n_answers). The answer might be found several times in the same passage, up to max_n_answers times. Defaults to None (i.e. don’t compute the loss).

  • answer_mask (Tensor[int], optional) – shape (N, M, max_n_answers). Used to mask the loss for the padded answer positions when the answer occurs fewer than max_n_answers times in the passage. Required if start_positions and end_positions are specified.

  • **kwargs – additional arguments are passed to BERT after being reshaped like input_ids.
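A usage sketch following the documented shapes; the checkpoint name is only illustrative and the exact fields of the returned output are not asserted here:

```python
import torch
from meerqat.models.qa import MultiPassageBERT

N, M, L, max_n_answers = 2, 4, 32, 3
# Illustrative checkpoint; any BERT QA checkpoint/config should be compatible.
model = MultiPassageBERT.from_pretrained("bert-base-uncased")

input_ids = torch.randint(0, model.config.vocab_size, (N * M, L))
start_positions = torch.zeros(N, M, max_n_answers, dtype=torch.long)
end_positions = torch.zeros(N, M, max_n_answers, dtype=torch.long)
answer_mask = torch.zeros(N, M, max_n_answers, dtype=torch.long)
# mark a single answer span (tokens 5 to 7) in the first passage of each question
start_positions[:, 0, 0], end_positions[:, 0, 0], answer_mask[:, 0, 0] = 5, 7, 1

outputs = model(
    input_ids,
    start_positions=start_positions,
    end_positions=end_positions,
    answer_mask=answer_mask,
    return_dict=True,
)
```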

class meerqat.models.qa.MultiPassageECA(config, **kwargs)[source]#

Bases: ECAEncoder

Like MultiPassageBERT with an ECA backbone instead of BERT

forward(text_inputs, *args, start_positions=None, end_positions=None, answer_mask=None, return_dict=True, **kwargs)[source]#
Parameters:
  • text_inputs (dict[str, torch.LongTensor]) – usual BERT inputs, see transformers.BertModel

  • face_inputs (dict[str, torch.FloatTensor]) –

    {
        "face": (batch_size, n_images, n_faces, face_dim),
        "bbox": (batch_size, n_images, n_faces, bbox_dim),
        "attention_mask": (batch_size, n_images, n_faces)
    }

  • image_inputs (dict[str, dict[str, torch.FloatTensor]]) –

    {
        model: {
            "input": (batch_size, n_images, image_dim),
            "attention_mask": (batch_size, n_images)
        }
    }
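A sketch of how these nested inputs might be assembled; all sizes, feature dimensions (e.g. face_dim) and the "image_model" key are illustrative assumptions, not values prescribed by the module:

```python
import torch

batch_size, n_images, n_faces = 8, 1, 4       # illustrative sizes
face_dim, bbox_dim, image_dim = 512, 7, 640   # illustrative feature dimensions

text_inputs = {
    "input_ids": torch.randint(0, 30522, (batch_size, 256)),
    "attention_mask": torch.ones(batch_size, 256, dtype=torch.long),
}
face_inputs = {
    "face": torch.randn(batch_size, n_images, n_faces, face_dim),
    "bbox": torch.randn(batch_size, n_images, n_faces, bbox_dim),
    "attention_mask": torch.ones(batch_size, n_images, n_faces, dtype=torch.long),
}
image_inputs = {
    "image_model": {  # one entry per global image feature model
        "input": torch.randn(batch_size, n_images, image_dim),
        "attention_mask": torch.ones(batch_size, n_images, dtype=torch.long),
    }
}
```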

class meerqat.models.qa.ViltMultiImageEmbeddings(config)[source]#

Bases: ViltEmbeddings

Similar to the ‘triplet’ strategy of UNITER, patches of multiple images are concatenated in the sequence dimension. The resulting embeddings thus have a sequence length of #tokens + num_patches*num_images.
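For example, assuming ViLT’s usual 32×32 patches on a 384×384 image (an assumption for illustration, not something fixed by this class):

```python
# Resulting sequence length of the concatenated embeddings (illustrative numbers)
num_tokens = 40     # text tokens
num_patches = 144   # e.g. a 384x384 image split into 32x32 patches
num_images = 2      # e.g. question image + passage image
seq_len = num_tokens + num_patches * num_images  # 40 + 144*2 = 328
```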

forward(input_ids, attention_mask, token_type_ids, pixel_values, pixel_mask, passage_pixel_values, passage_pixel_mask, inputs_embeds)[source]#
Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, #tokens)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using [BertTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

  • attention_mask (torch.LongTensor of shape (batch_size, #tokens)) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: - 1 for tokens that are not masked, - 0 for tokens that are masked.

  • token_type_ids (torch.LongTensor of shape (batch_size, #tokens)) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: - 0 corresponds to a sentence A token, - 1 corresponds to a sentence B token.

  • pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) – Pixel values. Pixel values can be obtained using [ViltFeatureExtractor]. See [ViltFeatureExtractor.__call__] for details.

  • pixel_mask (torch.LongTensor of shape (batch_size, height, width)) – Mask to avoid performing attention on padding pixel values. Mask values selected in [0, 1]: - 1 for pixels that are real (i.e. not masked), - 0 for pixels that are padding (i.e. masked).

  • passage_pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) – Pixel values of the passage image. Same format as pixel_values.

  • passage_pixel_mask (torch.LongTensor of shape (batch_size, height, width)) – Mask for padding pixel values of the passage image. Same format as pixel_mask.

  • inputs_embeds (torch.FloatTensor of shape (batch_size, #tokens, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
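A sketch of tensors matching the documented shapes; the vocabulary size and image resolution are illustrative assumptions:

```python
import torch

batch_size, num_tokens = 2, 40
num_channels, height, width = 3, 384, 384  # illustrative image size

input_ids = torch.randint(0, 30522, (batch_size, num_tokens))
attention_mask = torch.ones(batch_size, num_tokens, dtype=torch.long)
token_type_ids = torch.zeros(batch_size, num_tokens, dtype=torch.long)

# question image and passage image, each with its own padding mask
pixel_values = torch.randn(batch_size, num_channels, height, width)
pixel_mask = torch.ones(batch_size, height, width, dtype=torch.long)
passage_pixel_values = torch.randn(batch_size, num_channels, height, width)
passage_pixel_mask = torch.ones(batch_size, height, width, dtype=torch.long)
```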

class meerqat.models.qa.ViltMultiImageModel(config, add_pooling_layer=True)[source]#

Bases: ViltModel

Same as ViltModel with ViltMultiImageEmbeddings instead of ViltEmbeddings

forward(input_ids: Optional[LongTensor] = None, attention_mask: Optional[FloatTensor] = None, token_type_ids: Optional[LongTensor] = None, pixel_values: Optional[FloatTensor] = None, pixel_mask: Optional[LongTensor] = None, passage_pixel_values: Optional[FloatTensor] = None, passage_pixel_mask: Optional[LongTensor] = None, head_mask: Optional[FloatTensor] = None, inputs_embeds: Optional[FloatTensor] = None, image_embeds: Optional[FloatTensor] = None, image_token_type_idx: Optional[int] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None) Union[BaseModelOutputWithPooling, Tuple[FloatTensor]][source]#

The [ViltModel] forward method, overrides the __call__ special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the [Module] instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using [BertTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details. [What are input IDs?](../glossary#input-ids)

  • attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: - 1 for tokens that are not masked, - 0 for tokens that are masked. [What are attention masks?](../glossary#attention-mask)

  • token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: - 0 corresponds to a sentence A token, - 1 corresponds to a sentence B token. [What are token type IDs?](../glossary#token-type-ids)

  • pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) – Pixel values. Pixel values can be obtained using [ViltFeatureExtractor]. See [ViltFeatureExtractor.__call__] for details.

  • pixel_mask (torch.LongTensor of shape (batch_size, height, width), optional) –

    Mask to avoid performing attention on padding pixel values. Mask values selected in [0, 1]:

    • 1 for pixels that are real (i.e. not masked),

    • 0 for pixels that are padding (i.e. masked).

    What are attention masks?

  • head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: - 1 indicates the head is not masked, - 0 indicates the head is masked.

  • inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

  • image_embeds (torch.FloatTensor of shape (batch_size, num_patches, hidden_size), optional) – Optionally, instead of passing pixel_values, you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert pixel_values into patch embeddings.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

Returns:

  [transformers.modeling_outputs.BaseModelOutputWithPooling] or tuple(torch.FloatTensor): A [transformers.modeling_outputs.BaseModelOutputWithPooling] or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration ([ViltConfig]) and inputs.

  • last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.

  • pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) – Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.

  • hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

  • attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Examples:

```python
>>> from transformers import ViltProcessor, ViltModel
>>> from PIL import Image
>>> import requests

>>> # prepare image and text
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> text = "hello world"

>>> processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
>>> model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")

>>> inputs = processor(image, text, return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
```

class meerqat.models.qa.MultiPassageVilt(config, add_pooling_layer=False)[source]#

Bases: ViltPreTrainedModel

Like MultiPassageBERT with a ViLT backbone instead of BERT

forward(input_ids, *args, start_positions=None, end_positions=None, answer_mask=None, return_dict=True, **kwargs)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.