---
license: apache-2.0
datasets:
- kortukov/answer-equivalence-dataset
language:
- en
pipeline_tag: text-classification
---
|
|
|
|
|
# Overview |
|
|
BEM - the BERT Matching model from the paper [Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation](https://arxiv.org/abs/2202.07654) (reproduction).
|
|
|
|
|
It is a [bert-base-uncased](https://huggingface.co/bert-base-uncased) model trained on the [Answer Equivalence dataset](https://huggingface.co/datasets/kortukov/answer-equivalence-dataset).
|
|
|
|
|
Consider this example (pseudocode): |
|
|
```python
question = 'how is the weather in california'
reference = 'infrequent rain'
candidate = 'rain'

bem(question, reference, candidate) ~ 0  # "rain" is not equivalent to "infrequent rain"
```
|
|
|
|
|
This model can be used as a metric to evaluate automatic question answering systems: when a produced answer differs from the reference on the surface, it may still be semantically equivalent to the reference and should then count as correct.
|
|
|
|
|
See the paper [Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation](https://arxiv.org/abs/2202.07654) for a detailed explanation of how the data was collected and how this metric compares to others such as exact match or F1.
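For intuition, a token-overlap metric rewards the weather example above even though the candidate drops the key qualifier. Below is a minimal sketch of a SQuAD-style token F1 (the official evaluation script additionally normalizes punctuation and articles), included here only for comparison:

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """SQuAD-style token-level F1 between two answer strings (simplified)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(cand_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(cand_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Token overlap gives "rain" a high score against "infrequent rain" (~0.67),
# while BEM judges the pair as not equivalent (~0).
print(token_f1("rain", "infrequent rain"))
```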
|
|
|
|
|
# Example use |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.nn import functional as F

tokenizer = AutoTokenizer.from_pretrained("kortukov/answer-equivalence-bem")
model = AutoModelForSequenceClassification.from_pretrained("kortukov/answer-equivalence-bem")

question = "What does Ban Bossy encourage?"
reference = "leadership in girls"
candidate = "positions of power"

def tokenize_function(question, reference, candidate):
    # BEM input format: the candidate answer is the first segment,
    # the reference answer followed by the question is the second segment.
    # Special tokens are written into the strings, so add_special_tokens=False.
    text = f"[CLS] {candidate} [SEP]"
    text_pair = f"{reference} [SEP] {question} [SEP]"
    return tokenizer(text=text, text_pair=text_pair, add_special_tokens=False, padding='max_length', truncation=True, return_tensors='pt')

inputs = tokenize_function(question, reference, candidate)
out = model(**inputs)

# Class 1 means the candidate answer is judged equivalent to the reference.
prediction = F.softmax(out.logits, dim=-1).argmax().item()
```
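To score a whole QA system with BEM, the per-example equivalence decisions can be averaged into an accuracy-style number. A minimal sketch, reusing `model`, `tokenizer`, and `tokenize_function` from above; the `examples` list is a hypothetical placeholder for your own (question, reference, candidate) triples:

```python
import torch

# Hypothetical evaluation data: (question, reference answer, candidate answer).
examples = [
    ("What does Ban Bossy encourage?", "leadership in girls", "positions of power"),
    ("how is the weather in california", "infrequent rain", "rain"),
]

model.eval()
equivalent = 0
with torch.no_grad():
    for question, reference, candidate in examples:
        inputs = tokenize_function(question, reference, candidate)
        logits = model(**inputs).logits
        # Class 1 = candidate judged equivalent to the reference answer.
        equivalent += logits.argmax(dim=-1).item()

bem_score = equivalent / len(examples)
print(f"BEM score: {bem_score:.2f}")
```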
|
|
|
|
|
|