meerqat.ir.metrics module#
Script and functions related to metrics and ranx. The usage strings below follow docopt conventions.
- Usages:
metrics.py relevant <dataset> <passages> <title2index> [<article2passage> --reference=<reference> --save=<save> --disable_caching --provenance_key=<key>]
metrics.py qrels <qrels>... --output=<path>
metrics.py ranx --qrels=<path> [<run>... --output=<path> --filter=<path> --kwargs=<path> --cats=<path>]
metrics.py (win|tie|loss) <metrics> [--metric=<metric>]
- Positional arguments:
<usage> Pick one usage.
<dataset> Path to the dataset
<passages> Path to the passages (also a Dataset)
<title2index> Path to the JSON file mapping an article’s title to its index in the KB
[<article2passage>] Path to the JSON file mapping an article’s index to its corresponding passage indices.
Optional; if not provided, we assume that <passages> is a collection of articles.
<qrels>... Paths to the Qrels to merge
<metrics> Path to the JSON metrics file (output of ranx)
- Options:
- --reference=<reference>
Name of the column that holds the text expected to contain the answer. Defaults to ‘passage’.
- --save=<save>
Name of the column under which to save the relevant indices. Defaults to ‘provenance_indices’.
- --provenance_key=<key>
Where the provenances are stored in item[‘output’]. Special values ‘wikidata’ and ‘wikipedia’ will use a single provenance article, the one from the subject-entity (stored in ‘wikidata_id’ and ‘wikipedia_title’, respectively). If ‘wikidata’, <title2index> should actually be QID-to-index. Defaults to ‘provenance’.
- --disable_caching
Disables Dataset caching (useless when using save_to_disk), see datasets.set_caching_enabled()
- --output=<path>
qrels: output path to the JSON file
ranx: output path to the directory where to save metrics
- --filter=<path>
Path to the JSON file that contains a list of question ids to filter out
- --kwargs=<path>
Path to the JSON config file that contains keyword arguments (passed to ranx.compare)
- --cats=<path>
Path to the JSON file that maps categories to their question ids
- --metric=<metric>
Metric on which to compute wins/ties/losses [default: precision@1].
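For example, to compute and save the relevant passage indices of a dataset (all paths below are hypothetical):

    metrics.py relevant data/dataset data/passages data/title2index.json data/article2passage.json --reference=passage --save=provenance_indices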
- meerqat.ir.metrics.find_relevant(retrieved, original_answer, alternative_answers, kb, reference_key='passage', question_type=QuestionType.String)[source]#
- Parameters:
retrieved (List[int]) –
original_answer (str) – Should be included in alternative_answers, so that original_relevant is included in relevant
alternative_answers (List[str]) –
kb (Dataset) –
reference_key (str, optional) – Used to get the reference field in kb. Defaults to ‘passage’
question_type (QuestionType, optional) – Relevant for InfoSeek. Defaults to String.
- Returns:
original_relevant, relevant – Both are included in retrieved
- Return type:
Tuple[List[int], List[int]]
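A minimal sketch of a direct call; the path, indices, and answers below are hypothetical:

    from datasets import load_from_disk

    from meerqat.ir.metrics import find_relevant

    kb = load_from_disk('data/passages')  # hypothetical KB of passages
    retrieved = [3, 14, 159]              # hypothetical retrieved indices
    original_relevant, relevant = find_relevant(
        retrieved,
        original_answer='Paris',
        alternative_answers=['Paris', 'City of Light'],
        kb=kb,
        reference_key='passage',
    )
    # Both returned lists are subsets of `retrieved`: indices whose
    # 'passage' field contains the original answer, resp. any answer.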
- meerqat.ir.metrics.find_relevant_item(item, passages, title2index, article2passage=None, reference_key='passage', save_as='provenance_indices', provenance_key='provenance', qrels={})[source]#
Applies find_relevant with the passages of the articles linked to the question.
- Parameters:
item (dict) –
passages (Dataset) –
title2index (dict[str, int]) – Maps an article’s title to its index in the KB
article2passage (dict[int, List[int]], optional) – Maps an article’s index to its corresponding passage indices. If None, we assume that passages is a collection of articles
reference_key (str, optional) – Used to get the reference field in kb. Defaults to ‘passage’
save_as (str, optional) – Results will be saved under this name in the dataset, with an ‘original_answer_’ prefix for passages that contain the original answer. Defaults to ‘provenance_indices’
provenance_key (str, optional) – Where the provenances are stored in item[‘output’]. Special values ‘wikidata’ and ‘wikipedia’ will use a single provenance article, the one from the subject-entity (stored in ‘wikidata_id’ and ‘wikipedia_title’, respectively). If ‘wikidata’, title2index should actually be QID-to-index. Defaults to ‘provenance’.
qrels (dict) – Stores relevant indices. Compatible with ranx.Qrels
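A sketch of mapping a dataset through find_relevant_item, with hypothetical paths (JSON object keys are strings, hence the cast for article2passage):

    import json

    from datasets import load_from_disk

    from meerqat.ir.metrics import find_relevant_item

    passages = load_from_disk('data/passages')
    with open('data/title2index.json') as f:
        title2index = json.load(f)
    with open('data/article2passage.json') as f:
        article2passage = {int(k): v for k, v in json.load(f).items()}

    qrels = {}  # filled in place, compatible with ranx.Qrels
    dataset = load_from_disk('data/dataset')
    dataset = dataset.map(
        find_relevant_item,
        fn_kwargs=dict(
            passages=passages,
            title2index=title2index,
            article2passage=article2passage,
            qrels=qrels,
        ),
    )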
- meerqat.ir.metrics.find_relevant_dataset(dataset_path, save_as='provenance_indices', **kwargs)[source]#
Loads the dataset, maps it through find_relevant_item, and saves it back to disk.
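The same pipeline in one call, reusing the objects from the sketch above and assuming the extra keyword arguments are forwarded to find_relevant_item:

    from meerqat.ir.metrics import find_relevant_dataset

    find_relevant_dataset(
        'data/dataset',  # hypothetical path
        save_as='provenance_indices',
        passages=passages,
        title2index=title2index,
        article2passage=article2passage,
    )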
- meerqat.ir.metrics.fuse_qrels(qrels_paths)[source]#
Loads all qrels in qrels_paths and merges them into a single Qrels.
- Parameters:
qrels_paths (List[str]) –
- Returns:
fused_qrels
- Return type:
ranx.Qrels
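For instance, with hypothetical paths:

    from meerqat.ir.metrics import fuse_qrels

    fused = fuse_qrels(['runs/qrels_a.json', 'runs/qrels_b.json'])
    fused.save('runs/fused_qrels.json')  # ranx.Qrels can be saved back to JSON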
- meerqat.ir.metrics.load_runs(runs_paths=[], runs_dict={}, filter_q_ids=[])[source]#
Loads runs from both runs_paths and runs_dict. Optionally filters out some questions.
- Parameters:
runs_paths (List[str], optional) –
runs_dict (dict[str, str], optional) – {name of the run: path of the run}
filter_q_ids (List[str]) – Question identifiers to filter from the runs
- Returns:
runs
- Return type:
List[ranx.Run]
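For instance, with hypothetical paths and question id:

    from meerqat.ir.metrics import load_runs

    runs = load_runs(
        runs_paths=['runs/bm25.json', 'runs/dpr.json'],
        runs_dict={'hybrid': 'runs/hybrid.json'},
        filter_q_ids=['q42'],  # dropped from every run
    )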
- meerqat.ir.metrics.compare(qrels_path, runs_paths=[], runs_dict={}, output_path=None, filter_q_ids=[], **kwargs)[source]#
Loads Qrels and Runs, feeds them to ranx.compare, and saves the result.
- Parameters:
qrels_path (str) –
runs_paths (List[str], optional) –
runs_dict (dict[str, str], optional) – {name of the run: path of the run}
output_path (str, optional) – Path of the directory where to save the output JSON and TeX files. If not provided, results are only printed, not saved
filter_q_ids (List[str]) – Question identifiers to filter from the Runs and Qrels
**kwargs – Passed to ranx.compare
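A sketch with hypothetical paths; metrics is one of the keyword arguments that ranx.compare accepts:

    from meerqat.ir.metrics import compare

    compare(
        'runs/qrels.json',
        runs_paths=['runs/bm25.json', 'runs/dpr.json'],
        output_path='results',           # JSON and TeX files are saved here
        metrics=['mrr', 'precision@1'],  # forwarded to ranx.compare
    )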
- meerqat.ir.metrics.cat_breakdown(qrels_path, runs_paths, cats, runs_dict={}, output_path=None, filter_q_ids=[], metrics=['mrr'])[source]#
Breaks down metrics per category of questions.
- qrels_path, runs_paths, runs_dict, output_path, filter_q_ids:
see compare
- cats: dict[str, List[str]]
{category: list of question identifiers that belong to it}
- metrics: List[str], optional
Which metrics to compute. Defaults to [‘mrr’].
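A sketch with hypothetical paths:

    import json

    from meerqat.ir.metrics import cat_breakdown

    with open('data/cats.json') as f:
        cats = json.load(f)  # {category: list of question ids}
    cat_breakdown(
        'runs/qrels.json',
        ['runs/bm25.json', 'runs/dpr.json'],
        cats,
        output_path='results',
        metrics=['mrr', 'precision@1'],
    )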
- meerqat.ir.metrics.get_wtl_table(metrics, wtl_key='W', wtl_metric='precision@1')[source]#
Formats the wins, ties, or losses of the models against each other, according to wtl_key, as a pandas.DataFrame
- metrics: dict
loaded from the JSON output of ranx
- wtl_key: str, optional
Whether to compute wins (‘W’), ties (‘T’), or losses (‘L’)
- wtl_metric: str, optional
Metric according to which a model wins, ties, or loses. Defaults to ‘precision@1’.
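For instance, assuming a hypothetical metrics file produced by the ranx usage above:

    import json

    from meerqat.ir.metrics import get_wtl_table

    with open('results/metrics.json') as f:
        metrics = json.load(f)
    wins = get_wtl_table(metrics, wtl_key='W', wtl_metric='precision@1')
    print(wins)  # pandas.DataFrame of each model's wins against the others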