meerqat.ir.metrics module#
Script and functions related to metrics and ranx. The usage strings below follow docopt conventions.
- Usages:
metrics.py relevant <dataset> <passages> <title2index> [<article2passage> --reference=<reference> --save=<save> --disable_caching --provenance_key=<key>]
metrics.py qrels <qrels>... --output=<path>
metrics.py ranx --qrels=<path> [<run>... --output=<path> --filter=<path> --kwargs=<path> --cats=<path>]
metrics.py (win|tie|loss) <metrics> [--metric=<metric>]
- Positional arguments:
<usage> Pick one usage.
<dataset> Path to the dataset
<passages> Path to the passages (also a Dataset)
<title2index> Path to the JSON file mapping an article’s title to its index in the KB
[<article2passage>] Path to the JSON file mapping an article’s index to its corresponding passage indices.
Optional; if not provided, we assume that <passages> is a collection of articles.
<qrels>... Paths to the Qrels to merge
<metrics> Path to the JSON metrics file (output of ranx)
- Options:
- --reference=<reference>
Name of the column that holds the text expected to contain the answer. Defaults to ‘passage’.
- --save=<save>
Name of the column under which to save the relevant indices. Defaults to ‘provenance_indices’.
- --provenance_key=<key>
Where the provenances are stored in item[‘output’]. Special values ‘wikidata’ and ‘wikipedia’ will use a single provenance article, the one from the subject-entity (stored in ‘wikidata_id’ and ‘wikipedia_title’, respectively). If ‘wikidata’, <title2index> should actually be QID-to-index. Defaults to ‘provenance’.
- --disable_caching
Disables Dataset caching (useless when using save_to_disk), see datasets.set_caching_enabled()
- --output=<path>
qrels: output path to the JSON file
ranx: output path to the directory where to save metrics
- --filter=<path>
Path to the JSON file that contains a list of question ids to filter out
- --kwargs=<path>
Path to the JSON config file that contains keyword arguments (passed to ranx.compare)
- --cats=<path>
Path to the JSON file that maps categories to their question ids
- --metric=<metric>
Metric on which to compute wins/ties/losses [default: precision@1].
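For example, to compute and save the relevant passage indices of a dataset (all paths below are hypothetical):

    metrics.py relevant data/dataset data/passages data/title2index.json data/article2passage.json --reference=passage --save=provenance_indices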
- meerqat.ir.metrics.find_relevant(retrieved, original_answer, alternative_answers, kb, reference_key='passage', question_type=QuestionType.String)[source]#
- Parameters:
retrieved (List[int]) –
original_answer (str) – Should be included in alternative_answers, so that original_relevant is included in relevant
alternative_answers (List[str]) –
kb (Dataset) –
reference_key (str, optional) – Used to get the reference field in kb. Defaults to ‘passage’
question_type (QuestionType, optional) – Relevant for InfoSeek. Defaults to String.
- Returns:
original_relevant, relevant – Both are included in retrieved
- Return type:
Tuple[List[int], List[int]]
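A minimal sketch of a direct call; the path, indices, and answers below are hypothetical:

    from datasets import load_from_disk

    from meerqat.ir.metrics import find_relevant

    kb = load_from_disk('data/passages')  # hypothetical KB of passages
    retrieved = [3, 14, 159]              # hypothetical retrieved indices
    original_relevant, relevant = find_relevant(
        retrieved,
        original_answer='Paris',
        alternative_answers=['Paris', 'City of Light'],
        kb=kb,
        reference_key='passage',
    )
    # Both returned lists are subsets of `retrieved`: indices whose
    # 'passage' field contains the original answer, resp. any answer.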
- meerqat.ir.metrics.find_relevant_item(item, passages, title2index, article2passage=None, reference_key='passage', save_as='provenance_indices', provenance_key='provenance', qrels={})[source]#
Applies find_relevant with the passages of the articles linked to the question.
- Parameters:
item (dict) –
passages (Dataset) –
title2index (dict[str, int]) – Maps an article’s title to its index in the KB
article2passage (dict[int, List[int]], optional) – Maps an article’s index to its corresponding passage indices. If None, we assume that passages is a collection of articles
reference_key (str, optional) – Used to get the reference field in kb. Defaults to ‘passage’
save_as (str, optional) – Results will be saved under this name in the dataset, with an ‘original_answer_’ prefix for passages that contain the original answer. Defaults to ‘provenance_indices’
provenance_key (str, optional) – Where the provenances are stored in item[‘output’]. Special values ‘wikidata’ and ‘wikipedia’ will use a single provenance article, the one from the subject-entity (stored in ‘wikidata_id’ and ‘wikipedia_title’, respectively). If ‘wikidata’, title2index should actually be QID-to-index. Defaults to ‘provenance’.
qrels (dict) – Stores relevant indices. Compatible with ranx.Qrels
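A sketch of mapping a dataset through find_relevant_item, with hypothetical paths (JSON object keys are strings, hence the cast for article2passage):

    import json

    from datasets import load_from_disk

    from meerqat.ir.metrics import find_relevant_item

    passages = load_from_disk('data/passages')
    with open('data/title2index.json') as f:
        title2index = json.load(f)
    with open('data/article2passage.json') as f:
        article2passage = {int(k): v for k, v in json.load(f).items()}

    qrels = {}  # filled in place, compatible with ranx.Qrels
    dataset = load_from_disk('data/dataset')
    dataset = dataset.map(
        find_relevant_item,
        fn_kwargs=dict(
            passages=passages,
            title2index=title2index,
            article2passage=article2passage,
            qrels=qrels,
        ),
    )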
- meerqat.ir.metrics.find_relevant_dataset(dataset_path, save_as='provenance_indices', **kwargs)[source]#
Loads the dataset, maps it through find_relevant_item, and saves it back to disk.
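The same pipeline in one call, reusing the objects from the sketch above and assuming the extra keyword arguments are forwarded to find_relevant_item:

    from meerqat.ir.metrics import find_relevant_dataset

    find_relevant_dataset(
        'data/dataset',  # hypothetical path
        save_as='provenance_indices',
        passages=passages,
        title2index=title2index,
        article2passage=article2passage,
    )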
- meerqat.ir.metrics.fuse_qrels(qrels_paths)[source]#
Loads all qrels in qrels_paths and merges them into a single Qrels.
- Parameters:
qrels_paths (List[str]) –
- Returns:
fused_qrels
- Return type:
ranx.Qrels
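For instance, with hypothetical paths:

    from meerqat.ir.metrics import fuse_qrels

    fused = fuse_qrels(['runs/qrels_a.json', 'runs/qrels_b.json'])
    fused.save('runs/fused_qrels.json')  # ranx.Qrels can be saved back to JSON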
- meerqat.ir.metrics.load_runs(runs_paths=[], runs_dict={}, filter_q_ids=[])[source]#
Loads runs from both runs_paths and runs_dict. Optionally filters out some questions.
- Parameters:
runs_paths (List[str], optional) –
runs_dict (dict[str, str], optional) – {name of the run: path of the run}
filter_q_ids (List[str]) – Question identifiers to filter from the runs
- Returns:
runs
- Return type:
List[ranx.Run]
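For instance, with hypothetical paths and question id:

    from meerqat.ir.metrics import load_runs

    runs = load_runs(
        runs_paths=['runs/bm25.json', 'runs/dpr.json'],
        runs_dict={'hybrid': 'runs/hybrid.json'},
        filter_q_ids=['q42'],  # dropped from every run
    )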
- meerqat.ir.metrics.compare(qrels_path, runs_paths=[], runs_dict={}, output_path=None, filter_q_ids=[], **kwargs)[source]#
Loads Qrels and Runs, feeds them to ranx.compare, and saves the result.
- Parameters:
qrels_path (str) –
runs_paths (List[str], optional) –
runs_dict (dict[str, str], optional) – {name of the run: path of the run}
output_path (str, optional) – Path of the directory where to save the output JSON and TeX files. If not provided, results are only printed, not saved
filter_q_ids (List[str]) – Question identifiers to filter from the Runs and Qrels
**kwargs – Passed to ranx.compare
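A sketch with hypothetical paths; metrics is one of the keyword arguments that ranx.compare accepts:

    from meerqat.ir.metrics import compare

    compare(
        'runs/qrels.json',
        runs_paths=['runs/bm25.json', 'runs/dpr.json'],
        output_path='results',           # JSON and TeX files are saved here
        metrics=['mrr', 'precision@1'],  # forwarded to ranx.compare
    )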
- meerqat.ir.metrics.cat_breakdown(qrels_path, runs_paths, cats, runs_dict={}, output_path=None, filter_q_ids=[], metrics=['mrr'])[source]#
Breaks down metrics per category of questions.
- qrels_path, runs_paths, runs_dict, output_path, filter_q_ids:
see compare
- cats: dict[str, List[str]]
{category: list of question identifiers that belong to it}
- metrics: List[str], optional
Which metrics to compute. Defaults to [‘mrr’].
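A sketch with hypothetical paths:

    import json

    from meerqat.ir.metrics import cat_breakdown

    with open('data/cats.json') as f:
        cats = json.load(f)  # {category: list of question ids}
    cat_breakdown(
        'runs/qrels.json',
        ['runs/bm25.json', 'runs/dpr.json'],
        cats,
        output_path='results',
        metrics=['mrr', 'precision@1'],
    )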
- meerqat.ir.metrics.get_wtl_table(metrics, wtl_key='W', wtl_metric='precision@1')[source]#
Formats the wins, ties, or losses of the models against each other, according to wtl_key, as a pandas.DataFrame
- metrics: dict
loaded from the JSON output of ranx
- wtl_key: str, optional
Whether to compute wins (‘W’), ties (‘T’), or losses (‘L’)
- wtl_metric: str, optional
Metric according to which a model wins, ties, or loses. Defaults to ‘precision@1’.
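For instance, assuming a hypothetical metrics file produced by the ranx usage above:

    import json

    from meerqat.ir.metrics import get_wtl_table

    with open('results/metrics.json') as f:
        metrics = json.load(f)
    wins = get_wtl_table(metrics, wtl_key='W', wtl_metric='precision@1')
    print(wins)  # pandas.DataFrame of each model's wins against the others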