meerqat.ir.metrics module#

Script and functions related to metrics and ranx.

Usages (docopt):
  1. metrics.py relevant <dataset> <passages> <title2index> [<article2passage> --reference=<reference> --save=<save> --disable_caching --provenance_key=<key>]

  2. metrics.py qrels <qrels>… --output=<path>

  3. metrics.py ranx --qrels=<path> [<run>… --output=<path> --filter=<path> --kwargs=<path> --cats=<path>]

  4. metrics.py (win|tie|loss) <metrics> [--metric=<metric>]

Positional arguments:
  • <usage> Pick one usage.

  • <dataset> Path to the dataset

  • <passages> Path to the passages (also a Dataset)

  • <title2index> Path to the JSON file mapping an article’s title to its index in the KB

  • [<article2passage>] Path to the JSON file mapping an article’s index to its corresponding passage indices.

    Optional, if not provided, we assume that <passages> is a collection of articles.

  • <qrels>… Paths to the Qrels to merge

  • <metrics> Path to the JSON metrics file (output of ranx)

Options:

--reference=<reference>

Name of the column that holds the text that should contain the answer. Defaults to ‘passage’.

--save=<save>

Name of the column under which to save the relevant indices. Defaults to ‘provenance_indices’.

--provenance_key=<key>

Where the provenance is stored in item[‘output’]. Special values ‘wikidata’ and ‘wikipedia’ will use a single provenance article: the one from the subject entity (stored in ‘wikidata_id’ and ‘wikipedia_title’, respectively). If ‘wikidata’, title2index should actually map QIDs to indices. Defaults to ‘provenance’.

--disable_caching

Disables Dataset caching (caching is useless when using save_to_disk); see datasets.set_caching_enabled()

--output=<path>
  1. qrels: output path to the JSON file

  2. ranx: output path to the directory where to save metrics

--filter=<path>

Path to the JSON file that contains a list of question ids to filter out

--kwargs=<path>

Path to the JSON config file that contains kwargs

--cats=<path>

Path to the JSON file that maps categories to their question ids

--metric=<metric>

Metric on which to compute wins/ties/losses [default: precision@1].
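For example, assuming the script is run as a module (all paths below are hypothetical):

    python -m meerqat.ir.metrics relevant data/dataset data/passages data/title2index.json data/article2passage.json --save=provenance_indices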

meerqat.ir.metrics.numerical_relevant(answer, passage)[source]#
meerqat.ir.metrics.find_valid_numerical_answers(answer, passages)[source]#
meerqat.ir.metrics.find_relevant(retrieved, original_answer, alternative_answers, kb, reference_key='passage', question_type=QuestionType.String)[source]#
Parameters:
  • retrieved (List[int]) –

  • original_answer (str) – Included in alternative_answers, so that original_relevant is included in relevant

  • alternative_answers (List[str]) –

  • kb (Dataset) –

  • reference_key (str, optional) – Used to get the reference field in kb. Defaults to ‘passage’

  • question_type (QuestionType, optional) – Relevant for InfoSeek. Defaults to String.

Returns:

original_relevant, relevant – Both included in retrieved

Return type:

Tuple[List[int], List[int]]
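A minimal sketch of a call on toy data; it assumes a passage counts as relevant when its reference field contains one of the answers, which is the intent described above:

    from datasets import Dataset
    from meerqat.ir.metrics import find_relevant

    # toy KB with a 'passage' column (the default reference_key)
    kb = Dataset.from_dict({"passage": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ]})
    # among the retrieved passage indices, keep those that hold an answer
    original_relevant, relevant = find_relevant(
        retrieved=[0, 1],
        original_answer="Paris",
        alternative_answers=["City of Light"],
        kb=kb,
    )
    # both returned lists are subsets of retrieved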

meerqat.ir.metrics.find_relevant_item(item, passages, title2index, article2passage=None, reference_key='passage', save_as='provenance_indices', provenance_key='provenance', qrels={})[source]#

Applies find_relevant to the passages of the articles linked to the question.

Parameters:
  • item (dict) –

  • passages (Dataset) –

  • title2index (dict[str, int]) – Mapping an article’s title to its index in the KB

  • article2passage (dict[int, List[int]], optional) – Mapping an article’s index to its corresponding passage indices. If None, we assume that passages is a collection of articles

  • reference_key (str, optional) – Used to get the reference field in kb. Defaults to ‘passage’

  • save_as (str, optional) – Results will be saved under this name in the dataset, with an ‘original_answer_’ prefix for passages that contain the original answer. Defaults to ‘provenance_indices’

  • provenance_key (str, optional) – Where the provenance is stored in item[‘output’]. Special values ‘wikidata’ and ‘wikipedia’ will use a single provenance article: the one from the subject entity (stored in ‘wikidata_id’ and ‘wikipedia_title’, respectively). If ‘wikidata’, title2index should actually map QIDs to indices. Defaults to ‘provenance’.

  • qrels (dict) – Stores relevant indices. Compatible with ranx.Qrels

meerqat.ir.metrics.find_relevant_dataset(dataset_path, save_as='provenance_indices', **kwargs)[source]#

Loads dataset, maps it through find_relevant_item and saves it back.
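A rough sketch of what this amounts to, with hypothetical paths and toy mappings (find_relevant_dataset itself takes care of the loading and saving):

    import json
    from datasets import load_from_disk
    from meerqat.ir.metrics import find_relevant_item

    dataset = load_from_disk("data/dataset")    # hypothetical paths throughout
    passages = load_from_disk("data/passages")
    with open("data/title2index.json") as f:
        title2index = json.load(f)              # e.g. {"France": 0}
    with open("data/article2passage.json") as f:
        article2passage = json.load(f)          # e.g. {"0": [0, 1, 2]}

    dataset = dataset.map(
        find_relevant_item,
        fn_kwargs=dict(passages=passages, title2index=title2index,
                       article2passage=article2passage),
    )
    dataset.save_to_disk("data/dataset_with_provenance")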

meerqat.ir.metrics.fuse_qrels(qrels_paths)[source]#

Loads all qrels in qrels_paths and merges them into a single Qrels.

Parameters:

qrels_paths (List[str]) –

Returns:

fused_qrels

Return type:

ranx.Qrels
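For instance, merging per-split qrels (hypothetical paths):

    from meerqat.ir.metrics import fuse_qrels

    fused = fuse_qrels(["qrels_train.json", "qrels_validation.json"])
    fused.save("qrels_fused.json")  # ranx.Qrels can be saved back to JSON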

meerqat.ir.metrics.load_runs(runs_paths=[], runs_dict={}, filter_q_ids=[])[source]#

Loads runs from both runs_paths and runs_dict. Optionally filters out some questions (see filter_q_ids).

Parameters:
  • runs_paths (List[str], optional) –

  • runs_dict (dict[str, str], optional) – {name of the run: path of the run}

  • filter_q_ids (List[str]) – Question identifiers to filter from the runs

Returns:

runs

Return type:

List[ranx.Run]
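A sketch with hypothetical paths and identifiers:

    from meerqat.ir.metrics import load_runs

    runs = load_runs(
        runs_paths=["runs/bm25.json"],       # loaded as-is
        runs_dict={"DPR": "runs/dpr.json"},  # loaded under the name 'DPR'
        filter_q_ids=["q42"],                # dropped from every run
    )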

meerqat.ir.metrics.compare(qrels_path, runs_paths=[], runs_dict={}, output_path=None, filter_q_ids=[], **kwargs)[source]#

Loads Qrels and Runs, feeds them to ranx.compare, and saves the result.

Parameters:
  • qrels_path (str) –

  • runs_paths (List[str], optional) –

  • runs_dict (dict[str, str], optional) – {name of the run: path of the run}

  • output_path (str, optional) – Path of the directory where to save the output JSON and TeX files. Defaults to None, in which case results are only printed, not saved

  • filter_q_ids (List[str]) – Question identifiers to filter from the Runs and Qrels

  • **kwargs – Passed to ranx.compare
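A sketch with hypothetical paths; extra keyword arguments such as metrics are forwarded to ranx.compare:

    from meerqat.ir.metrics import compare

    compare(
        qrels_path="qrels.json",
        runs_paths=["runs/bm25.json", "runs/dpr.json"],
        output_path="results",           # JSON and TeX files are saved here
        metrics=["mrr", "precision@1"],  # passed through to ranx.compare
    )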

meerqat.ir.metrics.cat_breakdown(qrels_path, runs_paths, cats, runs_dict={}, output_path=None, filter_q_ids=[], metrics=['mrr'])[source]#
Parameters:
  • qrels_path, runs_paths, runs_dict, output_path, filter_q_ids – See compare

  • cats (dict[str, List[str]]) – {category: list of question identifiers that belong to it}

  • metrics (List[str], optional) – Which metrics to compute. Defaults to [‘mrr’]
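For example, with a hypothetical category mapping:

    from meerqat.ir.metrics import cat_breakdown

    cats = {
        "human": ["q1", "q7"],      # toy question ids per category
        "landmark": ["q2", "q3"],
    }
    cat_breakdown("qrels.json", ["runs/bm25.json"], cats, metrics=["mrr"])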

meerqat.ir.metrics.get_wtl_table(metrics, wtl_key='W', wtl_metric='precision@1')[source]#

Formats the wins, ties, or losses of the models against each other, according to wtl_key, as a pandas.DataFrame.

Parameters:
  • metrics (dict) – Loaded from the JSON output of ranx

  • wtl_key (str, optional) – Whether to compute the wins (‘W’), ties (‘T’), or losses (‘L’)

  • wtl_metric (str, optional) – Metric on which wins/ties/losses are decided. Defaults to ‘precision@1’
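A sketch, assuming metrics.json is the comparison output saved by compare (path hypothetical):

    import json
    from meerqat.ir.metrics import get_wtl_table

    with open("results/metrics.json") as f:
        metrics = json.load(f)
    wins = get_wtl_table(metrics, wtl_key="W", wtl_metric="precision@1")
    print(wins)  # pandas.DataFrame of pairwise wins at precision@1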