RAGchain.benchmark.dataset package
Submodules
RAGchain.benchmark.dataset.antique module
- class RAGchain.benchmark.dataset.antique.AntiqueEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
AntiqueEvaluator is a class for evaluating pipeline performance on the Antique dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None, random_state=None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data.
:param random_state: A random state used to fix the shuffled corpus that is ingested. Accepts an int, array-like, BitGenerator, np.random.RandomState, or np.random.Generator. Optional.
Notice: If the ingest size is too big, ingestion takes a long time, so we shuffle the corpus and slice it by the ingest size for testing. The retrieval ground-truth corpus is put into passages so that retrieval can find the ground truth in the db.
If you want to use the context_recall metric, you should ingest all data.
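As a usage illustration, here is a minimal sketch of the ingest-then-evaluate flow documented above. The run_pipeline, retrieval, and db names are placeholders for concrete BaseRunPipeline, BaseRetrieval, and BaseDB instances built elsewhere; only the evaluator calls follow the signatures in this reference.

    # Sketch only: run_pipeline, retrieval, and db are assumed to be concrete
    # BaseRunPipeline, BaseRetrieval, and BaseDB instances created elsewhere.
    from RAGchain.benchmark.dataset.antique import AntiqueEvaluator

    evaluator = AntiqueEvaluator(run_pipeline, evaluate_size=50)

    # Shuffle-and-slice ingest; fixing random_state keeps the ingested subset
    # reproducible across runs. Retrieval ground truths are always included.
    evaluator.ingest(retrievals=[retrieval], db=db, ingest_size=1000, random_state=42)

    # validate_passages=True enables the context_recall and KF1 metrics.
    result = evaluator.evaluate(validate_passages=True)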
RAGchain.benchmark.dataset.asqa module
- class RAGchain.benchmark.dataset.asqa.ASQAEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
ASQAEvaluator is a class for evaluating pipeline performance on the ASQA dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data.
RAGchain.benchmark.dataset.base module
- class RAGchain.benchmark.dataset.base.BaseDatasetEvaluator(run_all: bool = True, metrics: List[str] | None = None)
Bases: BaseEvaluator, ABC
- abstract ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
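Because ingest is abstract, every evaluator in this package subclasses BaseDatasetEvaluator and supplies its own implementation. The skeleton below sketches that shape only; the body is a placeholder rather than RAGchain's actual ingestion logic.

    from typing import Optional

    from RAGchain.benchmark.dataset.base import BaseDatasetEvaluator


    class MyDatasetEvaluator(BaseDatasetEvaluator):
        """Skeleton only: a real evaluator also loads its dataset and relies
        on BaseEvaluator for the evaluate() machinery."""

        def ingest(self, retrievals, db, ingest_size: Optional[int] = None):
            # retrievals is a List[BaseRetrieval] and db a BaseDB; type hints
            # are omitted here to keep the sketch import-light.
            # Placeholder: the evaluators in this package convert dataset rows
            # into passages, then ingest them into each retrieval and into db.
            raise NotImplementedError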
RAGchain.benchmark.dataset.dstc11_track5 module
- class RAGchain.benchmark.dataset.dstc11_track5.DSTC11Track5Evaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
DSTC11Track5Evaluator is a class for evaluating pipeline performance on the DSTC-11-Track-5 dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None, random_state=None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. You must ingest all data to use the context_recall metric.
:param random_state: A random state used to fix the shuffled corpus that is ingested. Accepts an int, array-like, BitGenerator, np.random.RandomState, or np.random.Generator. Optional.
RAGchain.benchmark.dataset.eli5 module
- class RAGchain.benchmark.dataset.eli5.Eli5Evaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
Eli5Evaluator is a class for evaluating pipeline performance on the ELI5 dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data.
RAGchain.benchmark.dataset.ko_strategy_qa module
- class RAGchain.benchmark.dataset.ko_strategy_qa.KoStrategyQAEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator, BaseStrategyQA
Ko-StrategyQA dataset evaluator
- dataset_name = 'NomaDamas/Ko-StrategyQA'
- evaluate(validate_passages: bool = True) → EvaluateResult
Evaluate pipeline performance on the Ko-StrategyQA dataset.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. If you want to use the context_recall and context_precision metrics, you should ingest all data.
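Below is a sketch of the full-ingest setup that the note above calls for when context_recall and context_precision are wanted. That these metric names are passed as plain strings in metrics is an assumption based on the names used in this reference; run_pipeline, retrieval, and db are placeholders as before.

    from RAGchain.benchmark.dataset.ko_strategy_qa import KoStrategyQAEvaluator

    # Assumed: metric names are given as the strings used in this reference.
    evaluator = KoStrategyQAEvaluator(run_pipeline, evaluate_size=100,
                                      metrics=["context_recall", "context_precision"])
    evaluator.ingest(retrievals=[retrieval], db=db)  # ingest_size=None -> all data
    result = evaluator.evaluate(validate_passages=True)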
RAGchain.benchmark.dataset.mr_tydi module
- class RAGchain.benchmark.dataset.mr_tydi.MrTydiEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None, language: str = 'english')
Bases: BaseDatasetEvaluator
MrTydiEvaluator is a class for evaluating pipeline performance on the Mr. TyDi dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate pipeline performance on the Mr. TyDi dataset. This method always validates passages.
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None, random_state=None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. You must ingest all data to use the context_recall metric. If the ingest size is excessively large, processing takes a long time, so we shuffle the corpus and slice it according to the ingest size for testing. The retrieval ground-truth corpus is converted into passages and ingested so that retrieval can find the ground truth within the database.
:param random_state: A random state used to fix the shuffled corpus that is ingested. Accepts an int, array-like, BitGenerator, np.random.RandomState, or np.random.Generator. Optional.
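Mr. TyDi is multilingual, so this evaluator adds a language argument (default 'english'). The sketch below selects another split; that lowercase names such as 'korean' are accepted is an assumption modeled on the default, and the surrounding objects are placeholders as before.

    from RAGchain.benchmark.dataset.mr_tydi import MrTydiEvaluator

    # Assumed: other Mr. TyDi languages follow the lowercase 'english' default.
    evaluator = MrTydiEvaluator(run_pipeline, evaluate_size=50, language="korean")
    evaluator.ingest(retrievals=[retrieval], db=db, ingest_size=500, random_state=0)
    result = evaluator.evaluate()  # passages are always validated for this dataset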
RAGchain.benchmark.dataset.msmarco module
- class RAGchain.benchmark.dataset.msmarco.MSMARCOEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None, version: str = 'v1.1')
Bases: BaseDatasetEvaluator
MSMARCOEvaluator is a class for evaluating pipeline performance on the MS MARCO dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. You must ingest all data to use the context_recall metric.
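A sketch of picking the MS MARCO release via version; that 'v2.1' is the other accepted tag is an assumption based on the public MS MARCO releases, and run_pipeline, retrieval, and db remain placeholders.

    from RAGchain.benchmark.dataset.msmarco import MSMARCOEvaluator

    # Assumed: 'v2.1' is accepted alongside the documented 'v1.1' default.
    evaluator = MSMARCOEvaluator(run_pipeline, evaluate_size=100, version="v2.1")
    # ingest_size=None ingests everything, which the context_recall metric requires.
    evaluator.ingest(retrievals=[retrieval], db=db)
    result = evaluator.evaluate(validate_passages=True)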
RAGchain.benchmark.dataset.natural_question module
- class RAGchain.benchmark.dataset.natural_question.NaturalQAEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
NaturalQAEvaluator is a class for evaluating pipeline performance on the Natural Questions dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. You must ingest all data to use the context_recall metric.
RAGchain.benchmark.dataset.nfcorpus module
- class RAGchain.benchmark.dataset.nfcorpus.NFCorpusEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
NFCorpusEvaluator is a class for evaluating pipeline performance on the NFCorpus dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None, random_state=None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data.
:param random_state: A random state used to fix the shuffled corpus that is ingested. Accepts an int, array-like, BitGenerator, np.random.RandomState, or np.random.Generator. Optional.
Notice: If the ingest size is excessively large, processing takes a long time, so we shuffle the corpus and slice it according to the ingest size for testing. The retrieval ground-truth corpus is converted into passages and ingested so that retrieval can find the ground truth within the database.
RAGchain.benchmark.dataset.qasper module
- class RAGchain.benchmark.dataset.qasper.QasperEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int, metrics: List[str] | None = None, random_state: int = 42)
Bases: BaseDatasetEvaluator
QasperEvaluator is a class for evaluating pipeline performance on the Qasper dataset.
- dataset_name = 'NomaDamas/qasper'
- evaluate(**kwargs) → EvaluateResult
Evaluate pipeline performance on the Qasper dataset.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Set ingest parameters for evaluating the pipeline. This method does not ingest passages, because the Qasper dataset is not designed for ingesting all paragraphs and retrieving them; it only has questions that relate to certain papers. So we ingest each paper's paragraphs when we evaluate it.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: Default is None. You don't need to set this parameter; if you do set it, it is ignored.
- preprocess(data)
Preprocess the Qasper dataset to make it suitable for evaluating the pipeline.
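Since Qasper ingests each paper's paragraphs at evaluation time, ingest() here only records which retrievals and db to use. A sketch of that flow, with placeholders as before; note that evaluate_size has no default for this evaluator.

    from RAGchain.benchmark.dataset.qasper import QasperEvaluator

    # evaluate_size is required here (the signature has no default for it).
    evaluator = QasperEvaluator(run_pipeline, evaluate_size=10, random_state=42)
    evaluator.ingest(retrievals=[retrieval], db=db)  # no passages ingested yet
    result = evaluator.evaluate()  # each paper's paragraphs are ingested during evaluation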
RAGchain.benchmark.dataset.search_qa module
- class RAGchain.benchmark.dataset.search_qa.SearchQAEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
SearchQAEvaluator is a class for evaluating pipeline performance on the SearchQA dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None, random_state=None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. If the ingest size is excessively large, processing takes a long time, so we shuffle the corpus and slice it according to the ingest size for testing. The retrieval ground-truth corpus is converted into passages and ingested so that retrieval can find the ground truth within the database. This dataset has many retrieval ground truths per query, so it is recommended to set the ingest size to a small value.
:param random_state: A random state used to fix the shuffled corpus that is ingested. Accepts an int, array-like, BitGenerator, np.random.RandomState, or np.random.Generator. Optional.
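The sketch below follows the docstring's recommendation to keep the ingest size small for SearchQA, which has many retrieval ground truths per query; run_pipeline, retrieval, and db are placeholders as before.

    from RAGchain.benchmark.dataset.search_qa import SearchQAEvaluator

    evaluator = SearchQAEvaluator(run_pipeline, evaluate_size=30)
    # A small ingest_size keeps ingestion fast; ground truths are still included.
    evaluator.ingest(retrievals=[retrieval], db=db, ingest_size=200, random_state=7)
    result = evaluator.evaluate(validate_passages=True)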
RAGchain.benchmark.dataset.strategy_qa module
- class RAGchain.benchmark.dataset.strategy_qa.StrategyQAEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator, BaseStrategyQA
StrategyQAEvaluator is a class for evaluating pipeline performance on the StrategyQA dataset.
- dataset_name = 'voidful/StrategyQA'
- evaluate(validate_passages: bool = True, **kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data. If you want to use the context_recall and context_precision metrics, you should ingest all data.
RAGchain.benchmark.dataset.triviaqa module
- class RAGchain.benchmark.dataset.triviaqa.TriviaQAEvaluator(run_pipeline: BaseRunPipeline, evaluate_size: int | None = None, metrics: List[str] | None = None)
Bases: BaseDatasetEvaluator
TriviaQAEvaluator is a class for evaluating pipeline performance on the TriviaQA dataset.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, you can't use the context_recall and KF1 metrics. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- ingest(retrievals: List[BaseRetrieval], db: BaseDB, ingest_size: int | None = None)
Ingest the dataset into the retrievals and db.
:param retrievals: The retrievals to ingest the dataset into.
:param db: The db to ingest the dataset into.
:param ingest_size: The number of data points to ingest. If None, ingest all data.