RAGchain.benchmark package
Subpackages
- RAGchain.benchmark.answer package
- RAGchain.benchmark.dataset package
- Submodules
- RAGchain.benchmark.dataset.antique module
- RAGchain.benchmark.dataset.asqa module
- RAGchain.benchmark.dataset.base module
- RAGchain.benchmark.dataset.dstc11_track5 module
- RAGchain.benchmark.dataset.eli5 module
- RAGchain.benchmark.dataset.ko_strategy_qa module
- RAGchain.benchmark.dataset.mr_tydi module
- RAGchain.benchmark.dataset.msmarco module
- RAGchain.benchmark.dataset.natural_question module
- RAGchain.benchmark.dataset.nfcorpus module
- RAGchain.benchmark.dataset.qasper module
- RAGchain.benchmark.dataset.search_qa module
- RAGchain.benchmark.dataset.strategy_qa module
- RAGchain.benchmark.dataset.triviaqa module
- Module contents
- RAGchain.benchmark.extra package
- RAGchain.benchmark.retrieval package
Submodules
RAGchain.benchmark.auto module
- class RAGchain.benchmark.auto.AutoEvaluator(pipeline: BaseRunPipeline, questions: List[str], metrics=None)
Bases: BaseEvaluator
Evaluate metrics without ground truths. You only need to pass your questions and your pipeline. The passages must already be properly ingested into your retrievals and DBs; we recommend using IngestPipeline for ingestion.
- evaluate(**kwargs) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, the context_recall and KF1 metrics cannot be used. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
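Below is a minimal usage sketch. The pipeline construction and ingestion steps are assumptions for illustration: any concrete BaseRunPipeline subclass will do, as long as its retrievals and DBs are ingested before evaluation.

```python
from RAGchain.benchmark.auto import AutoEvaluator

# Assumption: `pipeline` is a BaseRunPipeline you have already built, whose
# retrievals and DBs were ingested beforehand (e.g. via IngestPipeline).
pipeline = ...  # your BaseRunPipeline instance

questions = [
    "What is RAGchain?",
    "How do I ingest passages before evaluation?",
]

evaluator = AutoEvaluator(pipeline, questions)  # metrics=None runs the default no-ground-truth metrics
result = evaluator.evaluate()                   # returns an EvaluateResult

# EvaluateResult holds the computed scores; inspect it to read the results.
print(result)
```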
RAGchain.benchmark.base module
- class RAGchain.benchmark.base.BaseEvaluator(run_all: bool = True, metrics: List[str] | None = None)
Bases: ABC
- answer_gt_metrics = ['BLEU', 'METEOR', 'ROUGE', 'EM_answer']
- answer_no_gt_ragas_metrics = ['answer_relevancy', 'faithfulness']
- answer_passage_metrics = ['KF1']
- abstract evaluate(validate_passages: bool = True) → EvaluateResult
Evaluate metrics and return the results.
:param validate_passages: If True, validate that the passages in retrieval_gt are already ingested. If False, the context_recall and KF1 metrics cannot be used. We recommend setting this to True for robust evaluation.
:return: EvaluateResult
- retrieval_gt_metrics = ['Hole', 'TopK_Accuracy', 'EM_retrieval', 'F1_score', 'Recall', 'Precision']
- retrieval_gt_metrics_rank_aware = ['AP', 'NDCG', 'CG', 'Ind_DCG', 'DCG', 'Ind_IDCG', 'IDCG', 'RR']
- retrieval_gt_ragas_metrics = ['context_recall']
- retrieval_no_gt_ragas_metrics = ['context_precision']
- static uuid_to_str(id_list: List[str | UUID]) → List[str]
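The class attributes above group metric names by the kind of ground truth they require, and uuid_to_str normalizes mixed id lists to plain strings. A small sketch, assuming only the signatures shown above (the comment about run_all=False is an assumption inferred from the constructor signature):

```python
from uuid import uuid4

from RAGchain.benchmark.base import BaseEvaluator

# uuid_to_str accepts a mixed list of str and UUID ids and returns strings.
mixed_ids = [uuid4(), "already-a-string-id", uuid4()]
str_ids = BaseEvaluator.uuid_to_str(mixed_ids)
assert all(isinstance(i, str) for i in str_ids)

# The metric-name groups are plain class attributes, so they can be used to
# build the `metrics` list passed to a concrete evaluator
# (presumably with run_all=False to restrict evaluation to that list).
selected = BaseEvaluator.retrieval_gt_metrics + BaseEvaluator.answer_gt_metrics
print(selected)
```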
- class RAGchain.benchmark.base.DummyRetrieval
Bases: BaseRetrieval
- retrieve(query: str, top_k: int = 5, *args, **kwargs) → List[Passage]
Retrieve passages from the ingested vector representations of passages.
- retrieve_id(query: str, top_k: int = 5, *args, **kwargs) → List[str | UUID]
Retrieve passage ids from the ingested vector representations of passages.
- retrieve_id_with_scores(query: str, top_k: int = 5, *args, **kwargs) → tuple[List[str | UUID], List[float]]
Retrieve passage ids and similarity scores from the ingested vector representations of passages.
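DummyRetrieval follows the standard BaseRetrieval interface. The sketch below only illustrates how the three retrieve variants are called; that DummyRetrieval itself returns meaningful results (it appears to be a placeholder used internally by the evaluators) is an assumption you should verify against your setup.

```python
from RAGchain.benchmark.base import DummyRetrieval

retrieval = DummyRetrieval()

# All three variants share the same (query, top_k) calling convention.
passages = retrieval.retrieve("what is RAG?", top_k=5)        # List[Passage]
ids = retrieval.retrieve_id("what is RAG?", top_k=5)          # List[str | UUID]

# retrieve_id_with_scores returns an (ids, scores) tuple per the signature above.
ids_with_scores, scores = retrieval.retrieve_id_with_scores("what is RAG?", top_k=5)
```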