RAGchain.pipeline package
Submodules
RAGchain.pipeline.base module
- class RAGchain.pipeline.base.BaseIngestPipeline
Bases:
ABC
Base class for all ingest pipelines.
- class RAGchain.pipeline.base.BaseRunPipeline
Bases:
ABC
- default_chat_prompt = RAGchainChatPromptTemplate(input_variables=['passages', 'question'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['passages'], template="Given the information, answer the question. If you don't know the answer, don't make up the answer, just say you don't know.Information : \n{passages}")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='Question: {question}')), AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Answer: '))])
- default_prompt = RAGchainPromptTemplate(input_variables=['passages', 'question'], template="\n Given the information, answer the question. If you don't know the answer, don't make up \n the answer, just say you don't know.\n\n Information :\n {passages}\n\n Question: {question}\n\n Answer:\n ")
- abstract get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
RAGchain.pipeline.basic module
- class RAGchain.pipeline.basic.BasicIngestPipeline(file_loader: ~langchain_community.document_loaders.base.BaseLoader, db: ~RAGchain.DB.base.BaseDB, retrieval: ~RAGchain.retrieval.base.BaseRetrieval, text_splitter: ~RAGchain.preprocess.text_splitter.base.BaseTextSplitter = <RAGchain.preprocess.text_splitter.text_splitter.RecursiveTextSplitter object>, ignore_existed_file: bool = True)
Bases:
BaseIngestPipeline
Basic ingest pipeline class. This class handles the ingestion of documents into the database and the retrieval module. First, it loads files using the file loader. Second, it splits each document into passages using the text splitter. Third, it saves the passages to the database. Fourth, it ingests the passages into the retrieval module.
- Example:
>>> from RAGchain.pipeline.basic import BasicIngestPipeline
>>> from RAGchain.DB import PickleDB
>>> from RAGchain.retrieval import BM25Retrieval
>>> from RAGchain.preprocess.loader import FileLoader
>>> file_loader = FileLoader(target_dir="./data")
>>> db = PickleDB("./db")
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> pipeline = BasicIngestPipeline(file_loader=file_loader, db=db, retrieval=retrieval)
>>> pipeline.run.invoke(None)
- class RAGchain.pipeline.basic.BasicRunPipeline(retrieval: BaseRetrieval, llm: BaseLanguageModel, prompt: RAGchainPromptTemplate | RAGchainChatPromptTemplate | None = None)
Bases:
BaseRunPipeline
Basic run pipeline class. This class handles document question answering. First, it retrieves passages using the retrieval module. Second, it runs the LLM to generate an answer. Finally, it returns the answer together with the retrieved passages.
- Example:
>>> from RAGchain.pipeline.basic import BasicRunPipeline
>>> from RAGchain.retrieval import BM25Retrieval
>>> from langchain.llms.openai import OpenAI
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> pipeline = BasicRunPipeline(retrieval=retrieval, llm=OpenAI())
>>> answer, passages, rel_scores = pipeline.get_passages_and_run(questions=["Where is the capital of Korea?"])
>>> # Run with Langchain LCEL
>>> answer = pipeline.run.invoke("Where is the capital of Korea?")
- get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
RAGchain.pipeline.google_search module
- class RAGchain.pipeline.google_search.GoogleSearchRunPipeline(llm: BaseLLM, prompt: RAGchainPromptTemplate | None = None)
Bases:
BaseRunPipeline
- get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
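- Example (a minimal usage sketch modeled on the other pipelines; assumes an OpenAI LLM and that whatever Google search credentials this pipeline needs are configured in your environment):
>>> from RAGchain.pipeline.google_search import GoogleSearchRunPipeline
>>> from langchain.llms.openai import OpenAI
>>> pipeline = GoogleSearchRunPipeline(llm=OpenAI())
>>> answers, passages, rel_scores = pipeline.get_passages_and_run(["Where is the capital of Korea?"])
>>> # Or run with Langchain LCEL
>>> answer = pipeline.run.invoke("Where is the capital of Korea?")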
RAGchain.pipeline.rerank module
- class RAGchain.pipeline.rerank.RerankRunPipeline(retrieval: BaseRetrieval, reranker: BaseReranker, llm: BaseLanguageModel, prompt: RAGchainPromptTemplate | RAGchainChatPromptTemplate | None = None, use_passage_count: int = 5)
Bases:
BaseRunPipeline
Rerank run pipeline for question answering over retrieved passages using a reranker. First, the retrieval module retrieves top_k passages for reranking. Then, the reranker reorders them and only the top use_passage_count passages are passed to the LLM.
- Example:
>>> from RAGchain.pipeline.rerank import RerankRunPipeline
>>> from RAGchain.retrieval import BM25Retrieval
>>> from RAGchain.reranker import MonoT5Reranker
>>> from langchain.llms.openai import OpenAI
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> reranker = MonoT5Reranker()
>>> llm = OpenAI()
>>> pipeline = RerankRunPipeline(retrieval, reranker, llm)
>>> answer, passages, rel_scores = pipeline.get_passages_and_run(["What is the purpose of this framework based on the document?"])
>>> print(answer[0])
- get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
RAGchain.pipeline.visconde module
- class RAGchain.pipeline.visconde.ViscondeRunPipeline(retrieval: BaseRetrieval, llm: BaseLLM, decompose: QueryDecomposition | None = None, prompt: RAGchainPromptTemplate | None = None, use_passage_count: int = 3)
Bases:
BaseRunPipeline
- get_passages_and_run(questions: List[str], top_k: int = 50) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 50.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
- strategyqa_prompt = RAGchainPromptTemplate(input_variables=['passages', 'question'], template='For each example, use the documents to create an "Answer" and an "Explanation" to the "Question". Just answer yes or no.\n\n Example 1:\n\n [Document 1]: \n Title: San Tropez (song). \n Content: "San Tropez" is the fourth track from the album Meddle by the band Pink Floyd. \n This song was one of several to be considered for the band\'s "best of" album, Echoes: The Best of Pink Floyd.\n\n [Document 2]: \n Title: French Riviera. \n Content: The French Riviera (known in French as the Côte d\'Azur [kot daˈzyʁ]; Occitan: Còsta d\'Azur [\n ˈkɔstɔ daˈzyɾ]; literal translation "Azure Coast") is the Mediterranean coastline of the southeast corner of \n France. There is no official boundary, but it is usually considered to extend from Cassis, Toulon or Saint-Tropez \n on the west to Menton at the France–Italy border in the east, where the Italian Riviera joins. The coast is \n entirely within the Provence-Alpes-Côte d\'Azur (Région Sud) region of France. The Principality of Monaco is a \n semi-enclave within the region, surrounded on three sides by France and fronting the Mediterranean.\n\n [Document 3]: \n Title: Moon Jae-in. \n Content: Moon also promised transparency in his presidency, moving the presidential residence from the palatial and \n isolated Blue House to an existing government complex in downtown Seoul.\n\n [Document 4]: \n Title: Saint-Tropez. \n Content: Saint-Tropez (US: /ˌsæn troʊˈpeɪ/ SAN-troh-PAY, French: [sɛ̃ tʁɔpe]; Occitan: Sant-Tropetz , pronounced [san(t) tʀuˈpes]) is a town on the French Riviera, \n 68 kilometres (42 miles) west of Nice and 100 kilometres (62 miles) east of Marseille in the Var department of \n the Provence-Alpes-Côte d\'Azur region of Occitania, Southern France.\n\n\n Question: Did Pink Floyd have a song about the French Riviera?\n Explanation: According to [Document 1], "San Tropez" is a song by Pink Floyd about \n the French Riviera. This is further supported by [Document 4], which states that Saint-Tropez is a town on the French Riviera. \n Therefore, the answer is yes\n Answer: yes.\n\n Example 2:\n \n {passages}\n \n Question: {question}\n Answer:\n ')
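- Example (a minimal usage sketch modeled on the other pipelines; assumes a BM25 index and an OpenAI LLM, with the default query decomposition and prompt):
>>> from RAGchain.pipeline.visconde import ViscondeRunPipeline
>>> from RAGchain.retrieval import BM25Retrieval
>>> from langchain.llms.openai import OpenAI
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> pipeline = ViscondeRunPipeline(retrieval, llm=OpenAI())
>>> answers, passages, rel_scores = pipeline.get_passages_and_run(["Where is the capital of Korea?"])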