RAGchain.pipeline package
Submodules
RAGchain.pipeline.base module
- class RAGchain.pipeline.base.BaseIngestPipeline
Bases:
ABC
Base class for all ingest pipelines.
- class RAGchain.pipeline.base.BaseRunPipeline
Bases:
ABC
- default_chat_prompt = RAGchainChatPromptTemplate(input_variables=['passages', 'question'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['passages'], template="Given the information, answer the question. If you don't know the answer, don't make up the answer, just say you don't know.Information : \n{passages}")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='Question: {question}')), AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Answer: '))])
- default_prompt = RAGchainPromptTemplate(input_variables=['passages', 'question'], template="\n Given the information, answer the question. If you don't know the answer, don't make up \n the answer, just say you don't know.\n\n Information :\n {passages}\n\n Question: {question}\n\n Answer:\n ")
- abstract get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
RAGchain.pipeline.basic module
- class RAGchain.pipeline.basic.BasicIngestPipeline(file_loader: ~langchain_community.document_loaders.base.BaseLoader, db: ~RAGchain.DB.base.BaseDB, retrieval: ~RAGchain.retrieval.base.BaseRetrieval, text_splitter: ~RAGchain.preprocess.text_splitter.base.BaseTextSplitter = <RAGchain.preprocess.text_splitter.text_splitter.RecursiveTextSplitter object>, ignore_existed_file: bool = True)
Bases:
BaseIngestPipeline
Basic ingest pipeline class. This class handles the ingestion of documents into the database and the retrieval module. First, it loads files using the file loader. Second, it splits each document into passages using the text splitter. Third, it saves the passages to the database. Fourth, it ingests the passages into the retrieval module.
- Example:
>>> from RAGchain.pipeline.basic import BasicIngestPipeline
>>> from RAGchain.DB import PickleDB
>>> from RAGchain.retrieval import BM25Retrieval
>>> from RAGchain.preprocess.loader import FileLoader
>>> file_loader = FileLoader(target_dir="./data")
>>> db = PickleDB("./db")
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> pipeline = BasicIngestPipeline(file_loader=file_loader, db=db, retrieval=retrieval)
>>> pipeline.run.invoke(None)
- class RAGchain.pipeline.basic.BasicRunPipeline(retrieval: BaseRetrieval, llm: BaseLanguageModel, prompt: RAGchainPromptTemplate | RAGchainChatPromptTemplate | None = None)
Bases:
BaseRunPipeline
Basic run pipeline class. This class handles document question answering. First, it retrieves passages using the retrieval module. Second, it runs the LLM to generate an answer. Finally, it returns the answer together with the retrieved passages.
- Example:
>>> from RAGchain.pipeline.basic import BasicRunPipeline
>>> from RAGchain.retrieval import BM25Retrieval
>>> from langchain.llms.openai import OpenAI
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> pipeline = BasicRunPipeline(retrieval=retrieval, llm=OpenAI())
>>> answer, passages, rel_scores = pipeline.get_passages_and_run(questions=["Where is the capital of Korea?"])
>>> # Run with Langchain LCEL
>>> answer = pipeline.run.invoke("Where is the capital of Korea?")
- get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
RAGchain.pipeline.google_search module
- class RAGchain.pipeline.google_search.GoogleSearchRunPipeline(llm: BaseLLM, prompt: RAGchainPromptTemplate | None = None)
Bases:
BaseRunPipeline
- get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
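- Example (a minimal usage sketch modeled on the other pipelines; assumes an OpenAI LLM and that whatever Google search credentials this pipeline needs are configured in your environment):
>>> from RAGchain.pipeline.google_search import GoogleSearchRunPipeline
>>> from langchain.llms.openai import OpenAI
>>> pipeline = GoogleSearchRunPipeline(llm=OpenAI())
>>> answers, passages, rel_scores = pipeline.get_passages_and_run(["Where is the capital of Korea?"])
>>> # Or run with Langchain LCEL
>>> answer = pipeline.run.invoke("Where is the capital of Korea?")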
RAGchain.pipeline.rerank module
- class RAGchain.pipeline.rerank.RerankRunPipeline(retrieval: BaseRetrieval, reranker: BaseReranker, llm: BaseLanguageModel, prompt: RAGchainPromptTemplate | RAGchainChatPromptTemplate | None = None, use_passage_count: int = 5)
Bases:
BaseRunPipeline
Rerank run pipeline for question answering over retrieved passages using a reranker. First, the retrieval module retrieves top_k passages for reranking. Then, the reranker reorders them and only the top use_passage_count passages are passed to the LLM.
- Example:
>>> from RAGchain.pipeline.rerank import RerankRunPipeline
>>> from RAGchain.retrieval import BM25Retrieval
>>> from RAGchain.reranker import MonoT5Reranker
>>> from langchain.llms.openai import OpenAI
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> reranker = MonoT5Reranker()
>>> llm = OpenAI()
>>> pipeline = RerankRunPipeline(retrieval, reranker, llm)
>>> answer, passages, rel_scores = pipeline.get_passages_and_run(["What is the purpose of this framework based on the document?"])
>>> print(answer[0])
- get_passages_and_run(questions: List[str], top_k: int = 5) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 5.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
RAGchain.pipeline.visconde module
- class RAGchain.pipeline.visconde.ViscondeRunPipeline(retrieval: BaseRetrieval, llm: BaseLLM, decompose: QueryDecomposition | None = None, prompt: RAGchainPromptTemplate | None = None, use_passage_count: int = 3)
Bases:
BaseRunPipeline
- get_passages_and_run(questions: List[str], top_k: int = 50) tuple[List[str], List[List[Passage]], List[List[float]]]
Run the pipeline for the evaluator and get the retrieved passages and their relevance scores. Equivalent to pipeline.run.batch, but additionally returns the passages and relevance scores.
- Parameters:
questions – List of questions.
top_k – The number of passages to retrieve; equivalent to the retrieval option top_k. Defaults to 50.
- Returns:
A tuple of a list of answers, a list of retrieved-passage lists, and a list of relevance-score lists.
- run: Runnable | None
- strategyqa_prompt = RAGchainPromptTemplate(input_variables=['passages', 'question'], template='For each example, use the documents to create an "Answer" and an "Explanation" to the "Question". Just answer yes or no.\n\n Example 1:\n\n [Document 1]: \n Title: San Tropez (song). \n Content: "San Tropez" is the fourth track from the album Meddle by the band Pink Floyd. \n This song was one of several to be considered for the band\'s "best of" album, Echoes: The Best of Pink Floyd.\n\n [Document 2]: \n Title: French Riviera. \n Content: The French Riviera (known in French as the Côte d\'Azur [kot daˈzyʁ]; Occitan: Còsta d\'Azur [\n ˈkɔstɔ daˈzyɾ]; literal translation "Azure Coast") is the Mediterranean coastline of the southeast corner of \n France. There is no official boundary, but it is usually considered to extend from Cassis, Toulon or Saint-Tropez \n on the west to Menton at the France–Italy border in the east, where the Italian Riviera joins. The coast is \n entirely within the Provence-Alpes-Côte d\'Azur (Région Sud) region of France. The Principality of Monaco is a \n semi-enclave within the region, surrounded on three sides by France and fronting the Mediterranean.\n\n [Document 3]: \n Title: Moon Jae-in. \n Content: Moon also promised transparency in his presidency, moving the presidential residence from the palatial and \n isolated Blue House to an existing government complex in downtown Seoul.\n\n [Document 4]: \n Title: Saint-Tropez. \n Content: Saint-Tropez (US: /ˌsæn troʊˈpeɪ/ SAN-troh-PAY, French: [sɛ̃ tʁɔpe]; Occitan: Sant-Tropetz , pronounced [san(t) tʀuˈpes]) is a town on the French Riviera, \n 68 kilometres (42 miles) west of Nice and 100 kilometres (62 miles) east of Marseille in the Var department of \n the Provence-Alpes-Côte d\'Azur region of Occitania, Southern France.\n\n\n Question: Did Pink Floyd have a song about the French Riviera?\n Explanation: According to [Document 1], "San Tropez" is a song by Pink Floyd about \n the French Riviera. This is further supported by [Document 4], which states that Saint-Tropez is a town on the French Riviera. \n Therefore, the answer is yes\n Answer: yes.\n\n Example 2:\n \n {passages}\n \n Question: {question}\n Answer:\n ')
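- Example (a minimal usage sketch modeled on the other pipelines; assumes a BM25 index and an OpenAI LLM, with the default query decomposition and prompt):
>>> from RAGchain.pipeline.visconde import ViscondeRunPipeline
>>> from RAGchain.retrieval import BM25Retrieval
>>> from langchain.llms.openai import OpenAI
>>> retrieval = BM25Retrieval(save_path="./bm25.pkl")
>>> pipeline = ViscondeRunPipeline(retrieval, llm=OpenAI())
>>> answers, passages, rel_scores = pipeline.get_passages_and_run(["Where is the capital of Korea?"])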