RAGchain.utils package

Subpackages

Submodules

RAGchain.utils.evidence_extractor module

class RAGchain.utils.evidence_extractor.EvidenceExtractor(llm: BaseLanguageModel, system_prompt: str | None = None)

Bases: Runnable[RetrievalResult, str]

EvidenceExtractor extracts relevant evidence from a list of passages based on a given question.

Example:

>>> from RAGchain.utils.evidence_extractor import EvidenceExtractor
>>> from RAGchain.schema import Passage
>>> from langchain.llms.openai import OpenAI
>>>
>>> passages = [
...     Passage(content="Lorem ipsum dolor sit amet"),
...     Passage(content="Consectetur adipiscing elit"),
...     Passage(content="Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua")
... ]
>>>
>>> question = "What is Lorem ipsum?"
>>> extractor = EvidenceExtractor(OpenAI())
>>> result = extractor.extract(question, passages)
>>> print(result)

property InputType: Type[Input]: The type of input this runnable accepts specified as a type annotation.

property OutputType: Type[str]: The type of output this runnable produces specified as a type annotation.

batch(inputs: List[Input], config: RunnableConfig | List[RunnableConfig] | None = None, *, return_exceptions: bool = False, **kwargs: Any | None) → List[Output]

The default implementation runs invoke in parallel using a thread pool executor.

The default implementation of batch works well for I/O-bound runnables.

Subclasses should override this method if they can batch more efficiently, e.g., if the underlying runnable uses an API that supports a batch mode.
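The thread-pool behaviour described above can be sketched without LangChain. This is a minimal, hypothetical stand-in (`ThreadPoolBatchRunnable` and `Upper` are illustrative names, not RAGchain or LangChain classes). For simplicity it accepts only a single config dict, reads the `max_concurrency` key to size the pool, and honours `return_exceptions` by placing the exception in the output list instead of raising.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional


class ThreadPoolBatchRunnable:
    """Hypothetical stand-in for a Runnable; not the LangChain class itself."""

    def invoke(self, input: Any, config: Optional[Dict[str, Any]] = None) -> Any:
        # Subclasses override this with the real per-input work.
        raise NotImplementedError

    def batch(
        self,
        inputs: List[Any],
        config: Optional[Dict[str, Any]] = None,
        *,
        return_exceptions: bool = False,
    ) -> List[Any]:
        config = config or {}
        # 'max_concurrency' caps the worker threads; None lets the executor choose.
        max_workers = config.get("max_concurrency")

        def run_one(item: Any) -> Any:
            try:
                return self.invoke(item, config)
            except Exception as exc:
                if return_exceptions:
                    return exc  # keep the error in the results instead of raising
                raise

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # executor.map yields results in input order.
            return list(executor.map(run_one, inputs))


class Upper(ThreadPoolBatchRunnable):
    def invoke(self, input: str, config: Optional[Dict[str, Any]] = None) -> str:
        if not input:
            raise ValueError("empty input")
        return input.upper()


print(Upper().batch(["a", "b"], {"max_concurrency": 2}))  # ['A', 'B']
```

Because each call to invoke waits mostly on I/O (such as an LLM API request), threads can overlap those waits, which is why this default suits I/O-bound runnables.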

extract(question: str, passages: List[Passage]) → str

Extracts relevant evidence from the given passages based on a question.

Parameters:

question – The question for which relevant document fragments need to be extracted.
passages – A list of Passage objects that contain the content of the documents.

Returns:

The extracted relevant document fragments.
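The prompt that EvidenceExtractor actually sends to the LLM is not documented on this page. As a hypothetical sketch of the general shape of such a call, the helper below numbers the passages, asks for the sentences relevant to the question, and returns the model's text. `Passage` here is a local stand-in with only a `content` field, and `llm` is any callable from prompt string to completion string; both are assumptions, not the RAGchain API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    # Hypothetical stand-in for RAGchain.schema.Passage; only `content` is used.
    content: str


def build_evidence_prompt(question: str, passages: List[Passage]) -> str:
    # Hypothetical prompt layout; the real EvidenceExtractor prompt may differ.
    numbered = "\n".join(f"[{i}] {p.content}" for i, p in enumerate(passages, start=1))
    return (
        "Extract the sentences from the passages below that help answer the question.\n"
        f"Passages:\n{numbered}\n"
        f"Question: {question}\n"
        "Evidence:"
    )


def extract(question: str, passages: List[Passage], llm: Callable[[str], str]) -> str:
    # `llm` maps a prompt to a completion; surrounding whitespace is trimmed.
    return llm(build_evidence_prompt(question, passages)).strip()


def fake_llm(prompt: str) -> str:
    # Placeholder model so the sketch runs offline.
    return " Lorem ipsum dolor sit amet "


passages = [Passage("Lorem ipsum dolor sit amet"), Passage("Consectetur adipiscing elit")]
print(extract("What is Lorem ipsum?", passages, fake_llm))
```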

invoke(input: Input, config: RunnableConfig | None = None) → Output

Transform a single input into an output. Override to implement.

Parameters:

input – The input to the runnable.
config – A config to use when invoking the runnable. The config supports standard keys like ‘tags’ and ‘metadata’ for tracing purposes, ‘max_concurrency’ for controlling how much work to do in parallel, and other keys. Please refer to the RunnableConfig for more details.

Returns:

The output of the runnable.

RAGchain.utils.file_cache module

class RAGchain.utils.file_cache.FileCache(db: BaseDB)

Bases: Runnable[List[Document], List[Document]]

This class removes documents that are duplicates of documents already in the given DB. Use it after loading your files into Documents with a file loader: it checks for duplicates using the source metadata and returns only the non-duplicate documents.
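The source-metadata check can be sketched as a simple filter. This is a minimal illustration under stated assumptions: `Document` is a local stand-in for a LangChain Document, and the DB lookup is replaced by a hypothetical set of source paths already stored, since the real FileCache queries a BaseDB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Document:
    # Hypothetical stand-in for a LangChain Document.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def delete_duplicate(documents: List[Document], stored_sources: Set[str]) -> List[Document]:
    """Keep only documents whose 'source' metadata is not already stored.

    `stored_sources` stands in for the sources found in the DB (an assumption).
    Documents without a 'source' key are kept.
    """
    return [doc for doc in documents if doc.metadata.get("source") not in stored_sources]


docs = [
    Document("old text", {"source": "./data.txt"}),
    Document("new text", {"source": "./new.txt"}),
]
print([d.page_content for d in delete_duplicate(docs, {"./data.txt"})])  # ['new text']
```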

Example:

>>> from RAGchain.utils.file_cache import FileCache
>>> from RAGchain.DB import PickleDB
>>> from langchain.document_loaders import TextLoader
>>>
>>> db = PickleDB(save_path='./pickle_db.pkl')
>>> file_cache = FileCache(db)
>>> documents = TextLoader('./data.txt').load()
>>> documents = file_cache.delete_duplicate(documents)

property InputType: type: The type of input this runnable accepts specified as a type annotation.

property OutputType: type: The type of output this runnable produces specified as a type annotation.

delete_duplicate(documents: List[Document]) → List[Document]

invoke(input: Input, config: RunnableConfig | None = None) → Output

Transform a single input into an output. Override to implement.

Parameters:

input – The input to the runnable.
config – A config to use when invoking the runnable. The config supports standard keys like ‘tags’ and ‘metadata’ for tracing purposes, ‘max_concurrency’ for controlling how much work to do in parallel, and other keys. Please refer to the RunnableConfig for more details.

Returns:

The output of the runnable.

RAGchain.utils.query_decompose module

This code is inspired by the Visconde paper and its GitHub repository:

@inproceedings{10.1007/978-3-031-28238-6_44,
  author = {Pereira, Jayr and Fidalgo, Robson and Lotufo, Roberto and Nogueira, Rodrigo},
  title = {Visconde: Multi-Document QA With GPT-3 And Neural Reranking},
  year = {2023},
  isbn = {978-3-031-28237-9},
  publisher = {Springer-Verlag},
  address = {Berlin, Heidelberg},
  url = {https://doi.org/10.1007/978-3-031-28238-6_44},
  doi = {10.1007/978-3-031-28238-6_44},
  booktitle = {Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II},
  pages = {534–543},
  numpages = {10},
  location = {Dublin, Ireland}
}

class RAGchain.utils.query_decompose.QueryDecomposition(llm: BaseLLM)

Bases: Runnable[str, List[str]]

Query Decomposition class. It uses an LLM to decompose a multi-hop question into multiple single-hop questions. The default decomposition prompt is taken from the Visconde paper and consists of few-shot examples from the StrategyQA dataset.

decompose(query: str) → List[str]: decompose query to little piece of questions. :param query: str, query to decompose. :return: List[str], list of decomposed query. Return input query if query is not decomposable.

decompose_prompt = PromptTemplate(input_variables=['question'], template='Decompose a question in self-contained sub-questions. Use "The question needs no decomposition" when no decomposition is needed.\n \n Example 1:\n \n Question: Is Hamlet more common on IMDB than Comedy of Errors?\n Decompositions: \n 1: How many listings of Hamlet are there on IMDB?\n 2: How many listing of Comedy of Errors is there on IMDB?\n \n Example 2:\n \n Question: Are birds important to badminton?\n \n Decompositions:\n The question needs no decomposition\n \n Example 3:\n \n Question: Is it legal for a licensed child driving Mercedes-Benz to be employed in US?\n \n Decompositions:\n 1: What is the minimum driving age in the US?\n 2: What is the minimum age for someone to be employed in the US?\n \n Example 4:\n \n Question: Are all cucumbers the same texture?\n \n Decompositions:\n The question needs no decomposition\n \n Example 5:\n \n Question: Hydrogen\'s atomic number squared exceeds number of Spice Girls?\n \n Decompositions:\n 1: What is the atomic number of hydrogen?\n 2: How many Spice Girls are there?\n \n Example 6:\n \n Question: {question}\n \n Decompositions:"\n ')
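The prompt above asks the LLM to answer either with numbered lines such as `1: What is the atomic number of hydrogen?` or with the sentence "The question needs no decomposition". A hypothetical parser for that output format might look like the following; `parse_decompositions` is an illustrative name, not part of RAGchain, and wrapping the original query in a one-element list is an assumption chosen to match the `List[str]` return type.

```python
import re
from typing import List

NO_DECOMPOSITION = "The question needs no decomposition"


def parse_decompositions(query: str, llm_output: str) -> List[str]:
    """Turn a completion of the decomposition prompt into sub-questions."""
    if NO_DECOMPOSITION in llm_output:
        # Assumption: return the original query as a one-element list.
        return [query]
    sub_questions = []
    for line in llm_output.splitlines():
        # Numbered lines look like "1: How many Spice Girls are there?"
        match = re.match(r"\s*\d+\s*:\s*(.+)", line)
        if match:
            sub_questions.append(match.group(1).strip())
    # Fall back to the original query if nothing parseable came back.
    return sub_questions or [query]


completion = "\n 1: What is the atomic number of hydrogen?\n 2: How many Spice Girls are there?"
print(parse_decompositions("Hydrogen's atomic number squared exceeds number of Spice Girls?", completion))
```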

invoke(input: Input, config: RunnableConfig | None = None) → Output

Transform a single input into an output. Override to implement.

Parameters:

input – The input to the runnable.
config – A config to use when invoking the runnable. The config supports standard keys like ‘tags’ and ‘metadata’ for tracing purposes, ‘max_concurrency’ for controlling how much work to do in parallel, and other keys. Please refer to the RunnableConfig for more details.

Returns:

The output of the runnable.

RAGchain.utils.rede_search_detector module

class RAGchain.utils.rede_search_detector.RedeSearchDetector(threshold: float | None = None, embedding: Embeddings | None = None)

Bases: object

This class implements REDE, a method for detecting knowledge-seeking turns in a few-shot setting. It provides training functions for fitting your own model and an inference function for detecting knowledge-seeking turns. You will need non-knowledge-seeking dialogue turns; having a few knowledge-seeking dialogue turns as well is recommended.

The method implements the following paper:

@article{jin2021towards,
  title = {Towards zero and few-shot knowledge-seeking turn detection in task-orientated dialogue systems},
  author = {Jin, Di and Gao, Shuyang and Kim, Seokhwan and Liu, Yang and Hakkani-Tur, Dilek},
  journal = {arXiv preprint arXiv:2109.08820},
  year = {2021}
}

detect(sentences: List[str]) → List[bool]

Parameters:

sentences – Sentences to classify. List[str].

Returns:

For each sentence, True if it is a knowledge-seeking turn, else False. List[bool].

evaluate(test_knowledge_seeking_sentences: List[str], test_non_knowledge_seeking_sentences: List[str])

Evaluate the REDE search detector on a test dataset.

Parameters:

test_knowledge_seeking_sentences – Knowledge-seeking turn sentences for testing. List[str].
test_non_knowledge_seeking_sentences – Non-knowledge-seeking turn sentences for testing. List[str].

find_representation_transform(knowledge_seeking_sentences: List[str], L: int | None = None)

Parameters:

knowledge_seeking_sentences – Knowledge-seeking turn sentences. List[str].
L – Number of dimensions of the transformed representation. If None, all dimensions are used. Default is None.

find_threshold(valid_knowledge_seeking_sentences: List[str], valid_non_knowledge_seeking_sentences: List[str])

Find the detection threshold by maximizing Youden’s index over predictions on validation data.

Parameters:

valid_knowledge_seeking_sentences – Knowledge-seeking turn sentences for validation. List[str]. You can reuse the sentences you passed to find_representation_transform.
valid_non_knowledge_seeking_sentences – Non-knowledge-seeking turn sentences for validation. List[str].
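Youden’s index is J = TPR − FPR, and the threshold chosen is the one that maximizes J over the validation scores. The sketch below illustrates that selection rule only; `youden_threshold` is a hypothetical helper, and it assumes the convention that a higher score means "more likely knowledge-seeking" with a sentence flagged when its score is at or above the threshold. How RAGchain computes the underlying scores is not shown here.

```python
from typing import List, Tuple


def youden_threshold(pos_scores: List[float], neg_scores: List[float]) -> Tuple[float, float]:
    """Return (threshold, J) that maximizes Youden's J = TPR - FPR.

    Convention for this sketch: predict "knowledge-seeking" when score >= threshold,
    so positive (knowledge-seeking) examples should tend to score higher.
    """
    best_t, best_j = float("inf"), -1.0
    # Every observed score is a candidate threshold.
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        j = tpr - fpr
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


print(youden_threshold([3.0, 4.0], [1.0, 2.0]))  # (3.0, 1.0)
```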

representation_formation(vectors: ndarray) → ndarray

Parameters:

vectors – Vectors after encoding. np.ndarray.

Returns:

Transformed vectors. np.ndarray.

train_density_estimation(gmm: GaussianMixture, non_knowledge_seeking_sentences: List[str])

Parameters:

gmm – Gaussian mixture model used to classify knowledge-seeking turns. GaussianMixture. n_components must be 1.
non_knowledge_seeking_sentences – Non-knowledge-seeking turn sentences. List[str].
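With n_components fixed to 1, the density model is a single Gaussian fitted on non-knowledge-seeking turns, and a turn that is unlikely under that density can be treated as knowledge-seeking. The stdlib-only sketch below illustrates this idea on already-transformed vectors. It is a simplification, not RAGchain's code: scikit-learn’s GaussianMixture uses a full covariance matrix by default, whereas this sketch assumes a diagonal covariance, and `fit_diagonal_gaussian`, `neg_log_likelihood` and `detect_outliers` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def fit_diagonal_gaussian(vectors: Sequence[Sequence[float]], eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Fit per-dimension means and variances (one component, diagonal covariance)."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # eps keeps every variance positive so the log-density stays finite.
    variances = [sum((v[d] - means[d]) ** 2 for v in vectors) / n + eps for d in range(dim)]
    return means, variances


def neg_log_likelihood(x: Sequence[float], means: List[float], variances: List[float]) -> float:
    """Higher value means less typical of the non-knowledge-seeking training turns."""
    return sum(
        0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def detect_outliers(vectors, means, variances, threshold: float) -> List[bool]:
    # Flag a turn as knowledge-seeking when its score reaches the threshold.
    return [neg_log_likelihood(v, means, variances) >= threshold for v in vectors]


non_knowledge_seeking = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.0]]
means, variances = fit_diagonal_gaussian(non_knowledge_seeking)
near = neg_log_likelihood([0.0, 0.0], means, variances)
far = neg_log_likelihood([3.0, 3.0], means, variances)
print(detect_outliers([[0.0, 0.0], [3.0, 3.0]], means, variances, threshold=(near + far) / 2))
```

In practice the threshold would come from a validation procedure such as the Youden’s-index search described for find_threshold, rather than the midpoint used in this toy example.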

RAGchain.utils.semantic_clustering module

class RAGchain.utils.semantic_clustering.SemanticClustering(embedding_function: Embeddings, clustering_algorithm: str = 'kmeans')

Bases: object

This class clusters passages based on their semantic content. First, each passage is converted into an embedding vector that represents its semantic information. Second, the embedding vectors are clustered with the selected clustering algorithm.

No single clustering algorithm is optimal in every case, so you may want to try several.

cluster(passages: List[Passage], **kwargs) → List[List[Passage]]

Cluster the given passages.

Parameters:

passages – List of passages to be clustered.
kwargs – Keyword arguments for the clustering algorithm.

Returns:

2-D list of clustered passages. Each cluster is a list of passages.
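The embed-then-cluster pipeline, ending in a 2-D list of passages, can be sketched with a plain k-means in the standard library. This is an illustration under assumptions, not RAGchain's implementation: passages are plain strings instead of Passage objects, `embed` is a hypothetical callable standing in for an Embeddings model, and centroids are initialized from the first k vectors for determinism rather than randomly.

```python
import math
from typing import Callable, List, Sequence


def kmeans_labels(vectors: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's k-means; centroids start at the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_passages(passages: List[str], embed: Callable[[str], List[float]], k: int) -> List[List[str]]:
    vectors = [embed(p) for p in passages]
    labels = kmeans_labels(vectors, k)
    clusters: List[List[str]] = [[] for _ in range(k)]
    for passage, label in zip(passages, labels):
        clusters[label].append(passage)
    # Drop empty clusters so every returned group holds at least one passage.
    return [group for group in clusters if group]


toy_embeddings = {"cat": [0.0, 0.1], "kitten": [0.1, 0.0], "car": [5.0, 5.1], "truck": [5.1, 5.0]}
print(cluster_passages(["cat", "car", "kitten", "truck"], toy_embeddings.__getitem__, k=2))
```

k-means needs the number of clusters up front; that kind of algorithm-specific setting is what the kwargs of cluster are for.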

RAGchain.utils.util module

class RAGchain.utils.util.FileChecker(file_path: str)

Bases: object

FileChecker is a class to check file type and existence.

check_type(file_type: str | None = None, file_types: List[str] | None = None)

Parameters:

file_type – str, a single file type to check. Default is None. Use this when you want to check only one file type; you then don’t need file_types.
file_types – List[str], file types to check. Default is None. Use this when you want to check multiple file types; you then don’t need file_type.

is_exist(): check file existence. :return: bool, True if file exists, else False.

RAGchain.utils.util.set_api_base(api_base: str)

RAGchain.utils.util.slice_stop_words(input_str: str, stop_words: List[str])

RAGchain.utils.util.text_modifier(text: str, modify_words: List[str] | None = None) → List[str]

You have to separate each word with an underscore (‘_’).

Module contents