# Couchbase

Couchbase is a highly acclaimed distributed NoSQL cloud database known for its exceptional flexibility, performance, scalability, and cost-effectiveness, making it ideal for cloud, mobile, AI, and edge computing applications.

Vector Search is a part of the [Full Text Search Service](https://docs.couchbase.com/server/current/learn/services-and-indexes/services/search-service.html) (Search Service) in Couchbase.

You can apply these with both [Couchbase Capella](https://www.couchbase.com/products/capella/) and a self-managed Couchbase Server.

## Configuration

This guide will fit Couchbase Capella UI. If you are using a self-managed Couchbase Server, you can see [here](https://docs.couchbase.com/server/current/getting-started/do-a-quick-install.html).

To use the Couchbase vector database, you need to configure it in your YAML configuration file.

First, you need to set the Couchbase cluster connection information.

### Edit Cluster Access
Set Access `username`, `password` and `connection_string` for the Couchbase cluster.

And set bucket, scope and access level(Read/Write) for the Couchbase cluster.

### Allowed IP Addresses

You need to allow the IP address of the VectorDB server in the Couchbase cluster.

### Cluster, Bucket, Scope, Collection

`Cluster`, `Bucket` must be prepared in advance.

`Scope` and `Collection` should be prepared in advance, otherwise they will be created automatically.

### Create Index for Query

![couchbase_search_index.png](../../_static/integration/couchbase_search_index.png)

This should correspond to the `dimension` of the embeddings generated by the specified embedding model.

### Example YAML file

```yaml
- name: openai_couchbase
  db_type: couchbase
  embedding_model: openai_embed_3_large
  bucket_name: autorag   # replace your bucket name
  scope_name: autorag   # replace your scope name
  collection_name: autorag   # replace your collection name
  index_name: autorag_search   # replace your index name
  connection_string: ${COUCHBASE_CONNECTION_STRING}
  username: ${COUCHBASE_USERNAME"}
  password: ${COUCHBASE_PASSWORD"}
```

Here is a simple example of a YAML configuration file that uses the Couchbase vector database and the OpenAI:

```yaml
vectordb:
  - name: openai_couchbase
    db_type: couchbase
    embedding_model: openai_embed_3_large
    bucket_name: autorag   # replace your bucket name
    scope_name: autorag   # replace your scope name
    collection_name: autorag   # replace your collection name
    index_name: autorag_search   # replace your index name
    connection_string: ${COUCHBASE_CONNECTION_STRING}
    username: ${COUCHBASE_USERNAME"}
    password: ${COUCHBASE_PASSWORD"}
node_lines:
- node_line_name: retrieve_node_line  # Arbitrary node line name
  nodes:
    - node_type: retrieval
      strategy:
        metrics: [retrieval_f1, retrieval_recall, retrieval_precision]
      top_k: 3
      modules:
        - module_type: vectordb
          vectordb: openai_couchbase
- node_line_name: post_retrieve_node_line  # Arbitrary node line name
  nodes:
    - node_type: prompt_maker
      strategy:
        metrics: [bleu, meteor, rouge]
      modules:
        - module_type: fstring
          prompt: "Read the passages and answer the given question. \n Question: {query} \n Passage: {retrieved_contents} \n Answer : "
    - node_type: generator
      strategy:
        metrics: [bleu, rouge]
      modules:
        - module_type: llama_index_llm
          llm: openai
          model: [ gpt-4o-mini ]
```

### Parameters

1. `embedding_model: str`
   - Purpose: Specifies the name or identifier of the embedding model to be used.
   - Example: "openai_embed_3_large"
   - Note: This should correspond to a valid embedding model that your system can use to generate vector embeddings. For more information see [custom your embedding model](https://docs.auto-rag.com/local_model.html#configure-the-embedding-model) documentation.

2. `embedding_batch: int = 100`
   - Purpose: Determines the number of embeddings to process in a single batch.
   - Default: 100
   - Note: Adjust this based on your system's memory and processing capabilities. Larger batches may be faster but require more memory.

3. `bucket_name: str`
   - Purpose: Specifies the name of the bucket where the vectors will be stored.
   - Example: "my_bucket"
   - Note: Bucket must be prepared in advance.

4. `scope_name: str`
   - Purpose: Specifies the name of the scope where the vectors will be stored.
   - Example: "my_scope"
   - Note: If the scope doesn't exist, it will be created. If it exists, it will be loaded.

5. `collection_name: str`
    - Purpose: Specifies the name of the collection where the vectors will be stored.
    - Example: "my_collection"
    - Note: If the collection doesn't exist, it will be created. If it exists, it will be loaded.

6. `index_name: str`
    - Purpose: Specifies the name of the Couchbase index to be used for querying.
    - Example: "my_vector_index"
    - Note: Index must be prepared in advance.

7. `connection_string: str`
    - Purpose: Specifies the connection string for the Couchbase cluster.
    - Note: This should be the connection string for your Couchbase cluster.

8. `username: str`
    - Purpose: Specifies the username for authentication with the Couchbase cluster.
    - Note: This should be the username for your Couchbase cluster.

9. `password: str`
    - Purpose: Specifies the password for authentication with the Couchbase cluster.
    - Note: This should be the password for your Couchbase cluster.

10. `ingest_batch: int = 100`
    - Purpose: Determines the number of vectors to ingest in a single batch.
    - Default: 100
    - Note: Adjust this based on your system's memory and processing capabilities. Larger batches may be faster but require more memory.

11. `text_key: str = "text"`
    - Purpose: Specifies the key in the document where the text data is stored.
    - Default: "text"
    - Note: This should correspond to the key in the document where the text data is stored.

12. `embedding_key: str = "embedding"`
    - Purpose: Specifies the key in the document where the vector embeddings are stored.
    - Default: "embedding"
    - Note: This should correspond to the key in the document where the vector embeddings are stored.

13. `scoped_index: bool = True`
    - Purpose: Specifies whether the index is scoped to the collection.
    - Default: True
    - Note: If True, searches in the scope. If False, searches across the entire cluster.

## Usage

Here's a brief overview of how to use the main functions of the Couchbase vector database:

1. **Adding Vectors**:
   ```python
   await couchbase_db.add(ids, texts)
   ```
   This method adds new vectors to the database.
   It takes a list of IDs and corresponding texts, generates embeddings, and inserts them into the Couchbase Collection.

2. **Querying**:
   ```python
   ids, scores = await couchbase_db.query(queries, top_k)
   ```
   Performs a similarity search on the stored vectors.
   It returns the IDs and their scores.
   Below you can see how the score is determined.

3. **Fetching Vectors**:
   ```python
   vectors = await couchbase_db.fetch(ids)
   ```
   Retrieves the vectors associated with the given IDs.

4. **Checking Existence**:
   ```python
   exists = await couchbase_db.is_exist(ids)
   ```
   Checks if the given IDs exist in the database.

5. **Deleting Vectors**:
   ```python
   await couchbase_db.delete(ids)
   ```
   Deletes the vectors associated with the given IDs from the database.