---
myst:
   html_meta:
      title: AutoRAG - Deploy API endpoint
      description: Learn how to deploy optimized RAG pipeline to Flask API server in AutoRAG
      keywords: AutoRAG,RAG,RAG deploy,RAG API,Flask
---
# API endpoint

## Running API server

As mentioned in the tutorial, you can run api server as follows:

```python
from autorag.deploy import ApiRunner
import nest_asyncio

nest_asyncio.apply()

runner = ApiRunner.from_yaml('your/path/to/pipeline.yaml', project_dir='your/project/directory')
runner.run_api_server()
```

or

```python
from autorag.deploy import ApiRunner
import nest_asyncio

nest_asyncio.apply()

runner = ApiRunner.from_trial_folder('/your/path/to/trial_dir')
runner.run_api_server()
```

```bash
autorag run_api --trial_dir /trial/dir/0 --host 0.0.0.0 --port 8000
```

## Use NGrok Tunnel for public access

For accessing the API server from the public, you can use the NGrok tunnel service.
It automatically creates ngrok tunnel to your local server.

You can see the logs of the public URL like below:

```
INFO     [api.py:199] >> Public API URL:          api.py:199
         https://8a31-14-52-132-205.ngrok-free.app
```
This is the URL to your local server, so use it as the host at request.


## Endpoints

### 1. `/v1/run` (POST)

- **Summary**: Run a query and get generated text with retrieved passages.
- **Request Body**:
  - **Content Type**: `application/json`
  - **Schema**:
    - **Properties**:
      - `query` (string, required): The query string.
      - `result_column` (string, optional): The result column name. Default is `generated_texts`.
- **Responses**:
  - **200 OK**:
    - **Content Type**: `application/json`
    - **Schema**:
      - **Properties**:
        - `result` (string or array of strings): The result text or list of texts.
        - `retrieved_passage` (array of objects): List of retrieved passages.
          - **Properties**:
            - `content` (string): The content of the passage.
            - `doc_id` (string): Document ID.
            - `score` (string): The relevance score of the retrieved passage.
            - `filepath` (string, nullable): File path.
            - `file_page` (integer, nullable): File page number.
            - `start_idx` (integer, nullable): Start index.
            - `end_idx` (integer, nullable): End index.

---

### 2. `/v1/retrieve` (POST)

This API endpoint allows developers to retrieve documents based on a specified query.
It will ignore generator and prompt maker, only return retrieved passages.

The request must include a JSON object with the following structure:

```json
{
  "query": "your query string here"
}
```

#### Parameters
- **query** (string, required): The search string used to retrieve documents.

### Example Request
```json
{
  "query": "latest trends in AI"
}
```

#### Success Response
**HTTP Status Code:** `200 OK`

### Response Body

#### Properties
- **passages** (array): An array of documents retrieved based on the query.
  - **doc_id** (string): The unique identifier for each document.
  - **content** (string): The content of the retrieved document.
  - **score** (number, float): The relevance score of the retrieved document.
  - **filepath** (string, optional): The file path of the document.
  - **file_page** (integer, optional): The page number of the document.
  - **start_idx** (integer, optional): The start index of the retrieved passage from the parsed data.
  - **end_idx** (integer, optional): The end index of the retrieved passage from the parsed data.

#### Example Response
```json
{
  "passages": [
    {
      "doc_id": "doc123",
      "content": "Artificial Intelligence is transforming industries.",
      "score": 0.98,
      "filepath": "path/to/file",
      "file_page": 2,
      "start_idx": 100,
      "end_idx": 150
    },
    {
      "doc_id": "doc456",
      "content": "The future of AI includes advancements in machine learning.",
      "score": 0.92,
      "filepath": null,
      "file_page": null,
      "start_idx": null,
      "end_idx": null
    }
  ]
}
```

---

### 3. `/v1/stream` (POST)

- **Summary**: Stream generated text with retrieved passages.
- **Description**: This endpoint streams the generated text line by line. The `retrieved_passage` is sent first, followed by the `result` streamed incrementally.
- **Request Body**:
  - **Content Type**: `application/json`
  - **Schema**:
    - **Properties**:
      - `query` (string, required): The query string.
      - `result_column` (string, optional): The result column name. Default is `generated_texts`.
- **Responses**:
  - **200 OK**:
    - **Content Type**: `text/event-stream`
    - **Schema**:
      - **Properties**:
        - `type` (generated_text or retrieved_passage): If it is generated_text, you can see only the generated text. If it is retrieved_passage, you can see the retrieved passage and passage_index.
        - `generated_text` (string): The generated text from the generator (LLM). The result of the RAG system.
        - `retrieved_passage` (object): Retrieved passage.
          - **Properties**:
            - `content` (string): The content of the passage.
            - `doc_id` (string): Document ID.
            - `score` (string): The relevance score of the retrieved passage.
            - `filepath` (string, nullable): File path.
            - `file_page` (integer, nullable): File page number.
            - `start_idx` (integer, nullable): Start index.
            - `end_idx` (integer, nullable): End index.
        - `passage_index` (integer): Index of the retrieved passage.

---

### 4. `/version` (GET)

- **Summary**: Get the API version.
- **Description**: Returns the current version of the API as a string.
- **Responses**:
  - **200 OK**:
    - **Content Type**: `application/json`
    - **Schema**:
      - **Properties**:
        - `version` (string): The version of the API.

---

## API client usage example

Certainly! Below, I'll provide both Python sample code using the `requests` library and a `curl` command for each of the API endpoints described in the OpenAPI specification.

### Python Sample Code

First, ensure you have the `requests` library installed. You can install it using pip if you haven't already:

```bash
pip install requests
```

Here's the Python client code for each endpoint:

```python
import requests
from autorag.utils.util import decode_multiple_json_from_bytes

# Base URL of the API
BASE_URL = "http://example.com:8000"  # Replace with the actual base URL of the API

def run_query(query, result_column="generated_texts"):
    url = f"{BASE_URL}/v1/run"
    payload = {
        "query": query,
        "result_column": result_column
    }
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

def stream_query(query, result_column="generated_texts"):
    url = f"{BASE_URL}/v1/stream"
    payload = {
        "query": query,
        "result_column": result_column
    }
    with requests.Session() as session:
      response = session.post(url, json=payload, stream=True)
      retrieved_passages = [] # This will store retrieved passages

      # Check if the request was successful
      if response.status_code == 200:
          # Process the streaming response
          for i, chunk in enumerate(response.iter_content(chunk_size=None)):
              if chunk:
                  data_list = decode_multiple_json_from_bytes(chunk)
                  for data in data_list:
                      if data["type"] == "retrieved_passage":
                          retrieved_passages.append(data["retrieved_passage"])
                      else:
                          print(data["generated_text"], end="") # Stream the generated texts
      else:
          print(f"Request failed with status code: {response.status_code}")
          print(f"Response content: {response.text}")

def get_version():
    url = f"{BASE_URL}/version"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Example usage
if __name__ == "__main__":
    # Run a query
    result = run_query("example query")
    print("Run Query Result:", result)

    # Stream a query
    print("Stream Query Result:")
    stream_query("example query")

    # Get API version
    version = get_version()
    print("API Version:", version)
```

### `curl` Commands

Here are the equivalent `curl` commands for each endpoint:

#### `/v1/run` (POST)

```bash
curl -X POST "http://example.com/v1/run" \
     -H "Content-Type: application/json" \
     -d '{"query": "example query", "result_column": "generated_texts"}'
```

#### `/v1/stream` (POST)

```bash
curl -X POST "http://example.com/v1/stream" \
     -H "Content-Type: application/json" \
     -d '{"query": "example query", "result_column": "generated_texts"}' \
     --no-buffer
```

### `/v1/retrieve` (POST)

```bash
curl -X POST "http://example.com/v1/retrieve" \
     -H "Content-Type: application/json" \
     -d '{"query": "example query"}'
```

#### `/version` (GET)

```bash
curl -X GET "http://example.com/version"
```