# Using the Takeoff API (Client-side)

{% hint style="danger" %}
These docs are outdated! Please check out <https://docs.titanml.co/docs/category/titan-takeoff> for the latest information on the Titan Takeoff server.\
\
If there's anything that's not covered there, please contact us on our [discord](https://discord.com/invite/83RmHTjZgf).
{% endhint %}

### Streaming tokens

The Takeoff API lets you send requests and receive streamed tokens in response. Token streaming is an effective way to improve the usability of large language models, especially for long text generations.

Even when per-token latency is low, generating hundreds or thousands of tokens takes time, and waiting for the full generation to finish before showing anything to the user makes the model feel slow.

Converting a streamed response into a non-streamed one is easy on the client side: stream the tokens into a buffer and return the buffer once the stream finishes.
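As a minimal sketch of that buffering pattern (the helper name `collect_stream` is ours, not part of the Takeoff API):

```python
def collect_stream(chunks):
    """Accumulate streamed text chunks into one string, skipping empty chunks."""
    buffer = []
    for chunk in chunks:
        if chunk:
            buffer.append(chunk)
    return "".join(buffer)

# Simulated token stream; in practice this would be
# response.iter_content(chunk_size=1, decode_unicode=True)
tokens = ["1. Visit ", "the British ", "Museum."]
print(collect_stream(tokens))  # → 1. Visit the British Museum.
```

The same loop works unchanged over a live `requests` streaming response, since `iter_content` is just an iterable of text chunks.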

Here we provide two API endpoints served by FastAPI: `/generate`, which returns a normal JSON response, and `/generate_stream`, which returns a streaming response.

### Example

We are going to use the Python **requests** library to call the model's API:

#### Streaming Response

```python
import requests

if __name__ == "__main__":
    input_text = "List 3 things to do in London."

    url = "http://localhost:8000/generate_stream"
    payload = {"text": input_text}  # avoid shadowing the built-in json module

    # stream=True keeps the connection open so tokens arrive as they are generated
    response = requests.post(url, json=payload, stream=True)
    response.encoding = "utf-8"

    for text in response.iter_content(chunk_size=1, decode_unicode=True):
        if text:
            print(text, end="", flush=True)
```

This will print the model's output token by token.

The same can be done on the command line using curl:

```bash
curl -X POST http://localhost:8000/generate_stream -N -H "Content-Type: application/json" -d '{"text":"List 3 things to do in London"}'
```

#### Normal Response

```python
import requests

if __name__ == "__main__":
    input_text = "List 3 things to do in London."

    url = "http://localhost:8000/generate"
    payload = {"text": input_text}  # avoid shadowing the built-in json module

    response = requests.post(url, json=payload)

    body = response.json()
    if "message" in body:
        print(body["message"])
```

This will print the model's entire output in one go, once generation has finished.
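In practice you may want to guard against HTTP errors and unexpected response shapes before reading the output. A sketch of such a wrapper (the names `generate` and `parse_generate_response`, and the error behaviour, are our own, not part of the Takeoff API):

```python
import requests


def parse_generate_response(body):
    """Extract the generated text from a /generate JSON body."""
    if "message" not in body:
        raise KeyError("unexpected response shape: no 'message' field")
    return body["message"]


def generate(text, url="http://localhost:8000/generate", timeout=60):
    """POST a prompt to the /generate endpoint and return the generated text."""
    response = requests.post(url, json={"text": text}, timeout=timeout)
    response.raise_for_status()  # surface 4xx/5xx errors instead of parsing them
    return parse_generate_response(response.json())
```

Separating the parsing step keeps the network call and the response handling independently testable.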

The same can be done on the command line using curl:

```bash
curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{"text":"List 3 things to do in London"}'
```
