# Using the Takeoff API (Client-side)

{% hint style="danger" %}
These docs are outdated! Please check out <https://docs.titanml.co> for the latest information on the TitanML platform.\
\
If there's anything that's not covered there, please contact us on our [discord](https://discord.com/invite/83RmHTjZgf).
{% endhint %}

### Streaming tokens

The takeoff API is lets you send requests to the API and get back streamed tokens. Especially for long text generations, token streaming is a really effective way of improving the usability of large language models.&#x20;

Even if the model inferences very quickly, if the goal is to generate 100-1000s of tokens, waiting for the full process to finish before showing results to users can feel like a long time, even if the time per token is low.

Turning streamed tokens to non-streamed tokens is easy on the client side, by streaming the tokens into a buffer and returning the buffer once it's full.

Here we provide two API endpoint serving in FastAPI: `/generate`, and `/generate_stream`

one using normal json response and one using streaming response.

### Example&#x20;

We are going to use the python **requests** library to call the model as an API:

#### Streaming Response

```
import requests

if __name__ == "__main__":
    
    input_text = 'List 3 things to do in London.'
    
    url = "http://localhost:8000/generate_stream"
    json = {"text":input_text}
     
    response = requests.post(url, json=json, stream=True)
    response.encoding = 'utf-8'
    
    for text in response.iter_content(chunk_size=1, decode_unicode=True):
        if text:
            print(text, end="", flush=True)
```

This will print, token-by-token, the output of the previous model.

The same can be done on the command line using curl:

```
curl -X POST http://localhost:8000/generate_stream -N -H "Content-Type: application/json" -d '{"text":"List 3 things to do in London"}'
```

#### Normal Response

```
import requests

if __name__ == "__main__":
    
    input_text = 'List 3 things to do in London.'
    
    url = "http://localhost:8000/generate"
    json = {"text":input_text}
     
    response = requests.post(url, json=json)

    if "message" in response.json():
        print(response.json()["message"])
```

This will print the entire response as the output of the previous model.

The same can be done on the command line using curl:

```
curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{"text":
```
