Iris Takeoff Documentation

Using the Takeoff API (Client-side)


Last updated 1 year ago

These docs are outdated! Please check out https://docs.titanml.co/docs/category/titan-takeoff for the latest information on the Titan Takeoff server. If there's anything that's not covered there, please contact us on our Discord.

Streaming tokens

The Takeoff API lets you send requests and get back streamed tokens. Token streaming is an effective way to improve the usability of large language models, especially for long text generations.

Even when the model runs inference quickly, generating hundreds or thousands of tokens takes time. Waiting for the whole generation to finish before showing anything to users can feel slow, even if the time per token is low.
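To make the point concrete, here is a rough back-of-the-envelope sketch. The function name and the figures (1000 tokens at 20 ms per token) are illustrative assumptions, not Takeoff measurements, and the model ignores prompt processing and network overhead.

```python
from typing import Tuple


def wait_times(num_tokens: int, seconds_per_token: float) -> Tuple[float, float]:
    """Return (wait with streaming, wait without streaming) until the user sees output.

    Simplified model: every token takes the same time to generate.
    """
    # With streaming, the first token can be shown as soon as it is generated.
    streamed_wait = seconds_per_token
    # Without streaming, nothing is shown until all tokens are generated.
    blocking_wait = num_tokens * seconds_per_token
    return streamed_wait, blocking_wait


# Hypothetical numbers chosen only for illustration.
streamed, blocking = wait_times(num_tokens=1000, seconds_per_token=0.02)
print(f"streaming: first output after about {streamed:.2f} s")
print(f"non-streaming: first output after about {blocking:.0f} s")
```

Under these assumed numbers the total generation time is the same in both cases; streaming only changes how soon the user starts seeing text.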

Turning a streamed response into a non-streamed one is easy on the client side: collect the tokens in a buffer as they arrive and return the buffer once the stream ends.
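A minimal sketch of that buffering step is shown here. The helper name `collect_stream` is hypothetical, and the stream is simulated with a plain list so the snippet runs without a server.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str]) -> str:
    """Accumulate streamed text chunks and return them as one string."""
    buffer = []
    for chunk in chunks:
        # Skip empty chunks, which some streams emit as keep-alives.
        if chunk:
            buffer.append(chunk)
    # The iterator is exhausted, so the stream has ended: return everything.
    return "".join(buffer)


# Simulated stream standing in for tokens arriving from the server.
simulated_stream = ["1. ", "Visit ", "the ", "Tower ", "of ", "London", "."]
print(collect_stream(simulated_stream))
```

In a real client, `chunks` could be any iterator of decoded text, for example the one returned by `response.iter_content(..., decode_unicode=True)` in the requests library.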

Takeoff serves two API endpoints through FastAPI: /generate, which returns a normal JSON response, and /generate_stream, which returns a streaming response.

Example

We are going to use the Python requests library to call the model as an API:

Streaming Response

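import requests

if __name__ == "__main__":

    input_text = 'List 3 things to do in London.'

    # Streaming endpoint of the Takeoff server running locally on port 8000
    url = "http://localhost:8000/generate_stream"
    payload = {"text": input_text}

    # stream=True lets us read the body as it arrives instead of all at once
    response = requests.post(url, json=payload, stream=True)
    response.raise_for_status()  # fail early on an HTTP error status
    response.encoding = 'utf-8'

    # Print each decoded chunk immediately rather than waiting for the full text
    for text in response.iter_content(chunk_size=1, decode_unicode=True):
        if text:
            print(text, end="", flush=True)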

This prints the model's output token by token, as it is generated.
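The streaming example reads the body one byte at a time and relies on `decode_unicode=True` to turn bytes into text. The sketch below is not the requests implementation; it is a standalone illustration, using only the standard library, of why incremental decoding matters when a multibyte UTF-8 character is split across chunks. The helper name `decode_stream` is hypothetical.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(byte_chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Decode byte chunks to text, holding back incomplete multibyte characters."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in byte_chunks:
        # Returns "" while a character is still incomplete.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush whatever is left once the stream has ended.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Split a string containing two-byte characters into one-byte chunks.
raw = "Café £5".encode("utf-8")
one_byte_chunks = [raw[i:i + 1] for i in range(len(raw))]
print("".join(decode_stream(one_byte_chunks)))
```

Decoding each byte on its own with `bytes.decode` would fail on the split characters, which is why an incremental decoder is the safer choice for small chunk sizes.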

The same can be done on the command line using curl:

curl -X POST http://localhost:8000/generate_stream -N -H "Content-Type: application/json" -d '{"text":"List 3 things to do in London"}'

Normal Response

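import requests

if __name__ == "__main__":

    input_text = 'List 3 things to do in London.'

    # Non-streaming endpoint: the server replies once generation has finished
    url = "http://localhost:8000/generate"
    payload = {"text": input_text}

    response = requests.post(url, json=payload)
    response.raise_for_status()  # fail early on an HTTP error status

    # Parse the JSON body once; the generated text is under the "message" key
    result = response.json()
    if "message" in result:
        print(result["message"])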

This prints the model's entire response at once, after generation has finished.

The same can be done on the command line using curl:

curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{"text":"List 3 things to do in London"}'

https://docs.titanml.co/docs/category/titan-takeoff
discord