Using the Takeoff API (Client-side)
These docs are outdated! Please check out https://docs.titanml.co for the latest information on the TitanML platform. If there's anything that's not covered there, please contact us on our Discord.
Streaming tokens
The Takeoff API lets you send requests to the server and get back streamed tokens. Especially for long text generations, token streaming is a very effective way of improving the usability of large language models.
Even when per-token latency is low, generating hundreds or thousands of tokens takes a noticeable amount of time, and making users wait for the whole generation to finish before showing any output can feel slow.
Turning a streamed response into a non-streamed one is easy on the client side: stream the tokens into a buffer and return the buffer once the stream finishes.
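For instance, here is a minimal sketch of that buffering pattern using the Python requests library. The host, port, and the `text` field of the JSON body are assumptions, matching the streaming endpoint described below:

```python
import requests

def generate_blocking(prompt: str) -> str:
    """Collect a streamed generation into one string before returning it."""
    response = requests.post(
        "http://localhost:8000/generate_stream",  # assumed host and port
        json={"text": prompt},  # assumed payload shape
        stream=True,
    )
    response.raise_for_status()
    buffer = []
    # Accumulate tokens until the server closes the stream
    for token in response.iter_content(chunk_size=None, decode_unicode=True):
        buffer.append(token)
    return "".join(buffer)
```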
Takeoff serves two FastAPI endpoints: /generate, which returns a normal JSON response, and /generate_stream, which returns a streaming response.
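As a rough illustration only (not Takeoff's actual implementation), the difference between a JSON endpoint and a streaming endpoint in FastAPI looks like this; the token generator is a hypothetical stand-in for the model:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def generate_tokens(prompt: str):
    # Hypothetical stand-in for a language model that yields one token at a time
    for token in ["Hello", ",", " world", "!"]:
        yield token

@app.post("/generate")
def generate(body: dict):
    # Run the whole generation first, then return it as a single JSON response
    return {"text": "".join(generate_tokens(body["text"]))}

@app.post("/generate_stream")
def generate_stream(body: dict):
    # Forward each token to the client as soon as it is produced
    return StreamingResponse(generate_tokens(body["text"]), media_type="text/plain")
```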
Example
We are going to use the Python requests library to call the model as an API.
Streaming Response
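A minimal sketch, assuming the Takeoff server is listening on localhost:8000 and accepts a JSON body with a `text` prompt field (both are assumptions, not confirmed API details):

```python
import requests

# Open a streaming connection to the generate_stream endpoint
response = requests.post(
    "http://localhost:8000/generate_stream",  # assumed host and port
    json={"text": "List three things to do in London."},  # assumed payload shape
    stream=True,
)
response.raise_for_status()

# Print each token as soon as it arrives, instead of waiting for the full text
for token in response.iter_content(chunk_size=None, decode_unicode=True):
    print(token, end="", flush=True)
```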
This will print, token by token, the output of the model.
The same can be done on the command line using curl:
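A sketch under the same assumptions; `--no-buffer` tells curl to print each chunk as it arrives rather than buffering the output:

```bash
curl -X POST http://localhost:8000/generate_stream \
     -H "Content-Type: application/json" \
     -d '{"text": "List three things to do in London."}' \
     --no-buffer
```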
Normal Response
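A minimal sketch under the same host, port, and payload assumptions as above; without `stream=True`, requests waits for the complete response body before returning:

```python
import requests

# Call the non-streaming endpoint and block until generation finishes
response = requests.post(
    "http://localhost:8000/generate",  # assumed host and port
    json={"text": "List three things to do in London."},  # assumed payload shape
)
response.raise_for_status()

# The full generation arrives in a single JSON response
print(response.json())
```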
This will print the model's entire output in one go, once generation has finished.
The same can be done on the command line using curl:
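Again a sketch under the same host, port, and payload assumptions:

```bash
curl -X POST http://localhost:8000/generate \
     -H "Content-Type: application/json" \
     -d '{"text": "List three things to do in London."}'
```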