
Getting Started



These docs are outdated! Please check out https://docs.titanml.co/docs/category/titan-takeoff for the latest information on the Titan Takeoff server. If there's anything that isn't covered there, please contact us on our Discord.

To get started with Iris Takeoff, all you need is to have Docker and Python installed on your local system. If you wish to use the server with GPU support, then you will need to install Docker with CUDA support.

The first step is to install the TitanML local Python package, Iris, using pip.

pip install titan-iris

Getting a model

Once Iris is installed, the next step is to select a model to run inference on. Iris Takeoff supports many of the most powerful generative text models, such as Falcon, MPT, and Llama. See the supported models page for a list of models that work.

These can be found on their respective HuggingFace pages. We support directly passing in a HuggingFace model name, or passing in a local path to one of these models saved locally using `.save_pretrained()`.
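For example, here is a minimal sketch (using the Hugging Face `transformers` library) of saving a model locally and then pointing Takeoff at the resulting folder; the local folder path is just an illustration:

# Sketch: save a HuggingFace model locally with .save_pretrained(), then pass
# the folder path to `iris takeoff` instead of a model name.
# The local path "./falcon-7b-instruct" is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save both the weights and the tokenizer to the same local folder
model.save_pretrained("./falcon-7b-instruct")
tokenizer.save_pretrained("./falcon-7b-instruct")

# The folder can then be served with:
#   iris takeoff --model ./falcon-7b-instruct --device cpu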

Going forward in this demo we will be using the Falcon 7B Instruct model. This is a good open-source model that is trained to follow instructions, and is small enough to inference easily, even on CPUs.

Launching the Server

To start the server, run the takeoff command:

iris takeoff --model tiiuae/falcon-7b-instruct --device cpu --port 8000
Note:
    - To use an NVIDIA GPU (if one is available), use the flag `--device cuda` instead.
    - Takeoff uses port 8000 by default.
    - You don't explicitly need to add a `--port` flag unless 8000 is already busy.

If you want to start a server using Llama 2, see here.


This will trigger a prompt to log in to a TitanML account, which is needed to download the Docker container.

You will have to create a free TitanML account to download the Docker image.

Once this is done, the model will be downloaded, prepared for inference, and a server started on your device. We can check this by running:

Input:

docker ps  

Output:

You should see a `tytn/fabulinus` container running.

Note:
    - By default, this will use port 8000 on your machine. Please make sure it's free.
    - If you want to specify a different port, you can run:
        `iris takeoff --model tiiuae/falcon-7b-instruct --device cpu --port <your_port>`
    - For more information, please check the `Using the Takeoff API` page.

Testing the model

With this port exposed, you can now send commands to the model and have tokens streamed back to you.
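For example, here is a minimal sketch of calling the server from Python with the `requests` library. The endpoint name (`/generate`) and the `text` payload field are assumptions here; see the Using the Takeoff API (Client-side) page for the exact interface.

# Sketch: send a prompt to the running Takeoff server.
# The "/generate" route and the "text" field are assumptions -- check the
# "Using the Takeoff API (Client-side)" page for the exact endpoint and payload.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={"text": "Write three bullet points about why local inference is useful."},
)
print(response.text)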

This is the foundation that can power a large number of LLM apps. To try it out, we provide a minimal chat interface that runs locally. The interface is deliberately limited, but it's enough to test inference speed and model quality.

To start the chat interface run the following command:

iris takeoff --infer --port 8000
Note:
    - This will open up a chat interface with whichever model is hosted on port 8000.
    - If you chose a different port when launching the server, pass that port in here instead.
    - Again, `--infer` uses port 8000 by default, so you only need the `--port` flag if a different port is used.

The output should look something like this:

Here is a video of the working chat server:

If you are happy with the performance, you can start building an application on top of the server.
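In an application you will usually want tokens streamed back as they are generated, rather than waiting for the whole response. Here is a minimal sketch, again assuming a streaming endpoint at `/generate_stream` with a `text` field (check the Using the Takeoff API (Client-side) page for the exact route and payload):

# Sketch: print tokens as the server streams them back.
# The "/generate_stream" route and the "text" field are assumptions -- see the
# "Using the Takeoff API (Client-side)" page for the exact interface.
import requests

with requests.post(
    "http://localhost:8000/generate_stream",
    json={"text": "Explain what a tokenizer does in one paragraph."},
    stream=True,
) as response:
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)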

End to End Demo
