Quickstart
Getting started quickly with finetuning and inference
These docs are outdated! Please check out https://docs.titanml.co for the latest information on the TitanML platform. If there's anything that's not covered there, please contact us on our Discord.
New to TitanML? Here's a brief, end-to-end tutorial to help you get your first inference-optimised Titan model up and running.
If you're creating an account on the platform for the first time, use the 'Sign up' button on the TitanML webapp (which we call TitanHub!), agree to the terms and conditions, and enter your details to start your 7-day free trial. If you have already set up an account with guidance from TitanML, you can skip to Step 1.
Iris is the TitanML command line interface, and can be installed as follows:
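A minimal sketch, assuming iris is published as the titan-iris package on PyPI:

```bash
# Install the iris CLI (package name assumed to be titan-iris)
pip install titan-iris
```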
Then use the following command to log in:
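A typical invocation looks like the following; check iris --help if the subcommand differs on your version:

```bash
# Authenticate the CLI with your TitanHub account
iris login
```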
You will then be prompted to open a confirmation link in your browser. Once you have entered the verification code displayed on your command line into the browser tab, a message indicating successful login should be printed to the command line. This means you're ready to start posting models and datasets!
If you're trying out TitanML for the first time, we recommend starting with a GLUE benchmark task; the datasets tend to be smaller, so you'll get your results more quickly.
Use the iris finetune command to submit a model & dataset pair for fine-tuning. You can pass either a path to a local folder or a path to a HuggingFace repository as your model and dataset arguments.
In this example, we passed the HuggingFace paths to an ELECTRA Large RTE model and the RTE subset of the GLUE dataset. Since GLUE tasks are a type of sequence classification, we need to add a bit of extra information to the command. Namely, we specify which text fields within the dataset are to be classified (in the case of RTE, these are 'sentence1' and 'sentence2') and the number of possible labels they could be given ('entailment' and 'not_entailment' [or 0 and 1], i.e. 2 labels).
The complete iris finetune command looks like this:
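The sketch below is illustrative only: every flag name in it is an assumption rather than the platform's confirmed interface, so run iris finetune --help for the exact options.

```bash
# Illustrative sketch only; all flag names are assumed, not confirmed
iris finetune \
  --model <huggingface-model-path> \    # e.g. an ELECTRA Large RTE checkpoint on the HuggingFace Hub
  --dataset glue --subset rte \         # the RTE subset of the GLUE dataset
  --text-fields sentence1 sentence2 \   # the fields to be classified
  --num-labels 2 \                      # entailment / not_entailment
  --name my_test_rte                    # the experiment name that appears in TitanHub
```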
To optimise your fine-tuned model for inference, find it in the models pane. Then, copy the model's UUID into the iris distil command:
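For example (the flag name here is an assumption; check iris distil --help for the exact interface):

```bash
# Submit the fine-tuned model, identified by its UUID, for distillation (flag name assumed)
iris distil --model <model-uuid>
```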
Posting a distillation job sets in motion the process of distilling and compressing a large, pretrained and fine-tuned model into 3 smaller, inference-optimised models. The compressed models come in 'extra-small', 'small' and 'medium' sizes, and each size has its own cost and accuracy specifications.
Navigate to TitanHub at app.titanml.co and you'll see an experiment named my_test_rte on your homepage. You can follow the progress of your jobs on the Hub, but you'll need to wait for them to finish to see your results!
When your results are ready, TitanHub will show you all the information you need to choose the model which best suits your needs. On your homepage, any experiments which have finished running will have a green status symbol, like this:
When you select an experiment, you will see a graph on your homepage which plots performance vs. computational cost for each of the three Titan-compressed models (as well as the baseline). You can choose between F1 score, loss and accuracy to measure performance, and between model size and cost per million queries to measure cost:
Click any of the 3 TYTN data points to see more detailed information on each model size. You'll see a sidebar display which gives some information about the model you've trained, and a box you can use to interact with your model.
Once you have chosen a model, you have two download options. If you want to manage the inference process yourself, you can download the model in ONNX format. For the 'medium' size of the RTE model we just created, you would use:
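The authoritative command is shown in TitanHub's 'model download' tab (see below); as a rough sketch, with the subcommand and flags assumed rather than confirmed, it looks something like:

```bash
# Illustrative sketch only; copy the real command from the 'model download' tab in TitanHub
iris download --model <model-uuid> --size medium   # downloads the 'medium' model in ONNX format
```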
You can also download the model as a Docker image, using a command equivalent to docker pull:
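The exact registry and image name come from TitanHub, so the reference below is a placeholder rather than a real image path:

```bash
# Pull the compressed model image; substitute the image reference shown in TitanHub
docker pull <titanml-registry>/<compressed-model-image>:<tag>
```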
You can find both of these commands in the 'model download' tab on the sidebar display (you won't see them unless you have clicked on a data point on the graph!).
Once you have pulled your compressed model as a Docker image, the simplest way to deploy it is with iris infer. Try out the command below to see your model in action on a GLUE task:
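As a hedged sketch (the flag names and the example sentence pair are assumptions; run iris infer --help for the real interface):

```bash
# Illustrative sketch only; flag names are assumed, not confirmed
iris infer \
  --port 8000 \                            # port the pulled model container is serving on (assumed default)
  --text "A man is playing a guitar." \    # sentence1 of an RTE-style pair
  --text2 "A person is making music."      # sentence2 (flag name assumed)
```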
You'll likely also want to run inference using the Triton Inference Server; you can read more about this here.
And you're done! Now you can fine-tune and deploy an inference-optimised TitanML model. The rest of this documentation covers the process in more detail.