Using iris distil

These docs are outdated! Please check out the latest information on the TitanML platform. If there's anything that's not covered there, please contact us on our Discord.

Shortcut! If you'd rather use the GUI than the command line, you can find the command builder on the web app at

Remember to ensure you have the latest version of iris installed before running any command! You can update it by running pip install --upgrade titan-iris.

The iris distil command submits jobs for your chosen model/dataset pair to the backend. You can use it to upload a model and/or dataset from either the Hugging Face Hub or a local folder. The TitanML Olympus backend will then compress the model, and you can use the TitanHub web interface or iris get to access the results.

The rest of this section provides more details on how to upload online/local models and how to use the TitanML Store, but for now, we'll look at a basic example of an iris distil command:

iris distil \
	--task sequence_classification \
	--dataset glue \
	--subset mrpc \
	--model TitanML/Roberta-Large-MRPC \
	--text-fields sentence1 \
	--text-fields sentence2 \
	--num-labels 2 \
	--name my_test_mrpc

You can also abbreviate the command arguments as follows:

iris distil -t sequence_classification -d glue -ss mrpc -m TitanML/Roberta-Large-MRPC -tf sentence1 -tf sentence2 -nl 2 -n test_experiment

In this example command, we ask the TitanML platform to distil the RoBERTa Large model on the Microsoft Research Paraphrase Corpus (MRPC) subset of the GLUE dataset. We pass the name of the dataset on the Hugging Face Hub as dataset, and the name of the model on the Hugging Face Hub as model. Since this is a sequence classification task, we also need to specify which columns in the dataset contain the text to classify; in this case, these are the columns with the headers 'sentence1' and 'sentence2'. Finally, we state how many possible labels each sequence can be classified under. In an MRPC task like our example, there are two labels - 'equivalent' and 'not equivalent' - so we specify --num-labels 2 in our command.
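To make the --text-fields and --num-labels arguments concrete, here is a sketch of the record shape in the MRPC subset. The field names ('sentence1', 'sentence2', 'label') match the columns referenced in the command above; the sentence values themselves are made up for illustration, not taken from the real dataset:

```python
# Illustrative MRPC-style record (values invented for illustration).
# --text-fields names the two input columns; --num-labels 2 covers
# the binary label space.
example_row = {
    "sentence1": "The company reported record profits this quarter.",
    "sentence2": "Record profits were reported by the company this quarter.",
    "label": 1,  # 1 = 'equivalent', 0 = 'not equivalent'
}

# The columns passed via --text-fields must exist in the dataset:
text_fields = ["sentence1", "sentence2"]
assert all(field in example_row for field in text_fields)

# --num-labels is the size of the label space:
num_labels = 2
assert example_row["label"] in range(num_labels)
```

The same correspondence applies to custom datasets: whatever columns hold your input text go to --text-fields, and the number of distinct label values goes to --num-labels.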

Want to run an experiment with a local dataset and/or model? Simply substitute the Hugging Face name(s) in the command with the local path to the directory containing the dataset and/or model:

iris distil \
	--task sequence_classification \
	--text-fields your_first_column \
	--text-fields your_second_column \
	--num-labels 2 \
	--dataset ./users/demo/custom_fake_dataset \
	--model ./users/finetunedmodels/alrightmodel1 \
	--name my_test_with_local

Make sure you read the page on iris upload before using any local models or datasets, to ensure that your folder contains the necessary items.

iris distil will privately upload your local dataset and model to the TitanML Store, as well as launch the experiment. If you resubmit the same job, the platform will use the server-cached versions of your dataset and model from the TitanML Store rather than triggering a fresh upload. The cache is sensitive to changes in model weights, so if you change your model (e.g. train it for another epoch), any new jobs will upload and use the updated model instead.
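A weight-sensitive cache like this behaves like content-addressed storage: whether a re-upload is needed can be decided from a hash of the model files. The sketch below illustrates the general idea only; the function names and cache structure are our own, not TitanML's actual implementation:

```python
# Sketch of content-hash-based upload caching (illustrative only;
# not TitanML's actual implementation).
import hashlib
from pathlib import Path


def model_fingerprint(model_dir: str) -> str:
    """Hash every file in the model folder. Any change to the weights
    (e.g. another epoch of training) yields a different digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_upload(model_dir: str, cache: set) -> bool:
    """Upload only when this exact set of files hasn't been seen before."""
    return model_fingerprint(model_dir) not in cache
```

Because the fingerprint is derived from file contents, resubmitting an unchanged folder is a cache hit, while retraining the model (which rewrites the weight files) automatically produces a miss and a fresh upload.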

Any uploaded datasets and models can only be accessed from your account. If you want to delete a dataset or model from the TitanML Store, you can do so directly from the TitanHub webapp.

Once you have run the iris distil command with the required arguments, you should see a 'success' message.

You have now launched a job! When you log into the TitanHub (more on this here), you will be able to see the compressed models TitanML has generated from your request under the 'Models' tab.

Now let's take a closer look at the arguments of the iris distil command.
