Using iris upload

These docs are outdated! Please check out https://docs.titanml.co for the latest information on the TitanML platform. If there's anything that's not covered there, please contact us on our Discord.

Remember to ensure you have the latest version of iris installed before running any command! You can update it by running pip install --upgrade titan-iris.

Although you can upload models and datasets on the fly by passing their file paths to iris distil or iris finetune, it can be useful to have access to your models and datasets from other machines. iris upload lets you upload a model or dataset without immediately using it for a job. iris upload <filepath> <name> also returns an artefact ID, which is a canonical reference to the uploaded artefact (a model or a dataset). You can pass this ID as an argument to iris distil at any time.
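If you are scripting around iris, you can capture the artefact ID for reuse. The sketch below assumes the ID appears in the command's output as a standard UUID; the exact output format of iris upload may differ between versions, and extract_id is a hypothetical helper, not part of iris itself.

```shell
#!/usr/bin/env sh
# Hypothetical helper: pull the first UUID out of a command's output.
# Assumes `iris upload` prints the artefact ID as a standard lowercase UUID.
extract_id() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Example usage (uncomment once iris is installed):
# DATASET_ID=$(iris upload /Users/myaccount/All-code/Kaggle/Datasets/clothing_data clothing-data | extract_id)
# iris distil --model bert-base-uncased --dataset "$DATASET_ID" --task sequence_classification \
#   --name test-floatds --text-fields text --num-labels 1 -s
```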

There are a few constraints on the types of models and datasets that can be passed to iris upload. These constraints also apply whenever you run an iris distil or iris finetune job with a local directory.

  • Model and dataset paths must be paths to folders.

  • The dataset folder must contain the following:

    • Dataset files which have already been split into training and validation sets, titled train.csv and val.csv respectively (only .csv files are accepted!)

    • A common schema (i.e. the same column names and number of columns) shared by both the training and validation datasets.

    • For sequence classification jobs, a label column titled 'Label' which contains integer values.

  • The model folder must contain the following:

    • A pytorch_model.bin model file (of course!)

    • A tokeniser file named tokenizer_config.json.

  • Model weights must be stored using HuggingFace Safetensors. Either save your model from transformers using safe_serialization=True, or use iris makesafe to generate a .safetensors file.

    • To run iris makesafe, simply pass the path to the folder containing the model, e.g. iris makesafe /Users/myaccount/All-code/Kaggle/Models/twitter_sc. This will add a .safetensors file to your model directory.
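The constraints above can be checked locally before uploading. Below is a minimal pre-upload sanity-check sketch under those assumptions; check_dataset_dir and check_model_dir are hypothetical helpers written for this example and are not part of iris.

```shell
#!/usr/bin/env sh
# Hypothetical pre-upload checks based on the constraints listed above.

check_dataset_dir() {
  dir="$1"
  [ -f "$dir/train.csv" ] || { echo "missing train.csv"; return 1; }
  [ -f "$dir/val.csv" ]   || { echo "missing val.csv"; return 1; }
  # Both splits must share the same schema (identical header row).
  [ "$(head -n 1 "$dir/train.csv")" = "$(head -n 1 "$dir/val.csv")" ] \
    || { echo "train/val column headers differ"; return 1; }
  echo "dataset folder looks OK"
}

check_model_dir() {
  dir="$1"
  # Weights must be Safetensors; run `iris makesafe` if they are not.
  ls "$dir"/*.safetensors >/dev/null 2>&1 \
    || { echo "no .safetensors weights - run iris makesafe"; return 1; }
  [ -f "$dir/tokenizer_config.json" ] || { echo "missing tokenizer_config.json"; return 1; }
  echo "model folder looks OK"
}
```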

Below are some examples:

Iris Upload & Distil with sentence classification

# uploading a dataset
iris upload /Users/myaccount/All-code/Kaggle/Datasets/clothing_data # path to a folder containing train.csv and val.csv files

# plugging in the dataset ID from iris upload and the name of a model from HF Hub
iris distil --model bert-base-uncased --dataset 38f8a758-8cb4-4029-bd30-f440f850d77e --task sequence_classification --name test-floatds --text-fields text --num-labels 1 -s
# 38f8a758-8cb4-4029-bd30-f440f850d77e is an example of a dataset ID generated by iris upload
# uploading a model and a dataset - N.B. the iris upload command must be used once for each model/dataset you want to upload
iris upload /Users/myaccount/All-code/Kaggle/Models/twitter_sc # path to a folder containing a model, including a tokenizer_config.json file
iris upload /Users/myaccount/All-code/Kaggle/Datasets/twitter_sc # path to a folder containing train.csv and val.csv files
# plugging in the model and dataset IDs output by iris upload
iris distil --model 9b25cca2-f547-46d6-ad35-6e06808383e0 --dataset 2cca2468-46b9-4eb9-afb4-836ff4fe03b6 --task sequence_classification --name test-model+ds --text-fields Sentence  --num-labels 4

Iris Upload & Distil with question-answering

# uploading a model and a dataset
iris upload /Users/myaccount/All-code/Kaggle/custom_squad
iris upload /Users/myaccount/All-code/Kaggle/custom_squad_dataset

# plugging in the model and dataset IDs output by iris upload
iris distil --model f29a298f-5f29-4769-a24e-881ad4ac81d4  --dataset 293ae8bc-88c8-443e-80cc-92f4ab96520d --task question_answering --name josh-test-qads -s -hn

Remember that you can choose to upload a model, a dataset, or both from local files.
