# A closer look at iris distil arguments

{% hint style="danger" %}
These docs are outdated! Please check out <https://docs.titanml.co> for the latest information on the TitanML platform.\
\
If there's anything that's not covered there, please contact us on our [Discord](https://discord.com/invite/83RmHTjZgf).
{% endhint %}

In the previous section, we saw that there are several arguments which specify the job you are submitting to iris, including your chosen model/dataset pair and the task you want to carry out.

## Required arguments:

All iris jobs require the following (syntax-agnostic!) arguments:

1. **--model (-m): str**

   The model which you wish to fine-tune/optimise, in one of [three](https://titanml.gitbook.io/iris-documentation/getting-started/iris-commands#iris-post) forms.
2. **--dataset (-d): str**

   The dataset which you wish to use for fine-tuning and optimisation, in one of [three](https://titanml.gitbook.io/iris-documentation/getting-started/iris-commands#iris-post) forms.
3. **--task (-t): str**

   The task for which the model is to be compressed. Currently supported tasks are `sequence_classification`, `question_answering` and `token_classification`. See [here](https://titanml.gitbook.io/iris-documentation/titan-optimise-knowledge-distillation/using-iris-distil/broken-reference) for more information on tasks.
4. **--name (-n): str**

   The name of the experiment.

   This name will become the title of the experiment on the ‘Dashboard’ and ‘Models’ tabs in the TitanHub, prefixed by an iris-assigned ID. For instance, the example job we posted earlier would be titled 183-test\_experiment under ‘Models’.
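
As a minimal sketch, the four required arguments can be assembled into a single command line in Python. The `iris distil` invocation and the flag names are taken from this page; the model, dataset and task values are placeholders chosen for illustration, and the snippet only prints the command rather than submitting anything to iris.

```python
import shlex


def required_args(model: str, dataset: str, task: str, name: str) -> list[str]:
    """Return the four required flags in their long form."""
    return [
        "--model", model,
        "--dataset", dataset,
        "--task", task,
        "--name", name,
    ]


# Placeholder values: substitute your own model/dataset pair and task.
command = ["iris", "distil"] + required_args(
    model="bert-base-uncased",
    dataset="squad",
    task="question_answering",
    name="test_experiment",
)

# shlex.join quotes any value containing spaces or shell metacharacters.
print(shlex.join(command))
```

Depending on the task, some of the optional arguments described below may also be needed before the job is complete.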

## Optional arguments:

The following additional named arguments are optional. Some of them only apply to particular tasks or datasets.

### Dataset specification

These arguments can be used to specify how your dataset should be ingested.

* **--subset (-ss): str**\
  Required if and only if your dataset has a subset.

  Also known as a config on HuggingFace. For example, if `glue` is the dataset, `mrpc`, `mnli` and `rte` are possible subsets.
* **--train-split-name (-tsn): str** \
  The name of the subsplit containing the training data. The default is 'train'. For example, if you wanted to train on the [emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset, you could provide the `split` subsplit to train on, with `-tsn split`.
* **--val-split-name (-vsn): str** \
  The name of the subsplit containing the validation data. The default is 'validation'. For example, with the [tiny\_shakespeare](https://huggingface.co/datasets/tiny_shakespeare) dataset, you could validate on the validation subsplit with `-vsn validation`, or on the test set with `-vsn test`.
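
The defaults above mean that the dataset flags only need to be passed when they differ from 'train' and 'validation'. A small sketch of that logic follows; `dataset_args` is a hypothetical helper written for this page, not part of iris.

```python
def dataset_args(subset=None, train_split="train", val_split="validation"):
    """Build the dataset-specification flags, omitting any left at its default."""
    args = []
    if subset is not None:
        args += ["--subset", subset]
    if train_split != "train":
        args += ["--train-split-name", train_split]
    if val_split != "validation":
        args += ["--val-split-name", val_split]
    return args


# glue has subsets, so --subset is needed; the split names keep their defaults.
print(dataset_args(subset="mrpc"))

# Validate on the test split instead of the default 'validation' split.
print(dataset_args(val_split="test"))
```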

### Task specific arguments

These arguments are used for specific tasks.

* **--num-labels (-nl): int**

  Required if and only if `task=sequence_classification`.

  This indicates the number of classification labels used. For regression tasks, this should be 1.
* **--label-names (-ln): int:str**

  Required if `task=token_classification`, optional if `task=sequence_classification`.

  This indicates the labelled classes into which your tokens/sequences are to be classified. Specify as a mapping with no spaces: `-ln 0:label1 -ln 1:label2` and so on.
* **--text-fields (-tf): str**

  Required if `task=sequence_classification` or if `task=language_modelling`\
  This indicates which column or columns in the dataset hold the input text. For sequence classification, multiple text fields can be tokenized together: for example, for [GLUE/RTE](https://huggingface.co/datasets/glue/viewer/rte/test), use `-tf hypothesis -tf context`. For language modelling, it indicates the column that contains the text to be modelled: for example, for [tiny\_shakespeare](https://huggingface.co/datasets/tiny_shakespeare), use `-tf text`.
* **--has-negative (-hn): bool**

  Required if and only if `task=question_answering`

  Iris assumes by default that the dataset you use for question-answering only contains questions which are answerable from context. If this is not the case, you will need to use the `--has-negative` flag to indicate that your metrics should be split between questions that have an answer and those that do not. **If you are using SQuAD-v2, iris will automatically affix the flag.**
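
Because `-ln` and `-tf` are repeated once per value, it can be convenient to generate them from lists. The sketch below does this for a sequence classification job; the helper names and the example labels are hypothetical, and the zero-based indices follow the `-ln 0:label1 -ln 1:label2` pattern shown above.

```python
def label_name_args(labels):
    """Turn an ordered list of label names into repeated -ln index:name flags."""
    args = []
    for index, label in enumerate(labels):
        # No space around the colon, matching the mapping format described above.
        args += ["-ln", f"{index}:{label}"]
    return args


def text_field_args(fields):
    """Repeat -tf once per input column."""
    args = []
    for field in fields:
        args += ["-tf", field]
    return args


# Hypothetical two-class dataset with a single input column named 'text'.
labels = ["negative", "positive"]
task_args = (
    ["--num-labels", str(len(labels))]
    + label_name_args(labels)
    + text_field_args(["text"])
)
print(" ".join(task_args))
```

Deriving `--num-labels` from the same list keeps the label count and the label names from drifting apart.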

### Miscellaneous arguments

* **--short-run (-s)** \
  This flag indicates that training is only to be run for a couple of batches, so results will be much less accurate than those of a full run. Use it if you need a quick way to check that the end-to-end pipeline is working.
* **--file (-f): filepath** \
  If you specify your experiment parameters (i.e. all of the iris post arguments required for your experiment) in a .yaml file and pass its path to iris post, iris can read the file and set up an experiment with your desired parameters. You may want to use it for tasks such as token classification, which take a lot of arguments. Passing a file obviates the need for other arguments.
* **--json (-j)**\
  Whether to output JSON from iris's commands. Default is false.

Next, we'll look at some examples of how to use `iris distil` for different use-cases.
