# Benchmark experiments for knowledge distillation

{% hint style="danger" %}
These docs are outdated! Please check out <https://docs.titanml.co> for the latest information on the TitanML platform.\
\
If there's anything that's not covered there, please contact us on our [discord](https://discord.com/invite/83RmHTjZgf).
{% endhint %}

On this page you'll find a few examples of knowledge distillation experiments you can run with public HuggingFace models for different use-cases. For any of these experiments, you can substitute the model, dataset or both with a path to a suitable local folder if you want to try using your own models/datasets.

If you want to try running each experiment with a different model, we have a small selection of sample models for each task [here.](https://huggingface.co/TitanML)

### Question Answering on SQuAD v2

```bash
iris distil \
	--model TitanML/Electra-Large-SQUADV2 \
	--dataset squad_v2 \
	--task question_answering \
	--name my_test_squadv2 \
	--has-negative
```

Note that since this experiment uses SQuAD v2, passing the `--has-negative` flag is not strictly necessary. However, any other dataset containing questions that are not answerable from their context must be passed to `iris distil` with this flag.
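For instance, if you were distilling on your own extractive QA dataset that includes unanswerable questions, you would add the flag yourself. A minimal sketch (the local paths and experiment name here are hypothetical, not real artifacts):

```shell
# Local model and dataset folders stand in for HuggingFace identifiers;
# --has-negative tells iris the dataset contains unanswerable questions.
iris distil \
	--model ./models/my-qa-model \
	--dataset ./data/my_qa_dataset \
	--task question_answering \
	--has-negative \
	--name my_local_qa
```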

Remember that you can always use the abbreviated `iris distil` arguments as listed [here](https://titanml.gitbook.io/iris-documentation/titan-optimise-knowledge-distillation/using-iris-distil/broken-reference); this goes for any task, and applies to both local and remote models/datasets. E.g.

```bash
iris distil -m TitanML/Electra-Large-SQUADV2 -d squad_v2 -t question_answering -hn -n my_test_squadv2
```

### Sequence classification with GLUE MRPC

This is the same as the example we used [here](https://titanml.gitbook.io/iris-documentation/titan-optimise-knowledge-distillation/using-iris-distil).

```bash
iris distil \
	--model TitanML/Electra-Large-MRPC \
	--dataset glue \
	--task sequence_classification \
	--subset mrpc \
	-tf sentence1 \
	-tf sentence2 \
	-nl 2 \
	--name my_test_mrpc
```

Remember that you can skip the `--subset` argument if your dataset (unlike GLUE) doesn't have subsets!

### Token Classification with conll2003

conll2003 has 9 token labels, as shown below; pass each one to `iris distil` in the form `{index}:{label}`.

```bash
iris distil \
	--model TitanML/Electra-Large-CONLL2003 \
	--dataset conll2003 \
	--subset conll2003 \
	--task token_classification \
	-ln 0:O \
	-ln 1:B-PER -ln 2:I-PER \
	-ln 3:B-ORG -ln 4:I-ORG \
	-ln 5:B-LOC -ln 6:I-LOC \
	-ln 7:B-MISC -ln 8:I-MISC \
	--name my_test_conll
```
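If your own token-classification dataset has many labels, writing out the repeated `-ln {index}:{label}` arguments by hand gets tedious. A minimal Python sketch that generates them from a label list (the `label_flags` helper is illustrative, not part of iris; the label list below is the conll2003 NER tag set in the dataset's own order):

```python
# conll2003 NER tags, in the dataset's label order (index 0 = "O")
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def label_flags(labels):
    """Build the repeated `-ln {index}:{label}` arguments for iris distil."""
    flags = []
    for idx, label in enumerate(labels):
        flags += ["-ln", f"{idx}:{label}"]
    return flags

# Paste the output into your iris distil command
print(" ".join(label_flags(LABELS)))
```

The same helper works for any label set; just swap in your dataset's labels in index order.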
