# Benchmark experiments for finetuning

{% hint style="danger" %}
These docs are outdated! Please check out <https://docs.titanml.co> for the latest information on the TitanML platform.\
\
If there's anything that's not covered there, please contact us on our [discord](https://discord.com/invite/83RmHTjZgf).
{% endhint %}

On this page you'll find a few examples of finetuning experiments you can run with public HuggingFace models for different use cases. In any of these experiments, you can replace the model, the dataset, or both with a path to a suitable local folder if you want to use your own models or datasets.

If you want to try running each experiment with a different model, we have a small selection of sample models for each task [here](https://huggingface.co/TitanML).
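As a rough sketch of the local-folder substitution described above, the snippet below assembles an `iris finetune` command that points at local paths. The folder names `./my_model` and `./my_dataset` and the experiment name `my_local_test` are hypothetical placeholders, and what counts as a "suitable" folder is not specified on this page. The script only prints the command (a dry run) so you can inspect it before running it yourself.

```bash
# Hypothetical local folders (assumptions); replace with your own paths.
model_dir="./my_model"
dataset_dir="./my_dataset"

# Warn early if a folder is missing, rather than failing mid-run.
for dir in "$model_dir" "$dataset_dir"; do
  if [ ! -d "$dir" ]; then
    echo "warning: $dir does not exist" >&2
  fi
done

# Assemble the command as an array so paths with spaces stay intact.
cmd=(iris finetune
  --model "$model_dir"
  --dataset "$dataset_dir"
  --task question_answering
  --name my_local_test)

# Dry run: print the command instead of executing it.
printf '%s\n' "${cmd[*]}"
```

To actually launch the experiment, replace the final `printf` line with `"${cmd[@]}"`.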

### Question Answering on SQuAD v2

```bash
iris finetune \
	--model bert-base-uncased \
	--dataset squad_v2 \
	--task question_answering \
	--name my_test_squadv2 \
	--has-negative
```

Note that since this experiment uses SQuAD v2, the `--has-negative` flag is not strictly necessary. However, any other dataset containing questions that cannot be answered from the context must be passed to `iris finetune` with the flag.

Remember you can always use the abbreviated `iris finetune` arguments as listed [here](broken://pages/IyzPmNQzHNsJcaa8jZYS); this goes for any task, and applies to both local and remote models/datasets. For example:

```bash
iris finetune -m TitanML/Electra-Large-SQUADV2 -d squad_v2 -t question_answering -hn -n my_test_squad
```

### Sequence classification with GLUE MRPC

This is the same as the example we used [here](/iris-documentation/titan-optimise-knowledge-distillation/using-iris-distil.md).

```bash
iris finetune \
	--model bert-base-uncased \
	--dataset glue \
	--task sequence_classification \
	--subset mrpc \
	-tf sentence1 \
	-tf sentence2 \
	-nl 2 \
	--name my_test_mrpc
```

Remember you can skip the `--subset` argument if your dataset, unlike GLUE, has no subsets!

### Token Classification with conll2003

conll2003 has 9 token labels, as shown below; pass each one to `iris finetune` in the form `{index}:{label}`.

```bash
iris finetune \
        --model bert-base-uncased \
        --dataset conll2003 \
        --subset conll2003 \
        --task token_classification \
        -ln 0:O \
        -ln 1:B-PER -ln 2:I-PER \
        -ln 3:B-ORG -ln 4:I-ORG \
        -ln 5:B-LOC -ln 6:I-LOC \
        -ln 7:B-MISC -ln 8:I-MISC \
        --labels-column ner_tags \
        --name my_test_conll
```
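Writing out nine `-ln` pairs by hand is error-prone, so the `{index}:{label}` arguments can also be generated from an ordered list. This is a minimal sketch in plain bash; the array names are illustrative, and the label order is taken from the command above.

```bash
# Label names in index order, matching the command above.
labels=(O B-PER I-PER B-ORG I-ORG B-LOC I-LOC B-MISC I-MISC)

# Build one "-ln {index}:{label}" pair per label.
label_args=()
for i in "${!labels[@]}"; do
  label_args+=(-ln "$i:${labels[$i]}")
done

# Print the generated arguments for inspection.
printf '%s\n' "${label_args[*]}"
```

You can then append `"${label_args[@]}"` to the `iris finetune` invocation in place of the hand-written `-ln` flags.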

### Language Modelling with tiny\_shakespeare

tiny\_shakespeare is a dataset consisting of the works of Shakespeare (see [here](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) for more information). To finetune a language model to produce text in the style of Shakespeare, try the following:

```bash
iris finetune \
        --model facebook/opt-125m \
        --dataset tiny_shakespeare \
        --task language_modelling \
        --name shakespeare
```

