These docs are outdated! Please check out https://docs.titanml.co for the latest information on the TitanML platform.
If there's anything that's not covered there, please contact us on our Discord.
Titan Takeoff 🛫 Titan Train 🎓 Titan Optimise ✨
Titan Takeoff 🛫: Inference Server
What does it do?
Quickly experiment with running inference on different LLMs
Create local, private inference servers (think HF-hosted inference servers, but running on your own hardware); see the request sketch below
Supported models: most open-source generative model architectures
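To give a concrete sense of the "local and private" workflow, here is a minimal sketch of sending a generation request to a server running on your own machine. The port, endpoint path, and payload fields are illustrative assumptions, not the documented Takeoff API; see https://docs.titanml.co for the actual interface.

```python
import requests

# Hypothetical example: querying a locally running inference server.
# The port, endpoint name, and payload fields are assumptions for
# illustration only -- consult https://docs.titanml.co for the real API.
TAKEOFF_URL = "http://localhost:8000/generate"  # assumed local endpoint

response = requests.post(
    TAKEOFF_URL,
    json={"text": "Summarise the following document: ..."},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```

Because the server runs on your own hardware, prompts and completions never leave your machine.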
Titan Train 🎓: Fine-tuning Service
What does it do?
Fine-tuning of language models
Uses QLoRA for highly memory-efficient training (see the generic sketch after this list)
Super simple: fine-tune with only a few lines of code
Don't worry about infrastructure - all hosted by TitanML
Supported models: Both generative and non-generative language models
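Under the hood, QLoRA means loading a quantised base model and training only small low-rank adapter weights on top of it. The sketch below uses the open-source Hugging Face stack (transformers, peft, bitsandbytes) purely to illustrate the technique; it is not the Titan Train SDK, and the model name and hyperparameters are placeholder assumptions.

```python
# Generic QLoRA sketch using the open-source Hugging Face stack.
# This is NOT the Titan Train SDK; it only illustrates what QLoRA
# fine-tuning involves. Model name and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantise base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                     # placeholder model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                    # low-rank adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only the adapters are trained
model.print_trainable_parameters()
```

Because only the adapter weights receive gradients while the base model stays frozen in 4-bit, the memory footprint of training drops dramatically compared with full fine-tuning.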
Titan Optimise ✨: Knowledge Distillation
What does it do?
Compression of models for Natural Language Understanding (NLU) tasks
Helps when latency, memory, or cost is a severe bottleneck
Uses the latest compression techniques, such as pruning and knowledge distillation, for non-generative tasks (see the generic distillation sketch below)
Supported models: Non-generative models
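Knowledge distillation, the core technique behind Titan Optimise, trains a small "student" model to match the softened predictions of a larger "teacher" alongside the usual hard-label loss. The sketch below is a generic PyTorch illustration of that loss, not the Titan Optimise implementation; the temperature and weighting values are placeholder assumptions.

```python
# Generic knowledge-distillation loss sketch in PyTorch. Not the Titan
# Optimise implementation; it only illustrates the core idea: a small
# "student" is trained to match the softened output distribution of a
# larger "teacher", in addition to the usual hard-label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The resulting student is much smaller and faster than the teacher, which is what makes distillation useful when latency, memory, or cost is the bottleneck.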