Name: GPT-2 in Rust Implementation
Author: Muhammad Fiaz

🚀 Quick Start

Prerequisites: Rust and Cargo installed, with a toolchain that supports Edition 2024 (Rust 1.85 or later). A CUDA-capable GPU is recommended; without one, pass --device cpu.

1. Clone the repo

git clone https://github.com/muhammad-fiaz/gpt-2-rust.git
cd gpt-2-rust

2. Download model weights

cargo run --release -- --download --size small --weights-dir weights

Options: small (117M), medium (345M), large (762M), xl (1.5B), or all.

3. Generate text

cargo run --release -- --generate \
  --model weights/small/model.safetensors \
  --prompt "The future of artificial intelligence is" \
  --size small \
  --max-new-tokens 100 \
  --temperature 0.8

✨ Features

🧠 Custom Causal Self-Attention
💡 Token & Positional Embeddings
🏗️ Pre-norm Block Architecture
⚡ GELU-activated MLP
🔗 LM Head with Weight Tying
📜 Pure-Rust BPE Tokenizer
📡 Real-Time Word Streaming
🖥️ GPU Weight Offloading
⚡ CUDA / WGPU / CPU Backends
📦 Single CLI Binary
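
To make the "causal" part of causal self-attention concrete, here is a minimal sketch in plain Rust (standard library only). It is not the project's Burn-based code; the function name `causal_softmax` and the nested-`Vec` layout are assumptions chosen for illustration. It masks out future positions and normalizes each row of a square score matrix.

```rust
/// Applies a causal mask and a row-wise softmax to a square matrix of
/// attention scores, where `scores[i][j]` is the raw score of query
/// position `i` against key position `j`.
/// Hypothetical helper: assumes `scores` is square and non-empty.
fn causal_softmax(scores: &[Vec<f32>]) -> Vec<Vec<f32>> {
    scores
        .iter()
        .enumerate()
        .map(|(i, row)| {
            // Query i may only attend to key positions 0..=i.
            let visible = &row[..=i];
            // Subtract the row maximum for numerical stability.
            let max = visible.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = visible.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            // Future positions keep a weight of exactly 0.
            let mut out = vec![0.0; row.len()];
            for (j, e) in exps.iter().enumerate() {
                out[j] = e / sum;
            }
            out
        })
        .collect()
}
```

With all-zero scores, each row spreads its weight evenly over the current and earlier positions, and later positions receive nothing.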

📐 Model Sizes

Variant	Params	Layers	Heads	Embed dim
`small`	117 M	12	12	768
`medium`	345 M	24	16	1024
`large`	762 M	36	20	1280
`xl`	1.5 B	48	25	1600
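
As a rough cross-check of the table, the parameter count of a GPT-2 style model can be estimated from the layer count and embedding dimension alone. The sketch below assumes the standard GPT-2 vocabulary size (50,257) and context length (1,024), neither of which is stated in this README, and assumes the LM head shares its weights with the token embedding. The function name `gpt2_param_count` is hypothetical. The head count does not appear because it only changes how the embedding dimension is split (64 per head for every variant above), not the number of weights.

```rust
/// Estimates the parameter count of a GPT-2 style model whose LM head
/// is tied to the token embedding (so the head adds no parameters).
fn gpt2_param_count(layers: u64, embed_dim: u64) -> u64 {
    const VOCAB_SIZE: u64 = 50_257; // assumption: standard GPT-2 BPE vocabulary
    const CONTEXT_LEN: u64 = 1_024; // assumption: standard GPT-2 context length
    let d = embed_dim;
    // Token embedding plus learned positional embedding.
    let embeddings = VOCAB_SIZE * d + CONTEXT_LEN * d;
    // Attention: QKV projection (3d^2 + 3d) plus output projection (d^2 + d).
    let attention = 4 * d * d + 4 * d;
    // MLP: d -> 4d (4d^2 + 4d), then 4d -> d (4d^2 + d).
    let mlp = 8 * d * d + 5 * d;
    // Two LayerNorms per block, each with a scale and a bias vector.
    let norms = 4 * d;
    // Final LayerNorm after the last block.
    let final_norm = 2 * d;
    embeddings + layers * (attention + mlp + norms) + final_norm
}
```

Under these assumptions, `gpt2_param_count(12, 768)` gives 124,439,808 and `gpt2_param_count(24, 1024)` gives 354,823,168. These are slightly above the 117 M and 345 M in the table, which appear to follow the commonly quoted figures from the original release.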

💻 CLI Reference

Single binary, four modes:

`--download` — Fetch weights

cargo run --release -- --download --size small --weights-dir weights

`--generate` — Text generation

cargo run --release -- --generate \
  --model weights/small/model.safetensors \
  --prompt "Your prompt" \
  --size small \
  --max-new-tokens 100 \
  --temperature 0.8 \
  --top-k 50 \
  --top-p 0.9 \
  --device cuda
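
The `--temperature`, `--top-k`, and `--top-p` flags shape the distribution the next token is drawn from. The following is a minimal sketch of one common way to combine them, written in plain Rust with no dependencies. The function name `filter_logits` and the order of the steps (temperature, then top-k, then top-p) are assumptions, not a description of how this project implements sampling.

```rust
/// Returns the renormalized candidate distribution as (token_id, probability)
/// pairs, sorted from most to least likely. Hypothetical helper: assumes
/// `logits` is non-empty and `temperature` is greater than zero.
fn filter_logits(logits: &[f32], temperature: f32, top_k: usize, top_p: f32) -> Vec<(usize, f32)> {
    // Temperature scaling followed by a numerically stable softmax.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| ((l - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> = exps.iter().map(|e| e / sum).enumerate().collect();

    // Top-k: sort by probability, highest first, and keep at most k tokens.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(top_k.max(1));

    // Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    let mut cumulative = 0.0_f32;
    let mut keep = probs.len();
    for (i, (_, p)) in probs.iter().enumerate() {
        cumulative += *p;
        if cumulative >= top_p {
            keep = i + 1;
            break;
        }
    }
    probs.truncate(keep);

    // Renormalize so the surviving probabilities sum to 1.
    let total: f32 = probs.iter().map(|(_, p)| p).sum();
    probs.iter().map(|&(id, p)| (id, p / total)).collect()
}
```

A sampler would then draw one token id from the returned pairs. Lower temperatures concentrate mass on the top tokens, so fewer candidates survive the top-p cut.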

`--evaluate` — Perplexity

cargo run --release -- --evaluate \
  --model weights/small/model.safetensors \
  --format safetensors \
  --data data/input.txt \
  --seq-len 128 \
  --batch-size 4
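
Perplexity is the exponential of the average negative log-likelihood the model assigns to the correct next tokens. As a small illustration in plain Rust, the sketch below computes it from a list of per-token probabilities. The function name `perplexity` and the choice to pass probabilities rather than logits are assumptions made for brevity.

```rust
/// Computes perplexity from the probabilities the model assigned to each
/// correct next token. Hypothetical helper: assumes a non-empty slice of
/// probabilities in (0, 1].
fn perplexity(target_probs: &[f64]) -> f64 {
    // Sum of negative log-likelihoods over all evaluated tokens.
    let nll_sum: f64 = target_probs.iter().map(|p| -p.ln()).sum();
    // Exponentiate the mean to get perplexity.
    (nll_sum / target_probs.len() as f64).exp()
}
```

A model that always spreads its probability evenly over four candidates scores a perplexity of 4, which matches the intuition of perplexity as an effective branching factor.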

`--train` — Pre-train / fine-tune

cargo run --release -- --train \
  --data data/input.txt \
  --artifact-dir artifacts/ \
  --size small \
  --epochs 3 \
  --batch-size 4 \
  --seq-len 128 \
  --lr 3e-4
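
For next-token training, a tokenized text file is typically cut into windows of `--seq-len` tokens, each paired with the same window shifted one token to the right. The sketch below shows that pairing in plain Rust. It illustrates the general technique only; the function name `next_token_windows` and the non-overlapping stride are assumptions, and the project's own data loader may differ.

```rust
/// Splits a token stream into (input, target) pairs of length `seq_len`,
/// where each target is its input shifted one token to the right.
/// Hypothetical helper: assumes `seq_len >= 1`. Leftover tokens that do
/// not fill a complete window are dropped.
fn next_token_windows(tokens: &[u32], seq_len: usize) -> Vec<(Vec<u32>, Vec<u32>)> {
    tokens
        // Each window holds seq_len inputs plus one extra token for the last target.
        .windows(seq_len + 1)
        // Advance by seq_len so consecutive inputs do not overlap.
        .step_by(seq_len)
        .map(|w| (w[..seq_len].to_vec(), w[1..].to_vec()))
        .collect()
}
```

For tokens `0..7` and a sequence length of 3, this yields the pairs `([0, 1, 2], [1, 2, 3])` and `([3, 4, 5], [4, 5, 6])`. A batch is then a group of `--batch-size` such pairs.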

🔧 Tech Stack

Burn v0.21
CUDA / WGPU / NdArray
tiktoken-rs v0.5
safetensors v0.4
memmap2 v0.9

📚 Citations

Attention Is All You Need

@inproceedings{vaswani2017attention, title = {Attention is all you need}, author = {Vaswani, Ashish and others}, booktitle = {NeurIPS}, pages = {5998--6008}, year = {2017} }

GPT-2

@article{radford2019language, title = {Language models are unsupervised multitask learners}, author = {Radford, Alec and others}, journal = {OpenAI blog}, year = {2019} }

Burn Framework

@misc{burn2024, title = {Burn: A Flexible and Modern Deep Learning Framework in Rust}, url = {https://burn.dev/}, year = {2024} }

GPT-2 Rust

@misc{gpt2rust2026, author = {Muhammad Fiaz}, title = {GPT-2 Rust: Native Rust GPT-2 with Burn}, howpublished = {https://github.com/muhammad-fiaz/gpt-2-rust}, year = {2026} }