2019 vs 2026

same prompt, both models, side by side — running on a single DGX Spark in my office
nanochat-d12 — 2019-era
286M params · trained in ~8 hours on a DGX Spark · GPT-2-grade capability
Qwen 3.6 — 2026 modest-size open-weight model
36B MoE (3B active) · Q4 · 128K context · runs locally · ~7 years after GPT-2