AI that runs entirely on your device

Ternary (1.58-bit) language models, run by an inference engine compiled to WebAssembly. No server, no API key, nothing leaves your browser. Pick a model and it downloads on demand.
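Why "1.58-bit": a weight restricted to {-1, 0, +1} carries log2(3) ≈ 1.585 bits of information. The page does not say how its weights are ternarized. A widely used scheme is absmean quantization, as in BitNet b1.58: divide a weight tensor by its mean absolute value, round, and clip to [-1, 1]. The TypeScript below is a minimal sketch that assumes that scheme with one scale per tensor. `ternarize`, `dequantize` and `TernaryTensor` are hypothetical names, not this app's API.

```typescript
// Hypothetical helper: absmean ternary quantization (BitNet b1.58-style).
// Assumption: illustrates the general technique, not this app's exact code.
interface TernaryTensor {
  q: Int8Array; // every entry is -1, 0, or +1
  scale: number; // dequantize with w ≈ q * scale
}

function ternarize(w: Float32Array, eps = 1e-8): TernaryTensor {
  // gamma = mean(|w|), used as the per-tensor scale
  let sumAbs = 0;
  for (let i = 0; i < w.length; i++) sumAbs += Math.abs(w[i]);
  const scale = sumAbs / w.length + eps;

  const q = new Int8Array(w.length);
  for (let i = 0; i < w.length; i++) {
    // RoundClip(w / gamma, -1, 1)
    const r = Math.round(w[i] / scale);
    q[i] = Math.max(-1, Math.min(1, r));
  }
  return { q, scale };
}

function dequantize(t: TernaryTensor): Float32Array {
  const out = new Float32Array(t.q.length);
  for (let i = 0; i < t.q.length; i++) out[i] = t.q[i] * t.scale;
  return out;
}

// Information per ternary weight: log2(3) ≈ 1.585 bits
const BITS_PER_TERNARY_WEIGHT = Math.log2(3);
```

The saving comes from storage. Each code needs about 2 bits with simple packing, or 1.6 bits if five weights share a byte (3^5 = 243 ≤ 256). A 16- or 32-bit float weight takes far more.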

0 servers · 100% offline after load · CPU/SIMD, no GPU · 1.58-bit ternary weights

Choose a model to download

🏆 DISTILLED STORY — BEST QUALITY

🏆

meeny v6 · BPB 0.507

ternary · 40M · 16k vocab · ~34 MB · 4096 ctx

Best story model. A 40M-parameter ternary model distilled from a 300M full-precision teacher (30B tokens). BPB 0.507, well ahead of TinyStories-1M (0.706). Fully offline.
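BPB (bits per byte) is a model's total negative log-likelihood on held-out text, measured in bits and divided by the text's length in UTF-8 bytes. Lower is better. Normalizing by bytes rather than tokens lets models with 6k, 8k and 16k vocabularies be compared on equal terms. The page does not state its evaluation text or exact procedure. This is a sketch that assumes per-token losses in nats, and `bitsPerByte` is a hypothetical helper.

```typescript
// Hypothetical helper: bits per byte from per-token negative log-likelihoods.
// Assumption: tokenNllNats[i] = -ln p(token_i | prefix), in nats, covering
// exactly the tokens of `text`.
function bitsPerByte(tokenNllNats: number[], text: string): number {
  let totalNats = 0;
  for (let i = 0; i < tokenNllNats.length; i++) totalNats += tokenNllNats[i];

  // Normalize by UTF-8 bytes, not tokens, so different tokenizers compare fairly.
  // TextEncoder is built into browsers and Node.js.
  const nBytes = new TextEncoder().encode(text).length;

  // nats -> bits: divide by ln 2
  return totalNats / Math.LN2 / nBytes;
}
```

For example, two tokens that each cost exactly 1 bit over the 4-byte string "abcd" give 2 / 4 = 0.5 BPB.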

📖 STORY MODELS

📖

Storyteller 120M

TinyStories 120M · ternary · 8k vocab · ~47 MB

A 120M-parameter ternary model trained end-to-end from scratch on TinyStories, with no distillation. Larger context, richer vocabulary. Fast and kid-friendly.

✨

meeny v2 ULTRALIGHT · BPB 0.519

ternary · 6.2M · 6k vocab · ~7 MB

6.2M parameters in just 7 MB. Ternary and distilled, with a 6k vocab. BPB 0.519. Best quality-to-size ratio. Loads instantly.

🔬

1-bit TinyStories BINARY · BPB 0.575

1-bit · 7.7M · 6k vocab · ~7.5 MB

Every weight is a single bit, either -1 or +1. A research demo of weight quantization at its floor, running in your browser.
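A 1-bit model stores each weight as its sign plus a shared scale: w ≈ α·sign(w). For a fixed sign pattern s, the α that minimizes the squared error Σ(w_i - α·s_i)² is the mean absolute weight. The sketch below uses that α. The page does not describe its binarization recipe, so the per-tensor scale, the bit packing and the helper names `binarize` and `unbinarize` are assumptions made for illustration.

```typescript
// Hypothetical helpers: 1-bit weight quantization, w ≈ alpha * sign(w).
// alpha = mean(|w|) minimizes sum((w_i - alpha * s_i)^2) for s_i = sign(w_i).
// Zero is mapped to +1 so every weight is exactly one of {-1, +1}.
function binarize(w: Float32Array): { bits: Uint8Array; alpha: number } {
  let sumAbs = 0;
  for (let i = 0; i < w.length; i++) sumAbs += Math.abs(w[i]);
  const alpha = w.length > 0 ? sumAbs / w.length : 0;

  // Pack 8 signs per byte: bit = 1 means +1, bit = 0 means -1.
  const bits = new Uint8Array(Math.ceil(w.length / 8));
  for (let i = 0; i < w.length; i++) {
    if (w[i] >= 0) bits[i >> 3] |= 1 << (i & 7);
  }
  return { bits, alpha };
}

function unbinarize(bits: Uint8Array, alpha: number, n: number): Float32Array {
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const positive = (bits[i >> 3] >> (i & 7)) & 1;
    out[i] = positive ? alpha : -alpha;
  }
  return out;
}
```

Packed this way, n weights occupy ceil(n / 8) bytes plus one scale. That is where the "single bit per weight" figure comes from.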

💬 CHAT & INSTRUCTION

⚡

Qwen3 0.6B BEST CHAT

Qwen3-0.6B · 1.58-bit ternary (QAT) · ~550 MB

The flagship on-device assistant: Qwen3-0.6B distilled to 1.58-bit ternary weights, running fully offline. ChatML format, instruction following and reasoning.

🦾

miny 360M

SmolLM2 360M · 1.58-bit ternary · instruction · ~210 MB

Full instruction-following chat: SmolLM2-360M distilled to 1.58-bit ternary weights. Uses ChatML. Follows complex prompts, summarizes and reasons. Lighter than Qwen3 0.6B (~210 MB vs ~550 MB).

🧠

Assistant

Sprapp 0.5B · ternary · RAG + tools · ~120 MB

General chat backed by a local search index (RAG: answers cite a source, or the model abstains) and structured tool calling. Grounded answers, works offline.

🗂 LEGACY

🚀

meeny v4

ternary · 40M · 16k vocab · BPB 0.685 · ~34 MB

Superseded by meeny v6. Same architecture but a weaker teacher (1B, fewer tokens), reaching BPB 0.685.

Each model downloads once and is cached in your browser (IndexedDB), so it works offline afterward.


Options: LoRA adapter · 1 thread · MTP speculative decoding · constrained decoding


Generation settings: temperature · max tokens 500 · top-k 40 · CPU/SIMD · ⌘/Ctrl+Enter to generate
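The controls above expose temperature and top-k, with top-k set to 40. A standard way to combine the two works in four steps. Keep the k highest-scoring tokens, divide their logits by the temperature, apply softmax, and draw one token. The page does not show its sampler, so this is a sketch of that generic technique. `sampleTopK`, its default temperature of 1.0 and the injectable `rand` parameter are hypothetical.

```typescript
// Hypothetical helper: temperature + top-k sampling for one decoding step.
// Keeps the k largest logits, applies softmax(logit / temperature) over them,
// and draws one token id. `rand` is injectable for reproducible runs.
function sampleTopK(
  logits: Float32Array,
  k = 40,
  temperature = 1.0,
  rand: () => number = Math.random,
): number {
  // Indices of the k largest logits. A full sort is simple; a partial
  // selection would be faster for large vocabularies.
  const ids: number[] = [];
  for (let i = 0; i < logits.length; i++) ids.push(i);
  ids.sort((a, b) => logits[b] - logits[a]);
  const top = ids.slice(0, Math.min(k, ids.length));

  // Temperature 0 (or below) falls back to greedy decoding.
  if (temperature <= 0) return top[0];

  // Numerically stable softmax weights over the kept logits.
  const maxLogit = logits[top[0]];
  const weights = top.map((id) => Math.exp((logits[id] - maxLogit) / temperature));
  const total = weights.reduce((s, x) => s + x, 0);

  // Inverse-CDF draw over the unnormalized weights.
  let r = rand() * total;
  for (let j = 0; j < top.length; j++) {
    r -= weights[j];
    if (r < 0) return top[j];
  }
  return top[top.length - 1]; // guard against floating-point round-off
}
```

Lower temperatures sharpen the distribution toward the top token. Top-k caps how far down the tail a single draw can reach, no matter how high the temperature is set.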