TL;DR: George Hotz's tiny corp is now shipping tinybox — a $12,000 computer that runs 120B parameter models offline. No API costs. No rate limits. No data leaving your machine. For AI builders watching their OpenAI bills climb, this changes the math.
The Hardware Play
While everyone argues about which cloud API is cheapest, George Hotz did something else: he built the machine.
The tinybox red ($12,000) ships with 4x AMD 9070XT GPUs, 778 TFLOPS of FP16 compute, and 64GB of GPU RAM. The green version ($65,000) upgrades to RTX PRO 6000 Blackwell with 3,086 TFLOPS and 384GB GPU RAM.
Both run Ubuntu 24.04. Both ship within a week. Both eliminate your monthly API bill forever.
Why This Matters for AI Builders
I spend roughly $300-500/month on API calls running AI Insider. That's just for content generation and research — not training, not fine-tuning, not anything heavy.
At $12,000 upfront, the tinybox pays for itself in 24-40 months of avoided API costs. But that's not the real value.
The real value is everything API access can't give you:
1. No rate limits. Run 1,000 parallel inference calls. No 429 errors. No exponential backoff. No "please try again later."
2. No data leaving your machine. Process customer data, medical records, legal documents — anything that your lawyers would never approve sending to OpenAI.
3. Run any model. Llama 3.1 405B? Fine-tuned variants? That weird research model from a paper? If it runs on PyTorch, it runs on tinybox.
4. No dependency. When Anthropic has an outage, your agent stops. When your tinybox has an outage, you fix it yourself.
The MLPerf Reality Check
Tiny corp didn't just ship hardware — they proved it. The tinybox was benchmarked in MLPerf Training 4.0 against machines costing 10x more.
This matters because MLPerf is the industry standard. It's not tiny corp marketing — it's third-party verification that the performance claims are real.
For reference: MLPerf measures actual training throughput on standard models (ResNet, BERT, etc.), not synthetic benchmarks. When tiny corp says "best performance per dollar," they have receipts.
The Strategic Question
Here's what I'm thinking about:
If you're building an AI product that relies on API calls, you're building on rented land. Every inference costs money. Every price increase from your provider hits your margins. Every rate limit shapes your product decisions.
With owned hardware:
- Your inference cost is electricity (~$0.001 per call instead of $0.01-0.10)
- Your throughput is bounded by your hardware, not a provider's quota
- Your latency is local network, not internet round-trip
- Your model choice is unconstrained
The tradeoff is obvious: upfront capital, maintenance responsibility, and no automatic model updates.
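A back-of-envelope check on the electricity figure above. The power draw, rate, and call duration below are my assumptions for illustration, not tiny corp specs:

```python
# Per-call electricity cost for local inference; all inputs are
# assumed placeholder values, not measured numbers.
power_kw = 1.5          # assumed full-load draw of the box, in kW
price_per_kwh = 0.15    # assumed electricity rate, USD/kWh
seconds_per_call = 3.0  # assumed generation time per call

kwh_per_call = power_kw * seconds_per_call / 3600
cost_per_call = kwh_per_call * price_per_kwh
print(f"~${cost_per_call:.5f} per call")
```

Under these assumptions the per-call cost lands well under the ~$0.001 cited above, so the claim survives even with pessimistic inputs.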
But for production workloads running thousands of calls per day? The math is increasingly clear.
What I'd Actually Do
If I were scaling AI Insider to a serious operation:
1. Calculate your monthly API spend — be honest about all the calls
2. Project 24-month total cost — multiply by 24, add 20% for growth
3. Compare to tinybox — $12K + electricity + your time
If tinybox wins, you also gain optionality: fine-tuning, privacy, experimentation.
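The three steps above, as a quick script. The 20% growth buffer comes from step 2; the monthly electricity cost is my placeholder assumption — swap in your own numbers:

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float,
                     growth_factor: float = 1.2,
                     monthly_electricity: float = 50.0) -> float:
    """Months until owned hardware beats projected API spend.

    growth_factor: the +20% growth buffer from step 2.
    monthly_electricity: assumed running cost of the box (placeholder).
    """
    saving = monthly_api_spend * growth_factor - monthly_electricity
    return hardware_cost / saving

# Example: $500/month API spend vs. the $12K tinybox red.
print(f"{breakeven_months(12_000, 500):.1f} months")  # prints "21.8 months"
```

At $500/month the box pays for itself in under two years; at $300/month the breakeven stretches toward the 40-month end of the range quoted earlier.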
For most individual builders, API access is still the right answer — the $12K is real money, and cloud providers handle the ops.
But if you're hitting $500+/month consistently? Start doing the math.
The Exabox Tease
Tiny corp also announced the exabox — coming 2027, approximately $10 million, delivering ~1 exaflop of compute.
That's 720x RDNA5 GPUs, 25,920 GB of GPU RAM, and 1.2 PB/s of memory bandwidth. It's a datacenter in a shipping container.
If tinybox commoditizes the petaflop, exabox commoditizes the exaflop. The goal is clear: make serious AI compute available outside of hyperscaler lock-in.
Links
- [tinygrad.org](https://tinygrad.org/) — tinybox specs and ordering
- [MLPerf Training results](https://mlcommons.org/benchmarks/training/) — third-party benchmarks
- [HN discussion (414 points)](https://news.ycombinator.com/item?id=47470773) — community reactions
*Have you done the API vs. hardware math for your workload? I'm curious what the breakeven looks like for different use cases.*