Build a local model routing gateway for cost-sensitive AI tools
The Problem
Developers and indie hackers building AI tools cite API costs as the top objection to adoption; many already run local models such as Qwen via Ollama for roughly 80% of tasks and reserve paid APIs for complex reasoning. Existing gateways like LiteLLM and Portkey cover cloud providers well but lack optimized routing to local models, leading to inefficient cost management. Indie founders currently pay SaaS proxies $0.195-$0.50 per $1 of LLM spend but want self-hosted alternatives to avoid vendor lock-in and recurring overhead.
Real Demand Evidence
Found on Hacker News, 1 month ago:
Just hundreds of dollars at a minimum. Outsourcing our brains at a per token rate.
Core Insight
A self-hosted local-model routing gateway that intelligently routes ~80% of tasks to free local Ollama/Qwen instances and escalates to paid APIs only for complex reasoning, filling the gaps competitors leave in local-model integration, learned routing, and LLM-first design, all without hosted overhead.
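To make the routing idea concrete, here is a minimal sketch in Python, assuming Ollama's default local endpoint and an OpenAI-compatible cloud API as the fallback. The complexity heuristic, thresholds, and model tags are illustrative assumptions; a real gateway would replace the heuristic with learned, data-driven routing and per-request cost tracking.

```python
"""Minimal routing sketch: keep cheap tasks on a local Ollama model,
escalate complex reasoning to a paid cloud API. The heuristic, thresholds,
and model names are illustrative assumptions, not a reference design."""
import os
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"        # default Ollama endpoint
CLOUD_URL = "https://api.openai.com/v1/chat/completions"  # OpenAI-compatible API
LOCAL_MODEL = "qwen2.5:7b"   # assumed local model tag
CLOUD_MODEL = "gpt-4o"       # assumed cloud model

REASONING_HINTS = ("prove", "step by step", "debug", "refactor", "plan")

def needs_cloud(prompt: str) -> bool:
    """Crude stand-in for a learned router: long prompts or prompts with
    reasoning keywords go to the cloud; everything else stays local."""
    return len(prompt) > 2000 or any(h in prompt.lower() for h in REASONING_HINTS)

def complete(prompt: str) -> str:
    """Route a single completion request to the local or cloud backend."""
    if needs_cloud(prompt):
        resp = requests.post(
            CLOUD_URL,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": CLOUD_MODEL,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    resp = requests.post(
        OLLAMA_URL,
        json={"model": LOCAL_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Short, simple prompt -> handled locally for free.
    print(complete("Summarize this sentence in five words: routing saves money."))
```

In practice the routing decision is where the product differentiates: logging which tier handled each request, and what the cloud call would have cost, is what lets users verify the ~80%-local claim.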
Target Customer
Indie hackers and solo founders building cost-sensitive AI tools (e.g., already running Ollama/Qwen locally); a market of roughly 1M+ indie hackers on platforms like Indie Hackers and Product Hunt, with the AI devtools segment growing rapidly.
Revenue Model
Open source core (free to self-host) plus a SaaS proxy tier at $0.25-$0.40 per $1 of LLM spend (undercutting LiteLLM's $0.50 per $1; Portkey's Pro tier sits at $0.195), and a premium self-hosted enterprise tier at $99/mo for advanced metrics and auditing.
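To make the proxy-fee economics concrete, a quick back-of-the-envelope comparison is sketched below. The $1,000/month spend figure is a hypothetical assumption; the per-dollar rates are the ones cited in this brief.

```python
# Illustrative proxy-fee comparison for a hypothetical $1,000/month LLM spend.
# Rates come from the pricing cited in this brief; the spend figure is assumed.
monthly_llm_spend = 1_000  # USD, hypothetical

rates_per_dollar = {
    "Proposed gateway (low end)": 0.25,
    "Proposed gateway (high end)": 0.40,
    "LiteLLM SaaS proxy": 0.50,
    "Portkey Pro": 0.195,
}

for name, rate in rates_per_dollar.items():
    print(f"{name}: ${monthly_llm_spend * rate:,.2f}/month in proxy fees")
```

At that spend level the proposed tier undercuts LiteLLM's SaaS proxy but sits above Portkey Pro, consistent with the pricing note above.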
Competitive Landscape
- LiteLLM: open source (self-hosted free); proxy SaaS starts at $0.50 per $1 of LLM spend. Provides load balancing and fallback routing but lacks learned or data-driven routing optimized for local models like Ollama/Qwen, focusing on broad provider coverage without deep local integration.
- Portkey: free tier up to $1k/month spend; Pro at $0.195 per $1 of spend; Enterprise custom. Excels at prompt/model-aware routing and observability for cloud LLMs but is application-focused, with constraints at enterprise scale and limited support for routing to local/self-hosted models like Ollama.
- OpenRouter: pay-per-use at provider costs + 5% + $0.23/1M input tokens + $0.28/1M output tokens. Offers marketplace transparency and cache-aware routing for hosted providers but adds network overhead as a pure gateway and has no native support for local model routing (Ollama/Qwen).
- Kong: open source free; enterprise subscription at custom pricing (contact sales). Provides enterprise-grade semantic routing and governance but treats AI as secondary to general API management, lacking LLM-first learned routing tailored for cost-sensitive local-model prioritization.
- Unify: hosted, data-driven routing by cost/speed/quality with benchmarks, but does not emphasize self-hosting or seamless integration for local models, requiring a hosted dependency.
Willingness to Pay
- $0.195 per $1 of spend (Portkey Pro tier). "Portkey's design is intentionally application-focused, which introduces constraints at enterprise scale." Source: https://www.truefoundry.com/blog/a-definitive-guide-to-ai-gateways-in-2026-competitive-landscape-comparison
- Provider credits via unified billing. "Unified Billing: Manage credits for multiple providers through a single Cloudflare account (closed beta)." Source: https://www.getmaxim.ai/articles/top-5-ai-gateways-for-multi-model-routing/
- $0.50 per $1 of LLM spend (teams needing broad provider coverage; LiteLLM usage context). Source: https://www.getmaxim.ai/articles/top-5-ai-gateways-for-multi-model-routing/