Build a local model routing gateway for cost-sensitive AI tools
The Problem
Developers and indie hackers building AI tools cite API costs as the top objection to adoption; many already run local models such as Qwen via Ollama for roughly 80% of tasks and reserve paid APIs for complex reasoning. Existing gateways like LiteLLM and Portkey cover cloud providers well but lack optimized routing to local models, leading to inefficient cost management. Indie founders currently pay SaaS proxies $0.195-$0.50 per $1 of LLM spend but want self-hosted alternatives to avoid vendor lock-in and recurring overhead.
Real Demand Evidence
Found on Hacker News, 1 month ago:
Just hundreds of dollars at a minimum. Outsourcing our brains at a per token rate.
Core Insight
A self-hosted local-model routing gateway that intelligently routes ~80% of tasks to free local Ollama/Qwen instances and escalates to paid APIs only for complex reasoning, filling the gaps competitors leave in local-model integration, learned routing, and LLM-first design, all without hosted overhead.
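To make the routing idea concrete, here is a minimal sketch in Python, assuming Ollama's default local endpoint and an OpenAI-compatible cloud API as the fallback. The complexity heuristic, thresholds, and model tags are illustrative assumptions; a real gateway would replace the heuristic with learned, data-driven routing and per-request cost tracking.

```python
"""Minimal routing sketch: keep cheap tasks on a local Ollama model,
escalate complex reasoning to a paid cloud API. The heuristic, thresholds,
and model names are illustrative assumptions, not a reference design."""
import os
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"        # default Ollama endpoint
CLOUD_URL = "https://api.openai.com/v1/chat/completions"  # OpenAI-compatible API
LOCAL_MODEL = "qwen2.5:7b"   # assumed local model tag
CLOUD_MODEL = "gpt-4o"       # assumed cloud model

REASONING_HINTS = ("prove", "step by step", "debug", "refactor", "plan")

def needs_cloud(prompt: str) -> bool:
    """Crude stand-in for a learned router: long prompts or prompts with
    reasoning keywords go to the cloud; everything else stays local."""
    return len(prompt) > 2000 or any(h in prompt.lower() for h in REASONING_HINTS)

def complete(prompt: str) -> str:
    """Route a single completion request to the local or cloud backend."""
    if needs_cloud(prompt):
        resp = requests.post(
            CLOUD_URL,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": CLOUD_MODEL,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    resp = requests.post(
        OLLAMA_URL,
        json={"model": LOCAL_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Short, simple prompt -> handled locally for free.
    print(complete("Summarize this sentence in five words: routing saves money."))
```

In practice the routing decision is where the product differentiates: logging which tier handled each request, and what the cloud call would have cost, is what lets users verify the ~80%-local claim.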
Target Customer
Indie hackers and solo founders building cost-sensitive AI tools (e.g., already running Ollama/Qwen locally); a market of roughly 1M+ indie hackers on platforms like Indie Hackers and Product Hunt, with the AI devtools segment growing rapidly.
Revenue Model
Open source core (free to self-host) plus a SaaS proxy tier at $0.25-$0.40 per $1 of LLM spend (undercutting LiteLLM's $0.50 per $1; Portkey's Pro tier sits at $0.195), and a premium self-hosted enterprise tier at $99/mo for advanced metrics and auditing.
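To make the proxy-fee economics concrete, a quick back-of-the-envelope comparison is sketched below. The $1,000/month spend figure is a hypothetical assumption; the per-dollar rates are the ones cited in this brief.

```python
# Illustrative proxy-fee comparison for a hypothetical $1,000/month LLM spend.
# Rates come from the pricing cited in this brief; the spend figure is assumed.
monthly_llm_spend = 1_000  # USD, hypothetical

rates_per_dollar = {
    "Proposed gateway (low end)": 0.25,
    "Proposed gateway (high end)": 0.40,
    "LiteLLM SaaS proxy": 0.50,
    "Portkey Pro": 0.195,
}

for name, rate in rates_per_dollar.items():
    print(f"{name}: ${monthly_llm_spend * rate:,.2f}/month in proxy fees")
```

At that spend level the proposed tier undercuts LiteLLM's SaaS proxy but sits above Portkey Pro, consistent with the pricing note above.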
Competitive Landscape
- LiteLLM: open source (self-hosted free); proxy SaaS starts at $0.50 per $1 of LLM spend. Provides load balancing and fallback routing but lacks learned or data-driven routing optimized for local models like Ollama/Qwen, focusing on broad provider coverage without deep local integration.
- Portkey: free tier up to $1k/month spend; Pro at $0.195 per $1 of spend; Enterprise custom. Excels at prompt/model-aware routing and observability for cloud LLMs but is application-focused, with constraints at enterprise scale and limited support for routing to local/self-hosted models like Ollama.
- OpenRouter: pay-per-use at provider costs + 5% + $0.23/1M input tokens + $0.28/1M output tokens. Offers marketplace transparency and cache-aware routing for hosted providers but adds network overhead as a pure gateway and has no native support for local model routing (Ollama/Qwen).
- Kong: open source free; enterprise subscription at custom pricing (contact sales). Provides enterprise-grade semantic routing and governance but treats AI as secondary to general API management, lacking LLM-first learned routing tailored for cost-sensitive local-model prioritization.
- Unify: hosted, data-driven routing by cost/speed/quality with benchmarks, but does not emphasize self-hosting or seamless integration for local models, requiring a hosted dependency.
Willingness to Pay
- $0.195 per $1 of spend (Portkey Pro tier). "Portkey's design is intentionally application-focused, which introduces constraints at enterprise scale." Source: https://www.truefoundry.com/blog/a-definitive-guide-to-ai-gateways-in-2026-competitive-landscape-comparison
- Provider credits via unified billing. "Unified Billing: Manage credits for multiple providers through a single Cloudflare account (closed beta)." Source: https://www.getmaxim.ai/articles/top-5-ai-gateways-for-multi-model-routing/
- $0.50 per $1 of LLM spend (teams needing broad provider coverage; LiteLLM usage context). Source: https://www.getmaxim.ai/articles/top-5-ai-gateways-for-multi-model-routing/