How Freeloader Routes Your Requests
When a request hits Freeloader's proxy, a lot happens in milliseconds before a single token is generated. Here's the full picture.
Step 1: Trust tier check
Every request is evaluated against its trust tier header (or the global default). A prompt tagged private never leaves your infrastructure: only local Ollama models qualify as candidates.
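A minimal sketch of this filter, assuming a simple candidate record with an `is_local` flag (the names here are illustrative, not Freeloader's internal API):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    is_local: bool  # True for models served by a local Ollama instance

def tier_filter(candidates: list[Candidate], tier: str) -> list[Candidate]:
    """Drop any candidate not allowed to see a prompt at this trust tier."""
    if tier == "private":
        # Private prompts must never leave your infrastructure,
        # so only local models survive the filter.
        return [c for c in candidates if c.is_local]
    return list(candidates)
```

Every later step operates only on the list this filter returns, so a private prompt cannot be routed to a remote provider even by a fallback.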
Step 2: Budget gate
If your monthly budget is $0 (the default), any provider that charges money is immediately excluded from the candidate list. No exceptions.
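The gate amounts to one hard filter before scoring ever runs. A sketch, with an assumed per-million-token cost field (illustrative, not Freeloader's actual schema):

```python
def budget_gate(candidates: list[dict], monthly_budget_usd: float) -> list[dict]:
    """With a $0 budget (the default), exclude every provider that charges money."""
    if monthly_budget_usd <= 0:
        return [c for c in candidates if c["cost_per_mtok_usd"] == 0.0]
    return list(candidates)
```

Because the gate runs before scoring, a paid model can never win on latency or capacity alone; it is simply not in the race.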
Step 3: Model selection
From the remaining candidates, we score models on three axes: current rate limit headroom (prefer models with spare capacity), latency (prefer faster models, especially for short prompts), and capability match (context window and feature flags act as a hard filter, excluding models that cannot handle the request at all).
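One way to combine those three signals, as a sketch. The field names and weights are assumptions for illustration; Freeloader's real scoring function may differ:

```python
def score(candidate: dict, prompt_tokens: int) -> float:
    """Higher is better. Weights here are illustrative, not Freeloader's."""
    # Capability match is a hard filter: a model whose context window is
    # too small can never be selected.
    if candidate["context_window"] < prompt_tokens:
        return float("-inf")
    # Rate limit headroom: fraction of the request quota still available.
    headroom = candidate["remaining_requests"] / candidate["request_limit"]
    # Latency: weighted more heavily for short prompts, where round-trip
    # time dominates total response time.
    latency_weight = 2.0 if prompt_tokens < 500 else 0.5
    latency_score = 1.0 / (1.0 + candidate["p50_latency_s"])
    return headroom + latency_weight * latency_score

def rank(candidates: list[dict], prompt_tokens: int) -> list[dict]:
    """Return viable candidates, best first."""
    viable = [c for c in candidates if score(c, prompt_tokens) != float("-inf")]
    return sorted(viable, key=lambda c: score(c, prompt_tokens), reverse=True)
```

The ranked list is not just used to pick a winner; it is the fallback chain for the next step.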
Step 4: Fallback chain
If the primary model fails or hits a rate limit, Freeloader automatically retries with the next candidate, transparently to your application. As long as at least one candidate in the chain succeeds, you never see a 429.
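The walk down the chain can be sketched as a loop over the ranked candidates, where each `call_model` stands in for the actual provider call (names are hypothetical):

```python
class RateLimited(Exception):
    """Stand-in for a provider returning HTTP 429."""

def complete_with_fallback(chain: list, request: dict):
    """Try each candidate in ranked order; the caller sees only one result."""
    last_error: Exception = RuntimeError("empty candidate chain")
    for call_model in chain:
        try:
            return call_model(request)
        except (RateLimited, ConnectionError) as e:
            last_error = e  # fall through to the next candidate
    # Only when every candidate is exhausted does an error surface.
    raise last_error
```

A retryable failure is invisible to the application; only the pathological case where the entire chain is down propagates out.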
Step 5: Response normalization
Different providers return subtly different response shapes. Freeloader normalizes everything to the OpenAI format before returning it to your app.
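As a sketch of what normalization involves, here is a mapping for two common response shapes onto the OpenAI chat format. The provider payloads are simplified, and the dispatch-by-name structure is illustrative rather than Freeloader's actual code:

```python
def normalize_to_openai(provider: str, raw: dict) -> dict:
    """Map a provider-specific response onto the OpenAI chat completion shape."""
    if provider == "anthropic":
        # Anthropic returns a list of content blocks.
        text = raw["content"][0]["text"]
    elif provider == "ollama":
        # Ollama nests the reply under a "message" key.
        text = raw["message"]["content"]
    else:
        # Assume the payload is already OpenAI-shaped.
        return raw
    return {
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }]
    }
```

Because every response leaves the proxy in the same shape, your application code can target one format regardless of which provider actually served the request.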