How Freeloader Routes Your Requests
When a request hits Freeloader's proxy, a lot happens in milliseconds before a single token is generated. Here's the full picture.
Step 1: Trust tier check
Every request is evaluated against its trust tier header (or the global default). A prompt tagged private never leaves your infrastructure: only local Ollama models qualify as candidates.
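A minimal sketch of this filter, assuming a simple candidate record with an `is_local` flag (the names here are illustrative, not Freeloader's internal API):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    is_local: bool  # True for models served by a local Ollama instance

def tier_filter(candidates: list[Candidate], tier: str) -> list[Candidate]:
    """Drop any candidate not allowed to see a prompt at this trust tier."""
    if tier == "private":
        # Private prompts must never leave your infrastructure,
        # so only local models survive the filter.
        return [c for c in candidates if c.is_local]
    return list(candidates)
```

Every later step operates only on the list this filter returns, so a private prompt cannot be routed to a remote provider even by a fallback.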
Step 2: Budget gate
If your monthly budget is $0 (the default), any provider that charges money is immediately excluded from the candidate list. No exceptions.
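The gate amounts to one hard filter before scoring ever runs. A sketch, with an assumed per-million-token cost field (illustrative, not Freeloader's actual schema):

```python
def budget_gate(candidates: list[dict], monthly_budget_usd: float) -> list[dict]:
    """With a $0 budget (the default), exclude every provider that charges money."""
    if monthly_budget_usd <= 0:
        return [c for c in candidates if c["cost_per_mtok_usd"] == 0.0]
    return list(candidates)
```

Because the gate runs before scoring, a paid model can never win on latency or capacity alone; it is simply not in the race.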
Step 3: Model selection
From the remaining candidates, we score models on three axes: current rate limit headroom (prefer models with spare capacity), latency (prefer faster models, especially for short prompts), and capability match (context window and feature flags act as a hard filter, excluding models that cannot handle the request at all).
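One way to combine those three signals, as a sketch. The field names and weights are assumptions for illustration; Freeloader's real scoring function may differ:

```python
def score(candidate: dict, prompt_tokens: int) -> float:
    """Higher is better. Weights here are illustrative, not Freeloader's."""
    # Capability match is a hard filter: a model whose context window is
    # too small can never be selected.
    if candidate["context_window"] < prompt_tokens:
        return float("-inf")
    # Rate limit headroom: fraction of the request quota still available.
    headroom = candidate["remaining_requests"] / candidate["request_limit"]
    # Latency: weighted more heavily for short prompts, where round-trip
    # time dominates total response time.
    latency_weight = 2.0 if prompt_tokens < 500 else 0.5
    latency_score = 1.0 / (1.0 + candidate["p50_latency_s"])
    return headroom + latency_weight * latency_score

def rank(candidates: list[dict], prompt_tokens: int) -> list[dict]:
    """Return viable candidates, best first."""
    viable = [c for c in candidates if score(c, prompt_tokens) != float("-inf")]
    return sorted(viable, key=lambda c: score(c, prompt_tokens), reverse=True)
```

The ranked list is not just used to pick a winner; it is the fallback chain for the next step.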
Step 4: Fallback chain
If the primary model fails or hits a rate limit, Freeloader automatically retries with the next candidate, transparently to your application. As long as at least one candidate in the chain succeeds, you never see a 429.
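The walk down the chain can be sketched as a loop over the ranked candidates, where each `call_model` stands in for the actual provider call (names are hypothetical):

```python
class RateLimited(Exception):
    """Stand-in for a provider returning HTTP 429."""

def complete_with_fallback(chain: list, request: dict):
    """Try each candidate in ranked order; the caller sees only one result."""
    last_error: Exception = RuntimeError("empty candidate chain")
    for call_model in chain:
        try:
            return call_model(request)
        except (RateLimited, ConnectionError) as e:
            last_error = e  # fall through to the next candidate
    # Only when every candidate is exhausted does an error surface.
    raise last_error
```

A retryable failure is invisible to the application; only the pathological case where the entire chain is down propagates out.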
Step 5: Response normalization
Different providers return subtly different response shapes. Freeloader normalizes everything to the OpenAI format before returning it to your app.
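As a sketch of what normalization involves, here is a mapping for two common response shapes onto the OpenAI chat format. The provider payloads are simplified, and the dispatch-by-name structure is illustrative rather than Freeloader's actual code:

```python
def normalize_to_openai(provider: str, raw: dict) -> dict:
    """Map a provider-specific response onto the OpenAI chat completion shape."""
    if provider == "anthropic":
        # Anthropic returns a list of content blocks.
        text = raw["content"][0]["text"]
    elif provider == "ollama":
        # Ollama nests the reply under a "message" key.
        text = raw["message"]["content"]
    else:
        # Assume the payload is already OpenAI-shaped.
        return raw
    return {
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }]
    }
```

Because every response leaves the proxy in the same shape, your application code can target one format regardless of which provider actually served the request.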