Fair use. Scale when you need.
Rate limits keep the platform stable for everyone. The Free tier starts generous; upgrade to Pro or Enterprise when your traffic grows. All limits are soft: we return headers so you can self-regulate.
Rate limit tiers
Limits are enforced per API key and apply to all models uniformly. Token limits count both input and output tokens.
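Because input and output tokens both count, a quick estimate can tell you whether a workload fits a tier before you ship it. The sketch below is illustrative only: the function name and parameters are hypothetical, and the example figures are the Pro limits quoted on this page (1,000 RPM, 10M TPM).

```python
def fits_within_limits(requests_per_min, avg_input_tokens, avg_output_tokens,
                       rpm_limit, tpm_limit):
    """Rough check of a steady workload against per-key RPM and TPM limits.

    Hypothetical helper, not part of any SDK. Token usage counts input
    and output tokens together, as the limits do.
    """
    tokens_per_min = requests_per_min * (avg_input_tokens + avg_output_tokens)
    return requests_per_min <= rpm_limit and tokens_per_min <= tpm_limit


# 800 requests/min at 4,000 input + 2,000 output tokens is 4.8M tokens/min.
print(fits_within_limits(800, 4_000, 2_000, rpm_limit=1_000, tpm_limit=10_000_000))
```

Note that a workload can stay under the RPM limit and still exceed the TPM limit when individual requests are large, so check both.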
Rate limit headers
Every response includes headers that tell you your current limit status:
Retry strategy
When you hit a 429, wait at least the number of seconds specified in Retry-After before trying again. For robust integrations, use exponential backoff with jitter:
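    import random
    import time

    import bentoo  # client library; raises RateLimitError on a 429


    def backoff_retry(fn, max_retries=5):
        for attempt in range(max_retries):
            try:
                return fn()
            except bentoo.RateLimitError:
                if attempt == max_retries - 1:
                    raise  # out of retries: surface the error to the caller
                # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)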
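To combine the two rules above (wait at least Retry-After, and back off exponentially), you can take the larger of the two delays. The following is a minimal sketch under stated assumptions: `parse_retry_after` and `next_wait` are hypothetical helper names, the 60-second cap is an arbitrary choice, and how you read the header from a failed call depends on your client. HTTP allows Retry-After to be either a number of seconds or an HTTP-date, so the parser accepts both.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the delay in seconds encoded in a Retry-After header value.

    Accepts an integer number of seconds or an HTTP-date. Returns 0.0 if
    the value is missing or cannot be parsed.
    """
    if not value:
        return 0.0
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 0.0
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_wait(attempt, retry_after_header=None, cap=60.0):
    """Seconds to sleep before retry number `attempt` (0-based).

    Uses capped exponential backoff with up to 1s of jitter, but never
    less than the delay the server asked for.
    """
    backoff = min(cap, 2 ** attempt) + random.uniform(0, 1)
    return max(backoff, parse_retry_after(retry_after_header))
```

Taking the maximum means a long server-requested delay is always respected, while the jitter still spreads out clients that were throttled at the same moment.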
Burst handling
Bentoo AI uses a sliding window algorithm with a small burst bucket. You can briefly exceed your average RPM by up to 20% before hard throttling kicks in. This smooths out traffic spikes without dropping legitimate requests.
If you are throttled, honor Retry-After and implement exponential backoff.
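If you would rather avoid throttling than recover from it, you can pace requests on the client. The sketch below is an approximation, not the server's algorithm: the exact window and burst bucket behavior are not specified on this page, and `burst_ceiling` and `SlidingWindowLimiter` are hypothetical names. It keeps you at or below your average RPM over any rolling 60 seconds, which leaves the 20% burst allowance as headroom.

```python
import time
from collections import deque


def burst_ceiling(rpm, burst_percent=20):
    """Highest short-term request rate implied by a 20% burst allowance."""
    return rpm * (100 + burst_percent) // 100


class SlidingWindowLimiter:
    """Client-side guard: at most `limit` calls in any rolling window."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self):
        """Record a request and return True, or return False if over limit."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True


print(burst_ceiling(1000))  # 1200 for the Pro tier's 1,000 RPM
```

Call `try_acquire()` before each request and delay or queue the request when it returns False.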
Upgrading your limits
Hitting limits regularly? Upgrade in the dashboard:
- Free → Pro ($49/month): 1,000 RPM, 10M TPM, unlimited daily tokens. Instant activation.
- Pro → Enterprise (custom): Dedicated capacity, custom RPM/TPM, reserved inference clusters. Contact sales@bentoo.ai.
New limits take effect immediately after payment confirmation. No restart or key rotation is required.