
Fair use. Scale when you need.

Rate limits keep the platform stable for everyone. The Free tier starts generous; upgrade to Pro or Enterprise when your traffic grows. All limits are soft: every response includes rate-limit headers so you can self-regulate.

3 min read | v2.4.1 | Updated May 14, 2026 | Level: Beginner

Rate limit tiers

Limits are enforced per API key and apply to all models uniformly. Token limits count both input and output tokens.

Plan         Requests / min   Tokens / min   Tokens / day
Free                          100K           200K
Pro          1,000            10M            Unlimited
Enterprise   Custom           Custom         Unlimited
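Because token limits count both input and output tokens, the limit that binds first depends on request size. The sketch below is only an illustration of that arithmetic, using the Pro figures from this page (1,000 requests/min, 10M tokens/min); the function name and the per-request token counts are hypothetical.

```python
def max_requests_per_minute(rpm_limit: int, tpm_limit: int,
                            input_tokens: int, output_tokens: int) -> int:
    """Estimate sustainable requests/min when every request has the same size.

    Input and output tokens both count toward the tokens/min limit,
    so the effective ceiling is whichever limit is reached first.
    """
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        raise ValueError("a request must use at least one token")
    return min(rpm_limit, tpm_limit // tokens_per_request)


# Small requests: the request limit binds before the token limit.
print(max_requests_per_minute(1_000, 10_000_000, 1_200, 800))      # 1000
# Large requests: the token limit binds first.
print(max_requests_per_minute(1_000, 10_000_000, 30_000, 20_000))  # 200
```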

Streaming requests count differently. A single streaming connection counts as one request regardless of how many chunks are delivered. This means streaming is often more rate-limit-friendly than polling.

Rate limit headers

Every response includes headers that tell you your current limit status:

Header                  Example      Description
X-RateLimit-Limit       1000         Maximum requests allowed per minute for your plan.
X-RateLimit-Remaining   847          Requests remaining in the current window.
X-RateLimit-Reset       1715689200   Unix timestamp when the current rate-limit window resets.
Retry-After                          Seconds to wait before retrying (only present on 429 responses).
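One way to self-regulate with these headers is to pause once X-RateLimit-Remaining reaches zero and resume at X-RateLimit-Reset. The sketch below assumes the headers have already been read into a plain mapping with the exact names shown above; the function name is hypothetical, and how you obtain the headers depends on your HTTP client.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to pause before the next request.

    Returns 0.0 while requests remain in the current window; otherwise
    returns the time left until the window resets (never negative).
    """
    now = time.time() if now is None else now
    # Assumption: a missing header means "not limited right now".
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


exhausted = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1715689200",
}
# 30 seconds before the reset timestamp, with no requests left:
print(seconds_until_reset(exhausted, now=1715689170))  # 30.0
```

A plain dict is case-sensitive, so if your client normalizes header names, adjust the keys accordingly.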

Retry strategy

When you hit a 429, wait at least the number of seconds specified in Retry-After before trying again. For robust integrations, use exponential backoff with jitter:

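import random
import time

import bentoo  # client library that provides RateLimitError

def backoff_retry(fn, max_retries=5):
    """Call fn(), retrying on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except bentoo.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: re-raise the last RateLimitError
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)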
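The backoff_retry function above backs off exponentially but does not read Retry-After. A minimal sketch of combining the two is to wait for whichever is longer. This page does not say how the client library exposes the Retry-After value, so the sketch takes it as a plain number; the function name, the 60-second cap, and the injectable rng parameter are all assumptions made for illustration.

```python
import random
from typing import Callable, Optional


def retry_delay(attempt: int,
                retry_after: Optional[float] = None,
                cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Uses exponential backoff (1s, 2s, 4s, ... capped at `cap`) plus up to
    one second of jitter, but never less than the server's Retry-After.
    """
    backoff = min(cap, 2 ** attempt) + rng()
    floor = float(retry_after) if retry_after is not None else 0.0
    return max(backoff, floor)


# Early attempt with Retry-After of 10 seconds: the server's value wins.
print(retry_delay(0, retry_after=10, rng=lambda: 0.5))  # 10.0
# Later attempt: the backoff already exceeds Retry-After.
print(retry_delay(3, retry_after=2, rng=lambda: 0.5))   # 8.5
```

Passing rng explicitly is only there to make the delay reproducible in tests; in normal use the default random jitter applies.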

Burst handling

Bentoo AI uses a sliding window algorithm with a small burst bucket. You can briefly exceed your average RPM by up to 20% before hard throttling kicks in. This smooths out traffic spikes without dropping legitimate requests.
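To make the 20% figure concrete: on a 1,000 RPM plan the burst ceiling works out to 1,200 requests. The sketch below is a client-side approximation for staying under a chosen limit, not the server's actual algorithm, whose details are not given here. The class and function names are hypothetical, and treating the window as a fixed 60 seconds is an assumption.

```python
from collections import deque


def burst_ceiling(rpm: int, burst_percent: int = 20) -> int:
    """Requests per minute at which hard throttling is described as starting."""
    return rpm + rpm * burst_percent // 100


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` sends per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._sent = deque()  # timestamps of recent sends, oldest first

    def try_acquire(self, now: float) -> bool:
        """Record a send at time `now` if allowed; return False if over limit."""
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True


print(burst_ceiling(1_000))  # 1200

limiter = SlidingWindowLimiter(limit=2)
print(limiter.try_acquire(0.0))   # True
print(limiter.try_acquire(1.0))   # True
print(limiter.try_acquire(2.0))   # False: two sends already in the window
print(limiter.try_acquire(60.0))  # True: the send at t=0 has aged out
```

Since the burst allowance is described as brief, a cautious client would set limit to the plan's base RPM and treat the ceiling only as headroom for spikes.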

Do not ignore 429s. Repeatedly retrying without backoff will get your IP temporarily blocked. Always honor Retry-After and implement exponential backoff.

Upgrading your limits

Hitting limits regularly? Upgrade in the dashboard:

Free → Pro ($49/month): 1,000 RPM, 10M TPM, unlimited daily tokens. Instant activation.
Pro → Enterprise (custom): Dedicated capacity, custom RPM/TPM, reserved inference clusters. Contact sales@bentoo.ai.

New limits take effect immediately after payment confirmation; no restart or key rotation is required.
