Default Limits
| Endpoint | Limit | Notes |
|---|---|---|
| All endpoints (global) | 100 req/min | Default for every route |
| POST /v1/authorize | 10 req/min | Stricter: creates auth requests |
| POST /v1/token | 20 req/min | Stricter: exchanges codes for tokens |
| POST /v1/token/refresh | 20 req/min | Stricter: refreshes grant tokens |
| GET /.well-known/jwks.json | Exempt | Public key distribution is never throttled |
Response Headers
Every response includes rate limit headers so your application can track its budget:

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets |
| Retry-After | Seconds to wait before retrying (only on 429 responses) |
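If you call the API without an SDK, the headers in the table above can be read directly from the raw response. The following is a minimal sketch, not the SDKs' implementation: `RateLimitInfo` and `parse_rate_limit` are hypothetical names chosen for illustration, and the sketch assumes all header values are plain integers as described in the table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the documented rate limit headers."""
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset, Unix timestamp in seconds
    retry_after: Optional[int]  # Retry-After, only present on 429 responses


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Parse rate limit headers; return None if a required header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit"])
        remaining = int(lowered["x-ratelimit-remaining"])
        reset_at = int(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return None
    raw = lowered.get("retry-after")
    # Assumption: Retry-After is sent as whole seconds, as the table states.
    retry_after = int(raw) if raw is not None and raw.strip().isdigit() else None
    return RateLimitInfo(limit, remaining, reset_at, retry_after)
```

Returning `None` for incomplete headers keeps callers from acting on a partially parsed budget, for example on responses from the exempt JWKS endpoint, which may not carry these headers at all.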
429 Error Response
When you exceed a rate limit, the API returns a 429 Too Many Requests status with an error body.
The Retry-After header tells you exactly how long to wait.
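As a rough illustration of acting on a 429, the sketch below turns a status code and headers into a wait time. `seconds_to_wait` is a hypothetical helper, not part of any SDK. Falling back to X-RateLimit-Reset when Retry-After is absent, and the one-second default when neither header is usable, are assumptions made for this example rather than documented behavior.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(status: int, headers: Mapping[str, str],
                    now: Optional[float] = None) -> float:
    """Return how long to pause before retrying; 0.0 if the response was not a 429."""
    if status != 429:
        return 0.0
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after", "").strip()
    if retry_after.isdigit():
        # Preferred source: the server states the wait in seconds.
        return float(retry_after)
    # Assumed fallback: wait until the current window resets.
    reset = lowered.get("x-ratelimit-reset", "").strip()
    if reset.isdigit():
        current = time.time() if now is None else now
        return max(0.0, int(reset) - current)
    # Arbitrary default chosen for this sketch.
    return 1.0
```

The `now` parameter exists only so the fallback path can be exercised deterministically; in normal use it can be left unset.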
Reading Rate Limits from SDKs
All three SDKs automatically parse rate limit headers from every response. You can read them via client.lastRateLimit (TypeScript/Python) or client.LastRateLimit() (Go).
After a Successful Call
Handling 429 Errors
When a 429 is returned, the error object includes rate limit info with the retryAfter value:
Retry Strategy
Use exponential backoff with jitter to avoid thundering-herd problems when multiple clients hit the limit simultaneously.
Best Practices
The JWKS endpoint (/.well-known/jwks.json) is exempt from rate limits. Prefer offline token verification over online POST /v1/tokens/verify calls to avoid hitting limits entirely.

- Cache tokens: Grant tokens are valid JWTs. Store and reuse them until they expire instead of requesting new ones per operation.
- Use offline verification: Call verifyGrantToken() with the JWKS URI to validate tokens locally. The JWKS endpoint is never rate-limited.
- Use webhooks instead of polling: Subscribe to webhook events like grant.created and grant.revoked rather than polling grant or audit endpoints.
- Honor Retry-After: When you receive a 429, always use the Retry-After header value as your minimum wait time.
- Spread requests: If your system makes burst requests (e.g., batch token exchanges), add short delays between calls.
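The retry strategy and the Retry-After practice above can be combined as in the following sketch. It is one possible approach, not the SDKs' built-in behavior: `RateLimited` is a hypothetical stand-in for whatever error your client raises on a 429, and the `base`, `cap`, and `max_attempts` defaults are illustrative values. The delay is the Retry-After value plus a random fraction of an exponentially growing ceiling, so Retry-After remains the minimum wait while clients still spread out.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a client's 429 error; not a real SDK class."""

    def __init__(self, retry_after: float = 0.0) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def backoff_delay(attempt: int, retry_after: float = 0.0, base: float = 0.5,
                  cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Retry-After plus full jitter over an exponentially growing, capped ceiling."""
    ceiling = min(cap, base * (2 ** attempt))
    return retry_after + rng() * ceiling


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimited until max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, err.retry_after))
    raise RuntimeError("max_attempts must be at least 1")
```

Adding jitter on top of Retry-After, rather than taking the larger of the two, avoids the case where many clients receive the same Retry-After value and all retry at the same instant.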
Self-Hosted Deployments
If you’re running the Grantex auth service yourself, rate limits are fully configurable. The global limit is set in apps/auth-service/src/server.ts and per-route limits are set in individual route files.
See the Self-Hosting guide for deployment instructions.