Effective monitoring helps you keep your Grantex deployment healthy, performant, and secure. This guide covers the native Prometheus metrics endpoint, Grafana dashboard templates, alert thresholds, and logging best practices.

Prometheus Metrics Endpoint

The auth service exposes a GET /metrics endpoint in Prometheus exposition format:
curl https://your-auth-service/metrics
This endpoint is unauthenticated (no API key required) and rate-limited to 10 requests/minute per IP.
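
To sanity-check the endpoint outside of Prometheus, the response body can be parsed by hand. The sketch below handles only the simple `name{labels} value` lines of the exposition format; the helper names `parseMetrics` and `sumByName` are hypothetical and not part of any Grantex SDK.

```typescript
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Parses a subset of the Prometheus text exposition format.
// Skips blank lines and comment lines (# HELP, # TYPE), ignores timestamps,
// and does not unescape label values or handle +Inf/-Inf sample values.
function parseMetrics(text: string): Sample[] {
  const samples: Sample[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
    if (!match) continue;
    const labels: Record<string, string> = {};
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
    let pair: RegExpExecArray | null;
    while ((pair = labelPattern.exec(match[2] ?? "")) !== null) {
      labels[pair[1]] = pair[2];
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}

// Sums every series of one metric, regardless of label values.
function sumByName(samples: Sample[], name: string): number {
  return samples
    .filter((s) => s.name === name)
    .reduce((acc, s) => acc + s.value, 0);
}
```

Feeding it the text returned by `curl https://your-auth-service/metrics` gives one `Sample` per series, which is enough for a quick smoke test in CI. Because of the rate limit on the endpoint, keep such checks infrequent.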

Counters

| Metric | Labels | Description |
| --- | --- | --- |
| grantex_token_exchange_total | status | Token exchange attempts |
| grantex_authorize_total | status | Authorization requests |
| grantex_grants_revoked_total | | Grants revoked (including cascade) |
| grantex_webhook_deliveries_total | status | Webhook delivery outcomes |
| grantex_anomalies_detected_total | type, severity | Anomalies detected |

Histograms

| Metric | Labels | Description |
| --- | --- | --- |
| grantex_authorize_duration_seconds | | Authorization request duration |
| grantex_token_exchange_duration_seconds | | Token exchange duration |
| grantex_http_request_duration_seconds | method, route, status_code | HTTP request duration (all routes) |
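
Latency percentiles such as p99 are not stored directly; they are estimated from the cumulative `_bucket` series of these histograms. The sketch below mirrors, in simplified form, the linear interpolation that PromQL's `histogram_quantile` applies to classic histograms. The bucket boundaries in the usage example are made up for illustration; the real boundaries depend on how the service configures its histograms.

```typescript
interface Bucket {
  le: number;    // upper bound of the bucket; Infinity for the +Inf bucket
  count: number; // cumulative count of observations <= le
}

// Estimates the q-quantile (0 <= q <= 1) from cumulative histogram buckets.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  // A +Inf bucket and at least one finite bucket are required.
  if (sorted.length < 2 || sorted[sorted.length - 1].le !== Infinity) return NaN;
  const total = sorted[sorted.length - 1].count;
  if (total === 0 || q < 0 || q > 1) return NaN;

  const rank = q * total;
  const i = sorted.findIndex((b) => b.count >= rank);
  // If the quantile falls in the +Inf bucket, report the highest finite bound.
  if (i === sorted.length - 1) return sorted[sorted.length - 2].le;

  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  const inBucket = sorted[i].count - below;
  if (inBucket === 0) return lower;
  // Assume observations are spread evenly inside the bucket.
  return lower + (sorted[i].le - lower) * ((rank - below) / inBucket);
}

// Hypothetical bucket layout (seconds) for illustration only.
const exampleBuckets: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: 2, count: 100 },
  { le: Infinity, count: 100 },
];
const p99 = histogramQuantile(0.99, exampleBuckets);
```

Because the result is interpolated within a bucket, the estimate is only as precise as the bucket boundaries around the value of interest.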

Gauges

| Metric | Description |
| --- | --- |
| grantex_active_grants | Current active grants count |
| grantex_anomalies_unacknowledged | Unacknowledged anomalies |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| METRICS_ENABLED | true | Set to false to disable metrics collection |

Grafana Dashboards

Pre-built Grafana dashboards are available at deploy/grafana/:
| Dashboard | Description |
| --- | --- |
| overview-dashboard.json | Token exchange rate, success rate gauge, latency p50/p99, grants revoked, active grants, webhook deliveries, anomalies, HTTP error rate |
| per-agent-dashboard.json | Per-agent drill-down with a $agent_id template variable |

Import Instructions

  1. In Grafana, go to Dashboards > Import
  2. Upload the JSON file or paste its contents
  3. Select your Prometheus data source when prompted (${DS_PROMETHEUS})
  4. Click Import

Health Check Endpoint

The auth service exposes a GET /health endpoint that returns the service status:
curl https://your-auth-service/health
{ "status": "ok" }
Use this endpoint for:
  • Load balancer health checks: poll /health every 10–30 seconds
  • Uptime monitoring: UptimeRobot, Pingdom, Cloud Monitoring
  • Kubernetes liveness probes: set livenessProbe.httpGet.path to /health
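
For custom tooling, the same check can be scripted. The sketch below is a minimal probe that treats anything other than a 2xx response with `{ "status": "ok" }` as unhealthy and counts consecutive failures. `HealthFetcher`, `isHealthy`, and `FailureStreak` are hypothetical names; the fetch function is injected so the probe can be exercised without a network.

```typescript
// Minimal response shape, so the sketch does not depend on DOM or Node typings.
type HealthFetcher = (
  url: string,
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Returns true only for a successful response whose body is { "status": "ok" }.
async function isHealthy(baseUrl: string, fetcher: HealthFetcher): Promise<boolean> {
  try {
    const res = await fetcher(`${baseUrl}/health`);
    if (!res.ok) return false;
    const body = (await res.json()) as { status?: unknown };
    return body.status === "ok";
  } catch {
    // Network errors, invalid JSON, and null bodies all count as failures.
    return false;
  }
}

// Tracks how many probes in a row have failed.
class FailureStreak {
  private count = 0;

  // Records one probe result and returns the current consecutive-failure count.
  record(healthy: boolean): number {
    this.count = healthy ? 0 : this.count + 1;
    return this.count;
  }
}
```

On Node 18 or later the global `fetch` can be passed directly, for example `isHealthy("https://your-auth-service", fetch)`, called from a timer at the 10–30 second interval suggested above.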

Alerting Thresholds

Recommended thresholds for production alerting:
| Metric | Warning | Critical | Action |
| --- | --- | --- | --- |
| Token exchange failure rate | > 5% | > 15% | Check auth service logs |
| Token refresh failure rate | > 5% | > 15% | Check for refresh token reuse or clock skew |
| Anomalies detected | > 5/hour | > 10/hour | Review anomaly details |
| Webhook delivery success | < 98% | < 95% | Verify endpoint availability |
| 429 rate | > 50/min | > 200/min | Client misconfiguration or abuse |
| Auth request latency (p99) | > 500ms | > 2s | Database or Redis performance issue |
| Health check failures | 1 consecutive | 3 consecutive | Service restart |
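
If alerts are evaluated in application code rather than in Prometheus, the thresholds above can be expressed as data. The sketch below is one possible encoding; the `Threshold` type and `classify` function are hypothetical, and the numeric values are copied from the table (rates as fractions, latency in seconds).

```typescript
type Severity = "ok" | "warning" | "critical";

interface Threshold {
  warning: number;
  critical: number;
  // "above": higher values are worse (failure rates, latency).
  // "below": lower values are worse (delivery success).
  direction: "above" | "below";
}

// Returns the most severe level whose limit the value has crossed.
function classify(value: number, t: Threshold): Severity {
  const breached = (limit: number): boolean =>
    t.direction === "above" ? value > limit : value < limit;
  if (breached(t.critical)) return "critical";
  if (breached(t.warning)) return "warning";
  return "ok";
}

const tokenExchangeFailureRate: Threshold = { warning: 0.05, critical: 0.15, direction: "above" };
const webhookDeliverySuccess: Threshold = { warning: 0.98, critical: 0.95, direction: "below" };
const authLatencyP99Seconds: Threshold = { warning: 0.5, critical: 2, direction: "above" };
```

Checking the critical limit first matters: a value that breaches the critical limit also breaches the warning limit, and it should be reported at the higher severity.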

Prometheus Alerting Rules

groups:
  - name: grantex
    rules:
      - alert: HighTokenExchangeFailureRate
        expr: |
          sum(rate(grantex_token_exchange_total{status!="success"}[5m]))
          / sum(rate(grantex_token_exchange_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Token exchange failure rate > 5%"

      - alert: HighAuthLatency
        expr: |
          histogram_quantile(0.99, sum by (le) (rate(grantex_authorize_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Authorization p99 latency > 2s"

      - alert: WebhookDeliveryFailure
        expr: |
          sum(rate(grantex_webhook_deliveries_total{status="failed"}[5m]))
          / sum(rate(grantex_webhook_deliveries_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Webhook delivery failure rate > 5%"
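
The ratio expressions in these rules divide the per-second rate of non-successful outcomes by the per-second rate of all outcomes. The sketch below reproduces that arithmetic from two snapshots of a counter's series. It is a simplification that assumes no counter resets between the snapshots; the `"error"` status value in the usage example is hypothetical.

```typescript
// Maps each value of the status label to the counter's cumulative value.
type CounterSnapshot = Record<string, number>;

// Fraction of events between the two snapshots whose status is not "success".
// The window length cancels out of the ratio, so it is not needed here.
function failureRatio(earlier: CounterSnapshot, later: CounterSnapshot): number {
  let total = 0;
  let failed = 0;
  for (const status of Object.keys(later)) {
    const increase = later[status] - (earlier[status] ?? 0);
    total += increase;
    if (status !== "success") failed += increase;
  }
  // With no traffic, PromQL evaluates 0/0 to NaN, and NaN never satisfies
  // "> 0.05", so the alert stays silent. The same convention is used here.
  return total === 0 ? NaN : failed / total;
}

// 90 successes and 10 errors in the window: a 10% failure ratio.
const ratio = failureRatio(
  { success: 1000, error: 20 },
  { success: 1090, error: 30 },
);
```

A ratio of 0.1 would exceed the 5% warning threshold; the `for: 5m` clause then requires the condition to hold for five minutes before the alert fires.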

Logging

Structured Logging

The auth service uses Pino for JSON-structured logging:
{
  "level": "info",
  "msg": "grant.created",
  "timestamp": "2026-03-01T12:00:00.000Z",
  "grantId": "grnt_abc123",
  "agentId": "ag_def456",
  "principalId": "user_789",
  "scopes": ["calendar:read", "email:send"],
  "latencyMs": 45
}

What to Log

| Event | Log Level | Key Fields |
| --- | --- | --- |
| Grant created | info | grantId, agentId, principalId, scopes |
| Grant revoked | info | grantId, revokedBy, cascadeCount |
| Token exchanged | info | grantId, agentId |
| Token refreshed | info | grantId, agentId |
| Token verification failed | warn | reason, tokenId |
| Auth request denied | warn | agentId, principalId, reason |
| Rate limit hit | warn | ip, endpoint, retryAfter |
| Anomaly detected | warn | type, severity, agentId |
| Webhook delivery failed | error | webhookId, url, statusCode, attempt |
| Database connection error | error | error, pool |
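
Because each log line is a JSON object, simple log-based checks need no special tooling. The sketch below counts events at selected levels in a newline-delimited JSON stream. It assumes the `level` and `msg` fields look like the sample entry above; apart from `grant.created`, the event names in the usage example are hypothetical.

```typescript
interface LogEntry {
  level: string;
  msg: string;
  [field: string]: unknown;
}

// Counts log entries per msg value, keeping only the requested levels.
// Lines that are blank, not valid JSON, or not objects are skipped.
function countByEvent(ndjson: string, levels: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    let entry: LogEntry;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed !== "object" || parsed === null) continue;
      entry = parsed as LogEntry;
    } catch {
      continue;
    }
    if (levels.indexOf(entry.level) === -1) continue;
    counts[entry.msg] = (counts[entry.msg] ?? 0) + 1;
  }
  return counts;
}

const exampleLogs = [
  '{"level":"info","msg":"grant.created","grantId":"grnt_abc123"}',
  '{"level":"warn","msg":"rate_limit.hit","ip":"203.0.113.7"}',
  '{"level":"warn","msg":"rate_limit.hit","ip":"203.0.113.7"}',
  "not json",
  '{"level":"error","msg":"webhook.delivery_failed","attempt":3}',
].join("\n");
const problemCounts = countByEvent(exampleLogs, ["warn", "error"]);
```

In production a log aggregation system would normally do this filtering, but the same idea applies: alert on the warn and error events from the table rather than on raw log volume.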

Webhook-Based Monitoring

Subscribe to webhook events for real-time alerting without polling:
import { Grantex } from '@grantex/sdk';

const grantex = new Grantex({ apiKey: process.env.GRANTEX_API_KEY! });

await grantex.webhooks.create({
  url: 'https://your-app.com/webhooks/grantex-alerts',
  events: ['grant.revoked', 'token.issued'],
  secret: process.env.WEBHOOK_SECRET!,
});
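
The receiving endpoint should verify that each delivery really comes from Grantex before acting on it. This guide does not describe the signature scheme, so the sketch below assumes a common one: an HMAC-SHA256 of the raw request body, keyed with the webhook secret and sent as a hex string. The algorithm, the encoding, and the helper names `signPayload` and `isValidSignature` are all assumptions to check against the Grantex webhook reference.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: signature = hex(HMAC-SHA256(secret, rawBody)).
function signPayload(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compares the received signature with the expected one in constant time.
function isValidSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification must run against the raw request body, before any JSON parsing, because re-serializing the payload can change its bytes and invalidate the signature.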