Identity & Directory
Manages users and groups from all sources — one directory that every protocol shares for access decisions
Overview
Manages users, groups, and account lifecycle from all identity sources in one place. Replaces per-product user databases with a single directory that every gateway protocol shares — proxy, SSH bastion, RADIUS. Applies to every access decision: authentication, group-based authorization, session creation, and instant revocation.
Core capabilities:
- Distributed directory cache with O(1) indexed lookups
- Multi-provider identity aggregation (LDAP, SCIM 2.0, OIDC RP)
- Full and delta synchronization with configurable intervals
- Automatic credential revocation on user disable (OIDC tokens, sessions, bastion)
- Event-driven callbacks for user disable and user update notifications
- Nested group resolution with DAG traversal and cycle detection
- Multi-provider merge with priority-based conflict resolution
- Real-time webhook updates from SCIM providers (Okta, Azure AD, OneLogin)
- External IdP authentication via OIDC Relying Party (PKCE, DPoP, PAR)
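Nested group resolution with cycle detection, as listed above, can be pictured as a graph walk with a visited set. The sketch below is only an illustration: `resolve_groups` and the `parents` mapping are hypothetical names, not the module's API.

```python
from typing import Dict, List, Set


def resolve_groups(direct: List[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return every group reachable from `direct` through the parent relation.

    `parents` maps a group name to the groups that contain it (hypothetical
    shape). The `seen` set doubles as cycle detection: a group that was
    already expanded is skipped, so a loop such as a -> b -> a terminates.
    """
    seen: Set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # shared ancestor or cycle, already expanded
        seen.add(group)
        stack.extend(parents.get(group, []))
    return seen


parents = {
    "dev-backend": ["dev"],
    "dev": ["staff"],
    "staff": ["dev"],  # deliberate cycle: staff -> dev -> staff
}
print(sorted(resolve_groups(["dev-backend"], parents)))
# ['dev', 'dev-backend', 'staff']
```

Each group is expanded at most once, so the walk is linear in the number of groups and membership edges even when the hierarchy contains loops.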
The subsystem is organized into three layers:
Providers (data sources):
- LDAP provider: LDAP client with connection pooling and bind auth
- SCIM provider: SCIM 2.0 pull sync and webhook push sync
- OIDC RP provider: OIDC Relying Party for external IdP SSO
Directory (unified cache):
- Cluster-wide cache with indexed queries and sync orchestration
Consumers (downstream modules):
- Authentication, authorization, proxy, bastion, firewall
Architecture
Data flow from providers to consumers:
LDAP Server -----> [ldap provider] ----+
                                       |
SCIM Provider ---> [scim provider] ----+--> [directory cache] --> cluster storage
(Okta, Azure AD)   (pull + webhook)    |    (indexes, TTL)
                                       |           |
OIDC IdP --------> [oidc rp] ----------+           |
(Azure, Google)    (SSO claims)                    v
                                  [consumer modules query directory]
                                   auth, proxy, firewall, bastion
Directory cache indexes (all O(1)):
email -> username, username -> groups, groupname -> members, disabled users
Synchronization modes:
LDAP full sync: rebuilds entire cache (default: every 60 minutes)
LDAP delta sync: incremental via modifyTimestamp (default: every 5 minutes)
SCIM pull sync: scheduled per-provider (default: every 15 minutes)
SCIM push sync: real-time webhooks with HMAC-SHA256 verification
Cluster behavior:
Each node maintains independent provider connections and sync loops.
Directory data is replicated to all nodes (eventual consistency).
Queries are local-only for low latency with no quorum requirements.
OIDC auth sessions are replicated cluster-wide for cross-node callback handling.
Relationships
Child modules:
- directory: Central cache and query API. All other modules consume directory for user lookups, group checks, and auth status verification.
- identity.ldap: Primary on-premise provider. Supplies users and groups via full/delta sync. Handles password bind authentication.
- identity.scim: Cloud identity provider. Syncs from Okta, Azure AD, OneLogin via pull and webhook push. Multi-provider merge by priority.
- identity.oidc_rp: External IdP authentication. Enables SSO via Authorization Code Flow with PKCE, DPoP, and PAR support.
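To make "multi-provider merge by priority" concrete, here is a minimal sketch. It assumes that a larger integer means higher priority and that empty values never overwrite real ones. Both rules, and the `merge_by_priority` helper itself, are assumptions for illustration rather than documented behavior.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_by_priority(sources: List[Tuple[int, Record]]) -> Record:
    """Merge attribute records; on conflict the higher-priority source wins.

    Assumption: larger integer = higher priority. Attributes missing from
    the winning source are filled in from lower-priority sources.
    """
    merged: Record = {}
    # Apply the lowest priority first so higher-priority sources overwrite.
    for _, record in sorted(sources, key=lambda item: item[0]):
        for key, value in record.items():
            if value:  # skip empty values so they never erase real data
                merged[key] = value
    return merged


scim = (20, {"username": "alice", "email": "alice@corp.example", "full_name": ""})
ldap = (10, {"username": "alice", "email": "alice@old.example", "full_name": "Alice A."})
print(merge_by_priority([scim, ldap]))
# {'username': 'alice', 'email': 'alice@corp.example', 'full_name': 'Alice A.'}
```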
Key consumers:
- Authentication modules: query directory for user existence, disabled status, password expiry, and group memberships.
- Proxy: fetches fresh group memberships on every request for authorization and identity header injection.
- Firewall: uses group memberships for ACL rule evaluation.
- Sessions: receives revocation calls when users are disabled.
- OIDC provider: receives token revocation on user disable.
- Bastion: sessions terminated on user disable.
Directory Cache
Fast user and group lookups for every request — synchronized from LDAP, replicated across all nodes
Overview
Provides fast, indexed user and group lookups for every authentication and authorization decision. Synchronizes from LDAP and replicates data across all cluster nodes so every request resolves identity locally. Applies to every access check — no remote LDAP query on the hot path.
Core capabilities:
- Periodic full and delta syncs from LDAP via the ldap provider module
- Cluster-wide data distribution replicated to all nodes
- Automatic index maintenance on data changes
- Fast O(1) query API for user, group, email, and membership lookups
- Paginated listing of users and groups with server-side offset/limit
- Comprehensive authentication status checks (exists, disabled, password expiry)
- Automatic credential revocation when users are disabled
- Callback registration for user disable and user update events
- Background sync loops running independently on all cluster nodes
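The authentication status check listed above (exists, disabled, password expiry) could look roughly like the following. `AuthStatus` is a name the documentation uses, but its fields here, the `User` record, and `check_auth_status` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class User:
    username: str
    disabled: bool = False
    password_expires_at: Optional[datetime] = None


@dataclass
class AuthStatus:
    exists: bool
    disabled: bool = False
    password_expired: bool = False
    password_expires_at: Optional[datetime] = None


def check_auth_status(cache: Dict[str, User], username: str, now: datetime) -> AuthStatus:
    """Answer the three status questions from the local cache only."""
    user = cache.get(username)
    if user is None:
        return AuthStatus(exists=False)
    expires = user.password_expires_at
    return AuthStatus(
        exists=True,
        disabled=user.disabled,
        password_expired=expires is not None and expires <= now,
        password_expires_at=expires,
    )


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
cache = {
    "alice": User("alice"),
    "bob": User("bob", disabled=True),
    "carol": User("carol", password_expires_at=datetime(2024, 5, 1, tzinfo=timezone.utc)),
}
print(check_auth_status(cache, "carol", now).password_expired)  # True
```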
Data flow:
LDAP Server -> [ldap module] -> [directory module] -> cluster-wide cache
                                                            |
                                          [auth modules query directory]
Indexes maintained automatically:
- Email to username (fast email lookup)
- Username to groups (fast group membership lookup)
- Group name to members (fast member listing)
- Disabled users list
Eventual consistency model: each node syncs independently, data is replicated to all nodes. Queries run on the local node for low latency with no quorum requirements.
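The four indexes can be modeled as hash maps and sets that are updated whenever a user record changes. This is a single-process sketch, not the cluster-wide implementation. The class, its method names, and the lowercase email normalization are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class DirectoryIndexes:
    """In-memory stand-in for the four directory indexes (hypothetical names)."""

    def __init__(self) -> None:
        self.email_to_username: Dict[str, str] = {}
        self.username_to_email: Dict[str, str] = {}  # reverse map for O(1) cleanup
        self.username_to_groups: Dict[str, Set[str]] = {}
        self.group_to_members: Dict[str, Set[str]] = defaultdict(set)
        self.disabled_users: Set[str] = set()

    def on_user_delete(self, username: str) -> None:
        email = self.username_to_email.pop(username, None)
        if email is not None:
            self.email_to_username.pop(email, None)
        for group in self.username_to_groups.pop(username, set()):
            self.group_to_members[group].discard(username)
        self.disabled_users.discard(username)

    def on_user_set(self, username: str, email: str, groups: Iterable[str],
                    disabled: bool = False) -> None:
        self.on_user_delete(username)  # drop stale entries before re-indexing
        if email:
            key = email.strip().lower()  # assumed normalization
            self.email_to_username[key] = username
            self.username_to_email[username] = key
        self.username_to_groups[username] = set(groups)
        for group in self.username_to_groups[username]:
            self.group_to_members[group].add(username)
        if disabled:
            self.disabled_users.add(username)


idx = DirectoryIndexes()
idx.on_user_set("alice", "Alice@Example.com", ["dev", "ops"])
idx.on_user_set("alice", "alice@example.com", ["dev"], disabled=True)
print(idx.email_to_username["alice@example.com"])  # alice
print(sorted(idx.group_to_members["ops"]))          # []
```

Removing the old entries before writing the new ones keeps the group-members index free of stale memberships when a user leaves a group.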
Config
Configuration is primarily driven by LDAP settings since directory syncs from LDAP. The directory module uses these config keys:
[identity.ldap]
url = "ldaps://ldap.example.com"
delta_sync = "5m"    # Delta sync interval (default: 5 minutes)
full_sync = "60m"    # Full sync interval (default: 60 minutes)
base_dn = "dc=example,dc=com"
user_base_dn = "ou=users,dc=example,dc=com"
group_base_dn = "ou=groups,dc=example,dc=com"
bind_dn = "cn=service,dc=example,dc=com"
password = "service-password"
# ... additional LDAP config (see ldap module)
Sync behavior:
Full sync: retrieves ALL users and groups from LDAP, rebuilds entire cache. Users are processed in small batches to prevent cluster overload on large directories. Indexes are built in bulk after all users are stored, avoiding per-user-per-group round trips that scale as O(users x groups). Runs on startup plus periodic interval.
Delta sync: retrieves only MODIFIED users/groups since last sync timestamp, updates changed entries cluster-wide. Indexes updated incrementally. Default interval: 5 minutes.
Data TTL: 24 hours by default. Entries evicted automatically if not refreshed by sync. This acts as a safety net for stale data.
Hot-reloadable: sync intervals, LDAP connection settings. Cold (restart required): none specific to directory module.
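A delta sync of this kind usually comes down to an LDAP filter that ANDs the normal user filter with a timestamp comparison. The sketch below shows one way to build such a filter with a generalizedTime value. The `delta_filter` helper is hypothetical, and the exact filter the module sends is not documented here.

```python
from datetime import datetime, timezone


def delta_filter(base_filter: str, delta_field: str, last_sync: datetime) -> str:
    """AND `base_filter` with `delta_field >= last_sync` (generalizedTime, UTC)."""
    stamp = last_sync.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")
    return f"(&{base_filter}({delta_field}>={stamp}))"


last_sync = datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc)
print(delta_filter("(&(objectClass=inetOrgPerson)(uid=*))", "modifyTimestamp", last_sync))
# (&(&(objectClass=inetOrgPerson)(uid=*))(modifyTimestamp>=20240501123000Z))
```

Using `>=` re-fetches entries modified in the same second as the previous sync, which is harmless because applying an unchanged entry again is idempotent.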
Troubleshooting
Common symptoms and diagnostic steps:
User not found in directory cache:
- Check LDAP connectivity: 'auth ldap' or LDAP health check
- Verify user exists in LDAP: search by username in LDAP directly
- Check sync status: 'directory status' to see last sync time and health
- Force full sync: 'directory sync' to trigger immediate LDAP sync
- Check delta_sync interval: user may not yet be synced if recently created
- TTL expiry: if sync has not run in 24 hours, entries may have been evicted
Stale group memberships (user has old groups):
- Delta sync only picks up changes since last sync timestamp
- Force full sync to rebuild entire cache from LDAP
- Check modifyTimestamp field in LDAP (delta_field config)
- Group membership changes in LDAP must update the user's modifyTimestamp
- Verify index consistency: indexes rebuild automatically on data changes
User disabled but still has active sessions:
- Directory auto-revokes OIDC tokens and web sessions on disable
- Revocations are asynchronous and replicated to all nodes (eventual)
- SSH bastion sessions terminate on next token refresh cycle
- Check if disable was detected: look for directory disable log entries
Sync failures (directory shows unhealthy):
- LDAP connection failures: check network, TLS certs, bind credentials
- Timeout errors: increase search_timeout in LDAP config
- Large directory: increase page_size for paginated LDAP searches
- Memory pressure: large user counts may stress the cluster cache
- Check metrics: directory_sync_total{result="failure"} for error counts
- Check logs: search for "directory sync" in telemetry output
Cluster operation queue saturation during sync:
- Can occur during full sync on directories with many groups per user
- Check 'directory status': if last sync timestamp matches the error, sync is the cause
- The directory uses batched sync with bulk indexing to prevent this, but extremely large directories or low concurrency limits may trigger it
- Mitigation: increase operations.max_concurrent_ops in cluster config
Index inconsistency (email lookup fails but user exists):
- Indexes are maintained automatically, should auto-repair on next sync
- Force full sync to rebuild all indexes from scratch
- Check if email field is populated in LDAP user entry
- Verify ldap_attribute_map.email matches LDAP schema
Metrics for monitoring:
- directory_sync_total{type="full|delta", result="success|failure"}
- directory_users_synced (gauge): users synchronized in last sync
- directory_groups_synced (gauge): groups synchronized in last sync
- directory_sync_duration{type="full|delta"} (histogram): sync timing
- Use these to track sync health, LDAP connectivity, and capacity growth
Security
Security properties and enforcement:
No password caching:
The directory module never stores passwords. Authentication always goes through LDAP bind (via the ldap provider module). The cache contains only identity attributes: username, email, groups, status flags.
Automatic credential revocation on user disable:
When a user is marked as disabled (Disabled=true), the directory module automatically revokes all authentication credentials:
1. OIDC tokens: all access and refresh tokens deleted cluster-wide
2. Web sessions: all user sessions invalidated immediately
3. SSH bastion: sessions terminate on next token refresh
All revocations are replicated asynchronously to all cluster nodes.
Account status enforcement:
- Disabled accounts: checked by all auth flows before granting access
- Password expiry: tracked and enforced, expiry time available in AuthStatus
- Group membership: drives authorization decisions across all modules
Cluster-wide consistency:
- Data replicated to all nodes automatically
- Disabled user detection triggers revocation on all nodes
- No single point of failure for authorization decisions
Input validation:
- Username lookups are case-sensitive (as stored in LDAP)
- Group name matching is case-insensitive in consumer modules
- Email index uses normalized form for consistent lookups
Interpreting tool output:
'directory status':
  Healthy: Status="Ready / Healthy", Sync Errors=0, Consecutive Errors=0
  Degraded: Status shows errors, Consecutive Errors > 0; LDAP may be unreachable
  Stale: Last Sync time is old (> 2x sync interval); sync may be stuck
  Action: Degraded → 'auth ldap' to check LDAP connection health
'directory user <username>':
  Found: Shows username, email, groups, disabled status, password expiry
  Not found: User does not exist in directory cache; check LDAP source
  Disabled=true: User is locked; sessions will be revoked at next refresh
  Expired password: User will get password-expired session type on next login
  Action: User missing → 'directory sync' to force immediate LDAP sync
Relationships
Module dependencies and interactions:
- LDAP provider: Primary data source. Directory calls LDAP for all sync operations (search users, search groups, full and delta sync).
- SCIM provider: Alternative data source. SCIM providers sync users/groups into the directory, bypassing LDAP for cloud-sourced identities (Okta, Azure AD, OneLogin).
- Cluster cache: All user/group data stored with 24h TTL and replicated to all cluster nodes. Indexes maintained automatically on data changes.
- Sessions: Session revocation on user disable. Directory triggers revocation of all web sessions cluster-wide.
- OIDC provider: Token revocation on user disable. Directory triggers deletion of all access and refresh tokens.
- Firewall: Consumes group membership for ACL rule matching. Groups fetched at peer chain update time via directory queries.
- Group changes trigger firewall and ACL updates automatically.
- Configuration: Hot-reloadable sync intervals and LDAP settings.
- Telemetry: Structured logging for sync operations, user disable events, and error conditions.
Logs
Log entries by component. Search with: logs search "directory"
Levels: ERROR > WARN > INFO > DEBUG > TRACE.
Init (module startup):
directory.init        INFO   Directory service disabled - no LDAP configured
directory.init        INFO   Waiting for LDAP connection pool to initialize
directory.init        INFO   Initializing directory service
directory.init        INFO   Cluster and memory storage ready, starting initial sync
directory.init        ERROR  Initial sync failed
directory.init        INFO   Directory service initialized
Callback registration:
directory.callback    INFO   Registered user updated callback
directory.callback    INFO   Registered user disabled callback
Full sync (periodic and on-demand):
directory.sync.full   INFO   Full sync loop started
directory.sync.full   ERROR  Full sync failed
directory.sync.full   INFO   Starting full sync from LDAP
directory.sync.full   ERROR  Failed to call LDAP GetAllUsers
directory.sync.full   ERROR  Failed to get users from LDAP
directory.sync.full   WARN   (dynamic license enforcement message)
directory.sync.full   INFO   Retrieved users from LDAP
directory.sync.full   INFO   Retrieved groups from LDAP
directory.sync.full   INFO   Syncing users and groups to cluster storage
directory.sync.full   WARN   Failed to store user
directory.sync.full   WARN   Failed to store group
directory.sync.full   INFO   Full sync completed
Delta sync (periodic incremental):
directory.sync.delta  INFO   Delta sync loop started
directory.sync.delta  ERROR  Delta sync failed
directory.sync.delta  INFO   Starting delta sync from LDAP
directory.sync.delta  WARN   (dynamic license enforcement message)
directory.sync.delta  DEBUG  Retrieved modified users from LDAP
directory.sync.delta  DEBUG  Retrieved modified groups from LDAP
directory.sync.delta  DEBUG  No changes detected
directory.sync.delta  INFO   Syncing modified objects to cluster storage
directory.sync.delta  INFO   Delta sync completed
Single-user sync (on-demand):
directory.syncuser    ERROR  Failed to call LDAP GetUser
directory.syncuser    ERROR  Failed to get LDAP response
directory.syncuser    ERROR  Invalid LDAP response type
directory.syncuser    DEBUG  User not found in LDAP
directory.syncuser    ERROR  Failed to broadcast cache update
directory.syncuser    WARN   Cache update had errors
directory.syncuser    INFO   User synced successfully
Admin:
directory.admin       INFO   Manual full sync requested
Index maintenance (OnUserSet / OnUserDelete callbacks):
directory.index       WARN   Failed to update email index
directory.index       WARN   Failed to update user-groups index
directory.index       WARN   Failed to update group-members index
directory.index       WARN   Failed to update disabled index
directory.index       INFO   User disabled, revoking OIDC tokens and sessions
directory.index       WARN   Failed to initiate OIDC token revocation
directory.index       WARN   Failed to initiate session revocation
directory.index       DEBUG  Calling user disabled callback via hexdcall
directory.index       WARN   Failed to call user disabled callback
directory.index       WARN   Failed to call user updated callback
directory.index       WARN   Failed to remove from email index
directory.index       WARN   Failed to remove from user-groups index
directory.index       WARN   Failed to remove from disabled index
Bulk index (after full sync):
directory.index.bulk  INFO   Bulk indexes built
Metrics
Prometheus metrics. Query with: metrics prometheus directory_<name>
Sync counters:
directory_sync_total     counter    {type, result}  Sync operations completed
  Labels: type="full"|"delta", result="success"|"failure"
Sync gauges:
directory_users_synced   gauge      {}              Users synchronized in last full sync
directory_groups_synced  gauge      {}              Groups synchronized in last full sync
Sync latency:
directory_sync_duration  histogram  {type}          Sync processing time
  Labels: type="full"|"delta"
Alerts:
changes(directory_sync_total{result="success"}[10m]) == 0   No successful syncs (LDAP connectivity)
directory_sync_duration{type="full"} > 60s                  Full sync taking too long
changes(directory_users_synced[1h]) == 0                    No syncs completing
LDAP Provider
Connects to LDAP directories for user and group synchronization — Active Directory, FreeIPA, OpenLDAP
Overview
Connects to LDAP directories to search, bind, and synchronize user and group data. Provides the foundation for the directory cache — all LDAP queries flow through pooled, TLS-secured connections. Supports Active Directory, FreeIPA, and OpenLDAP with configurable search filters and delta sync.
Core capabilities:
- Pre-populated connection pool with configurable size
- TLS connections with optional custom CA certificate validation
- User search with custom LDAP filters and paginated result sets
- Group search with nested group resolution (recursive member expansion)
- User authentication via LDAP bind (no password caching)
- Delta sync support via modifyTimestamp queries for incremental updates
- Health checks with per-server latency reporting
- Configurable timeouts for search, bind, connection, and pool operations
- Smart startup retry logic with permanent vs transient error classification
- Hot-reloadable configuration for connection settings
Each cluster node maintains its own LDAP connection pool independently. The LDAP module does not replicate data or maintain caches — the directory module handles cluster-wide caching. No quorum requirements for LDAP operations.
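The startup retry logic mentioned in the capability list can be sketched as a classifier plus a bounded exponential backoff. The permanent result codes (49, 32, 34) and the 2-minute retry budget come from the Troubleshooting section. The 1-second base delay, the 30-second cap, and all function names are assumptions. Certificate validation failures, which the documentation also treats as permanent, are left out for brevity.

```python
from typing import Callable, Iterator, Optional

# LDAP result codes treated as permanent: invalid credentials,
# no such object, invalid DN syntax.
PERMANENT_LDAP_CODES = {49, 32, 34}


def is_permanent(ldap_code: Optional[int]) -> bool:
    """None stands for a network-level failure that carries no LDAP code."""
    return ldap_code in PERMANENT_LDAP_CODES


def backoff_delays(base: float = 1.0, cap: float = 30.0,
                   budget: float = 120.0) -> Iterator[float]:
    """Yield doubling delays (capped) while the running total fits in `budget`."""
    elapsed, delay = 0.0, base
    while elapsed + delay <= budget:
        yield delay
        elapsed += delay
        delay = min(delay * 2, cap)


def connect_with_retry(connect: Callable[[], Optional[int]],
                       sleep: Callable[[float], None]) -> bool:
    """`connect` returns 0 on success, an LDAP code, or None on network error."""
    delays = backoff_delays()
    while True:
        code = connect()
        if code == 0:
            return True
        if is_permanent(code):
            return False  # fail fast, retrying cannot help
        delay = next(delays, None)
        if delay is None:
            return False  # retry budget exhausted
        sleep(delay)


print(list(backoff_delays()))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting; production code would pass `time.sleep`.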
Supported LDAP schemas:
- FreeIPA
- Active Directory
- OpenLDAP
- Generic LDAP servers (via configurable attribute mapping)
Config
Required configuration in config.toml:
[identity.ldap]
url = "ldaps://ldap.example.com"                        # Primary LDAP server (ldaps:// for TLS)
base_dn = "dc=example,dc=com"                           # Base DN for all searches
user_base_dn = "ou=users,dc=example,dc=com"             # Base DN for user searches
group_base_dn = "ou=groups,dc=example,dc=com"           # Base DN for group searches
bind_dn = "cn=service,dc=example,dc=com"                # Service account DN for binding
password = "service-password"                           # Service account password
user_attribute = "uid"                                  # Primary user identifier attribute
user_filter = "(&(objectClass=inetOrgPerson)(uid=*))"   # User search filter
delta_field = "modifyTimestamp"                         # Field for delta sync queries
page_size = 1000                                        # LDAP paged search size
ldap_connection_pool = 5                                # Connection pool size
ca_pem = """-----BEGIN CERTIFICATE-----..."""           # CA cert for TLS validation
# Timeout Configuration (Go duration strings)
search_timeout = "30s"                                  # Search operation timeout (default: 30s)
bind_timeout = "10s"                                    # Bind/auth operation timeout (default: 10s)
connection_timeout = "10s"                              # New connection timeout (default: 10s)
pool_wait_timeout = "5s"                                # Pool connection wait timeout (default: 5s)
# Delta sync and full sync intervals (used by directory module)
delta_sync = "5m"                                       # Delta sync interval
full_sync = "60m"                                       # Full sync interval
[identity.ldap_attribute_map]
username = "uid"                                        # Username attribute
full_name = "cn"                                        # Full name attribute
email = "mail"                                          # Email attribute
given_name = "givenName"                                # First name attribute
surname = "sn"                                          # Last name attribute
member_of = "memberOf"                                  # Group membership attribute
# Additional attributes configurable per LDAP schema
Fallback URLs:
Multiple LDAP URLs can be configured. The connection pool tries the primary URL first and falls back to alternates on failure.
Hot-reloadable: connection settings, timeouts, attribute mappings. New connections use current config automatically.
Cold (restart required): none, but pool recreated on config change.
Troubleshooting
Common symptoms and diagnostic steps:
LDAP connection failures on startup:
- Permanent errors (fail fast, no retry):
  * LDAP Code 49: Invalid Credentials (wrong bind_dn or password)
  * LDAP Code 32: No Such Object (wrong base_dn)
  * LDAP Code 34: Invalid DN Syntax
  * Certificate validation failures (wrong or expired CA)
- Transient errors (retry with exponential backoff up to 2 minutes):
  * Connection timeout, connection refused
  * DNS resolution failures, network errors
- Check: 'auth ldap' for LDAP health status
Connection pool exhaustion:
- Symptom: pool_wait_timeout errors, slow LDAP operations
- Check pool metrics: ldap_pool_available, ldap_pool_utilization_pct
- Increase ldap_connection_pool size in config
- Check for slow LDAP queries holding connections: ldap_operation_duration
- Check for stale connections: ldap_pool_reconnects counter
Authentication failures (Bind operation):
- LDAP Code 49: Invalid credentials for user
- Check ldap_bind_failures{reason="invalid_credentials"} metric
- Verify user DN resolution: user_attribute + user_base_dn must match
- Check if user is locked/disabled in LDAP (not same as bind failure)
Search returning no results:
- Verify user_filter syntax: must be valid LDAP filter expression
- Check user_base_dn: must be correct organizational unit
- Verify page_size: too small may cause incomplete results
- Check search_timeout: large directories may need longer timeout
- Test with custom filter via SearchUsers operation
Delta sync not detecting changes:
- Verify delta_field attribute exists in LDAP schema (e.g., modifyTimestamp)
- Check timestamp format: must be generalizedTime
- Some LDAP servers don't update modifyTimestamp on group membership changes
- Force full sync via directory module if delta sync misses changes
Health check reporting unhealthy:
- HealthCheck probes each configured URL individually (primary + fallbacks)
- Returns per-server latency and error details
- At least one server must be reachable for Healthy=true
- Check network connectivity, firewall rules, TLS certificates
Key metrics for monitoring:
- ldap_operations_total{operation, status}: operation success/failure rates
- ldap_operation_duration{operation}: latency histograms
- ldap_pool_utilization_pct: connection pool health
- ldap_pool_errors{reason}: pool-level errors
- ldap_bind_success / ldap_bind_failures: authentication rates
Security
Security properties and hardening:
Transport security:
All connections use TLS (ldaps://). The module supports custom CA certificate validation via the ca_pem config field. TLS is mandatory; plain LDAP (ldap://) connections are not supported in production.
No password caching:
User authentication is performed via LDAP bind on every request. Passwords are never stored, cached, or logged by the module. The service account password is the only credential stored in config.
LDAP injection prevention:
All user-supplied values in LDAP filters are properly escaped using the go-ldap library's escaping functions. This prevents filter injection attacks where crafted usernames could alter search semantics.
DN validation:
Distinguished Names are validated to prevent directory traversal attacks where crafted DNs could access entries outside the configured base DN.
Startup credential validation:
The module refuses to start if LDAP credentials are invalid (Code 49). This prevents running with misconfigured service accounts that would silently fail all authentication attempts.
Connection pool security:
- Service account bind performed on every new connection
- Stale connections detected and reconnected automatically
- Pool connections are not shared between user bind operations
- Each user bind gets a dedicated connection from the pool
Module data:
Module data storage has been moved to Hexon KV (NATS JetStream). LDAP is no longer used as a moduledata storage backend.
Relationships
Module dependencies and interactions:
- Directory: Primary consumer. Directory module calls LDAP for all sync operations (full and delta sync), user authentication via bind, individual user lookups, and readiness checks before starting syncs.
- Authentication modules: Use LDAP bind indirectly via the directory for authentication decisions. Some modules use bind directly for password verification.
- webauthn/totp: Module data storage is handled by the moduledata module (Hexon KV), not by LDAP.
- Configuration: Hot-reloadable. New connections automatically use current config settings. Attribute mappings and timeouts are reloadable.
- Telemetry: Structured logging for connection events, search operations, bind results, and error conditions.
Logs
Log entries by operation. Search with: logs search "ldap"
Levels: ERROR > WARN > INFO > DEBUG. No AUDIT entries in this module.
Initialization:
ldap.init  INFO   LDAP provider disabled - no URL configured
ldap.init  INFO   Initializing LDAP connection pool
ldap.init  ERROR  Failed to initialize LDAP connection pool
ldap.init  INFO   LDAP provider initialized and ready
Connection Pool:
ldap.pool  DEBUG  Initializing connection pool
ldap.pool  DEBUG  Creating connection N/M (per-connection progress)
ldap.pool  DEBUG  Connection N/M created successfully
ldap.pool  INFO   Connection pool initialized successfully
ldap.pool  WARN   Transient error, retrying in Xs (attempt N)
ldap.pool  ERROR  Permanent error during connection - refusing to start
ldap.pool  ERROR  Exceeded max retry duration - refusing to start
Connection Lifecycle:
ldap.conn  DEBUG  Using custom CA certificate / Using system CA certificates
ldap.conn  DEBUG  Attempting to connect with HA failover
ldap.conn  DEBUG  Dialing LDAP URL
ldap.conn  DEBUG  Successfully connected
ldap.conn  DEBUG  Binding with service account
ldap.conn  DEBUG  Successfully bound with service account
ldap.conn  WARN   LDAP server failed, trying next
ldap.conn  ERROR  Failed to dial LDAP
ldap.conn  ERROR  Failed to bind
Metrics
Prometheus metrics. Query with: metrics prometheus ldap_<name>
Operations:
ldap_operations_total       counter  {operation, status}  LDAP operation count
  operation=bind, status=success|invalid_credentials|error          Bind outcomes
  operation=search_users, status=success|error, paged=true|false    User search outcomes
  operation=search_groups, status=success|error, paged=true|false   Group search outcomes
ldap_operation_duration     latency  {operation, status}  LDAP operation latency
  Same label sets as operations_total
Bind:
ldap_bind_success           counter  {}                   Successful user binds
ldap_bind_failures          counter  {reason}             Failed user binds
  reason=invalid_credentials    Wrong password
  reason=ldap_error             Server/network error
Connection Pool:
ldap_pool_errors            counter  {reason}             Pool-level errors
  reason=not_initialized        Pool not ready
  reason=pool_closed            Pool shut down
  reason=config_unavailable     Config missing on reconnect
  reason=reconnect_failed       Reconnect attempt failed
  reason=timeout                Pool wait timeout
ldap_pool_reconnects        counter  {reason}             Pool reconnections
  reason=stale_connection       Stale conn on acquire
  reason=stale_on_release       Stale conn on release
ldap_pool_acquire_duration  latency  {reconnected}        Time to acquire connection
  reconnected=true|false
ldap_pool_available         gauge    {}                   Available connections in pool
ldap_pool_capacity          gauge    {}                   Total pool capacity
ldap_pool_utilization_pct   gauge    {}                   Pool utilization percentage
Search Results:
ldap_search_results         gauge    {type}               Result set size
  type=users|groups
ldap_paged_search_pages     gauge    {type}               Pages fetched in paged search
  type=users|groups
Alerts:
ldap_pool_utilization_pct > 90                          Pool near exhaustion
rate(ldap_pool_errors{reason="timeout"}[5m]) > 0        Pool starvation
rate(ldap_bind_failures{reason="ldap_error"}[5m]) > 0   LDAP server issues
rate(ldap_operations_total{status="error"}[5m]) > 5     Search failures
OIDC Relying Party
Federates authentication to external identity providers — Azure AD, Okta, Google, or any OIDC-compliant IdP
Overview
Authenticates users through external identity providers using OpenID Connect. Keeps existing IdPs (Azure AD, Okta, Google) as the authentication source while the gateway handles sessions and access policy. Applies when the gateway is configured to delegate primary authentication to an external provider.
Core capabilities:
- Authorization Code Flow with PKCE (RFC 7636, S256 method only)
- DPoP token binding (RFC 9449) for proof-of-possession
- Pushed Authorization Requests (RFC 9126) for secure parameter submission
- Token Introspection (RFC 7662) for active token validation
- Token Revocation (RFC 7009) for controlled token lifecycle
- ID token validation with signature, issuer, audience, nonce, and age checks
- UserInfo endpoint for fetching additional user claims
- Configurable claim mapping for provider-specific attribute names
- OIDC discovery with 24-hour caching and lazy initialization
- JWKS fetching with 1-hour caching and key rotation support
- AES-GCM encrypted state parameters with cluster-derived keys
- Multi-provider support with independent configuration per provider
- Session management (OIDC Session 1.0) with session ID tracking
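The PKCE S256 method from RFC 7636 derives the code challenge as the base64url-encoded SHA-256 digest of the verifier, without padding. The sketch below uses a 64-byte random verifier, matching the size stated under Security. The function names are illustrative and not the module's API.

```python
import base64
import hashlib
import secrets


def b64url(data: bytes) -> str:
    """Base64url without '=' padding, as required by RFC 7636."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_code_verifier(num_bytes: int = 64) -> str:
    """Random verifier; 64 bytes encode to 86 characters (allowed range: 43-128)."""
    return b64url(secrets.token_bytes(num_bytes))


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))."""
    return b64url(hashlib.sha256(verifier.encode("ascii")).digest())


# Test vector from RFC 7636, Appendix B.
print(s256_challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"))
# E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

Only the challenge travels in the authorization request. The verifier stays server-side until the token exchange, which is what defeats interception of the authorization code.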
Authorization flow:
1. Client calls Authorize -> module generates PKCE verifier, encrypts state
2. Module stores session in distributed cache (10 min TTL)
3. Module returns authorization URL for user redirect
4. User authenticates with external IdP
5. IdP redirects back with authorization code and state
6. Client calls Callback -> module validates state, exchanges code for tokens
7. Module validates ID token (signature, claims, nonce)
8. Module returns tokens and user identity claims
Startup behavior:
Module does NOT pre-fetch discovery metadata on startup to avoid blocking initialization if IdPs are temporarily unreachable. Discovery is fetched lazily on first use and cached for 24 hours.
Config
Providers configured via TOML array:
[[identity.oidc_providers]]
name = "azure"                                            # Internal provider identifier
display_name = "Microsoft Azure AD"                       # UI display name
icon = "microsoft"                                        # Icon identifier (optional)
issuer = "https://login.microsoftonline.com/{tenant}/v2.0"  # OIDC issuer URL
client_id = "your-client-id"                              # OAuth 2.0 client ID
client_secret = "your-client-secret"                      # Client secret (optional with PKCE)
scopes = ["openid", "profile", "email", "groups"]         # Must include "openid"
redirect_uris = ["https://app.example.com/callback"]      # Allowed redirect URIs
pkce_required = true                                      # Require PKCE (default: true)
dpop_enabled = false                                      # Enable DPoP token binding
timeout = "30s"                                           # HTTP timeout (default: 30s)
discovery_ttl = "24h"                                     # Discovery cache TTL (default: 24h)
dev_mode = false                                          # Relaxed validation (NEVER in production)
clock_skew_tolerance = "2m"                               # Per-provider clock tolerance
strict_key_expiry = false                                 # Reject expired JWKS keys
required_amr = ["mfa"]                                    # Required auth methods (optional)
suppress_error_details = false                            # Hide error details in responses
[identity.oidc_providers.claim_mapping]
preferred_username = "upn"                                # Map IdP claims to standard names
groups = "groups"
[identity.oidc_providers.extra_params]
prompt = "select_account"                                 # Additional authorize endpoint params
Multiple providers can be configured simultaneously. Each provider has independent settings, discovery cache, and health status.
Cache TTLs (distributed across the cluster):
Discovery metadata: 24 hours
JWKS keys: 1 hour
Auth sessions: 10 minutes
DPoP JTI replay prevention: 2 minutes
Hot-reloadable: provider settings, claim mappings, extra params, timeouts. Discovery and JWKS caches refresh on TTL expiry.
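The 2-minute DPoP JTI entry among the cache TTLs amounts to a deduplication window: a proof identifier seen once is rejected until its entry expires. Below is a single-process sketch of that idea. The module itself keeps this state in the distributed cache, and `JtiReplayCache` is a hypothetical name.

```python
from typing import Dict


class JtiReplayCache:
    """Remember each DPoP proof `jti` for `ttl_seconds` and reject repeats."""

    def __init__(self, ttl_seconds: float = 120.0) -> None:
        self.ttl = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def check_and_store(self, jti: str, now: float) -> bool:
        """Return True if `jti` is fresh, False if it was seen within the TTL."""
        # Purge entries whose window has passed.
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[key]
        if jti in self._expires_at:
            return False  # replay inside the dedup window
        self._expires_at[jti] = now + self.ttl
        return True


cache = JtiReplayCache()
print(cache.check_and_store("proof-1", now=0.0))    # True  (first use)
print(cache.check_and_store("proof-1", now=60.0))   # False (replay)
print(cache.check_and_store("proof-1", now=121.0))  # True  (window expired)
```

Taking `now` as a parameter keeps the example deterministic; a real caller would pass a monotonic or wall-clock timestamp.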
Troubleshooting
Common symptoms and diagnostic steps:
Authorization flow fails with “invalid_request”:
- Verify redirect_uri matches exactly what is registered with the IdP
- Check scopes include "openid" (required for OIDC)
- Verify client_id is correct for the target IdP
- Check if provider requires client_secret (some do even with PKCE)
- For PAR failures, module falls back to standard authorization URL
Callback fails with “invalid_grant”:
- Authorization code may have expired (typically 5-10 minutes)
- Code already exchanged (single-use enforcement by IdP)
- PKCE code_verifier mismatch (state corruption or session expired)
- Check auth session TTL: sessions expire after 10 minutes
ID token validation fails:
- Issuer mismatch: verify issuer config matches IdP exactly
- Audience mismatch: client_id must be in token audience claim
- Token expired: check clock sync between gateway and IdP
- Nonce mismatch: state/session corruption during flow
- Signature failure: JWKS cache may be stale, force refresh
- Blocked algorithm: none, HS256/384/512 are rejected for security
- Allowed algorithms: RS256-512, ES256-512, EdDSA
Discovery fetch failures:
- Check network connectivity to IdP's .well-known endpoint
- Verify issuer URL is correct (common mistake: wrong tenant ID)
- TLS errors: gateway must trust IdP's certificate chain
- Timeout: increase timeout setting for slow IdP responses
- Check metrics: oidc_rp.discovery_fetch_failures_total
DPoP errors:
- "invalid_dpop_proof": proof JWT validation failed
- "use_dpop_nonce": server requires DPoP nonce (retry with nonce)
- JTI replay detected: same proof used twice (2-min dedup window)
- Key thumbprint mismatch between proof and token binding
Token refresh fails:
- Refresh token expired or revoked by IdP
- Scope validation: requested scopes must be subset of original grant
- DPoP-bound tokens require DPoP proof on refresh
- Check oidc_rp.token_refresh_failures_total{reason} for details
Provider health check shows unhealthy:
- Discovery endpoint unreachable (network, DNS, TLS issues)
- JWKS not cached (no tokens validated yet, lazy fetch)
- Check per-provider health via 'auth oidc' admin command
Key metrics for monitoring:
- oidc_rp.authorization_initiated_total: flow start rate
- oidc_rp.token_exchange_success_total / failures_total: conversion rate
- oidc_rp.state_validation_failures_total: security events
- oidc_rp.discovery_fetch_failures_total: IdP connectivity
- oidc_rp.token_exchange_duration: IdP latency
Security
Security measures and hardening:
PKCE (RFC 7636):
64-byte code_verifier (512 bits entropy) with S256 method only. Plain method is blocked. Code verifier stored server-side only, never exposed to the browser. Prevents authorization code interception attacks.

State protection:
AES-GCM encryption with cluster-derived key and domain separation. Single-use enforcement (deleted after validation). 10-minute TTL prevents replay attacks on stale authorization sessions.

ID token validation (defense-in-depth):
- Signature verified against JWKS from IdP
- Issuer must match configured issuer exactly
- Audience must contain the configured client_id
- Expiration checked with configurable clock skew tolerance
- Nonce validated to prevent replay attacks
- Token age validation (max 10 minutes from issuance)
- at_hash validation linking ID token to access token (OIDC Core 3.1.3.6)

Algorithm restrictions:
Blocked: none, HS256, HS384, HS512 (symmetric algorithms)
Allowed: RS256-512, ES256-512, EdDSA (asymmetric only)
RSA key size: 2048-8192 bits only
RSA exponent: only standard values (3, 17, 65537)
These restrictions prevent algorithm confusion and weak key attacks.

DPoP (RFC 9449):
Proof-of-possession for token binding. JTI replay prevention with 2-minute distributed cache. JWK thumbprint computation per RFC 7638. Confirmation claim (cnf) validation required when DPoP is enabled.

Pushed Authorization Requests (RFC 9126):
Authorization parameters sent directly to IdP (not in browser URL). Prevents parameter exposure in browser history and URL bar. Larger parameter payloads possible without URL length limits. Falls back to standard URL if PAR endpoint unavailable.

Error disclosure control:
Configurable suppress_error_details for production environments. Sensitive information masked in API responses. Full details logged server-side for debugging. Token values and secrets never logged.

Connection security:
TLS 1.2+ required for all IdP connections. Max 3 redirects with HTTPS required on redirect. Response size limits prevent memory exhaustion (Discovery: 1MB, JWKS: 512KB, Token/UserInfo: 256KB).

Relationships
Module dependencies and interactions:
- Sign-in flow engine: Primary consumer. The sign-in flow engine uses the OIDC RP module to initiate authorization flows and process callbacks for SSO-based authentication.
- proxy auth: Reverse proxy mappings can use OIDC providers for per-application SSO authentication via unified cookie solution.
- Distributed memory cache: Distributed cache for auth sessions (10 min TTL), discovery metadata (24h), JWKS (1h), and DPoP JTI replay prevention (2 min). Enables cross-node callback handling when user returns from IdP to a different cluster node.
- Directory: After OIDC authentication, user identity is matched against directory for group memberships and authorization. OIDC claims (email, groups) may be used for just-in-time provisioning.
- Configuration: Hot-reloadable provider configuration. Provider settings, claim mappings, and timeouts are reloadable.
- Cluster: Read operations run locally for low latency. Session storage and DPoP JTI tracking are replicated to all nodes so callbacks work regardless of which node the user returns to.
- Telemetry: Structured logging for authorization flows, token operations, and security events. Metrics exported for all major operations with provider-level labels.
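When the sign-in flow engine initiates an authorization flow, the PKCE parameters described under Security (a 64-byte verifier and the S256 method only) are derived as defined in RFC 7636. The sketch below illustrates that derivation with the Python standard library; the function names are illustrative and are not the module's API.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    # RFC 7636 uses base64url without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_code_verifier(num_bytes: int = 64) -> str:
    # 64 random bytes encode to 86 characters, inside the 43-128 range
    # that RFC 7636 allows for a code_verifier.
    return _b64url(secrets.token_bytes(num_bytes))


def s256_challenge(verifier: str) -> str:
    # code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())
```

The verifier stays on the server side; only the challenge and code_challenge_method=S256 go into the authorization request, and the verifier is sent later with the token exchange.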
Logs
Log entries by component. Search with: logs search “identity.oidc” Levels: ERROR > WARN > INFO > DEBUG > TRACE.
Init (module startup):
identity.oidc.init  DEBUG  No OIDC providers configured, module inactive
identity.oidc.init  ERROR  AUDIT  Invalid OIDC provider configuration
identity.oidc.init  INFO  OIDC RP module initialized

Authorize (build authorization URL):
identity.oidc.authorize  DEBUG  Building authorization URL
identity.oidc.authorization  INFO  AUDIT  Authorization URL built
identity.oidc.authorization  WARN  Failed to delete auth session

Callback (authorization code callback):
identity.oidc.callback  WARN  AUDIT  IdP returned error
identity.oidc.callback  WARN  AUDIT  State validation failed
identity.oidc.callback  DEBUG  Processing authorization callback

Discovery (OIDC discovery metadata):
identity.oidc.discovery  DEBUG  Fetching discovery metadata
identity.oidc.discovery  WARN  Dev mode enabled - endpoint validation relaxed
identity.oidc.discovery  INFO  AUDIT  Discovery metadata fetched and validated
identity.oidc.discovery  WARN  Failed to cache discovery metadata
identity.oidc.discovery  WARN  Invalid cached metadata type

JWKS (JSON Web Key Set):
identity.oidc.jwks  DEBUG  Fetching JWKS
identity.oidc.jwks  INFO  AUDIT  JWKS fetched
identity.oidc.jwks  WARN  Failed to cache JWKS

Token (exchange, refresh, revocation, introspection):
identity.oidc.token  DEBUG  Exchanging code for tokens
identity.oidc.token  INFO  AUDIT  Token exchange successful
identity.oidc.token  DEBUG  Refreshing access token
identity.oidc.token  INFO  AUDIT  Token refresh successful
identity.oidc.token  WARN  AUDIT  Provider does not support token revocation
identity.oidc.token  INFO  AUDIT  Token revocation acknowledged
identity.oidc.token  DEBUG  Token introspection completed

Validate ID Token:
identity.oidc.validate_id_token  DEBUG  Validating ID token
identity.oidc.validate_id_token  WARN  AUDIT  ID token validation failed

UserInfo (fetch user claims):
identity.oidc.userinfo  DEBUG  Fetching user info from external IdP
identity.oidc.userinfo  DEBUG  Fetching user info
identity.oidc.userinfo  INFO  AUDIT  User info fetched

DPoP (Demonstration of Proof-of-Possession):
identity.oidc.dpop  WARN  AUDIT  Failed to check/store DPoP JTI
identity.oidc.dpop  WARN  DPoP JTI SetNX wait failed
identity.oidc.dpop  ERROR  Unexpected SetNX response type

PAR (Pushed Authorization Requests):
identity.oidc.par  DEBUG  Pushing authorization request to IdP
identity.oidc.par  WARN  AUDIT  PAR endpoint returned error with unparseable body
identity.oidc.par  WARN  Non-standard request_uri format from IdP
identity.oidc.par  WARN  PAR expires_in missing, using default
identity.oidc.par  WARN  PAR expires_in outside RFC 9126 recommended range
identity.oidc.par  INFO  AUDIT  PAR request successful
identity.oidc.par  WARN  Discovery failed, falling back to standard authorization
identity.oidc.par  DEBUG  PAR not supported, using standard authorization
identity.oidc.par  WARN  PAR request failed, falling back to standard authorization
identity.oidc.par  INFO  Authorization URL built with PAR

Provider info:
identity.oidc.get_provider  DEBUG  Fetching provider metadata
identity.oidc.list_providers  DEBUG  Listed OIDC providers

Health:
identity.oidc.health_check  DEBUG  Health check completed

Refresh (entry point):
identity.oidc.refresh  DEBUG  Refreshing access token

Revoke (entry point):
identity.oidc.revoke  DEBUG  Revoking token with external IdP

Introspect (entry point):
identity.oidc.introspect  DEBUG  Introspecting token with external IdP

Metrics
Prometheus metrics. Query with: metrics prometheus oidc_rp_<name>
Authorization flow:
oidc_rp_authorization_initiated_total  counter  {provider}  Authorization URL built successfully
oidc_rp_state_validation_success_total  counter  {provider}  State validated and session consumed
oidc_rp_state_validation_failures_total  counter  {reason}  State validation failures (reason: decryption_failed, version_mismatch, state_expired, session_not_found, state_mismatch, csrf_mismatch)

Token exchange:
oidc_rp_token_exchange_success_total  counter  {provider}  Code-for-tokens exchange succeeded
oidc_rp_token_exchange_failures_total  counter  {provider, reason}  Exchange failures (reason: network_error, id_token_invalid, at_hash_mismatch, or IdP error code)
oidc_rp_token_exchange_duration  latency  {provider}  Token endpoint round-trip time

Token refresh:
oidc_rp_token_refresh_success_total  counter  {provider}  Refresh token exchange succeeded
oidc_rp_token_refresh_failures_total  counter  {provider, reason}  Refresh failures (reason: network_error, or IdP error code)
oidc_rp_token_refresh_duration  latency  {provider}  Token refresh round-trip time

Token revocation:
oidc_rp_token_revocation_success_total  counter  {provider}  Revocation acknowledged by IdP
oidc_rp_token_revocation_failures_total  counter  {provider, reason}  Revocation failures (reason: network_error, or IdP error code)

Token introspection:
oidc_rp_token_introspection_active_total  counter  {provider}  Introspection returned active=true
oidc_rp_token_introspection_inactive_total  counter  {provider}  Introspection returned active=false
oidc_rp_token_introspection_failures_total  counter  {provider, reason}  Introspection failures (reason: network_error, or IdP error code)
oidc_rp_token_introspection_duration  latency  {provider}  Introspection round-trip time

ID token validation:
oidc_rp_id_token_validation_success_total  counter  {provider}  ID token signature and claims validated

Discovery:
oidc_rp_discovery_cache_hits_total  counter  {provider}  Discovery served from cache
oidc_rp_discovery_cache_misses_total  counter  {provider}  Discovery cache miss, fetched from IdP
oidc_rp_discovery_fetch_success_total  counter  {provider}  Discovery fetched and validated
oidc_rp_discovery_fetch_failures_total  counter  {provider}  Discovery fetch failed
oidc_rp_discovery_fetch_duration  latency  {provider}  Discovery endpoint round-trip time

JWKS:
oidc_rp_jwks_cache_hits_total  counter  {provider}  JWKS key found in cache
oidc_rp_jwks_fetch_success_total  counter  {provider}  JWKS fetched from IdP
oidc_rp_jwks_fetch_failures_total  counter  {provider}  JWKS fetch failed
oidc_rp_jwks_fetch_duration  latency  {provider}  JWKS endpoint round-trip time

DPoP:
oidc_rp_dpop_jti_replay_total  counter  {provider}  DPoP JTI replay attack detected
oidc_rp_dpop_validation_success_total  counter  {provider}  DPoP proof validated successfully

PAR (Pushed Authorization Requests):
oidc_rp_par_success_total  counter  {provider}  PAR request accepted by IdP
oidc_rp_par_failures_total  counter  {provider, reason}  PAR failures (reason: discovery_failed, not_supported, invalid_redirect_uri, network_error, http_error, invalid_expires_in, expires_in_too_large, or IdP error code)
oidc_rp_par_request_duration  latency  {provider}  PAR endpoint round-trip time
oidc_rp_par_authorization_success_total  counter  {provider}  Authorization URL built via PAR flow

UserInfo:
oidc_rp_userinfo_success_total  counter  {provider}  UserInfo fetched successfully
oidc_rp_userinfo_failures_total  counter  {provider, reason}  UserInfo failures (reason: network_error, http_<status>)
oidc_rp_userinfo_fetch_duration  latency  {provider}  UserInfo endpoint round-trip time

Alerts:
rate(oidc_rp_state_validation_failures_total{reason="csrf_mismatch"}[5m]) > 0  CSRF attack attempt
rate(oidc_rp_state_validation_failures_total{reason="decryption_failed"}[5m]) > 5  State tampering or key mismatch
rate(oidc_rp_dpop_jti_replay_total[5m]) > 0  DPoP replay attack attempt
rate(oidc_rp_discovery_fetch_failures_total[5m]) > 3  IdP discovery unreachable
rate(oidc_rp_token_exchange_failures_total[5m]) > 10  Elevated token exchange failures

SCIM Identity Provider
Syncs users and groups from Okta, Azure AD, and other SCIM 2.0 providers with lifecycle management
Overview
Synchronizes users and groups from external SCIM 2.0 providers into the gateway’s directory. Supports Okta, Azure AD, OneLogin, JumpCloud, and any SCIM 2.0 compliant source. Handles full lifecycle: provisioning, updates, deprovisioning, and multi-provider merge with conflict resolution.
Core capabilities:
- Pull sync: scheduled full sync at configurable intervals with delta computation for minimal directory writes, per-provider sync workers
- Push sync (webhooks): real-time updates via HMAC-SHA256 signed events with atomic deduplication, fail-closed for destructive operations
- Multi-provider merge: priority-based attribute conflict resolution when multiple SCIM providers are configured (lower priority number wins)
- Nested group resolution: DAG traversal with cycle detection, configurable direction (up/down/both), and max depth limits
- Deletion safety: per-sync thresholds, cumulative daily limits, zero-user protection, two-step delete (disable then remove)
- Circuit breaker: consecutive failure detection, exponential backoff with automatic recovery on success
- Authentication: OAuth2 client_credentials, Bearer token, HTTP Basic
- SCIM path expressions: simple (userName), nested (name.givenName), array filter (emails[primary eq true].value)
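The three path forms in the last capability (simple, nested, array filter) can be illustrated with a small resolver. This is a sketch under stated assumptions, not the gateway's parser: the function name resolve is hypothetical, only the "eq" operator is handled, and paths are split on "." so a filter value containing a dot is not supported.

```python
import re

# One path segment: an attribute name, optionally followed by [filter].
_SEGMENT = re.compile(r"^(\w+)(?:\[(.*)\])?$")


def _matches(item: dict, flt: str) -> bool:
    # An empty filter ("groups[]") selects every element.
    if not flt:
        return True
    attr, op, raw = flt.split(None, 2)
    if op != "eq":
        raise ValueError("only 'eq' is supported in this sketch")
    want = {"true": True, "false": False}.get(raw, raw.strip('"'))
    return item.get(attr) == want


def resolve(resource: dict, path: str) -> list:
    """Return every value the path selects (empty list if none)."""
    current = [resource]
    for segment in path.split("."):
        m = _SEGMENT.match(segment)
        if m is None:
            raise ValueError("bad path segment: " + segment)
        name, flt = m.group(1), m.group(2)
        selected = []
        for node in current:
            value = node.get(name) if isinstance(node, dict) else None
            if value is None:
                continue
            if flt is None:
                selected.append(value)
            elif isinstance(value, list):
                selected.extend(
                    i for i in value if isinstance(i, dict) and _matches(i, flt)
                )
        current = selected
    return current
```

For a single-valued attribute such as email, a caller would take the first element of the returned list; for groups the whole list is the result.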
Cluster behavior:
Read operations (status, health, user/group queries) run on the local node. Write operations (sync, directory updates) are replicated to all nodes for cluster-wide consistency. Each node maintains independent SCIM clients and sync loops.

Config
Providers configured via TOML array:
[[identity.scim_providers]]
name = "okta"  # Internal identifier for the provider
enabled = true  # Whether provider is active (default: true)
priority = 1  # Merge priority, lower = higher (default: 10)
base_url = "https://example.okta.com/scim/v2"  # SCIM 2.0 base URL
auth_type = "oauth2"  # Authentication: "oauth2", "bearer", "basic"
oauth2_token_url = "https://example.okta.com/oauth2/v1/token"
oauth2_client_id = "client_id"  # OAuth2 client ID
oauth2_client_secret = "secret"  # OAuth2 client secret
oauth2_scopes = ["scim"]  # OAuth2 scopes to request
bearer_token = ""  # Static bearer token (auth_type = "bearer")
basic_username = ""  # HTTP Basic username (auth_type = "basic")
basic_password = ""  # HTTP Basic password (auth_type = "basic")
sync_interval = "15m"  # Background sync interval (default: "15m")
sync_timeout = "3m"  # Per-sync timeout (default: "3m")
max_nesting_depth = 5  # Maximum group nesting depth (default: 5)
nested_groups = false  # Enable nested group resolution (default: false)
nested_groups_direction = "up"  # Resolution direction: "up", "down", "both"
webhook_secret = "min-32-byte-secret"  # HMAC-SHA256 secret (minimum 32 bytes)

[identity.scim_providers.attribute_map]
username = "userName"
email = "emails[primary eq true].value"
full_name = "displayName"
given_name = "name.givenName"
surname = "name.familyName"
groups = "groups[].display"

Multiple providers with merge:
Provider okta (priority: 1) and azure (priority: 2) both have user alice. If both have different emails, okta's email wins (lower priority number). Group memberships are merged as union across all providers.

Webhook endpoint:
POST /webhook/scim/{provider}
Signature headers (checked in order):
  X-Webhook-Signature: sha256=<hex-hmac>
  X-Hub-Signature-256: sha256=<hex-hmac>
  X-Signature-256: sha256=<hex-hmac>
Max payload size: 256KB
Supported events: user.created, user.updated, user.deleted, user.disabled, group.created, group.updated, group.deleted

Hot-reloadable: provider settings, attribute maps, sync intervals, timeouts, webhook secrets, nested group settings.
Cold (restart required): none; providers fully reinitialize on config change.
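A sender or a test harness can reproduce the webhook signature check with the Python standard library. The sketch below is an illustration, not the gateway's implementation: verify_webhook is a hypothetical name, and it assumes that "checked in order" means the first signature header present is the one that is verified.

```python
import hashlib
import hmac

# Header names and limits as listed for the webhook endpoint.
SIGNATURE_HEADERS = ("X-Webhook-Signature", "X-Hub-Signature-256", "X-Signature-256")
MAX_PAYLOAD_BYTES = 256 * 1024
MIN_SECRET_BYTES = 32


def verify_webhook(secret: bytes, headers: dict, body: bytes) -> bool:
    # Reject before touching the payload: short secret or oversized body.
    if len(secret) < MIN_SECRET_BYTES or len(body) > MAX_PAYLOAD_BYTES:
        return False

    lowered = {k.lower(): v for k, v in headers.items()}
    for name in SIGNATURE_HEADERS:
        value = lowered.get(name.lower())
        if value is None:
            continue
        prefix, _, sent = value.partition("=")
        if prefix != "sha256":
            return False
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
        # compare_digest runs in constant time for equal-length inputs.
        return hmac.compare_digest(expected.encode(), sent.strip().lower().encode())

    # No signature header at all.
    return False
```

The signature is computed over the raw request body, so it has to be checked before the JSON is parsed or re-serialized.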
Troubleshooting
Common symptoms and diagnostic steps:
Sync not running or providers not initializing:
- Check provider enabled=true in config
- Verify base_url is reachable from the gateway node
- Check auth credentials: OAuth2 token URL, client_id/secret, bearer token
- Check: 'scim status' for provider initialization and sync status
- Check: 'scim health' for per-provider connectivity

Sync completing but no users/groups appearing:
- Verify attribute_map matches provider's SCIM schema
- Check SCIM path expressions match the provider's data format
- Trigger manual sync: 'scim sync' to test
- Check: 'directory users' and 'directory groups' for cached data

Circuit breaker open (sync suspended):
- The circuit opens after 10 consecutive sync failures
- Backoff starts at 30 seconds, doubles each time up to 30 minutes max
- Check: 'scim health' for circuit breaker state
- Fix underlying issue and wait for auto-recovery, or 'scim sync'

Webhook events not being processed:
- Verify webhook_secret is configured (minimum 32 bytes)
- Check HMAC signature format from provider
- Payload must be valid JSON, max 256KB
- Deduplication: same event ID processed only once (1 hour window)
- Check: 'logs search webhook' for rejection reasons

Webhook deletions being blocked:
- Per-sync threshold: defaults to max 10% of users or 50 absolute per cycle
- Cumulative daily limit: defaults to 200 deletions in a rolling 24-hour window
- Zero-user protection: deletions blocked when current user count is zero
- Timestamp freshness: destructive events require timestamp within 5 minutes

Multi-provider merge conflicts:
- Lower priority number wins for conflicting attributes
- Group memberships are always union (no conflict)
- Check: 'logs search "merge conflict"' for conflict details

Nested group resolution issues:
- Verify nested_groups=true and correct direction
- Check max_nesting_depth (default 5): deep hierarchies may be truncated
- Circular references: detected and logged, cycles broken at detection point

Security
Security properties and hardening:
Webhook verification (HMAC-SHA256):
Constant-time signature comparison prevents timing attacks. Signature verified before any payload parsing. Webhook secret must be minimum 32 bytes. Webhooks rejected if no secret configured for the provider.

Fail-closed destructive operations:
When deduplication fails due to cache errors, delete and disable events are blocked rather than allowed through. This prevents accidental mass deletion if the distributed cache is temporarily unavailable.

Deletion safety (defense-in-depth):
Per-sync thresholds limit the percentage and absolute count of deletions per cycle. A cumulative daily limit caps total deletions in a rolling 24-hour window. Zero-user protection blocks deletions when current count is zero. Two-step delete: disable first (triggers session revocation), then remove.

Timestamp freshness:
Destructive webhook events require a timestamp within 5 minutes. Stale destructive events are rejected to prevent replay.

Deduplication:
Atomic single-use enforcement with 1-hour TTL. Each webhook event ID consumed exactly once across the cluster. Prevents replay attacks.

Input validation:
UTF-8 correctness, control character rejection, length limits (256 chars max for usernames). Case-insensitive identity matching.

Connection security:
TLS 1.2+ required for all provider connections. Per-provider HTTP client with connection pooling. Configurable timeout (default 30s per request).

Relationships
Module dependencies and interactions:
- Directory: Primary consumer of synced data. SCIM writes users and groups to the directory with cluster-wide replication for consistency.
- Sessions: Receives cascading callbacks on user deprovisioning. When a user is disabled or deleted, active sessions are revoked immediately.
- OIDC provider: Receives cascading callbacks on user removal for token revocation and session cleanup.
- Configuration: Hot-reloadable provider settings, attribute maps, sync intervals, timeouts, webhook secrets, and nested group settings.
- Admin CLI: ‘scim status’, ‘scim health’, ‘scim sync’ commands for diagnostics and manual sync triggering.
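When 'scim health' reports an open circuit, the wait before the next attempt follows the schedule given under Troubleshooting: the circuit opens after 10 consecutive failures, and backoff starts at 30 seconds and doubles up to a 30-minute cap. The sketch below only works through that arithmetic; the names are hypothetical, and which counter drives the doubling inside the gateway is an assumption.

```python
FAILURE_THRESHOLD = 10      # consecutive sync failures before the circuit opens
BASE_BACKOFF_S = 30         # first wait after the circuit opens
MAX_BACKOFF_S = 30 * 60     # cap of 30 minutes


def circuit_is_open(consecutive_failures: int) -> bool:
    return consecutive_failures >= FAILURE_THRESHOLD


def backoff_seconds(retries_since_open: int) -> int:
    # Assumption: the wait doubles once per failed retry after opening.
    return min(BASE_BACKOFF_S * 2 ** retries_since_open, MAX_BACKOFF_S)


# Waits for the first eight retries, in seconds.
schedule = [backoff_seconds(n) for n in range(8)]
```

Under these assumptions the cap is reached on the seventh wait, so a provider that stays down is retried every 30 minutes from then on until a sync succeeds or 'scim sync' is run manually.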
Logs
Log entries by component. Search with: logs search “scim” Levels: ERROR > WARN > INFO > DEBUG > TRACE. AUDIT = persisted to audit trail.
Init (module startup):
identity.scim.init  INFO  SCIM provider disabled - no providers configured
identity.scim.init  INFO  Initializing SCIM provider
identity.scim.init  ERROR  Failed to initialize SCIM provider
identity.scim.init  INFO  SCIM provider initialized
identity.scim.init  INFO  SCIM identity provider ready

Hexdcall operations:
identity.scim.sync_all  DEBUG  Starting sync
identity.scim.sync  INFO  Sync completed
identity.scim.get_sync_status  DEBUG  Getting sync status
identity.scim.get_all_users  DEBUG  Getting all users
identity.scim.get_all_users  ERROR  Failed to list users
identity.scim.get_all_users  INFO  Retrieved users
identity.scim.get_all_groups  DEBUG  Getting all groups
identity.scim.get_all_groups  ERROR  Failed to list groups
identity.scim.get_all_groups  INFO  Retrieved groups
identity.scim.get_user  DEBUG  Getting user
identity.scim.get_group  DEBUG  Getting group
identity.scim.health_check  DEBUG  Checking health
identity.scim.process_webhook  DEBUG  Processing webhook

SCIM client (HTTP communication):
scim.client.list  DEBUG  Starting paginated user list
scim.client.list  DEBUG  Starting paginated group list
scim.client.list  WARN  Pagination safety limit reached
scim.client.list  INFO  Completed paginated user list
scim.client.list  INFO  Completed paginated group list
scim.client.retry  WARN  Retrying request
scim.client.oauth2  DEBUG  Refreshing OAuth2 token
scim.client.oauth2  INFO  OAuth2 token refreshed

Sync orchestrator:
identity.scim.sync  INFO  Starting full sync
identity.scim.sync  INFO  AUDIT  Full sync completed
identity.scim.sync  INFO  Starting incremental sync
identity.scim.sync  INFO  Incremental sync completed

Background sync manager:
identity.scim.sync  INFO  Starting background sync manager
identity.scim.sync  ERROR  Initial sync failed for provider
identity.scim.sync  INFO  Initial sync completed
identity.scim.sync.delta  INFO  Delta sync loop started
identity.scim.sync.delta  INFO  Delta sync loop stopping
identity.scim.sync.delta  INFO  No previous sync time, falling back to full sync
identity.scim.sync.delta  ERROR  Delta sync failed
identity.scim.sync.delta  INFO  Delta sync completed
identity.scim.sync.full  INFO  Full sync loop started
identity.scim.sync.full  INFO  Full sync loop stopping
identity.scim.sync.full  ERROR  Full sync failed
identity.scim.sync.full  INFO  Full sync completed
identity.scim.sync.full  ERROR  Cumulative 24h deletion threshold exceeded
identity.scim.sync.full  WARN  Per-sync deletion threshold exceeded
identity.scim.sync.full  WARN  Cannot get client for current state - treating as initial sync

Circuit breaker:
identity.scim.sync  ERROR  Circuit breaker opened - provider disabled after consecutive failures
identity.scim.sync  INFO  Circuit breaker closed - provider recovered
identity.scim.sync  INFO  Circuit breaker manually reset
identity.scim.sync.delta  WARN  Skipping delta sync - circuit open
identity.scim.sync.full  WARN  Skipping full sync - circuit open

Deprovisioning:
identity.scim.deprovisioning  ERROR  AUDIT  Deletion threshold exceeded - blocking hard deletions
identity.scim.deprovisioning  ERROR  Deletion requested with zero current users - blocking
identity.scim.deprovisioning  WARN  AUDIT  Disabling user
identity.scim.deprovisioning  WARN  AUDIT  Deleting user
identity.scim.deprovisioning  WARN  Deleting group

Nested group resolution:
identity.scim.nested  WARN  Max groups per user reached, truncating
identity.scim.nested  WARN  Max nesting depth reached
identity.scim.flatten  WARN  Max nesting depth reached during flattening

Multi-provider merge:
identity.scim.merge  WARN  Skipping user with invalid username
identity.scim.merge  INFO  Merge completed with conflicts
identity.scim.merge  WARN  Skipping group with invalid name
identity.scim.merge  WARN  Group membership truncated

Webhook processing:
identity.scim.webhook  ERROR  Webhook rejected: no webhook_secret configured for provider
identity.scim.webhook  WARN  Webhook payload exceeds size limit
identity.scim.webhook  WARN  Webhook signature verification failed
identity.scim.webhook  WARN  Failed to parse webhook payload
identity.scim.webhook  INFO  Processing webhook event
identity.scim.webhook  ERROR  Webhook event processing had errors
identity.scim.webhook  INFO  Webhook event processed successfully
identity.scim.webhook  WARN  Destructive webhook event missing timestamp
identity.scim.webhook  WARN  Webhook timestamp outside freshness window
identity.scim.webhook  ERROR  Deduplication check failed for destructive event, rejecting
identity.scim.webhook  WARN  Deduplication check failed, proceeding for non-destructive event
identity.scim.webhook  INFO  Duplicate webhook event, skipping
identity.scim.webhook  WARN  Cannot deduplicate destructive event (missing event_id/resource_id), rejecting
identity.scim.webhook  ERROR  Webhook deletion blocked: 24h cumulative threshold exceeded

Metrics
Prometheus metrics. Query with: metrics prometheus identity_scim_<name> or scim_client_<name>
SCIM client counters (module: scim_client):
scim_client_list_failures_total  counter  {provider, endpoint, reason}  Paginated list failures
scim_client_request_errors_total  counter  {provider, reason}  HTTP request errors (network/timeout)
scim_client_requests_total  counter  {provider, status}  HTTP requests by status code
scim_client_oauth2_failures_total  counter  {provider, reason}  OAuth2 token refresh failures
scim_client_oauth2_success_total  counter  {provider}  OAuth2 token refresh successes

SCIM client latency (module: scim_client):
scim_client_request_duration  histogram  {provider, method}  Per-request HTTP latency
scim_client_list_duration  histogram  {provider, endpoint}  Full paginated list latency

SCIM client gauges (module: scim_client):
scim_client_list_results  gauge  {provider, endpoint}  Resources returned from last list

Sync counters (module: identity.scim):
identity_scim_sync_started  counter  {provider, sync_type}  Sync operations started
identity_scim_sync_completed  counter  {provider, sync_type, status}  Sync operations completed
identity_scim_sync_failed  counter  {provider, sync_type, status}  Sync operations failed
identity_scim_delta_fallback_to_full  counter  {provider}  Delta syncs that fell back to full
identity_scim_circuit_opened  counter  {provider}  Circuit breaker open events
identity_scim_circuit_closed  counter  {provider}  Circuit breaker close events
identity_scim_deletions_blocked  counter  {provider, reason}  Deletion operations blocked by safety thresholds

Sync gauges (module: identity.scim):
identity_scim_users_synced  gauge  {provider}  Users from last sync
identity_scim_groups_synced  gauge  {provider}  Groups from last sync

Sync latency (module: identity.scim):
identity_scim_sync_duration  histogram  {provider, sync_type, status}  Sync processing time

Directory apply counters (module: identity.scim):
identity_scim_users_created  counter  {provider, source}  Users created in directory
identity_scim_users_updated  counter  {provider, source}  Users updated in directory
identity_scim_users_disabled  counter  {provider, source}  Users disabled in directory
identity_scim_users_deleted  counter  {provider, source}  Users deleted from directory
identity_scim_groups_created  counter  {provider, source}  Groups created in directory
identity_scim_groups_updated  counter  {provider, source}  Groups updated in directory
identity_scim_groups_deleted  counter  {provider, source}  Groups deleted from directory
identity_scim_sync_errors  counter  {provider, source}  Per-operation sync errors

Webhook counters (module: identity.scim):
identity_scim_webhook_total  counter  {provider, result}  Webhook events by result
Labels: result="success"|"unknown_provider"|"provider_disabled"|"no_secret_configured"|
  "empty_payload"|"payload_too_large"|"missing_signature"|"invalid_signature"|
  "parse_error"|"unknown_event_type"|"missing_timestamp"|"stale_event"|
  "duplicate"|"dedup_failed_closed"|"dedup_impossible"|"deletion_budget_exceeded"|
  "apply_error"

Alerts:
changes(identity_scim_sync_completed{status="success"}[30m]) == 0  No successful syncs
identity_scim_circuit_opened > 0  Circuit breaker tripped
rate(identity_scim_webhook_total{result="invalid_signature"}[5m]) > 0  Webhook signature failures
identity_scim_sync_duration > 120s  Sync taking too long