Connectivity
Client Access (HexonClient)
Transparent L3 network access via QUIC tunnels for CLI tools and native applications
Overview
The Client Access subsystem enables end users (DBAs, developers, operators) to transparently access internal resources through a lightweight QUIC tunnel. The HexonClient binary captures IP packets via TUN + gVisor netstack, extracts TCP flows, and dials each flow as a QUIC stream to the gateway.
The gateway side (this module) handles:
- QUIC listener on a dedicated port with ALPN “hexon-client” and TLS 1.3
- Two authentication paths: server-side device code (RFC 8628) for interactive use, JWT with RFC 5705 channel binding for reconnect/automation
- Per-user route derivation from firewall ACL rules (CIDR + Site routes)
- Virtual IP allocation from a dedicated subnet (default 100.64.208.0/22)
- Per-stream firewall ACL check before dialing backends
- Direct dial or connector tunnel routing based on HostAlias Site field
- Bidirectional splice with 32KB pooled buffers and half-close propagation
- DNS resolution on the control stream for split DNS
- DNS defense-in-depth: per-session O(1) rate limiting + ACL enforcement (RFC 8914)
- Token refresh with group-change detection and mid-session route updates
- Cluster-wide session tracking
This mirrors the connector architecture but reversed: the client opens streams, the gateway accepts and dials backends.
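The splice step from the overview (32KB pooled buffers, half-close propagation) can be sketched in Go as follows. This is a minimal sketch under assumptions, not the gateway's code. `pipeOneWay`, `splice` and the toy `memConn` endpoint are hypothetical names. Half-close is modeled with a `CloseWrite()` method like the one `*net.TCPConn` provides; a QUIC stream API may expose half-close differently.

```go
package main

import (
	"io"
	"strings"
	"sync"
)

// bufPool hands out 32 KB copy buffers so streams reuse memory instead of allocating per copy.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 32*1024)
		return &b
	},
}

// closeWriter is satisfied by *net.TCPConn and other endpoints that support half-close.
type closeWriter interface {
	CloseWrite() error
}

// pipeOneWay copies src into dst, then half-closes dst so the far side sees EOF
// while the opposite direction keeps flowing. Note: io.CopyBuffer skips the
// pooled buffer when src implements io.WriterTo or dst implements io.ReaderFrom.
func pipeOneWay(dst io.Writer, src io.Reader) (int64, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	n, err := io.CopyBuffer(dst, src, *bp)
	if cw, ok := dst.(closeWriter); ok {
		_ = cw.CloseWrite()
	}
	return n, err
}

// splice runs both directions concurrently and returns once both have finished.
func splice(a, b io.ReadWriter) (aToB, bToA int64) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); aToB, _ = pipeOneWay(b, a) }()
	go func() { defer wg.Done(); bToA, _ = pipeOneWay(a, b) }()
	wg.Wait()
	return aToB, bToA
}

// memConn is a hypothetical in-memory endpoint for illustration only:
// reads come from a fixed string, writes are recorded, CloseWrite sets a flag.
type memConn struct {
	in          io.Reader
	out         []byte
	writeClosed bool
}

func newMemConn(s string) *memConn { return &memConn{in: strings.NewReader(s)} }

func (m *memConn) Read(p []byte) (int, error) { return m.in.Read(p) }

func (m *memConn) Write(p []byte) (int, error) {
	m.out = append(m.out, p...)
	return len(p), nil
}

func (m *memConn) CloseWrite() error {
	m.writeClosed = true
	return nil
}
```

Each direction half-closes its destination as soon as its own source reaches EOF, so a client that has finished sending a request still receives the backend's complete response.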
Configuration
Configuration uses the [client_access] TOML section:
[client_access]
enabled = true
port = 8445
# network_interface = ""  # Bind to specific interface (falls back to service.network_interface)
# cert = ""               # Dedicated TLS cert (falls back to SNI/auto-TLS)
# key = ""                # Dedicated TLS key
subnet = "100.64.208.0/22"       # Virtual IP pool for clients (1022 addresses)
gateway_ip = "100.64.208.1"      # Gateway IP within subnet (excluded from pool)
dns_upstream = ["10.0.0.53"]     # DNS resolvers for client queries
dns_domains = []                 # Additional DNS domains pushed to all clients
# cidrs = ["10.0.0.0/22"]        # Additional CIDR routes pushed to all clients
heartbeat_interval = "30s"       # Heartbeat frequency (session TTL = 3x this)
token_refresh_interval = "45m"   # Client token refresh interval
max_idle_timeout = "5m"          # QUIC idle timeout
max_clients = 1000               # Maximum concurrent client connections
max_streams_per_client = 100     # Maximum concurrent TCP streams per client
dns_rate_limit = 100             # Maximum DNS queries per second per client
# required_groups = ["engineers", "operators"]  # Empty = any authenticated user
Each connected client gets one virtual IP from the pool. Use a dedicated subnet in the CGN shared address space (100.64.0.0/10, RFC 6598) to avoid overlap with other networks.
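A sketch of how VIP allocation from `subnet` might exclude `gateway_ip`, using Go's `net/netip`. Assumptions: the pool also skips the network and broadcast addresses and hands out the lowest free address first; `vipPool` and its methods are hypothetical. Under those assumptions the default /22 yields 1021 allocatable client addresses (1022 host addresses minus the gateway); the real pool's arithmetic may differ.

```go
package main

import (
	"errors"
	"net/netip"
	"sync"
)

// vipPool allocates host addresses from a subnet. Assumption for this sketch: the
// network address, the broadcast address and gateway_ip are never handed out.
type vipPool struct {
	mu     sync.Mutex
	prefix netip.Prefix
	gw     netip.Addr
	inUse  map[netip.Addr]bool
}

func newVIPPool(subnet, gateway string) (*vipPool, error) {
	p, err := netip.ParsePrefix(subnet)
	if err != nil {
		return nil, err
	}
	gw, err := netip.ParseAddr(gateway)
	if err != nil {
		return nil, err
	}
	p = p.Masked()
	if !p.Contains(gw) {
		return nil, errors.New("gateway_ip is outside subnet")
	}
	return &vipPool{prefix: p, gw: gw, inUse: map[netip.Addr]bool{}}, nil
}

// usable excludes the network address, the broadcast address (whose successor
// falls outside the prefix) and the gateway.
func (v *vipPool) usable(a netip.Addr) bool {
	return a != v.prefix.Addr() && v.prefix.Contains(a.Next()) && a != v.gw
}

// Allocate returns the lowest free address (a linear scan, acceptable for a /22).
func (v *vipPool) Allocate() (netip.Addr, error) {
	v.mu.Lock()
	defer v.mu.Unlock()
	for a := v.prefix.Addr(); v.prefix.Contains(a); a = a.Next() {
		if v.usable(a) && !v.inUse[a] {
			v.inUse[a] = true
			return a, nil
		}
	}
	return netip.Addr{}, errors.New("VIP pool exhausted")
}

// Release returns a VIP to the pool when its session ends.
func (v *vipPool) Release(a netip.Addr) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.inUse, a)
}

// Capacity counts the addresses the pool can ever hand out.
func (v *vipPool) Capacity() int {
	n := 0
	for a := v.prefix.Addr(); v.prefix.Contains(a); a = a.Next() {
		if v.usable(a) {
			n++
		}
	}
	return n
}
```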
Routes pushed to clients come from two sources:
- Firewall host aliases: CIDRs and IPs from aliases matched by user groups
- Config-level cidrs: pushed to all clients regardless of group membership
Both are merged (deduplicated) before being sent in ClientAck.
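Route merging can be sketched as follows, assuming routes arrive as strings that are either CIDRs or bare IPs. `mergeRoutes` and `parseRoute` are hypothetical. The sketch normalizes bare IPs to /32 (or /128), masks host bits, and drops exact duplicates only, so a /32 that falls inside a larger pushed CIDR is kept. Whether the real merge also collapses overlapping prefixes is not stated here.

```go
package main

import (
	"net/netip"
	"sort"
)

// parseRoute accepts "10.0.0.0/22" or a bare IP such as "10.0.1.5" (treated as /32 or /128).
func parseRoute(s string) (netip.Prefix, error) {
	if p, err := netip.ParsePrefix(s); err == nil {
		return p.Masked(), nil // "10.0.0.7/22" becomes "10.0.0.0/22"
	}
	a, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Prefix{}, err
	}
	return netip.PrefixFrom(a, a.BitLen()), nil
}

// mergeRoutes combines alias-derived routes with config-level cidrs, drops exact
// duplicates and sorts by address, then prefix length.
func mergeRoutes(aliasRoutes, configCIDRs []string) ([]netip.Prefix, error) {
	seen := map[netip.Prefix]bool{}
	var out []netip.Prefix
	for _, list := range [][]string{aliasRoutes, configCIDRs} {
		for _, s := range list {
			p, err := parseRoute(s)
			if err != nil {
				return nil, err
			}
			if !seen[p] {
				seen[p] = true
				out = append(out, p)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if c := out[i].Addr().Compare(out[j].Addr()); c != 0 {
			return c < 0
		}
		return out[i].Bits() < out[j].Bits()
	})
	return out, nil
}
```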
Admin commands
Admin CLI commands:
clients list [--user=X]          List connected hexonclient sessions (cluster-wide)
clients show <session_id>        Show full session details (device, network, streams, traffic, timing)
clients disconnect <user> [id]   Disconnect all sessions for user, or a specific session [WRITE]
Bastion shell commands (self-service, filtered to own sessions):
clients                           List your active hexonclient sessions
clients list                      Same as above
clients disconnect [session_id]   Disconnect your own session(s)
Security
Two authentication paths (determined by whether client sends a token):
Device code flow (interactive — RFC 8628, same as bastion SSH):
- Client establishes QUIC/TLS 1.3 connection (ALPN: “hexon-client”)
- Client sends ClientAuth with empty token (signals device code request)
- Gateway initiates device code authorization (server-side, no HTTP from client)
- Gateway sends DeviceCodeChallenge: verification URI, user code, expiry
- Client displays QR code + clickable URL + user code
- Gateway polls the device code service until authorized, denied, or expired
- On authorization: gateway extracts claims (username, email, groups) from poll response
- Gateway checks required_groups, derives routes, allocates VIP
- Gateway sends ClientAck with VIP, routes, DNS, and JWT tokens for reconnection
Reconnected sessions use the JWT path below (no re-authentication needed).
JWT flow (reconnect / automation):
- Client establishes QUIC/TLS 1.3 connection (ALPN: “hexon-client”)
- Client sends ClientAuth: JWT + HMAC-SHA256(token, TLS exporter) proof
- Gateway validates JWT (extracts username, groups)
- Gateway verifies channel binding proof (RFC 5705 prevents token replay)
- Gateway checks required_groups (if configured): user must have ANY listed group
- Client sends ClientRegister with device metadata
- Gateway derives per-user routes from firewall ACL rules
- Gateway sends ClientAck with VIP, routes, DNS, token refresh interval
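The channel-binding proof in step 2 can be sketched with Go's standard library. `tls.ConnectionState.ExportKeyingMaterial` is the RFC 5705 exporter in `crypto/tls`. The exporter label, the 32-byte length, and reading `HMAC-SHA256(token, TLS exporter)` as "token is the HMAC key, exporter output is the message" are assumptions. Because the exporter output is unique to each TLS session, a proof captured on one connection does not verify on another.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/tls"
)

// exporterLabel is hypothetical; the real application-specific label is not documented here.
const exporterLabel = "EXPORTER-hexon-client-example"

// exporterMaterial derives 32 bytes of keying material bound to this TLS session (RFC 5705).
func exporterMaterial(cs tls.ConnectionState) ([]byte, error) {
	return cs.ExportKeyingMaterial(exporterLabel, nil, 32)
}

// bindingProof is what the client sends: HMAC-SHA256 keyed with the token over the exporter output.
func bindingProof(token string, ekm []byte) []byte {
	m := hmac.New(sha256.New, []byte(token))
	m.Write(ekm)
	return m.Sum(nil)
}

// verifyBinding recomputes the proof on the gateway side and compares in constant time.
func verifyBinding(token string, ekm, proof []byte) bool {
	return hmac.Equal(bindingProof(token, ekm), proof)
}
```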
Per-stream access control:
- Each QUIC stream carries a DialHeader (host, port, protocol)
- Gateway checks firewall access control (user groups vs target host/port/protocol)
- Denied streams get DialStatusDenied response immediately
- Only allowed streams proceed to backend dial
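A minimal sketch of the per-stream decision: the ACL runs before any dial, and a denied stream never reaches the backend. The `DialHeader` field names, `DialStatusOK`/`DialStatusError`, and the `aclFunc`/`dial` callbacks are hypothetical stand-ins; only `DialStatusDenied` appears in the text above.

```go
package main

import (
	"net"
	"strconv"
)

// DialHeader carries the target of one QUIC stream (field names assumed).
type DialHeader struct {
	Host     string
	Port     uint16
	Protocol string // "tcp" or "udp"
}

type DialStatus int

const (
	DialStatusOK DialStatus = iota
	DialStatusDenied
	DialStatusError
)

// aclFunc stands in for the firewall check: user groups vs target host/port/protocol.
type aclFunc func(groups []string, h DialHeader) bool

// handleStream enforces the ACL before dialing; a denied stream returns
// DialStatusDenied immediately and the dial function is never called.
func handleStream(groups []string, h DialHeader, allowed aclFunc, dial func(addr string) error) DialStatus {
	if !allowed(groups, h) {
		return DialStatusDenied
	}
	// JoinHostPort brackets IPv6 literals correctly.
	if err := dial(net.JoinHostPort(h.Host, strconv.Itoa(int(h.Port)))); err != nil {
		return DialStatusError
	}
	return DialStatusOK
}
```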
DNS defense-in-depth:
- Per-session rate limiting: O(1) time-bucketed rolling window (dns_rate_limit qps)
- DNS ACL: after resolve, firewall checks user groups vs host aliases
- ACL-denied queries return DNSStatusDenied (RFC 8914 REFUSED) — prevents information leakage
- ACL call failure fails open (dial-time ACL is the authoritative control)
Token refresh:
- Client sends TokenRefresh with new JWT + proof before token expires
- Gateway re-validates JWT and channel binding
- Gateway re-checks required_groups: if user lost membership, connection is terminated
- If groups changed: re-derive routes, send RouteUpdate with add/remove entries
- Bad token on refresh kills the connection (security boundary)
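The RouteUpdate add/remove entries amount to a set difference between the old routes and the routes re-derived from the new groups. A minimal sketch; `diffRoutes` and `toSet` are hypothetical, and routes are compared as exact strings, so callers should normalize CIDRs first.

```go
package main

import "sort"

// toSet deduplicates a route list.
func toSet(routes []string) map[string]bool {
	s := make(map[string]bool, len(routes))
	for _, r := range routes {
		s[r] = true
	}
	return s
}

// diffRoutes returns sorted add/remove lists for a RouteUpdate after the
// user's routes are re-derived from their new groups.
func diffRoutes(old, next []string) (add, remove []string) {
	o, n := toSet(old), toSet(next)
	for r := range n {
		if !o[r] {
			add = append(add, r)
		}
	}
	for r := range o {
		if !n[r] {
			remove = append(remove, r)
		}
	}
	sort.Strings(add) // map iteration order is random; sort for stable output
	sort.Strings(remove)
	return add, remove
}
```

In this sketch an empty result on both sides means the group change did not affect routing, so there is nothing to send.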
Troubleshooting
Common symptoms and diagnostic steps:
Client cannot connect:
- Check listener: 'status summary' shows clientaccess listener state
- Check config: 'config show client_access' (enabled, port, subnet)
- Check required_groups: 'config show client_access' (user must be in at least one listed group; empty = any)
- Check certs: 'certs list' or 'diagnose domain <hostname>'
- Max clients reached: 'logs search clientaccess --level=warn'
- Group denied: 'logs search clientaccess --level=warn' shows "group access denied"
Client connected but cannot reach services:
- Check pushed routes: 'config show client_access' (cidrs) together with the user's matched host aliases; one of them must cover the destination subnet
- Check firewall rules: the user's groups must match rule sources
- Check HostAlias: destination alias must have matching hosts (CIDRs for TUN routes, wildcards for DNS only)
- Check connector: if Site is set, the connector for that site must be connected
- 'logs search clientaccess-dial --level=warn' for denied dials
DNS not resolving:
- Check dns_upstream config: must point to reachable resolvers
- Check dns_domains: domains must be in the pushed list for split DNS
- 'logs search clientaccess-dns' for resolution errors
- DNS ACL denied (REFUSED): check that the user's groups match firewall rules for the hostname
- DNS rate limited (SERVFAIL): check dns_rate_limit setting (default 100 qps)
Token refresh failures:
- 'logs search clientaccess-refresh --level=warn'
- Invalid token: OIDC provider may have rotated keys
- Channel binding failure: possible MITM or TLS session change
Relationships
Module dependencies:
- devicecode: Server-side device code authorization (RFC 8628) for interactive authentication
- oidc: JWT validation for reconnect/automation authentication
- firewall: Per-stream access control, DNS ACL enforcement, host alias route derivation
- dns: DNS resolution for client split DNS queries
- sessions: Cluster-wide session tracking (create, validate, revoke)
- connectors: Site-based routing through connector tunnels
- IP pool: Virtual IP allocation from dedicated subnet
- listener: QUIC listener with TLS 1.3 and idle timeout
- telemetry: Structured logging and Prometheus metrics
Logs
Log entries by component. Search with: 'logs search clientaccess'. Levels: ERROR > WARN > INFO > DEBUG.
Lifecycle:
clientaccess INFO initializing client access subsystem
clientaccess ERROR failed to create IP pool
clientaccess ERROR TLS config not available, client access listener disabled
clientaccess ERROR failed to create client access listener
clientaccess ERROR failed to start client access listener
clientaccess INFO client access listener started
Connection:
clientaccess INFO AUDIT client connected (VIP, routes, hostname)
clientaccess INFO AUDIT client disconnected (duration, traffic stats)
clientaccess WARN client rejected: max clients reached
clientaccess WARN unexpected first message type
Registration:
clientaccess INFO client registered (session, VIP, hostname)
clientaccess INFO client unregistered (session, duration, traffic counters)
Authentication — JWT:
clientaccess INFO/WARN client auth failed (INFO for PAT rejection, WARN otherwise)
clientaccess WARN channel binding failed
Authentication — Device Code:
clientaccess WARN device code auth rejected: concurrency limit reached
clientaccess WARN device code authorization request failed
clientaccess INFO device code challenge sent, waiting for authorization
clientaccess INFO client disconnected during device code auth
clientaccess INFO device code authorized
clientaccess INFO device code denied by user
clientaccess INFO device code expired
Authorization:
clientaccess WARN group access denied
Token Refresh:
clientaccess WARN token refresh failed: invalid token
clientaccess WARN token refresh failed: channel binding
clientaccess WARN group access revoked on refresh
clientaccess INFO token refreshed with group change
clientaccess DEBUG token refreshed
PAT Revocation:
clientaccess INFO disconnected clients after PAT revocation
Dial:
clientaccess WARN dial denied by ACL
clientaccess DEBUG dial failed
clientaccess DEBUG udp dial failed
clientaccess DEBUG dial accept stream error
Traffic:
clientaccess DEBUG client traffic
Hexdcall Module:
clientaccess.list_sessions WARN Registry not initialized
clientaccess.list_sessions DEBUG Listed client access sessions
clientaccess.disconnect_session WARN Username missing in disconnect request
clientaccess.disconnect_session WARN Registry not initialized
clientaccess.disconnect_session INFO Session not found on this node
clientaccess.disconnect_session INFO Disconnected client access session
clientaccess.disconnect_session INFO Disconnected all client access sessions for user
Metrics
Prometheus metrics. Query with: metrics prometheus clientaccess_<name>
Connections:
clientaccess_connections_total counter {} QUIC connections accepted
clientaccess_connections_active gauge {} Currently active QUIC connections
clientaccess_connections_rejected counter {reason} Connections rejected before auth
clientaccess_connection_duration latency {username?} Connection lifetime
Authentication:
clientaccess_auth_success_total counter {username?} Successful authentications
clientaccess_auth_failures_total counter {reason} Failed authentications
Clients:
clientaccess_clients_active gauge {} Registered client instances
Heartbeat:
clientaccess_heartbeat_latency latency {username?} Heartbeat RTT (raw)
Dial:
clientaccess_dials_total counter {} Dial requests received
clientaccess_dials_denied_total counter {} Dials denied by ACL
clientaccess_dials_success_total counter {} Dials completed successfully
clientaccess_dials_errors_total counter {} Dial errors (connect refused, timeout)
clientaccess_dial_latency latency {} Backend dial time
clientaccess_streams_active gauge {} Active QUIC dial streams
DNS:
clientaccess_dns_queries_total counter {} DNS queries processed
Alerts:
clientaccess_connections_active > max_clients * 0.9   Approaching client limit
rate(clientaccess_connections_rejected[5m]) > 10       Connection rejection spike
rate(clientaccess_auth_failures_total[5m]) > 10        Authentication failure spike
rate(clientaccess_dials_denied_total[5m]) > 20         ACL denial spike
max_clients is a config value, not a metric: substitute the configured number in the first alert (e.g. 900 for the default max_clients = 1000).
QUIC Connector
Connects remote sites to the gateway via outbound QUIC tunnel — no inbound ports required at the remote site
Overview
Enables access to services at remote sites without IPsec or opening inbound ports. A lightweight binary at the remote site dials out to the gateway over QUIC — the gateway routes traffic through the tunnel. All protocols work through connectors: HTTP proxy, SSH bastion, forward proxy, and SQL bastion.
A lightweight binary (hexonconnect) deployed at the remote site establishes an outbound QUIC connection to Hexon. Hexon then sends “dial” commands through this tunnel whenever a proxy mapping, bastion session, forward proxy rule, or firewall policy references that site via the “site” parameter.
Key capabilities:
- Zero-trust remote access: connector dials only what Hexon asks, nothing else
- Opaque site namespace: same IPs and DNS names across sites are irrelevant
- Stateless token auth: HMAC-derived tokens validated without storage
- Channel-bound authentication: RFC 5705 TLS Exported Keying Material prevents replay and MITM attacks — the token never travels on the wire
- Multi-instance HA: multiple connectors per site with adaptive load balancing
- Cross-node routing: any cluster node can route to any connector via adaptive inter-node forwarding — requests arriving at a node without connector instances are transparently forwarded to a node that has them
- Auto-reconnect: connector never gives up, exponential backoff on disconnect
- CDN-compatible: optional dedicated hostname and TLS certificate for direct access
Configuration
Configuration uses the [connector] TOML section:
[connector]
enabled = true
port = 8444
# hostname = "connector.example.com"  # Optional: dedicated hostname (CDN bypass)
# cert = "/path/to/cert.pem"          # Optional: file path or inline PEM
# key = "/path/to/key.pem"            # Optional: file path or inline PEM
[[connector.sites]]
id = "prod-asia-a8f3c1"
name = "Production Asia"
cidrs = ["203.0.113.0/24"]
max_instances = 3
rebalance = true       # Distribute across cluster nodes (default: true)
rebalance_retries = 5  # Accept after N soft-rejects (default: 5, 1-10)
TLS certificate resolution:
1. connector.cert/key when set (static certificate)
2. SNI callback: auto-TLS (ACME), certmanager, wildcard, or service certificate
If connector.hostname is set and no cert/key is provided, ACME will automatically provision a certificate for the connector hostname.
Usage across subsystems — add “site” parameter:
[[proxy.mapping]]
app = "API Asia"
host = "api-asia.example.com"
service = "http://api.default.svc.cluster.local:8080"
site = "prod-asia-a8f3c1"
# Shadow targets can also route through connectors:
[[proxy.mapping.shadow]]
name = "staging-mirror"
service = "https://staging.internal:8443"
site = "staging-eu"
# Circuit breaker fallback can use a different connector site:
[proxy.mapping.circuit_breaker]
fallback_mode = "service"
fallback_service = ["http://dr-backend:8080"]
fallback_site = "dr-europe"
# SSH cert rules — route bastion SSH through connector:
[[bastion.ssh_cert.rules]]
name = "remote-dc-ssh"
groups = ["devops"]
destinations = ["*.internal"]
site = "prod-asia-a8f3c1"
# SQL bastion — route database connections through connector:
[[sql_bastion.sites]]
name = "postgres-remote"
type = "postgres"
host = "pg.internal"
port = 5432
site = "prod-asia-a8f3c1"
# Firewall host aliases — route forward proxy traffic through connector:
[[firewall.aliases.hosts]]
name = "remote_services"
hosts = ["gitlab.internal", "jenkins.internal"]
site = "prod-asia-a8f3c1"
# Aliases with site skip nft rules — traffic goes through userspace QUIC tunnel
Token generation is deterministic from the cluster key — any node can validate.
Admin commands
Admin CLI:
connector list                  List configured sites and live connections
connector show <site-id>        Show site config, token, and connected instances (includes platform, origin with geo/ASN, system labels)
connector create <site-id>      Create new site (generates token)
connector revoke <site-id>      Block site, disconnect active QUIC tunnels
connector instances <site-id>   List connected instances with metrics
The “connector show” output includes per-instance details: platform (OS/arch), origin IP with country and ASN (via geo module), and system labels reported by the connector binary (kernel, OS version, runtime environment, memory, virtualization, PID, UID/GID).
The “connector revoke” command disconnects active QUIC tunnels in addition to revoking cluster sessions, causing connectors to reconnect (and be rejected).
Config reload cleanup: when a site is removed from config (via GitOps or hot reload), active QUIC connections for that site are automatically disconnected. The connector binary will reconnect but be rejected because the site is no longer in config. This prevents stale sessions from lingering in JetStream KV.
Security
Trust boundaries:
- Hexon Cluster (full trust): policy enforcement, identity, routing
- Connector (minimal trust): dials only what Hexon asks, no autonomous access
Authentication flow:
- QUIC/TLS 1.3 connection established (server cert, ECDHE, forward secrecy)
- Both sides compute TLS exporter keying material (RFC 5705) with an application-specific label
- Connector sends: site_id + HMAC of token bound to the TLS channel
- Hexon validates by recomputing from cluster key
Additional protections:
- Optional CIDR allowlist per site restricts connector source IPs
- max_instances limit prevents token abuse
- Instance selection uses epsilon-greedy adaptive algorithm with circuit breaker
- QUIC relay loop prevention: relay handler only dispatches locally, preventing infinite forwarding loops between nodes
- Cluster-wide rebalancing: soft-rejects excess connectors so they redistribute across gateway nodes (configurable per site, default 5 retries before accepting)
Inter-node forwarding
All cluster nodes can route to any connector site through QUIC relay.
When a request arrives at a node without local connector instances (or after local retries are exhausted), the dispatcher transparently relays through a peer node. The relay uses QUIC on the same connector port (8444) with ALPN “hexon-relay” and mTLS for peer authentication. Each relay request opens a QUIC stream, sends a dispatch header, and the peer dispatches locally through its QUIC connector tunnel.
All traffic types converge through the same dispatch path — this covers reverse proxy, forward proxy, client access (TCP/UDP), SSH bastion, SQL bastion, shadow targets, and probes.
Remote node IPs are cached (5s refresh) from cluster-wide connector sessions. Failed nodes are tracked by the cluster discovery health checks.
Loop prevention:
- The relay handler only dispatches locally (never relays further)
- A peer with no local instances returns an immediate error
Troubleshooting relay:
- Client-side metrics: relay_total (attempts), relay_success_total, relay_errors_total
- Server-side metrics: relay_served (requests handled), relay_rejected_total (auth failures)
- Relay rejected with “no_certificate”: peer isn’t presenting its service cert
- Relay rejected with “not_peer”: source IP not in cluster discovery peer list
- Relay “no_instances”: the peer node also has no local connectors for the site
- Check logs: 'logs search connectors.relay --level=warn'
QUIC tuning
QUIC performance tuning applied to both gateway and connector sides:
Flow control windows (tuned for database and bulk transfer workloads):
- Stream: 2MB initial, 8MB max
- Connection: 4MB initial, 20MB max
- Stream-to-connection ratio: 1:2 for initial windows, 2:5 for max windows (8MB:20MB)
Larger initial windows reduce round-trips for big responses (SQL results, file transfers).
Persistent QUIC transport (connector side):
- hexonconnect reuses one UDP socket across reconnections
- Avoids per-connection socket allocation and kernel offload state loss
- Enables future QUIC connection migration if the network interface changes
Stream error handling:
- Error paths immediately release QUIC stream resources instead of closing gracefully
- Frees resources under load without waiting for peer acknowledgment
Max concurrent streams:
- Gateway: configurable per listener (default 100)
- Connector: 1024 (high concurrency for multiplexed tunnel streams)
Rebalancing
When multiple connector replicas start simultaneously (e.g., Kubernetes Deployment with 3 replicas), they may all connect to the same gateway node via DNS or a load balancer. The rebalance mechanism redistributes them:
- First connector for a site on a node is always accepted
- Subsequent connectors check cluster distribution: if this node has more instances than the least-loaded remote node, the registration is soft-rejected
- The connector reconnects with a short backoff (2 seconds) — DNS/LB randomness typically sends it to a different node
- After N soft-rejects (configurable, default 5), the node accepts anyway
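The accept/soft-reject rule can be written as a pure function. Assumptions: `remote` lists every other gateway node with its instance count for the site (0 when it has none), and the gateway knows how many times this connector has already been soft-rejected; how that count is tracked is not documented here, and `shouldAccept` is a hypothetical name.

```go
package main

// shouldAccept decides whether this node accepts a connector registration.
// localCount: instances of the site already on this node.
// remote: instance count per other node (0 for nodes with none).
// softRejects: how often this connector was already soft-rejected.
func shouldAccept(localCount int, remote map[string]int, softRejects, maxRetries int, enabled bool) bool {
	if !enabled || localCount == 0 {
		return true // rebalance off, or first instance for the site on this node
	}
	if softRejects >= maxRetries {
		return true // retry budget spent: never leave a connector stuck
	}
	if len(remote) == 0 {
		return true // no other node to redistribute to
	}
	minRemote := -1
	for _, n := range remote {
		if minRemote < 0 || n < minRemote {
			minRemote = n
		}
	}
	// soft-reject only if this node is more loaded than the least-loaded peer
	return localCount <= minRemote
}
```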
Per-site configuration:
rebalance = true       # Enable cluster-wide load distribution (default: true)
rebalance_retries = 5  # Max soft-rejects before accepting (1-10, default: 5)
Rebalance is best-effort: sticky load balancers may prevent redistribution, so the retry budget ensures connectors are never stuck. Metrics: connectors_rebalance_reject_total and connectors_rebalance_accept_total track distribution activity per site.
Logs
Log entries by component. Search with: 'logs search connectors'. Levels: ERROR > WARN > INFO > DEBUG. DEBUG requires log level configuration.
Initialization:
connectors INFO initializing connector subsystem
connectors ERROR TLS config not available, connector listener disabled
connectors ERROR failed to create connector listener
connectors ERROR failed to start connector listener
connectors INFO connector listener started
Authentication:
connectors.handler WARN AUDIT connector auth failed: invalid proof
connectors.handler WARN AUDIT connector auth failed: unknown site
connectors.handler WARN AUDIT connector auth failed: source IP not allowed
Connection lifecycle:
connectors.handler INFO AUDIT connector connected
connectors.handler INFO AUDIT connector disconnected
Registry:
connectors.registry INFO AUDIT Connector instance registered
connectors.registry INFO AUDIT Connector instance unregistered
Session management:
connectors WARN failed to create session
connectors WARN session create wait failed
connectors WARN unexpected session create response type
connectors DEBUG failed to extend session
connectors DEBUG session extend wait failed
connectors WARN failed to revoke session
connectors WARN session revoke wait failed
Config reload:
connectors.reload INFO disconnected instances for removed site
Relay:
connectors.relay WARN AUDIT relay rejected: source IP not a cluster peer
connectors.relay DEBUG relay connection accepted
connectors.relay WARN relay fallback also failed after local exhaustion
Metrics
Prometheus metrics. Query with: metrics prometheus connectors_<name>
Connections:
connectors_connections_total counter {} Total connector connections
connectors_connections_active gauge {} Active connector connections
connectors_connections_rejected counter {reason} Rejected connections
connectors_connection_duration latency {site_id} Connection lifetime
Authentication:
connectors_auth_success_total counter {site_id} Successful authentications
connectors_auth_failures_total counter {site_id, reason} Authentication failures
Instances:
connectors_instances_active gauge {site_id} Active connector instances
connectors_heartbeat_latency latency {site_id} Heartbeat round-trip time
Dial (tunnel dispatch):
connectors_dials_total counter {site_id} Dial attempts through tunnel
connectors_dials_success_total counter {site_id} Successful dials
connectors_dials_errors_total counter {site_id, reason} Failed dials
connectors_dial_latency latency {site_id} Dial latency
connectors_streams_active gauge {} Active QUIC streams
Rebalance:
connectors_rebalance_reject_total counter {site_id} Soft-rejected for rebalance
connectors_rebalance_accept_total counter {site_id} Accepted after rebalance check
Inter-node forwarding (TCP-level):
connectors_forward_total counter {site_id, target} Forward attempts to peer node
connectors_forward_success_total counter {site_id, target} Successful forwards
connectors_forward_errors_total counter {site_id, target} Failed forwards
connectors_forward_latency latency {site_id, target} Forward latency
connectors_forward_local_total counter {site_id} Requests handled locally
Relay (QUIC inter-node dispatch):
connectors_relay_total counter {site_id, target} Client-side relay attempts
connectors_relay_served counter {site_id, target} Server-side relay requests handled
connectors_relay_success_total counter {site_id, target} Successful relay dispatches
connectors_relay_errors_total counter {site_id, reason} Failed relay dispatches
connectors_relay_rejected_total counter {reason} Relay connections rejected (auth)
Alerts:
rate(connectors_auth_failures_total[5m]) > 5    High auth failure rate (brute force or misconfiguration)
connectors_instances_active == 0                No connector instances (site unreachable)
rate(connectors_dials_errors_total[5m]) > 10    High dial failure rate (tunnel health)
rate(connectors_relay_rejected_total[5m]) > 0   Relay auth failures (cluster misconfiguration)
connectors_connections_active > 100             High connection count
DNS Resolution
Resolves DNS for all gateway components — custom resolvers, DNSSEC validation, caching, and health-aware failover
Overview
Handles DNS resolution for all gateway components — proxy backends, bastion hosts, cluster discovery, and ACME validation. Provides custom resolvers with automatic failover, caching, and DNSSEC validation.
Capabilities:
- Custom DNS resolvers with automatic failover and circuit breaker pattern
- DNSSEC validation in two modes: resolver-trust (fast) and full cryptographic (secure)
- Distributed DNS caching via the memory storage module (local reads, broadcast writes)
- Lookup coalescing to prevent cache poisoning from concurrent requests
- Hostname validation to block DNS injection attacks (null bytes, CRLF)
- IPv4 preference when both A and AAAA records are available
- CNAME flattening with configurable depth limit (default 16, per RFC 1034)
- DNS-over-TLS (DoT) support for encrypted transport (RFC 7858)
- Adaptive resolver selection using epsilon-greedy algorithm (20-40% lower latency)
- Health checking with exponential backoff and automatic system DNS fallback
- Typed DNS queries for 30+ record types (A, AAAA, CAA, TLSA, SRV, MX, etc.)
- Context propagation for request cancellation and graceful shutdown
- TTL sanitization to prevent integer overflow attacks (capped at 1 week)
Operations:
- Resolve: DNS resolution with optional DNSSEC, caching, and resolver selection
- ValidateHostname: RFC-compliant hostname validation against injection attacks
Config
Core configuration under [dns]:
[dns]
timeout = 5              # DNS query timeout in seconds (default: 5)
cache_ttl = 300          # Default cache TTL in seconds (default: 300)
cache_override = false   # Ignore DNS server TTL, always use cache_ttl (default: false)
resolvers = ["1.1.1.1:53", "8.8.8.8:53", "9.9.9.9:53"]  # DNS resolvers (default: cluster.cluster_dns_resolvers)
flatten_cname = true     # Follow CNAMEs to final A/AAAA records (default: true)
max_cname_depth = 16     # Max CNAME chain depth to prevent loops (default: 16)
DNSSEC settings:
dnssec_full_validation = false  # Full cryptographic RRSIG/DNSKEY validation (default: false)
dnssec_strict = false           # Fail if zone is not DNSSEC-signed (default: false)
DNS-over-TLS (DoT):
dot_enabled = false             # Enable DNS-over-TLS transport (default: false)
dot_port = 853                  # DoT port per RFC 7858 (default: 853)
dot_verify_server_cert = true   # Verify resolver TLS certificate (default: true)
Health checking:
health_check_enabled = true        # Enable resolver health monitoring (default: true)
health_check_interval = 30         # Health check interval in seconds (default: 30)
health_failure_threshold = 2       # Consecutive failures before marking unhealthy (default: 2)
health_check_query = "google.com"  # Domain used for health check probes (default: "google.com")
Adaptive resolver selection (epsilon-greedy ML):
adaptive_selector_enabled = true      # Enable adaptive resolver selection (default: true)
adaptive_exploration_rate = 0.10      # Exploration rate 0.0-1.0 (default: 0.10 = 10%)
adaptive_smoothing_factor = 0.3       # EMA smoothing factor for latency tracking (default: 0.3)
adaptive_min_sample_size = 100        # Queries before switching from learning to intelligent mode (default: 100)
adaptive_load_balance_enabled = true  # Penalize recently-used resolvers to spread load (default: true)
Resolver architecture — three separate resolver pools:
dns.resolvers                   # Infrastructure resolvers (health-checked, used by all modules)
cluster.cluster_dns_resolvers   # Cluster discovery resolvers (fallback if dns.resolvers unset)
proxy.dns.resolvers             # Proxy-specific override (must be subset of dns.resolvers)
Per-route proxy DNS overrides in [[proxy.mapping]]:
dnssec = true                    # Override global DNSSEC setting for this route
dns_resolvers = ["10.0.0.1:53"]  # Override resolvers for this route (must be in dns.resolvers)
TTL precedence (cache_override=false): DNS server TTL > dns.cache_ttl > 300s default.
TTL precedence (cache_override=true): dns.cache_ttl > 300s default.
TTL bounds: minimum 1 second, maximum 604800 seconds (1 week).
Cache key format: “dns_cache:{hostname}:{resolver_hash}” (128-bit SHA256 hash). Cache reads are local (no network). Cache writes broadcast to cluster (fire-and-forget).
Hot-reloadable: resolvers, DNSSEC settings, cache TTL, health check parameters, adaptive settings. Cold (restart required): dot_enabled, dot_port.
Troubleshooting
Common symptoms and diagnostic steps:
DNS resolution failures:
- Check resolver health: 'dns resolvers' shows status, latency, and failure counts
- Test specific hostname: 'dns test <hostname>' performs live resolution
- All resolvers unhealthy: module falls back to system DNS (/etc/resolv.conf)
- Resolver filtered out: proxy resolvers must be a subset of dns.resolvers
- Cross-subsystem check: 'diagnose domain <hostname>' tests DNS + proxy + TLS together
DNSSEC validation errors:
- Zone not signed: set dnssec_strict=false to allow unsigned zones (default)
- Resolver-trust mode: a compromised resolver can fake the AD bit — use dnssec_full_validation=true
- Full validation slow: first query ~200ms (chain of trust), cached queries ~50ms
- Clock skew: DNSSEC signatures have validity windows — ensure NTP is running
- Check validation: 'dns test <hostname> --dnssec' shows validation result and mode
- Strict mode blocking: dnssec_strict=true rejects all unsigned zones — check per-route override
Slow DNS resolution:
- Check cache hit rate: 'dns cache' shows hit/miss ratio and entry count
- High cache miss rate: increase cache_ttl or set cache_override=true for static backends
- Resolver latency: 'dns resolvers' shows per-resolver average latency (EMA)
- Adaptive selector: 'dns adaptive' shows resolver scores and selection distribution
- Learning phase: first 100 queries use round-robin — performance improves after
- CNAME chains: deep chains add latency per hop — check with 'dns test <hostname>'
All resolvers down (circuit breaker tripped):
- Health checker marks resolver unhealthy after 2 consecutive failures (configurable)
- Backoff schedule: 30s, 1m, 2m, 4m, 8m, 15m (max)
- System DNS fallback activates automatically when all custom resolvers fail
- Recovery is automatic — resolver returns to pool when health check succeeds
- Force re-check: 'dns health --reset' clears backoff timers
- Check: 'dns resolvers' shows healthy/unhealthy status and next retry time
Cache poisoning concerns:
- Lookup coalescing: concurrent requests for the same hostname share a single lookup result
- Per-hostname locking prevents race conditions (no global bottleneck)
- Enable DNSSEC (dnssec_full_validation=true) for cryptographic validation
- Use DoT (dot_enabled=true) to encrypt DNS transport against snooping
CNAME resolution issues:
- CNAME not followed: check flatten_cname=true (default)
- CNAME loop detected: max_cname_depth exceeded (default 16) — check DNS zone config
- CNAME + ACL: ACL checks use original hostname, not CNAME target (prevents bypass)
- Metrics: dns.cname_resolutions_total tracks success and depth_exceeded counts
DoT connection failures:
- Port blocked: DoT uses port 853 (RFC 7858); verify firewall rules
- Certificate error: set dot_verify_server_cert=false to diagnose (re-enable afterward)
- Non-standard port: the module warns if dot_port is not 853
502/503 from proxy due to DNS:
- DNSSEC failure blocks the connection (no system DNS fallback, for security)
- DNS infrastructure failure falls back to system DNS (availability)
- Fix: set dnssec=false on specific proxy routes for unsigned internal zones
- Verify: 'dns test <backend-hostname> --dnssec' to check DNSSEC status
Interpreting tool output:
'dns health':
  Healthy: Status=healthy, Healthy resolvers = total resolvers
  Degraded: Healthy < total; some resolvers failing, but DNS still works
  Down: Healthy=0; all resolvers failed, system DNS fallback active
  Action: Degraded/Down → 'dns resolvers' for a per-resolver breakdown
'dns resolvers':
  Healthy: Status=healthy, Latency < 50ms, Score > 100
  Degraded: Status=unhealthy with BackoffUntil timestamp; resolver is in the circuit breaker
  Learning: Score near 100 with low QueryCount; adaptive selector still calibrating (normal)
  Action: all unhealthy → check network connectivity to resolver IPs, verify port 53/853 is open
'dns test <hostname>':
  Success: IPs returned, TTL shown, DNSSEC=valid (if enabled)
  DNSSEC failure: DNSSEC=invalid; zone is unsigned or signatures expired
  No results: hostname does not resolve; check DNS zone configuration
  Action: DNSSEC failure + proxy 502 → set dnssec=false on that proxy route
Architecture
Resolution flow:
- Resolve request arrives (from proxy, bastion, ACME, or discovery)
- Hostname validation: RFC compliance check, injection prevention (null bytes, CRLF, length)
- Cache lookup: local memory read for “dns_cache:{hostname}:{resolver_hash}”
- If cache hit: return cached IPs immediately (no network call)
- If cache miss: acquire per-hostname lock (coalescing for concurrent requests)
- Resolver selection: adaptive selector picks best resolver (or round-robin during learning)
- Health filter: only healthy resolvers considered (circuit breaker pattern)
- DNS query: send query via UDP (or DoT if enabled) with configured timeout
- DNSSEC validation (if enabled):
  a. Resolver-trust mode: check AD bit in response
  b. Full validation: verify RRSIG signatures, validate DNSKEY chain to root trust anchor
- CNAME handling: if CNAME response and flatten_cname=true, recursively resolve target
- IPv4 preference: sort results with A records before AAAA records
- TTL extraction: from DNS response (DNSSEC/custom resolver) or use configured default
- TTL sanitization: clamp to [1s, 604800s], zero defaults to 300s
- Cache store: broadcast write to cluster memory (fire-and-forget, best-effort)
- Release per-hostname lock, waiting callers receive same result
- Return ResolveResponse with IPs, TTL, cached flag, DNSSEC validity
Adaptive resolver selection (epsilon-greedy):
Learning phase (first 100 queries): round-robin across all healthy resolvers
Intelligent phase: 90% exploitation (best score), 10% exploration (random)
Score = 100 + (success_rate * 50) - (avg_latency_ms / 10) - (timeout_pct * 30) - (consecutive_failures * 20) - (recently_used * 10)
Latency tracked via EMA: new_avg = 0.3 * sample + 0.7 * old_avg
Load-balancing penalty: -10 points if the resolver was used within the last 1 second
Health checker circuit breaker:
Healthy: failure_count = 0, available for selection
Unhealthy: failure_count >= threshold (default 2), excluded from selection
Backoff: 30s -> 1m -> 2m -> 4m -> 8m -> 15m (max)
Recovery: a single successful health check returns the resolver to the healthy state
System DNS fallback: automatic when ALL custom resolvers are unhealthy
Memory cleanup: resolver sync removes stale entries on config reload
DNSSEC full validation chain:
1. Query resolver with DO bit set
2. Extract RRSIG from response
3. Fetch DNSKEY for target zone (cached with TTL)
4. Verify RRSIG signature using DNSKEY (RSA/SHA-256, ECDSA P-256, Ed25519)
5. Fetch DS record from parent zone
6. Verify DNSKEY hash matches DS record
7. Recurse up to root zone
8. Validate root DNSKEY against hardcoded IANA trust anchor (KSK 20326)
9. Validate NSEC/NSEC3 for authenticated denial of existence
Distributed caching via memory module:
Read path: local-only (no network, no quorum)
Write path: broadcast to all cluster nodes (fire-and-forget)
Key format: "dns_cache:{hostname}:{sha256_hash_of_resolvers}" (collision-resistant)
Eviction: TTL-based (respects DNS TTL or configured override)
Coalescing: per-hostname mutex prevents concurrent duplicate lookups
Metrics emitted:
dns.resolve_total (tags: status, cached, dnssec)
dns.resolve_latency_ms (histogram)
dns.cache_hit_total / dns.cache_miss_total
dns.health_check_total (tags: resolver, status)
dns.adaptive_resolver_selected (tags: resolver, reason)
dns.resolver_score (gauge, tags: resolver)
dns.resolver_avg_latency_ms (gauge, tags: resolver)
dns.cname_resolutions_total (tags: status)
Relationships
Module dependencies and interactions:
- proxy: Backend hostname resolution for all proxy routes. Uses [dns] configuration by default. Per-route overrides via dnssec and dns_resolvers fields in [[proxy.mapping]]. DNSSEC validation failure blocks the connection (no system DNS fallback, which prevents downgrade). DNS infrastructure failure falls back to system DNS (availability).
- bastion: SSH connection and port forwarding hostname resolution. Uses [dns] configuration directly (no bastion-specific overrides). DNSSEC protects against SSH destination poisoning.
- discovery: Cluster peer discovery via DNS SRV records. Uses [dns] configuration for resolver settings. Critical for cluster formation and membership.
- acme: ACME challenge validation uses typed DNS queries (CAA record checking per RFC 8659). SERVFAIL handling distinguishes “no records” from “DNS infrastructure error” for security.
- memory: Distributed DNS cache storage. Local reads (fast), broadcast writes (best-effort). No quorum required; the cache is opportunistic and falls back to a fresh lookup on miss.
- config: Reads [dns] and [cluster] TOML sections. Hot-reload updates resolvers, DNSSEC settings, cache parameters, health check configuration, and adaptive selection tuning. Resolver sync cleans up stale resolver state on reload (memory leak prevention).
- metrics (telemetry): Emits counters, histograms, and gauges for resolution, caching, health checks, and adaptive selection. Enables monitoring dashboards and alerting.
Logs
Log entries by component. Search with: logs search "dns". Levels: ERROR > WARN > INFO > DEBUG > TRACE.
Init & Lifecycle:
dns.init INFO DNS module initialized
dns.health INFO DNS resolvers not configured, using cluster resolvers for health checking
dns.health WARN Failed to initialize resolver health manager
dns.health INFO Resolver health manager started
dns.health INFO Health checking enabled but no resolvers configured
dns.health INFO Resolver health checking disabled
dns.adaptive INFO Adaptive resolver selector initialized
dns.adaptive INFO Adaptive selector enabled but no resolvers configured
dns.adaptive INFO Adaptive resolver selector disabled
Resolution:
dns.resolve DEBUG DNS resolution request
dns.resolve DEBUG DNS cache hit
dns.resolve DEBUG Waiting for concurrent DNS lookup to complete
dns.resolve ERROR DNS lookup panicked
dns.resolve ERROR DNS resolution failed
dns.resolve INFO DNS resolution succeeded - no records found
dns.resolve INFO DNS resolution succeeded
Hostname Validation:
dns.validate WARN Hostname validation failed
Health Status:
dns.gethealth DEBUG DNS health status requested
Cache Operations:
dns.cache WARN Invalid cache entry type
dns.cache WARN Failed to broadcast DNS cache update
dns.cache DEBUG DNS result cached
DNSSEC Core:
dns.dnssec DEBUG Using DNS-over-TLS
dns.dnssec WARN DNS query failed
dns.dnssec DEBUG DNS query returned error
dns.dnssec.full DEBUG RRSIG present but AD bit not set - performing full validation
dns.dnssec.full ERROR Full DNSSEC validation failed
dns.dnssec.full INFO Full DNSSEC validation succeeded
dns.dnssec ERROR DNSSEC validation failed: RRSIG present but AD bit not set
dns.dnssec ERROR DNSSEC strict mode: zone not signed
dns.dnssec WARN DNSSEC validation skipped: zone not signed
dns.dnssec DEBUG DNSSEC validation succeeded (resolver-trust mode)
DNSSEC Validation:
dns.dnssec.validate WARN RRSIG signature verification failed
dns.dnssec.validate WARN RRSIG signature expired or not yet valid
dns.dnssec.validate DEBUG RRSIG signature validated successfully
dns.dnssec.dnskey WARN Failed to query DNSKEY
dns.dnssec.dnskey WARN DNSKEY query returned error
dns.dnssec.dnskey WARN No DNSKEY records found in zone
dns.dnssec.dnskey DEBUG DNSKEY records fetched successfully
dns.dnssec.validate ERROR DNSSEC strict mode: RRset not signed
dns.dnssec.validate DEBUG RRset has no RRSIG (zone not signed)
dns.dnssec.validate ERROR No matching DNSKEY found for RRSIG
dns.dnssec.validate INFO DNSSEC validation completed
DNSSEC Cache:
dns.dnssec.cache DEBUG DNSKEY cache hit
dns.dnssec.cache DEBUG DNSKEY cache expired
dns.dnssec.cache DEBUG DNSKEY cached
dns.dnssec.cache DEBUG DS cache hit
dns.dnssec.cache DEBUG DS cache expired
dns.dnssec.cache DEBUG DS cached
dns.dnssec.cache INFO DNSSEC cache cleared
DNSSEC Chain of Trust:
dns.dnssec WARN DEPRECATED: SHA-1 used in DNSSEC validation
dns.dnssec.ds WARN Failed to query DS
dns.dnssec.ds WARN DS query returned error
dns.dnssec.ds DEBUG No DS records found (zone may be unsigned or at root)
dns.dnssec.ds DEBUG DS records fetched successfully
dns.dnssec.chain WARN Failed to compute DS digest
dns.dnssec.chain DEBUG DNSKEY validated successfully using DS
dns.dnssec.chain ERROR DNSKEY validation failed: no matching DS found
dns.dnssec.chain DEBUG Validating chain of trust
dns.dnssec.chain INFO Root DNSKEY validated against trust anchor
dns.dnssec.chain ERROR Root DNSKEY validation failed: no matching trust anchor
DNSSEC NSEC/NSEC3:
dns.dnssec.nsec DEBUG No NSEC records found in response
dns.dnssec.nsec DEBUG Found NSEC records for validation
dns.dnssec.nsec INFO NSEC authenticated denial validated
dns.dnssec.nsec WARN NSEC validation failed: name not in range
dns.dnssec.nsec3 DEBUG No NSEC3 records found in response
dns.dnssec.nsec3 DEBUG Found NSEC3 records for validation
dns.dnssec.nsec3 WARN Unsupported NSEC3 hash algorithm
dns.dnssec.nsec3 ERROR Failed to compute NSEC3 hash
dns.dnssec.nsec3 INFO NSEC3 authenticated denial validated
dns.dnssec.nsec3 WARN NSEC3 validation failed: hash not in range
Resolver:
dns.resolve WARN Hostname validation failed
dns.ttl DEBUG Cache override enabled, using configured TTL
dns.ttl DEBUG Using DNS server TTL
dns.ttl DEBUG DNS server TTL not available, using fallback
dns.health DEBUG Filtered unhealthy resolvers
dns.resolve DEBUG DNS resolution succeeded
dns.resolve DEBUG DNS resolution failed, trying next resolver
dns.resolve ERROR All DNS resolvers failed
dns.resolve DEBUG Using system DNS resolver
dns.resolve DEBUG Using configured DNS cache TTL for system resolver
dns.resolve DEBUG DNS resolution succeeded
dns.dnssec DEBUG DNSSEC resolution succeeded
dns.dnssec WARN DNSSEC lookup failed, trying next resolver
dns.cname DEBUG Resolving CNAME target
dns.cname DEBUG CNAME record found
dns.cname WARN Failed to resolve CNAME target
dns.cname DEBUG CNAME chain returned (flatten disabled)
dns.query DEBUG Using DNS-over-TLS
dns.query WARN DNS query failed
dns.query DEBUG DNS query returned error
dns.query DEBUG DNS query completed
dns.query WARN DNS query returned SERVFAIL
Adaptive Resolver:
dns.adaptive ERROR Failed to create adaptive selector
dns.adaptive INFO Cleaned up performance data for removed resolvers
dns.adaptive INFO Adaptive resolver selector initialized
dns.adaptive TRACE Resolver performance updated
dns.adaptive INFO Adaptive selector learning phase completed, switching to intelligent selection
dns.adaptive DEBUG Adaptive DNS resolution succeeded
dns.adaptive DEBUG Adaptive DNS resolution failed, selecting another resolver
dns.adaptive ERROR All adaptive DNS resolution attempts failed
Health Manager:
dns.health INFO Initializing resolver health checks
dns.health ERROR Invalid resolver address format
dns.health WARN Initial health check failed
dns.health INFO Initial health check passed
dns.health ERROR No healthy DNS resolvers available
dns.health INFO Resolver health initialization complete
dns.health DEBUG Starting health check
dns.health WARN Health check query failed
dns.health WARN Health check returned nil response
dns.health DEBUG Health check returned error response
dns.health DEBUG Health check successful
dns.health DEBUG GetHealthyResolvers called
dns.fallback WARN All custom DNS resolvers unhealthy, falling back to system DNS
dns.fallback INFO Custom DNS resolver recovered, switching back from system DNS
dns.health WARN RecordSuccess called for unknown resolver
dns.health INFO Resolver recovered
dns.fallback INFO Custom DNS resolver recovered, switching back from system DNS
dns.health WARN RecordFailure called for unknown resolver
dns.health WARN Resolver marked unhealthy
dns.health INFO Starting resolver health checker
dns.health INFO Stopping resolver health checker
dns.health DEBUG Performing health checks
dns.health DEBUG Health check still failing
dns.health INFO Resolver recovered via health check
dns.health INFO Removed resolvers no longer in configuration
Metrics
Prometheus metrics. Query with: metrics prometheus dns_<name>
Resolution (namespace: dns):
dns_resolve_total counter {result, cached, dnssec} Resolution outcomes
  result=success, cached=true|false: Successful resolution
  result=nxdomain, cached=true|false: Domain not found (valid response)
  result=failure, cached=false: Resolution failed
dns_nxdomain_total counter {} NXDOMAIN responses (uncached)
dns_cache_hits counter {} Cache hits
dns_cache_misses counter {} Cache misses
dns_lookup_coalesced counter {} Lookups coalesced (shared concurrent result)
dns_lookup_performed counter {} Lookups actually performed
dns_cache_operations_total counter {operation, result} Cache write operations
  operation=set, result=success|error: Broadcast cache set outcomes
Resolver Selection (namespace: dns):
dns_resolver_queries_total counter {resolver, result} Per-resolver query outcomes
  result=success|nxdomain|failure: Query result per resolver
dns_system_dns_queries_total counter {result} System DNS fallback queries
  result=success|nxdomain|failure: System resolver outcomes
Transport (namespace: dns):
dns_transport_used counter {type, resolver} DNS transport protocol used
  type=udp|dot: UDP or DNS-over-TLS
CNAME Resolution (namespace: dns):
dns_cname_resolutions_total counter {status} CNAME chain resolution outcomes
  status=success|depth_exceeded: CNAME follow results
DNSSEC Validation (namespace: dns):
dns_dnssec_validations_total counter {result, resolver} Resolver-trust mode validations
  result=valid|invalid|unsigned: AD bit check outcomes
dns_dnssec_full_validations counter {result, resolver} Full cryptographic validations
  result=valid|invalid: RRSIG/DNSKEY verification outcomes
dns_dnssec_signature_validations counter {result, algorithm} RRSIG signature verifications
  result=valid: Successful signature check
dns_dnssec_dnskey_queries counter {result} DNSKEY record fetches
  result=success: DNSKEY query succeeded
dns_dnssec_response_validations counter {result} Full response validations
  result=valid: All RRsets validated
dns_dnssec_chain_validations counter {result} Chain of trust DS validations
  result=valid|invalid: DNSKEY-DS digest match
dns_dnssec_root_validations counter {result} Root trust anchor validations
  result=valid|invalid: Root DNSKEY match
dns_dnssec_nsec_validations counter {result, type} NSEC/NSEC3 denial validations
  result=valid|invalid, type=nsec|nsec3: Authenticated denial outcomes
DNSSEC Cache (namespace: dns):
dns_dnssec_cache_hits counter {type} DNSSEC record cache hits
  type=dnskey|ds: Cached record type
dns_dnssec_cache_misses counter {type} DNSSEC record cache misses
  type=dnskey|ds: Record type queried
dns_dnssec_cache_clears counter {} DNSSEC cache full clears
Health Management (namespace: dns):
dns_resolver_latency latency {resolver} Per-resolver query latency
dns_resolver_healthy gauge {resolver} Resolver health status (1=healthy, 0=unhealthy)
dns_resolver_avg_latency_ms gauge {resolver} Resolver average latency EMA (ms)
dns_resolver_consecutive_failures gauge {resolver} Consecutive failure count per resolver
dns_resolver_failures_total counter {resolver} Total resolver failures
dns_system_fallback gauge {} System DNS fallback active (1=active, 0=inactive)
dns_fallback_activations counter {} System DNS fallback activations
Adaptive Selection (namespace: dns):
dns_adaptive_resolver_selected counter {resolver, reason} Adaptive resolver selections
  reason=exploration|best_score|round_robin|...: Selection strategy used
dns_adaptive_selection_total counter {mode, resolver} Selection mode distribution
  mode=explore|exploit: Exploration vs exploitation
dns_resolver_score histogram {resolver} Resolver scores (intelligent phase)
Forward Proxy
Browser-native access to internal resources — no client software needed, just configure the browser’s proxy settings
Overview
Provides browser-native access to internal resources — no client software needed. Users configure their browser’s proxy settings (or use the auto-generated PAC file) and access internal services as if they were local. Handles HTTP CONNECT for TCP tunneling and CONNECT-UDP for UDP proxying via MASQUE.
Core capabilities:
- HTTP CONNECT handling for TCP proxy tunneling
- CONNECT-UDP handling for UDP proxy tunneling (MASQUE/QUIC)
- PAC file endpoint serving at configurable path (default /proxy.pac)
- Browser extension config endpoint at /proxy/config
- Browser extension setup/login endpoint at /proxy/setup
- CONNECT rejected on main service port (421 Misdirected) — proxy port only
- Geo-IP and time-based restriction enforcement before tunneling
- DNS resolution with system DNS fallback
- Bidirectional TCP relay with idle timeout and max connection duration
- HTTP/2+ full duplex CONNECT stream support (RFC 8441)
- HTTP/1.1 connection hijacking for classic CONNECT tunneling
- Connection tracking and byte-level metrics recording
The service runs on a dedicated port (forward_proxy.port) separate from the main service port for security isolation. CONNECT requests on the main port receive 421 Misdirected Request, directing clients to the correct proxy port.
TCP CONNECT request flow:
1. Extract client IP (CDN bypass mode uses RemoteAddr directly)
2. Check geo-IP and time-based restrictions
3. Validate target host:port format (RFC 1035 hostname length limit)
4. Extract bearer token from Proxy-Authorization header
5. Authenticate token and check user is not disabled
6. Check ACL (firewall group rules for target destination)
7. Check per-user rate limit (fail-closed)
8. Resolve hostname via DNS module (system DNS fallback)
9. Establish backend TCP connection with configurable timeout
10. Start bidirectional relay with idle timeout and max duration
11. Record metrics (bytes sent/recv, duration, success)
CONNECT-UDP request flow:
1-7. Same as TCP (restrictions, auth, ACL, rate limit)
8. MASQUE UDP proxying (capsule protocol, socket management)
9. Record metrics after session completes
Bearer token authentication supports two formats:
- "Bearer <token>" header (direct bearer token)
- "Basic <base64>" header where the username is "_bearer_" and the password is the token (Chrome's onAuthRequired format for Proxy-Authorization)
Config
Service-level configuration under [forward_proxy] in hexon.toml:
[forward_proxy]
enabled = true # Enable forward proxy (default: false)
port = 8443 # Dedicated proxy port (must differ from service.port)
public_port = 8443 # External port for PAC URLs (NAT/LB scenarios)
hostname = "proxy.example.com" # Separate hostname for CDN bypass (optional)
enable_tcp = true # Enable TCP CONNECT handling (default: true)
enable_udp = true # Enable CONNECT-UDP/MASQUE handling (default: true)
udp_proxy_path = "/masque" # URI path for CONNECT-UDP requests (default: /masque)
auth_mode = "bearer" # Authentication mode for CONNECT requests
buffer_size = "32KB" # TCP relay buffer size (default: 32KB)
connect_timeout = "10s" # Backend connection timeout
idle_timeout = "5m" # Idle connection timeout (no data flowing)
max_connection_duration = "24h" # Maximum connection duration (hard limit)
preserve_client_port = true # Use client's port in Alt-Svc header
# Token settings (used by /proxy/config endpoint)
token_ttl = "5m" # Token validity duration (default: 5m, min: 30s)
token_refresh_interval = "60s" # Extension refresh interval (default: 60s, min: 5s)
# TLS certificate for the proxy hostname (when hostname differs from service)
cert = "/path/to/cert.pem" # File path or inline PEM
key = "/path/to/key.pem" # File path or inline PEM
# Geo-IP restrictions (overrides [service] if set)
geo_enabled = true # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"] # Allowed country codes (ISO 3166-1 alpha-2)
geo_deny_countries = [] # Denied country codes
geo_bypass_cidr = ["10.0.0.0/8"] # CIDR ranges that bypass geo checks
geo_deny_code = 403 # HTTP status for geo denial
geo_deny_message = "Access denied from your location"
# Time-based restrictions (overrides [service] if set)
time_enabled = true # Enable time-based restrictions
time_timezone = "America/New_York" # Timezone for time checks
time_allow_days = ["Mon","Tue","Wed","Thu","Fri"]
time_allow_hours = "09:00-18:00" # Allowed hours range
time_deny_code = 403 # HTTP status for time denial
time_deny_message = "Access not permitted at this time"
# PAC file settings
[forward_proxy.pac]
enabled = true # Enable PAC endpoint (default: true)
path = "/proxy.pac" # PAC file URL path
cache_ttl = "15m" # PAC response Cache-Control max-age
group = "proxy-users" # Required group for PAC/config/setup access (optional)
use_firewall_targets = true # Derive PAC targets from firewall rules
Endpoints registered by the service:
GET /proxy.pac - PAC file (requires auth, optional group)
GET /proxy/config - JSON: PAC + token + refresh interval + username + server_time
GET /proxy/setup - Login trigger page for browser extensions
CDN bypass mode:
When forward_proxy.hostname differs from service.hostname, the proxy accepts direct connections (no CDN in between), so the client IP is taken from RemoteAddr instead of X-Forwarded-For. This is the typical deployment, because CDNs generally do not pass HTTP CONNECT through.
Hot-reloadable: token_ttl, token_refresh_interval, geo/time restrictions, PAC settings, rate_limit_per_user, bandwidth_limit_per_user, buffer_size, idle_timeout, max_connection_duration.
Cold (restart required): enabled, port, hostname, enable_tcp, enable_udp, udp_proxy_path, preserve_client_port.
Troubleshooting
Common symptoms and diagnostic steps:
CONNECT requests returning 421 Misdirected Request:
- Client is sending CONNECT to the main service port instead of the proxy port
- The forward proxy middleware rejects CONNECT on the main port by design
- Verify client is configured to use forward_proxy.port (or public_port)
- Check error message for the correct proxy hostname:port
407 Proxy Authentication Required:
- Missing Proxy-Authorization header on CONNECT request
- Token format not recognized (must be "Bearer <token>" or "Basic <base64>")
- For Chrome extension: username must be "_bearer_" in Basic auth format
- Token exceeds max length (8192 bytes); check token generation
- Verify token is being refreshed before expiry: check /proxy/config response
403 Forbidden on CONNECT:
- ACL denied: user's groups do not match firewall rules for the target
- Check: 'forwardproxy check <user> <target>' for ACL evaluation
- Check: 'forwardproxy targets <user>' to see allowed destinations
- Check: 'firewall check <user>' for firewall rule details
- Geo-IP denial: 'geo lookup <client_ip>' and 'geo check <client_ip>'
- Time-based denial: verify time_timezone and time_allow_hours in config
429 Too Many Requests:
- Per-user rate limit exceeded: check rate_limit_per_user setting
- Per-user bandwidth limit exceeded: check bandwidth_limit_per_user
- Retry-After header in response indicates when to retry
- Monitor: 'forwardproxy metrics' for per-user rate limit stats
- Consider increasing limits for legitimate high-volume users
502 Bad Gateway on CONNECT:
- DNS resolution failed: 'dns test <target_hostname>'
- Backend unreachable: 'net tcp <target_host:port>'
- Connect timeout too short: check forward_proxy.connect_timeout
- All resolved IPs failed (tries IPv4 first, then IPv6)
- DNS module failure with system DNS fallback also failing
Connection drops or timeouts during tunnel:
- Idle timeout: no data flowing for forward_proxy.idle_timeout (default 5m)
- Max duration exceeded: forward_proxy.max_connection_duration hard limit
- Check relay buffer_size: default 32KB, increase for high-throughput tunnels
- HTTP/2 full duplex not supported by server: check error logs for full duplex support errors
- Intermediate firewall blocking long-lived connections or UDP (QUIC)
PAC file returns DIRECT for all traffic:
- PAC endpoint requires authentication; verify session cookie is sent
- Check forward_proxy.pac.enabled = true
- Check use_firewall_targets = true and user has firewall rules
- Unauthenticated PAC intentionally returns DIRECT-only (security by design)
- Inspect PAC: curl -b session=<cookie> https://host/proxy.pac
/proxy/config returns 401 or 403:
- 401: session cookie missing or expired; trigger re-login via /proxy/setup
- 403: user not in required group (forward_proxy.pac.group)
- Verify group membership: 'directory user <username>'
Extension not refreshing token:
- Verify token_refresh_interval < token_ttl in config
- Check /proxy/config endpoint accessibility from extension
- Look for clock skew between client and server (server_time in response)
- Monitor: 'forwardproxy metrics' for token generation counts
CONNECT-UDP/MASQUE failures:
- QUIC port (UDP) blocked by intermediate firewall
- UDP proxying disabled: forward_proxy.enable_udp = false in config
- URI template mismatch: check udp_proxy_path setting
- MASQUE parse error: malformed CONNECT-UDP request
- Verify: 'net tcp <proxy_hostname:port> --tls' for TLS connectivity
Geo/time restriction inconsistencies:
- Forward proxy has its own geo/time config that overrides [service] settings
- Check both forward_proxy.geo_enabled and service.geo_enabled
- Restrictions on /proxy/config and CONNECT may behave differently
- CONNECT restrictions fail open if the cluster is not ready
Metrics and monitoring:
- 'forwardproxy metrics': cluster-wide connection counts and byte totals
- 'forwardproxy metrics <user>': per-user breakdown
- Bytes sent/recv recorded per TCP connection; UDP records duration and success only (MASQUE library limitation)
Relationships
Dependencies and interactions:
- Forward proxy module: All authentication, ACL, rate limiting, PAC generation, metrics, and restriction checks handled cluster-wide.
- DNS: Hostname resolution for CONNECT targets. Falls back to system DNS if the DNS module is unavailable. IPv4 preferred over IPv6 in resolution order.
- Firewall: ACL rules determine which groups can access which destination host:port. Firewall rules also drive PAC file generation (use_firewall_targets).
- Directory: User disabled status checked during authentication. Group membership resolved server-side from the directory memory index during ACL evaluation (not embedded in the bearer token).
- Geo/Time access: Location and time-based access checks on both /proxy/config endpoint and CONNECT requests. Forward proxy can override [service] geo/time settings with its own configuration.
- Sessions: Session cookies used for /proxy/config, /proxy/setup, and /proxy.pac. Browser extension first authenticates via session, then receives a bearer token for subsequent CONNECT requests.
- Reverse proxy: Complementary service — reverse proxy handles inbound traffic to backends, forward proxy handles outbound traffic from users. Both share the same TLS listener and session subsystem.
Logs
Log entries by component. Search with: logs search "forwardproxy". Levels: ERROR > WARN > INFO > DEBUG. DEBUG requires log level configuration.
Lifecycle & Middleware:
forwardproxy.service.init INFO Forward proxy service disabled in config
forwardproxy.service.init INFO Forward proxy service initialized
forwardproxy.middleware INFO Forward proxy disabled, passing CONNECT to next handler
forwardproxy.middleware WARN CONNECT request rejected on main service port
PAC & Config Endpoints:
forwardproxy.pac DEBUG Generating PAC file for authenticated user
forwardproxy.pac ERROR Failed to generate PAC
forwardproxy.config DEBUG Generating proxy config for extension
forwardproxy.config WARN Access blocked by restriction
forwardproxy.config ERROR Failed to generate PAC
forwardproxy.config ERROR Failed to generate proxy token
forwardproxy.config INFO Proxy config generated successfully
forwardproxy.setup INFO Proxy setup authorized
Restrictions:
forwardproxy.restrictions ERROR Failed to call restrictions check
SSRF Protection:
forwardproxy.ssrf WARN AUDIT blocked non-routable IP from DNS resolution
forwardproxy.ssrf WARN AUDIT all resolved IPs are non-routable — request blocked
DNS & Connectivity:
forwardproxy.dns DEBUG Resolving hostname via DNS module
forwardproxy.dns DEBUG DNS resolution successful
forwardproxy.dns DEBUG Using system DNS resolver
forwardproxy.dns DEBUG Successfully connected to backend
forwardproxy.dns WARN DNS module failure - falling back to system DNS
forwardproxy.dns WARN DNS resolution timeout - falling back to system DNS
forwardproxy.dns WARN DNS module returned error - falling back to system DNS
forwardproxy.dns WARN Failed to connect to IP, trying next
forwardproxy.connector DEBUG Dialing via connector site
forwardproxy.connector DEBUG Connected via connector site
TCP CONNECT Authentication:
forwardproxy.tcp.auth INFO AUDIT Missing or invalid Proxy-Authorization header
forwardproxy.tcp.auth INFO AUDIT Token too long
forwardproxy.tcp.auth INFO AUDIT Authentication failed
TCP CONNECT ACL & Rate Limiting:
forwardproxy.tcp.acl WARN AUDIT ACL denied
forwardproxy.tcp.ratelimit ERROR Rate limit service unavailable
forwardproxy.tcp.ratelimit ERROR Rate limit check failed
forwardproxy.tcp.ratelimit WARN AUDIT Rate limit exceeded
TCP CONNECT Connection:
forwardproxy.tcp.connect INFO Proxy connection established
forwardproxy.tcp.dial ERROR Failed to connect to backend
forwardproxy.tcp.http2 DEBUG Using HTTP/2+ full duplex CONNECT stream
forwardproxy.tcp.http2 ERROR Failed to enable full duplex mode
forwardproxy.tcp.http2 ERROR Failed to flush response
forwardproxy.tcp.hijack ERROR ResponseWriter does not support hijacking
forwardproxy.tcp.hijack ERROR Failed to hijack connection
forwardproxy.tcp.error ERROR Request validation or service errors (dynamic message)
HTTP Proxy Authentication:
forwardproxy.http.auth INFO AUDIT Missing or invalid Proxy-Authorization header
forwardproxy.http.auth INFO AUDIT Token too long
forwardproxy.http.auth INFO AUDIT Authentication failed
HTTP Proxy ACL & Rate Limiting:
forwardproxy.http.acl WARN AUDIT ACL denied
forwardproxy.http.ratelimit ERROR Rate limit service unavailable
forwardproxy.http.ratelimit ERROR Rate limit check failed
forwardproxy.http.ratelimit WARN AUDIT Rate limit exceeded
HTTP Proxy Forwarding:
forwardproxy.http.forward INFO HTTP proxy request forwarded
forwardproxy.http.forward ERROR Failed to forward request
forwardproxy.http.copy DEBUG Response body copy error
forwardproxy.http.error ERROR Request validation or service errors (dynamic message)
UDP/MASQUE Authentication:
forwardproxy.udp.auth INFO AUDIT Missing or invalid Proxy-Authorization header
forwardproxy.udp.auth INFO AUDIT Token too long
forwardproxy.udp.auth INFO AUDIT Authentication failed
UDP/MASQUE ACL & Rate Limiting:
forwardproxy.udp.acl WARN AUDIT ACL denied
forwardproxy.udp.ratelimit ERROR Rate limit service unavailable
forwardproxy.udp.ratelimit ERROR Rate limit check failed
forwardproxy.udp.ratelimit WARN Rate limit exceeded
UDP/MASQUE Connection & Session:
forwardproxy.udp.parse WARN Failed to parse CONNECT-UDP request forwardproxy.udp.parse WARN Invalid CONNECT-UDP request forwardproxy.udp.parse WARN Invalid target hostname forwardproxy.udp.connect INFO UDP proxy session authorized forwardproxy.udp.ssrf WARN AUDIT SSRF blocked: UDP target resolves to non-routable IP forwardproxy.udp.dial WARN Failed to dial UDP IP, trying next forwardproxy.udp.dial ERROR All UDP dial attempts failed forwardproxy.udp.proxy ERROR UDP proxy error forwardproxy.udp.complete INFO UDP proxy session completed forwardproxy.udp.error ERROR Request validation or service errors (dynamic message)Shared (TCP, HTTP, UDP):
forwardproxy.ratelimit.status DEBUG Rate limit check passedMetrics
No Prometheus metrics emitted directly by this service layer. Metrics are recorded by the forward proxy infrastructure module after each connection. Query with: metrics prometheus forwardproxy_<name>
Forward Proxy Engine
Authentication, ACL evaluation, rate limiting, and PAC generation engine for the forward proxy
Overview
The forward proxy module provides browser-native access to backend services using the MASQUE protocol (RFC 9298) over QUIC. It enables authenticated, policy-controlled tunneling of TCP and UDP traffic through the Hexon gateway without requiring any client software.
Core capabilities:
- Bearer token authentication using HMAC-SHA256 signed tokens with configurable TTL
- Firewall ACL integration for group-based destination access control
- Per-user rate limiting (requests/sec) and bandwidth limiting (bytes/sec)
- PAC (Proxy Auto-Configuration) file generation for browser proxy setup
- JA4/JA4Q fingerprint binding for session-based authentication
- Geo-IP and time-based access restrictions (fail-closed)
- Active connection tracking with per-user and per-target metrics
- DNS resolution via the DNS module (prevents DNS poisoning)
- Separate proxy hostname and TLS certificate support for CDN bypass
- Token refresh mechanism for long-lived browser sessions
Transport security model:
The PAC file returns "HTTPS host:port", so the browser always connects to the proxy over TLS. The forward proxy listener only speaks TLS.
HTTPS target (e.g. https://example.com):
  Browser --TLS--> Proxy --TLS--> Target
  CONNECT tunnel (end-to-end encrypted) + token (raw bytes, no proxy headers)
Plain HTTP target (e.g. http://ifconfig.io):
  Browser --TLS--> Proxy --plain--> Target
  GET http://... (content visible on last hop) + token (token STRIPPED before forwarding)
The bearer token only travels on the encrypted browser-to-proxy leg. Hop-by-hop headers (Proxy-Authorization, Connection, etc.) are removed before forwarding. The token never reaches the target server.
Authentication flow (bearer token):
1. User logs in via any method, receives session cookie
2. Browser extension fetches /proxy/config with session cookie
3. Service generates HMAC-SHA256 signed token with user/groups/expiry
4. Extension sends Proxy-Authorization: Bearer <token> on CONNECT
5. Token validated locally (no round-trip for validation)
6. User disabled status checked against directory
7. CheckAccess enforces firewall ACL rules
8. Connection established and traffic relayed
9. Extension periodically refreshes token via /proxy/config
Config
Core configuration under [forward_proxy] section in hexon.toml:
[forward_proxy]
enabled = true  # Enable forward proxy module
port = 8443  # Dedicated proxy port (must differ from service.port)
public_port = 8443  # External port for PAC URLs (for NAT/LB scenarios)
preserve_client_port = true  # Use client's port in Alt-Svc header
hostname = "proxy.example.com"  # Separate hostname for CDN bypass (optional)
fingerprint_binding = true  # Enable JA4/JA4Q fingerprint-to-session binding
fingerprint_binding_ttl = "8h"  # Fingerprint binding TTL (match session TTL)
rate_limit_per_user = 1000  # Max requests per second per user
bandwidth_limit_per_user = "100mbps"  # Max bandwidth per user

# Token settings
token_ttl = "5m"  # Token validity duration (default: 5m)
token_refresh_interval = "60s"  # Extension refresh interval (default: 60s)

# Probe resistance: hide proxy presence from unauthenticated probes
probe_resistance_mode = "off"  # off | fingerprint | ip | secret_host
probe_resistance_decoy = "404"  # 404 (Not Found) | empty (204 No Content)
probe_resistance_ttl = ""  # cache TTL; defaults to fingerprint_binding_ttl
probe_resistance_secret_host = ""  # required when mode=secret_host

# TLS certificate for the proxy hostname (optional)
# Only needed when hostname differs from service.hostname
# Value can be a file path or inline PEM content
# If not set, uses ACME (add hostname to acme.additional_domains) or service cert
cert = "/path/to/cert.pem"
key = "/path/to/key.pem"

# Geo-IP restrictions (optional, falls back to [service] if not set)
geo_enabled = true  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allowed country codes (ISO 3166-1 alpha-2)
geo_deny_countries = []  # Denied country codes
geo_bypass_cidr = ["10.0.0.0/8"]  # CIDR ranges that bypass geo checks
geo_deny_code = 403  # HTTP status code for geo-denied requests
geo_deny_message = "Access denied from your location"

# Time-based restrictions (optional, falls back to [service] if not set)
time_enabled = true  # Enable time-based restrictions
time_timezone = "America/New_York"  # Timezone for time checks
time_allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
time_allow_hours = "09:00-18:00"  # Allowed hours range
time_deny_code = 403  # HTTP status code for time-denied requests
time_deny_message = "Access not permitted at this time"

# PAC file configuration
[forward_proxy.pac]
enabled = true  # Enable PAC endpoint
path = "/proxy.pac"  # PAC file URL path
cache_ttl = "15m"  # PAC response cache TTL
use_firewall_targets = true  # Derive PAC targets from firewall rules
PAC authentication requirement: unauthenticated requests receive a minimal PAC that routes all traffic directly. Authenticated users get a PAC with targets derived from their firewall rules.
Hot-reloadable: rate_limit_per_user, bandwidth_limit_per_user, geo/time restrictions, PAC settings, token_ttl, token_refresh_interval, probe_resistance_* (mode and TTL apply on next request). Cold (restart required): enabled, port, hostname, fingerprint_binding.
Security
Security layers and hardening measures:
Bearer token security:
Tokens signed with HMAC-SHA256 using the cluster-wide secret key. Short TTL (default 5 minutes) limits exposure window for stolen tokens. Token contains user ID, groups, and expiry; validated locally without round-trip for minimal latency. Tokens are not stored server-side (stateless validation via signature). Token transport is always encrypted: the browser-to-proxy connection is TLS (PAC returns "HTTPS"), and the token is stripped (hop-by-hop header) before forwarding to the target. Even for plain HTTP targets, the token never leaves the TLS tunnel.
Fingerprint binding:
JA4/JA4Q TLS fingerprint bound to session via BindFingerprint operation. Prevents token replay from a different client/browser. Binding has its own TTL that should match the session TTL for consistent expiry. The same binding doubles as the signal for probe_resistance_mode=fingerprint.
Probe resistance (probe_resistance_mode):
Without this gate, every CONNECT to the proxy receives a 407 with "Basic realm=Hexon Proxy", and the IAP-port middleware leaks the dedicated proxy port in plaintext via 421. Both responses are fingerprintable.
- off: legacy 407-on-everything (default).
- fingerprint: 407 only when the request's JA4Q TLS fingerprint is currently bound (BindFingerprint cache, populated on sign-in). Recommended for browser-extension deployments; the binding is auto-populated whenever a user signs in.
- ip: 407 only when the request's source IP authenticated within probe_resistance_ttl. Survives JA4Q drift, but once one user has signed in, every client behind the same shared egress IP also receives the 407. Best for office/VPN-fronted access.
- secret_host: 407 only when r.Host equals probe_resistance_secret_host. Best for non-extension manual proxy configuration where the secret host is distributed out-of-band.
In all non-off modes the IAP-port middleware mirrors the same gate, so a probe that diffs responses across listeners cannot fingerprint either listener.
Metrics: forwardproxy_probe_decisions_total{mode, decision, path}
- mode: configured mode at decision time
- decision: "challenge" (407 emitted) or "decoy" (decoy served)
- path: "tcp", "http", "udp", or "iap_middleware"
Access control (multi-layer):
1. Bearer token authentication (identity verification)
2. User disabled check via directory.IsUserDisabled (account status)
3. Firewall ACL via CheckAccess (group-based destination control)
4. Rate limiting per user (abuse prevention)
5. Bandwidth limiting per user (network saturation prevention)
6. Geo-IP restrictions (location-based access, fail-closed)
7. Time-based restrictions (schedule-based access, fail-closed)
8. DNS resolution via the DNS module (prevents DNS poisoning)
Geo-IP and time restrictions:
Both use fail-closed semantics: if the check cannot be performed (e.g., GeoIP database unavailable), access is denied. The forward proxy has its own geo/time config that overrides [service] defaults, allowing different policies for proxy vs. web access.
PAC file security:
The PAC endpoint requires authentication to return proxy-routed targets. Unauthenticated PAC requests get DIRECT-only routing (no information leak). The username is embedded in the PAC for browser extension display only.
Rate and bandwidth limiting:
Per-user rate limiting prevents connection flooding. Per-user bandwidth limiting prevents single-user network saturation. Both return RetryAfter hints for well-behaved clients.
Troubleshooting
Common symptoms and diagnostic steps:
User cannot connect through forward proxy:
- Verify forward_proxy.enabled = true and port is correct
- Check bearer token: token_ttl may have expired, verify refresh is working
- Check user disabled status: directory user <username>
- Verify firewall rules allow the target: forwardproxy check <user> <target>
- Check geo restrictions: geo lookup <client_ip> and geo check <client_ip>
- Check time restrictions: ensure current time is within allowed window
- DNS resolution: verify target hostname resolves via dns test <hostname>
PAC file returns DIRECT for all traffic:
- PAC requires authentication; check session cookie is being sent
- Verify forward_proxy.pac.enabled = true
- Check use_firewall_targets = true and firewall rules exist for the user
- Inspect PAC content: curl -b hexon=<cookie> https://host/proxy.pac (hexon is the default service.cookie_name; substitute your configured name)
Token refresh failing (extension shows expired):
- Check token_refresh_interval is shorter than token_ttl
- Verify /proxy/config endpoint is accessible with session cookie
- Check for clock skew between client and server
- Monitor token generation metrics via forwardproxy metrics
Rate limited (429 responses):
- Check rate_limit_per_user setting (requests/sec)
- Check bandwidth_limit_per_user setting
- Monitor per-user metrics: forwardproxy metrics <username>
- Retry-After header indicates when to retry
Fingerprint binding failures:
- Verify fingerprint_binding = true in config
- Check fingerprint_binding_ttl matches session TTL
- JA4 fingerprint changes between requests indicate client switching
- Browser updates can change JA4 fingerprint (rebind needed)
Connection drops or timeouts:
- Check backend connectivity: net tcp <target_host:port>
- Check QUIC port (UDP) is not blocked by intermediate firewalls
- Verify TLS certificate: net tls <proxy_hostname:port>
- Check active connections: forwardproxy metrics to see connection counts
Geo-IP or time-based denial (403/451):
- Geo denial: geo lookup <ip> shows country, geo check <ip> shows policy
- Time denial: verify time_timezone is correct, check time_allow_hours
- Bypass CIDR: add client network to geo_bypass_cidr for exemption
- Forward proxy geo/time overrides [service] config if set
Metrics and monitoring:
- Active connections: forwardproxy metrics (cluster-wide)
- Per-user breakdown: forwardproxy metrics <username>
- Connection success/failure rates tracked via RecordMetrics
- Bytes sent/received per user for bandwidth accounting
Relationships
Module dependencies and interactions:
- Firewall: ACL rule evaluation determines which destinations each user group can reach. Firewall rules also drive PAC file generation when use_firewall_targets is enabled.
- Directory: User disabled check on every authentication call. Group membership embedded in token for ACL evaluation.
- Forward proxy service: Service layer handles HTTP CONNECT (TCP tunneling), CONNECT-UDP (UDP tunneling), and absolute-form HTTP requests (plain HTTP forwarding), plus HTTP endpoints (/proxy/config, /proxy/setup, /proxy.pac). Service calls this engine for auth, ACL, metrics.
- DNS: Hostname resolution for target destinations, with system DNS fallback.
- Rate limiting: Per-user request throttling and bandwidth controls.
- Geo-IP: Location-based access restrictions. Forward proxy can override [service] geo config with its own settings.
- Sessions: Session cookie used for initial token generation. Fingerprint binding ties proxy session to TLS fingerprint.
- Configuration: Hot-reload of rate limits, bandwidth limits, geo/time restrictions, PAC settings. Token TTL changes apply to new tokens only.
- Telemetry: Structured logging for authentication, ACL decisions, rate limit events. Metrics for active connections, bytes transferred, token generation.
- Auto TLS: ACME certificate for proxy hostname when using a separate hostname (add to acme.additional_domains).
Logs
Log entries by component. Search with: logs search "forwardproxy". Levels: ERROR > WARN > INFO > DEBUG > TRACE.
Initialize:
forwardproxy.init INFO Forward proxy disabled in config
forwardproxy.init ERROR Failed to initialize forward proxy
forwardproxy.init INFO Initializing forward proxy module
Access Control:
forwardproxy.checkaccess ERROR Failed to resolve user groups
forwardproxy.checkaccess ERROR Failed to call firewall.CheckProxyAccess
forwardproxy.checkaccess ERROR Invalid response type from firewall
Allowed Targets:
forwardproxy.getallowedtargets ERROR Failed to resolve user groups
forwardproxy.getallowedtargets ERROR Failed to call firewall.GetAllowedTargets
forwardproxy.getallowedtargets ERROR Invalid response type from firewall
PAC Generation:
forwardproxy.generatepac WARN PAC requested without authentication
forwardproxy.generatepac DEBUG Generated PAC file
Authentication:
forwardproxy.auth WARN Token validation failed
forwardproxy.auth WARN User account is disabled
forwardproxy.auth INFO AUDIT Token authentication successful
forwardproxy.auth DEBUG Invalidated fingerprint binding
Token Generation:
forwardproxy.token ERROR Failed to generate token
forwardproxy.token DEBUG Generated proxy token
Fingerprint Binding:
forwardproxy.bind WARN Failed to broadcast fingerprint binding
forwardproxy.bind WARN Failed to achieve quorum for fingerprint binding
forwardproxy.bind INFO Fingerprint bound to session
Rate Limiting:
forwardproxy.ratelimit WARN Rate limit check called without UserID
forwardproxy.ratelimit WARN User rate limit exceeded
forwardproxy.ratelimit WARN Destination rate limit exceeded
forwardproxy.ratelimit WARN User bandwidth limit exceeded
Rate Limit Cleanup:
forwardproxy.cleanup DEBUG Cleaned up stale rate limit entries
Geo Restrictions:
forwardproxy.restrictions.geo ERROR Geo check failed - denying access (fail-closed)
forwardproxy.restrictions.geo ERROR Geo check wait failed - denying access (fail-closed)
forwardproxy.restrictions.geo ERROR Invalid geo check response type - denying access (fail-closed)
forwardproxy.restrictions.geo INFO Access blocked by geo restriction
Time Restrictions:
forwardproxy.restrictions.time ERROR Time check failed - denying access (fail-closed)
forwardproxy.restrictions.time ERROR Time check wait failed - denying access (fail-closed)
forwardproxy.restrictions.time ERROR Invalid time check response type - denying access (fail-closed)
forwardproxy.restrictions.time INFO Access blocked by time restriction
Metrics
Prometheus metrics. Query with: metrics prometheus forwardproxy_<name>
Connection Metrics (namespace: forwardproxy):
forwardproxy_connections_total counter {protocol, user_id} Proxy connections recorded
forwardproxy_bytes_sent_total counter {protocol, user_id} Bytes sent through proxy
forwardproxy_bytes_received_total counter {protocol, user_id} Bytes received through proxy
forwardproxy_connection_duration latency {protocol, user_id} Connection duration
forwardproxy_errors_total counter {protocol, error} Failed proxy connections
forwardproxy_active_connections gauge {} Currently active proxy connections
Network Listener
Manages all network connections — TLS termination, client fingerprinting, HTTP middleware chain, and protocol detection
Overview
Manages all incoming network connections — TLS termination, protocol detection, client fingerprinting, and the HTTP middleware chain. Every request to the gateway passes through the listener before reaching any service or proxy route. Supports TCP, TLS, HTTP/1.1, HTTP/2, HTTP/3 (QUIC), UDP, and gRPC.
Client fingerprinting combines three layers into a composite hash:
- JA4 (TLS): cipher and extension hash, extracted during TLS handshake
- HTTP/2: SETTINGS frame parameters and pseudo-header ordering
- TCP/IP Stack: window size, MSS, TTL for OS identification
- Composite: SHA256(ja4|http2|tcp) truncated to 32 hex chars
- JA4Q (QUIC): QUIC transport parameter fingerprint for HTTP/3 clients
Used for rate limiting, session affinity, and client identification; resistant to IP spoofing and NAT. Fingerprint data is stored in a unified structure across all protocols (HTTP/1.1, HTTP/2, HTTP/3).
HTTP middleware chain (applied in order): security headers, geo restriction, time restriction, rate limiting, size limiting, proof-of-work, WAF. Each layer runs independently.
Additional capabilities:
- Deployment behind CDN/load balancer with header-based client identification (proxy mode)
- Per-SNI mTLS with dynamic CA rotation
- HXEP (Hexon Edge Protocol) for real client IP through edge proxies and SNAT
- Correlation ID propagation for end-to-end distributed tracing
- Malformed TLS blocking to reject invalid ClientHello messages
- Graceful shutdown with configurable connection draining timeout
Config
Core configuration under [service] in config TOML:
[service]
hostname = "auth.example.com"  # Service hostname
tls_cert = "/path/to/cert.pem"  # TLS certificate path
tls_key = "/path/to/key.pem"  # TLS private key path
handshake_timeout = 10  # TLS handshake timeout in seconds (default: 10)
block_malformed_tls = true  # Reject invalid TLS ClientHello (default: true)
max_header_bytes = 65536  # Max ClientHello size in bytes (default: 64KB)
disable_server_header = false  # Suppress HexonGateway/<version> header (default: false)
correlation_id_header = "X-Hexon-ID"  # Correlation ID header name (default: "X-Hexon-ID")
cookie_name = "hexon"  # Session cookie name (default: "hexon")

# Mutual TLS
mtls_mode = "none"  # "none", "optional", "mandatory" (default: "none")

# HTTP/2 settings
http2_enable = true  # Enable HTTP/2 (default: true)
http2_maxstreams = 1000  # Max concurrent streams per connection
http2_maxframesize = 1048576  # Max frame payload size (default: 1MB)
http2_idletimeout = 120  # Idle timeout in seconds
http2_keepalive = true  # Enable HTTP/2 keepalive
http2_keepaliveseconds = 30  # Keepalive interval in seconds

# Fingerprint cache
fingerprint_max_entries = 10000  # Max entries in addr fingerprint map (default: 10000)
fingerprint_ttl_seconds = 300  # Base TTL in seconds (default: 5 min)
fingerprint_cleanup_seconds = 30  # Cleanup sweep interval (default: 30s)
fingerprint_max_entries_per_ip = 10  # Max fingerprints per IP, anti-abuse (default: 10)

# JA4 parsing security limits
ja4_max_extensions = 200  # Max TLS extensions to parse (default: 200, typical: 10-30)
ja4_max_sigalgs = 100  # Max signature algorithms to parse (default: 100)

# HTTP/2 fingerprint cache
http2_fingerprint_cache_size = 10000  # Max entries (default: 10000)
http2_fingerprint_cache_evict_pct = 10  # % of oldest entries to evict when full (1-50)

# QUIC fingerprint reassembly
quic_fingerprint_reassembly_max_packets = 10  # Max packets for reassembly (default: 10)
quic_fingerprint_reassembly_max_bytes = 15360  # Max reassembly buffer (default: 15KB)
quic_fingerprint_reassembly_timeout_s = 5  # State timeout (default: 5s)
quic_max_crypto_frame_offset = 65536  # Max CRYPTO frame offset (default: 64KB)

# Proxy mode (behind CDN/LB)
proxy = false  # Enable proxy mode (default: false)
proxy_cidr = ["10.0.0.0/8"]  # Trusted proxy IPs (REQUIRED when proxy=true)
proxy_header_clientip = "X-Forwarded-For"  # Real client IP header (REQUIRED when proxy=true)
proxy_header_clientcert = "SSL_CLIENT_CERT"  # Client certificate header (optional)
proxy_header_clientfingerprint = "CF-Ray"  # Client fingerprint header (optional)
proxy_header_traceid = "X-Request-ID"  # Trace ID header for distributed tracing (optional)

# Geo restriction (router-level middleware)
geo_enabled = false  # Enable geo restrictions (default: false)
geo_database = "GeoLite2-Country.mmdb"
geo_asn_database = "GeoLite2-ASN.mmdb"
geo_allow_countries = []  # ISO 3166-1 alpha-2 codes (empty = all)
geo_deny_countries = []  # Deny takes precedence over allow
geo_allow_asn = []  # ASN allow list
geo_deny_asn = []  # ASN deny list
geo_bypass_cidr = []  # CIDRs that skip geo checks
geo_deny_code = 403  # HTTP status for denials
geo_deny_message = ""  # Custom denial message

# Time restriction (router-level middleware)
time_enabled = false  # Enable time restrictions (default: false)
time_bypass_cidr = []  # CIDRs that skip time checks
time_default_timezone = "UTC"  # Default timezone (IANA format)

[protection]
rate_limit = "100/1m"  # Requests per interval (empty = disabled)
rate_limit_type = "fingerprint"  # "fingerprint" or "ip" (default: "ip")
rate_limit_bantime = "5m"  # Ban duration when limit exceeded
Fingerprint adaptive TTL (based on cache utilization):
- Normal (<60%): base TTL (default 5 min)
- Medium (60-80%): base TTL / 2 (min 2 min)
- High (>80%): base TTL / 5 (min 1 min)
LRU eviction triggers when TTL cleanup is insufficient.
# HXEP (Hexon Edge Protocol), keys under [service]
hexon_edge_protocol = false  # Enable HXEP header parsing (default: false)
hexon_edge_cidr = [  # Trusted CIDRs for HXEP (default: trust all)
  "10.244.0.0/16",  # Kubernetes pod network
]
HXEP (Hexon Edge Protocol) for real client IP through edge proxies:
When traffic flows External Client → Edge Proxy → Gateway (via k8s Service/LB), the edge proxy prepends a binary header with the original client IP and port.
Format: Magic "HXEP" (4B) + Type (1B: 0x04=IPv4, 0x06=IPv6) + IP (4/16B) + Port (2B)
Required for geo-IP accuracy, rate limiting, IDS, and RADIUS NAS identification when the gateway sits behind an edge proxy or Kubernetes service with SNAT.
Config:
- service.hexon_edge_protocol = true → enables HXEP parsing on all listeners
- service.hexon_edge_cidr = [...] → only these source CIDRs are trusted for HXEP. Default: ["0.0.0.0/0", "::/0"] (trust all); restrict to pod CIDR in production
- Packets from untrusted CIDRs: HXEP header stripped, socket address used
- Set automatically via Helm when edge.enabled=true
Protocols: TCP (parsed on first read, before TLS handshake), UDP (PacketConn wrapper), HTTP/3 QUIC (HXEP wrapping applied transparently, GSO/ECN/GRO OOB data preserved).
Used by: reverse proxy, RADIUS (RADSEC + UDP), SSH bastion, QUIC connector, QUIC client access.
Hot-reloadable: TLS certificates, mTLS CA pool, proxy mappings, geo/time rules, rate limit settings, fingerprint cache limits. Cold (restart required): listen addresses, HTTP/2 enable, proxy mode toggle, HXEP settings.
Troubleshooting
Common symptoms and diagnostic steps:
TLS handshake failures:
- Malformed ClientHello blocked: check 'logs search "Malformed TLS"' for details
- block_malformed_tls=true rejects missing SNI, invalid TLS version, oversized ClientHello
- ClientHello too large: check max_header_bytes setting (default 64KB)
- TLS version rejected: only 0x0301-0x0304 (TLS 1.0-1.3) accepted
- mTLS certificate popup on proxy routes: check per-SNI mTLS config, set mtls=false on mapping
- CA rotation issues: 'certs list' to verify CA bundle, check 'logs search "CA rotation"'
- Start with: 'diagnose domain <hostname>' for cross-subsystem check
Fingerprint cache exhaustion:
- High memory from fingerprint storage: check fingerprint_max_entries setting
- Adaptive TTL kicking in too aggressively: increase fingerprint_ttl_seconds
- Per-IP abuse: 'logs search "fingerprint limit exceeded"' to identify attackers
- fingerprint_max_entries_per_ip controls anti-abuse threshold (default: 10)
- LRU eviction warnings: 'logs search "evict"' to monitor cache pressure
- Check: 'metrics prometheus fingerprint' for cache utilization metrics
Session affinity not working:
- Verify cluster_affinity=true in global config
- Loopback connections (127.0.0.1, ::1) bypass affinity by design
- Circuit breaker open for target node: 'proxy circuits' to check breaker states
- No TLS = no fingerprint = no affinity: ensure clients connect via HTTPS
- Check: 'cluster status' for node health, 'health components' for listener status
Proxy mode issues (behind CDN/LB):
- 403 Forbidden: source IP not in proxy_cidr, check 'logs search "CIDR"'
- 400 Bad Request: missing client IP header, verify proxy_header_clientip config
- Rate limiting all users as one: JA4 unavailable in proxy mode, use proxy_header_clientfingerprint
- Wrong client IP: X-Forwarded-For uses FIRST IP only (original client, not proxy chain)
- Header injection: ensure proxy_cidr is restricted to actual proxy IPs
- Distributed tracing broken: configure proxy_header_traceid for end-to-end correlation
- mTLS through proxy: set proxy_header_clientcert and mtls_mode="optional" or "mandatory"
QUIC/HTTP/3 fingerprint failures:
- Large ClientHello spanning packets: check quic_fingerprint_reassembly_max_packets
- Reassembly timeout: increase quic_fingerprint_reassembly_timeout_s for slow networks
- CRYPTO frame offset too large: quic_max_crypto_frame_offset default 64KB should suffice
- Connection ID too long (>20 bytes): RFC 9000 violation, likely malicious traffic
Rate limiting misbehavior:
- All clients sharing one rate bucket: check rate_limit_type ("fingerprint" vs "ip")
- Composite fingerprint unavailable: falls back to IP automatically
- Per-route bypass not working: verify disable_rate_limit=true on the proxy mapping
- Cluster-wide consistency: rate limits use distributed memory cache
- Check: 'ratelimit stats' for current rate limiting state, 'metrics ratelimit' for counters
HXEP (Hexon Edge Protocol) issues:
- HXEP not resolving real client IP: verify service.hexon_edge_protocol = true
- Wrong client IP after HXEP: verify source IP falls within service.hexon_edge_cidr
- "HXEP header stripped": source IP is outside trusted CIDRs; add pod/edge CIDR
- Geo/rate limiting sees edge proxy IP instead of client: HXEP not enabled or CIDR mismatch
- RADIUS NAS rejected after HXEP: real NAS IP doesn't match any [[radius.client]] CIDR
- Default trust-all CIDRs in production: security risk; restrict to actual pod network CIDR
- Config: 'config show service' and check hexon_edge_protocol + hexon_edge_cidr fields
- Helm sets HXEP automatically when edge.enabled=true in values.yaml
Connection metrics missing:
- Metrics batched (flush every 100ms or on close): short-lived connections may lag
- Check: 'health components' for listener health status
- 'metrics prometheus listener' for per-listener connection counters
Geo/time restriction issues:
- Geo blocking wrong country: verify MaxMind database is current
- Bypass CIDR not working: geo_bypass_cidr checked before country/ASN rules
- Time window mismatch: verify IANA timezone spelling (e.g., "America/New_York")
- Overnight ranges supported: "22:00-06:00" spans midnight correctly
- Check: 'geo lookup <ip>' to verify classification, 'geo timecheck <ip>' for time rules
Architecture
Connection lifecycle:
- Client connects to TCP socket
- First bytes peeked to detect TLS, extract JA4 fingerprint + SNI
- TCP fingerprint extracted (window size, TTL, MSS, options ordering)
- Session affinity check: fingerprint hash maps to a cluster node
- If affinity target is a remote node: forward connection to that node
- If local: proceed with TLS handshake (per-SNI mTLS selection)
- If HTTP/2: extract HTTP/2 fingerprint from SETTINGS frame
- Compute composite hash: SHA256(ja4|http2|tcp) truncated to 32 hex chars
- Assign correlation ID, begin connection tracking
- HTTP middleware chain: telemetry -> client identification -> connection info -> security headers -> geo restriction -> time restriction -> rate limit -> handler
- Handler processes request, correlation ID propagates as trace_id across modules
- Metrics flushed on connection close
Fingerprint extraction pipeline:
- Accept-level (before TLS): JA4 from ClientHello peek (zero-copy, buffered I/O)
- TLS callback: per-SNI mTLS mode selection
- Post-handshake: HTTP/2 SETTINGS fingerprint from connection preface
- TCP layer: p0f-style OS fingerprint from socket options (window, MSS, TTL)
- QUIC path: JA4Q from Initial packet, transport params fingerprint, multi-packet reassembly
GSO/ECN/GRO preservation:
All UDP wrappers (HXEP edge protocol and JA4Q fingerprint) preserve kernel offload capabilities so that QUIC can use:
- GSO (Generic Segmentation Offload): send 64KB in one syscall, kernel splits into MTU packets
- GRO (Generic Receive Offload): kernel coalesces packets, fewer syscalls on receive
- ECN (Explicit Congestion Notification): congestion signals via IP header bits
Without GSO/GRO, QUIC silently falls back to one syscall per packet; without ECN, it loses those congestion signals. This affects both HTTP/3 reverse proxy and QUIC connector listeners.
Fingerprint memory protection:
- Address fingerprint map: configurable max entries (default 10,000) with adaptive TTL
- Per-IP limit: configurable (default 10), oldest replaced on overflow
- LRU eviction: sorts by timestamp, evicts oldest when TTL cleanup insufficient
- HTTP/2 cache: configurable size with percentage-based LRU eviction (1-50%)
- All maps use lock-free concurrent reads for performance
Proxy mode flow:
Step 1: Validate source IP against configured proxy_cidr
Step 2: Extract trace ID from proxy header, update correlation context
Step 3: Extract and sanitize client IP (first IP from comma-separated list)
Step 4: Fingerprint priority: dedicated header > client cert hash > client IP
Step 5: Update context with real client identifiers for downstream modules
mTLS CA rotation flow:
1. ACME CA rotates, triggers listener update
2. CA pool rebuilt atomically (config CA + ACME CA merged)
3. HTTPS listeners gracefully restarted
4. Existing connections drain gracefully, new connections get fresh CA pool
Graceful shutdown sequence:
1. Stop accepting new connections on all listeners
2. Close all listener sockets
3. Wait for active connections up to configurable timeout
4. Cancel contexts for remaining connections
5. Force-close any connections still open after timeout
Performance characteristics:
- Pooled slice allocations reduce GC pressure during fingerprint extraction
- Buffered I/O to minimize syscalls
- Metrics batched to reduce overhead (flush every 100ms)
- TCP Fast Open: 15-30% latency reduction for repeat clients (Linux 3.7+, macOS)
- TCP Window Scaling: 20-40% throughput improvement for large transfers
- SO_REUSEPORT on Linux for load balancing across cores
Relationships
Module dependencies and interactions:
- Proxy: Provides per-SNI mTLS lookup. Listener provides fingerprint and client IP context consumed by proxy for rate limiting, identity headers, and session affinity.
- Sessions: Listener middleware manages session cookie extraction. Session validation uses correlation IDs propagated through listener context.
- Certificates: TLS termination uses certificates from the cert module. Per-mapping certificates loaded via SNI callback. CA pool for mTLS verification rebuilt atomically on ACME CA rotation.
- WAF: WAF rules applied in middleware chain after listener accepts connection. Fingerprint available in context for WAF correlation.
- X.509 authentication: mTLS mode controls TLS client auth level. In proxy mode, client certificates injected from HTTP header. Certificate validation uses dynamic CA pool.
- Rate limiting: Middleware reads the composite fingerprint (JA4+HTTP/2+TCP) or the client IP from context; which of the two is used is configurable per route.
- Geo restriction: Middleware at router level uses client IP from context with MaxMind GeoLite2 databases for country/ASN lookup.
- Time restriction: Middleware after geo restriction uses client country for timezone-aware time window matching.
- Cluster affinity: Fingerprint hash selects cluster node for session routing. Node health checked before forwarding. Forwarded connections use inter-node communication for transparent routing.
- DNS: Listener does not directly use DNS, but proxy backends resolved via DNS module.
- Distributed tracing: Correlation IDs generated at listener level propagate as trace_id through all operations, enabling end-to-end tracing across cluster nodes.
- Connection pool: Backend connection management operates downstream of listener. Listener handles inbound connections; connection pool handles outbound to backends.
Encrypted Client Hello (ECH)
Encrypted Client Hello (ECH) hides proxied app SNI behind the service hostname.
Without ECH, a network observer sees which app a user accesses via the plaintext SNI in the TLS ClientHello (e.g., “app.internal.com”). With ECH enabled, the observer only sees the gateway’s service hostname (e.g., “gateway.example.com”). The real hostname is encrypted inside the ClientHello using HPKE (X25519 + HKDF-SHA256 + AES-128-GCM).
Configuration:
[service]
ech = true # Default: false (opt-in)
How it works:
1. Gateway generates an HPKE key pair (X25519) and ECH config on startup
2. The ECH config is logged as base64; publish it in a DNS HTTPS record
3. Clients that support ECH (Chrome 117+, Firefox 118+, Safari 17.4+) encrypt the real SNI in the ClientHello
4. The gateway decrypts the inner ClientHello using its HPKE private key
5. GetCertificate receives the decrypted (inner) SNI, so certificate selection and proxy routing work unchanged
6. Non-ECH clients connect normally with plaintext SNI (graceful fallback)
The ECH config must be published in a DNS HTTPS record (RR type 65, the HTTPS variant of SVCB defined in RFC 9460) for clients to discover ECH support. The gateway logs the config as base64 at startup:
"ech_config_list_base64": "<base64>" (copy this value into your DNS HTTPS record)
What doesn’t change with ECH:
- Certificate selection (GetCertificateForSNI receives inner SNI)
- Proxy routing (uses HTTP Host header, not SNI)
- JA4 fingerprinting (computed from outer ClientHello)
- mTLS (client cert validation after ECH decryption)
Limitations:
- CDN termination: If a CDN terminates TLS before the gateway, ECH at the gateway layer has no effect, because the CDN has already seen the SNI
- HTTP/3 QUIC: Uses a different ECH mechanism (not covered by this feature)
- DNS requirement: Without the HTTPS record, clients fall back to plaintext SNI
Logs
Log entries by component. Search with: logs search "listener"
Levels: ERROR > WARN > INFO > DEBUG > TRACE. DEBUG/TRACE require log level configuration.
HTTP Errors:
listener.http.error DEBUG/WARN HTTP server errors (DEBUG for client TLS/connection failures, WARN otherwise)
Proxy Mode:
listener.proxy_validation WARN Rejected connection not from trusted proxy
listener.proxy_validation ERROR Client IP header missing in proxy mode
listener.proxy_cert WARN Oversized cert header (DoS) / parse failed
listener.proxy_cert DEBUG/INFO Client cert injected / invalid PEM block
CORS:
listener.cors WARN AUDIT CORS origin rejected
Sessions:
listener.session DEBUG Session created / validated / expired
listener.session ERROR/WARN Session creation/validation failures
Proof-of-Work:
listener.pow INFO PoW challenge passed / application session valid / body restored
listener.pow WARN Body too large / session validation failures / invalid body format
listener.pow ERROR PoW handler not registered / body encryption failures
listener.pow DEBUG Session checks, challenge served, body stored
Rate Limiting:
listener.ratelimit WARN AUDIT Request blocked by rate limit
listener.ratelimit WARN Config fallback (invalid rate_limit_type)
listener.ratelimit ERROR Ratelimit module call/response failures / no fingerprint
listener.ratelimit DEBUG Fingerprint fallback to IP
listener.ratelimit TRACE Per-entity rate limiting applied
listener.ratelimit.status DEBUG Rate limit check passed
listener.ratelimit.circuitbreaker ERROR Circuit breaker open, blocking request
Size Limiting:
listener.sizelimit WARN AUDIT Request blocked: size limit exceeded
listener.sizelimit ERROR Sizelimit module call/response failures
listener.sizelimit TRACE Size limit applied / exception / within limit
Compression:
listener.compression DEBUG Response compressed
Geo Restrictions:
listener.geo INFO AUDIT Request blocked by geo restriction
listener.geo ERROR Geo check failed (allowing request)
Time Restrictions:
listener.time INFO AUDIT Request blocked by time restriction
listener.time ERROR Time check failed (allowing request)
ECH (Encrypted Client Hello):
ech.generate INFO ECH key pair derived from cluster key
PoW Body Preservation:
pow.body DEBUG POST body stored / retrieved / deleted / restored
pow.body WARN Body not found (expired) / cleanup failures
pow.body ERROR Storage / retrieval / decryption failures
Metrics
Prometheus metrics. Query with: metrics prometheus listener_<name>
Lifecycle:
listener_starts counter {type, name} Listener startups
listener_stops counter {type, name} Listener shutdowns
listener_restarts counter {type, name} Listener restarts
listener_errors counter {type, name} Listener errors
Rate & Size Limiting:
listener_rate_limit_hits counter {reason} Requests blocked by rate limit
listener_ratelimit_circuit_breaker_trips_total counter {} Circuit breaker trips
listener_size_limit_hits counter {host, path} Size limit exceeded
TLS Security:
listener_connections_accepted counter {protocol} Successful TLS connections
listener_security_non_tls_dropped counter {reason} Non-TLS connections rejected
listener_security_malformed_tls counter {reason} Invalid TLS versions
listener_security_oversized_record counter {reason} TLS records exceeding RFC limits
listener_security_oversized_clienthello counter {reason} ClientHello too large
listener_security_small_clienthello counter {reason} Suspiciously small ClientHello
listener_security_malformed_clienthello counter {reason} Malformed ClientHello
listener_security_no_sni counter {reason} TLS handshakes without SNI
QUIC Affinity:
listener_quic_affinity_packets_received counter {} QUIC packets received
listener_quic_affinity_packets_dropped counter {reason} QUIC packets dropped
listener_quic_affinity_decryption_failures counter {} QUIC decryption failures
listener_quic_affinity_packets_local counter {} QUIC packets processed locally
listener_quic_affinity_packets_forwarded counter {target_node} QUIC packets forwarded to cluster
listener_quic_affinity_forward_failures counter {target_node} Forward failures
listener_quic_affinity_response_dropped counter {reason} QUIC response packets dropped
listener_quic_affinity_cid_mappings gauge {} Active connection ID mappings
listener_quic_connection_migrations counter {} QUIC connection migrations
QUIC Forwarding:
listener_quic_forward_connect_errors counter {target_node} Forwarding connect errors
listener_quic_forward_write_errors counter {target_node} Forwarding write errors
listener_quic_forward_bytes counter {target_node} Bytes forwarded
HXEP (Edge Protocol):
hxep_parsed_trusted counter {} TCP HXEP parsed (trusted)
hxep_stripped_untrusted counter {} TCP HXEP stripped (untrusted)
hxep_parse_failed counter {} TCP HXEP parse failures
hxep_partial_header counter {} TCP HXEP incomplete headers
hxep_udp_parsed_trusted counter {} UDP HXEP parsed (trusted)
hxep_udp_stripped_untrusted counter {} UDP HXEP stripped (untrusted)
hxep_udp_parse_failed counter {} UDP HXEP parse failures
Alerts:
rate(listener_rate_limit_hits[5m]) > 50 High rate limiting (possible attack)
increase(listener_ratelimit_circuit_breaker_trips_total[5m]) > 0 Circuit breaker tripped
rate(listener_security_no_sni[5m]) > 10 SNI probing
rate(hxep_stripped_untrusted[5m]) > 0 HXEP spoofing attempt
rate(listener_quic_affinity_forward_failures[5m]) > 0 Cluster QUIC forwarding issues