
Observability architecture

The fluxrig observability strategy borrows a principle from professional audio engineering: Telemetry Tapping. Like the "Aux Sends" on a mixing console, telemetry is tapped directly from the execution path and diverted for monitoring, tracing, and auditing without affecting the performance or integrity of the primary transactional flow.

The zero-agent advantage

Unlike the widespread industry practice of running resource-heavy sidecars or agents alongside the business logic, fluxrig embeds high-performance observability directly into its core binaries.

  • Minimized Overhead: By eliminating external agents, system resources (CPU/RAM) are reserved exclusively for high-speed edge processing, which is critical for industrial IoT and secure gateway deployments.
  • Unified Transport: Telemetry, logs, and control signals are multiplexed over the existing Snake Tunnel (mTLS), simplifying firewall complexity and reducing network overhead.
  • W3C TraceContext: fluxrig natively implements the W3C TraceContext standard, allowing it to participate in distributed traces started by upstream load balancers or client applications.
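
To illustrate what joining an upstream trace involves, here is a minimal sketch of parsing a W3C `traceparent` header. It is not fluxrig's implementation; the names `TraceParent` and `parse_traceparent` are hypothetical, and the sketch accepts only the four-field version-00 layout.

```python
import re
from typing import NamedTuple, Optional

# Version-00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A receiver that gets `None` back would typically start a fresh trace rather than reject the request; a valid result lets it create child spans under the upstream `trace_id`.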

Resource efficiency (Indicative ROI)

| Metric | Industry Standard (Sidecar/Collector) | fluxrig (Embedded Tap) | Improvement |
|---|---|---|---|
| Idle Memory (RSS) | 250 MB - 800 MB | < 25 MB | ~90% Reduction |
| CPU (Idle) | 3% - 5% | < 0.1% | Negligible |
| Operational Surface | Multi-process / Sidecar | Single Binary | Reduced Attack Surface |

Technical signals (OpenTelemetry)

fluxrig generates three distinct signal types for every transaction, each compliant with the OpenTelemetry (OTel) standard.

  1. Traces: Distributed spans following a request across the entire mesh (from Edge Rack to Central Mixer).
  2. Metrics: High-fidelity performance histograms (latency, throughput, error rates).
  3. Logs: Structured, context-rich events attached directly to the transaction trace span for surgical root-cause analysis.

Multi-dimensional correlation

To bridge the gap between business operations and technical troubleshooting, every event is correlated across three axes:

  • flux_id: The Business Context (The Transaction ID).
  • trace_id: The Operational Context (The OTel Trace ID).
  • machine_id: The Source Context (The specific Rack/Gear origin).
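
The three axes can be pictured as fields stamped onto every structured event. The sketch below is illustrative only: the field names come from the list above, while `Correlation`, `make_event`, `filter_by`, and the JSON encoding are assumptions, not fluxrig's wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Correlation:
    flux_id: str     # business context: the transaction ID
    trace_id: str    # operational context: the OTel trace ID
    machine_id: str  # source context: the originating Rack/Gear


def make_event(corr: Correlation, message: str, **fields) -> str:
    """Serialize one structured event carrying all three correlation axes."""
    record = {"message": message, **fields, **asdict(corr)}
    return json.dumps(record, sort_keys=True)


def filter_by(events, **axes):
    """Decode events and keep those matching every requested axis value."""
    decoded = (json.loads(e) for e in events)
    return [e for e in decoded if all(e.get(k) == v for k, v in axes.items())]
```

Because every event carries all three keys, the same stream can be pivoted by transaction (`filter_by(events, flux_id=...)`), by trace, or by originating machine without joining separate data sources.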

Telemetry tapping and edge sovereignty

The system is designed to maintain 100% auditability even during total network isolation.


Traffic control and the pressure chain

The Transactional Hot-Path (Business Logic) always takes absolute precedence over the Telemetry Path (Observability).

Lane prioritization (The Governor)

The Rack implements a multi-lane architecture to ensure telemetry never congests critical signal processing.

| Lane | Content | Priority | Mechanism | Under Extreme Congestion |
|---|---|---|---|---|
| Hot Path | Business Logic (fluxMsg) | P0 | NATS JetStream | Guaranteed. |
| Audit Lane | Transaction Logs | P1 | Local WAL | Delayed, Never Lost. |
| Pulse Lane | Metrics & Debug Spans | P2 | Buffer Management | Dropped if capacity exceeded. |
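
The lane semantics can be sketched with a toy in-memory scheduler. This is a simplified model under stated assumptions, not the Rack's implementation: plain queues stand in for NATS JetStream and the local WAL, the `Governor` class and its method names are hypothetical, and dropping the newest item when the Pulse buffer is full is one possible policy among several.

```python
from collections import deque


class Governor:
    """Toy three-lane scheduler: P0 is served before P1, and P1 before P2."""

    def __init__(self, pulse_capacity: int):
        self.hot = deque()    # P0: business messages, never dropped
        self.audit = deque()  # P1: stand-in for the local WAL; may wait, is kept
        self.pulse = deque()  # P2: metrics and debug spans, bounded
        self.pulse_capacity = pulse_capacity
        self.dropped = 0

    def submit(self, lane: str, item) -> bool:
        """Enqueue an item; return False if it was shed."""
        if lane == "hot":
            self.hot.append(item)
        elif lane == "audit":
            self.audit.append(item)
        elif lane == "pulse":
            if len(self.pulse) >= self.pulse_capacity:
                self.dropped += 1  # shed load instead of blocking other lanes
                return False
            self.pulse.append(item)
        else:
            raise ValueError(f"unknown lane: {lane}")
        return True

    def next_item(self):
        """Pop from the highest-priority non-empty lane, or return None."""
        for queue in (self.hot, self.audit, self.pulse):
            if queue:
                return queue.popleft()
        return None
```

With `pulse_capacity=1`, a second metric submitted before the first is consumed is counted in `dropped`, while audit and hot-path items are always accepted and drained in priority order.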

The pressure chain (Fail-to-Local)

To protect the system during backend saturation or network isolation:

  1. Backend Saturation: If the central analytics sink becomes unreachable, the Mixer signals internal backpressure.
  2. Snake Congestion: The Snake Tunnel detects pending byte pressure and restricts ingestion.
  3. Rack Diversion: The Rack automatically diverts telemetry from the network path to the Local CBOR WAL.
  4. Resumption: Once the Pressure Chain clears, the Rack trickles the archived WAL data back to the Mixer using a rate-limited background worker, without impacting active transaction traffic.
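
The diversion and resumption steps above can be sketched as a small state machine. This is a hedged illustration, not fluxrig code: an in-memory deque stands in for the on-disk CBOR WAL, `uplink` is a hypothetical callable that ships one record to the Mixer, `under_pressure` would in practice be set by the Snake Tunnel's backpressure signal, and routing new records through the WAL while a backlog exists (to preserve ordering) is an assumed design choice.

```python
from collections import deque
from typing import Callable


class FailToLocal:
    def __init__(self, uplink: Callable[[dict], None], replay_batch: int = 2):
        self.uplink = uplink          # hypothetical: sends one record upstream
        self.wal = deque()            # in-memory stand-in for the CBOR WAL
        self.under_pressure = False   # set while backpressure is signalled
        self.replay_batch = replay_batch

    def emit(self, record: dict) -> None:
        """Send a record upstream, or divert it locally under pressure."""
        # While a backlog exists, keep appending so replay order is preserved.
        if self.under_pressure or self.wal:
            self.wal.append(record)
        else:
            self.uplink(record)

    def tick(self) -> int:
        """Replay at most replay_batch archived records; call periodically."""
        if self.under_pressure:
            return 0
        sent = 0
        while self.wal and sent < self.replay_batch:
            self.uplink(self.wal.popleft())
            sent += 1
        return sent
```

Calling `tick()` from a background timer gives the rate limit: each invocation forwards a bounded batch, so a large backlog drains gradually instead of competing with live traffic.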

Compliance and governance

Deterministic sanitization

Organizations can utilize deterministic masking to scrub sensitive information at the edge before data enters the persistent observability bus. This ensures that sensitive fields (like PANs) never reach the centralized telemetry backend, significantly reducing the audit scope of the central infrastructure.
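
One common way to achieve deterministic masking is keyed hashing: the same input always yields the same token, so masked values remain correlatable without being readable. The sketch below assumes HMAC-SHA256 and a simple digit-run pattern for PANs; the source does not specify fluxrig's actual scheme, and `mask_pan` and `sanitize` are hypothetical names.

```python
import hashlib
import hmac
import re

# Candidate PANs: 13 to 19 consecutive digits (a simplification; no Luhn check).
_PAN_RE = re.compile(r"\b\d{13,19}\b")


def mask_pan(pan: str, key: bytes) -> str:
    """Map a PAN to a stable token; the same key and PAN give the same token."""
    digest = hmac.new(key, pan.encode("ascii"), hashlib.sha256).hexdigest()
    return f"pan_{digest[:16]}"


def sanitize(text: str, key: bytes) -> str:
    """Replace every PAN-like digit run in text with its token."""
    return _PAN_RE.sub(lambda m: mask_pan(m.group(0), key), text)
```

Using a secret key rather than a bare hash matters here: the PAN space is small enough that unkeyed hashes could be reversed by enumeration.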

CAUTION

Production Logging: Enabling DEBUG or TRACE log levels may write raw hex payloads to the log stream. In production environments, restrict these levels to remain compliant with institutional "No Storage" security requirements.