IMECC — Unicamp · Master's Research Project

Homeostasis Operating System Agent

HOSA

A bio-inspired architecture for endogenous resilience in Linux systems. Multivariate anomaly detection via Mahalanobis Distance with kernel-space instrumentation and autonomous graduated response.

Author: Fabricio Roney de Amorim
Version: Whitepaper v2.1 (March 2026)
Status: Vision & Theoretical Foundation

This project presents HOSA, a software architecture for autonomous resilience of Linux operating systems. HOSA replaces the dominant model of exogenous telemetry — where anomaly detection and failure mitigation depend on external central servers — with a model of Endogenous Resilience, where each computational node possesses autonomous capacity for multivariate detection and real-time local mitigation, independent of network connectivity.

Anomaly detection is performed through multivariate statistical analysis based on the Mahalanobis Distance and its temporal rate of change, with signal collection via eBPF in the Linux Kernel Space. Mitigation is executed through deterministic manipulation of Cgroups v2 and XDP, implementing a graduated response system inspired by the human nervous system's reflex arc.
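For reference, the standard definition of the Mahalanobis Distance can be written with the symbols this page uses for the state vector x(t), the baseline mean μ, and the covariance Σ. The finite-difference form of the rate of change is an assumption made here for illustration (Δt stands for the sampling interval); the whitepaper may estimate it differently.

```latex
D_M\bigl(x(t)\bigr) = \sqrt{\bigl(x(t)-\mu\bigr)^{\top}\,\Sigma^{-1}\,\bigl(x(t)-\mu\bigr)},
\qquad
\frac{dD_M}{dt} \approx \frac{D_M(t) - D_M(t-\Delta t)}{\Delta t}
```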

HOSA does not replace orchestrators or global monitoring systems. It complements them by operating in the temporal interval where these systems are structurally incapable of acting: the milliseconds between the onset of collapse and the arrival of the first metric at the external control plane.

Keywords: Endogenous Resilience · Autonomic Computing · eBPF · Multivariate Anomaly Detection · Mahalanobis Distance · Bio-Inspired Systems · Edge Computing · SRE

The Lethal Interval

The operational cycle of exogenous monitoring follows a discrete sequence with cumulative latency. This work terms the interval between the onset of lethal stress and the arrival of the first usable metric at the external monitoring system the Lethal Interval: the window in which a system can collapse before the external observer is aware of the problem.

Without HOSA

  • 1 Memory leak starts. Prometheus scrapes every 15s — sees nothing.
  • 2 OOM-Killer fires at t=40s. Payment service dies mid-transaction.
  • 3 CrashLoopBackOff. Customers receive 502 errors.
  • 4 Alert fires at t=100s — 60 seconds after the first crash.

With HOSA

  • 1 Memory leak starts. HOSA detects anomaly acceleration in 1 second.
  • 2 Applies memory.high containment at t=2s. No process killed.
  • 3 System degraded but alive. All transactions preserved.
  • 4 Operator receives webhook with full dimensional context at t=2s.

Figure 1. Timeline of a system collapse: memory leak at 50MB/s. HOSA detects and contains the anomaly within 2 seconds; Prometheus requires ~100 seconds to generate an alert.

t=0s

Leak Starts

Memory leak begins in payment-service. Rate: ~50MB/s.

mem: 61% · DM = 1.1 · Level 0

Prometheus last scraped 8s ago. Next scrape in 7s. Data shows: "healthy."

t=1s

HOSA Detects

DETECTED — 1s

Mahalanobis Distance crosses the vigilance threshold. Sampling interval shortened from 100ms to 10ms.

DM = 2.8 · dDM/dt = +1.6 · Level 0→1

Prometheus: no scrape in this interval. Zero awareness.

t=2s

HOSA Contains

CONTAINED — 2s

Dominant contributor identified: /kubepods/pod-payment-service-7b4f. Action: memory.high reduced from 2G → 1.6G. Webhook dispatched.

DM = 4.7 · d²DM/dt² = +0.5 · Level 1→2

Prometheus: next scrape in 5s. Still showing stale data from t=−8s.

Lethal Interval: 2 seconds

HOSA detected and contained the anomaly before any external monitoring system could collect its first data point post-leak. In the counterfactual scenario, OOM-Kill occurs at t≈40s and Prometheus alerts at t≈100s — 50× slower than HOSA's response.

t=4s

Containment Holding

Derivative decelerating — containment is effective. No escalation needed.

dDM/dt = +1.2 ↓ · d²DM/dt² = −0.45

Prometheus: next scrape at t=7s. It will see mem=1.47GB. Rule requires >1.8GB for 1m. Result: OK

t=8s

System Stabilized

STABLE

Memory at 74% — plateau reached. Derivative near zero. System degraded but functional. All transactions preserved. No process killed.

DM = 6.2 · dDM/dt ≈ 0 · Level 2

Counterfactual: without HOSA

t=40s

OOM-Kill

CRASH

payment-service killed mid-transaction. Data corrupted. CrashLoopBackOff begins. Customers receive 502 errors.

t=100s

Prometheus Alert Fires

TOO LATE

Alert fires 60 seconds after the first crash. The rule's "for: 1m" condition is finally satisfied. On-call engineer paged. Postmortem begins.

Bio-Inspired Autonomous Resilience

The architecture draws from the biological reflex arc: the spinal cord retracts your hand from a hot surface in milliseconds, then notifies the brain for contextual processing. HOSA applies this pattern — immediate local action followed by opportunistic notification to the central control plane.

01

Multivariate Detection

No static thresholds. HOSA learns the behavioral profile of the node — how CPU, memory, I/O, and network correlate — and detects deviations using the Mahalanobis Distance. It identifies anomalous correlation structures that per-metric alerts miss.
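A minimal Go sketch of why a correlation-aware distance catches what per-metric alerts miss. The two-metric baseline (CPU% and memory%, mean 50, standard deviation 10, strongly correlated) is made up for illustration, and the function name is hypothetical; it is not taken from the HOSA code base.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// mahalanobis returns sqrt((x-mu)^T * cov^-1 * (x-mu)).
// It solves cov*y = (x-mu) by Gaussian elimination with partial pivoting
// rather than forming the inverse explicitly.
func mahalanobis(x, mu []float64, cov [][]float64) (float64, error) {
	n := len(x)
	if len(mu) != n || len(cov) != n {
		return 0, errors.New("dimension mismatch")
	}
	d := make([]float64, n)   // deviation from the baseline mean
	a := make([][]float64, n) // augmented matrix [cov | d], built on a copy
	for i := range a {
		if len(cov[i]) != n {
			return 0, errors.New("covariance must be square")
		}
		d[i] = x[i] - mu[i]
		a[i] = append(append([]float64{}, cov[i]...), d[i])
	}
	for col := 0; col < n; col++ {
		p := col // row holding the largest pivot candidate
		for r := col + 1; r < n; r++ {
			if math.Abs(a[r][col]) > math.Abs(a[p][col]) {
				p = r
			}
		}
		if math.Abs(a[p][col]) < 1e-12 {
			return 0, errors.New("covariance matrix is singular")
		}
		a[col], a[p] = a[p], a[col]
		for r := col + 1; r < n; r++ {
			f := a[r][col] / a[col][col]
			for c := col; c <= n; c++ {
				a[r][c] -= f * a[col][c]
			}
		}
	}
	y := make([]float64, n) // back substitution: y = cov^-1 * d
	for i := n - 1; i >= 0; i-- {
		s := a[i][n]
		for j := i + 1; j < n; j++ {
			s -= a[i][j] * y[j]
		}
		y[i] = s / a[i][i]
	}
	q := 0.0
	for i := range d {
		q += d[i] * y[i]
	}
	return math.Sqrt(q), nil
}

func main() {
	mu := []float64{50, 50}                  // illustrative baseline: CPU%, memory%
	cov := [][]float64{{100, 90}, {90, 100}} // illustrative: the two metrics move together
	together, _ := mahalanobis([]float64{70, 70}, mu, cov)
	apart, _ := mahalanobis([]float64{60, 40}, mu, cov)
	fmt.Printf("both high (2 sigma each): %.2f\n", together) // 2.05
	fmt.Printf("decoupled (1 sigma each): %.2f\n", apart)    // 4.47
}
```

In this toy baseline, the point where each metric sits only one standard deviation from its mean scores higher than the point where both sit two standard deviations out, because it breaks the learned correlation.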

02

Kernel-Space Collection

Metrics are collected via eBPF probes attached directly to kernel tracepoints. No polling, no scraping. Events reach user space through ring buffers, so kernel↔user space transitions add only microsecond-scale latency.

03

Predictive Derivatives

HOSA computes the velocity (dD̄M/dt) and acceleration (d²D̄M/dt²) of deviation from homeostasis. It detects trajectory toward collapse, not merely the arrival at a critical state.
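One way to estimate these two quantities is EWMA smoothing followed by finite differences, sketched below in Go. The type name, the Alpha weight, and the use of simple backward differences are assumptions for illustration. The input values are the three DM readings from the timeline, but the output is not expected to reproduce the derivative figures shown there, since the real sampling interval and smoothing are not specified on this page.

```go
package main

import "fmt"

// Trend tracks an EWMA-smoothed distance and its first and second
// backward finite differences. All names here are hypothetical.
type Trend struct {
	Alpha  float64 // EWMA weight in (0, 1]; 1 disables smoothing
	Smooth float64 // smoothed distance
	Vel    float64 // estimate of the first derivative
	Acc    float64 // estimate of the second derivative
	n      int     // number of samples seen so far
}

// Update feeds one raw distance sample taken dt seconds (dt > 0)
// after the previous one.
func (t *Trend) Update(dm, dt float64) {
	prevSmooth, prevVel := t.Smooth, t.Vel
	if t.n == 0 {
		t.Smooth = dm // first sample seeds the average
	} else {
		t.Smooth = t.Alpha*dm + (1-t.Alpha)*prevSmooth
	}
	if t.n >= 1 {
		t.Vel = (t.Smooth - prevSmooth) / dt
	}
	if t.n >= 2 {
		t.Acc = (t.Vel - prevVel) / dt
	}
	t.n++
}

func main() {
	tr := Trend{Alpha: 0.5}
	for _, dm := range []float64{1.1, 2.8, 4.7} {
		tr.Update(dm, 1.0)
		fmt.Printf("smooth=%.3f vel=%.3f acc=%.3f\n", tr.Smooth, tr.Vel, tr.Acc)
	}
}
```

A positive velocity with positive acceleration indicates a deviation that is still speeding up; a positive velocity with negative acceleration indicates one that is leveling off.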

04

Graduated Response

Six response levels from passive observation to autonomous quarantine, proportional to severity. No binary kill switches. Throttle first, contain second, isolate only as last resort. Every action is reversible at Levels 0–4.
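As a sketch of what a reversible throttle can look like, the Go snippet below lowers a cgroup v2 memory.high limit and keeps the previous value so the action can be undone. The function names are hypothetical, and the demo runs against a temporary directory standing in for a real cgroup directory under /sys/fs/cgroup, where writing would require appropriate privileges.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readMemoryHigh returns the current memory.high value ("max" or bytes).
func readMemoryHigh(cgroupDir string) (string, error) {
	b, err := os.ReadFile(filepath.Join(cgroupDir, "memory.high"))
	return strings.TrimSpace(string(b)), err
}

// setMemoryHigh writes a new limit in bytes and returns the previous
// value so the caller can restore it later.
func setMemoryHigh(cgroupDir string, limitBytes uint64) (string, error) {
	prev, err := readMemoryHigh(cgroupDir)
	if err != nil {
		return "", err
	}
	val := strconv.FormatUint(limitBytes, 10)
	err = os.WriteFile(filepath.Join(cgroupDir, "memory.high"), []byte(val), 0o644)
	return prev, err
}

// restoreMemoryHigh writes back a value returned by setMemoryHigh.
func restoreMemoryHigh(cgroupDir, prev string) error {
	return os.WriteFile(filepath.Join(cgroupDir, "memory.high"), []byte(prev), 0o644)
}

// newDemoCgroup creates a temporary directory with a memory.high file.
// It only imitates a cgroup directory for demonstration purposes.
func newDemoCgroup(initial string) (string, error) {
	dir, err := os.MkdirTemp("", "hosa-cgroup-demo")
	if err != nil {
		return "", err
	}
	return dir, os.WriteFile(filepath.Join(dir, "memory.high"), []byte(initial+"\n"), 0o644)
}

func main() {
	dir, err := newDemoCgroup("2147483648") // 2G expressed in bytes
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	prev, err := setMemoryHigh(dir, 2147483648*8/10) // 80% of 2G
	if err != nil {
		fmt.Println("throttle failed:", err)
		return
	}
	cur, _ := readMemoryHigh(dir)
	fmt.Println("previous:", prev, "current:", cur)

	if err := restoreMemoryHigh(dir, prev); err != nil {
		fmt.Println("restore failed:", err)
	}
}
```

Unlike memory.max, exceeding memory.high throttles the cgroup and puts it under reclaim pressure rather than invoking the OOM killer, which is what makes it a containment step rather than a kill switch.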

05

Zero External Dependencies

No TSDB, no message broker, no cloud API required for primary function. Communication with orchestrators is opportunistic — utilized when available, never required for node survival.

06

Auditable Decisions

Every autonomous action is logged with its mathematical justification — DM value, derivative, threshold crossed, target cgroup, action taken. Full transparency for postmortem analysis. No black boxes.
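A log entry carrying those fields could be serialized as one JSON line per decision. The Go sketch below is illustrative only: the struct, its field names, the timestamp, and the threshold value 2.5 are hypothetical, while the DM and derivative values are borrowed from the t=1s step of the timeline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AuditRecord is a hypothetical audit entry; field names are illustrative
// and not taken from the HOSA implementation.
type AuditRecord struct {
	Time             time.Time `json:"time"`
	DM               float64   `json:"dm"`
	Velocity         float64   `json:"d_dm_dt"`
	ThresholdCrossed float64   `json:"threshold_crossed"`
	FromLevel        int       `json:"from_level"`
	ToLevel          int       `json:"to_level"`
	TargetCgroup     string    `json:"target_cgroup,omitempty"`
	Action           string    `json:"action"`
}

// encodeAudit renders a record as a single JSON line.
func encodeAudit(r AuditRecord) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeAudit parses a line produced by encodeAudit, e.g. during a postmortem.
func decodeAudit(line string) (AuditRecord, error) {
	var r AuditRecord
	err := json.Unmarshal([]byte(line), &r)
	return r, err
}

func main() {
	rec := AuditRecord{
		Time:             time.Date(2026, time.March, 1, 12, 0, 1, 0, time.UTC), // arbitrary
		DM:               2.8,
		Velocity:         1.6,
		ThresholdCrossed: 2.5, // hypothetical vigilance threshold
		FromLevel:        0,
		ToLevel:          1,
		Action:           "sampling interval 100ms -> 10ms",
	}
	line, err := encodeAudit(rec)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line)
}
```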

Six Levels of Graduated Action

Inspired by biological threat response — proportional force from silent observation to network isolation, determined by the magnitude and acceleration of the Mahalanobis Distance.

Table 1. Graduated response levels, activation conditions, and reversibility.

Level | Name               | Action                                                                                        | Reversibility
0     | Homeostasis        | None. Suppress redundant telemetry. Heartbeat only.                                           | -
1     | Vigilance          | Increase sampling rate. Log locally. No intervention.                                         | Automatic
2     | Soft Containment   | renice non-essential processes. Webhook notification.                                         | Automatic
3     | Active Containment | CPU/memory throttling via cgroups. Partial load shedding via XDP.                             | Auto w/ hysteresis
4     | Severe Containment | Aggressive throttling. Block inbound traffic except healthchecks. Freeze non-critical cgroups. | Sustained recovery
5     | Quarantine         | Network isolation. Freeze non-essential processes. Environment-aware recovery mode.           | Manual
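The escalation logic can be pictured as a small state function. The Go sketch below is an assumption-laden illustration: the entry thresholds, the hysteresis margin, and the rule of stepping down one level at a time are all hypothetical, and it uses the sign of the velocity where the whitepaper refers to magnitude and acceleration. Level 5 is never left automatically, matching the "Manual" entry in Table 1.

```go
package main

import "fmt"

// enter[i] is the smoothed distance at which level i+1 is entered.
// These numbers are hypothetical placeholders, not whitepaper values.
var enter = [5]float64{2.5, 4.0, 6.5, 8.0, 10.0}

// margin is a hypothetical hysteresis band that prevents flapping
// around a threshold.
const margin = 0.5

// nextLevel returns the response level to apply given the current level,
// the smoothed distance dm, and its velocity vel.
func nextLevel(cur int, dm, vel float64) int {
	if cur >= 5 {
		return 5 // quarantine is only lifted manually
	}
	target := 0
	for i, th := range enter {
		if dm >= th {
			target = i + 1
		}
	}
	if target > cur {
		return target // escalate immediately, possibly by several levels
	}
	// Step down one level only once the distance has stopped rising and
	// has fallen below the entry threshold by at least the margin.
	if cur > 0 && vel <= 0 && dm < enter[cur-1]-margin {
		return cur - 1
	}
	return cur
}

func main() {
	level := 0
	samples := []struct{ dm, vel float64 }{
		{1.1, 0}, {2.8, 1.6}, {4.7, 1.9}, {6.2, 0}, {3.8, -0.4}, {3.2, -0.3},
	}
	for _, s := range samples {
		level = nextLevel(level, s.dm, s.vel)
		fmt.Printf("dm=%.1f vel=%+.1f -> level %d\n", s.dm, s.vel, level)
	}
}
```

With these placeholder thresholds the sequence settles at levels 0, 1, 2, 2, 2, 1: the reading of 3.8 is below the Level 2 entry point but still inside the hysteresis band, so de-escalation waits for 3.2.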

The Perceptive-Motor Cycle

Three functional layers — sensory (eBPF), cortex (mathematical engine), motor (cgroups/XDP) — operating in a continuous loop. Kernel↔user space transitions occur via eBPF ring buffers with microsecond latency.

Figure 2. System architecture showing the perceptive-motor cycle across kernel and user space.

Kernel Space (eBPF)

Sensory Probes

  • Tracepoints (sched, mm, net)
  • Kprobes
  • PSI Hooks

Actuators

  • XDP (packet filtering)
  • Cgroup controllers (cpu, memory)
  • Process signals

Kernel↔user channels: eBPF Ring Buffer (events to user space), BPF Maps (actuation commands to the kernel)

User Space (Go)

Predictive Cortex

  1. Receive events from ring buffer
  2. Update state vector x(t)
  3. Update μ and Σ incrementally (Welford)
  4. Calculate DM(x(t)) — Mahalanobis Distance
  5. Apply EWMA smoothing → D̄M(t)
  6. Calculate dD̄M/dt and d²D̄M/dt²
  7. Evaluate against adaptive thresholds
  8. Determine response level (0–5)
  9. Send actuation command via BPF maps → Kernel
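Step 3 of the cycle above can be sketched with the multivariate form of Welford's online algorithm, which updates the mean and the covariance accumulator one sample at a time without storing history. The type and method names below are hypothetical, and the sketch assumes every sample has the dimension given to NewBaseline.

```go
package main

import "fmt"

// Baseline keeps a running mean and covariance accumulator.
type Baseline struct {
	N    int
	Mean []float64
	m2   [][]float64 // sum of outer products of deviations
}

// NewBaseline allocates a baseline for dim-dimensional state vectors.
func NewBaseline(dim int) *Baseline {
	m2 := make([][]float64, dim)
	for i := range m2 {
		m2[i] = make([]float64, dim)
	}
	return &Baseline{Mean: make([]float64, dim), m2: m2}
}

// Update folds one sample into the running statistics.
func (b *Baseline) Update(x []float64) {
	b.N++
	delta := make([]float64, len(x)) // deviation from the old mean
	for i := range x {
		delta[i] = x[i] - b.Mean[i]
		b.Mean[i] += delta[i] / float64(b.N)
	}
	for i := range x {
		for j := range x {
			// old-mean deviation times new-mean deviation
			b.m2[i][j] += delta[i] * (x[j] - b.Mean[j])
		}
	}
}

// Cov returns the sample covariance matrix, or nil with fewer than 2 samples.
func (b *Baseline) Cov() [][]float64 {
	if b.N < 2 {
		return nil
	}
	c := make([][]float64, len(b.m2))
	for i := range c {
		c[i] = make([]float64, len(b.m2))
		for j := range c[i] {
			c[i][j] = b.m2[i][j] / float64(b.N-1)
		}
	}
	return c
}

func main() {
	bl := NewBaseline(2)
	for _, x := range [][]float64{{1, 2}, {2, 4}, {3, 6}} {
		bl.Update(x)
	}
	fmt.Println("mean:", bl.Mean) // mean: [2 4]
	fmt.Println("cov: ", bl.Cov()) // cov:  [[1 2] [2 4]]
}
```

The covariance in this toy input is singular because the second metric is exactly twice the first; a real baseline would need enough varied samples (or regularization) before the matrix can be inverted for the distance computation.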

Opportunistic Communication

Webhooks · Prometheus metrics · Local audit log

Used when available. Never required for primary function.

"Orchestrators and centralized monitoring systems are essential instruments for capacity planning, load balancing, and long-term infrastructure governance. However, they are structurally — not accidentally — too slow to guarantee a node's survival in real time. If collapse occurs in the interval between perception and exogenous action, the capacity for immediate decision must reside in the node itself."

— HOSA Whitepaper v2.1, §1.3

How to Cite

Amorim, F. R. (2026). HOSA — Homeostasis Operating System Agent: A Bio-Inspired Architecture for Autonomous Linux Resilience. Whitepaper v2.1. IMECC, Universidade Estadual de Campinas (Unicamp).

% BibTeX
@techreport{amorim2026hosa,
  title   = {HOSA --- Homeostasis Operating System Agent: A Bio-Inspired Architecture for Autonomous Linux Resilience},
  author  = {Amorim, Fabricio Roney de},
  year    = {2026},
  institution = {IMECC, Universidade Estadual de Campinas},
  type    = {Whitepaper},
  version = {2.1},
  url     = {https://bricio-sr.github.io/hosa/}
}