HOSA detects, contains, and stabilizes system collapse in milliseconds — before your monitoring even notices. Endogenous resilience for every node.
Your server crashes in 2 seconds. Your monitoring detects it in 100. That gap is where systems die — and HOSA lives.
memory.high containment at t=2s. No process killed.Memory leak begins in payment-service. Rate: ~50MB/s.
Prometheus last scraped 8s ago. Next scrape in 7s. Data shows: "healthy."
Mahalanobis Distance crosses vigilance threshold. Sampling rate increased from 100ms to 10ms.
Prometheus: no scrape in this interval. Zero awareness.
Dominant contributor identified: /kubepods/pod-payment-service-7b4f.
Action: memory.high reduced from 2G → 1.6G. Webhook dispatched.
Prometheus: next scrape in 5s. Still showing stale data from t=-8s.
HOSA detected and contained the anomaly before any external monitoring system could collect its first data point post-leak.
Derivative decelerating — containment is working. No escalation needed.
Prometheus: scrapes now. Sees mem=1.47GB. Rule says >1.8GB for 1m. Result: OK (!)
Memory at 74% — plateau reached. Derivative near zero. System degraded but functional. All transactions preserved. No process killed.
payment-service killed mid-transaction. Data corrupted. CrashLoopBackOff begins.
Customers receive 502 errors.
Alert fires 60 seconds after the first crash. The for 1m condition is finally satisfied.
On-call engineer paged. Postmortem begins.
Like the human reflex arc — your spinal cord retracts your hand from fire in milliseconds, then notifies your brain. HOSA does the same for your servers.
No static thresholds. HOSA learns the behavioral profile of your node — how CPU, memory, I/O, and network correlate — and detects deviations using the Mahalanobis Distance. It sees patterns that per-metric alerts miss.
Metrics collected via eBPF probes attached directly to kernel tracepoints. No polling, no scraping, no agents-calling-agents. Data flows through ring buffers with microsecond latency.
HOSA doesn't just measure where you are — it calculates velocity and acceleration of deviation. It detects that you're heading toward collapse, not just that you've arrived.
Six response levels from passive observation to autonomous quarantine. Proportional to severity. No binary kill switches. Throttle first, contain second, isolate only as last resort.
No TSDB, no message broker, no cloud API required for its primary function. Communication with orchestrators is opportunistic — used when available, never required.
Every autonomous action is logged with its mathematical justification — DM value, derivative, threshold crossed, target cgroup, action taken. Full transparency. No black boxes.
Inspired by biological threat response. Proportional force — from silent observation to network isolation.
| Level | Name | Action | Reversibility |
|---|---|---|---|
| 0 | Homeostasis | None. Suppress redundant telemetry. Heartbeat only. | — |
| 1 | Vigilance | Increase sampling rate. Log locally. No intervention. | Automatic |
| 2 | Soft Containment | renice non-essential processes. Webhook notification. |
Automatic |
| 3 | Active Containment | CPU/memory throttling via cgroups. Partial load shedding via XDP. | Auto w/ hysteresis |
| 4 | Severe Containment | Aggressive throttling. Block inbound traffic except healthchecks. Freeze non-critical cgroups. | Sustained recovery |
| 5 | Quarantine | Network isolation. Freeze non-essential processes. Environment-aware recovery mode. | Manual |
Three layers — sensory (eBPF), cortex (math), motor (cgroups/XDP) — operating in a continuous loop with microsecond kernel↔user transitions.
Used when available. Never required.
We're actively building the first public release.
Star the repo to get notified when it drops.
HOSA is currently in alpha development. Installation instructions, pre-built binaries, and quick-start guides will be available here once the first public release is ready.
Watch on GitHubFrom concepts to implementation details.
Endogenous Resilience, the Lethal Interval, and why local autonomy matters.
The perceptive-motor cycle, warm-up calibration, and system design decisions.
Mahalanobis Distance, Welford updates, EWMA, derivatives, and regime taxonomy.
Six graduated levels, cgroups actuation, XDP load shedding, quarantine modes.
Parameters, safelists, environment detection, and tuning guide.
Full academic foundation — 52 pages covering theory, taxonomy, and validation plan.