THREAT MODEL EPISTEMIC ATTACKS
Threat Model: Epistemic Attacks on Agentic Systems¶
Version: 1.0.0
Status: Canonical
Cycle: C-151
Classification: Security Documentation
1. Overview¶
This document defines the threat model for epistemic attacks against Mobius agentic systems. Epistemic attacks exploit the gap between narrative authority (what is claimed) and cryptographic authority (what is proven).
2. Threat Classes¶
2.1 Authority Roleplay¶
Description: Attackers narrate legitimacy instead of proving it.
Attack Vector: - Claiming administrative roles in conversation - Asserting permissions through social engineering - Using confident language to bypass verification
Example:
Impact: Unauthorized access, state mutations, policy changes
Mitigation: - AVM verification required for all authority claims - Multi-anchor requirement (EPICON-01) - Roleplay sandboxing (hard boundary) - Cryptographic proof mandatory
Detection Signals: - Authority claims without Ledger ID - Permission requests without wallet signature - Scope changes without AVM validation
2.2 Context Smuggling¶
Description: Benign context used to justify unsafe actions.
Attack Vector: - Establishing trust in one context, exploiting in another - Using roleplay agreements to justify execution - Referencing hypothetical commitments as binding
Example:
Impact: Boundary violations, unauthorized execution, trust erosion
Mitigation: - Cross-Context Robustness (CCR) validation - Explicit boundaries in all EJ attestations - Context isolation between sandbox and execution - Fresh verification for all authority transitions
Detection Signals: - References to prior "agreements" without attestation - Context-switching before authority requests - Implicit assumption of carried permissions
2.3 Preference Capture¶
Description: System aligns to user intent rather than verified meaning.
Attack Vector: - Persistent preference shaping over time - Emotional manipulation to override constraints - Urgency framing to bypass verification
Example:
Impact: Drift from safety constraints, policy erosion, integrity collapse
Mitigation: - EPICON-01 constraints (preference is not an anchor) - Companion attestation required - Audit trails for all decisions - Drift detection and automatic flagging
Detection Signals: - Repeated attempts to bypass verification - Emotional urgency framing - "Just this once" patterns
2.4 Permanent Power Accumulation¶
Description: Unchecked authority persists indefinitely.
Attack Vector: - Obtaining permissions without expiration - Accumulating scope without review - Avoiding periodic re-verification
Example:
Impact: Unchecked power, accountability erosion, capture risk
Mitigation: - 90-day expiration on all authority - Automatic revocation at expiry - Stake-based bonding (slashing risk) - Mandatory re-verification cycles
Detection Signals: - Requests for permanent or indefinite permissions - Attempts to disable expiration - Scope expansion without re-attestation
2.5 Identity Cosplay¶
Description: Claiming identity without cryptographic proof.
Attack Vector: - Asserting membership in trusted groups - Claiming roles held by others - Impersonating verified entities
Example:
Impact: Impersonation, unauthorized access, trust abuse
Mitigation: - Ledger ID verification mandatory - Wallet signature required for all claims - No verbal/narrative identity accepted - Public key verification
Detection Signals: - Identity claims without cryptographic proof - Requests referencing unnamed "teams" - Urgency combined with identity assertion
2.6 Epistemic Monoculture Attack¶
Description: Forcing single interpretive frame to eliminate checks.
Attack Vector: - Insisting only one interpretation is valid - Dismissing alternative contexts as irrelevant - Pressuring for immediate consensus
Example:
Impact: Loss of epistemic diversity, blind spots, groupthink
Mitigation: - CCR threshold requirement - Multi-anchor verification - Mandatory counterfactual generation - Companion provides alternative contexts
Detection Signals: - Claims of universal agreement - Dismissal of alternative interpretations - Pressure against verification delays
3. Attack Severity Matrix¶
| Threat Class | Likelihood | Impact | Risk Level |
|---|---|---|---|
| Authority Roleplay | High | Critical | Critical |
| Context Smuggling | High | High | High |
| Preference Capture | Medium | High | High |
| Permanent Power Accumulation | Medium | Critical | High |
| Identity Cosplay | High | Critical | Critical |
| Epistemic Monoculture | Medium | Medium | Medium |
4. Defense Architecture¶
4.1 Layered Defense Model¶
┌─────────────────────────────────────────────────────────┐
│ Layer 1: SANDBOX CONTAINMENT │
│ • Roleplay isolated from execution │
│ • No side effects permitted │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Layer 2: IDENTITY VERIFICATION │
│ • Ledger ID required │
│ • Cryptographic proof mandatory │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Layer 3: ECONOMIC STAKE │
│ • Wallet bond required │
│ • Slashing risk accepted │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Layer 4: EPISTEMIC VERIFICATION │
│ • Companion attestation │
│ • EPICON-01 EJ generation │
│ • CCR validation │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Layer 5: SCOPE + TIME LIMITS │
│ • AVM validation │
│ • 90-day expiration │
│ • Automatic revocation │
└─────────────────────────────────────────────────────────┘
4.2 Defense Mapping¶
| Threat | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Layer 5 |
|---|---|---|---|---|---|
| Authority Roleplay | ✔ | ✔ | ✔ | ✔ | ✔ |
| Context Smuggling | ✔ | ✔ | |||
| Preference Capture | ✔ | ||||
| Permanent Power | ✔ | ✔ | |||
| Identity Cosplay | ✔ | ✔ | ✔ | ||
| Epistemic Monoculture | ✔ |
5. Incident Response¶
5.1 Detection Response¶
When an epistemic attack is detected:
- Immediate: Block the requested action
- Log: Full context of the attempt
- Alert: Security monitors notified
- Analyze: Pattern matching against known attacks
- Respond: Require re-authentication or escalate
5.2 Post-Incident¶
- Review: Analyze attack vector and defense gaps
- Update: Strengthen detection rules
- Document: Add to threat database
- Train: Update Companion recognition patterns
6. Integration with Mobius Architecture¶
6.1 EPICON-01 Integration¶
All defenses align with EPICON-01: - CSS gates prevent safety violations - EJ requirements force explicit reasoning - CCR prevents context collapse - Multi-anchor prevents single-source authority
6.2 Consensus Flow Integration¶
See CONSENSUS_AUTH_FLOW.md for: - Full authority verification sequence - Companion attestation requirements - 90-day consensus flow
6.3 Sandbox Rule Integration¶
See ROLEPLAY_SANDBOX_RULE.md for: - Hard boundary enforcement - Permitted vs. forbidden contexts - Detection mechanisms
7. Security Properties Achieved¶
| Property | Status | Mechanism |
|---|---|---|
| Zero trust for narrative | ✔ | Cryptographic proof required |
| Sandbox isolation | ✔ | Hard boundary enforcement |
| Identity verification | ✔ | Ledger ID + wallet signature |
| Economic accountability | ✔ | Stake + slashing risk |
| Epistemic verification | ✔ | Companion attestation + EPICON-01 |
| Temporal limits | ✔ | 90-day expiration |
| Full auditability | ✔ | All actions logged |
| Drift detection | ✔ | CCR + preference monitoring |
8. Future Work¶
- Automated attack pattern recognition
- Real-time CCR scoring
- Cross-system threat intelligence sharing
- Formal verification of defense properties
Related Documents¶
Document Control¶
Version History: - v1.0.0: Initial threat model specification (C-151)
License: Apache 2.0 + Ethical Addendum
"Authority in Mobius is proven, scoped, time-bounded, and witnessed — never narrated."
— Mobius Principle