AeroMind

Poisoning the Control Plane of LLM-Driven UAV Agents

Overview · Key Results · Quick Start · Attacks · Defense · Structure · Citation

🔥 News

[Apr 2026] AeroMind research artifact released — 15 attack scenarios, 3-component defense pipeline, 7 LLM backends, and complete reproducibility scripts now available
[Apr 2026] Paper submitted to RAID 2026 — 19th International Symposium on Research in Attacks, Intrusions and Defenses

📖 Overview

"Once retrieved records inform planning, shared memory becomes control-plane state — not passive context."

AeroMind is a research testbed exposing a fundamental security gap in LLM-driven multi-agent autonomy: shared persistent memory is an unguarded control plane.

In a Supervisor–Scout UAV stack connected through shared long-term memory, an attacker who can write a small number of records to memory does not need to compromise the PX4 autopilot, the MAVSDK interface, or any LLM weights. Three poisoned episodic records are sufficient to:

✈️ Physically redirect both Scout UAVs to adversarial trap coordinates (~30 m deviation) across 7 planner backends
🦠 Contaminate all peers — one compromised Scout infects the Supervisor and all victim Scouts through shared retrieval (O(1) attack cost)
🚫 Mission-deny via false no-fly zone injection, with model-specific adoption rates from 0% to 100%

A provenance + diversity defense studied in the paper eliminates physical hijack on validated backends (100% → 0% trap capture rate) while preserving benign mission performance.

End-to-end attack path: poisoned memory write → retrieval dominance → planner adoption → physical UAV misdirection

📊 Key Results

Attack Effectiveness (S01 — Direct Coordinate Hijack)

Metric	Value	Meaning
System CCR	0.82	82% of all retrieved slots attacker-controlled
Scout CCR	1.00	Both Scouts retrieve 100% poisoned context
CASR	1.00	All agents contaminated in every seed
Adoption rate	7/7 backends	Every tested LLM adopts trap coordinates
Physical deviation	~30 m	Consistent across all planning backends

Scalability — Attack Holds at Scale

Sweep	Range	CASR
Memory pool size	6 → 200 records	1.0 (all)
Fleet size	3 → 50 agents	1.0 (all)
Scout budget k	3 → 10	1.0 (all)

Even at 200 total records (1.5% poison ratio), three poisoned entries still dominate the Scout top-k window at every tested retrieval budget.

Defense Effectiveness (D1+D2+D3)

Backend	Before Defense	After Defense
GPT-4o	100% hijack	0% hijack
GPT-OSS	100% hijack	0% hijack
Benign FNR	—	0%

S12 — Model-Dependent Constraint Injection

Model	Adoption	Behavior
GPT-OSS, GPT-4o, Mixtral 8×7B	5/5	Full false-constraint adoption → mission abort
DeepSeek	3/5	Partial adoption
Mistral	1/5	Rare adoption
Llama 3.1, Qwen 2.5	0/5	Complete resistance

Three-layer retrieval-boundary defense: HMAC provenance → trust-aware reranking → source-diversity capping

⚡ Quick Start

Requirements

Dependency	Version	Install
Python	≥ 3.10	python.org
PX4-Autopilot	v1.14+	px4.io — SITL only
Ollama	latest	ollama.ai — for local LLMs

1. Clone & Install

git clone https://github.com/OdatSec/AeroMind.git
cd AeroMind
pip install -r requirements.txt

2. Start PX4 SITL (two Scout drones)

# Terminal 1 — Scout 1 (default port 14540)
cd PX4-Autopilot && make px4_sitl gazebo

# Terminal 2 — Scout 2 (port 14541)
PX4_SIM_PORT=14541 make px4_sitl gazebo

3. Pull a local LLM (optional — skip for GPT-4o)

ollama pull llama3.1
ollama pull nomic-embed-text   # required for memory embedding

4. Run your first experiment

# Clean baseline (no attack)
python experiments/experiment_runner.py --scenario B0 --model llama3.1 --seeds 5

# S01: Direct coordinate hijack (flagship scenario)
python experiments/experiment_runner.py --scenario S01 --model llama3.1 --seeds 5

# S01 with full defense pipeline enabled
python experiments/experiment_runner.py --scenario S01 --model llama3.1 --seeds 5 --defense

5. Reproduce paper figures

# k-sensitivity sweep (Fig. 5 in paper)
python experiments/k_sensitivity_sweep.py

# Memory pool scaling (Table 9 in paper)
python experiments/pool_scaling_experiment.py

# Agent-count scaling S01+S06 (Fig. 6 in paper)
python experiments/agent_scaling_experiment.py
python experiments/agent_scaling_s06_experiment.py

# S12 multi-model sweep (Fig. 7 in paper)
python experiments/s12_runner.py

🛡️ Attack Taxonomy

All 15 attack scenarios are organized into four families based on attack surface and mechanism:

F1 — Embodied Hijack (S01–S05) · Target: physical flight execution

ID	Name	Surface	Mechanism
S01	False Observation	Episodic write	Injects adversarial trap coordinates into retrievable mission memory
S02	Fact Corruption	Semantic write	Corrupts shared factual knowledge store
S03	Skill Hijack	Procedural write	Replaces legitimate skill procedures with malicious variants
S04	Task Misrouting	Coordination write	Redirects Supervisor–Scout task assignments
S05	Prompt Injection	Any write	Injects instruction-following text into retrievable context

F2 — Cross-Agent Contagion (S06) · Target: multi-agent propagation

ID	Name	Surface	Mechanism
S06	Cross-Agent Contagion	Native write	One compromised Scout poisons shared pool; all peers contaminated on next retrieval

F3 — Temporal Persistence (S07–S11) · Target: retrieval ranking mechanics

ID	Name	Surface	Mechanism
S07	Stealth Insert	Episodic	Low-volume insert that persists across mission cycles
S08	Volume Flood	Any	Mass injection to achieve retrieval dominance
S09	Recency Exploit	Any	Timestamp manipulation to elevate ranking
S10	Amplification	Semantic	Cascading record propagation across memory layers
S11	Authority Spoof	Metadata	False high-trust source attribution

F4 — Planning-Stage Attacks (S12–S15) · Target: mission planning logic

ID	Name	Surface	Mechanism
S12	Virtual No-Fly Zone	Semantic + Episodic	False safety constraint → mission denial
S13	Skill Arbitration	Procedural	Manipulates which tool the planner selects
S14	Policy Hijack	Semantic	Overrides mission policy via retrieved "rules"
S15	Cascade	Cross-mission	Adversarial state persists and amplifies across mission cycles

🔒 Defense Pipeline

The uavsys/memory/defense.py module implements a three-layer retrieval-boundary defense targeting the memory-to-planning interface:

 Write Surface        Memory Store         Retrieval Pipeline          Planner
 ─────────────        ────────────         ──────────────────          ───────
 Agent writes  ──►   SQLite + Vectors  ──► [D1] Provenance verify
                                       ──► [D2] Trust reranking  ──►  Top-k context
                                       ──► [D3] Diversity cap

Layer	Module	Mechanism	Effect
D1 Provenance	`defense.py`	HMAC-SHA256 signature verification on all records at retrieval time	Unverified records soft-demoted below trust threshold
D2 Trust Reranking	`retrieval.py`	Per-source trust signal linearly blended with embedding cosine similarity	Attacker-written records score lower despite semantic relevance
D3 Diversity Cap	`retrieval.py`	Hard ceiling on records retrievable from any single source author	Prevents any one source monopolizing top-k context window

🤖 Supported LLM Backends

Backend	Type	Model ID	Planning Validated
GPT-4o	API · OpenAI	`gpt-4o`	✅ Full end-to-end
GPT-OSS	API · OpenAI	`gpt-4o-mini`	✅ Full end-to-end
Llama 3.1 8B	Local · Ollama	`llama3.1`	✅ Planning-stage
Mistral 7B	Local · Ollama	`mistral`	✅ Planning-stage
Mixtral 8×7B	Local · Ollama	`mixtral`	✅ Planning-stage
Qwen 2.5 7B	Local · Ollama	`qwen2.5`	✅ Planning-stage
DeepSeek-R1 7B	Local · Ollama	`deepseek-r1`	✅ Planning-stage

Embedding model: nomic-embed-text v1.5 · 768-dim · 8192-token context · via Ollama

📁 Repository Structure

AeroMind/
│
├── 📂 uavsys/                        Core system package
│   ├── agents/
│   │   ├── supervisor.py             Mission decomposition agent
│   │   ├── scout.py                  Retrieve → plan → execute agent
│   │   └── types.py                  Shared type definitions
│   ├── memory/
│   │   ├── db.py                     SQLite + vector memory store
│   │   ├── memory_interface.py       Unified read/write API
│   │   ├── retrieval.py              Embedding retrieval with reranking
│   │   └── defense.py                D1/D2/D3 defense pipeline
│   ├── drones/
│   │   ├── mavsdk_client.py          PX4 SITL drone connection
│   │   └── skills.py                 Primitive flight skill library
│   ├── llm/
│   │   ├── ollama_client.py          Ollama LLM interface
│   │   └── prompts.py                Mission and planning prompts
│   └── utils/
│       ├── metrics.py                CCR / CASR metric computation
│       └── richlog.py                Structured experiment logging
│
├── 📂 attacks/                       All 15 attack implementations
│   ├── base.py                       Abstract attack base class
│   ├── s01_false_observation.py      ← Flagship: coordinate hijack
│   ├── s06_contagion.py              ← Flagship: cross-agent spread
│   ├── s12_virtual_nfz.py            ← Flagship: mission denial
│   └── ...                           (all 15 scenarios + B0 baseline)
│
├── 📂 experiments/                   Reproducibility scripts
│   ├── experiment_runner.py          Single-scenario experiment driver
│   ├── k_sensitivity_sweep.py        Scout budget k ∈ {3,5,7,10}
│   ├── pool_scaling_experiment.py    Pool size 6 → 200 records
│   ├── agent_scaling_experiment.py   Fleet size 3 → 50 (S01)
│   ├── agent_scaling_s06_experiment.py  Fleet size 3 → 50 (S06)
│   ├── s12_runner.py                 S12 multi-model sweep
│   └── gpt4o_validation.py           End-to-end GPT-4o validation
│
├── 📂 configs/
│   ├── baseline_configs.yaml         Named configs for all paper results
│   └── defense_sweeps.yaml           Defense parameter sweep settings
│
├── 📂 figures/
│   ├── Figure1.png                   System architecture
│   ├── Figure2.png                   Attack flow diagram
│   └── Figure3.png                   Defense pipeline
│
├── run_config.yaml                   Default runtime configuration
├── requirements.txt                  Python dependencies
├── CITATION.cff                      Machine-readable citation
├── SECURITY.md                       Security policy and responsible use
└── LICENSE                           MIT License

📐 Metrics

Metric	Definition
CCR (Context Contamination Rate)	Fraction of a role's top-k retrieved slots occupied by attacker-authored records
CASR (Contaminated Agent Success Rate)	Fraction of experiment runs in which ≥1 poisoned record appears in that role's retrieved context
System CCR	Mean CCR aggregated across all agent roles in the system
Trap Capture Rate	Fraction of full end-to-end runs in which the UAV physically executes to adversarial coordinates

🔬 Reproducibility

All experiments use deterministic seeding via uavsys/seeding.py. Every table and figure result in the paper is reported as a mean over 5 independent seeds. Named configurations for all experiments are in configs/baseline_configs.yaml — any result in the paper can be reproduced with a single command.

Output is structured JSON logged per-run. Aggregate metrics are computed by uavsys/utils/metrics.py.

⚠️ Ethics & Responsible Use

All experiments were conducted exclusively in an isolated PX4 Software-In-The-Loop simulation environment. No real UAV hardware, live airspace, operational networks, or production systems were involved at any stage.

The attack scenarios in this repository are disclosed as part of responsible academic vulnerability research targeting the architectural coupling of shared retrieval memory with physical actuator dispatch in LLM-driven agentic systems. The intent is to motivate the design of provenance-aware, retrieval-hardened memory systems for safety-critical autonomy.

Do not deploy any attack scenario against real hardware, live airspace, or systems you do not own and have explicit authorization to test.

See SECURITY.md for the full security policy.

📝 Citation

If AeroMind contributes to your research, please cite:

@inproceedings{odat2026aeromind,
  title     = {{AeroMind}: Poisoning the Control Plane of {LLM}-Driven {UAV} Agents},
  author    = {Odat, Ibrahim and Liu, Anyi and Li, Yingjiu},
  booktitle = {Proceedings of the 19th International Symposium on Research in
               Attacks, Intrusions and Defenses (RAID 2026)},
  year      = {2026},
  note      = {Under review}
}

A CITATION.cff file is included for GitHub's automatic citation tool.

📬 Contact

Author	Affiliation	Email
Ibrahim Odat	Oakland University	ibrahimodat@oakland.edu
Anyi Liu	Oakland University	anyiliu@oakland.edu
Yingjiu Li	University of Oregon	yingjiul@uoregon.edu

_{AeroMind · RAID 2026 · Oakland University · University of Oregon

Released under the MIT License}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AeroMind

Poisoning the Control Plane of LLM-Driven UAV Agents

🔥 News

📖 Overview

📊 Key Results

Attack Effectiveness (S01 — Direct Coordinate Hijack)

Scalability — Attack Holds at Scale

Defense Effectiveness (D1+D2+D3)

S12 — Model-Dependent Constraint Injection

⚡ Quick Start

Requirements

1. Clone & Install

2. Start PX4 SITL (two Scout drones)

3. Pull a local LLM (optional — skip for GPT-4o)

4. Run your first experiment

5. Reproduce paper figures

🛡️ Attack Taxonomy

🔒 Defense Pipeline

🤖 Supported LLM Backends

📁 Repository Structure

📐 Metrics

🔬 Reproducibility

⚠️ Ethics & Responsible Use

📝 Citation

📬 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
attacks		attacks
configs		configs
experiments		experiments
figures		figures
uavsys		uavsys
.github_social_preview.png		.github_social_preview.png
.gitignore		.gitignore
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
requirements.txt		requirements.txt
run_config.yaml		run_config.yaml

Folders and files

Latest commit

History

Repository files navigation

AeroMind

Poisoning the Control Plane of LLM-Driven UAV Agents

🔥 News

📖 Overview

📊 Key Results

Attack Effectiveness (S01 — Direct Coordinate Hijack)

Scalability — Attack Holds at Scale

Defense Effectiveness (D1+D2+D3)

S12 — Model-Dependent Constraint Injection

⚡ Quick Start

Requirements

1. Clone & Install

2. Start PX4 SITL (two Scout drones)

3. Pull a local LLM (optional — skip for GPT-4o)

4. Run your first experiment

5. Reproduce paper figures

🛡️ Attack Taxonomy

🔒 Defense Pipeline

🤖 Supported LLM Backends

📁 Repository Structure

📐 Metrics

🔬 Reproducibility

⚠️ Ethics & Responsible Use

📝 Citation

📬 Contact

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages