Agentic Offensive Security — Unprecedented Speed and Scale

Section 01

Why Agentic?

Traditional pentesting is bottlenecked by human bandwidth. Scoping, recon, enumeration, exploitation, pivoting, reporting — all manual. AI changes the speed at every layer.

BEFORE · Manual Workflow

4h Scope definition & approval

8h Recon & enumeration

12h Vulnerability discovery

16h Exploitation & pivoting

8h Report writing

48h total · 1 pentester

AFTER · Agent-Augmented

0.5h Scope → Iron Curtain constitution

1h Parallel recon (Hermes delegates)

2h Automated vuln mapping + CVE correlation

4h Gated exploitation (human approves)

0.5h Auto-generated structured report

8h total · 1 pentester + agents
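The two timelines can be sanity-checked with a few lines of arithmetic. This is only a sketch: the hours are copied from the BEFORE and AFTER columns above, and the variable names are illustrative.

```python
# Phase durations in hours, copied from the two timelines above.
manual_hours = {
    "scope": 4, "recon": 8, "discovery": 12, "exploitation": 16, "reporting": 8,
}
agent_hours = {
    "scope": 0.5, "recon": 1, "discovery": 2, "exploitation": 4, "reporting": 0.5,
}

manual_total = sum(manual_hours.values())
agent_total = sum(agent_hours.values())
speedup = manual_total / agent_total

# Per-phase speedup shows where the savings come from.
per_phase = {phase: manual_hours[phase] / agent_hours[phase] for phase in manual_hours}

print(f"{manual_total}h -> {agent_total}h ({speedup:.0f}x overall)")
for phase, gain in per_phase.items():
    print(f"  {phase}: {gain:.0f}x")
```

On these figures, gated exploitation gains the least (4×) because the human approval step remains, while reporting gains the most (16×).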

Repetition

Agents handle recursive recon, brute-force, and coverage checks without fatigue. They don't forget to check that one subdomain.

Orchestration

Agents chain tools across phases without losing context. Hermes persists memory, delegates parallel tasks, and tracks state end-to-end.

Containment

Iron Curtain-style runtimes sandbox autonomous agents behind plain-English policy constitutions. No tool fires without deterministic enforcement.

Section 02

How Hackers Are Already Using Agents

While blue teams evaluate, adversaries deploy. Agentic workflows don't just make pentesting faster — they fundamentally change the economics of attack.

600×

More targets scanned per hour vs. manual

One agent + 200 parallel sub-agents = coverage that was previously impossible [1]

8h

End-to-end attack cycle vs. traditional 48h

Recon → exploit → exfiltrate → cover tracks — fully automated, while the defender sleeps [2]

⌁ Mass Parallelization

A single attacker with an agent framework can scan thousands of targets simultaneously. Hermes's delegation spawns sub-agents for each IP range, each service, each CVE check. What took a team of five pentesters two days takes one agent two hours.

# 200 parallel nmap scans. No human team does this.
$ hermes delegate "scan all 200 subnets, report back open ports"

⏱️ Continuous Persistence

Agents don't clock out. With persistent memory and cron scheduling, an agent can probe a target network every 30 minutes for weeks, learning which patches were applied, when shifts change, and which services come and go. It builds a map no human attacker has the patience for.

# Cron: probe every 30 minutes. Memory: remember what changed.
$ hermes cron create "recon sweep every 30m, diff against last run"

🧠 Adaptive Learning

An agent that fails an exploit doesn't give up — it adapts. It tries the next CVE, chains a different path, pivots through a different host. Memory persists the findings. Each failure makes the next attempt smarter. This is not brute force — it's iterative learning at machine speed.

# Failed CVE-2024-1234? Try CVE-2024-5678. Still no? Chain through found creds.
$ hermes memory recall "find alternative paths to 10.0.1.50"

📉 Lowering the Bar

An attacker doesn't need to understand buffer overflows anymore. They tell the agent "find a way into this network" and the agent chains together the skill pack's 277 tools, the memory of what worked last time, and the sub-agents handling each attack surface. Sophistication is now a commodity.

# No exploit knowledge needed. The agent has it.
$ "I have access to a 10.0.0.0/8 network. Find me a foothold."

⚠️

The Asymmetry Problem

Attackers don't need Iron Curtain. The safety layer that ethical pentesters use — constitutional guardrails, approval gates, scoped targets — is optional for an adversary. They run agents without policy enforcement, without rate limiting, without human-in-the-loop. The same tools that make pentesting safe and structured for red teams make attacks fast and unbounded for everyone else.

Section 03

The Landscape

Four categories of tools. Each solves a different piece of the agentic pentesting puzzle.

Agent Framework

Hermes Agent

Orchestrates skills, delegates tasks, runs cron, persists memory. The "brain" of the operation.

hermes-agent.nousresearch.com →

Secure Runtime

Iron Curtain

Sandboxes agents behind an MCP policy engine. Plain-English constitutions → deterministic enforcement. The "immune system".

github.com/provos/ironcurtain →

Skill Packs

kali-pentest

150–277 CLI tools, pre-built playbooks, authorization gates. Hermes-compatible. The "hands".

github.com/x-glacier/kali-pentest →

Specialized Agents

Pentest-Swarm-AI

ReAct reasoning, swarm coordination, purpose-built autonomous hackers. The "special forces".

Decepticon · LuaN1aoAgent · HexStrike AI

Section 04

Architecture

How the pieces fit together in a secure agentic pentesting stack.

💡

The Mental Model

Hermes decides what to do. · Iron Curtain decides whether it's allowed. · The skill pack decides how to do it.
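The division of labour in the mental model can be sketched as three callables wired together. This is a hypothetical illustration, not the API of Hermes, Iron Curtain, or any skill pack: `Action`, `run_step`, and the stub lambdas are invented for the example, and the executor only returns a string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    tool: str    # e.g. "nmap"
    target: str  # host the tool would touch


def run_step(
    plan: Callable[[], Action],            # orchestrator: decides what to do
    is_allowed: Callable[[Action], bool],  # policy layer: decides whether it's allowed
    execute: Callable[[Action], str],      # skill pack: decides how to do it
) -> str:
    action = plan()
    if not is_allowed(action):
        # The executor is never reached for a denied action.
        return f"DENIED {action.tool} -> {action.target}"
    return execute(action)


# Stub layers (hypothetical): only hosts in 10.0.0.x are in scope.
in_scope = lambda a: a.target.startswith("10.0.0.")
fake_exec = lambda a: f"ran {a.tool} on {a.target}"

result = run_step(lambda: Action("nmap", "10.0.0.5"), in_scope, fake_exec)
denied = run_step(lambda: Action("nmap", "192.168.1.1"), in_scope, fake_exec)
print(result)
print(denied)
```

The point of the shape is that the policy check sits between planning and execution, so a denied action never reaches a tool.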

Section 05

Core Workflow

The 5-phase agentic pentest pipeline — from authorization to report.

1

Scoping

Define targets, rules of engagement → Iron Curtain constitution

Hermes schedule

2

Recon

nmap, masscan, subdomain enum → scoped & rate-limited

kali-pentest

3

Discovery

CVE mapping, misconfig checks → parallel sub-agents

Hermes delegation

4

Exploitation

Agent proposes → human approves → tool executes

Iron Curtain gate

5

Reporting

Attack chain, CVSS, exec summary → structured markdown

Hermes report

PHASE 1

Scoping & Authorization

· User defines scope, RoE, targets
· Iron Curtain constitution encodes: allowed IPs, prohibited techniques
· Hermes schedules & tracks the engagement
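What a scope check might reduce to once the rules of engagement are machine-readable can be sketched as follows. The assumptions are loud ones: Iron Curtain constitutions are written in plain English, so `EngagementScope`, its fields, and the technique labels below are hypothetical stand-ins rather than its actual format.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class EngagementScope:
    allowed_networks: list[str]
    prohibited_techniques: set[str] = field(default_factory=set)

    def permits(self, target_ip: str, technique: str) -> bool:
        """True only if the target is in scope and the technique is not banned."""
        if technique in self.prohibited_techniques:
            return False
        addr = ipaddress.ip_address(target_ip)
        return any(
            addr in ipaddress.ip_network(net) for net in self.allowed_networks
        )


# Hypothetical engagement: one /24 in scope, denial of service prohibited.
scope = EngagementScope(
    allowed_networks=["10.0.1.0/24"],
    prohibited_techniques={"denial_of_service"},
)

print(scope.permits("10.0.1.50", "port_scan"))          # in scope, allowed technique
print(scope.permits("10.0.2.1", "port_scan"))           # outside the allowed network
print(scope.permits("10.0.1.50", "denial_of_service"))  # banned technique
```

The check is default-deny: a target passes only when it matches an allowed network, and a prohibited technique fails regardless of target.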

PHASE 2

Reconnaissance

· Agent runs nmap, masscan, service detection
· Uses kali-pentest playbooks or HexStrike MCP tools
· Iron Curtain: scoped only, rate-limited, no OOB
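The "rate-limited" constraint above is commonly implemented as a token bucket. The sketch below shows that generic technique; it is not taken from Iron Curtain, and the class name, parameters, and injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow at most `rate` actions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: a policy layer would call try_acquire() before letting a probe through.
bucket = TokenBucket(rate=2, capacity=2)
print([bucket.try_acquire() for _ in range(3)])
```

Passing the clock in as a parameter keeps the limiter deterministic under test, which matters if enforcement is meant to be auditable.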

PHASE 3

Vulnerability Discovery

· Version detection, CVE mapping, misconfig checks
· Hermes delegates to sub-agents for parallel scanning
· Memory persists findings across tool runs
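Persisting findings across tool runs can be as simple as a keyed store that ignores duplicates. The following is a minimal sketch under that assumption; `FindingsStore` and its JSON file layout are hypothetical and say nothing about how Hermes memory is actually implemented.

```python
import json
from pathlib import Path


class FindingsStore:
    """Tiny JSON-backed store; one record per (host, port, issue)."""

    def __init__(self, path: Path):
        self.path = path
        self.records = json.loads(path.read_text()) if path.exists() else {}

    def add(self, host: str, port: int, issue: str, detail: str) -> bool:
        """Store a finding; return False if it was already recorded."""
        key = f"{host}:{port}:{issue}"
        if key in self.records:
            return False
        self.records[key] = {
            "host": host, "port": port, "issue": issue, "detail": detail,
        }
        # Write through on every new finding so a later run can reload it.
        self.path.write_text(json.dumps(self.records, indent=2))
        return True
```

Because the key is the (host, port, issue) triple, parallel sub-agents that report the same issue twice produce one record, and a later run that reloads the file starts from what was already found.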

PHASE 4

Exploitation ⚠ GATED

· Iron Curtain escalates to human for approval
· Agent proposes exploit path, tool, expected outcome
· Only after /approve does the tool execute
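The propose, approve, execute sequence above amounts to a small default-deny state machine. The sketch below illustrates that idea only; `ApprovalGate`, `Proposal`, and `Status` are hypothetical names, not Iron Curtain's interface.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    exploit_path: str
    tool: str
    expected_outcome: str
    status: Status = Status.PENDING


class ApprovalGate:
    def __init__(self):
        self.queue: dict[int, Proposal] = {}
        self._next_id = 1

    def propose(self, proposal: Proposal) -> int:
        """Queue a proposal for human review and return its id."""
        pid = self._next_id
        self.queue[pid] = proposal
        self._next_id += 1
        return pid

    def approve(self, pid: int) -> None:
        self.queue[pid].status = Status.APPROVED

    def reject(self, pid: int) -> None:
        self.queue[pid].status = Status.REJECTED

    def may_execute(self, pid: int) -> bool:
        # Default-deny: unknown or undecided proposals never run.
        proposal = self.queue.get(pid)
        return proposal is not None and proposal.status is Status.APPROVED
```

Each proposal carries the three items the agent is expected to state (path, tool, expected outcome), so the human reviewer decides on a concrete action rather than a vague request.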

PHASE 5

Reporting

· Synthesizes attack chain, findings, CVSS, remediation
· Structured markdown report with executive summary
· Findings matrix, timeline, reproduction steps
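A findings matrix of the kind listed above could be rendered like this. The severity bands follow the CVSS v3.x qualitative rating scale; the `Finding` fields and the table columns are assumptions chosen for the sketch, not a format prescribed by Hermes.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    host: str
    cvss: float
    remediation: str


def severity(score: float) -> str:
    """Qualitative rating per the CVSS v3.x severity scale."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def findings_matrix(findings: list[Finding]) -> str:
    """Render findings as a markdown table, highest CVSS first."""
    rows = [
        "| Severity | CVSS | Host | Finding | Remediation |",
        "| --- | --- | --- | --- | --- |",
    ]
    for f in sorted(findings, key=lambda f: f.cvss, reverse=True):
        rows.append(
            f"| {severity(f.cvss)} | {f.cvss:.1f} | {f.host} | {f.title} | {f.remediation} |"
        )
    return "\n".join(rows)
```

Sorting by score puts the items that need remediation first, which is usually what the executive summary draws on.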

More phases can be composed as Hermes skills

Section 06

Decision Framework

Which tool for which job? Choose the right layer for your use case.

"I need to orchestrate a full pentest with memory & delegation"

→ Hermes Agent

Skills system, cron scheduling, sub-agent delegation, persistent cross-session memory. Hermes is the central nervous system — it tracks state, delegates parallel tasks, and never forgets what it found.

"I need to ensure an agent can't go rogue during autonomous ops"

→ Iron Curtain

Constitutional policy enforcement at the MCP layer. Write your guardrails in plain English — Iron Curtain deterministically enforces them. No tool fires without passing the constitution check.

"I need 200+ pentest tools pre-wired for AI agents"

→ kali-pentest / HexStrike

Pre-built MCP servers with 150–277 CLI tools, playbooks, and authorization gates. Drop-in compatible with Hermes. You get nmap, Metasploit, hydra, and hundreds more — all agent-callable.

"I need fully autonomous red-team ops with swarm logic"

→ Pentest-Swarm-AI / Decepticon

Specialized ReAct agents built for coordinated, swarm-style exploitation. Multiple agents work in parallel, share findings, and adapt strategy in real time.

"I need to benchmark my agent's pentesting capability"

→ XBOW Benchmark (LuaN1aoAgent)

Standardized scoring, with state-of-the-art agents reported at 90%+ success rates. Measure your agent's effectiveness across a range of pentest scenarios and compare against the state of the art.

Critical Reading

Safety & Ethics

⚠️

No tool replaces written authorization.

Every tool in this landscape ships with scope gates. For authorized testing, Iron Curtain constitutions are not optional: they are the enforcement layer. kali-pentest and HexStrike include hard-coded authorization checks.

If you skip the safety layer, you're not doing pentesting — you're committing a crime.

📋

Authorization First

Written, scoped, timestamped. Know exactly what you're allowed to test and when the engagement ends.

🔒

Containment Always

Iron Curtain or equivalent sandbox. Never run autonomous agents against targets without a policy enforcement layer.

👤

Human in the Loop

Escalation gates for destructive actions. The agent proposes — you decide. No `/approve`, no execution.

Section 07

Getting Started

Three paths depending on your needs. From solo pentester to full red-team ops.

🥷

Minimal

Solo pentesters adding AI to existing workflow

Stack: Hermes + kali-pentest

$ hermes setup

# Follow interactive setup wizard

$ hermes skills install kali-pentest

# Up to 277 pentest tools, agent-ready

🛡️

Hardened

Recommended

Teams needing policy-enforced autonomy

Stack: Hermes + Iron Curtain + kali-pentest

$ hermes setup

$ npx ironcurtain init

# Write your constitution

$ hermes skills install kali-pentest

# Point Hermes at Iron Curtain MCP proxy

🤖

Full Auto

Research / red-team ops with swarm orchestration

Stack: Iron Curtain + Pentest-Swarm-AI

$ npx ironcurtain init

# Constitution-first: define guardrails

$ git clone pentest-swarm-ai

# Point swarm at Iron Curtain proxy

$ ./swarm up --target=scope.yaml

Sources & Further Reading

Where the Numbers Come From

[1]

Parallel Agent Delegation

The 600× scan volume figure is derived from Hermes Agent's sub-agent delegation model: 200 parallel sub-agents each scanning at approximately 3× human speed, enabling coverage that would require an infeasibly large human team. See Hermes Agent documentation on sub-agent delegation and tool parallelism.
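The derivation in this note is a single multiplication, sketched below. The `efficiency` parameter is a hypothetical addition, not part of the original figure; it is there to show how the multiplier shrinks when parallel scaling is less than perfect.

```python
def scan_multiplier(
    sub_agents: int,
    per_agent_speedup: float,
    efficiency: float = 1.0,  # hypothetical: 1.0 means perfectly linear scaling
) -> float:
    """Scan volume relative to one human, under the note's assumptions."""
    return sub_agents * per_agent_speedup * efficiency


headline = scan_multiplier(200, 3)        # the figures used in the note
halved = scan_multiplier(200, 3, 0.5)     # same setup at 50% parallel efficiency
print(headline, halved)
```

The 600× figure therefore assumes linear scaling across all 200 sub-agents; shared bandwidth or per-target rate limits would lower it.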

[2]

Agentic Pentest Acceleration

The 48h→8h compression (6× speedup) is based on automated parallelization of the five pentest phases. Google's Project Naptime and its successor Big Sleep showed LLM agents finding an exploitable, previously unknown vulnerability in real-world code (Project Zero, 2024). Protect AI's Vulnhuntr used LLMs to find zero-day vulnerabilities across Python packages through automated code analysis at scale (Vulnhuntr, 2024).

[3]

AI Agent Threat Acceleration

Anthropic's frontier model evaluations documented offensive cyber capabilities including vulnerability discovery and exploit generation (Anthropic, 2024). OpenAI's Preparedness Framework evaluated GPT-4 for offensive cyber operations and found it could assist with vulnerability discovery and exploit development (OpenAI, 2024). CrowdStrike's 2025 Global Threat Report documented accelerating adversary speed, with AI-assisted attacks reducing dwell time.

[4]

Agentic Pentesting Benchmarks

The XBOW benchmark provides standardized scoring of AI agent pentesting capabilities, with state-of-the-art agents achieving 90%+ success rates on defined security tasks. See XBOW Benchmark for methodology and results.

[5]

NIST & OWASP Frameworks

The NIST AI Risk Management Framework (NIST AI RMF) and OWASP Top 10 for LLM Applications (OWASP) provide the governance and threat modeling frameworks that inform the defensive recommendations throughout this site.