You’ve probably been hearing about AI in cybersecurity everywhere: it detects malware, it analyzes logs… but what about the offensive side? As creators of CAI (Cybersecurity AI), we’ve been working for a while to take AI to the next level: seriously automating pentesting and bug hunting. And yes, we believe the future is already knocking at the door.
Today we’re not just going to talk theory. We’re going to tell you what CAI, our open-source framework, is all about, and above all, we’re going to show you with data and examples (including HTB machines and PortSwigger labs!) why we think this is going to change the game.
- The Pain Point: Why Do We Need Something Like CAI?
- What is CAI? Our Baby, Open Source
- Capabilities: What Can CAI Do?
- Real Results: Where the Magic Happens
- About LLMs and What Vendors Say…
- Who is CAI For?
- So, Should We Give CAI a Try?
- Get Involved!

The Pain Point: Why Do We Need Something Like CAI?
Before diving in, let’s set the context. The current landscape has its issues:
- Talent Gap: There’s a shortage of pentesters and security researchers.
- Costs: Serious audits and bug bounty programs aren’t cheap, and many SMEs are left out.
- Walled Gardens in Bug Bounty: Platforms like HackerOne or Bugcrowd centralize a lot, which isn’t always ideal for everyone.
- The Bad Guys Use AI Too: Adversaries aren’t sleeping. We need tools that scale.
CAI was born from the need to address this: a framework to create specialized AI agents that do the dirty work (and sometimes the not-so-dirty work), making security testing faster, cheaper, and more accessible.
What is CAI? Our Baby, Open Source
CAI isn’t just another tool; it’s a lightweight, agent-centric framework, and yes, it’s open source (you can find it on GitHub, link at the end). It’s designed to build cybersecurity agents that perform specific tasks.
Imagine you could assemble your own team of AI pentesters. The architecture is pretty cool, based on:
- Agents: Small focused AIs (one for web recon, another for binary exploitation, etc.).
- Tools: It integrates with the tools you already use: Nmap, Gobuster, Frida, Hashcat, Burp, Ghidra (thanks to the Model Context Protocol!), Impacket, etc. The agent decides what to launch.
- Patterns: Architectures to coordinate agents. We have a Red Team Agent for pentesting, a Bug Bounty Hunter for vuln hunting, and, watch out, also a Blue Team Agent. This last one focuses on defense: monitoring, incident response, vulnerability assessment from the defender’s perspective…
- Human-In-The-Loop (HITL): This is KEY! We don’t believe in total autonomy (yet). With a Ctrl+C you can stop the agent, give it feedback, correct it… Human-AI collaboration is the present.
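To make the agents + tools + HITL combo a bit more concrete, here’s a minimal sketch in plain Python. To be clear: this is not CAI’s actual API (check the repo for the real thing); run_agent, plan_next_step, the nmap wrapper and the target IP are names we made up purely to illustrate the pattern: the agent picks a tool, runs it, feeds the output back into its context, and a Ctrl+C drops you into the loop to steer it.

```python
# Hypothetical sketch of the agent/tools/HITL pattern. NOT CAI's real API:
# run_agent, plan_next_step and the tool names below are illustrative only.
import subprocess

def nmap_scan(target: str) -> str:
    """Tool wrapper: run a quick service scan and return the raw output."""
    result = subprocess.run(["nmap", "-sV", "-T4", target],
                            capture_output=True, text=True, timeout=600)
    return result.stdout

TOOLS = {"nmap_scan": nmap_scan}

def plan_next_step(history: list[str]) -> tuple[str, str]:
    """Placeholder for the LLM call that decides which tool to run next.
    A real agent would send `history` to a model and parse its answer."""
    return ("nmap_scan", "10.10.10.5")  # hypothetical target

def run_agent(goal: str, max_steps: int = 5) -> None:
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        try:
            tool_name, arg = plan_next_step(history)
            output = TOOLS[tool_name](arg)
            history.append(f"{tool_name}({arg}) -> {output[:500]}")
        except KeyboardInterrupt:
            # Human-In-The-Loop: Ctrl+C pauses the agent so you can redirect it.
            feedback = input("\n[HITL] Your feedback for the agent: ")
            history.append(f"HUMAN: {feedback}")
        except Exception as exc:
            # Failures go back into the context instead of crashing the run,
            # so the planner can try another tool or fix the environment.
            history.append(f"ERROR: {exc}")

if __name__ == "__main__":
    run_agent("Recon the target and report exposed services")
```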

Capabilities: What Can CAI Do?
Based on our tests and R&D:
- Automates the Offensive Kill Chain: From recon and scanning, through exploit, to post-exploitation (privesc, lateral movement) and reporting.
- Automates Defense (with Offensive Mindset): CAI doesn’t just attack. With the Blue Team Agents, it can automate defensive tasks like continuous vulnerability assessments or basic incident response. But what’s interesting is that it does so understanding how an attacker thinks.
- Crushes CTFs (and Labs): It eats through web, reversing, pwn, forensics, and crypto challenges… and, as we’ll see, PortSwigger labs too!
- Does SAST (Static Analysis): Analyzes source code directly and finds bugs in seconds or minutes (there’s a toy sketch of the idea right after this list).
- Bug Bounty Ready: Designed to find real bugs in production environments.
- Flexible & Extensible: It’s open source, modular… Sky’s the limit.
- Speed & Cost: Dramatically reduces time and costs.
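And here’s the toy sketch of the SAST idea we just promised. The PHP snippet and the two regexes are deliberately naive and are ours, not CAI’s; the real agent reasons over the source with an LLM rather than pattern matching, but the core idea it hunts for, user input flowing into a string-built query, is the same.

```python
# Toy illustration of finding SQLi "just by reading the code". The PHP snippet
# and the two regexes are made up for this example; CAI's SAST agent uses an
# LLM over the real source instead of regexes.
import re

PHP_SNIPPET = '''
<?php
$id  = $_GET["id"];
$q   = "SELECT * FROM users WHERE id = " . $id;   // tainted concatenation
$res = mysqli_query($conn, $q);
'''

TAINT  = re.compile(r'\$(\w+)\s*=\s*\$_(GET|POST|REQUEST)\b')                # user input
CONCAT = re.compile(r'"(SELECT|INSERT|UPDATE|DELETE)[^"]*"\s*\.\s*\$(\w+)')  # built query

tainted = {m.group(1) for m in TAINT.finditer(PHP_SNIPPET)}
for m in CONCAT.finditer(PHP_SNIPPET):
    if m.group(2) in tainted:
        print(f"Possible SQLi: user-controlled ${m.group(2)} concatenated into a query")
```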
Real Results: Where the Magic Happens
Okay, enough talk. Does it work or not? Here’s the hard data from our benchmarks and tests:
- CTFs vs Humans:
- In 54 varied challenges, CAI was 11x faster and 156x cheaper on average.
- It crushed forensics (938x faster), reversing (774x), and robotics (741x).
- It struggled more with advanced pwn and crypto.

- Solving Real Machines and Labs:
- Hack The Box (HTB): CAI automates the entire kill chain. In 7 days, it got into the Top 30 in Spain and Top 500 worldwide. Although on complex machines the human First Blood is usually faster, CAI’s ability to run multiple instances in parallel is a huge advantage.
- Concrete Example: HTB AD Machine (This is Gold!): So you can see how CAI thinks and adapts, we’ll tell you how it broke a pretty nasty Active Directory machine:
- Sniffing and Finding the Lead 🕵️♂️: Quick nmap -> Windows DC. smbclient -> Share support-tools -> UserInfo.exe. Suspicious!
- Magic with the Binary ✨: The .exe didn’t give up the LDAP creds easily. A normal script would have gotten stuck. CAI didn’t. It decompiled the binary with monodis, spotted the crappy XOR (key “armando”) and BAM! LDAP password recovered. Pure adaptation! (We sketch the XOR trick in a small code snippet a bit further down.)
- From Domain to User 🚪: With the LDAP creds, ldapdomaindump. The finding? The support account’s password in plain text 🤦‍♂️. WinRM access via crackmapexec (because other tools like evil-winrm failed and CAI knew to change strategy).
- Automated Active Directory Show 👑🤖: CAI’s specialty! It detected the RBCD (Resource-Based Constrained Delegation) attack path. The environment was unstable and PowerShell scripts kept failing; a deterministic approach would have gotten stuck. CAI’s solution (intelligence over tools): it used impacket (getuserspns.py, getnthash.py, secretsdump.py) intelligently to exploit RBCD and gain Administrator access.
- Resilience: Even Against Kali Linux Itself 🌪️: The system running CAI (our Kali) started giving errors: broken dependencies, connection problems… Any traditional approach would have collapsed. CAI didn’t: it identified the failures, resolved dependency conflicts, repaired services, and continued the attack without pause. Nothing stopped it! 🔥
- Why is CAI Different (and Better) in these cases? 😎 It’s not a rigid command sequence; it’s an intelligence that orchestrates tools. Where a deterministic script chokes on an error or a “weird” environment, CAI:
- Analyzes: Understands why something fails.
- Adapts: Chooses alternative tools (netexec instead of evil-winrm, atexec instead of psexec).
- Resolves: Fixes environment problems (DNS, variables, even errors in Kali itself!).
- Automates the Complex: An AD attack from start to finish, overcoming obstacles.
- PortSwigger Web Security Academy: It autonomously works through labs covering dozens of web vulnerability classes in different environments. Ideal for automating web testing.
- Static Analysis (SAST) in Action: Finds SQLi in .php files without executing anything, just by reading the code.
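Since we teased it in the AD walkthrough above, here’s the XOR trick from the UserInfo.exe step boiled down to its core, so you can see why a quick decompile was game over. The key “armando” is the real one from the machine; the secret below is fake, and the snippet only shows the repeating-key XOR round trip, not the binary’s exact scheme.

```python
# The kind of XOR "protection" CAI spotted after decompiling UserInfo.exe,
# reduced to a round trip. Key "armando" comes from the machine; the secret
# and the stored blob here are fake, for illustration only.
import base64
from itertools import cycle

KEY = b"armando"

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: the same function encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

fake_secret = b"not_the_real_ldap_password"
blob = base64.b64encode(xor_with_key(fake_secret, KEY))    # what a binary might store
recovered = xor_with_key(base64.b64decode(blob), KEY)      # what CAI would recover
print("stored blob :", blob.decode())
print("recovered   :", recovered.decode())
```

XOR with a hardcoded key is symmetric, so once CAI saw the key string in the decompiled output, recovering the LDAP password was only a few lines away.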

- Competitions (Live CTFs):
- “AI vs Human” CTF: CAI ranked 1st among AIs and Top 20 worldwide, taking home $750. HackTheBox also published an article about the competition.
- “Cyber Apocalypse CTF 2025”: 22nd place in 3 hours (out of 8,000+ teams).
- Bug Bounties, The Real Test:
- Week-long experiment:
- Non-Professionals: Found 6 valid bugs (CVSS 4.3-7.4).
- Professionals: Found 4 bugs (CVSS 4.3-7.5).
- Takeaway: Similar results! CAI truly democratizes bug hunting and security testing.
About LLMs and What Vendors Say…
We ran benchmarks with several LLMs (Claude 3.7 Sonnet gave us the best results so far). We believe some major vendors are being somewhat conservative when talking about the offensive capabilities of their models. Our results with CAI show they can do quite a bit more than is sometimes admitted.
Who is CAI For?
- Red Teams / Pentesters: To automate and accelerate.
- Security Researchers / Bug Hunters: Pros (for efficiency) and newbies (to get started!).
- Companies (Especially SMEs): For continuous and affordable self-assessments.
- Blue Teams: With the Blue Team Agent for monitoring, response, and continuous vuln assessment, understanding the attacker’s perspective.
- Academics / Researchers: Open source platform to research AI + Cyber.
- Devs / DevOps: To integrate SAST quickly into the pipeline.
So, Should We Give CAI a Try?
Absolutely! CAI is an open source project with results that speak for themselves. It has competed, won money, crushed labs and machines, and helped people with no professional background find real bugs. And let’s not forget it also helps automate defense, but from a practical and offensive point of view: knowing how you can be attacked so you can defend yourself better.
For us, the most powerful part is how it democratizes access to advanced security testing (both offensive and defensive assessment).
Obviously, it’s not magic. 100% autonomy has limits. HITL is fundamental. But as a tool to augment capabilities and automate, the potential is gigantic.
Get Involved!
If you like the idea, want to try it, or contribute:
- GitHub Repo: CAI Official Repository on GitHub
- Discord Community: Join CAI Discord Community
- Paper: CAI Research Paper on arXiv
Tinker around, see what it does, and tell us about it. Maybe your next bug will be found with an AI buddy.
Happy Hacking! 😁