Agentic Security Review

Multi-agent security review with evidence-first pipeline, deterministic consolidation, severity rubric, and release-policy engine.

Run three AI agents (Codex, Claude, optional Bob) through structured security review of your codebase. Get an HTML report with executive summary, full audit log, and machine-readable consolidation — without trusting any single agent.

Install

One command, requires Python 3.13+ and pipx:

curl -fsSL https://anderssol.github.io/CodeReview/install.sh | bash

The install script verifies SHA-256 against latest.json before installing.

What you get

How it works

You hand the framework a codebase. Three AI agents read it in parallel. The framework then checks each finding against actual source code, applies a fixed rule for how serious it is, and gives you a clear "ship / review / block" verdict — with full evidence.

1. You point it at a codebase

A folder of source files. Could be a small library or a full application. The framework first locks down the target so it cannot read outside that folder.

sandbox + inventory
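
A minimal sketch of what confining reads to the target folder and building an inventory might look like. The names `resolve_inside` and `inventory` are hypothetical and chosen for illustration; the framework's actual sandbox is not shown here and may work differently (for example at the OS level).

```python
from pathlib import Path


def resolve_inside(root: Path, candidate: str) -> Path:
    """Return the resolved path, or raise if it escapes the target folder."""
    root = root.resolve()
    resolved = (root / candidate).resolve()
    # Comparing resolved paths rejects "..", absolute paths, and symlinks
    # that point outside the target folder.
    if not resolved.is_relative_to(root):
        raise PermissionError(f"{candidate!r} escapes {root}")
    return resolved


def inventory(root: Path) -> list[str]:
    """List every regular file under root as sorted, relative POSIX paths."""
    root = root.resolve()
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.resolve().is_relative_to(root)
    )
```

Under these assumptions, `resolve_inside(Path("project"), "../secrets.txt")` raises `PermissionError`, while a path such as `"src/app.py"` resolves normally.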

2. Automated scanners run first

Standard security tools (like virus scanners for code) check for leaked passwords, known weak patterns, and dependencies with known vulnerabilities. A further scan looks for "prompt injection": text inside the codebase that could trick the AIs.

gitleaks · bandit · semgrep · pip-audit

3. Three AI agents read in parallel

Each gets the same hashed prompt and the scanner results. They look for security weaknesses independently and write structured findings. One AI alone misses things; three together catch more.

Codex (OpenAI) Claude (Anthropic) Bob (IBM, optional)
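
The fan-out described above could be sketched roughly as follows. This is an illustration under assumptions: the `Agent` callable signature, the `run_agents` helper, and the shape of the returned dictionary are all hypothetical, and real agents would be calls to external services rather than local functions.

```python
import hashlib
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# Hypothetical agent signature: (prompt, scanner_results) -> list of findings.
Agent = Callable[[str, dict], list[dict]]


def run_agents(agents: dict[str, Agent], prompt: str, scanner_results: dict) -> dict:
    """Give every agent the same prompt and scanner results, in parallel."""
    # Hash the prompt once so the audit trail can show all agents saw the same text.
    prompt_sha256 = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {
            name: pool.submit(agent, prompt, scanner_results)
            for name, agent in agents.items()
        }
        # Each agent's findings are kept separately; nothing is merged here.
        findings = {name: future.result() for name, future in futures.items()}
    return {"prompt_sha256": prompt_sha256, "findings": findings}
```

Keeping each agent's output under its own key preserves the independence of the three reviews until the later consolidation step.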

4. Each finding is fact-checked against the source code

The framework opens the file the AI cited, finds the exact line, and confirms the quoted code is actually there. AIs sometimes hallucinate — citing line 247 when the code is on line 142, or quoting something that doesn't exist. Those findings are flagged as unverified before they get consolidated.

file open · ±5-line search · quote match

5. Severity comes from a fixed rule, not AI opinion

Each finding is scored on six factors: what an attacker could do, what access they need, what environment is hit, where the input comes from, how easy to trigger, and whether the code is actually reachable. The combination determines Critical / High / Medium / Low — deterministically. The same finding always gets the same severity.

impact × privilege × env × input × trigger × reachability
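
To illustrate what "deterministic" means here, the sketch below maps six factors to a severity with a pure function. Everything specific in it is an assumption made for illustration: the 0 to 3 scales, the weights, the thresholds, and the rule that unreachable code is always Low. The framework's real rubric is not reproduced in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factors:
    # Hypothetical 0..3 scales; higher means worse.
    impact: int        # what an attacker could do
    privilege: int     # 0 = admin required .. 3 = no authentication
    environment: int   # 0 = test only .. 3 = production
    input_source: int  # 0 = trusted config .. 3 = remote, untrusted
    trigger: int       # 0 = very hard .. 3 = trivial
    reachable: bool    # is the code actually reachable?

    def __post_init__(self) -> None:
        for name in ("impact", "privilege", "environment", "input_source", "trigger"):
            if not 0 <= getattr(self, name) <= 3:
                raise ValueError(f"{name} must be in 0..3")


def severity(f: Factors) -> str:
    """Pure function: the same factors always yield the same severity."""
    if not f.reachable:
        return "Low"
    # Illustrative weighting: impact counts double, maximum score is 18.
    score = 2 * f.impact + f.privilege + f.environment + f.input_source + f.trigger
    if score >= 15:
        return "Critical"
    if score >= 11:
        return "High"
    if score >= 7:
        return "Medium"
    return "Low"
```

Because the function has no randomness and no model call, re-running it on the same factors cannot change the result.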

6. Every finding gets one of five states

Nothing is filtered away — every finding is preserved and labelled. You see exactly which ones are firmly real, which need a human glance, and which look like noise.

confirmed plausible weak evidence needs manual review false-positive candidate

7. Eight rules decide: ship, review, accept, or block

A release policy engine evaluates every finding against eight standard rules ("any confirmed Critical → block", "exposed secret → block", "auth-bypass at Medium+ → needs review", etc.). Every rule is evaluated and logged, and the first rule that matches sets the verdict. You get a clear verdict, not a maybe.

first match wins · all evaluated · override with --release-policy
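
A sketch of "first match wins, all evaluated" using only the three rules quoted above. The `Finding` fields, the category strings, the rule names, and the default verdict of "ship" when nothing matches are assumptions for illustration; the remaining five rules are not listed in this document and are not guessed at here.

```python
from collections.abc import Callable
from dataclasses import dataclass

SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]


@dataclass(frozen=True)
class Finding:
    state: str     # e.g. "confirmed", "plausible"
    severity: str  # one of SEVERITY_ORDER
    category: str  # hypothetical labels such as "auth-bypass"


@dataclass(frozen=True)
class Rule:
    name: str
    verdict: str
    matches: Callable[[Finding], bool]


RULES = [
    Rule("confirmed-critical", "block",
         lambda f: f.state == "confirmed" and f.severity == "Critical"),
    Rule("exposed-secret", "block",
         lambda f: f.category == "exposed-secret"),
    Rule("auth-bypass-medium-plus", "review",
         lambda f: f.category == "auth-bypass"
         and SEVERITY_ORDER.index(f.severity) >= SEVERITY_ORDER.index("Medium")),
]


def evaluate(findings: list[Finding], rules: list[Rule] = RULES,
             default: str = "ship") -> tuple[str, list[tuple[str, bool]]]:
    """Evaluate every rule for the log; the first rule that hits sets the verdict."""
    verdict = None
    log = []
    for rule in rules:
        hit = any(rule.matches(f) for f in findings)
        log.append((rule.name, hit))  # logged even after a verdict is set
        if hit and verdict is None:
            verdict = rule.verdict
    return (verdict or default, log)
```

Evaluating every rule even after a match keeps the audit log complete, while rule order alone determines which verdict is reported.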

8. You get a report + a full audit log

A single HTML file with an executive summary, per-finding business-risk narrative, evidence citations, and the 6×6 coverage matrix. Plus a JSON audit log with every hash needed to reproduce the result later — schema, rubric, policy, prompt, every agent's raw output.

HTML report + reproducible audit JSON
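
One way the hash portion of such an audit log could be produced is sketched below. The `build_audit_record` function and the `"hashes"` key are hypothetical; the actual audit JSON layout is not documented here.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_audit_record(artifacts: dict[str, bytes]) -> str:
    """Hash each named artefact (schema, rubric, policy, prompt, raw agent
    outputs, ...) and serialise the result as stable JSON."""
    hashes = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    # sort_keys makes the output byte-identical for identical inputs.
    return json.dumps({"hashes": hashes}, sort_keys=True, indent=2)
```

Reproducing a result later then amounts to re-hashing the same inputs and comparing against the recorded values.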

Why three AIs and a deterministic rubric?

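AI agents fail in ways you can't predict, sometimes catastrophically. One day Codex misses an auth-bypass bug; another day Claude hallucinates a line number; Bob has been known to recommend the buggy code as the fix. Running three agents and scoring their findings with a fixed rubric means no single agent's failure decides the outcome.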

This is operational triangulation, not magic. Limitations are stated prominently in every report.

Production status

Use case | Status
Personal use on small/medium codebases | ✅ Ready (219 tests, live-validated 2026-05-14)
Internal team shared use | 🟡 Alpha (rough edges on Bob format, CWE-policy, enum aliases)
External customer / board delivery | ❌ Not yet (auto-gen business-risk is draft-tier; needs hand-written executive narrative)
CI/CD integration | ❌ Not yet

This release is v5.2.0rc1. Auto-update uses the latest.json manifest. The source code is private at AndersSol/agentic-security-review; only release artefacts and the install script are public.

Requirements

Verifying a release

curl -fsSL https://anderssol.github.io/CodeReview/latest.json
# version, wheel_url, sha256, size_bytes, python_requires, released

The install script downloads the wheel from wheel_url, recomputes its SHA-256, and fails hard on a mismatch.
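
For readers who want to repeat the check by hand, the comparison could look roughly like this. The field names `sha256` and `size_bytes` come from the manifest listing above; the `verify_wheel` function is hypothetical, and it assumes `sha256` is a hex string and `size_bytes` an integer.

```python
import hashlib
import hmac


def verify_wheel(wheel_bytes: bytes, manifest: dict) -> None:
    """Raise ValueError if the downloaded wheel does not match the manifest."""
    if len(wheel_bytes) != manifest["size_bytes"]:
        raise ValueError(
            f"size mismatch: got {len(wheel_bytes)}, expected {manifest['size_bytes']}"
        )
    digest = hashlib.sha256(wheel_bytes).hexdigest()
    # compare_digest gives a constant-time comparison of the two hex strings.
    if not hmac.compare_digest(digest, str(manifest["sha256"]).lower()):
        raise ValueError("SHA-256 mismatch: refusing to install")


# Usage sketch (not executed here): fetch latest.json, download the bytes at
# manifest["wheel_url"], then call verify_wheel(wheel_bytes, manifest).
```

Note that this only proves the wheel matches the manifest; it relies on the manifest itself being fetched over HTTPS from the expected host.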

Releases

See GitHub Releases for all versions.