OpenAI Codex Security Review: 10,561 Vulnerabilities Found in 1.2 Million Commits
Published March 12, 2026 · 9 min read
TL;DR — Key Takeaways
- Launched March 6, 2026 — in research preview for ChatGPT Pro and Enterprise users (no extra cost).
- Scale: scanned 1.2 million commits, flagged 10,561 high-severity and 792 critical vulnerabilities.
- Three-phase workflow: identify → verify → auto-generate remediation pull requests.
- Not the old Codex: this is a dedicated security product, unrelated to the 2021-era code-completion model.
- Downside: still in research preview, no enterprise audit trail or compliance reporting yet — Snyk and SonarQube remain the compliance-grade standard.
- Who it suits: ChatGPT Pro/Enterprise teams who want a fast AI-native first pass; not a standalone replacement for mature SAST tooling.
What Is OpenAI Codex Security?
OpenAI Codex Security is a dedicated AI-powered security scanning product that launched on March 6, 2026 — distinct from the original Codex coding model that debuted in 2021 and was later deprecated. Where the old Codex was a code-completion tool, this new product exists for one purpose: finding and patching security vulnerabilities in your codebase.
The product operates on commit history rather than just current code. It ingests a repository's full commit log, reasons about how code has changed over time, and surfaces vulnerabilities that may have been introduced at any point — not just in the most recent diff. This historical-scan approach is a meaningful departure from how most static application security testing (SAST) tools work: they typically analyze only the current state of the codebase.
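Commit-level scanning is straightforward to picture. Here is a minimal sketch — with an illustrative regex standing in for the model's semantic reasoning, and hypothetical names throughout — of a scanner that walks diffs oldest-first so each finding is attributed to the commit that introduced it:

```python
import re

# Illustrative only: a regex stands in for the model's semantic reasoning,
# and the hardcoded-secret pattern is just one easy-to-demo vulnerability class.
SECRET_RE = re.compile(r'(api[_-]?key|secret)\s*=\s*["\'][\w-]{8,}["\']', re.I)

def scan_history(commits):
    """commits: (sha, diff_text) pairs, oldest-first. Flags added lines only,
    so each finding is attributed to the commit that introduced it."""
    findings = []
    for sha, diff in commits:
        for line in diff.splitlines():
            if line.startswith("+") and SECRET_RE.search(line):
                findings.append({"commit": sha, "line": line[1:].strip()})
    return findings

history = [
    ("a1b2c3", '+def handler(req):\n+    return db.query(req.args["q"])'),
    ("d4e5f6", '+API_KEY = "sk_live_abcdef123456"  # introduced here, still live'),
]
print(scan_history(history))  # the secret is pinned to commit d4e5f6
```

A production scanner would of course parse real `git log -p` output and reason far beyond regexes; the point is only that operating on diffs, not HEAD, yields a "when was this introduced" answer for free.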
It is currently available in research preview to ChatGPT Pro subscribers ($200/month) and ChatGPT Enterprise users at no extra charge. OpenAI has not announced general availability pricing, and the feature set is subject to change.
How We Evaluated It
Source review: Analysis based on OpenAI's March 2026 research preview announcement, published technical documentation, and community reports from early-access users in security engineering forums.
Comparison methodology: Codex Security's stated capabilities compared against Snyk (SAST/SCA), SonarQube (SAST), and GitHub Advanced Security's code scanning using publicly documented feature sets and G2/Gartner review data.
Independence: No affiliate relationship with OpenAI, Snyk, or SonarQube. This review draws on published data and independent community feedback, not sponsored access.
Limitation: As a research preview product, full hands-on testing at enterprise scale was not possible. Where claims cannot be independently verified, we note the source.
How It Works: The Three-Phase Process
OpenAI describes Codex Security as operating in three sequential phases. Understanding each phase helps clarify what is genuinely novel here versus what established SAST tools already do.
Phase 1: Identify
The scanner ingests the repository's commit history and applies AI reasoning to identify potential vulnerability patterns. Unlike rule-based scanners that match code against known CVE signatures, Codex Security reasons about code semantics — it can flag novel injection patterns, logic flaws in authentication flows, or misconfigured cryptographic operations that don't map to a pre-existing rule. OpenAI claims it found 10,561 high-severity issues and 792 critical issues across 1.2 million commits in its research preview data, though a public breakdown by vulnerability class has not been released.
Phase 2: Verify
After flagging a potential issue, Codex Security runs a verification step to reduce false positives before surfacing it to the developer. This is where AI reasoning provides a more genuine advantage over pattern matching. Rule-based scanners often produce high false-positive rates — Snyk, for example, is sometimes criticized for flooding security teams with low-confidence findings. A verification step that filters on contextual plausibility could meaningfully reduce alert fatigue, though OpenAI has not published false-positive rate comparisons against established tools.
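OpenAI has not documented how verification works internally. As a rough illustration of the shape of such a step, here is a toy filter — hypothetical heuristics, not Codex Security's actual logic — that discards likely false positives before anyone reviews them:

```python
# Hypothetical heuristics, not Codex Security's actual logic: drop findings
# in commented-out code and in test fixtures before a human ever sees them.
def verify(findings):
    confirmed = []
    for f in findings:
        line = f["line"].strip()
        if line.startswith("#"):            # commented-out, not executable
            continue
        if f["file"].startswith("tests/"):  # fixture secrets, not production
            continue
        confirmed.append(f)
    return confirmed

raw = [
    {"file": "app/db.py",        "line": 'API_KEY = "sk_live_x1y2z3w4"'},
    {"file": "tests/test_db.py", "line": 'API_KEY = "dummy_key_12345"'},
    {"file": "app/legacy.py",    "line": '# API_KEY = "old_key_removed"'},
]
print(len(verify(raw)))  # only the production finding survives
```

A model-based verifier would re-examine the surrounding code in context rather than apply fixed rules, but the payoff is the same: fewer low-confidence findings reaching the queue.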
Phase 3: Fix
For verified findings, Codex Security generates a remediation patch and opens a pull request automatically. This is the most practically novel part of the workflow. Traditional SAST tools like SonarQube identify problems and leave remediation entirely to the developer. Codex Security closing the loop from detection to a proposed fix — even if that fix requires human review before merging — compresses the time from "vulnerability found" to "vulnerability resolved." For straightforward issues like SQL injection via string concatenation or hardcoded API keys, the automated PR approach is likely to be effective. For architectural vulnerabilities like broken access control patterns spanning multiple services, the generated patches will require deeper review.
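For the string-concatenation case mentioned above, the before/after is mechanical enough that an automated patch is plausible. A self-contained Python/sqlite3 illustration (not an actual Codex-generated patch):

```python
import sqlite3

# Vulnerable pattern: user input concatenated into SQL — the kind of
# mechanical rewrite an auto-generated PR can plausibly handle.
def find_user_unsafe(conn, username):
    return conn.execute(
        "SELECT id FROM users WHERE name = '" + username + "'"
    ).fetchall()

# The one-line remediation a generated patch would propose: a
# parameterized query, which the driver escapes safely.
def find_user_safe(conn, username):
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

# The classic injection payload returns every row from the unsafe query...
print(find_user_unsafe(conn, "' OR '1'='1"))   # [(1,), (2,)]
# ...and nothing from the parameterized one.
print(find_user_safe(conn, "' OR '1'='1"))     # []
```

Fixes of this shape are local and behavior-preserving for legitimate inputs, which is exactly why they suit automated PRs; cross-service access-control flaws have no equivalently mechanical rewrite.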
The Numbers: What 1.2 Million Commits Actually Means
The headline figures from OpenAI's research preview require some context to interpret usefully.
1.2 million commits is not a single monolithic codebase — it represents the aggregate commit history across multiple repositories included in the research preview. The number of repositories, their languages, and their sizes have not been disclosed. This matters because vulnerability density varies enormously by language (C/C++ codebases tend to surface more memory safety issues; JavaScript sees more injection and prototype pollution patterns) and by codebase age and team size.
The 10,561 high-severity findings and 792 critical findings suggest a roughly 13:1 ratio of high to critical issues, which is broadly consistent with what enterprise SAST deployments report — most codebases have a larger tail of significant-but-not-critical vulnerabilities than true critical exposures. Whether these findings overlap with CVE-catalogued vulnerabilities or represent novel AI-detected patterns is not specified.
| Metric | Codex Security Research Preview |
|---|---|
| Commits scanned | 1,200,000 |
| High-severity findings | 10,561 |
| Critical findings | 792 |
| Launch date | March 6, 2026 |
| Status | Research preview (Pro + Enterprise) |
| Auto-patch generation | Yes — pull requests created automatically |
What Codex Security Does Well
1. Reasoning Beyond Known CVE Patterns
Rule-based SAST tools are fundamentally reactive: they identify vulnerabilities that have already been catalogued. Codex Security applies language model reasoning to code semantics, which means it can potentially flag novel vulnerability classes — flawed authorization logic, subtle race conditions in async code, or improper state management patterns — that don't map to existing rules. This is the most theoretically significant advantage, though independent benchmarks confirming it don't yet exist.
2. Commit-History Scanning
Most SAST tools analyze the current state of your codebase. Codex Security scans the commit history, which means it can identify vulnerabilities introduced in older commits that may still be present in the current codebase — and surface when and how they were introduced. For organizations doing post-incident forensics or acquiring a legacy codebase, scanning the commit timeline rather than just HEAD is meaningfully different.
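The forensic value of the timeline is easy to demonstrate. A toy version of "when did this line first appear?" — the real-world equivalent is git's pickaxe search, `git log -S '<string>'` — over oldest-first snapshots:

```python
# Toy forensics over (sha, file_text) snapshots, oldest-first.
# The real-world equivalent is git's pickaxe: git log -S '<needle>'.
def first_introduced(snapshots, needle):
    """Return the earliest commit whose snapshot contains `needle`."""
    for sha, text in snapshots:
        if needle in text:
            return sha
    return None

snapshots = [
    ("c001", "def login(user, pw):\n    return check(user, pw)\n"),
    ("c002", "def login(user, pw):\n    if user == 'admin': return True\n"),
    ("c003", "def login(user, pw):\n    if user == 'admin': return True  # TODO\n"),
]
# The auth backdoor first appears in c002, not in the latest commit.
print(first_introduced(snapshots, "if user == 'admin': return True"))
```

Knowing the introducing commit gives you the author, the review that approved it, and every release that shipped it — context a HEAD-only scan never surfaces.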
3. Automated Remediation Pull Requests
The gap between "vulnerability detected" and "vulnerability fixed" is where most security programs break down. Engineering teams receive long lists of SAST findings, prioritization is unclear, and individual fixes get deprioritized against feature work. Codex Security's automated PR generation forces the remediation step into the existing code review workflow — a developer reviews and merges a PR rather than manually writing a fix from a security report. Even if fix quality is imperfect, getting patches into PR review is a meaningful process improvement.
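The PR mechanics themselves are standard. As a hedged sketch — OpenAI hasn't documented Codex Security's actual mechanism, and the branch naming and finding fields below are invented for illustration — a remediation PR against GitHub's REST API (`POST /repos/{owner}/{repo}/pulls`) needs little more than a title, head branch, and body:

```python
# Branch naming and finding fields are invented for illustration; the
# payload shape matches GitHub's documented pulls endpoint.
def remediation_pr_payload(finding, fix_branch, base="main"):
    return {
        "title": f"security: fix {finding['rule']} in {finding['file']}",
        "head": fix_branch,  # branch carrying the generated patch
        "base": base,
        "body": (
            f"Automated remediation for a {finding['severity']}-severity finding "
            f"introduced in commit {finding['commit']}. Review before merging."
        ),
    }

finding = {"rule": "sql-injection", "file": "app/db.py",
           "severity": "high", "commit": "d4e5f6"}
payload = remediation_pr_payload(finding, "codex-fix/sql-injection-d4e5f6")
print(payload["title"])
# Submitting it would be one call, e.g.:
#   requests.post("https://api.github.com/repos/OWNER/REPO/pulls",
#                 json=payload, headers={"Authorization": f"Bearer {token}"})
```

The hard part is never opening the PR — it's generating a patch worth reviewing; the delivery channel is commodity plumbing.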
4. Included in ChatGPT Pro — No Separate Budget Required
Snyk and SonarQube carry separate enterprise licensing costs. For teams already paying for ChatGPT Pro or Enterprise, access to Codex Security costs nothing additional during the research preview. For small engineering teams that lack a dedicated security budget but have ChatGPT subscriptions for development work, this lowers the barrier to at least a first-pass security scan substantially.
Genuine Limitations
- ✗ Research preview means it is not production-ready for compliance use. SOC 2, ISO 27001, PCI-DSS, and HIPAA compliance workflows require audit trails, fix verification evidence, and reproducible scan results. Codex Security in research preview provides none of these — it's an investigative tool, not a compliance instrument. Snyk and SonarQube have years of enterprise hardening for regulated industries.
- ✗ No published false-positive data. The research preview reports finding 10,561 high-severity issues. What it doesn't tell us is how many of those are genuine vulnerabilities versus false positives. Established SAST tools benchmark their false-positive rates. Until Codex Security publishes comparable data, security teams can't make an informed comparison on signal quality.
- ✗ Patch quality for complex vulnerabilities is unproven. Automatically generated patches for straightforward issues are likely to be adequate. For multi-file architectural vulnerabilities — broken access control across a microservices boundary, for example — AI-generated patches are likely to be incomplete or even introduce new issues if merged without thorough review. The auto-PR approach requires developers to understand the vulnerability before approving the fix.
- ✗ Limited language and ecosystem coverage (not yet disclosed). Snyk covers 10+ languages with deep ecosystem-specific analysis (npm, PyPI, Maven, etc.). SonarQube supports 30+ languages. OpenAI has not published which languages Codex Security supports or its coverage depth per language. For polyglot repositories or non-mainstream languages, coverage gaps are a real concern.
- ✗ No CI/CD pipeline integration yet. Snyk and GitHub Advanced Security integrate directly into your CI/CD pipeline, blocking PRs that introduce new vulnerabilities. Codex Security in its current form does not offer this gating capability. It's a post-hoc scanner rather than a shift-left security gate.
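To make the last gap concrete: "pipeline gating" is just a CI step that fails the build when findings cross a severity threshold. With Codex Security you would currently have to script something like this yourself around exported results (the findings format below is assumed, not documented):

```python
# Findings format is assumed, not documented. A real CI step would call
# sys.exit(code) so a nonzero exit blocks the merge.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def gate(findings, fail_at="high"):
    """Return (exit_code, blocking_findings) for a CI security gate."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings
                if SEVERITY_RANK[f["severity"]] >= threshold]
    return (1 if blocking else 0), blocking

findings = [{"id": "F1", "severity": "medium"},
            {"id": "F2", "severity": "critical"}]
code, blocking = gate(findings)
print(code, [f["id"] for f in blocking])
```

Snyk and GitHub Advanced Security ship this behavior natively, including PR status checks; that native wiring is precisely what Codex Security lacks today.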
Codex Security vs Snyk vs SonarQube vs GitHub Advanced Security
How does Codex Security sit relative to established SAST tools in a real security stack?
| Feature | Codex Security | Snyk | SonarQube | GH Advanced Security |
|---|---|---|---|---|
| Detection method | AI reasoning | Rule-based + CVE DB | Rule-based SAST | CodeQL semantic |
| Auto-fix / PR generation | ✅ Yes | ✅ Yes (limited) | ❌ No | Partial (Copilot) |
| CI/CD pipeline gating | ❌ Not yet | ✅ Yes | ✅ Yes | ✅ Yes |
| Compliance reporting | ❌ Research preview | ✅ Enterprise | ✅ Enterprise | ✅ GitHub Enterprise |
| Commit history scanning | ✅ Yes | Partial | ❌ Current state only | Partial (secret scan) |
| G2 rating | N/A (too new) | 4.5/5 | 4.4/5 | 4.3/5 |
| Pricing | Included (Pro $200/mo) | Free tier; Team from $25/mo | Free (CE); Enterprise priced | Code Security from $30/committer/mo |
The honest positioning: Codex Security in research preview is a useful investigative layer, not a compliance-grade replacement. Engineering teams in regulated industries should continue running Snyk or SonarQube in their pipelines. Teams with ChatGPT Pro access and no current SAST tooling have a low-cost reason to start with Codex Security as a first-pass scanner — it's better than nothing, and it may surface issues that complement a subsequent Snyk scan.
How to Get Access
Codex Security is accessible through ChatGPT's interface rather than as a standalone developer tool or CLI. As of March 2026, access requires:
- A ChatGPT Pro subscription ($200/month) or ChatGPT Enterprise plan
- Opting into the research preview (the process may vary — check ChatGPT settings or OpenAI's security preview announcement page)
- Connecting a repository — the specific supported hosting platforms (GitHub, GitLab, Bitbucket) have not been fully documented
This interface approach is worth noting as a limitation. Snyk and SonarQube integrate into your IDE (VS Code, IntelliJ), your CLI, and your CI/CD pipeline. A ChatGPT-interface-based security tool requires developers to leave their development environment to run scans, which reduces the likelihood of regular use. Whether OpenAI plans IDE or CI/CD integrations for Codex Security post-preview is not yet announced.
If you're building a security-conscious development workflow, see our AI coding tools guide for how Codex Security fits alongside AI-assisted coding tools like Cursor, GitHub Copilot, and Claude Code — and our Claude Code vs Cursor comparison for context on how different AI coding environments handle security-sensitive code generation.
Verdict: A Promising First Look, Not Yet a Compliance Standard
Use Codex Security if you...
- Already have a ChatGPT Pro or Enterprise subscription
- Want a quick AI-native first pass on a legacy codebase
- Lack dedicated SAST tooling and need to start somewhere
- Are doing an acquisition or code audit and want commit-history context
- Want automated PR generation to compress remediation time
Keep your existing tools if you need...
- SOC 2, PCI-DSS, or HIPAA compliance evidence
- CI/CD pipeline security gating
- Reproducible, auditable scan reports
- Deep language ecosystem analysis (npm dependencies, etc.)
- Published false-positive benchmarks before deploying
OpenAI Codex Security is the most interesting new security tool of early 2026 — the commit-history scanning angle and auto-PR generation are genuinely novel. But it's in research preview for a reason. Security engineering teams in regulated industries shouldn't replace Snyk or SonarQube with it yet. For teams that have been doing ad-hoc security reviews and have ChatGPT Pro access, running it as a first-pass scanner costs nothing extra and could surface vulnerabilities that have been sitting undetected in commit history for months.