
OpenAI Codex Security Review: 10,561 Vulnerabilities Found in 1.2 Million Commits

Published March 12, 2026 · 9 min read

TL;DR — Key Takeaways

  • Launched March 6, 2026 — in research preview for ChatGPT Pro and Enterprise users (no extra cost).
  • Scale: scanned 1.2 million commits, flagged 10,561 high-severity and 792 critical vulnerabilities.
  • Three-phase workflow: identify → verify → auto-generate remediation pull requests.
  • Not the old Codex: this is a dedicated security product, unrelated to the 2021-era code-completion model.
  • Downside: still in research preview, no enterprise audit trail or compliance reporting yet — Snyk and SonarQube remain the compliance-grade standard.
  • Who it suits: ChatGPT Pro/Enterprise teams who want a fast AI-native first pass; not a standalone replacement for mature SAST tooling.

What Is OpenAI Codex Security?

OpenAI Codex Security is a dedicated AI-powered security scanning product that launched on March 6, 2026 — distinct from the original Codex coding model that debuted in 2021 and was later deprecated. Where the old Codex was a code-completion tool, this new product exists for one purpose: finding and patching security vulnerabilities in your codebase.

The product operates on commit history rather than just current code. It ingests a repository's full commit log, reasons about how code has changed over time, and surfaces vulnerabilities that may have been introduced at any point — not just in the most recent diff. This historical approach is a meaningful departure from most static application security testing (SAST) tools, which typically analyze only the current state of the codebase.
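To make the distinction concrete, here is a minimal Python sketch of history scanning versus current-state scanning. This is not OpenAI's implementation; the commit structure and the hardcoded-key pattern are illustrative stand-ins for what a real scanner would pull from `git log -p`:

```python
import re

# Illustrative commit log: each entry records the lines that commit added.
# A real scanner would parse these out of `git log -p` or a Git library.
COMMITS = [
    {"sha": "a1b2c3", "added": ['API_KEY = "sk-live-123"', "def handler():"]},
    {"sha": "d4e5f6", "added": ["def handler():  # refactor, key untouched"]},
]

HARDCODED_KEY = re.compile(r'(?:API_KEY|SECRET)\s*=\s*["\']\S+["\']')

def scan_history(commits):
    """Flag every commit that ever introduced a suspicious line,
    even when later commits rewrote the surrounding code."""
    findings = []
    for commit in commits:
        for line in commit["added"]:
            if HARDCODED_KEY.search(line):
                findings.append((commit["sha"], line))
    return findings

# The key introduced in a1b2c3 is reported even though the most
# recent diff never touches that line.
print(scan_history(COMMITS))
```

A HEAD-only scanner sees one snapshot; a history scanner sees every diff that ever landed, which is why it can also tell you *when* an issue arrived.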

It is currently available in research preview to ChatGPT Pro subscribers ($20/month) and ChatGPT Enterprise users at no extra charge. OpenAI has not announced general availability pricing, and the feature set is subject to change.

How We Evaluated It

Source review: Analysis based on OpenAI's March 2026 research preview announcement, published technical documentation, and community reports from early-access users in security engineering forums.

Comparison methodology: Codex Security's stated capabilities compared against Snyk (SAST/SCA), SonarQube (SAST), and GitHub Advanced Security's code scanning using publicly documented feature sets and G2/Gartner review data.

Independence: No affiliate relationship with OpenAI, Snyk, or SonarQube. This review draws on published data and independent community feedback, not sponsored access.

Limitation: As a research preview product, full hands-on testing at enterprise scale was not possible. Where claims cannot be independently verified, we note the source.

How It Works: The Three-Phase Process

OpenAI describes Codex Security as operating in three sequential phases. Understanding each phase helps clarify what is genuinely novel here versus what established SAST tools already do.

Phase 1: Identify

The scanner ingests the repository's commit history and applies AI reasoning to identify potential vulnerability patterns. Unlike rule-based scanners that match code against known CVE signatures, Codex Security reasons about code semantics — it can flag novel injection patterns, logic flaws in authentication flows, or misconfigured cryptographic operations that don't map to a pre-existing rule. OpenAI claims it found 10,561 high-severity issues and 792 critical issues across 1.2 million commits in its research preview data, though the public breakdown of which vulnerability classes this represents has not been released.
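For contrast, here is what the rule-based baseline the article describes looks like in practice — a hedged sketch, not any vendor's actual rules. The regex flags SQL built by concatenation or f-strings, a classic injection risk; anything that doesn't match this known-bad shape slips through, which is exactly the gap semantic reasoning aims to close:

```python
import re

# Rule-based detection: match known-bad syntax. This pattern flags SQL
# assembled via f-strings or string concatenation inside execute()/query().
SQL_CONCAT = re.compile(
    r'(execute|query)\(\s*(f["\']|["\'][^"\']*["\']\s*\+)', re.IGNORECASE
)

def identify(added_lines):
    """Return the subset of added diff lines matching the injection rule."""
    return [line for line in added_lines if SQL_CONCAT.search(line)]

diff = [
    'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")',   # flagged
    'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))',  # safe, parameterized
]
print(identify(diff))
```

A logic flaw in an authentication flow has no such syntactic signature, which is why rule sets can't catch it and a reasoning model might.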

Phase 2: Verify

After flagging a potential issue, Codex Security runs a verification step to reduce false positives before surfacing it to the developer. This is where AI reasoning offers a genuine advantage over pattern matching. Rule-based scanners often produce high false-positive rates — Snyk, for example, is sometimes criticized for flooding security teams with low-confidence findings. A verification step that filters on contextual plausibility could meaningfully reduce alert fatigue, though OpenAI has not published false-positive rate comparisons against established tools.
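The verification idea can be sketched simply. In this illustrative example (the `tainted` flag is a stand-in for the data-flow or model reasoning a real verifier would perform), findings whose interpolated value is not attacker-controlled are suppressed before a developer ever sees them:

```python
# Hypothetical output of an identify pass. "tainted" marks whether the
# interpolated value traces back to user input; a real verifier would
# derive this from data-flow analysis or model reasoning, not a label.
findings = [
    {"line": 'execute(f"... {user_id}")', "tainted": True},     # from request input
    {"line": 'execute(f"... {TABLE_NAME}")', "tainted": False},  # compile-time constant
]

def verify(findings):
    """Suppress findings whose input is not attacker-controlled,
    cutting false positives before anything reaches a developer."""
    return [f for f in findings if f["tainted"]]

print(verify(findings))  # only the tainted finding survives
```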

Phase 3: Fix

For verified findings, Codex Security generates a remediation patch and opens a pull request automatically. This is the most practically novel part of the workflow. Traditional SAST tools like SonarQube identify problems and leave remediation entirely to the developer. Codex Security closing the loop from detection to a proposed fix — even if that fix requires human review before merging — compresses the time from "vulnerability found" to "vulnerability resolved." For straightforward issues like SQL injection via string concatenation or hardcoded API keys, the automated PR approach is likely to be effective. For architectural vulnerabilities like broken access control patterns spanning multiple services, the generated patches will require deeper review.
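For the straightforward class of issue, the kind of mechanical rewrite involved is easy to picture. This sketch (again illustrative, not OpenAI's remediation engine, and a regex is no substitute for real parsing) converts a single-placeholder f-string query into a parameterized call — the patch body that would land in an auto-generated PR:

```python
import re

# Match execute(f"... {var} ...") with exactly one placeholder.
F_STRING_SQL = re.compile(r'execute\(f(["\'])(.*?)\{(\w+)\}(.*?)\1\)')

def generate_patch(line):
    """Rewrite a single-placeholder f-string query into a parameterized
    call. Real remediation would require full parsing, not a regex."""
    m = F_STRING_SQL.search(line)
    if not m:
        return None
    quote, before, var, after = m.groups()
    fixed = f'execute({quote}{before}%s{after}{quote}, ({var},))'
    return line[:m.start()] + fixed + line[m.end():]

vulnerable = 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")'
print(generate_patch(vulnerable))
# cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
```

Exactly as the article argues: this works because the fix is local and template-shaped. A broken access-control pattern spanning services has no such one-line rewrite.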

The Numbers: What 1.2 Million Commits Actually Means

The headline figures from OpenAI's research preview require some context to interpret usefully.

1.2 million commits is not a single monolithic codebase — it represents the aggregate commit history across multiple repositories included in the research preview. The number of repositories, their languages, and their sizes have not been disclosed. This matters because vulnerability density varies enormously by language (C/C++ codebases tend to surface more memory safety issues; JavaScript sees more injection and prototype pollution patterns) and by codebase age and team size.

The 10,561 high-severity findings and 792 critical findings suggest a roughly 13:1 ratio of high to critical issues, which is broadly consistent with what enterprise SAST deployments report — most codebases have a larger tail of significant-but-not-critical vulnerabilities than true critical exposures. Whether these findings overlap with CVE-catalogued vulnerabilities or represent novel AI-detected patterns is not specified.
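Worked out explicitly, the reported figures imply the following densities:

```python
commits = 1_200_000
high, critical = 10_561, 792

ratio = high / critical                     # high-to-critical ratio
per_1k_high = high / commits * 1000         # high findings per 1,000 commits
per_1k_critical = critical / commits * 1000

print(round(ratio, 1))            # ~13.3 high findings per critical one
print(round(per_1k_high, 1))      # ~8.8 high-severity findings per 1,000 commits
print(round(per_1k_critical, 2))  # ~0.66 critical findings per 1,000 commits
```

Roughly nine high-severity findings per thousand commits is the kind of density figure that only becomes meaningful once OpenAI discloses the language and repository mix behind it.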

Metric                 | Codex Security Research Preview
Commits scanned        | 1,200,000
High-severity findings | 10,561
Critical findings      | 792
Launch date            | March 6, 2026
Status                 | Research preview (Pro + Enterprise)
Auto-patch generation  | Yes (pull requests created automatically)

What Codex Security Does Well

1. Reasoning Beyond Known CVE Patterns

Rule-based SAST tools are fundamentally reactive: they identify vulnerabilities that have already been catalogued. Codex Security applies language model reasoning to code semantics, which means it can potentially flag novel vulnerability classes — flawed authorization logic, subtle race conditions in async code, or improper state management patterns — that don't map to existing rules. This is the most theoretically significant advantage, though independent benchmarks confirming it don't yet exist.

2. Commit-History Scanning

Most SAST tools analyze the current state of your codebase. Codex Security scans the commit history, which means it can identify vulnerabilities introduced in older commits that may still be present in the current codebase — and surface when and how they were introduced. For organizations doing post-incident forensics or acquiring a legacy codebase, scanning the commit timeline rather than just HEAD is meaningfully different.
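Git itself already exposes the underlying primitive — `git log -S "<pattern>"` (the pickaxe search) lists the commits that added or removed a string. The forensic value can be sketched in a few lines of Python; the toy timeline below stands in for what a real tool would replay from pickaxe output:

```python
# Toy commit timeline: each commit maps to the file's contents at that
# point. A real tool would replay `git log -S "<pattern>"` (git's pickaxe
# search) rather than hold full snapshots in memory.
timeline = [
    ("c100", "def login(u, p): ..."),
    ("c205", 'def login(u, p): q = "SELECT ..." + u'),  # vulnerability appears here
    ("c310", 'def login(u, p): q = "SELECT ..." + u  # refactored'),
]

def first_introduced(timeline, pattern):
    """Return the earliest commit whose snapshot contains the pattern."""
    for sha, contents in timeline:
        if pattern in contents:
            return sha
    return None

print(first_introduced(timeline, '"SELECT ..." + u'))  # the introducing commit
```

Knowing the issue arrived in `c205` tells an incident responder which release window, which author, and which review missed it — context a HEAD-only scan cannot provide.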

3. Automated Remediation Pull Requests

The gap between "vulnerability detected" and "vulnerability fixed" is where most security programs break down. Engineering teams receive long lists of SAST findings, prioritization is unclear, and individual fixes get deprioritized against feature work. Codex Security's automated PR generation forces the remediation step into the existing code review workflow — a developer reviews and merges a PR rather than manually writing a fix from a security report. Even if fix quality is imperfect, getting patches into PR review is a meaningful process improvement.

4. Included in ChatGPT Pro — No Separate Budget Required

Snyk and SonarQube carry separate enterprise licensing costs. For teams already paying for ChatGPT Pro or Enterprise, access to Codex Security costs nothing additional during the research preview. For small engineering teams that lack a dedicated security budget but have ChatGPT subscriptions for development work, this lowers the barrier to at least a first-pass security scan substantially.

Genuine Limitations

  • Research preview status: the feature set is subject to change, and general-availability pricing has not been announced.
  • No compliance reporting or enterprise audit trail, which rules it out as sole evidence for SOC 2, PCI-DSS, or HIPAA audits.
  • No CI/CD pipeline gating, so it cannot block a risky merge the way Snyk or SonarQube can.
  • Accessible only through the ChatGPT interface; no IDE, CLI, or pipeline integration has been announced.
  • No published false-positive rates or vulnerability-class breakdowns, so independent benchmarking is not yet possible.

Codex Security vs Snyk vs SonarQube vs GitHub Advanced Security

How does Codex Security sit relative to established SAST tools in a real security stack?

Feature                  | Codex Security       | Snyk                        | SonarQube                    | GH Advanced Security
Detection method         | AI reasoning         | Rule-based + CVE DB         | Rule-based SAST              | CodeQL semantic
Auto-fix / PR generation | ✅ Yes               | ✅ Yes (limited)            | ❌ No                        | Partial (Copilot)
CI/CD pipeline gating    | ❌ Not yet           | ✅ Yes                      | ✅ Yes                       | ✅ Yes
Compliance reporting     | ❌ Research preview  | ✅ Enterprise               | ✅ Enterprise                | ✅ GitHub Enterprise
Commit history scanning  | ✅ Yes               | Partial                     | ❌ Current state only        | Partial (secret scan)
G2 rating                | N/A (too new)        | 4.5/5                       | 4.4/5                        | 4.3/5
Pricing                  | Included (Pro $20/mo)| Free tier; Team from $25/mo | Free (CE); Enterprise priced | GitHub Advanced $19/mo

The honest positioning: Codex Security in research preview is a useful investigative layer, not a compliance-grade replacement. Engineering teams in regulated industries should continue running Snyk or SonarQube in their pipelines. Teams with ChatGPT Pro access and no current SAST tooling have a low-cost reason to start with Codex Security as a first-pass scanner — it's better than nothing, and it may surface issues that complement a subsequent Snyk scan.

How to Get Access

Codex Security is accessible through ChatGPT's interface rather than as a standalone developer tool or CLI. As of March 2026, access requires:

  • An active ChatGPT Pro ($20/month) or ChatGPT Enterprise subscription
  • Running scans from within the ChatGPT interface itself, as no IDE, CLI, or CI/CD integration is currently available

This interface approach is worth noting as a limitation. Snyk and SonarQube integrate into your IDE (VS Code, IntelliJ), your CLI, and your CI/CD pipeline. A ChatGPT-interface-based security tool requires developers to leave their development environment to run scans, which reduces the likelihood of regular use. Whether OpenAI plans IDE or CI/CD integrations for Codex Security post-preview is not yet announced.

If you're building a security-conscious development workflow, see our AI coding tools guide for how Codex Security fits alongside AI-assisted coding tools like Cursor, GitHub Copilot, and Claude Code — and our Claude Code vs Cursor comparison for context on how different AI coding environments handle security-sensitive code generation.


Frequently Asked Questions

What is OpenAI Codex Security?
OpenAI Codex Security is an AI-powered code security scanner — not to be confused with the original Codex coding model from 2021. It launched in March 2026 in research preview and uses a three-phase process (identify, verify, fix) to scan codebases and commit histories for vulnerabilities, then automatically generates pull requests to address the issues it finds.
How many vulnerabilities did Codex Security find?
In OpenAI's research preview data, Codex Security scanned 1.2 million commits across multiple repositories and identified 10,561 high-severity vulnerabilities and 792 critical vulnerabilities. It also generated automated patches for the findings. OpenAI has not published the false-positive rate or a breakdown of vulnerability classes.
Is Codex Security free?
During the research preview, Codex Security is included at no additional cost for ChatGPT Pro subscribers ($20/month) and ChatGPT Enterprise users. There is no standalone free tier. Pricing after research preview has not been announced.
How does Codex Security compare to Snyk or SonarQube?
Snyk and SonarQube use established rule-based pattern matching with CVE databases, CI/CD pipeline integration, and compliance reporting. Codex Security uses AI reasoning to potentially detect novel vulnerability patterns beyond known CVEs and auto-generates remediation PRs. For compliance workflows (SOC 2, PCI-DSS), Snyk and SonarQube are mature and auditable. Codex Security is better positioned as a complementary investigative layer for teams already using a ChatGPT subscription.
Can Codex Security fix the vulnerabilities it finds?
Yes — fix generation is part of its three-phase workflow. After verifying a finding, Codex Security generates a remediation patch and opens a pull request automatically. Patch quality is likely to be solid for straightforward issues (injection flaws, hardcoded secrets) and less reliable for complex multi-file or architectural vulnerabilities. Human review before merging is essential.

Verdict: A Promising First Look, Not Yet a Compliance Standard

Use Codex Security if you...

  • Already have a ChatGPT Pro or Enterprise subscription
  • Want a quick AI-native first pass on a legacy codebase
  • Lack dedicated SAST tooling and need to start somewhere
  • Are doing an acquisition or code audit and want commit-history context
  • Want automated PR generation to compress remediation time

Keep your existing tools if you need...

  • SOC 2, PCI-DSS, or HIPAA compliance evidence
  • CI/CD pipeline security gating
  • Reproducible, auditable scan reports
  • Deep language ecosystem analysis (npm dependencies, etc.)
  • Published false-positive benchmarks before deploying

OpenAI Codex Security is the most interesting new security tool of early 2026 — the commit-history scanning angle and auto-PR generation are genuinely novel. But it's in research preview for a reason. Security engineering teams in regulated industries shouldn't replace Snyk or SonarQube with it yet. For teams that have been doing ad-hoc security reviews and have ChatGPT Pro access, running it as a first-pass scanner costs nothing extra and could surface vulnerabilities that have been sitting undetected in commit history for months.