OpenAI Codex Security Review: 10,561 Vulnerabilities Found in 1.2 Million Commits
Published March 12, 2026 · 9 min read
TL;DR — Key Takeaways
- Launched March 6, 2026 — in research preview for ChatGPT Pro and Enterprise users (no extra cost).
- Scale: scanned 1.2 million commits, flagged 10,561 high-severity and 792 critical vulnerabilities.
- Three-phase workflow: identify → verify → auto-generate remediation pull requests.
- Not the old Codex: this is a dedicated security product, unrelated to the 2021-era code-completion model.
- Downside: still in research preview, no enterprise audit trail or compliance reporting yet — Snyk and SonarQube remain the compliance-grade standard.
- Who it suits: ChatGPT Pro/Enterprise teams who want a fast AI-native first pass; not a standalone replacement for mature SAST tooling.
What Is OpenAI Codex Security?
OpenAI Codex Security is a dedicated AI-powered security scanning product that launched on March 6, 2026 — distinct from the original Codex coding model that debuted in 2021 and was later deprecated. Where the old Codex was a code-completion tool, this new product exists for one purpose: finding and patching security vulnerabilities in your codebase.
The product operates on commit history rather than just current code. It ingests a repository's full commit log, reasons about how code has changed over time, and surfaces vulnerabilities that may have been introduced at any point — not just in the most recent diff. This historical-scan approach is a meaningful departure from how most static application security testing (SAST) tools work: they typically analyze only the current state of the codebase.
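Commit-level scanning is straightforward to picture. Here is a minimal sketch — with an illustrative regex standing in for the model's semantic reasoning, and hypothetical names throughout — of a scanner that walks diffs oldest-first so each finding is attributed to the commit that introduced it:

```python
import re

# Illustrative only: a regex stands in for the model's semantic reasoning,
# and the hardcoded-secret pattern is just one easy-to-demo vulnerability class.
SECRET_RE = re.compile(r'(api[_-]?key|secret)\s*=\s*["\'][\w-]{8,}["\']', re.I)

def scan_history(commits):
    """commits: (sha, diff_text) pairs, oldest-first. Flags added lines only,
    so each finding is attributed to the commit that introduced it."""
    findings = []
    for sha, diff in commits:
        for line in diff.splitlines():
            if line.startswith("+") and SECRET_RE.search(line):
                findings.append({"commit": sha, "line": line[1:].strip()})
    return findings

history = [
    ("a1b2c3", '+def handler(req):\n+    return db.query(req.args["q"])'),
    ("d4e5f6", '+API_KEY = "sk_live_abcdef123456"  # introduced here, still live'),
]
print(scan_history(history))  # the secret is pinned to commit d4e5f6
```

A production scanner would of course parse real `git log -p` output and reason far beyond regexes; the point is only that operating on diffs, not HEAD, yields a "when was this introduced" answer for free.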
It is currently available in research preview to ChatGPT Pro subscribers ($200/month) and ChatGPT Enterprise users at no extra charge. OpenAI has not announced general availability pricing, and the feature set is subject to change.
How We Evaluated It
Source review: Analysis based on OpenAI's March 2026 research preview announcement, published technical documentation, and community reports from early-access users in security engineering forums.
Comparison methodology: Codex Security's stated capabilities compared against Snyk (SAST/SCA), SonarQube (SAST), and GitHub Advanced Security's code scanning using publicly documented feature sets and G2/Gartner review data.
Independence: No affiliate relationship with OpenAI, Snyk, or SonarQube. This review draws on published data and independent community feedback, not sponsored access.
Limitation: As a research preview product, full hands-on testing at enterprise scale was not possible. Where claims cannot be independently verified, we note the source.
How It Works: The Three-Phase Process
OpenAI describes Codex Security as operating in three sequential phases. Understanding each phase helps clarify what is genuinely novel here versus what established SAST tools already do.
Phase 1: Identify
The scanner ingests the repository's commit history and applies AI reasoning to identify potential vulnerability patterns. Unlike rule-based scanners that match code against known CVE signatures, Codex Security reasons about code semantics — it can flag novel injection patterns, logic flaws in authentication flows, or misconfigured cryptographic operations that don't map to a pre-existing rule. OpenAI claims it found 10,561 high-severity issues and 792 critical issues across 1.2 million commits in its research preview data, though a public breakdown by vulnerability class has not been released.
Phase 2: Verify
After flagging a potential issue, Codex Security runs a verification step to reduce false positives before surfacing it to the developer. This is where AI reasoning provides a more genuine advantage over pattern matching. Rule-based scanners often produce high false-positive rates — Snyk, for example, is sometimes criticized for flooding security teams with low-confidence findings. A verification step that filters on contextual plausibility could meaningfully reduce alert fatigue, though OpenAI has not published false-positive rate comparisons against established tools.
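OpenAI has not documented how verification works internally. As a rough illustration of the shape of such a step, here is a toy filter — hypothetical heuristics, not Codex Security's actual logic — that discards likely false positives before anyone reviews them:

```python
# Hypothetical heuristics, not Codex Security's actual logic: drop findings
# in commented-out code and in test fixtures before a human ever sees them.
def verify(findings):
    confirmed = []
    for f in findings:
        line = f["line"].strip()
        if line.startswith("#"):            # commented-out, not executable
            continue
        if f["file"].startswith("tests/"):  # fixture secrets, not production
            continue
        confirmed.append(f)
    return confirmed

raw = [
    {"file": "app/db.py",        "line": 'API_KEY = "sk_live_x1y2z3w4"'},
    {"file": "tests/test_db.py", "line": 'API_KEY = "dummy_key_12345"'},
    {"file": "app/legacy.py",    "line": '# API_KEY = "old_key_removed"'},
]
print(len(verify(raw)))  # only the production finding survives
```

A model-based verifier would re-examine the surrounding code in context rather than apply fixed rules, but the payoff is the same: fewer low-confidence findings reaching the queue.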
Phase 3: Fix
For verified findings, Codex Security generates a remediation patch and opens a pull request automatically. This is the most practically novel part of the workflow. Traditional SAST tools like SonarQube identify problems and leave remediation entirely to the developer. Codex Security closing the loop from detection to a proposed fix — even if that fix requires human review before merging — compresses the time from "vulnerability found" to "vulnerability resolved." For straightforward issues like SQL injection via string concatenation or hardcoded API keys, the automated PR approach is likely to be effective. For architectural vulnerabilities like broken access control patterns spanning multiple services, the generated patches will require deeper review.
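For the string-concatenation case mentioned above, the before/after is mechanical enough that an automated patch is plausible. A self-contained Python/sqlite3 illustration (not an actual Codex-generated patch):

```python
import sqlite3

# Vulnerable pattern: user input concatenated into SQL — the kind of
# mechanical rewrite an auto-generated PR can plausibly handle.
def find_user_unsafe(conn, username):
    return conn.execute(
        "SELECT id FROM users WHERE name = '" + username + "'"
    ).fetchall()

# The one-line remediation a generated patch would propose: a
# parameterized query, which the driver escapes safely.
def find_user_safe(conn, username):
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

# The classic injection payload returns every row from the unsafe query...
print(find_user_unsafe(conn, "' OR '1'='1"))   # [(1,), (2,)]
# ...and nothing from the parameterized one.
print(find_user_safe(conn, "' OR '1'='1"))     # []
```

Fixes of this shape are local and behavior-preserving for legitimate inputs, which is exactly why they suit automated PRs; cross-service access-control flaws have no equivalently mechanical rewrite.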
The Numbers: What 1.2 Million Commits Actually Means
The headline figures from OpenAI's research preview require some context to interpret usefully.
1.2 million commits is not a single monolithic codebase — it represents the aggregate commit history across multiple repositories included in the research preview. The number of repositories, their languages, and their sizes have not been disclosed. This matters because vulnerability density varies enormously by language (C/C++ codebases tend to surface more memory safety issues; JavaScript sees more injection and prototype pollution patterns) and by codebase age and team size.
The 10,561 high-severity findings and 792 critical findings suggest a roughly 13:1 ratio of high to critical issues, which is broadly consistent with what enterprise SAST deployments report — most codebases have a larger tail of significant-but-not-critical vulnerabilities than true critical exposures. Whether these findings overlap with CVE-catalogued vulnerabilities or represent novel AI-detected patterns is not specified.
| Metric | Codex Security Research Preview |
|---|---|
| Commits scanned | 1,200,000 |
| High-severity findings | 10,561 |
| Critical findings | 792 |
| Launch date | March 6, 2026 |
| Status | Research preview (Pro + Enterprise) |
| Auto-patch generation | Yes — pull requests created automatically |
What Codex Security Does Well
1. Reasoning Beyond Known CVE Patterns
Rule-based SAST tools are fundamentally reactive: they identify vulnerabilities that have already been catalogued. Codex Security applies language model reasoning to code semantics, which means it can potentially flag novel vulnerability classes — flawed authorization logic, subtle race conditions in async code, or improper state management patterns — that don't map to existing rules. This is the most theoretically significant advantage, though independent benchmarks confirming it don't yet exist.
2. Commit-History Scanning
Most SAST tools analyze the current state of your codebase. Codex Security scans the commit history, which means it can identify vulnerabilities introduced in older commits that may still be present in the current codebase — and surface when and how they were introduced. For organizations doing post-incident forensics or acquiring a legacy codebase, scanning the commit timeline rather than just HEAD is meaningfully different.
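The forensic value of the timeline is easy to demonstrate. A toy version of "when did this line first appear?" — the real-world equivalent is git's pickaxe search, `git log -S '<string>'` — over oldest-first snapshots:

```python
# Toy forensics over (sha, file_text) snapshots, oldest-first.
# The real-world equivalent is git's pickaxe: git log -S '<needle>'.
def first_introduced(snapshots, needle):
    """Return the earliest commit whose snapshot contains `needle`."""
    for sha, text in snapshots:
        if needle in text:
            return sha
    return None

snapshots = [
    ("c001", "def login(user, pw):\n    return check(user, pw)\n"),
    ("c002", "def login(user, pw):\n    if user == 'admin': return True\n"),
    ("c003", "def login(user, pw):\n    if user == 'admin': return True  # TODO\n"),
]
# The auth backdoor first appears in c002, not in the latest commit.
print(first_introduced(snapshots, "if user == 'admin': return True"))
```

Knowing the introducing commit gives you the author, the review that approved it, and every release that shipped it — context a HEAD-only scan never surfaces.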
3. Automated Remediation Pull Requests
The gap between "vulnerability detected" and "vulnerability fixed" is where most security programs break down. Engineering teams receive long lists of SAST findings, prioritization is unclear, and individual fixes get deprioritized against feature work. Codex Security's automated PR generation forces the remediation step into the existing code review workflow — a developer reviews and merges a PR rather than manually writing a fix from a security report. Even if fix quality is imperfect, getting patches into PR review is a meaningful process improvement.
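The PR mechanics themselves are standard. As a hedged sketch — OpenAI hasn't documented Codex Security's actual mechanism, and the branch naming and finding fields below are invented for illustration — a remediation PR against GitHub's REST API (`POST /repos/{owner}/{repo}/pulls`) needs little more than a title, head branch, and body:

```python
# Branch naming and finding fields are invented for illustration; the
# payload shape matches GitHub's documented pulls endpoint.
def remediation_pr_payload(finding, fix_branch, base="main"):
    return {
        "title": f"security: fix {finding['rule']} in {finding['file']}",
        "head": fix_branch,  # branch carrying the generated patch
        "base": base,
        "body": (
            f"Automated remediation for a {finding['severity']}-severity finding "
            f"introduced in commit {finding['commit']}. Review before merging."
        ),
    }

finding = {"rule": "sql-injection", "file": "app/db.py",
           "severity": "high", "commit": "d4e5f6"}
payload = remediation_pr_payload(finding, "codex-fix/sql-injection-d4e5f6")
print(payload["title"])
# Submitting it would be one call, e.g.:
#   requests.post("https://api.github.com/repos/OWNER/REPO/pulls",
#                 json=payload, headers={"Authorization": f"Bearer {token}"})
```

The hard part is never opening the PR — it's generating a patch worth reviewing; the delivery channel is commodity plumbing.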
4. Included in ChatGPT Pro — No Separate Budget Required
Snyk and SonarQube carry separate enterprise licensing costs. For teams already paying for ChatGPT Pro or Enterprise, access to Codex Security costs nothing additional during the research preview. For small engineering teams that lack a dedicated security budget but have ChatGPT subscriptions for development work, this lowers the barrier to at least a first-pass security scan substantially.
Genuine Limitations
- ✗ Research preview means it is not production-ready for compliance use. SOC 2, ISO 27001, PCI-DSS, and HIPAA compliance workflows require audit trails, fix verification evidence, and reproducible scan results. Codex Security in research preview provides none of these — it's an investigative tool, not a compliance instrument. Snyk and SonarQube have years of enterprise hardening for regulated industries.
- ✗ No published false-positive data. The research preview reports finding 10,561 high-severity issues. What it doesn't tell us is how many of those are genuine vulnerabilities versus false positives. Established SAST tools benchmark their false-positive rates. Until Codex Security publishes comparable data, security teams can't make an informed comparison on signal quality.
- ✗ Patch quality for complex vulnerabilities is unproven. Automatically generated patches for straightforward issues are likely to be adequate. For multi-file architectural vulnerabilities — broken access control across a microservices boundary, for example — AI-generated patches are likely to be incomplete or even introduce new issues if merged without thorough review. The auto-PR approach requires developers to understand the vulnerability before approving the fix.
- ✗ Limited language and ecosystem coverage (not yet disclosed). Snyk covers 10+ languages with deep ecosystem-specific analysis (npm, PyPI, Maven, etc.). SonarQube supports 30+ languages. OpenAI has not published which languages Codex Security supports or its coverage depth per language. For polyglot repositories or non-mainstream languages, coverage gaps are a real concern.
- ✗ No CI/CD pipeline integration yet. Snyk and GitHub Advanced Security integrate directly into your CI/CD pipeline, blocking PRs that introduce new vulnerabilities. Codex Security in its current form does not offer this gating capability. It's a post-hoc scanner rather than a shift-left security gate.
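To make the last gap concrete: "pipeline gating" is just a CI step that fails the build when findings cross a severity threshold. With Codex Security you would currently have to script something like this yourself around exported results (the findings format below is assumed, not documented):

```python
# Findings format is assumed, not documented. A real CI step would call
# sys.exit(code) so a nonzero exit blocks the merge.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def gate(findings, fail_at="high"):
    """Return (exit_code, blocking_findings) for a CI security gate."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings
                if SEVERITY_RANK[f["severity"]] >= threshold]
    return (1 if blocking else 0), blocking

findings = [{"id": "F1", "severity": "medium"},
            {"id": "F2", "severity": "critical"}]
code, blocking = gate(findings)
print(code, [f["id"] for f in blocking])
```

Snyk and GitHub Advanced Security ship this behavior natively, including PR status checks; that native wiring is precisely what Codex Security lacks today.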
Codex Security vs Snyk vs SonarQube vs GitHub Advanced Security
How does Codex Security sit relative to established SAST tools in a real security stack?
| Feature | Codex Security | Snyk | SonarQube | GH Advanced Security |
|---|---|---|---|---|
| Detection method | AI reasoning | Rule-based + CVE DB | Rule-based SAST | CodeQL semantic |
| Auto-fix / PR generation | ✅ Yes | ✅ Yes (limited) | ❌ No | Partial (Copilot) |
| CI/CD pipeline gating | ❌ Not yet | ✅ Yes | ✅ Yes | ✅ Yes |
| Compliance reporting | ❌ Research preview | ✅ Enterprise | ✅ Enterprise | ✅ GitHub Enterprise |
| Commit history scanning | ✅ Yes | Partial | ❌ Current state only | Partial (secret scan) |
| G2 rating | N/A (too new) | 4.5/5 | 4.4/5 | 4.3/5 |
| Pricing | Included (Pro $200/mo) | Free tier; Team from $25/mo | Free (CE); Enterprise priced | Code Security from $30/committer/mo |
The honest positioning: Codex Security in research preview is a useful investigative layer, not a compliance-grade replacement. Engineering teams in regulated industries should continue running Snyk or SonarQube in their pipelines. Teams with ChatGPT Pro access and no current SAST tooling have a low-cost reason to start with Codex Security as a first-pass scanner — it's better than nothing, and it may surface issues that complement a subsequent Snyk scan.
How to Get Access
Codex Security is accessible through ChatGPT's interface rather than as a standalone developer tool or CLI. As of March 2026, access requires:
- A ChatGPT Pro subscription ($200/month) or ChatGPT Enterprise plan
- Opting into the research preview (the process may vary — check ChatGPT settings or OpenAI's security preview announcement page)
- Connecting a repository — the specific supported hosting platforms (GitHub, GitLab, Bitbucket) have not been fully documented
This interface approach is worth noting as a limitation. Snyk and SonarQube integrate into your IDE (VS Code, IntelliJ), your CLI, and your CI/CD pipeline. A ChatGPT-interface-based security tool requires developers to leave their development environment to run scans, which reduces the likelihood of regular use. Whether OpenAI plans IDE or CI/CD integrations for Codex Security post-preview is not yet announced.
If you're building a security-conscious development workflow, see our AI coding tools guide for how Codex Security fits alongside AI-assisted coding tools like Cursor, GitHub Copilot, and Claude Code — and our Claude Code vs Cursor comparison for context on how different AI coding environments handle security-sensitive code generation.
Verdict: A Promising First Look, Not Yet a Compliance Standard
Use Codex Security if you...
- Already have a ChatGPT Pro or Enterprise subscription
- Want a quick AI-native first pass on a legacy codebase
- Lack dedicated SAST tooling and need to start somewhere
- Are doing an acquisition or code audit and want commit-history context
- Want automated PR generation to compress remediation time
Keep your existing tools if you need...
- SOC 2, PCI-DSS, or HIPAA compliance evidence
- CI/CD pipeline security gating
- Reproducible, auditable scan reports
- Deep language ecosystem analysis (npm dependencies, etc.)
- Published false-positive benchmarks before deploying
OpenAI Codex Security is the most interesting new security tool of early 2026 — the commit-history scanning angle and auto-PR generation are genuinely novel. But it's in research preview for a reason. Security engineering teams in regulated industries shouldn't replace Snyk or SonarQube with it yet. For teams that have been doing ad-hoc security reviews and have ChatGPT Pro access, running it as a first-pass scanner costs nothing extra and could surface vulnerabilities that have been sitting undetected in commit history for months.