Is Mythos the Sputnik moment for AI in enterprise technology?

Prosaic Times Podcast

The imperative for spec-driven, immutable engineering, verified by adversarial automation rather than manual bottlenecks

James Kaplan

Jun 20, 2026

On October 4, 1957, the Soviet Union launched Sputnik 1, the first man-made object to achieve Earth orbit. In The Right Stuff [1], Tom Wolfe described the shock and dislocation felt by American elites. They had built the arsenal of democracy and exploded the atomic bomb. And now a backward former supplicant, one that required American trucks to hold off the Wehrmacht, had beaten them into space. What did the United States have? An underfunded, shambolic collection of civilian and military programs designed to satisfy bureaucratic and diplomatic imperatives rather than for speed and effectiveness.

They responded. Then-Senate Majority Leader Lyndon Johnson said Americans would not go to sleep by the light of a Communist Moon. The National Aeronautics and Space Act of 1958 created the National Aeronautics and Space Administration (NASA) with responsibility for the American space program. The National Defense Education Act of 1958 sought to dismantle John Dewey’s legacy in American education, pushing schools to replace “life adjustment skills” with set theory and symbolic logic. Less than 12 years after Sputnik, Neil Armstrong and Buzz Aldrin walked on the surface of the Moon.

Could Anthropic’s recent announcement of how Mythos can identify and exploit cybersecurity vulnerabilities create the Sputnik moment that will spur companies to use AI to change the way they operate enterprise technology?

The risks are real, and companies will need to move beyond buying tools and build an agentic governance loop that uses a living graph of the environment to provide the context for spec-driven, immutable engineering, verified by adversarial automation rather than manual bottlenecks -- and then sustain and expand this change over time.

Despite early indicators of transformative improvements, AI adoption in running enterprise technology has been shallow.
Despite some fear-mongering, the impact of AI on the cybersecurity balance of power between attackers and defenders has been muted to date -- Mythos and subsequent models could change that.
Mythos and subsequent models could dramatically improve companies’ cybersecurity posture in the medium term -- but they will need to use AI to accelerate their enterprise technology metabolism dramatically.
Of course, the idea of a Sputnik moment is as much a warning as a call to action -- one-time programs are a lot easier than sustained cultural change.

AI adoption in running enterprise technology has been disappointingly shallow

Technology engineering and operations is one of the most exciting applications of AI for large companies. Large language models excel at interpreting and generating the structured content used in software engineering or technology configuration. AI can replace procedural programming with declarative programming [3], via spec-driven development. Agentic processes can better accommodate the edge cases and exceptions that have historically bedeviled efforts to automate technology operations. The early results have been exciting. My McKinsey colleagues have found that using AI to reinvent engineering processes can double team throughput. AWS has started to use agentic processes to reduce incident resolution time by three-quarters in some cases. Applied ruthlessly, AI could transform the economics of enterprise technology.

Yet adoption has been shallow. In last year’s DORA State of DevOps report, 90 percent of software practitioners said they use AI in some way, but most never used it in agent or autonomous mode and only 17 percent used it every day. The situation is no better with the cybersecurity team. According to a SANS report, Security Operations Centers use AI/ML tools, but don’t integrate them into their processes:

AI is present inside the SOC but not operationalized. Analysts use it informally, often with mixed reliability, while leadership has not yet established a consistent model for where AI belongs, how its output should be validated, or which workflows are mature enough to benefit from augmentation.

All this accords with my own observations: technology teams use AI as a tool to generate a code snippet or research an issue, rather than as a lever to rip toil out of the way they do business. Why is this? The technology is still relatively new. Teams may be cautious or may not have the mental bandwidth required for change. Vendors have promised that simply installing a tool will solve their problems. And CIOs have not built the institutional support required to fund and prosecute the required change.

Mythos could change the cybersecurity balance of power between attackers and defenders

Since OpenAI released GPT-4 in 2023, the great and the good have warned us about AI-enabled cyberattacks. The World Economic Forum said that specialized language models would allow hackers to get around endpoint security devices. The FBI said that AI would allow criminals to scale fraud schemes in a way that would swamp law enforcement. The UK’s National Cyber Security Centre said that GenAI lowers the barrier to entry for novice hackers, allowing them to use vectors previously only available to experts. Some predictions approached fear-mongering -- sentient malware and HackerGPTs collapsing cybersecurity defenses. [4]

The worst...has not happened. I checked this morning, and the digital world continues to function. Only 16 percent of companies suffering breaches said they saw evidence that the attackers had used AI. According to the Verizon Data Breach Investigations Report, attackers have been just as dilatory as enterprises in using AI to reinvent their business processes:

It turns out the state-sponsored actors are just like legitimate organizations in their GenAI implementation life cycles. Attempts are being made, maybe some improvements are being found, but no one is revolutionizing anything yet.

At least as of 2024, GenAI tools could potentially assist attackers, but could not execute sophisticated attacks for them. One analysis found that GPT-4 only achieved a 7 percent success rate in exploiting vulnerabilities without clear human guidance.

Even before Mythos, the potential and the direction of travel have been worrisome. The structural factors that make LLMs effective in building and running systems also apply in compromising them.

Intel matters in undertaking a cyberattack. LLMs have breadth of vulnerability knowledge no human analyst can read or retain -- LLM training data spans public CVEs, security research, disclosed exploits, and documented attack strategies.
Success requires patience. Agents will cycle through potential vectors without boredom or fatigue.
System compromise provides agents with a clear objective function they can optimize against.

As a result, researchers have started to demonstrate that teams of LLM agents can cooperate to exploit zero-day vulnerabilities.

Then came Mythos. Obviously we should be restrained in thinking about the implications of any software that isn’t generally available yet. And we’ve heard the “too dangerous to release” warning before. In its public statements, Anthropic said that Mythos had identified thousands of high-severity vulnerabilities across major operating systems and browsers -- including legacy flaws like a 27-year-old bug in OpenBSD that evaded decades of manual audits. The model further demonstrated the ability to build complete, working exploits. Mythos can independently “chain” multiple vulnerabilities to gain a foothold, escalate privileges, and move laterally through a network, effectively allowing users with no formal security training to execute professional-grade, multi-stage cyberattacks at machine speed. Finding zero-days may get the headlines, but the ability to scale and operate autonomously may create the real risk.

This probably won’t realize the most dire predictions of 2024. [5] Several commentators have observed that a model’s ability to identify vulnerabilities and form plans doesn’t mean it will succeed in the face of sophisticated defenses (including the ones they have developed). But how many companies have sophisticated defenses like zero-trust in place comprehensively? And one compromise in the software supply chain could disable hundreds or thousands of institutions. Naturally, Anthropic and other frontier labs will seek to implement guardrails that limit attackers’ ability to exploit their models. The guardrails will not be perfect. And they will not apply to many of the open-weight models that will likely have Mythos-level capability within, maybe, a year.

Mythos and subsequent models could dramatically improve companies’ cybersecurity posture in the medium term. Could, not will.

After the Mythos announcement, American business and governmental elites acted. Anthropic delayed general availability of Mythos and launched Glasswing, giving early Mythos access to leading technology institutions so they could use it to identify vulnerabilities. The Treasury Secretary and Federal Reserve Chair Jerome Powell called banking CEOs to Washington, DC to urge them to take the risk seriously -- I expect they were pushing on an open door. Technology companies like AWS, Microsoft, CrowdStrike and Cisco reported that they were using Mythos to harden their products.

In the medium term, Mythos (and subsequent models) could provide a dramatic uplift in cybersecurity defenses. Companies spend fortunes each year scanning their code for vulnerabilities [6] -- Mythos-type capabilities will provide a level of transparency into vulnerabilities that we never could have imagined before. Most companies of any size do penetration testing, [7] but only the biggest tech spenders have dedicated red-team operations that figure out how a sophisticated attacker might compromise their environment. Mythos-type models should make this capability available to a much broader range of companies.

They may also revolutionize cybersecurity risk management. Cyber-risk valuation frameworks like FAIR have foundered on the problem of likelihood assessment. Practitioners should be able to use a model like Mythos to simulate attack paths, determine the probability of success and make more fact-based remediation decisions. The same capability could transform cyber insurance, a segment historically held back by underwriting challenges.
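
The likelihood arithmetic this implies is simple once per-step success rates are in hand. The Python sketch below assumes those rates have already been estimated (for example, from repeated simulated attack runs) and that steps and paths succeed independently, which a real environment will only approximate. The path names and probabilities are invented for illustration; they are not measurements.

```python
from math import prod

# Hypothetical attack paths, each a list of per-step success probabilities.
# The numbers are invented for illustration only.
ATTACK_PATHS: dict[str, list[float]] = {
    "phishing -> workstation -> file server": [0.30, 0.50, 0.40],
    "exposed API -> database": [0.20, 0.60],
}


def path_success(steps: list[float]) -> float:
    """A path succeeds only if every step does (steps assumed independent)."""
    return prod(steps)


def any_path_success(paths: dict[str, list[float]]) -> float:
    """Probability that at least one path succeeds (paths assumed independent)."""
    return 1 - prod(1 - path_success(steps) for steps in paths.values())


def riskiest_path(paths: dict[str, list[float]]) -> str:
    """Name of the path most likely to succeed, a first candidate for remediation."""
    return max(paths, key=lambda name: path_success(paths[name]))


if __name__ == "__main__":
    print(f"overall likelihood: {any_path_success(ATTACK_PATHS):.4f}")
    print(f"remediate first:    {riskiest_path(ATTACK_PATHS)}")
```

This covers only the likelihood side of a FAIR-style estimate; loss magnitude would still have to be assessed separately before the two are combined into a risk figure.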

And yet -- speed matters, and manual remediation is too slow. Mythos can help companies identify vulnerabilities. But identification protects nothing unless companies apply security patches from vendors and install fixes to code they have developed internally. That is the remediation gap the governance loop above is meant to close; in practice it breaks down into three concrete moves:

1. Create a living graph of your technology environment. You will very quickly face an overwhelming pipeline of vulnerabilities to remediate and vendor patches to apply. Not every one will be equally important, and the most critical nodes in your environment may not be immediately apparent given all the dependencies among business processes, systems, data and technology infrastructure.

Modeling your environment as a graph will allow you to identify the most critical nodes and prioritize what to remediate first. Ultimately every node in the graph should anchor in a non-human identity -- don’t connect IP addresses; connect non-human identities. Building the graph will also be an important step in moving to a zero-trust architecture.

2. Use spec-driven engineering to get to policy-driven systems. If you have bespoke software you will need to fix it. Autocomplete (or even asking models to write discrete code blocks) will not allow you to move quickly enough.

You need to retrain your engineering teams on how to use agents to diagnose root causes, build PRDs and execute on them autonomously. And you may need to do this on a timescale of months, not years.

As you develop strong capabilities in spec-driven development, you can accelerate efforts to retire technical debt, resulting in a more resilient environment. And you will want to define architecture, configuration and behavior in terms of policy-as-code so you can repave systems that demonstrate drift.

3. Move change control from human analysis to proof of safety. In many companies, the change approval board acts as a brake and a bottleneck on evolving the environment. It doesn’t have to be this way, and it cannot continue to be this way if companies seek to remediate the vulnerabilities Mythos identifies before attackers can exploit them. Heavyweight change approval processes are often ineffective. Teams of agents may collaborate to form an automated patch management pipeline.

Before you deploy a change, it must prove itself in a sandbox, both in terms of whether it breaks something and whether an adversary agent can compromise it, replacing the bottleneck of human analysis with the proof of safety. And you should deploy changes in stages, testing impact as you go. For years companies like Netflix have reconciled speed and safety by using canary analysis for staged change deployment.
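
To make the three moves less abstract, here is a minimal Python sketch of how they might fit together. Everything in it is an assumption made for illustration: the node names, the `DEPENDENTS` map, the settings in the spec and the check functions are hypothetical, and a real environment would draw them from an asset inventory, a policy repository and sandbox tooling rather than hard-coded values.

```python
from collections import deque
from typing import Callable

# Move 1: a toy "living graph". Keys are hypothetical non-human identities;
# an edge A -> B means B depends on A, so a compromise of A puts B at risk.
DEPENDENTS: dict[str, list[str]] = {
    "svc-identity": ["svc-payments", "svc-portal"],
    "svc-payments": ["proc-settlement"],
    "svc-portal": [],
    "proc-settlement": [],
    "svc-reporting": [],
}


def blast_radius(node: str, dependents: dict[str, list[str]]) -> int:
    """Count the nodes reachable downstream of `node` (breadth-first search)."""
    seen = {node}
    queue = deque([node])
    while queue:
        for nxt in dependents.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # do not count the node itself


def prioritize(vulnerable: list[str], dependents: dict[str, list[str]]) -> list[str]:
    """Order vulnerable nodes by downstream impact, widest first (ties by name)."""
    return sorted(vulnerable, key=lambda n: (-blast_radius(n, dependents), n))


# Move 2: policy-as-code in its simplest form, a desired-state spec
# compared against what is actually running.
def find_drift(desired: dict[str, object], actual: dict[str, object]) -> dict[str, tuple]:
    """Return {setting: (desired, actual)} for every setting that departs from the spec."""
    return {k: (v, actual.get(k)) for k, v in desired.items() if actual.get(k) != v}


# Move 3: a change ships only if every sandbox check passes, then widens in stages.
def failed_checks(change: str, checks: dict[str, Callable[[str], bool]]) -> list[str]:
    """Run every check against the change and return the names of those that failed."""
    return [name for name, check in checks.items() if not check(change)]


def staged_rollout(stages: list[int], healthy: Callable[[int], bool]) -> int:
    """Widen exposure stage by stage; return the last percentage that stayed healthy."""
    reached = 0
    for pct in stages:
        if not healthy(pct):
            break  # halt the rollout at the first unhealthy stage
        reached = pct
    return reached


if __name__ == "__main__":
    print(prioritize(["svc-reporting", "svc-payments", "svc-identity"], DEPENDENTS))
    print(find_drift({"tls_min": "1.3", "public": False},
                     {"tls_min": "1.2", "public": False}))
    checks = {
        "regression_suite": lambda change: True,         # stand-in for functional tests
        "adversary_agent_blocked": lambda change: True,  # stand-in for an adversarial probe
    }
    if not failed_checks("patch-svc-identity", checks):
        print(staged_rollout([1, 10, 50, 100], healthy=lambda pct: pct <= 50))
```

Counting downstream dependents is a deliberately crude measure of criticality; weighting nodes by business impact or data sensitivity would be a natural refinement. The design point is that each gate returns evidence (a ranked list, a drift report, a list of failed checks) that an agent or a human can act on without convening a meeting.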

None of these interventions are simple. [9] All will take attention, effort and time. But what is the alternative? Outsourcing might help, but it doesn’t remove the remediation burden at a stroke. Waiting for regulatory guidance (across dozens of jurisdictions and agencies) is uncertain and will likely take too much time. The age of security by obscurity is over. The cost of stasis may exceed the cost of change.

One-time programs are a lot easier than sustained cultural and organizational change

Less than a dozen years after Sputnik, Neil Armstrong and Buzz Aldrin explored the Sea of Tranquility. Ten more astronauts walked on the Moon in the next three years. Then, nothing. What poverty of the human spirit, what richness of bureaucratic incompetence caused us to tread on the Moon and then retreat, without returning? Only this month has any human again transcended low Earth orbit.

Sputnik was a shock to the American educational system, but the response did not last. By the 1980s, the National Science Foundation warned that Americans were in danger of scientific illiteracy. Shortly afterwards, the famous A Nation at Risk report warned that post-Sputnik gains had wasted away. Not all the news is bad! American students have made large gains in fluid reasoning in recent decades -- and the dire standing of American students in global league tables may have more to do with compositional effects than school performance. But some of the news is really bad -- math scores have collapsed in the wake of Covid.

Just like a space program, the exploitation of AI in the enterprise is a generational project. Just like education reform, the ability of enterprise technology to use AI to build, run and protect systems is a foundational capability. Will your company move quickly enough to respond to the immediate challenge posed by AI-enabled cyberattacks? Will it sustain focus and attention over time to foster the capabilities to use AI not only to protect existing systems but to also make transformative leaps in business innovation, efficiency and resiliency?

Footnotes

[1] A great book. I read it every year in high school. The movie is pretty good too.

[2] Might I be evoking Arnold Toynbee’s theory of civilizational challenge and response here? Maybe.

[3] SQL is overwhelmingly the world’s most used declarative programming language. Perhaps its adoption provides the best historical parallel for spec-driven development. Replacing all the technical minutiae required for a query with a few SQL statements turned weeks’ worth of work into minutes.

[4] References not provided in order to protect the guilty.

[5] At first glance, you might ask: “How much does this matter for the enterprise? Once you get past the national security domain, how many attacks rely on zero-days?” More than you might think: a Mandiant analysis found that 70 percent of serious breaches they tracked involved a zero-day exploit. And of course a capable agentic attacker could assemble a sophisticated campaign out of a series of n-day exploits.

[6] The global security and vulnerability management market will shortly grow to USD 20 billion.

[7] Itself a billion-dollar market.

[8] Special thanks to my colleagues Rich Isenberg and Charlie Lewis on these topics.

[9] These changes will require coordination across the technology organization. The infrastructure team will likely have to build the living graph of the environment, with input from each application team. Your core architecture or engineering team will lead the transition to spec-driven development, but much of the work will fall on application teams—and on infrastructure teams as they move from configuring systems to automating services. Transforming change control and patch management will require collaboration across the developer toolchain, infrastructure, and cybersecurity teams.