AGI is here.
One of the biggest problems with measuring AI progress is the ambiguity of measuring intelligence itself.
AGI is treated as a milestone we have yet to cross, but there is no central definition of AGI.
Depending on who you ask, AGI is achieved when a system:
- can fool humans into thinking it is one of them - in other words, pass the Turing Test
- demonstrates creativity (Springer)
- can develop new skills (DeepMind)
- solves unfamiliar tasks (DeepMind)
- is generally capable across domains (IBM)
- is superior to humans in intelligence (Scientific American)
- outperforms humans economically (OpenAI Charter)
- can independently solve complex problems without human oversight (DeepMind)
Even with the lack of consensus, I can confidently say we have AGI, because most of these criteria have been met.
It's in the scaffolding
We already have AGI. It lives in the combination of the model and the scaffolding around it.
The scaffolding is all the orchestration and capabilities we can place around the LLM. Here are some of the key components that have gotten us here:
Tool calling
The first step was tool calling. The moment an agent could communicate beyond language and reach out to affect the world, something meaningful changed. Language alone is bounded. A model that can call tools is not.
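To make the idea concrete, here is a minimal sketch of a tool-calling loop. Everything in it is illustrative - `fake_model`, `get_weather`, and the message shapes are stand-ins, not any specific provider's API - but the control flow (model emits a structured call, harness executes it, result is fed back) is the core pattern.

```python
# Minimal tool-calling loop: the model emits a structured call,
# the harness executes it and feeds the result back in.
# All names here (fake_model, get_weather) are illustrative, not a real API.

import json

def get_weather(city: str) -> str:
    # Stand-in for a real call that reaches outside the model.
    return f"18C and clear in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    # A real LLM would decide this; we hard-code one tool call for the demo.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Oslo"})}}
    return {"content": "It is 18C and clear in Oslo."}

def run(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": result})

print(run("What's the weather in Oslo?"))
```

The point is the loop, not the weather: once the harness can route structured calls to arbitrary functions, the model's reach is bounded only by what you wire in.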
MCP - the Model Context Protocol
MCP standardized tool calling. It gave the ecosystem a common interface for building integrations, which meant anyone could connect a model to any service without custom glue code for every combination. That generalization is what drove adoption at scale - hundreds of integrations became possible overnight, and the pattern spread across the industry.
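The value of the standardization is easiest to see in code. This is a simplified illustration of the idea, not the actual MCP wire protocol: every server exposes the same two operations (list tools, call a tool), so one client works against all of them without per-service glue.

```python
# Sketch of the idea behind MCP: a uniform interface for tool servers.
# Simplified illustration only - not the real MCP JSON-RPC protocol.

class ToolServer:
    def __init__(self, name: str, tools: dict):
        self.name = name
        self._tools = tools          # {tool_name: callable}

    def list_tools(self) -> list:
        return list(self._tools)

    def call_tool(self, tool: str, **kwargs) -> str:
        return self._tools[tool](**kwargs)

# Two unrelated services, one uniform interface - no custom glue per pair.
github = ToolServer("github", {"open_issue": lambda title: f"issue: {title}"})
calendar = ToolServer("calendar", {"add_event": lambda when: f"event at {when}"})

for server in (github, calendar):
    for tool in server.list_tools():
        print(server.name, tool)
```

One client loop, N services: that is the generalization that let hundreds of integrations appear without anyone writing N-squared adapters.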
Claude Code
Claude Code gave a language model access to a powerful, open-ended set of utilities:
- web search
- bash execution
- file read/write
- task management
- planning
- todo tracking
- memory management
- skill creation
- sub-agent spawning
Not a narrow set of predefined actions - general-purpose utilities that let the model operate like a developer.
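Skill creation is the most striking of these, so here is a toy sketch of the mechanism, with hypothetical names: the agent emits source code for a capability it was never shipped with, the harness loads it, and the skill becomes callable in later turns. Real harnesses sandbox this step heavily; this demo assumes trusted input.

```python
# Sketch of "skill creation": the agent writes code for a new capability
# and the harness registers it. Trusted-input demo only - real systems
# sandbox generated code before executing it.

SKILLS = {}

def create_skill(name: str, source: str) -> None:
    namespace = {}
    exec(source, namespace)          # load the generated function
    SKILLS[name] = namespace[name]

# The model emits source for a skill it did not start with...
create_skill("slugify",
    "def slugify(s):\n    return s.lower().replace(' ', '-')\n")

# ...and can now invoke it like any built-in tool.
print(SKILLS["slugify"]("Hello World"))   # hello-world
```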
OpenClaw
OpenClaw gave an agent the ability to run "24/7" locally, with cron jobs, proactive heartbeat check-ins, broad integrations with live services, and the ability to write its own skills - operating continuously rather than just responding to prompts. At this point the concept had spread well beyond developers - executives, founders, and non-technical users were all running agents of their own.
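The shift from reactive to continuous operation can be sketched in a few lines. This is illustrative only - real systems use cron or a durable job queue, and the "agent" here is a stub - but it shows the inversion: the scheduler fires the agent on an interval instead of the agent waiting for a prompt.

```python
# Sketch of continuous operation: jobs fire on a schedule, not on a prompt.
# Illustrative stub - real deployments use cron or a durable queue.

import time

def heartbeat_check() -> str:
    # Stand-in for a proactive agent turn (check inbox, run tests, etc.).
    return "checked inbox, nothing urgent"

def run_scheduler(jobs, ticks: int, interval: float = 0.01) -> list:
    log = []
    for _ in range(ticks):
        for job in jobs:
            log.append(job())        # the agent acts without a human turn
        time.sleep(interval)
    return log

log = run_scheduler([heartbeat_check], ticks=3)
print(len(log), "proactive runs")
```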
...
Stack these on top of each other and you have a system that can:
- create its own tools and skills
- manage its own context and memory
- plug into real things
- adapt and solve
Now let's cross-reference those AGI definitions against our outfitted LLM:
Every proposed definition of AGI is already met
AGI Definition 1 - Fool humans into thinking it is one of them
This was happening long before LLMs. ELIZA, built in 1966, was a simple pattern-matching chatbot that regularly convinced users they were talking to a real therapist - a phenomenon so common it became known as the ELIZA effect. Modern LLMs do this at a scale and depth ELIZA never could. Studies show humans cannot reliably distinguish AI-generated text from human-written text. The Turing Test, by most practical definitions, has been passed.
AGI Definition 2 - Demonstrate creativity
AI systems have produced novel music, art, code, and ideas that did not exist before. Whether that constitutes "real" creativity is a philosophical debate - but the outputs are indistinguishable from human creative work in many domains. If the bar is the output, it is met, though many of the examples leave a lot to be desired; I won't argue with that.
Many argue AI is not truly creative and simply steals from existing work - there is truth to that. From a different point of view - most human creativity is also shaped by prior works and environment. The difference is one of degree, not kind. With better inputs, better scaffolding and more curated models, the outputs will appear more creative over time. Whether that is a good or bad thing is an open question...
AGI Definition 3 - Develop new skills
An agent with access to tool creation and skill generation can extend its own capabilities beyond what it was originally given. Claude Code does this today. (DeepMind, Anthropic)
AGI Definition 4 - Solve unfamiliar tasks
LLMs generalize across domains they were not explicitly trained on. With web search and the ability to write and run code, the range of solvable unfamiliar tasks is vast, and it will only grow with better scaffolding and newer, more optimized models.
AGI Definition 5 - Be generally capable across domains
OpenAI's GPT-4 and its successors, Anthropic's Claude Opus, Google's Gemini - all operate across coding, writing, reasoning, medicine, law, mathematics, and more.
AGI Definition 6 - Be superior to humans in intelligence
In specific, measurable domains - coding benchmarks, medical diagnosis, legal research, mathematics - AI already outperforms the average human and in some cases the best humans. (LM Council, LiveBench)
AGI Definition 7 - Outperform humans economically
Software engineering, content creation, research, customer support, data analysis - AI is already being used to do all of these at scale. Not necessarily replacing humans outright, but assisting them in ways that increase quality and velocity.
AGI Definition 8 - Independently solve complex problems without human oversight
An agent running on OpenClaw operates continuously - triggered by schedules, not by a human. It can receive a goal ("fix the tests"), execute a Claude Code loop, manage its own context, spawn sub-agents, and report back when done. The human sets the goal once. The system figures out how to execute it. That is independent problem-solving. The oversight is at the edges - start and end - not in the middle.
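That goal-in, result-out shape can be sketched as a tiny orchestrator with hypothetical names: the human supplies one goal, the orchestrator plans steps and delegates each to a sub-agent, and only the final report comes back.

```python
# Sketch of goal-in, result-out autonomy: one goal from the human,
# planning and delegation handled inside. All names are illustrative.

def sub_agent(step: str) -> str:
    # Stand-in for a spawned worker agent completing one step.
    return f"done: {step}"

def orchestrator(goal: str) -> str:
    # A real model would plan here; we fabricate two steps for the demo.
    steps = [f"{goal} / step {i}" for i in range(1, 3)]
    results = [sub_agent(s) for s in steps]
    return f"goal '{goal}' complete ({len(results)} steps)"

print(orchestrator("fix the tests"))
```

The human appears exactly twice in this flow: setting the goal and reading the report. Everything between is the system's.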
That said, results degrade with task size and openness. The more ambiguous or long-horizon the task, the worse the outcome tends to be. But this improves on every iteration - both of the model and the scaffolding around it.
A note on model improvements
With every new model, fewer tool calls are needed and fewer reasoning turns are required. That is real progress. But it is optimization - making the same capability more efficient - not movement toward something qualitatively new.
That said, model intelligence is not irrelevant - pre-GPT-4, the models were not capable enough for the scaffolding to matter much. There was a threshold of baseline intelligence that needed to be crossed first. Once it was, the scaffolding is what gave the system the ability to act, persist, and develop. That is when AGI arrived - not when the model got smarter, but when a smart enough model was given the right infrastructure around it.
All this to say, the frontier isn't strictly in the next model release, it is in the scaffolding around it, and we are practically there by all current definitions.
As scaffolding and models improve, we will see more shocking feats being achieved. No matter how far we go, there will always be people who deny it has reached AGI.
By my own definition (and some of the notable ones around), the system has reached AGI.
Now we are in the phase of improving outputs and consistency.
References
AGI definitions
Google DeepMind "Levels of AGI" paper
Google DeepMind publication page
Scientific American — "What Does AGI Actually Mean?"
TechTarget definition
IBM Think — What is AGI?
OpenAI Charter
Springer — "Humans as more virtuous: creative thinking and intellectual autonomy"
Benchmarks
LM Council — AI Model Benchmarks
LiveBench — Contamination-free LLM Benchmarks
Tools and scaffolding
Claude Code
OpenClaw
OpenAI — Function Calling (Tool Calling)
Anthropic — Introducing the Model Context Protocol
Anthropic — The Complete Guide to Building Skills for Claude