The Expertise Ceiling: What AI Does to Skill, and Why Education Is Most Exposed

AI makes you better at the things you are already good at. If you are a subject matter expert, it sharpens and extends you. If you are not, you run into an immediate problem: it is very difficult to know what a good result looks like. You can generate output all day, but you cannot judge it, and the tool will not tell you when it has handed you something mediocre, because mediocre is precisely what it produces by default.

Someone working in education later put the same idea to me more elegantly, calling AI a force multiplier for your existing skills and knowledge. A multiplier is neutral. Applied to strong fundamentals, it produces excellent work. Applied to weak fundamentals, it produces mediocrity at scale, delivered faster and more confidently than ever before. The multiplication is not the problem. The input is.

This article works through where that leads: for professionals, for industries, and finally for education, which turns out to be the most exposed institution of all.

Three groups of users

In practice, people using AI seem to sort into three broad groups.

The first are the abstainers. Some are genuine Luddites who refuse to engage with the technology at all. But it is worth separating them from a smaller emerging group of deliberate abstainers, particularly in writing and craft trades, who understand AI perfectly well and are refusing it strategically. They are betting that verified human work becomes a premium category, the way handmade furniture did after mass manufacturing. Whether the bet pays off is open, but it is a considered position, not ignorance.

The second group is the large middle, and it is bigger than the obvious examples suggest. The visible version is the average operator producing average output at volume. In web development, the pattern is already established: AI helps ordinary developers spin up generic sites with generic content, those sites feed back into the data AI learns from, and the next generation of output is slightly more generic again. This recursive spiral has a name in the research world: model collapse. Controlled studies have shown that models trained on their own output degrade generation by generation, losing the rare, distinctive material first while everything regresses towards a bland centre.
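The spiral can be illustrated with a toy simulation. This is a minimal sketch, not the method used in the model collapse research: it assumes a hypothetical corpus of 90 generic items and 10 distinctive ones, and it stands in for "training on your own output" with plain resampling, where each generation is drawn only from the generation before it. The function name and corpus are illustrative assumptions.

```python
import random


def resample_generations(corpus, generations, seed=0):
    """Repeatedly rebuild a corpus by sampling only from the previous one.

    Returns the final corpus and the number of distinct items present
    at each generation (index 0 is the original corpus).
    """
    rng = random.Random(seed)
    current = list(corpus)
    distinct_counts = [len(set(current))]
    for _ in range(generations):
        # Each new generation can only contain items the last one produced,
        # so an item that is missed once can never come back.
        current = [rng.choice(current) for _ in range(len(current))]
        distinct_counts.append(len(set(current)))
    return current, distinct_counts


# Hypothetical corpus: a large generic centre plus a few rare, distinctive items.
corpus = ["generic"] * 90 + [f"distinctive_{i}" for i in range(10)]
final, distinct_counts = resample_generations(corpus, generations=50)
print(distinct_counts)
```

The count of distinct items can stay level or fall from one generation to the next, but it can never rise, and the rare items are the ones most likely to be missed in any given round. Real models are far more complicated than this, but the one-way direction of the loss is the point of the sketch.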

But the middle group extends well beyond people producing work in their own trade. It includes everyone using AI to produce things they have never been able to produce themselves: the manager generating graphics with no design background, the tradesperson writing marketing emails with no copywriting experience, the small business owner producing reports, proposals and social content across half a dozen disciplines they have never worked in. The problem is not that they are using the tool. It is that they have no capacity to assess the quality of what comes back, because the ability to evaluate work in a field is built from the same knowledge as the ability to produce it. This is the core finding of the Dunning-Kruger research: the skills required to be good at something are the same skills required to recognise good work, so people lacking them do not just produce weak work, they cannot see that it is weak. AI supercharges this. It hands confident, polished-looking output to people who were previously protected from publishing bad work by the fact that they could not produce any work at all. The frightening part is that none of it will look bad. It will look fine, identically fine, everywhere.

Importantly, many people in this middle group are not deceiving themselves. Plenty know exactly what they are doing and have made a commercial decision that good enough sells. The self-deception risk sits in a narrower band: people who care about the quality of what they put out but have no way of verifying it.

The third group uses AI deliberately as an extension of genuine expertise, while also using it to upskill, to do better work, to avoid becoming the average of the internet. Membership of this group is demonstrated, not declared. And it carries a structural advantage that creates a nasty loop: the people who extract genuinely novel output from these models are almost always those who already have deep domain knowledge, because they know what to ask, what to reject, and where the model’s defaults are wrong. Expertise is what lets you escape the guardrails. But if AI erodes the path to building expertise, fewer people can ever escape them. The average of the internet becomes a ceiling rather than a floor, and most users never know the ceiling is there.

The test that actually matters, and where it gets complicated

In working through this problem, one measure keeps surviving scrutiny where others fall away: are your skills improving independently of the tool?

This works as a first-pass test because it is falsifiable. Write the proposal without the assistant. Diagnose the problem before asking. The gap between your unaided output and your assisted output is measurable, and the direction that gap is moving tells you which group you are actually in, regardless of which one you would claim. If your unaided capability is rising, the tool is genuinely extending you. If it is flat or falling while your output holds steady, it has been a crutch dressed up as an extension, and the difference only shows up when the ground shifts.

But the test needs a refinement, because there is a legitimate use of AI that fails it while being entirely healthy. Consider a developer who spent years building standard websites, then highly customised ones, and who now orchestrates builds on architectures they would never previously have attempted: modern frameworks, edge infrastructure, the lot. The quality of the outcome is rising. The scope of what they can deliver has expanded enormously. It is like having a team at their fingertips. And yet if the tool vanished tomorrow, they could not write that code unaided, because they are not writing it at all. On the strict version of the test, that looks like a crutch. It clearly is not.

The resolution is that the test should target the skill actually being exercised, not the skill being delegated. That developer is not exercising the skill of writing framework code. They are exercising architectural judgement, direction, specification and evaluation, the same skills a good technical lead exercises over a team of engineers whose code they could not personally write line by line. And those skills are improving independently of the tool. They could spec, direct and assess one of these builds far better than they could two years ago. The delegation is safe for a second reason too: they retain the evaluative competence to judge the result. They know whether the site performs, whether the structure is sensible, whether something smells wrong, because that judgement was built over years of doing adjacent work by hand.

So the fuller test has two parts. First, is the skill you are actually exercising, whether that is production or direction, improving independently of the tool? Second, do you retain the evaluative competence to judge what the tool hands back? Delegating work you can evaluate is how every functional team in history has operated. The danger zone is delegating work you can neither produce nor evaluate, which is exactly where the middle group lives.
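The two-part test can be written down as a small decision procedure. This is a rough sketch under stated assumptions, not a validated instrument: the scores are hypothetical self-assessments of the skill actually being exercised, recorded without the tool and listed oldest first, and "improving" is crudely taken to mean the latest score beats the earliest. The labels "crutch risk" and "blind spot" are illustrative names, not terms from any research.

```python
def two_part_audit(unaided_scores, can_evaluate_output):
    """Return a verdict for the two-part test.

    unaided_scores: hypothetical self-assessed scores, oldest first, for the
        skill actually exercised (production or direction), taken unaided.
    can_evaluate_output: True if you can genuinely judge what the tool returns.
    """
    if len(unaided_scores) < 2:
        # Without at least two measurements there is no direction to read.
        return "no baseline: cannot tell extension from crutch"
    # Crude trend check (an assumption): compare latest score with earliest.
    improving = unaided_scores[-1] > unaided_scores[0]
    if improving and can_evaluate_output:
        return "extension"
    if not improving and not can_evaluate_output:
        return "danger zone"
    if not improving:
        return "crutch risk: exercised skill is flat or falling"
    return "blind spot: improving, but output cannot be evaluated"


print(two_part_audit([3, 4, 6], True))   # extension
print(two_part_audit([5, 5, 4], False))  # danger zone
print(two_part_audit([5], True))         # no baseline: cannot tell extension from crutch
```

The first branch matters as much as the others: with a single measurement, or none, the audit cannot return a verdict at all.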

One further distinction is worth holding onto: this skills test is not the same thing as thinking critically about AI, and the two should not be collapsed into one. The test measures outcomes. Critical awareness is a disposition, and there is a strong argument that awareness alone puts you ahead of the vast majority of users. If you genuinely hold in your head that the model regresses towards the average, that it is confidently wrong in ways that are hard to spot, that fluency is not accuracy, and that convenience carries a skill development cost, you are operating with a mental model most people simply do not have. Awareness changes behaviour even passively. You double-check more, you prompt more precisely, you notice when output smells like the average of the internet.

The caveat is that awareness is not immunity. Everyone knows social media is engineered for compulsion, and the knowledge helps far less than it should. Cognitive offloading is particularly sneaky because it does not feel like anything. There is no moment where you notice you did not struggle with the problem. That is the entire point of it. Early research backs this up: an MIT Media Lab preprint using EEG monitoring found that people writing essays with AI assistance showed measurably lower brain connectivity, weaker memory of their own work, and diminished ownership of it, a pattern the researchers called cognitive debt. A separate Microsoft and Carnegie Mellon survey of knowledge workers found that the more confidence people had in AI, the less critical thinking they reported applying.

So the cleaner framing is this: awareness gets you into the top tier, and the periodic skills audit keeps you there. One is the entry condition, the other is the maintenance schedule.

Why education is uniquely exposed

Everything above applies to adults. Now run it through a school.

An adult professional can audit themselves because they have a baseline. They know what they could do unaided two years ago. A student has no baseline. They have never known what their unaided capability would have been, so the gap between extension and crutch is not just unexamined for them, it is unmeasurable. Worse, the two-part test fails on both parts at once. Students are not yet exercising a higher-order skill like direction or evaluation, and they have no evaluative competence to judge what the tool hands them, because evaluative competence is built by doing the work. A student using AI is, almost by definition, delegating work they can neither produce nor evaluate. They sit in the danger zone not through any failing, but structurally.

And the skills at risk are foundational ones: the grit built by sitting with a hard problem, the frustration tolerance that transfers into work, health and relationships. These are like compound interest, built early or built with great difficulty later. A generation that never sat with a hard problem for twenty minutes does not just know less. It has a different relationship with difficulty itself. And my genuine fear is that this does not stay contained within schoolwork. Character formed around the avoidance of discomfort permeates every part of a life. Are these kids physically weaker? Are they less capable of sitting in a period of discomfort with no reward in sight, or a reward that is delayed? A mind trained to expect instant resolution does not switch that expectation off at the school gate. This is not really an education risk. It is a character formation risk that happens to run through schools.

The three groups map onto education uncomfortably. Adults sort into them partly through choice and partly through pre-existing skill. Students have not yet had the chance to build the fundamentals that make the third group possible. So the honest question is not which group a student is in. It is which group the education system is manufacturing. On current default settings (generic tools, no verification curriculum, and assessment that rewards output over process), the system is a production line for the middle group. That is not a personal failing at that age. It is the factory default. Membership of the third group for the next generation has to be deliberately built, and almost nowhere is building it.

There is a deeper compatibility problem underneath all of this. The current education system rewards exactly what AI produces: the well-structured, the accurate, the to-the-letter, the verifiable against the rubric. It does not reward the risk taker. It does not reward the kid who sits outside the system. It does not even reward the kid who thinks differently to their teacher. AI lives by the guardrails, and we have built an education system that rewards living by the guardrails. The tool and the system fit together perfectly, and that perfect fit is precisely the problem. Which brings us to incentives.

The incentive problem

The final question this line of thinking arrives at is the sharpest one: is there a point in the chain where people are commercially, financially or otherwise motivated to regress to the middle, and does something need to sit deliberately before that point to counteract it?

The answer appears to be yes, and it is where the whole issue stops being pedagogical and becomes economic. Regression to the middle is not just a cognitive risk. It is commercially rewarded in the short term. The market pays for good enough delivered fast, so the rational move for the average operator is to lean into the tool and ship. Education has its own version of this. Schools are measured on throughput, results and cost per student, and AI improves all three metrics while quietly hollowing out the thing the metrics were meant to measure. A principal who deploys AI tutoring and AI marking looks efficient. The skill debt does not appear on any dashboard they answer to. But then, neither does grit. Neither does attitude. The qualities we value most highly outside of education have always been invisible to a linear system still stuck in the industrial age, a system that measures one dimension and rewards it. What we need is a system that measures and rewards in three dimensions, and AI has just made the gap between what gets measured and what actually matters wider than it has ever been.

Worse, every incentive currently in play points the same direction. Student convenience, teacher workload, school metrics and edtech revenue all reward the same drift towards the middle, and the feedback that would reveal the cost arrives years after the decisions that incurred it. An industry, or a school system, whose unaided capability is quietly declining while its output quality holds steady is in a fragile position, because the quality of the output now depends entirely on the tool. Take the tool away, restrict it, price it out of reach, and there is nothing underneath holding the work up. And there is no dramatic moment where this becomes visible. That is what makes it pernicious. Nothing snaps. Nothing breaks on a particular Tuesday. We just wake up somewhere in the future, look around, and wonder how we got here. Education is running this experiment at civilisational scale.

You cannot fix this by telling people to try harder. The incentives themselves have to change, which means the counterweight has to be built into the structure of the system. The plausible shape of it includes assessment designed to measure unaided capability at key checkpoints, so the baseline problem is fixed by design. It includes verification and calibrated scepticism taught as core curriculum from a young age: question what you read, know where it came from, never regurgitate without thinking. In an AI-saturated world the valuable skill shifts from producing answers to judging them, and almost nobody teaches it. And it includes properly valuing unaided competence: seeking it out, measuring it, rewarding it. Not because unaided work is superior: I would take someone who is excellent at directing AI and assessing its output over raw unaided competence most days of the week. But unaided competence is the foundation that assessment skill is built on, and right now almost nobody is measuring whether it still exists. That correction will come eventually. It will just come slowly, and by the time it arrives, a generation of students will have already passed through the system it was meant to fix.

What society actually pays for

Strip away the job titles and the credentials, and what society has always valued, above almost everything else, is the problem solver. The inventor. The entrepreneur. The doctor. The tradesperson who can look at something broken and see the fix nobody else can. Every one of them, in whatever form they take, is ultimately paid for the same thing: looking at something unresolved and resolving it. Credentials are just proxies we use when we cannot observe the real thing directly.

I watched this play out in my own career. I went from machine operator to running an area, to running the warehouse, to starting an R&D department, to importing, to exporting, to general manager. All of it within four years, in my mid-twenties. Not because I held qualifications for each of those roles; I did not. It was because I could solve problems, and problem solving transfers. Each new domain was unfamiliar, but the approach to it was not.

And it is worth being precise about what made that possible, because it refines something from earlier in this article. The expert eye that let me operate across those domains was not subject expertise. I was not an expert in R&D or international trade when I walked into them. It was a portable set of principles: critical thinking, reverse engineering, pulling a problem apart until you find the piece that is actually broken, recognising when an answer does not smell right even before you can articulate why. That is an expert eye of a different kind. Not expertise in the subject, but expertise in the act of solving itself. It is the evaluative competence discussed earlier, in its most transferable form, and it is the closest thing there is to a universal skill.

Now ask what the modern problem solver looks like if we stop building that eye. If the only way you can solve a problem is to have a conversation with AI and accept the output as the solution, with no critical assessment, no pulling it apart, no instinct for when it is wrong, then you are not solving problems at all. You are relaying them. The solution belongs to the tool, and you are interchangeable with anyone else capable of typing the same question. That is what a modern problem solver becomes by default, and the market will eventually price it accordingly.

Because AI does not reduce the value of problem solving. It raises it. When answers are abundant and free, the scarce thing becomes the judgement wrapped around them: framing the right problem, interrogating the output, knowing which answer to reject. Society will keep paying its premium for exactly what it has always paid for. What is at risk is not the demand for problem solvers. It is the supply. And an education system running on the defaults described above is quietly shutting down the production line for the one thing society has never stopped buying.

Where this lands

None of this is an argument against AI in education. It is an argument about speed. The tool amplifies whatever expertise you bring to it. Judging its output requires knowledge students do not yet have. The systems most eager to adopt it are the ones least equipped to measure what it is costing them, and every natural incentive points towards the middle. The corrective has to be built deliberately, and building it takes time, which is the one thing education does not have. Move too slowly, even five years, let alone ten, and you have lost a generation: kids lacking grit, lacking critical thinking, lacking the ability to turn up in the world and do the doing unaided. A generation, in other words, short of problem solvers, the people every society has always been built on. Not a hypothetical generation. The ones sitting in classrooms right now.

For any individual, the useful habit remains the two-part audit: is the skill you are actually exercising improving independently of the tool, and can you genuinely evaluate what it hands back? If the answer is no, that is not a reason to stop using it. It is a reason to look at yourself and ask harder questions. Where do I want to be? What have I accepted about myself? What does success look like long term, not just short term?

Why this matters

This is personal for me. I have kids going through the system right now, and I feel like I am hanging on for dear life to teach them things like grit, against a system that rewards participation, that optimises for the middle, and that never makes them uncomfortable for any lengthy period of time.

And here is where that connects to everything above. Earlier I made the point that the education system rewards exactly what AI produces: the structured, the accurate, the verifiable. Follow that one step further. If everything the system measures is something a machine can now generate, then the measurement itself is broken. An ATAR (Australian Tertiary Admission Rank) increasingly tells you how good the output was, not how capable the child is, and the gap between those two things is exactly the width of the tool. Which changes the status of everything the system has never measured. Grit. Attitude. Growth over time. Character. These qualities were always undervalued. Now they are something more than that: they are the only signals left that a machine cannot fake. The three-dimensional view of a child is no longer a nice idea for a better system. It is the necessary response to this one.

Now imagine the alternative. Imagine a system that measured grit. That measured attitude. That measured and rewarded behaviour, tracked over years, not terms. Imagine if entry to university and the other pathways and opportunities beyond school relied on that three-dimensional view of a child, rather than on who is good at sitting still and regurgitating numbers and quotes, or who is good at using the tools at hand. Imagine assessing a university application and seeing the whole person: this kid was an outstanding community citizen. Their behaviour was ordinary at times, but across the years you can see the trend: growth, learning, hearing feedback and actually putting it into practice. That is character building made visible. Imagine grit and relationships were measured, and on that application you could see the person, not just the As, not just the ATAR. If you were the one assessing those applications, you already know your choices would be different. The people you admitted would be different. You know that in your gut, reading this. So why are we not architecting a system that values it?

I know why. Because it is inherently risky, and it is risky for the leaders and decision makers specifically. We have built a system where they are accountable to everybody, where they answer to everybody, where they cannot go on their gut and cannot take a risk, even slowly, even carefully. So the safe path is the measurable path, and the measurable path is the one-dimensional one. The people with the power to change the system are the people the system punishes hardest for trying.

References

Kosmyna, N. et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab, preprint. arxiv.org/abs/2506.08872

Lee, H-P. et al. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. Microsoft Research and Carnegie Mellon University, CHI 2025. dl.acm.org/doi/10.1145/3706598.3713778

Shumailov, I. et al. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755-759. nature.com/articles/s41586-024-07566-y

Kruger, J. and Dunning, D. (1999). Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134. doi.org/10.1037/0022-3514.77.6.1121