Education Rewards Exactly What AI Produces. That Should Worry You.

AI makes you better at the things you are already good at. If you are a subject matter expert, it sharpens and extends you. If you are not, you run into an immediate problem: it is very difficult to know what a good result looks like. You can generate output all day, but you cannot judge it, and the tool will not tell you when it has handed you something mediocre, because mediocre is precisely what it produces by default.

Someone working in education later put the same idea to me more simply, calling AI a force multiplier for your existing skills and knowledge. It is a more elegant phrasing of the same observation. A multiplier is neutral. Applied to strong fundamentals, it produces excellent work. Applied to weak fundamentals, it produces mediocrity at scale, delivered faster and more confidently than ever before. The multiplication is not the problem. The input is.

This article works through where that leads: for professionals, for industries, and finally for education, which turns out to be the most exposed institution of all.

Three groups of users

In practice, people using AI seem to sort into three broad groups.

The first are the abstainers. Some are genuine Luddites who refuse to engage with the technology at all. But it is worth separating them from a smaller emerging group of deliberate abstainers, particularly in writing and craft trades, who understand AI perfectly well and are refusing it strategically. They are betting that verified human work becomes a premium category, the way handmade furniture did after mass manufacturing. Whether the bet pays off is open, but it is a considered position, not ignorance.

The second group is the large middle, and it is bigger than the obvious examples suggest. The visible version is the average operator producing average output at volume. In web development, the pattern is already established: AI helps ordinary developers spin up generic sites with generic content, those sites feed back into the data AI learns from, and the next generation of output is slightly more generic again. This recursive spiral has a name in the research world: model collapse. Controlled studies have shown that models trained on their own output degrade generation by generation, losing the rare, distinctive material at the tails of the data first while everything regresses towards a bland centre.

But the middle group extends well beyond people producing work in their own trade. It includes everyone using AI to produce things they have never been able to produce themselves: the manager generating graphics with no design background, the tradesperson writing marketing emails with no copywriting experience, the small business owner producing reports, proposals and social content across half a dozen disciplines they have never worked in. The problem is not that they are using the tool. It is that they have no capacity to assess the quality of what comes back, because the ability to evaluate work in a field is built from the same knowledge as the ability to produce it.

This is the core finding of the Dunning-Kruger research: the skills required to be good at something are the same skills required to recognise good work, so people lacking them do not just produce weak work, they cannot see that it is weak. AI supercharges this. It hands confident, polished-looking output to people who were previously protected from publishing bad work by the fact that they could not produce any work at all. The frightening part is that none of it will look bad. It will look fine, identically fine, everywhere.

Importantly, many people in this middle group are not deceiving themselves. Plenty know exactly what they are doing and have made a commercial decision that good enough sells. The self-deception risk sits in a narrower band: people who care about the quality of what they put out but have no way of verifying it.

The third group uses AI deliberately as an extension of genuine expertise, while also using it to upskill, to do better work, to avoid becoming the average of the internet. Membership of this group is demonstrated, not declared. And it carries a structural advantage that creates a nasty loop: the people who extract genuinely novel output from these models are almost always those who already have deep domain knowledge, because they know what to ask, what to reject, and where the model’s defaults are wrong.

Expertise is what lets you escape the guardrails. But if AI erodes the path to building expertise, fewer people can ever escape them. The average of the internet becomes a ceiling rather than a floor, and most users never know the ceiling is there.

The test that actually matters, and where it gets complicated

In working through this problem, one measure keeps surviving scrutiny where others fall away: are your skills improving independently of the tool?

This works as a first-pass test because it is falsifiable. Write the proposal without the assistant. Diagnose the problem before asking. The gap between your unaided output and your assisted output is measurable, and the direction that gap is moving tells you which group you are actually in, regardless of which one you would claim. If your unaided capability is rising, the tool is genuinely extending you. If it is flat or falling while your output holds steady, it has been a crutch dressed up as an extension, and the difference only shows up when the ground shifts.

But the test needs a refinement, because there is a legitimate use of AI that fails it while being entirely healthy. Consider a developer who spent years building standard websites, then highly customised ones, and who now orchestrates builds on architectures they would never previously have attempted: modern frameworks, edge infrastructure, the lot. The quality of the outcome is rising. The scope of what they can deliver has expanded enormously. It is like having a team at their fingertips. And yet if the tool vanished tomorrow, they could not write that code unaided, because they are not writing it at all. On the strict version of the test, that looks like a crutch. It clearly is not.

The resolution is that the test should target the skill actually being exercised, not the skill being delegated. That developer is not exercising the skill of writing framework code. They are exercising architectural judgement, direction, specification and evaluation, the same skills a good technical lead exercises over a team of engineers whose code they could not personally write line by line. And those skills are improving independently of the tool. They could spec, direct and assess one of these builds far better than they could two years ago. The delegation is safe for a second reason too: they retain the evaluative competence to judge the result. They know whether the site performs, whether the structure is sensible, whether something smells wrong, because that judgement was built over years of doing adjacent work by hand.

So the fuller test has two parts. First, is the skill you are actually exercising, whether that is production or direction, improving independently of the tool? Second, do you retain the evaluative competence to judge what the tool hands back? Delegating work you can evaluate is how every functional team in history has operated. The danger zone is delegating work you can neither produce nor evaluate, which is exactly where the middle group lives.
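For readers who like to see a rule written down, the two-part test can be sketched in a few lines of Python. This is an illustration under loud assumptions, not a validated instrument: the scores are whatever periodic measure of unaided work you trust, the names `slope` and `audit` are hypothetical, the verdict labels are shorthand for the categories in this article, and a simple least-squares trend stands in for "is the skill improving".

```python
def slope(scores):
    """Least-squares trend of scores taken at evenly spaced audits."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two audits to see a direction")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (s - y_mean) for i, s in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def audit(unaided_scores, can_evaluate_output):
    """Apply the two-part test.

    unaided_scores: periodic scores for the skill actually being exercised
        (production or direction), measured without the tool. How you score
        it is an assumption left to the reader.
    can_evaluate_output: your honest answer to part two, whether you can
        judge what the tool hands back.
    """
    improving = slope(unaided_scores) > 0
    if improving and can_evaluate_output:
        return "extension"
    if not improving and not can_evaluate_output:
        return "danger zone"
    if not improving:
        return "crutch risk"  # you can still judge the work, but the skill is flat or falling
    return "blind spot"       # the skill is growing, but the output goes unjudged


# Example: unaided capability slipping across three audits.
verdict = audit([60, 58, 55], can_evaluate_output=True)
```

The point of the sketch is the shape of the decision, not the numbers: the first input is a trend over time rather than a single snapshot, and the second is a separate question that a rising output quality cannot answer for you.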

One further distinction is worth holding onto: this skills test is not the same thing as thinking critically about AI, and the two should not be collapsed into one. The test measures outcomes. Critical awareness is a disposition, and there is a strong argument that awareness alone puts you ahead of the vast majority of users. If you genuinely hold in your head that the model regresses towards the average, that it is confidently wrong in ways that are hard to spot, that fluency is not accuracy, and that convenience carries a skill development cost, you are operating with a mental model most people simply do not have. Awareness changes behaviour even passively. You double-check more, you prompt more precisely, you notice when output smells like the average of the internet.

The caveat is that awareness is not immunity. Everyone knows social media is engineered for compulsion, and the knowledge helps far less than it should. Cognitive offloading is particularly sneaky because it does not feel like anything. There is no moment where you notice you did not struggle with the problem. That is the entire point of it. Early research points the same way: an MIT Media Lab preprint using EEG monitoring found that people writing essays with AI assistance showed measurably lower brain connectivity, weaker memory of their own work, and diminished ownership of it, a state the researchers called cognitive debt. A separate Microsoft and Carnegie Mellon survey of knowledge workers found that the more confidence people had in AI, the less critical thinking they reported applying.

So the cleaner framing is this: awareness gets you into the top tier, and the periodic skills audit keeps you there. One is the entry condition, the other is the maintenance schedule.

Why education is uniquely exposed

Everything above applies to adults. Now run it through a school.

An adult professional can audit themselves because they have a baseline. They know what they could do unaided two years ago.

A student has no baseline. They have never known what their unaided capability would have been, so the gap between extension and crutch is not just unexamined for them, it is unmeasurable.

Worse, the two-part test fails on both parts at once. Students are not yet exercising a higher-order skill like direction or evaluation, and they have no evaluative competence to judge what the tool hands them, because evaluative competence is built by doing the work. A student using AI is, almost by definition, delegating work they can neither produce nor evaluate. They sit in the danger zone not through any failing, but structurally.

And the skills at risk are foundational ones: the grit built by sitting with a hard problem, the frustration tolerance that transfers into work, health and relationships. These are like compound interest, built early or built with great difficulty later.

A generation that never sat with a hard problem for twenty minutes does not just know less. It has a different relationship with difficulty itself. And my genuine fear is that this does not stay contained within schoolwork. Character formed around the avoidance of discomfort permeates every part of a life.

Are these kids physically weaker? Are they less capable of sitting in a period of discomfort with no reward in sight, or a reward that is delayed? A mind trained to expect instant resolution does not switch that expectation off at the school gate. This is not really an education risk. It is a character formation risk that happens to run through schools.

The three groups map onto education uncomfortably. Adults sort into them partly through choice and partly through pre-existing skill. Students have not yet had the chance to build the fundamentals that make the third group possible. So the honest question is not which group a student is in. It is which group the education system is manufacturing. On current default settings (generic tools, no verification curriculum, and assessment that rewards output over process), the system is a production line for the middle group. That is not a personal failing at that age. It is the factory default. Membership of the third group for the next generation has to be deliberately built, and almost nowhere is building it.

There is a deeper compatibility problem underneath all of this. The current education system rewards exactly what AI produces: the well-structured, the accurate, the to-the-letter, the verifiable against the rubric. It does not reward the risk taker. It does not reward the kid who sits outside the system. It does not even reward the kid who thinks differently to their teacher.

AI lives by the guardrails, and we have built an education system that rewards living by the guardrails. The tool and the system fit together perfectly, and that perfect fit is precisely the problem.

Which brings us to incentives.

The incentive problem

The final question this line of thinking arrives at is the sharpest one: is there a point in the chain where people are commercially, financially or otherwise motivated to regress to the middle, and does something need to sit deliberately before that point to counteract it?

The answer appears to be yes, and it is where the whole issue stops being pedagogical and becomes economic. Regression to the middle is not just a cognitive risk. It is commercially rewarded in the short term. The market pays for good enough delivered fast, so the rational move for the average operator is to lean into the tool and ship. Education has its own version of this. Schools are measured on throughput, results and cost per student, and AI improves all three metrics while quietly hollowing out the thing the metrics were meant to measure. A principal who deploys AI tutoring and AI marking looks efficient. The skill debt does not appear on any dashboard they answer to. But then, neither does grit. Neither does attitude. The qualities we value most highly outside of education have always been invisible to a linear system still stuck in the industrial age, a system that measures one dimension and rewards it. What we need is a system that measures and rewards in three dimensions, and AI has just made the gap between what gets measured and what actually matters wider than it has ever been.

Worse, every incentive currently in play points the same direction. Student convenience, teacher workload, school metrics and edtech revenue all reward the same drift towards the middle, and the feedback that would reveal the cost arrives years after the decisions that incurred it. An industry, or a school system, whose unaided capability is quietly declining while its output quality holds steady is in a fragile position, because the quality of the output now depends entirely on the tool. Take the tool away, restrict it, price it out of reach, and there is nothing underneath holding the work up. And there is no dramatic moment where this becomes visible. That is what makes it pernicious. Nothing snaps. Nothing breaks on a particular Tuesday. We just wake up somewhere in the future, look around, and wonder how we got here. Education is running this experiment at civilisational scale.

You cannot fix this by telling people to try harder. The incentives themselves have to change, which means the counterweight has to be built into the structure of the system. The plausible shape of it includes assessment designed to measure unaided capability at key checkpoints, so the baseline problem is fixed by design. It includes verification and calibrated scepticism taught as core curriculum from a young age: question what you read, know where it came from, never regurgitate without thinking.

In an AI-saturated world the valuable skill shifts from producing answers to judging them, and almost nobody teaches it.

And it includes properly valuing unaided competence: seeking it out, measuring it, rewarding it. Not because unaided work is superior; I would take someone who is excellent at directing AI and assessing its output over raw unaided competence most days of the week. But unaided competence is the foundation that assessment skill is built on, and right now almost nobody is measuring whether it still exists. That correction will come eventually. It will just come slowly, and by the time it arrives, a generation of students will have already passed through the system it was meant to fix.

What society actually pays for

Strip away the job titles and the credentials, and what society has always valued, above almost everything else, is the problem solver. The inventor. The entrepreneur. The doctor. The tradesperson who can look at something broken and see the fix nobody else can. Every one of them, in whatever form they take, is ultimately paid for the same thing: looking at something unresolved and resolving it. Credentials are just proxies we use when we cannot observe the real thing directly.

I watched this play out in my own career. I went from machine operator to running an area, to running the warehouse, to starting an R&D department, to importing, to exporting, to general manager. All of it within four years, in my mid-twenties. Not because I held qualifications for each of those roles; I did not. It was because I could solve problems, and problem solving transfers. Each new domain was unfamiliar, but the approach to it was not.

And it is worth being precise about what made that possible, because it refines something from earlier in this article. The expert eye that let me operate across those domains was not subject expertise. I was not an expert in R&D or international trade when I walked into them. It was a portable set of principles: critical thinking, reverse engineering, pulling a problem apart until you find the piece that is actually broken, recognising when an answer does not smell right even before you can articulate why. That is an expert eye of a different kind. Not expertise in the subject, but expertise in the act of solving itself. It is the evaluative competence discussed earlier, in its most transferable form, and it is the closest thing there is to a universal skill.

Now ask what the modern problem solver looks like if we stop building that eye. If the only way you can solve a problem is to have a conversation with AI and accept the output as the solution, with no critical assessment, no pulling it apart, no instinct for when it is wrong, then you are not solving problems at all. You are relaying them. The solution belongs to the tool, and you are interchangeable with anyone else capable of typing the same question. That is what a modern problem solver becomes by default, and the market will eventually price it accordingly.

Because AI does not reduce the value of problem solving. It raises it. When answers are abundant and free, the scarce thing becomes the judgement wrapped around them: framing the right problem, interrogating the output, knowing which answer to reject. Society will keep paying its premium for exactly what it has always paid for. What is at risk is not the demand for problem solvers. It is the supply. And an education system running on the defaults described above is quietly shutting down the production line for the one thing society has never stopped buying.

Where this lands

None of this is an argument against AI in education. It is an argument about speed. The tool amplifies whatever expertise you bring to it. Judging its output requires knowledge students do not yet have. The systems most eager to adopt it are the ones least equipped to measure what it is costing them, and every natural incentive points towards the middle. The corrective has to be built deliberately, and building it takes time, which is the one thing education does not have.

Move too slowly, even five years, let alone ten, and you have lost a generation: kids lacking grit, lacking critical thinking, lacking the ability to turn up in the world and do the doing unaided. A generation, in other words, short of problem solvers, the people every society has always been built on. Not a hypothetical generation. The ones sitting in classrooms right now.

For any individual, the useful habit remains the two-part audit: is the skill you are actually exercising improving independently of the tool, and can you genuinely evaluate what it hands back? If the answer is no, that is not a reason to stop using it. It is a reason to look at yourself and ask harder questions. Where do I want to be? What have I accepted about myself? What does success look like long term, not just short term?

What schools should actually do

One filter sits over everything: every classroom use of AI has to pass the two-part test. Which skill is the student actually exercising, and are they building the ability to judge what comes back? Fail both, and it doesn’t belong in a school. That single rule kills most of what currently passes for AI integration, which is usually just output acceleration.

The analogue foundation

Grind out the foundation years in analogue. Put the computers away. Write things down. Assess by hand, in person, in controlled environments.

This isn’t nostalgia. It’s the training method. Norwegian EEG research found handwriting activates far more of the brain’s memory and learning networks than typing. The well-known laptop studies found students taking notes by hand retained more than typists, because handwriting is slow enough to force processing instead of transcription. The slowness is the mechanism. The friction is where the learning lives.

Controlled handwritten assessment also solves this article’s hardest problem: the baseline. A kid who sits handwritten assessments through primary and middle school has an unfakeable record of unaided capability. The checkpoint system doesn’t need inventing. It needs un-abandoning.

Aviation solved this decades ago. Autopilot flies most of every flight, yet pilots train manually first and are required to keep hand-flying, because when the automation fails, the human needs the manual skill to catch it. Automation made the manual skill more important, not less. That’s the answer to any parent who says analogue is preparing kids for a world that no longer exists. It’s preparing them to be the pilot, not the passenger.

The principle is go slow to go fast. Analogue is slower than getting the answer now, but it builds the skills that later make a kid sharp with the tool: better prompts, better judgement of what comes out. It’s a compound interest bet, and compound interest looks like losing at the start. Sold as deprivation, it dies at the first parent survey. Sold as strength training, the way athletes do slow drills before game speed, it becomes a differentiator parents seek out.

Earn the tool

AI access should be earned, not given. Nobody hands a seven-year-old a calculator and skips arithmetic. A driver’s licence is staged, earned and revocable. AI should work the same way: unlock the tool in a domain by proving unaided capability first. Want AI on essays? Show me you can write one alone.

That makes grit intrinsic to the system instead of a lecture. Struggle becomes the price of the power tool, which is how motivation already works in every game they play. The tool amplifies the trained, so training comes first. That’s how you build warriors with the tool in their hand, not dependents waiting for the easy answer. A warrior earns the weapon through drill. A weapon in untrained hands is a liability with a handle.

The teacher becomes the guide

A teacher’s authority used to rest on knowing more than the student. AI ended that. Any kid with a phone can out-recall any teacher. But knowing more was never the real job.

The kid is the hero of this story. The teacher is the guide. And a guide’s authority isn’t built on information, it’s built on relationships and wisdom. Knowing the kid. Knowing the path ahead of them. Knowing when to push, when to let them sit in the struggle, and when to step in. John Maxwell, the leadership writer, said it best: people don’t care how much you know until they know how much you care.

That’s the one job in the room AI can’t touch. The machine has all the answers and no idea who your kid is. It doesn’t know what they’re carrying from home this week. It doesn’t care whether they show up. A guide does. Decades of research say the teacher-student relationship is one of the strongest predictors of how a kid turns out, and it’s the part of teaching AI makes more valuable, not less.

Put the student above the tool

Dependence is asking AI for answers. Agency is directing and judging it. So flip the default: students mark AI’s work more than they receive it. Hand them AI output on something they know deeply and make them find what’s wrong. Run error hunts on questions the model fumbles. A kid who has caught AI being confidently wrong five times has learned something no lecture delivers: fluency isn’t accuracy.

Teach prompting as briefing a contractor: specify, constrain, reject, iterate. And make it real: if you submit it, you defend it. Any line, any claim, explain it or lose it. Judgement can’t be delegated, even when the drafting is.

The sequence

Primary is foundation: analogue, unaided, physical. Middle school is apprenticeship: supervised use, marking the machine’s work, the first earned licences. Senior years are mastery: real orchestration, full access, every submission defended in person. Unaided checkpoints run the whole way, so the baseline is never lost and the licence never unconditional.

This is grit as a property of the system, not a poster. Struggle gates the tool. Checkpoints are unaided. Defence is personal. At no point is the lazy path also the rewarded path, which is the exact failure of the current setup.

Why this matters

This is personal for me. I have kids going through the system right now, and I feel like I am hanging on for dear life to teach them things like grit, against a system that rewards participation, that optimises for the middle, and that never makes them uncomfortable for any lengthy period of time.

And here is where that connects to everything above. Earlier I made the point that the education system rewards exactly what AI produces: the structured, the accurate, the verifiable. Follow that one step further.

If everything the system measures is something a machine can now generate, then the measurement itself is broken.

A tertiary entrance rank increasingly tells you how good the output was, not how capable the student is, and the gap between those two things is exactly the width of the tool. Which changes the status of everything the system has never measured. Grit. Attitude. Growth over time. Character. These qualities were always undervalued. Now they are something more than that: they are the only signals left that a machine cannot fake.

The three-dimensional view of a student is no longer a nice idea for a better system. It is the necessary response to this one.

Now imagine the alternative. Imagine a system that measured grit. That measured attitude. That measured and rewarded behaviour, tracked over years, not terms. Imagine if entry to university and the other pathways and opportunities beyond school relied on that three-dimensional view of a student, rather than on who is good at sitting still and regurgitating numbers and quotes, or who is good at using the tools at hand. Imagine assessing a university application and seeing the whole person: this kid was an outstanding community citizen. Their behaviour was ordinary at times, but across the years you can see the trend: growth, learning, hearing feedback and actually putting it into practice. That is character building made visible. Imagine grit and relationships were measured, and on that application you could see the person, not just the As, not just the tertiary entrance rank. If you were the one assessing those applications, you already know your choices would be different. The people you admitted would be different. You know that in your gut, reading this. So why are we not architecting a system that values it?

I know why. Because it is inherently risky, and it is risky for the leaders and decision makers specifically. We have built a system where they are accountable to everybody, where they answer to everybody, where they cannot go on their gut and cannot take a risk, even slowly, even carefully. So the safe path is the measurable path, and the measurable path is the one-dimensional one. The people with the power to change the system are the people the system punishes hardest for trying.

References

Kosmyna, N. et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab, preprint. arxiv.org/abs/2506.08872

Lee, H.-P. et al. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. Microsoft Research and Carnegie Mellon University, CHI 2025. dl.acm.org/doi/10.1145/3706598.3713778

Shumailov, I. et al. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755-759. nature.com/articles/s41586-024-07566-y

Kruger, J. and Dunning, D. (1999). Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134. doi.org/10.1037/0022-3514.77.6.1121