Agents, Copilots, and the Return of Waterfall
Claude Code is just offshoring for the rest of us
I have a confession to make: I am incredibly lazy.
If there’s a shortcut, I’ll take it. If there is a tool that promises to do my work while I nap, I will buy it. When the new wave of “agentic AI” tools started dropping (tools like Claude Code, Devin, or AutoGPT), I was first in line. The pitch was incredible: Don’t just get help writing the code. Get an agent that writes the code, runs the tests, fixes the errors, and deploys it while you watch Netflix.
Trigger warning: this post will have a lot of “grumpy old man” energy.
I remember the last time we were promised a magical solution to software development. It was the early 2000s, and the promise was “outsourcing.” We were told that if we just wrote a detailed enough spec, we could hand it to a team in a different time zone, pay them a fraction of the cost, and wake up to a finished product.
We all know how that ended. We spent big bucks to build products that checked every box in the spec but failed in the market. I recall endlessly revising documents to explain “user delight” to a team that had never used the product, only to get back code that was technically functional but excruciating to use.
The good news: today’s non-technical founders aren’t dropping $100k on an agency; they’re dropping $20/month on Claude. The bad news: the industry is lying to itself again.
I see founders convincing themselves that the reason outsourcing failed was the people. “Well, the offshore team didn’t understand my vision, but Claude Code does! Claude is smart!” They think, “If I write the perfect prompt, with the right context, the right skills, and enough tokens, I’ll get it right.”
But outsourcing didn’t fail because of the people. It failed because the handoff is a lie.
The return of Waterfall
I recently tried to use Claude Code for a simple task: a Python script to scrape some data (well, a lot of data). I spent 20 minutes crafting the perfect prompt (the modern PRD). I set up the permissions. I hit enter. I leaned back.
Two hours later, I came back to a mess. I had 14 new files, 600 lines of code, and a confident README that described a script that didn’t quite work.
The agent had technically done what I asked. It wrote code. It passed tests. But it had built the wrong feature.
For example, the dataset had missing timestamps. A human partner would have asked, “Should we drop these rows or infer the time based on the sequence?” The agent just dropped them. It was a defensible coding decision - the script didn’t crash - but it made the time-of-day analysis I wanted impossible.
What it built was technically correct, but practically useless.
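To make the gap concrete, here’s a minimal sketch of the two choices. The data and the 5-minute cadence are hypothetical stand-ins for my actual dataset, not the agent’s real output:

```python
from datetime import datetime, timedelta

# Hypothetical scraped rows; some arrived without timestamps.
rows = [
    {"ts": datetime(2024, 1, 1, 9, 0),  "value": 10},
    {"ts": None,                        "value": 12},
    {"ts": datetime(2024, 1, 1, 9, 10), "value": 11},
    {"ts": None,                        "value": 13},
    {"ts": datetime(2024, 1, 1, 9, 20), "value": 9},
]

# The agent's choice: drop the gaps. The script runs, but the
# time-of-day analysis loses two of five rows.
dropped = [r for r in rows if r["ts"] is not None]

# The question a human would have asked first: the rows are sequential
# and the known timestamps sit 5 minutes apart, so infer the missing
# ones from position. (The cadence is an assumption worth confirming.)
cadence = timedelta(minutes=5)
start = rows[0]["ts"]
inferred = [
    {**r, "ts": r["ts"] or start + cadence * i}
    for i, r in enumerate(rows)
]
```

Both versions “work.” Only one of them still answers the question I actually had, and the difference hinged on a question the agent never asked.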
We’ve confused toil with thought.
We are trying to outsource the hard work of thinking to a machine that is designed to satisfy a prompt, not solve a problem. We are recreating waterfall development.
It feels like speed. It looks like shipped features. But it’s actually just “unplanned work” waiting to explode: the firefighting that destroys your roadmap. Agents are efficient engines for generating unplanned work. They let you ship “planned” code today that requires a forensic investigation to debug tomorrow.
Agents fail when you use them as a batch processor for ambiguous work. You trade the tight feedback loop of a copilot for the blind handoff of an agent.
When you use an agent to do the work for you, you skip the friction. You skip the learning. You get “alibi progress,” a pile of artifacts (code, text, slides) that look like work, but contain no actual understanding.
The “micro-Boeing” moment
We’ve seen this movie before.
In the early 2000s, after the merger with McDonnell Douglas, Boeing’s culture shifted. The new finance-driven leadership didn’t want to run an engineering company; they wanted to run a “business.” And in a business, you reduce costs.
For the 787 Dreamliner, they decided that “integration” was just another cost center to be slashed.
They didn’t just outsource the parts (Boeing had always bought tires, landing gear, and the like); they outsourced the integration (assembly). They told their “Tier 1” suppliers: “Here is the spec. You figure out how to build the entire fuselage section. Just send us the finished product.”
The finance team loved it. It looked efficient on a spreadsheet. But they had made a mistake: They had outsourced the struggle.
In the olden days, Boeing engineers fought with suppliers, argued over tolerances, and battled over how the wiring harness would fit through the bulkhead. That friction wasn’t waste; that friction was where the quality came from.
By handing off the integration, they removed the moment (and the incentive) for the different parts of the process to struggle together.
The result was a mess. It started with 787 sections that arrived at the factory and didn’t fit. It festered into a culture where nobody owned the whole. And it ended, twenty years later, when a cabin door plug blew out of a 737 MAX mid-flight. The door plug failure was symptomatic of the same pattern: when you separate the people doing the work from the people designing the work and from the people bringing it all together, you lose the ability to catch the small things that become big things.
My Python script was a “micro-Boeing” moment.
When I handed off the task to the agent, I created what manufacturing folks call traveled work: the work comes back without the why, and now you’re paying a tax to reconstruct the context.
This is an architectural failure, not just a process one. We forget that software architecture mirrors our org structure. When you hand off a task to an agent, you are effectively creating a silo of one. You sever the high-bandwidth communication loops required to discover, understand, and solve complex problems. You aren’t just outsourcing the task; you are breaking the architecture.
Polanyi’s Paradox explains why: We know more than we can tell.
I can’t prompt my taste. I can’t prompt my risk tolerance. I can only apply it while I’m watching the work happen. When you pair-program or pair-write with a copilot, you transfer that knowledge in real-time. When you hand off to an agent, that knowledge is lost.
The partner and the associate
So, are agents useless? Absolutely not. I use them every day. But to use them effectively, I’ve found I have to treat them very differently depending on the mode I’m in.
When I use a copilot, it feels like pairing. I’m still driving; it’s just faster to explore options. I type a little, it suggests a little, I accept some, reject most, and the real work is that constant nudging (my judgment is in the loop every few seconds).
The authors of Apprenticeship Patterns call this “Rubbing Elbows”: you learn by sitting next to a master (or just a peer) and watching how they work. When I use a copilot, I am rubbing elbows with the model. I am learning from its suggestions, and it is constrained by my intent.
An agent is different. An agent is a handoff. You point it at a task, go make coffee, and come back to a finished-looking artifact. That can be amazing, until you realize it made ten “reasonable” choices you never would have made, and now you’re debugging someone else’s mental model.
It’s like delegating to a brilliant first-year associate: they are smart, diligent, and eager to please, but they optimize for “finishing the task,” not “solving the problem.” And because you treat them as an expert, you don’t supervise them. You let them work all night, only to discover they produced the perfect solution to the wrong problem.
The mistake we make is assuming the associate can do the partner’s job just because they work hard and talk smart.
The commodity trap
Lately, I’ve noticed a pattern in myself. If I can build it in Claude Code, it might be cool, and it might solve a problem, but I’m likely not fundamentally adding value in a strategic way. (Check out my site https://goodmon.day to see what I mean.)
In engineering strategy, we distinguish between “Core” (what differentiates you) and “Context” (what you need to exist, like logging or payroll). Agents are incredible at Context. They can scaffold a boilerplate app in seconds. But if you use them for Core - if you let them define your unique value proposition - you are admitting that your Core is generic.
If I can describe the solution so perfectly that an agent can build it without my intervention, then I am building a commodity.
Satisficing vs. differentiation
The heuristic comes down to one question: Is “good enough” actually good enough?
Use agents for satisficing (the toil). There are parts of your job where “perfect” is a waste of time. Formatting a messy JSON blob from a legacy API. Writing unit tests for a stable function. Summarizing a 60-minute meeting transcript. Converting a Jira ticket into a release note. In these cases, the outcome is binary. It works, or it doesn’t. If the agent gets it 90% right, the cost of fixing the last 10% is low. Delegate the toil.
Use copilots for differentiation (the strategy). Then there is the work where you are making substantial bets: Defining the product vision, architecting a new system, or writing a difficult email to a churning client. In these cases, “satisficing” is failure - if your strategy is just “average,” you lose.
This doesn’t mean you can’t use agents within strategic work, like drafting copy, summarizing research, or enumerating edge cases. But you have to keep the core judgment loops tight. The agent tries to minimize friction to get to “Done.” The copilot forces you to engage with the friction to get to “Good.”
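The toil side of that heuristic is easy to recognize in code. Here’s a minimal sketch of the JSON-cleanup case (the blob and its keys are made up to stand in for a messy legacy API): the outcome is binary, because the result either parses or it doesn’t, so an agent that gets it 90% right is cheap to finish.

```python
import json

# A messy blob from a hypothetical legacy API: cramped, keys in no
# particular order. A classic satisficing task.
blob = '{"user":"ada","roles":["admin","dev"],"active":true}'

# Normalize it: parse, then re-emit with stable key order and indentation.
pretty = json.dumps(json.loads(blob), indent=2, sort_keys=True)
```

If the agent botches this, the failure is loud and the fix is mechanical. That’s exactly the profile of work worth delegating; strategy never fails that conveniently.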
Shadow competency
The final danger is shadow competency.
When TSB Bank had its massive IT meltdown in 2018, locking millions of customers out of their accounts, the post-mortem was fascinating. They had outsourced so much of their IT that they no longer had the ability to judge if the work was good. They had lost their shadow competency, the ability to inspect the work of others.
The classic book The Pragmatic Programmer advises using “tracer bullets”: code that penetrates all layers of the system to test feasibility (I love this model). When you write the code or the email yourself, you’re firing tracer bullets. You feel the friction of the database; you see the latency of the API; you notice the awkward phrasing. To maintain shadow competency, you need to do enough manual reps to know what “good” actually looks like. You need to read the code. You need to rewrite the draft. You need to ask, “What would make this wrong?” before you ship.
Why does this matter? Because if you don’t know what good looks like, you can’t spot what bad looks like.
The tl;dr
If you’re a product manager, tech leader, or founder, your value isn’t your output. Your value is your judgment.
When you hand off a task to an agent, you are trading your judgment for speed. Sometimes, that’s a great trade (fixing Excel errors). But for the work that matters, the work that requires empathy, insight, and strategy, that trade hurts you.
Don’t try to outsource your thinking. Use agents to clear the brush. Use copilots to climb the mountain. And know the difference.
Try this:
Try the Surprise Test. Look at the last piece of AI-generated work you used. If you were surprised by the output and just accepted it, you are likely in the “satisficing” trap. If you argued with the AI, refined the output, and caught a nuance it missed, you’re doing it right.
Then, audit your friction. If you spent longer writing the prompt than you would have spent doing a rough first draft, you tried to waterfall a creative process that only reveals itself while you’re building it. That’s your cue to switch modes.
Finally, define your “associate” tasks. Pick five recurring chores that don’t require judgment, set up an agent workflow for those, and keep the rest close, because that’s where your taste and strategy actually live.