Still Using AI to Develop a Software Product
An Experience Report, One Year On (v26.6)
A year ago I wrote Using AI to Develop a Software Product, an experience report on how AI tooling has impacted the development of Lighthouse. Twelve months on, we’re still at it, but almost everything about the how has changed.
Back then, my rule was simple: never commit code I haven't read. Today, more often than not, I don’t look at the code at all. The AI space is moving fast, and while I don’t claim to know everything (I certainly don’t), I want to use this follow-up to share my personal experience: how I work, what changed, and what I’d actually recommend.
Previously, on Lighthouse…
If you missed last year’s post or don’t remember it, here’s the TL;DR version. I’m a backend developer by instinct who started Lighthouse to solve a problem I had and to make sure I stayed in touch with the latest tech. And while I started “pre-AI”, the AI tooling is what enabled me to build a full product on my own (including a decent frontend!). That post traced three evolutionary steps I went through:
Chat tools like ChatGPT, to get unstuck faster than constantly checking Stack Overflow (also those tools tend to not insult you, unlike many active people on Stack Overflow…)
IDE-integrated Copilot, which killed the constant context-switching
Agent Mode, the big leap, where Copilot started figuring things out by itself
I closed that post by drawing a firm line between what I was doing and “vibe coding”: I still read every line, so I wasn’t vibing. That line has since started to get blurry...
How Things Evolved
The field of AI-assisted development moves fast. New tools, workflows, and ideas show up constantly (and plenty of them quietly disappear again). Back in June 2025, I was “just” running Agent Mode in VS Code. I wasn’t using any of this:
Specialized agents working together in a defined workflow
MCP servers to hook into external services
Skills
Anything to keep the context window under control
Any tool other than GitHub Copilot inside VS Code
Some of that didn’t exist yet. Some of it I just hadn’t heard of. And while it’s tempting to jump on whatever the LinkedIn AI experts are calling a “must” this week, I took my time: tried things, kept what worked, dropped what didn’t, and kept experimenting.
Because it was never about which tools you use (and don’t get me started on the idiotic trend of companies measuring who uses AI the most).
It’s about whether you get the results you want, at the quality you need.
The first step to getting more quality and even higher speed was a workflow built with specialized agents.
A Team of Agents
Starting from a (since-abandoned*) GitHub repo, I set up a workflow of specialized agents, each with one job:
Planner
Analyst
Architect
Critic
Security
Implementer
Retrospective
Each agent has a clear description of what it should do (and what it shouldn’t), and you can even assign a different model to each one, since some models are better at certain tasks than others.
The clever part is that agents hand off work to each other, which keeps the whole thing flowing. And every agent writes its findings into a markdown document before handing over. That gave me two things: a reviewable artifact at each step, so I could check and correct an agent’s output before the next one picked it up; and a clean way to manage the context window, because the next agent reads what it needs from the markdown, letting me clear context far more often. Less context window usage, less drift.
This worked well enough that I started trusting agents with bigger features. Until then I’d mostly handed them frontend work. Now I let them loose on the backend too. The results were OK: I still had to fix things here and there, but it was another real jump in how fast we could ship without giving up quality.
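The hand-off pattern described above can be sketched in a few lines. This is a minimal illustration, not the setup from the repo I started with: the `Agent` class, the `run_workflow` function, the `review` callback, and the file naming are all hypothetical names invented for this sketch, and the agents are plain functions standing in for real model calls.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Agent:
    """A stand-in for one specialized agent (hypothetical structure)."""
    name: str
    # Receives only the hand-off text and returns its findings as markdown.
    run: Callable[[str], str]


def run_workflow(
    task: str,
    agents: list[Agent],
    workdir: Path,
    review: Callable[[Path], None] = lambda path: None,
) -> list[Path]:
    """Run the agents in order, handing over via markdown files on disk."""
    artifacts: list[Path] = []
    handoff = task
    for step, agent in enumerate(agents, start=1):
        # Each agent starts from the hand-off text alone, not from the
        # accumulated conversation, which is what keeps the context small.
        findings = agent.run(handoff)
        path = workdir / f"{step:02d}-{agent.name.lower()}.md"
        path.write_text(findings, encoding="utf-8")
        artifacts.append(path)
        # Review point: a human may check and correct the file here.
        review(path)
        # The next agent reads the (possibly corrected) artifact from disk.
        handoff = path.read_text(encoding="utf-8")
    return artifacts
```

The point of the design is that the markdown file is the only thing passed along, so whatever I correct during review is exactly what the next agent sees.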
*The big issue I see with a lot of the vibe-coded tools that pop up is not that they are vibe-coded, but that they are not maintained. It’s nice (and easy) to create something. But will your thing still be available in two months? Will you accept feedback or code changes? Or will you abandon it in two weeks? So be careful when you choose a tool for serious work. Don’t rely on idiotic vanity metrics like GitHub stars; check how active the project has been over time.
The Quality Jump
Towards the end of 2025, Anthropic released Opus 4.5, and the quality of what came out the other end of the prompt jumped noticeably. I went from regularly correcting the generated code to only occasionally stepping in. The clearest sign of the shift: the time I used to spend writing code now went into reading and modifying Markdown files.
That had a great side-effect. With less time spent wrestling the code, I had more time to spend engaging with our users. Feedback came in and got woven back into the product quickly, which for a two-person company is worth more than any raw speed gain. It creates a virtuous cycle: a better understanding of our user base, and short cycle times for releasing new functionality.
Calling Home?
An agent that can only see your code is missing half the picture. The Model Context Protocol (MCP) is a standard way to give it more: servers that let an agent pull information from, and act on, the systems you work in outside your editor.
The one I lean on most for Lighthouse is the Azure DevOps (ADO) MCP server. A lot of the context for a feature already lives in our backlog, in the descriptions we wrote on the work items. Instead of copy-pasting that into a prompt, the agent reads it straight from ADO. So when I ask it to build something, it starts from what we actually specified, not from my hasty paraphrase of it.
I also run the SonarQube MCP server, which lets an agent fetch our quality gate results and see exactly what failed. That one really earns its keep inside a command I’ll come to shortly, so I’ll save the details for there.
Claude Code and nWave
At the end of April, GitHub (or rather, Microsoft) changed Copilot's pricing model in a way that stopped making sense for me. So I cancelled and switched to Claude Code. It was something I'd been wanting to try anyway; the pricing change just gave me the push I needed.
The real reason I wanted to make the move was nWave. nWave.ai is a plugin that runs inside Claude Code and wraps a structured agent workflow around proper software engineering practices. It breaks feature delivery into seven waves (Discover, Diverge, Discuss, Design, DevOps, Distill, Deliver), with specialized agents that produce artifacts at each stage. Human judgment is required before the next wave begins; the machine never runs unsupervised end-to-end.
When I say "proper software engineering", here's what I mean concretely: nWave's Distill wave produces acceptance tests before any production code is written. The Deliver wave then implements with inside-out TDD, enforced at the hook level. The agents literally can't skip it. It's the discipline I'd want a human team to follow, applied by the agents.
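To make "enforced at the hook level" a bit more concrete, here is a minimal sketch of the kind of check such a hook could perform. It is not nWave's actual implementation, which I'm not reproducing here; the function names and the test-file naming convention are assumptions made purely for illustration.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """Assumed convention: test files are named test_*.py or *_test.py."""
    name = PurePosixPath(path).name
    return name.startswith("test_") or name.endswith("_test.py")


def change_is_allowed(changed_files: list[str]) -> bool:
    """Reject a change set that touches production code but no tests.

    Simplification: every file that is not a test file counts as
    production code.
    """
    touches_production = any(not is_test_file(p) for p in changed_files)
    touches_tests = any(is_test_file(p) for p in changed_files)
    return touches_tests or not touches_production
```

A real hook would be stricter (for example, requiring a failing test before the implementation), but even this crude rule shows the idea: the rule is checked by code, so the agent can't talk its way around it.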
Yes, it's token-hungry. But the results genuinely impressed me. I haven't hand-written a line of production code since I started. And here's the part that would have made last-year me uncomfortable: I gained enough confidence over time that I stopped reviewing the code at all.
Now, if a feature touches the heart of our software, say the forecasting mechanic or how we get the data, I will check it. But these parts have been pretty stable lately. And for most other things, I trust the tooling enough to do its job.
So, Is This Vibe Coding Now?
A year ago I drew the line right here: I read the code; therefore I wasn't vibe coding. I don't read it anymore. Am I vibing now?
I (can) read code; therefore I am (a Software Engineer)?
I don't think so, and the reason matters. What kept me out of "vibe coding" was never the act of reading every line. It was the discipline around the code. With nWave, that discipline didn't disappear; it moved into the workflow itself:
Discover / Discuss: Jobs to be done, personas, figuring out what to build, slicing it thin
Design: Solution design within the existing architecture
Distill: Acceptance tests written before any production code
Deliver: TDD implementation, inside-out, enforced by hooks
I’m not eyeballing a diff and hoping. I’m trusting a process whose guardrails I helped define, and I review the outcomes it produces rather than the lines it wrote to get there. That is not the same as not caring.
The responsibility is still mine, and I don’t take that lightly. Quality gates are in place both inside the agent framework and outside it (don’t do deterministic things with probabilistic tools). The pipeline keeps everything green.
Software Engineering was never about the act of writing code. It was always about shaping systems, understanding the problem, making deliberate design decisions, setting the quality bar, and staying accountable for what ships. AI doesn’t remove any of that. And if anything, nWave forces more of it earlier, before a single line of production code exists. The code is an output of software engineering. It is not software engineering itself.
Closing the Loop
Claude Code also lets you define your own commands, and this is where it gets fun.
One of the first commands I wrote checks that my changes actually made it through the CI/CD pipeline. After a push, I invoke a single command and it:
Sets up a watcher on the GitHub Actions pipeline
Uses the SonarQube MCP server to inspect the issues, if the quality gate failed
Checks whether any unit, integration, or end-to-end test failed
Fixes the problems on its own if anything did
Logs what it learned each time something broke, and that log gets checked before we implement anything new, so we stop repeating old mistakes
That last point is the one I care about most. It isn’t just automating away the busywork; it’s feeding the failures back into the next piece of work. It took about five minutes to write the command (with Claude’s help), and it saves me a pile of manual checking every day.
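The feedback part of that loop is simple enough to sketch. The snippet below only models the decision logic and the lessons log in Python; it is not the command itself, and `PipelineResult`, `plan_follow_up`, `record_lesson`, and `lessons_so_far` are hypothetical names, as is the one-bullet-per-lesson log format.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PipelineResult:
    """Outcome of one pipeline run (hypothetical, simplified shape)."""
    quality_gate_passed: bool
    failed_tests: list[str] = field(default_factory=list)


def plan_follow_up(result: PipelineResult) -> list[str]:
    """Translate a pipeline result into the follow-up actions to take."""
    actions: list[str] = []
    if not result.quality_gate_passed:
        actions.append("inspect SonarQube issues")
    for test in result.failed_tests:
        actions.append(f"fix failing test: {test}")
    return actions


def record_lesson(log: Path, lesson: str) -> None:
    """Append one lesson to the log as a markdown bullet."""
    with log.open("a", encoding="utf-8") as handle:
        handle.write(f"- {lesson}\n")


def lessons_so_far(log: Path) -> list[str]:
    """Read all recorded lessons; meant to be checked before new work."""
    if not log.exists():
        return []
    lines = log.read_text(encoding="utf-8").splitlines()
    return [line[2:].strip() for line in lines if line.startswith("- ")]
```

Keeping the log as plain markdown bullets means both I and the agents can read it without any tooling.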
So, Did It Work?
We’re a two-person company, both doing this part-time, and we ship something meaningful roughly once a week. Over the past year, our release cadence hasn’t changed much. What changed is how much rides in each release.
Is this a “10x improvement”? I have no idea, and we’re not on LinkedIn, so I don’t have to pretend to. What I can say is that the amount we’ve shipped, at the quality we hold ourselves to, is well beyond what two part-timers could have managed a year ago. I’ve been critical of the AI hype, and I still am. That is exactly why it’s worth experimenting with the tools, instead of either swallowing the marketing or dismissing the whole thing.
The flip side of all of this: I no longer write the code, and essentially no longer read it either. That is a slippery slope, and I know it.
What worries me?
The tests, the gates, and the process catch a lot, but the responsibility is still ours. We can’t outsource that. Not reading every line is a choice I make with open eyes, not one I’ve stopped worrying about. Trusting tools that are not deterministic requires proper guardrails. I think many fall for the hype without the guardrails. Even more worrisome is if people lose the ability to think about what, why, and how they want to implement something.
The thing that pushed me to Claude Code was Copilot getting more expensive. I didn’t switch purely on the merits; a pricing decision made the call for me. I suspect today’s prices (yes, even the $200/month Claude Max) are still heavily subsidized. And with the looming IPOs, those companies will have to start making money (instead of burning it). Lean on these tools heavily, and you’re at the mercy of whoever provides them. Local models exist, and one day they may be good enough, but in my limited experience they’re not there yet, and GPU prices aren’t exactly helping.
So our thinking is: use the tools, take advantage of the cheap pricing while it lasts, but don’t bet the whole company on your favorite provider staying available at today’s price. That, too, is your responsibility.
The other thing I’ve noticed is more personal. With everything metered in tokens and rate limits (the 5-hour windows and weekly caps in Claude Code), I get oddly anxious when an agent isn’t running, as if I’m wasting a window I paid for. But there will be a dedicated post about this, so watch this space…
Conclusion?
A year ago I ended that post excited that AI tooling had turned me into a one-person development team. A lot has happened in the last 12 months.
What I can build part-time, with a partner and a workflow of agents, would have been out of reach a year ago. But I’ve also handed something over. I don’t write the code anymore, and I don’t read it. I’ve learned to delegate some of my work, because the results are as good as (or better than) what I would have produced. And it’s certainly faster.
I can do that because the part that’s still mine hasn’t changed: the decisions, the quality bar, the relationship with our users, and the responsibility for whatever ships. The tools got a lot better at the how. The what, and the answering-for-it, are still the job. I don't see that changing…but let’s wait for the follow-up 12 months from now.






