A month ago, I built a full retrieval system with embeddings, hybrid search, and a GUI in about 25 hours. Last weekend, I spent two days trying to fix a bug in it — and realized I had no idea how my own software worked.
Let’s be honest: I have pushed a GitHub repo without having written a single line of code. Do I feel bad about it? Kind of. The amount of technical doubt weighs heavily on my shoulders, much more than I’m used to. Will I regret it? Maybe. Will you?
I wanted to share my story here because I believe this is something many developers are going through right now, and even more will experience it in the coming years.
Because let’s face it: you can have a code of honor and be proud of your craftsmanship, but nothing beats the speed of GitHub Copilot & Co. If your colleague on AI steroids ships features and pushes updates twice as fast as you (and that is wildly underestimating it), who do you think is closer to the company’s door when budgets tighten?
The productivity gains are real, even if you only use these tools for documentation. And there’s a tiny step from:
“Write docstrings for this function.”
to
“Write the function.”
That tiny prompt step skyrockets you into a completely different realm of productivity.
But here comes my very personal story, what I learned, and where I think this leaves us as developers.
For background, I set out to build a RAG-style text retrieval system in the spirit of NotebookLM, except stricter. The system takes a private PDF library, processes it, and then retrieves answers verbatim from that corpus. No paraphrasing, no hallucinated sentences, just “give me the exact passage that answers my question so I can search it in the original PDF again.”
Admittedly, this is a very scientific, slightly paranoid way of using your literature. But I’m probably not the only one who’s tired of fact-checking every LLM response against the source.
The architecture of the software was fairly straightforward: Python, Streamlit for the GUI, SQLite with FTS5 for keyword search, FAISS for the vector index, a sentence-transformer model for the embeddings, everything wrapped in a Docker container. No exotic cloud dependencies, just a private NotebookLM‑ish tool running on my machine.
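To make the “verbatim answer with exact source” idea concrete, here is a minimal sketch of what such a hybrid retrieval core can look like. This is not my actual implementation: the model name, the file names, and the layout of the chunks table are assumptions for illustration.

```python
# A minimal sketch of the retrieval core, not the actual implementation.
# The model name, the file names, and the chunks table layout are assumptions.
import re
import sqlite3

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model (assumed)
index = faiss.read_index("index.faiss")          # vector index over chunk embeddings
db = sqlite3.connect("chunks.db")                # chunk text + source metadata (FTS5)

def search(query: str, k: int = 5) -> list[dict]:
    # Semantic candidates: embed the query and take the nearest chunks from FAISS.
    vec = model.encode([query], normalize_embeddings=True).astype(np.float32)
    _, ids = index.search(vec, k)

    # Keyword candidates: FTS5 match on the same chunk table
    # (crudely sanitised so punctuation doesn't break FTS5 query syntax).
    terms = " ".join(re.findall(r"\w+", query))
    fts = db.execute(
        "SELECT rowid FROM chunks WHERE chunks MATCH ? LIMIT ?", (terms, k)
    ).fetchall()

    # Merge both candidate sets and return the stored text verbatim, together
    # with the document and page it came from.
    hits = []
    for rowid in dict.fromkeys([int(i) for i in ids[0]] + [r[0] for r in fts]):
        if rowid < 0:  # FAISS pads with -1 when there are fewer than k vectors
            continue
        row = db.execute(
            "SELECT text, doc, page FROM chunks WHERE rowid = ?", (rowid,)
        ).fetchone()
        if row:
            hits.append({"text": row[0], "doc": row[1], "page": row[2]})
    return hits
```

The important property is the last step: the tool only ever returns text that is stored verbatim in the database, so every hit can be traced back to a page in the original PDF.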
I didn’t start with code, but with the documentation. I already had my usual project skeleton from a cookiecutter template, so the structure was there: a place for requirements, for design decisions, for how to deploy and test, all neatly sitting in a docs folder waiting to be filled.
I wrote down the use case, sketched the architecture, the algorithms to implement, and the requirements. I described goals, constraints, and major components in a couple of bullet points, then let GenAI help me expand the longer sections once the rough idea was in place, gradually moving from a basic idea to detailed documents describing the tool. The result wasn’t the best documentation ever, but it was clear enough that, in theory, I could have handed the whole bundle to a junior developer and they would have known what to build.
Instead, I handed it to the machine.
I opened the doors and let my GitHub Copilot colleague into the codebase. I asked it to create whatever project structure it saw fit and to fill in the required script files. Once a basic structure was in place and the tool seemed to work with one algorithm, I asked it to generate the pytest suite, run the tests, and iterate whenever it hit errors. After that, I kept asking it to implement further algorithms and to cover some edge cases.
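To give a flavour of what that generated suite looked like, here is a small, hypothetical example in the same spirit; the retrieval module and the expectations are invented stand-ins, not copied from the project.

```python
# Hypothetical tests in the spirit of the generated suite; "retrieval" and its
# search() function are stand-ins for the project's real modules.
from retrieval import search  # assumed module exposing a search(query, k) function

def test_hits_are_verbatim_chunks_with_provenance():
    hits = search("reported sample size", k=3)
    assert 0 < len(hits) <= 3
    for hit in hits:
        # Each hit carries the exact stored passage plus where it came from,
        # so the answer can be located in the original PDF.
        assert {"text", "doc", "page"} <= set(hit)
        assert hit["text"].strip()

def test_k_limits_the_number_of_hits():
    assert len(search("sample size", k=1)) <= 1
```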
In essence, I followed my usual approach to software development: start with a working core, then extend with additional features and fix things whenever the growing construct runs into major issues. Is this a globally optimal architecture? Probably not. But it’s very much in the spirit of the Pragmatic Programmer: keep things simple, iterate, and “ship” frequently, even if the shipment is only internal and only to myself.
And there is something deeply satisfying about seeing your ideas materialize into a working tool in a day. Working with my AI coworker felt like being the project lead I always wanted to be: even my half‑baked wishes were anticipated and implemented within seconds as mostly working code.
When the code wasn’t working, I copy‑pasted the stack trace into the chat and let the agent debug itself. If it got stuck in a self‑induced rabbit hole, I switched models from GPT5 to Grok or back again and they debugged each other like rival siblings.
Following their thought process and seeing the codebase grow so quickly was fascinating. I only kept a rough time estimate for this side experiment, but it was certainly no more than 25 hours to produce more than 5,000 lines of code, which is a great pace for a relatively complex tool that could otherwise have occupied me for several months. It’s still far from perfect, but it does what I intended: I can experiment with different models and summarization algorithms on top of a retrieval core that returns verbatim answers from my own library, along with the exact source, so I can jump straight into the underlying document.
And then I left it alone for a month.
When I came back, I didn’t want to add a major feature. I just wanted to containerize the app in Docker so I could share it with a friend.
In my head, this was a neat Saturday morning task. Instead, it turned into a full‑time weekend nightmare of Docker configuration issues: paths not resolving correctly inside the container, embedding caches and FAISS indexes living in places I hadn’t clearly separated from the code, and tests passing on my local machine but failing (or never running properly) in CI/CD.
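In hindsight, most of the path pain comes down to one pattern: the code assumed its data lived next to the source tree. Here is a minimal sketch of the kind of separation that would have saved the Saturday, with the DATA_DIR name and folder layout assumed for illustration:

```python
# A sketch of path handling that keeps data out of the source tree.
# DATA_DIR and the folder layout are assumptions, not the tool's real config.
import os
from pathlib import Path

# Inside the container DATA_DIR points at a mounted volume
# (e.g. docker run -e DATA_DIR=/data -v ./data:/data ...);
# outside it falls back to a folder in the home directory.
DATA_DIR = Path(os.environ.get("DATA_DIR", Path.home() / ".retrieval-tool")).resolve()

EMBEDDING_CACHE = DATA_DIR / "embeddings"  # cached chunk embeddings
FAISS_INDEX = DATA_DIR / "index.faiss"     # vector index
SQLITE_DB = DATA_DIR / "chunks.db"         # chunk text and FTS5 tables

for folder in (DATA_DIR, EMBEDDING_CACHE):
    folder.mkdir(parents=True, exist_ok=True)
```

With something like this, the Docker side reduces to mounting a volume and setting one environment variable, and the local machine and the container stop disagreeing about where the index lives.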
Some of these issues are entirely on me. I happily assumed that my CI/CD pipeline (also generated by AI) would “take care of it” by running tests on GitHub, so that cross‑platform inconsistencies would surface early. Spoiler: they didn’t.
The moment of truth came when Copilot suggested a seemingly simple fix: “Just add a reference to the working directory here.” Instead of letting it touch the code, I wanted to stay in control and only ask for directions; I didn’t want it to wreak havoc in a codebase I hadn’t looked at for weeks.
That’s when I realized how much I had outsourced.
Not only did I not understand why the error occurred in the first place, I couldn’t even identify the file, let alone the passage, where I was supposed to make the change. I had no idea what was going on.
Compare that to another project I did with a colleague three years ago. I can still recall how certain functions were intertwined and the stupid bug we spent hours hunting, only to discover that one of us had misspelled an object name.
I saved enormous development time by skipping the low-level implementation work. I stayed in control of the architecture, the goals, and the design decisions.
But not the details.
I effectively became the tech lead on a project whose only developer was an AI. The result feels like something a very fast, very opinionated contractor built for me. The code has unusually good documentation and decent tests, but its mental models never entered my head.
Would I be able to fix anything if I needed to make a change and the internet was down? Realistically: no. Or at least no faster than if I had inherited the codebase from a colleague who left the company a year ago.
Despite the better‑than‑average documentation, I still stumble over “WTF” code pieces. To be fair, this happens with human‑written code as well, including my own from a few months back. So is GenAI making this worse? Or just faster?
Honestly: both.
The speed is insane. The leverage is real. The productivity gap between people who use these tools aggressively and those who don’t will only widen. But you’re trading implementation intimacy for architectural control.
You move from craftsman to conductor. From builder to project lead. From knowing every screw in the machine to trusting the robot that assembled the car. And maybe that’s simply what software engineering is quietly turning into.
Personally, I now feel much more like a project lead or lead architect: I’m in control of the big picture, and I’m confident I could pick the project up in a year and extend it. But at the same time, it doesn’t feel like “my” code. In the same way that, in a classic setup, the lead architect doesn’t “own” every line written by their team.
It’s my system, my design, my responsibility.
But the code? The code belongs to the machine.