The most surprising thing about AI in software development is that it hasn’t made my job simpler. Agents now write nearly 100% of my code at Retool. And yet, I’m spending more time than ever on the things that actually gate whether software ships: review, judgment, prioritization, and the work of making sure we’re solving the right problem in the right way.
The fast parts got faster, and some of the hard parts stayed hard. What AI did was make that line visible in a way it wasn’t before, and it forces every engineering team to decide, explicitly, where human judgment has to live.
What follows is what that actually looks like in practice on my team at Retool: where we’ve found real leverage, where we’ve hit limits that no model version will fix, and the discipline we’ve had to build to make AI earn its place in a production codebase.
In a lot of ways, we are moving faster because of AI, but it’s not always where we expected to speed up. Reviewing the code the agents write is still the bottleneck in my day-to-day workflows. AI’s great at identifying things that human reviewers can miss easily during code reviews, like subtle oversights in data flows, edge cases, and inefficiencies within the code. Our tools have context on the full codebase, not just the PR they’re reviewing. But even with that context, an agent won't correct for a fundamentally wrong architecture.
AI has also made it easier to avoid the sunk cost fallacy. If a teammate brings me a PR I think we should approach differently, I’m no longer hesitant to say so, because migrating from one approach to another now means changing a lot of lines of code but few of the fundamental assumptions.
We incorporated AI code review on pull requests this year and experimented with a number of providers. During that time, we had a few instances where AI code reviews weren’t available, and most of the engineers felt like they’d lost a huge asset.
Our Terraform provider exists as a layer on top of our public API, allowing enterprise customers to use Terraform to automate provisioning and managing their Retool instances. Most of the code in the repo is actually a Go wrapper for our API that’s generated from an OpenAPI spec. It’s tens of thousands of lines of automatically generated boilerplate code, but it has subtle bugs, and carefully adding new capabilities to the provider used to take weeks of work.
I’ve been able to capture a lot of those nuanced edge cases and bugs in various README/AGENTS markdown files. Now I can spend about a day updating the provider to match the capabilities of our public API as we update and add functionality to it. This efficiency upgrade has allowed our team to commit to keeping the Terraform provider in sync with our quarterly stable release cadence, giving customers a lot more comfort in using Terraform to manage their Retool deployments.
When it’s cheap to try out any idea, it’s easy to end up doing a lot of what software engineers refer to as “snacking”—fun but ultimately low-impact work. Over the past year, I’ve found the most important skill for me as an engineer is focus and prioritization.
AI makes it much easier to spin up proof-of-concept work within the codebase. We can now build demos to earn buy-in for ambitious proposals that previously would have lived in Google Docs and meetings for months. But engineers have to be really crisp on what’s production-ready versus what makes a flashy demo. Vibe-coded demos don’t deploy. We still have to do all the hard parts of software engineering: ensuring new features incorporate correctly into existing systems, are well tested, architected to anticipate future needs, logged appropriately, and secure.
The impact on the design phase has been less obvious, and more valuable. Managers and tech leads use tools like Claude Code and Cursor to get answers about how our systems work in production today instead of relying on documentation that may be outdated or incomplete. LLMs are great at condensing the massive amount of information that’s present in a large codebase into something I can use to make decisions with confidence.
AI helps me understand the current state of our systems when planning out new features and debugging tricky performance issues. It hasn’t necessarily sped up the time it takes me to plan and scope work, but it makes those plans much more accurate and actionable. What used to take me weeks of reading code, drawing diagrams, and understanding the current system can be condensed rapidly.
But this isn’t a replacement for judgment. I spend a lot of time evaluating the outputs of those AI sessions because, just like a code PR, if I’m putting my name on a proposal or technical plan I need to fully understand the impacted systems and be able to stand behind my recommendations.
Our security team has successfully used agents to surface vulnerabilities in code. We’ve done a lot of work to help agents understand our own security and permissions models within the product.
But it’s still our job as engineers to think about our customers, how they use the product, and where we need to develop guardrails both for them and for our systems’ integrity. AI is helpful in building those tools and systems, but ultimately we have to understand how people are using the product, their frustrations and limitations, and how we can address them to provide a secure-by-default system that meets their needs.
We’re doing a lot of work on my team to make it easier for administrators of Retool instances to have confidence that they’ve configured things in a way that’s secure and gives only the right people access to different parts of Retool. This includes setting sensible defaults, establishing good guardrails, and using powerful tools to both customize access and audit who has that access.
In a recent engineering plan for a big workstream I’m leading, I had a direct statement under “risks” that the current state of AI tooling and rapid pace of change has broken my ability to give accurate engineering estimates on the time needed to complete a project, especially when multiple engineers are contributing.
Code generation has decreased the time it takes to go from a decision or idea to implementation, but it’s done relatively little to speed up project decisions, code review, or the work of ensuring we’re solving the right problem in the right way. So, while it can feel like we’re moving faster, the deadline might not really change.
Project decisions, including the ones we make around architecture, like the division of responsibility between frontend client code and backend code, are not always made easier with AI. We dealt with this during the build-out of our new permissions system. It’s really easy to have an agent spit out a ton of frontend code for a responsibility that’s better handled by the backend.
AI doesn’t speed up UX decisions for the edge cases that surface during implementation either. In our new billing analytics, we had to figure out how to surface deduplicated user counts in individual space metrics clearly enough that an admin viewing a single space’s data wouldn’t be confused. That wasn’t in the original designs. An agent can generate the code to pull the right data, but deciding how to present it is still a human call.
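To make the confusion concrete, here is a minimal sketch of why deduplicated counts can surprise an admin. It assumes the issue is that a user active in several spaces is counted once overall; the space names, user names, and function names are all hypothetical and are not taken from our billing system.

```python
# Hypothetical data: which users were active in which space.
active_users_by_space = {
    "support": {"ana", "ben", "chloe"},
    "finance": {"ben", "dev"},
    "ops": {"ana", "ben"},
}


def per_space_counts(spaces):
    """Count active users in each space independently."""
    return {name: len(users) for name, users in spaces.items()}


def deduplicated_total(spaces):
    """Count each user once, no matter how many spaces they appear in."""
    return len(set().union(*spaces.values()))


counts = per_space_counts(active_users_by_space)
naive_sum = sum(counts.values())
total = deduplicated_total(active_users_by_space)

print(counts)
print(f"sum of per-space counts: {naive_sum}, deduplicated total: {total}")
```

The per-space numbers are each correct, yet they add up to more than the deduplicated total. Producing both numbers is the easy part; explaining the gap on a single space’s page is the design decision.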
We’ve had to move from “AI is bad for that task” to “what context is the AI missing to be successful at that task?” As a general rule, if you wouldn’t expect a new teammate to be able to complete a task with no more context than is available in written documentation, crisp problem definitions, and linter rules, you can’t expect an agent to be successful.
We’re not just trying to make our agents more successful at writing code. We want everyone working on the codebase to be successful, whether they’re new engineers on the team or just working in an unfamiliar part of the codebase. We have invested heavily in moving our internal documentation to live in our codebases, writing agent skills and files for specific systems, and documenting processes that used to exist mostly in individual teams’ working knowledge.
I also spend more time helping my coworkers adopt these tools and facilitating discussions around how we’re using them. It’s hard to learn from the AI discourse when the loudest voices are often talking about what they’re getting done with the tools, but not how they’re doing it. And it’s all made worse by industry claims that contribute to a feeling of falling behind (even when many of them don’t come with receipts).
My team meets regularly to discuss and show some of our workflows. I've gotten in the habit of sharing chat transcripts from particularly successful use cases, sometimes including them in PRs. We’ve updated our onboarding process for new engineers to include training in our team’s best practices for using AI in every step of our workflows.
We’re really big fans of the Research, Plan, Implement, Review framework—a workflow pattern for sequencing AI coding work that informs our methodology for incorporating AI agents into every step of our software development lifecycle. Here’s how it usually works:
- Research: An agent researches the codebase state, rubber-ducks ideas, and develops problem understanding, rapidly condensing weeks of work.
- Plan: A second agent uses the research summary to create a detailed, accurate, and actionable implementation plan.
- Implement: A further agent handles implementation, writing nearly 100% of the code.
- Review: This is the primary bottleneck, forcing a decision on where human judgment lives. I review the code and planning outputs, making any necessary changes before sending them to a coworker for final approval.
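The sequencing above can be sketched as a small pipeline in which each stage sees only the previous stage’s output and a human gate sits at the end. This is an illustrative sketch, not our actual tooling: `Artifact`, `run_pipeline`, and the stub agents are hypothetical names, and real agents would be model calls rather than lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List

Agent = Callable[[str], str]  # takes the previous stage's output, returns its own


@dataclass
class Artifact:
    stage: str
    content: str
    approved: bool = False  # set only by the human review step


def run_pipeline(
    task: str,
    research: Agent,
    plan: Agent,
    implement: Agent,
    human_review: Callable[[Artifact], bool],
) -> List[Artifact]:
    """Run research, plan, and implement in order, then gate on human review."""
    artifacts: List[Artifact] = []
    previous = task
    for stage, agent in (("research", research), ("plan", plan), ("implement", implement)):
        output = agent(previous)
        artifacts.append(Artifact(stage, output))
        previous = output

    # Review: a person signs off on the planning output and the code.
    for artifact in artifacts:
        if artifact.stage in ("plan", "implement"):
            artifact.approved = human_review(artifact)
    return artifacts


# Usage with stub agents standing in for real model calls.
artifacts = run_pipeline(
    "add audit log export",
    research=lambda task: f"research notes for: {task}",
    plan=lambda notes: f"plan based on: {notes}",
    implement=lambda plan_text: f"diff implementing: {plan_text}",
    human_review=lambda artifact: True,
)
ready_for_coworker = all(a.approved for a in artifacts if a.stage != "research")
print([a.stage for a in artifacts], ready_for_coworker)
```

Keeping the stages as separate functions mirrors the use of separate agents: each one starts from a condensed handoff rather than the full history of the previous session.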
Even after code ships, we’re starting to use AI agents to attempt automatic fixes for errors we see in production traffic.
In one incident this year, a subtle breaking change in a dependency caused intermittent sign-in issues for a very small number of customers using SSO flows. Within minutes, an agent pulled the relevant production logs, diagnosed the upstream dependency change that had introduced the edge case, and zeroed in on the line of code causing the error. We quickly applied a patch to the dependency in our codebase and shipped a fix. I then submitted that fix to the dependency’s maintainers and had it merged within days, so we didn’t need to maintain a patch or a complicated workaround in our codebase indefinitely.
The most clarifying thing I can say after almost a year of working this way: the fast parts got faster, and the hard parts stayed hard. What AI did was make that line visible in a way it wasn’t before, and force every engineering team to decide, explicitly, where human judgment has to live.
- Shift from “AI is bad at that” to “what context is AI missing?” It changes how teams debug agent failures across the board.
- Document edge cases and known bugs in README/AGENTS files. Retool took Terraform provider updates from weeks to a day this way.
- Tune AI code review tools for signal, not volume. Noisy reviews get ignored, which defeats the purpose entirely.
- Keep a human in the review seat, always. AI catches what humans miss, but it won’t flag a fundamentally wrong architecture.



