This post is a recap of a conversation between Retool Head of Product Design Paco Vinoly and Strategic Solutions Architect Tom Konewka at Retool Summit 2025. You can also watch their talk and all of the Summit sessions.
It’s not that hard to build an AI agent these days, but getting it into production safely is where most teams hit a wall. Building an impressive proof-of-concept in isolation is the new “but it works on my machine!”
An impressive 65% of today’s enterprise app builders aren’t actually software engineers, and 48% build solutions directly without waiting for engineering resources, according to our 2025 Builder report. While this is great news for the democratization of code, it’s no surprise that the glue work that’s tedious even for seasoned developers (authentication, monitoring, audit trails, approval workflows, and compliance checks) is what causes so many projects to fail to launch.
Beyond this reality check is a realistic way forward: rather than aiming for total autonomy, a combination of agentic workflows, AI agents, and human tasks means you can get the best out of each capability, while maintaining safety, security, and reliability. Gradually adding in more autonomy, where it makes sense to do so, is how you get incremental gains with less risk.
So, how do you build systems where AI and people work together—not just coexisting, but collaborating and amplifying each other’s strengths? And how do you build those systems with change in mind, knowing that the boundaries of what humans and AI are responsible for will continue to shift over time?
Before we get into how best to combine human tasks with agents or agentic workflows, let’s get on the same page about what each of those means.

At one end of the spectrum, you have AI inside deterministic workflows: humans remain firmly in control while LLMs augment specific steps.
Consider a workflow that takes a sales call transcript and includes a genAI step to summarize it, highlighting objections and action items. It then emails a recap to the sales representative. While the LLM output varies, the workflow as a whole is consistent.
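Here’s a minimal sketch of that shape in TypeScript, with hypothetical `completeLLM` and `sendEmail` helpers standing in for whatever model client and email service you use:

```typescript
// Hypothetical stand-ins for your model client and email service.
declare function completeLLM(prompt: string): Promise<string>;
declare function sendEmail(msg: { to: string; subject: string; body: string }): Promise<void>;

// Deterministic workflow: the steps never change; only the LLM's
// summary text varies between runs.
async function recapSalesCall(transcript: string, repEmail: string): Promise<void> {
  // Step 1 (LLM): summarize the call, pulling out objections and action items.
  const summary = await completeLLM(
    `Summarize this sales call, highlighting objections and action items:\n${transcript}`
  );

  // Step 2 (deterministic): email the recap to the sales representative.
  await sendEmail({ to: repEmail, subject: "Sales call recap", body: summary });
}
```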

At the other end of the spectrum, you have open-ended AI agent workflows where an agent plans, acts, accesses tools, and converges on an outcome. There are no fixed steps, and the agent may chart a different path each time. This works well for ambiguous tasks that are highly context dependent and may require multiple loops for the agent to gather data and confirm that it has everything it needs to proceed to the next self-defined step.
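An illustrative loop (not any specific framework’s API) shows that open-ended shape, with hypothetical `planNextAction` and `runTool` helpers:

```typescript
// Open-ended agent: the model plans each step, calls tools, and
// decides when it has enough to finish. All names are illustrative.
type AgentAction =
  | { type: "tool"; name: string; args: Record<string, unknown> }
  | { type: "finish"; answer: string };

declare function planNextAction(goal: string, history: string[]): Promise<AgentAction>;
declare function runTool(name: string, args: Record<string, unknown>): Promise<string>;

async function runAgent(goal: string, maxLoops = 10): Promise<string> {
  const history: string[] = [];
  for (let i = 0; i < maxLoops; i++) {
    const action = await planNextAction(goal, history); // the LLM charts the next step
    if (action.type === "finish") return action.answer; // the agent decides it's done
    const observation = await runTool(action.name, action.args);
    history.push(`${action.name}: ${observation}`);     // gather data, then loop again
  }
  throw new Error("Agent hit the loop limit without converging");
}
```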

Agentic workflows are a step in between: still a human-defined workflow, but instead of just generating an output, the AI model makes some decisions about which path to take. It may run the entire workflow from end to end, or decide to escalate to a human for clarification before taking action.
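A sketch of that middle ground, assuming hypothetical `classifyRequest`, `applyDecision`, and `createHumanTask` helpers: the structure is fixed, but the model picks the branch.

```typescript
// The workflow shape is human-defined; the model only chooses the path.
declare function classifyRequest(details: string): Promise<"approve" | "reject" | "escalate">;
declare function applyDecision(requestId: string, decision: "approve" | "reject"): Promise<void>;
declare function createHumanTask(requestId: string, reason: string): Promise<void>;

async function handleRequest(requestId: string, details: string): Promise<void> {
  const decision = await classifyRequest(details);
  if (decision === "escalate") {
    // The model isn't confident enough to act on its own; hand off to a person.
    await createHumanTask(requestId, "Agent requested human clarification");
  } else {
    await applyDecision(requestId, decision); // run the workflow end to end
  }
}
```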
Each of these models represents slightly different boundaries between automation and human input. It might sound like the ideal journey is to start with genAI workflows and graduate eventually to agents, moving from a state of low AI maturity to the most advanced.
But the answer is actually that all types of AI workflows have their place, and the trick is figuring out which pattern fits your use case. Ideally, you want to limit the use of LLMs to ambiguous and open-ended tasks, so you’re not introducing unpredictability where it’s not welcome.
So, we know that most businesses need some combination of:
- Workflows to provide durability and repeatability in deterministic tasks or processes
- Agents to offer flexible, probabilistic reasoning for nondeterministic flows
- Human checkpoints to add judgment, policy decisions, and final accountability through approvals, edits, and exception handling
So how do you bring these all together? In the course of building Retool Agents and AI-assisted development (and working with customers deploying AI to production), we’ve learned that the agent architecture that works follows a triad pattern: combining workflows, agents, and human tasks.
At the center is a unified orchestration layer—a coordination system with several components working together:
It receives a task, breaks it down, and delegates to specialized sub-agents. Instead of trying to make one agent an expert at everything, you create focused agents that excel at specific tasks.
Imagine you’re building a dispute resolution workflow. You could create a single “dispute agent” with all your company’s policies and ask it to evaluate disputes, review evidence and customer history, and draft email responses. It might work, but it won’t do an outstanding job of any of those things.
A better approach is a dispute manager agent that orchestrates specialized sub-agents. One agent excels at language analysis because it’s been trained specifically on that task. Another specializes in reviewing customer history. When a task comes in, the orchestrator routes it to the right specialist, prepares the context, and coordinates the overall process without trying to do everything itself. These agents all work in parallel, each with clear guardrails about what they can access and what they’re looking for.
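A rough sketch of that orchestration, with hypothetical `languageAnalysisAgent` and `customerHistoryAgent` sub-agents fanned out in parallel:

```typescript
// Hypothetical sub-agents, each narrowly scoped to one job.
declare function languageAnalysisAgent(disputeText: string): Promise<string>;
declare function customerHistoryAgent(customerId: string): Promise<string>;

async function disputeManagerAgent(customerId: string, disputeText: string) {
  // Fan out to specialists concurrently; each sees only the context it needs.
  const [languageFindings, historyFindings] = await Promise.all([
    languageAnalysisAgent(disputeText),
    customerHistoryAgent(customerId),
  ]);

  // The orchestrator coordinates rather than analyzes: it assembles the
  // specialists' findings for the next stage of the workflow.
  return { languageFindings, historyFindings };
}
```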
Branching logic allows the system to make decisions based on what it discovers. In our dispute resolution workflow, here’s how that might play out:
A customer disputes a credit card charge. The orchestrator agent receives the complaint and immediately spawns several sub-agents in parallel.
Each sub-agent has a different job:
- The evidence collector agent might check product analytics to validate if the customer is actually using the product.
- A Salesforce agent evaluates account signals (like history of disputed payments) and returns a risk score.
Based on what these agents find, the workflow branches. If the evidence is clear-cut, the system might automatically accept or reject the dispute. But if there’s ambiguity—which there often is—the workflow routes to a human for review and approval before finalizing the decision.
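In code, that branching might look something like this sketch, with illustrative thresholds and hypothetical `resolveDispute` and `queueForHumanReview` helpers:

```typescript
// Findings assembled from the sub-agents; fields and thresholds are illustrative.
interface DisputeFindings {
  usageConfirmed: boolean; // from the evidence collector agent
  riskScore: number;       // from the Salesforce agent, 0 (safe) to 1 (risky)
}

declare function resolveDispute(disputeId: string, outcome: "accept" | "reject"): Promise<void>;
declare function queueForHumanReview(disputeId: string, findings: DisputeFindings): Promise<void>;

async function routeDispute(disputeId: string, findings: DisputeFindings): Promise<void> {
  if (findings.usageConfirmed && findings.riskScore < 0.2) {
    await resolveDispute(disputeId, "accept");      // clear-cut in the customer's favor
  } else if (!findings.usageConfirmed && findings.riskScore > 0.8) {
    await resolveDispute(disputeId, "reject");      // clear-cut against
  } else {
    await queueForHumanReview(disputeId, findings); // ambiguous: a human makes the call
  }
}
```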
Far from being an afterthought, human-in-the-loop checkpoints are designed into the architecture from the start. Humans can see exactly what the AI discovered, review the evidence, and make the final call.
In this model, even complex, multi-step business processes can run in the background with as much automation as you’re comfortable with, and humans can step in only where they’re needed. User interfaces and human involvement aren’t the starting point, but rather the safety net for guiding and approving what AI is doing.
While chatbots are where AI went mainstream, they’re not the answer for every use case. Chat interfaces are great for low barrier-to-entry tasks where you’re providing open-ended context, but the drawback is that chat forces users to initiate every action.
Something happens in Salesforce, you get an email, or a dispute comes in—and now you have to go paste information into a chat interface to kick off the process. ChatGPT can help you draft an email reply, but it can’t send it for you.
Prompting is exhausting. You have to think of everything, structure your request perfectly, and hope the agent parses it correctly. Switching between tools to enact different parts of the process comes at the cost of focus. Rather than relieving cognitive load, handing tasks over to AI via a chatbot interface creates more of it.
Giving agents agency while maintaining some level of human oversight takes a different approach. What works better is purpose-built UI that collects exactly what the agent needs.
What do we mean by purpose-built?
- Forms with structured inputs tied to specific fields an agent needs
- File upload requirements for certain file types
- Email-based triggers for asynchronous processes that might take hours or days because they involve human approvals
These constraints keep logic at the workflow level, and limit LLM-driven ambiguity or flexibility to the tasks that benefit from it.
The right UI acts as a control surface for AI: it lets people provide structured inputs, make decisions at key gates, and trace what the AI is doing.
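As a sketch of what structured inputs can mean in practice, here’s a hypothetical intake schema and validation step; the field names and rules are illustrative:

```typescript
// A structured intake form instead of free-form chat: the schema pins
// down exactly what the agent receives.
interface DisputeIntakeForm {
  orderId: string;          // tied to a specific record, not pasted prose
  amountCents: number;
  reason: "not_received" | "duplicate_charge" | "quality_issue";
  evidenceFileUrl?: string; // the upload step can require specific file types
}

// Workflow-level validation: malformed input never reaches the LLM.
function validateIntake(form: DisputeIntakeForm): string[] {
  const errors: string[] = [];
  if (!/^ORD-\d+$/.test(form.orderId)) errors.push("orderId must look like ORD-12345");
  if (form.amountCents <= 0) errors.push("amount must be a positive number of cents");
  return errors;
}
```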
Building all these agents and their human/UI interfaces in separate sandboxes and then trying to figure out how to get them to communicate (without losing context) can mean weeks spent on plumbing alone.
Yet 51% of Retool builders now solve problems in days or weeks instead of months or quarters. One such builder, Eric Cheng, a Komatsu enterprise architect, stood up a Retool app that acts as a surface for building agentic systems to automate lengthy, step-by-step customer service operations. What started out as a side project was easily scaled with Retool as a control panel and central MCP service, connecting with external endpoints, file storage systems, APIs, and Retool workflows.
To realize the promise of AI pilots, here’s what actually works:
A common mistake teams make is asking AI to automate something they don’t fully understand themselves. If you can’t articulate the steps a human would take to complete a task, an agent isn’t going to figure it out magically. Pick workflows where you know what success looks like and where the decision rules are clear enough that you can evaluate whether the AI got it right.
LLMs are non-deterministic systems that need a different approach to safety and QA than traditional software. Sadly, unit tests alone aren’t going to cut it. If you’re giving agents the keys to the kingdom by letting them take actions via your critical business tools, you also need to take measures that prevent them from acting in unauthorized (or even malicious) ways.
Powerful, detailed observability means you always know what’s going on inside your systems and are able to debug any suspicious or incorrect output. You’ll want to keep track of metrics and signals such as:
- Budget concerns like token usage and estimated cost
- Performance monitoring like runtime and total runs
- Security measures like tool access and usage
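One way to capture those signals is a per-run trace record. This shape is a sketch, not any particular platform’s schema:

```typescript
// One trace record per agent run, covering budget, performance, and
// security signals.
interface AgentRunTrace {
  runId: string;
  durationMs: number;       // performance: how long the run took
  tokensUsed: number;       // budget: raw token consumption
  estimatedCostUsd: number; // budget: projected spend
  toolCalls: { tool: string; allowed: boolean }[]; // security: what the agent touched
  outcome: "success" | "error" | "escalated";
}

// A simple audit: surface any run with a blocked tool call or runaway spend.
function flagSuspiciousRuns(traces: AgentRunTrace[], costLimitUsd = 1.0): AgentRunTrace[] {
  return traces.filter(
    (t) => t.toolCalls.some((call) => !call.allowed) || t.estimatedCostUsd > costLimitUsd
  );
}
```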

At the same time, setting up automated evals to confirm that your agent’s output stays consistent with what you’d expect is an important forward-looking measure for keeping your agents running smoothly once you’ve developed them.
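A minimal eval harness might look like this sketch, assuming a hypothetical `runDisputeAgent` function and exact-match grading (real evals often use a rubric or an LLM grader):

```typescript
// Replay a fixed set of cases against the agent and report a pass rate.
interface EvalCase {
  input: string;
  expected: "accept" | "reject" | "escalate";
}

declare function runDisputeAgent(input: string): Promise<"accept" | "reject" | "escalate">;

async function runEvals(cases: EvalCase[]): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const actual = await runDisputeAgent(c.input);
    if (actual === c.expected) passed++;
    else console.warn(`FAIL: expected ${c.expected}, got ${actual}`);
  }
  return passed / cases.length; // track this rate across prompt and model changes
}
```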
You don’t have to go end-to-end all at once. One of our team members built an internal tool to parse demo recordings from engineering calls. While integrating with Zoom’s API would let the tool pull in transcripts automatically, it would’ve taken weeks to set up. Instead, he started with a text box where he pastes the transcript manually. It’s not fully automated, but it’s working—and it only takes two minutes a week. He can still iterate on the solution and automate that last piece, but he’s already made incremental gains along the way.
If you build your proof-of-concept in the same environment as your production system, you don’t have to start from zero once you’re ready to ship. Your AI app or agent is already tied to your real data sources, with governance already configured, and monitoring already in place.
With a platform approach, you can also easily set up agents as tools for other agents—the orchestrator knows it has access to specialized agents and can call on them while the platform handles the coordination.
No agent is going to be perfect on day one. They reason and act like humans, which means they need coaching and iteration to improve. You might deflect 30% of support requests with an agent today. As you iterate on prompts, refine guardrails, and as the underlying models improve, you might get to 50%, then 60%, and eventually 99%.
This iteration, together with rapid developments in AI capabilities, means that the boundary between what should be automated and what needs human input is constantly moving. As you refine your efforts, as models get better, as teams get more comfortable, as business requirements change—this line shifts.
We know that existing business models and processes will continue to transform in the coming years. Simplifying the plumbing work surrounding your AI efforts will help you keep pace. Being able to bring any LLM to create agents and agentic workflows, together with an easy way to grant secure access to your business tools, gives you time to adjust to the new realities of those business processes over time.
The right combination of orchestrated agents, deterministic processes, and structured human judgment embedded in business workflows can turn those flashy demos into durable production value.
Check out more about what we learned at Retool Summit and watch the full talk here: AI Agents in Production: Apps, Innovation, and Leadership—Retool Summit 2025.