With AI tools proving useful for automating tasks, and with some LLMs generating increasingly decent-quality code, it's certainly alluring to use AI to accelerate software development and to build new internal and operations tools.

But with all the novelty come playbooks that are still being written. As we built our own AI-powered features and started leveraging LLMs for internal work, we took note of the core practices that helped us along the way. If you’re trying to cut through the noise and figure out where to start, we’ve collated some of those practices to help you shape your approach. (Want to see real use cases, top tools, and community insights? Check out our 2023 State of AI report.)

Stay focused on the problem

This might seem obvious, but it warrants underscoring: When you’re considering leveraging AI for whatever purpose—especially right now with all the buzz—it’s important to stay focused on the core problems you’re trying to solve. By getting crisp about what those are and clearly defining your objectives, it’s easier to evaluate how—and if—you ought to use an AI tool to achieve the desired outcomes. (Sometimes you might also find it’s a better plan to augment an existing experience than build something entirely new.)

Take, for example, a team that wants to streamline their accounts payable workflows. Building a chatbot to approve invoices might be flashy, but does that really create the most utility? A simpler, more familiar approvals portal could be more straightforward for end-users and developers alike. In that case, there might be a value add—like automating parts of the behind-the-scenes workflow—that could be AI-supported... not because you want to use AI, but because there’s a real benefit to doing so.

That’s all to say that a problem-first approach starts from the core business needs rather than the technological solution. Maybe your team has the type of challenge that can be readily tackled with the support of AI, like:

  • A sprawling knowledge base that makes it hard to retrieve information, whether for internal use or customer support
  • A deluge of email and other communications that slows teams down and reduces overall responsiveness
  • Too many manual steps in repetitive tasks that leave room for error and take up employees’ time
  • Overwhelming amounts of data that are difficult to analyze and find meaningful trends within

Whatever the case, try spelling out exactly what your pain point is. By starting with the problem, you’re better positioned to identify the right solution.

Refine your input data

Once you get down and dirty with LLMs, you’ll find the data science adage “garbage in, garbage out” rings very true. If you want to get the most out of the operations software, features, and tools you support with AI, not only should you clearly articulate your problem, but you’ll also need to provide your LLM all the context it needs to generate meaningful solutions. Otherwise, it’s a bit like hiring a talented external consultant and not giving them the context they need to do the job. (Of course, you should only ever hire a consultant you can trust with your data, and the same goes for your AI tools—more on security and privacy in a later section.)

One way to help ensure your LLM is well-supported is to teach it. And to teach the model, you’ll need to feed it a variety of data types. While LLMs typically only take text as input and return text as output, “text” encompasses a lot. For example, you can use:

  • Documents
  • Data tables
  • Video transcripts
  • Code
  • Logs

But the value of just feeding the LLM a bunch of text—aka raw data—can be pretty limited if that text is a jumbled mess. The massive LLMs that many folks are excited about today aren’t something you can custom-fit to your task by retraining model weights—you’ve got to steer the models with prompts, and the crisper those inputs are to fuel the LLM’s output, the better.

Organizing or annotating your input data gives the LLM clearer, structured context, which can, in turn, better enable the LLM to find patterns and produce high-quality responses. For instance, if you were supplying the LLM with a corpus of cooking recipes, you could preprocess your data so the recipes follow a clear and consistent format, say of ingredients and steps, cutting out irrelevant text that might just add noise to your results.
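
To make the recipe example concrete, here's a minimal preprocessing sketch. It assumes loosely structured recipe text with "Ingredients" and "Directions"-style section markers (an assumption about your data, not a universal format) and normalizes each recipe into a consistent layout while dropping stray chatter like titles:

```python
def normalize_recipe(raw: str) -> str:
    """Reformat a loosely structured recipe into a consistent
    'Ingredients / Steps' layout, dropping blank lines and text
    outside the two sections."""
    ingredients, steps = [], []
    section = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        lowered = line.lower()
        if lowered.startswith("ingredients"):
            section = "ingredients"
            continue
        if lowered.startswith(("steps", "directions", "method")):
            section = "steps"
            continue
        if section == "ingredients":
            # Normalize any bullet style to "- "
            ingredients.append(f"- {line.lstrip('-* ')}")
        elif section == "steps":
            # Strip whatever numbering the source used; we renumber below
            steps.append(line.lstrip("0123456789.) "))
    numbered = [f"{i}. {s}" for i, s in enumerate(steps, 1)]
    return ("Ingredients:\n" + "\n".join(ingredients)
            + "\n\nSteps:\n" + "\n".join(numbered))
```

The same idea applies to any corpus: pick one canonical shape, map every document into it, and discard the noise before it ever reaches the model.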

Tip: You might be able to leverage a separate LLM to elevate your data quality. For example, you could use that model to review your input data and optimize its format.

Engineer your prompts to produce structured, accurate responses

Contextual data isn’t the only input to an LLM you should refine—how you prompt the model makes a difference, too. For instance, you can put guardrails on prompts to get closer to the output you need (and to steer away from the output you don’t want!).

Instead of leaving the potential responses wide open, you could instruct the model to always answer in, say, valid JSON. (Recently, OpenAI’s GPT models added support for function calling, which means you can prompt with a function signature and the model will respond with valid JSON arguments to that function. You can explore whether this works for you and how accurate and high-quality your results are.)
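
As a sketch of what this looks like in practice, here's the shape of a function-calling request. The `approve_invoice` function and its fields are hypothetical, and the model reply below is a hand-written stand-in rather than a real API response; field names follow OpenAI's chat-completions API at the time of writing, so check the current docs before relying on them:

```python
import json

# Hypothetical function signature we want the model to "call": the model
# is asked to reply with JSON arguments matching this schema.
invoice_fn = {
    "name": "approve_invoice",
    "description": "Approve or reject an invoice",
    "parameters": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string"},
            "approved": {"type": "boolean"},
            "reason": {"type": "string"},
        },
        "required": ["invoice_id", "approved"],
    },
}

# A chat-completion request with function calling enabled
request = {
    "model": "gpt-4",
    "messages": [
        {"role": "user",
         "content": "Approve invoice INV-42, it's within budget."}
    ],
    "functions": [invoice_fn],
}

# A well-behaved model replies with arguments that parse as valid JSON,
# which you can hand straight to downstream code:
model_reply = '{"invoice_id": "INV-42", "approved": true, "reason": "within budget"}'
args = json.loads(model_reply)
```

The payoff is that your integration code consumes structured arguments instead of scraping free-form prose.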

Going one step further, it can help to familiarize yourself with more advanced prompt engineering techniques. With the few-shot prompting technique, for example, you give the model examples of (input, output) pairs to nudge it in the right direction. For tasks that require complex reasoning and directional correctness, that technique can also be paired with chain-of-thought prompting, which involves explaining the reasoning to get from a particular input to an output. (This is just scratching the surface—the research community is continuing to devise sophisticated ways to interact with LLMs.)
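
Few-shot prompting is mostly string assembly. Here's a minimal sketch, using a made-up sentiment-labeling task for illustration: the (input, output) pairs become worked examples in the prompt, and the new input is left for the model to complete:

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Compose a few-shot prompt from (input, output) example pairs,
    then append the new input for the model to complete."""
    parts = ["Classify the sentiment of each review as positive or negative.\n"]
    for text, label in examples:
        parts.append(f"Review: {text}\nSentiment: {label}\n")
    # Trailing "Sentiment:" invites the model to fill in the label
    parts.append(f"Review: {query}\nSentiment:")
    return "\n".join(parts)

prompt = few_shot_prompt(
    [("Great tool, saved us hours.", "positive"),
     ("Constantly crashes on launch.", "negative")],
    "Setup was painless and support was quick.",
)
```

For chain-of-thought, the same pattern applies; the example outputs would spell out intermediate reasoning steps before the final answer rather than just the label.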

By refining your data quality and your prompts, you can encourage more high-quality, helpfully structured outputs, and potentially leave less room for error in how you link up the LLM to your other systems. Iterate on your prompts based on testing and user feedback.

Establish a feedback loop for continuous learning

Collecting user feedback is central to how you iterate on your products—and an advantage of working on internal tools is that feedback can be especially easy to come by when your users are your coworkers.

In order to continuously enrich and tailor the LLM, design your tool or app with affordances for users to let you know what is and isn’t working. You can of course gather long-form feedback by interviewing or surveying users, but you should also collect real-time data as they interact with your AI-generated results. For example, if you’re using AI to provide suggestions within an internal app, accepting a suggestion is a positive signal that can reinforce your model. On the other hand, if a user manually intervenes during an automated process or overrides a suggestion, this can be used as a strong negative signal for the LLM.
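
One lightweight way to capture those signals is a simple event log keyed by suggestion. This is a sketch, not a prescription; the action names and `FeedbackLog` shape are illustrative, and in production you'd likely persist events to a database rather than memory:

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackLog:
    """Collect user reactions to AI suggestions so the signals can
    later inform prompt tuning or model retraining."""
    events: list[tuple[str, str]] = field(default_factory=list)

    def record(self, suggestion_id: str, action: str) -> None:
        # "accepted" is a positive signal; "overridden" during an
        # automated process is a strong negative one.
        self.events.append((suggestion_id, action))

    def acceptance_rate(self) -> float:
        if not self.events:
            return 0.0
        accepted = sum(1 for _, action in self.events if action == "accepted")
        return accepted / len(self.events)

log = FeedbackLog()
log.record("sugg-1", "accepted")
log.record("sugg-2", "overridden")
log.record("sugg-3", "accepted")
```

Even a crude acceptance rate like this gives you a baseline metric to watch as you change prompts, context, or models.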

As you collect feedback, think about the best ways to act on it. You may see patterns that point to a need to retrain your base LLM or whatever other AI model you’re using. (Maybe your model is prone to hallucinations or toxicity.) Alternatively, especially if you don’t own your model, you can rethink the prompts and contextual data you provide to help point in a more effective direction.

And don’t forget that some of your problems may not be engineering problems. Depending on the feedback you gather, you might find that design, copy, or other user experience optimizations could go a long way.

Optimize for performance and scale

While thoughtful applications of LLMs could transform how you’re operating (or even reshape your business writ large), they can also be hugely expensive in terms of both time and money. If there are tasks that you can solve effectively and efficiently with deterministic code, it may be faster and cheaper to do so than delegating to AI. Evaluate the tradeoffs between LLMs and your other options to make sure you’re considering the possibilities holistically.

For the cases where you do choose to use LLMs, you’ll continue to balance tradeoffs in cost, performance, and quality. Implementing metrics for each category will help you to make data-driven decisions about your return on investment.

One of the most common balancing acts engineering teams face when implementing AI is figuring out how to reduce latency and cost while largely maintaining quality. Some practical recommendations cribbed and adapted from OpenAI include:

  • Using a cheaper model (perhaps less sophisticated but still well-suited for relevant tasks)
  • Batching your requests when possible
  • Setting a lower maximum number of tokens for the LLM to return
  • Rate limiting your queries
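
Two of those levers, batching and capping output tokens, are easy to sketch. The helper below groups inputs into batches and builds an OpenAI-style completion request with a hard `max_tokens` cap; the field names mirror OpenAI's legacy completions API (which accepts a list of prompts), so treat this as an assumption to verify against your provider's docs:

```python
def batch(items: list[str], size: int) -> list[list[str]]:
    """Split inputs into batches of at most `size`, so several prompts
    can share one API round trip where the API allows it."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_request(prompts: list[str],
                  max_tokens: int = 256,
                  model: str = "gpt-3.5-turbo-instruct") -> dict:
    """Assemble a completion request that caps response length."""
    return {
        "model": model,            # a cheaper model where quality allows
        "prompt": prompts,         # batched prompts in a single request
        "max_tokens": max_tokens,  # hard cap on tokens the model returns
    }

requests = [build_request(group)
            for group in batch(["p1", "p2", "p3", "p4", "p5"], size=2)]
```

Rate limiting would sit one layer up, throttling how fast you dispatch the `requests` list.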

Optimization is rarely one and done; with AI, it’s something you’ll likely need to revisit now and then as your organizational needs change and the broader technological ecosystem matures. Not only are new models rolling out frequently as old ones are being deprecated, but there’s constant innovation on other software tools that enhance ML operations (like feature stores and monitoring platforms). As an engineering leader, you’ll likely want to set aside resources for maintenance and ongoing improvements to make the most of these developments.

Prioritize AI security and privacy from day one

LLMs have rightfully faced a lot of scrutiny around how they process data and who owns that data. Particularly if you’re using a third-party model, be sure to conduct a thorough security and privacy review to ensure you’re aligned with best practices and regulations (like GDPR and HIPAA) before granting access to your sensitive business data.

So what exactly can you do to proactively address AI security and privacy risks? The short version is that data that’s yours should stay yours. Applying this principle in your security and privacy reviews will preempt a lot of problems. For the long version, do thorough research and explore trusted sources for guidance. (If you’re looking for tactical guidance on securely using AI tools, our engineering manager’s guide to AI security and governance covers documentation, auditing, and more.)

It’s easy to treat AI models as black boxes that magically solve business problems. Ignorance can be a liability here—spend the effort to understand how your models work, both at a theoretical level and in practice with sample data that approximates your own use case, and be sure to apply a healthy dose of skepticism at all times.

Build and maintain AI systems with Retool

With the AI space evolving rapidly and more and more companies and teams experimenting with what works and what doesn’t, the rules, in many ways, are still being written. We hope these learnings give you some helpful direction and structure as you figure out how to apply LLMs to solving business problems.

If you’re looking to build secure AI-powered apps and workflows, you can start building them for free with Retool AI. Whether you’re looking to automate business tasks (maybe you want to generate code, summarize text, or create charts), create an app (support bot anyone?), or streamline workflows, you can customize and integrate AI and securely deploy to devs and end users in minutes. (Yes, minutes.)

If you’d like to learn more about Retool AI, read the docs or book a demo... and get one step closer to building AI-powered tools that deliver.

Extra-special thanks to Ford Filer, Bob Nisco, and Cory Wilkerson for the insights and wisdom.