GraphQL is a query language and server-side runtime designed to streamline data transactions between the frontend and backend. It uses an object-like schema to provide fine-grained control over what data is and isn’t accessed. GraphQL can access data across multiple tables via a single query, traversing relationships. As Red Hat puts it, GraphQL “prioritizes giving clients exactly the data they request and no more [and] is designed to make APIs fast, flexible, and developer-friendly.” Notably, GraphQL is not a database; it integrates with most modern languages and databases and serves as a direct alternative to REST APIs.
Facebook relied on a REST API from its early days. Facebook originally was built with server-generated, dynamic HTML, but as time went on, they leveled up to a REST API that returned the various data structures (friends, pages, events, etc.) to their client application. However, social media can feature complex relationships; Facebook’s frontend had to make multiple consecutive requests before loading a view. This constraint hurt newsfeed load times and network efficiency and restricted developer productivity as they had to maintain alignment between the frontend and a sprawling user data API.
The situation compelled Facebook to spearhead development of GraphQL, a query language designed to fetch relevant data in a single query. It was initially internal only and remained closed-source for a few years; development began in 2012, and it was still a secret when Facebook open-sourced its ultra-successful React in 2013. GraphQL dramatically improved Facebook’s performance, and in 2015 Facebook open-sourced it as well.
GraphQL is both a query language and a server runtime. The query language looks a lot like JSON, but with only the keys and no values.
{
  user {
    name
    friends {
      name
    }
  }
}
GraphQL’s server runtime exposes a single endpoint that receives all of the GraphQL queries.
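To make the single-endpoint idea concrete, here is a minimal sketch of how a client might package the query above for a server. It assumes the common convention of sending the query as a JSON body in an HTTP POST; the URL is a made-up placeholder, and the request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint. The URL and the /graphql path are assumptions for illustration.
GRAPHQL_URL = "https://api.example.com/graphql"

QUERY = """
{
  user {
    name
    friends {
      name
    }
  }
}
"""


def build_graphql_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query in a JSON body addressed to the single endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY)
# Sending would be urllib.request.urlopen(request); omitted because the endpoint is fictional.
```

Every query, however different, would go to the same URL; only the body changes.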
A common misconception is that GraphQL natively connects to a database. Rather, because GraphQL was designed to be a general-purpose query language for all data sources, it pairs with third-party libraries that translate GraphQL queries into a native database language (such as SQL). Popular tools for connecting GraphQL to databases include PostGraphile (for Postgres only) and Hasura. These tools translate a GraphQL query into a single native query or a set of native queries.
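As a rough illustration of that translation step, the toy sketch below turns a requested type and field list into one SQL statement and runs it against an in-memory SQLite database. This is not how PostGraphile or Hasura work internally; the `SCHEMA` allowlist, the `to_sql` helper, and the table layout are all assumptions invented for the example.

```python
import sqlite3

# Hypothetical allowlist of types and fields the API is willing to expose.
SCHEMA = {"user": {"id", "name", "email", "phone", "address"}}


def to_sql(type_name, fields):
    """Build a SELECT for the requested fields, rejecting anything outside SCHEMA."""
    allowed = SCHEMA.get(type_name)
    if allowed is None:
        raise ValueError(f"unknown type: {type_name}")
    unknown = [name for name in fields if name not in allowed]
    if unknown:
        raise ValueError(f"unknown fields on {type_name}: {unknown}")
    columns = ", ".join(f'"{name}"' for name in fields)
    return f'SELECT {columns} FROM "{type_name}"'


connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE "user" (id INTEGER, name TEXT, email TEXT, phone TEXT, address TEXT)'
)
connection.execute(
    'INSERT INTO "user" VALUES (1, ?, ?, ?, ?)',
    ("John Smith", "john.smith@example.com", "+1-123-456-7890",
     "123 Main St, Anytown, CA, 12345"),
)

# Equivalent in spirit to the GraphQL query { user { name address } }.
sql = to_sql("user", ["name", "address"])
rows = connection.execute(sql).fetchall()
```

Real translation layers also handle arguments, nesting, and permissions; the point here is only that the database never sees GraphQL itself.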
GraphQL can be used to query data from all sorts of databases, including relational databases like MySQL or Postgres and non-relational databases like MongoDB. While GraphQL can be used for a wide variety of applications, some common niches for GraphQL are SaaS applications and social networks. GraphQL is a desirable solution in place of a REST API when clients would need to fetch data that spans multiple resources, in a way that might take multiple requests to fetch from a REST API.
GraphQL can be used at any stage of a project’s journey. It’s easy to set up for most databases and is well-suited for projects using JavaScript backends. Because GraphQL doesn’t dictate which databases need to be used, it can be implemented side by side with a REST API, allowing for incremental migration from REST to GraphQL. Many companies—including GraphQL’s original developer, Meta—switched to GraphQL far into a project’s lifecycle.
Because GraphQL was built as a response to REST’s pitfalls, perhaps the easiest way to grasp the advantages of GraphQL is to understand the disadvantages of REST. Like GraphQL, a REST API-powered backend serves as a bridge between a database and a frontend. Let’s take a deeper look.
Specifically, a REST API backend implements and maintains:
- Authorization. Because database data is typically protected or rate limited, a REST API provides authorization middleware to ensure that all data transactions are made by the right users.
- Object-oriented design. Because data in both relational and non-relational tables are sets of objects (e.g. user, organization, books), REST can provide endpoints relevant to each set and object type.
- Caching. A REST API can cache results on a per-endpoint basis to expedite subsequent data fetches, reducing load on the underlying database.
In principle, REST APIs are open-ended and can be organized and optimized in many ways; in practice, they are a rather rigid paradigm for accessing data. Creating new endpoints is tedious, so developers end up combining calls to existing endpoints when they need inter-linked data, and each endpoint typically offers limited options (hypothetically, it could be configured endlessly). This constrained approach can create two common problems: over-fetching and under-fetching.
Over-fetching with REST APIs
One of the biggest issues with REST APIs is that they don’t offer fine-grained control over what fields are accessed. For instance, imagine a frontend application that strictly needs a list of User.names and User.addresses for a view. With a REST API, hitting the /api/users endpoint may return the following:
{
  "users": [
    {
      "id": 1,
      "name": "John Smith",
      "email": "john.smith@example.com",
      "phone": "+1-123-456-7890",
      "address": "123 Main St, Anytown, CA, 12345"
    },
    {
      "id": 2,
      "name": "Jane Doe",
      "email": "jane.doe@example.com",
      "phone": "+1-987-654-3210",
      "address": "456 Oak St, Anytown, CA, 12345"
    }
  ]
}
You received email addresses and phone numbers that weren’t part of your request.
Some REST API designs address this over-fetching problem by adding optional flags that filter for specific subsets of data. For instance, the ?detailed=true flag may return the above snippet while ?detailed=false returns:
{
  "users": [
    {
      "id": 1,
      "name": "John Smith",
      "address": "123 Main St, Anytown, CA, 12345"
    },
    {
      "id": 2,
      "name": "Jane Doe",
      "address": "456 Oak St, Anytown, CA, 12345"
    }
  ]
}
This is rarely a sustainable solution from a code cleanliness standpoint, so developers often end up over-fetching data. While occasional over-fetching may not seem terrible (you’re still getting the information you did request), at scale it consumes network bandwidth through larger response payloads and can sometimes slow down the database.
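To get a feel for the cost, the short sketch below compares the serialized size of the full /api/users response from earlier with the size of a response holding only the two fields the view needs. The byte counts depend on the sample data and the JSON encoding, so treat it as a way to measure the overhead rather than a benchmark.

```python
import json

full_response = {
    "users": [
        {"id": 1, "name": "John Smith", "email": "john.smith@example.com",
         "phone": "+1-123-456-7890", "address": "123 Main St, Anytown, CA, 12345"},
        {"id": 2, "name": "Jane Doe", "email": "jane.doe@example.com",
         "phone": "+1-987-654-3210", "address": "456 Oak St, Anytown, CA, 12345"},
    ]
}

# The only fields the view actually renders.
needed_fields = ("name", "address")


def keep_fields(records, fields):
    """Return copies of the records that contain only the named fields."""
    return [{name: record[name] for name in fields} for record in records]


trimmed = {"users": keep_fields(full_response["users"], needed_fields)}

full_bytes = len(json.dumps(full_response).encode("utf-8"))
trimmed_bytes = len(json.dumps(trimmed).encode("utf-8"))
print(f"{full_bytes - trimmed_bytes} of {full_bytes} bytes were never used by the view")
```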
Under-fetching with REST APIs
Under-fetching is something of the opposite, but can cause similar issues. When an endpoint doesn’t return all the data you requested, it can necessitate a second call.
Consider this. Applications often need to fetch nested data. An example of nested data would be a view that displays a list of users with their employer’s name and their posts. Since employers and posts are typically independent models in the database, the view requires access to three REST endpoints—/api/users, /api/employers, and /api/posts.
A common issue with REST APIs is that developers will fetch the top-level entry first (e.g. users), then loop through each entry (user) and access the related entries (employers and posts). This iterative (and slow) pattern is known as the N + 1 problem. That is, there is 1 initial query and then N further queries, one for each entry returned by the first query. (Granted, in this specific example, it is technically 2N + 1 queries given that there are two related models, but the grander point is that as N scales, querying becomes more expensive.)
In the case of REST APIs, N + 1 is a problem from both a code cleanliness and efficiency standpoint. Developers need to write a loop that fires off independent queries; then, afterwards, the database has to deal with N + 1 queries.
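The loop described above can be sketched with a stand-in client that only records which endpoints were hit. The `FakeRestClient` class, the sample data, and the `user_id` parameter are assumptions made for illustration; a real client would issue HTTP requests instead.

```python
USERS = [{"id": 1, "name": "John Smith"}, {"id": 2, "name": "Jane Doe"}]
EMPLOYERS = {1: {"name": "Acme Corporation"}, 2: {"name": "Globex Corporation"}}
POSTS = {1: [{"title": "Post Title 1"}], 2: [{"title": "Post Title 3"}]}


class FakeRestClient:
    """Stand-in for an HTTP client that counts requests instead of sending them."""

    def __init__(self):
        self.calls = []

    def get(self, path, user_id=None):
        self.calls.append((path, user_id))
        if path == "/api/users":
            return USERS
        if path == "/api/employers":
            return EMPLOYERS[user_id]
        if path == "/api/posts":
            return POSTS[user_id]
        raise ValueError(f"unknown endpoint: {path}")


def load_view(client):
    users = client.get("/api/users")  # 1 initial request
    view = []
    for user in users:  # N iterations, two requests each
        employer = client.get("/api/employers", user_id=user["id"])
        posts = client.get("/api/posts", user_id=user["id"])
        view.append({"name": user["name"], "employer": employer["name"], "posts": posts})
    return view


client = FakeRestClient()
view = load_view(client)
print(len(client.calls))  # 2N + 1 requests: 5 for the two users above
```

With 1,000 users, the same loop would issue 2,001 requests.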
GraphQL can address both the under-fetching and over-fetching problem by providing users with fine-grained data access.
GraphQL makes it easy for developers to write a single query to access exactly what they need and nothing more. For example, if a view needed to just access User.names and User.addresses, the GraphQL query would look like this:
{
  user {
    name
    address
  }
}
And would result in:
{
  "user": [
    {
      "name": "John Smith",
      "address": "123 Main St, Anytown, CA, 12345"
    },
    {
      "name": "Jane Doe",
      "address": "456 Oak St, Anytown, CA, 12345"
    }
  ]
}
Notice how only name and address are in the response. This is a solution to the over-fetching problem. GraphQL makes it dramatically easier for developers to access only the relevant data for each view without creating custom endpoints.
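One way to picture what the server does with such a query is as a recursive filter over the stored objects. The sketch below is a simplification under stated assumptions: a real GraphQL server parses the query text and calls resolver functions per field, whereas here the selection is a hand-written dictionary and `select_fields` is a hypothetical helper.

```python
def select_fields(value, selection):
    """Keep only the keys named in `selection`.

    `selection` maps each field name to None (a leaf field) or to a nested
    selection dict (an object or list of objects).
    """
    if isinstance(value, list):
        return [select_fields(item, selection) for item in value]
    result = {}
    for field_name, sub_selection in selection.items():
        field_value = value[field_name]
        if sub_selection is None:
            result[field_name] = field_value
        else:
            result[field_name] = select_fields(field_value, sub_selection)
    return result


stored_users = [
    {"id": 1, "name": "John Smith", "email": "john.smith@example.com",
     "phone": "+1-123-456-7890", "address": "123 Main St, Anytown, CA, 12345"},
    {"id": 2, "name": "Jane Doe", "email": "jane.doe@example.com",
     "phone": "+1-987-654-3210", "address": "456 Oak St, Anytown, CA, 12345"},
]

# Hand-written equivalent of the query { user { name address } }.
selection = {"name": None, "address": None}
response = {"user": select_fields(stored_users, selection)}
```

Because the selection can nest, the same function would also handle a query that asks for fields of related objects.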
But GraphQL also makes it easy to solve the under-fetching problem by enabling developers to access nested data. Imagine if a view needed a user’s employer and posts—the following GraphQL query could be used:
{
  user {
    name
    address
    posts {
      title
      description
      image
    }
    organization {
      name
      url
    }
  }
}
And could result in:
{
  "user": [
    {
      "name": "John Smith",
      "address": "123 Main St, Anytown, CA, 12345",
      "posts": [
        {
          "title": "Post Title 1",
          "description": "Description of post 1",
          "image": "/path/to/image1.jpg"
        },
        {
          "title": "Post Title 2",
          "description": "Description of post 2",
          "image": "/path/to/image2.jpg"
        }
      ],
      "organization": {
        "name": "Acme Corporation",
        "url": "http://acme.example.com"
      }
    },
    {
      "name": "Jane Doe",
      "address": "456 Oak St, Anytown, CA, 12345",
      "posts": [
        {
          "title": "Post Title 3",
          "description": "Description of post 3",
          "image": "/path/to/image3.jpg"
        },
        {
          "title": "Post Title 4",
          "description": "Description of post 4",
          "image": "/path/to/image4.jpg"
        }
      ],
      "organization": {
        "name": "Globex Corporation",
        "url": "http://globex.example.com"
      }
    }
  ]
}
To be clear, this does not solve the underlying N + 1 problem as a whole. While the frontend’s requests to the backend are cut down to a single request, the backend might still need to make N + 1 queries to the database. For some teams, this might be okay, but as N scales, the N + 1 relationship between the backend and the database can become a bottleneck.
There are solutions, however. A common one in the GraphQL ecosystem is DataLoader, a batching utility used alongside GraphQL servers. Instead of accessing subsequent tiers of data (e.g., posts and organizations) iteratively, a DataLoader completes the initial tier first (e.g., users). Then, it makes a single call for each subsequent tier (posts and organizations) with the list of keys from the initial returned list (users). In this example, a DataLoader would cut down the collective request to just three queries: one each for users, posts, and organizations.
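The batching idea can be sketched without any GraphQL library at all. The code below is not the DataLoader API; it is a hand-rolled illustration in which `FakeDatabase`, its methods, and the sample rows are assumptions, and each method call stands for one database query.

```python
class FakeDatabase:
    """In-memory stand-in that counts how many queries were issued."""

    def __init__(self):
        self.query_count = 0
        self._users = [
            {"id": 1, "name": "John Smith", "organization_id": 10},
            {"id": 2, "name": "Jane Doe", "organization_id": 20},
        ]
        self._posts = [
            {"user_id": 1, "title": "Post Title 1"},
            {"user_id": 1, "title": "Post Title 2"},
            {"user_id": 2, "title": "Post Title 3"},
        ]
        self._organizations = {
            10: {"name": "Acme Corporation"},
            20: {"name": "Globex Corporation"},
        }

    def all_users(self):
        self.query_count += 1
        return list(self._users)

    def posts_for_users(self, user_ids):
        """One query covering every user, comparable to WHERE user_id IN (...)."""
        self.query_count += 1
        wanted = set(user_ids)
        return [post for post in self._posts if post["user_id"] in wanted]

    def organizations_by_id(self, organization_ids):
        self.query_count += 1
        return {key: self._organizations[key] for key in set(organization_ids)}


def load_users_batched(db):
    users = db.all_users()  # tier 1: one query
    posts = db.posts_for_users([u["id"] for u in users])  # tier 2: one query
    orgs = db.organizations_by_id([u["organization_id"] for u in users])  # one query

    posts_by_user = {}
    for post in posts:
        posts_by_user.setdefault(post["user_id"], []).append({"title": post["title"]})

    return [
        {
            "name": u["name"],
            "posts": posts_by_user.get(u["id"], []),
            "organization": orgs[u["organization_id"]],
        }
        for u in users
    ]


db = FakeDatabase()
result = load_users_batched(db)
print(db.query_count)  # 3, regardless of how many users there are
```

The query count stays at three as the user list grows, at the price of stitching the tiers back together in application code.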
While GraphQL can provide some significant advantages, it’s not all sunshine and roses, of course. There are a few disadvantages to GraphQL when compared to other options like a REST API. The good news is that most have rather straightforward solutions.
- Higher complexity floor. GraphQL is more flexible than REST, but it’s also more complex. For a simple application, a REST API can be easier to set up and transact with. GraphQL’s setup work (which also requires a third-party framework to link GraphQL to a data source) is more involved. Additionally, fewer engineers are familiar with GraphQL than with REST, so teams should expect a learning curve.
- Poorer errors. While REST errors are only as helpful as how much work was put into the REST API, they tend to provide fairly descriptive errors. At the very least, since each REST endpoint has a specific purpose, context helps developers solve their problems. GraphQL, meanwhile, uses a single endpoint, which can make its vague errors unhelpful. (This is reminiscent of SQL’s notoriously frustrating error reporting…!)
- Tunable efficiency. Earlier, we discussed the efficiency boosts that GraphQL offers through its inherent ability to access tiered data. However, REST’s open-ended nature does enable developers to tune specific endpoints to query and cache efficiently for complex needs. GraphQL is often faster, but in certain N + 1 scenarios it may be slower. Granted, DataLoaders can address this, but they take more development work to implement.
- File uploading. GraphQL doesn’t include native file uploading support—how file uploading works is left up to the developer. However, there are many third-party libraries like https://github.com/jaydenseric/graphql-upload to enable file uploading. This is less relevant for developers that use a dedicated file uploading service like Uploadcare or Cloudinary to translate files to public URLs.
- Caching. Because GraphQL has a single endpoint (as opposed to REST’s many specific endpoints), caching is more difficult because requests can be quite different. PersistGraphQL is a popular GraphQL third-party library that fixes this by creating a cache map between a hashed query and previous, cacheable results.
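The caching bullet above mentions mapping a hashed query to previously computed results. A minimal sketch of that idea follows; it is not PersistGraphQL’s API, and the `QueryCache` class and `fake_execute` function are hypothetical names used only for illustration.

```python
import hashlib


class QueryCache:
    """Cache results keyed by a hash of the query text."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(query):
        # Only surrounding whitespace is stripped, so two differently
        # formatted but equivalent queries get different keys.
        return hashlib.sha256(query.strip().encode("utf-8")).hexdigest()

    def fetch(self, query, execute):
        key = self.key_for(query)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = execute(query)
        self._results[key] = result
        return result


def fake_execute(query):
    """Stand-in for actually running the query against a GraphQL server."""
    return {"user": [{"name": "John Smith"}]}


cache = QueryCache()
first = cache.fetch("{ user { name } }", fake_execute)
second = cache.fetch("{ user { name } }", fake_execute)  # served from the cache
```

A production cache would also need invalidation when the underlying data changes, which this sketch leaves out.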
There are many fantastic products out there that use GraphQL. Most products in the Meta ecosystem, such as Facebook and Instagram, use it. It’s also used by Twitter, publications such as the New York Times, publishing sites like Medium, and business applications like Intuit and Shopify.
Additionally, many Retool customers use GraphQL, specifically to fetch data from their primary databases for their Retool instance. You can easily create a visual interface and build on top of your GraphQL data with a GraphQL GUI.
This overview is not the whole story, of course: you could spend ages digging into how resolvers, directives, or mutations work. But if you’re looking for the essentials, know that GraphQL is a growing query language for APIs, and a runtime for fulfilling those queries, that can help you steer clear of REST’s over- and under-fetching issues. If you want to learn more, take a dip into the Retool archives with this beginner’s guide to the GraphQL ecosystem, or get into the weeds and learn to build a GraphQL admin panel, GraphQL frontend, or an app for your mobile workforce.