As the omnipresent language for communicating with relational databases nears 50, we explore some essential SQL concepts, and the evolving landscape around it.
SQL, or Structured Query Language, is a multi-dialect, domain-specific language designed to manage data in relational databases like MySQL or Postgres. It’s a very old language in computer science years, sparked by IBM scientist Edgar Frank “Ted” Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks.” Later that decade, Don Chamberlin and Ray Boyce, also IBM scientists, were tasked with creating a query language for IBM’s relational database, System R—and created what would eventually be called SQL. The rest, as they say, is history.
Ready? Let's start our exploration of SQL with some basics.
Today, SQL is the most popular data querying language for relational databases. (Some non-relational databases use a different, but SQL-like syntax, too.) It’s used by software developers, database administrators, and others as they query, manage, and manipulate data in databases. And its influence is enormous: there are countless SQL databases, with leading projects including Postgres, MySQL, SQLite, Microsoft SQL Server, and MariaDB.
In fact, SQL has achieved a “lingua franca” status for data access. Similar to how JavaScript is a common example for general-purpose scripting and programming, SQL is omnipresent for anything related to data querying and manipulation. Even in a world of ORMs (object relational mapping software) and abstraction layers, raw SQL is often still necessary.
SQL commands are the instructions you send to the database. Some of the most common SQL commands are CREATE
, INSERT
, SELECT
, UPDATE
, and DELETE
.
The CREATE
command is used to create data structures in a database. CREATE
is used to both create tables and create relationships between those tables. For CREATE
to work, the developer needs to provide the relevant data types or tables for the table or relationships respectively.
1CREATE TABLE customers (
2 id INT NOT NULL AUTO_INCREMENT,
3 name VARCHAR(60),
4 email VARCHAR(60),
5 phone VARCHAR(12),
6 PRIMARY KEY (id)
7);
8
The INSERT
command is used to add entries to a table. An INSERT
command would look something like the following:
1INSERT INTO customers (name, email, phone) VALUES ('Jane Smith', 'jane.smith@example.com', '555-555-5555');
2
This adds a new customer
to the customers
table with the name Jane Smith, email jane.smith@example.com, and phone number 555-555-5555.
The SELECT
command is used to fetch specific columns from a table. Which rows are accessed could be constrained with a WHERE
clause, and all columns can be accessed with the wild-card character *
. For example, a SELECT
statement might looking something like this:
SELECT name, email FROM users WHERE age > 18;
As in the above example, SELECT
commands are often paired with WHERE
clauses which filter the data that is retrieved.
The UPDATE
command is used to make changes to existing entries in a table. An example would look like:
UPDATE users SET email = 'new@example.com' WHERE id = 123;
UPDATE
commands are almost always paired with a WHERE
clause because they’re typically targeting a single entry or tight collection of entries.
Predictably, the DELETE
command deletes entries from a database. For example:
DELETE FROM employees WHERE id = 1;
Like UPDATE
, DELETE
commands are typically paired with a WHERE
clause to specify which data is being deleted. (If all data is to be deleted from a table, developers typically skip DELETE
and use the efficient TRUNCATE TABLE
command.)
SQL is used to define table schemas, which dictate the various types of data stored in a table. It also outlines data’s relationships with other fields in other tables. Table schemas are the core building block of how relational databases work.
For example, using SQL to create two tables with relationships between the tables might look like:
1CREATE TABLE orders (
2id INT AUTO_INCREMENT PRIMARY KEY,
3order_date DATE,
4total DECIMAL(10,2),
5user_id INT,
6FOREIGN KEY (user_id) REFERENCES users(id)
7);
8
9CREATE TABLE users (
10id INT AUTO_INCREMENT PRIMARY KEY,
11name VARCHAR(60),
12age INT,
13city VARCHAR(60)
14);
15
In this example, the orders
table has a foreign key relationship with the users
table. The user_id
field in the orders table references the id
field in the users
table. Because the id
field is the primary key, it is used by the database to efficiently look up values.
If you’re already familiar with SQL, you may have come across some other flavors of it, too. That’s because not unlike many spoken languages, SQL has dialects. Each of these is tuned to a specific database program. While SQL dialects share a similar vocabulary— SELECT
, FROM
, WHERE
, JOIN
, LIMIT
, etc.—dialects might differ in how punctuation is used, what commands have defaults, and what logical elements are permitted. For instance, while SQL itself is not Turing-complete, PG/SQL, or Postgres’s SQL dialect, is. (That is, it has the same logical capacity as any other Turing-complete language.)
A common misconception is that SQL lacks a standardized syntax. But SQL is standardized by ANSI (American National Standards Institute), who are more or less the people for standardization. What’s more at play here is that there are many relational databases, most trying to optimize for a certain use case. Achieving their specific optimizations might require adding something new and independent of the ANSI SQL standard. Eventually, if the optimization becomes popular, ANSI may even add it to the standard. (Though by then, competing databases might have already added their own flavor of the feature—and the dialect proliferation continues!)
It’s no secret that SQL is a language that most developers know, but very few prefer. But why?
One common explanation is that SQL doesn’t look like most programming languages. Take, for instance, this SQL implementation of adding two integers.
1CREATE FUNCTION add(integer, integer)
2 RETURNS integer
3 AS 'select $1 + $2;'
4 LANGUAGE SQL
5 IMMUTABLE
6 RETURNS NULL ON NULL INPUT;
7
This looks very different from a JavaScript implementation of the same function. It’s not exactly controversial to say that the JavaScript version is much simpler:
const add = (a, b) => a + b;
Even in a somewhat tricky language, like C, the function is more readable:
1int add(int a, int b) {
2 return a + b;
3}
4
Granted, this difference in design is well-warranted. JavaScript, C, and most other languages are designed to execute arbitrary logic and have a syntax that represents that. SQL, on the other hand, was designed to transverse and modify relational data. A more typical SQL query would look like the following statement. (This one fetches users that live in California in ascending order based on name
.)
1SELECT name, email
2FROM users
3WHERE state = 'California'
4ORDER BY name ASC;
5
SQL is notorious for returning unhelpful errors. This is partially because SQL statements, unlike most statements in languages like JavaScript, are very long. SQL statements involve filtering data, sorting data, joining data, etc.
For instance, take the erroneous query:
1CREATE TABLE IF NOT EXISTS `category` (
2 `category_id` int(11) NOT NULL AUTO_INCREMENT,
3 `category_name` varchar(255) NOT NULL,
4)
5
which, on MariaDB, results in the following error message:
#1064 — You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'CREATE TABLE IF NOT EXISTS `product` (
SQL lets you know about the syntax error but gives very little detail about where the error is. In this error, it’s implied that the syntax issue is near the start of the statement, but in reality, it’s a missing parenthesis at the end!
Of course, this isn’t too different from other languages—but because those languages tend to have shorter statements (like const a = 45;
), the syntax error can be a bit more obvious.
SQL is often used to interface with massive amounts of data in a database. Given that databases often don’t have an easy “undo” function, a small mistake in SQL could have significant consequences.
Imagine the following update statement:
1UPDATE users
2SET eligible_for_medicare = true
3WHERE age < 65;
4
This statement compiles and runs, but it’s incorrectly setting a flag for users under the age of 65. (The correct statement would flag users above the age of 65.) This is a simple example of how easy it is to make a big mistake. Given SQL’s unique syntax, developers tasked with writing complex SQL are often stressed that a small mistake could mess up a lot of data. While database backups can undo some mistakes, it’s not as simple as hitting Control+Z, and data loss may still occur depending on when the last backup was saved.
No way—you could spend ages digging into how JOINs work, learning to create Views, and uncovering SQL’s memory utilization. But if you’re looking for the essentials, know that SQL is a massively successful language for querying data from relational databases. You can use it to run applications of all sorts, leveraging popular products like Postgres, MySQL, or Oracle. (Try building a SQL GUI!)
And plenty of opportunities lie ahead. With recent advancements in large language models (LLMs) such as ChatGPT, we’re seeing them get pretty good at writing SQL—and optimizing it. (Not only does SQL already have an English-like format, but it also has ridiculous amounts of online documentation that LLMs were trained on.) At Retool, for example, we’ve released a GPT-powered Smart Block that enables developers to write SQL by simply describing a query… and that’s just one example of how SQL can continue to permeate and proliferate.
In short? SQL is flourishing, the ecosystem for writing and managing it is only improving, there’s plenty to learn today and tomorrow—and surviving SQL’s error messages will only make you stronger.
Looking for a SQL GUI for your data warehouse? Need to build a SQL admin panel? Thinking about CRUD operations? We’ve got you covered.
Reader