A lot of databases start off as a spreadsheet. If that’s true for you, there’s a good chance you’re thinking about migrating some data from Google Sheets over to a real database like PostgreSQL. There’s no shortage of posts out there about how to upload spreadsheets to relational databases, so this one is going to go a bit deeper: we’ll cover data model considerations, the CSV upload method, when to go manual, and how to use something like pgAdmin to save time.
Thinking about your data model and other prep
A spreadsheet is like a database in the same sense that fixing a car is like making a drink: it’s not. In particular, when we’re talking about Google Sheets:
- Data types are set on a cell-by-cell basis, not a column basis
- Data types are very limited and abstracted
- There’s no enforcement around missing data or nulls
- There are no relations between tables or foreign keys
- There are no primary keys
In a sense, these characteristics make Google Sheets more similar to something like MongoDB than anything else, which is why migrating from a spreadsheet to NoSQL (or really anything to NoSQL) is a lot easier than doing so to something relational like PostgreSQL.
Anyway, the ways in which your spreadsheet differs from your database are basically your roadmap for the things you’ll need to do to migrate from one to the other.
1. Put together your data model
The first thing you’ll want to do here is consolidate data types for your columns. If you put together the Google Sheet yourself then this is self explanatory. But if another team created it—and especially if there are multiple types per column in said sheet—you’ll need to work together to figure out what the data type should be (a priori) for said column, and how to coerce any existing values that don’t match that type.
Outside of data types, you’ll have to figure out what the setup of tables and their relations should be. Google Sheets is not a relational database (did I mention this already?), so you’ll usually find yourself with all requisite data in a single sheet, likely denormalized, whether it’s the most efficient way to store things or not (probably not). You can copy things over as is, but as a conscientious developer you’re hoping to set this database up properly. As such, you may need to split your sheet into multiple tables and think about how they relate to each other.
2. Create policies for missing data
Although this could conceivably be considered a piece of creating your data model, as part of your migration you will need to set a policy for handling missing data, since Google Sheets will not do this for you (nor should it).
As a reminder, you can set a `NOT NULL` constraint in Postgres, which is basically best practice for most columns. Less known is the fact that you can also create a `NULL` constraint. Whoever wrote the Postgres docs here had a sense of humor about why you’d do this:
“This does not mean that the column must be null, which would surely be useless. Instead, this simply selects the default behavior that the column might be null.”
They also note that this is not technically part of the SQL standard, so use at your own risk (of getting scoffed at by your staff engineer).
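As a sketch, here’s what those constraints look like in DDL (the table and column names here are just for illustration):

```sql
CREATE TABLE users (
    user_id  serial PRIMARY KEY,
    -- NOT NULL rejects any insert or update that leaves email empty
    email    text NOT NULL,
    -- An explicit NULL "constraint" just spells out the default behavior
    -- (the column may be null); it's nonstandard, so most schemas omit it
    nickname text NULL
);
```

One related cleanup trick: Google Sheets exports tend to represent missing data as empty strings rather than nulls, and `NULLIF(nickname, '')` is a handy way to coerce those during import.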
3. Set up user groups and permissions
Though Google Sheets does have some functionality for access controls, it’s built for completely different use cases than relational application databases. You are basically limited to view, comment (this concept does not exist in databases obviously), and edit. Postgres, on the other hand, has a rich and robust set of features for controls and access management.
You’ll want an admin / master role for the application that you plan on accessing this database with. The roles you’ll need will differ based on the use case for the data, e.g. if you want to create a user role for an internal tool vs. an external application. This stuff is mostly self-explanatory and rote but nonetheless is a task you will need to do as part of migration.
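As a rough sketch, the role setup might look something like this (the role, database, and password names are all made up for illustration):

```sql
-- Admin role that your application (and its migrations) connects as
CREATE ROLE app_admin WITH LOGIN PASSWORD 'change-me';

-- Read-only role for, say, an internal tool
CREATE ROLE internal_readonly WITH LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO internal_readonly;
GRANT USAGE ON SCHEMA public TO internal_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO internal_readonly;
```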
With your prep work done, you’re ready to move on to actually moving your data from Google Sheets to Postgres. There are a few ways to do this, and which you choose depends on the shape and volume of your data, plus your priorities and how much time you have.
Method 1: exporting to CSV and uploading
This is the most straightforward way to get data from Google Sheets to Postgres (or any other relational database). The basic gist is: you export the data in your Google Sheet to a CSV, and upload it to your target database. We’ve got a basic set of sample user data you can work with in this sheet here. Start by clicking File → Download → CSV.
Next, you’ll need to create your target table in Postgres. This example actually underscores the importance of figuring out your data model in advance: the lightsaber color column is a bunch of text right now, and we should probably separate it out into a `lightsaber_colors` table with a foreign key out to this `users` table. There’s also a bunch of missing data we’ll need to deal with.
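Before the import can run, the target table needs to exist. A minimal version matching the CSV’s columns might look like this (the types are assumptions based on the sample data):

```sql
CREATE TABLE users (
    user_id          serial PRIMARY KEY,
    first_name       text NOT NULL,
    last_name        text,
    email            text,
    lightsaber_color text,  -- a candidate for its own table later
    side             text
);
```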
To actually get this data into Postgres, we’ll use the `COPY` command. The basic gist of the syntax is as follows:

```sql
COPY users(first_name, last_name, email, lightsaber_color, side)
FROM '/path/to/sample_user_data.csv'
DELIMITER ',' CSV HEADER;
```
In a perfect world this would just work, but alas, our world is anything but perfect. In my experience failing to run this command, I’ve run into any number of the following issues:
- Column names in the CSV are not identical to the target table
- Postgres can’t locate the CSV—especially if you’re running it on a server somewhere (e.g. RDS), since `COPY` reads the file from the server’s filesystem, not yours
- Missing data causes errors
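For the second issue, one workaround is psql’s client-side `\copy` meta-command, which reads the file from your machine and streams it to the server:

```sql
-- Run inside psql; the path is resolved on the client, not the server
\copy users(first_name, last_name, email, lightsaber_color, side) FROM 'sample_user_data.csv' DELIMITER ',' CSV HEADER;
```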
It’s unlikely that you’ll be copying over enough data to push Postgres’s performance limits, but there’s an interesting discussion about `COPY` throughput here that’s worth looking at.
Method 1.5: manual upload
The example we’re using is pretty simple, and even then there’s a discussion to be had about breaking our sheet up into multiple relations. If your setup is even this complicated, and certainly if it’s more complex, you may need to upload your Google Sheets data in a more, shall we say, manual way.
I went ahead and messed with the example data to make it more relational. First, we’ve filled in empty cells with `NULL`. Instead of a Lightsaber Color column that would end up as `TEXT` in Postgres, it’s now a Lightsaber Color ID. The Lightsaber Color ID is a foreign key integer out to a (theoretical) Lightsaber Colors table, represented temporarily by another sheet.
This way we can make updates to colors without needing to update every single value in our users table.
After creating two tables in Postgres, we can now copy respective CSVs into each table, or just add in the data manually (depending on your volume). Note that we will again need to change the column names to match what we have in Postgres—whoever is maintaining this Google Sheet will probably want to keep them sentence case with spaces, while we certainly should not do that in our database.
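Putting that together, the two tables might be created along these lines (the column names and types are assumptions based on the sheet):

```sql
CREATE TABLE lightsaber_colors (
    color_id serial PRIMARY KEY,
    color    text NOT NULL
);

CREATE TABLE users (
    user_id             serial PRIMARY KEY,
    first_name          text NOT NULL,
    last_name           text,
    email               text,
    -- Foreign key out to the colors table
    lightsaber_color_id integer REFERENCES lightsaber_colors (color_id),
    side                text
);
```

Now renaming a color is a single-row update (e.g. `UPDATE lightsaber_colors SET color = 'emerald' WHERE color_id = 2;`) and every referencing user picks it up automatically.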
With our more relational data model, we can write a query like this to pull the same data we had in our original sheet:

```sql
SELECT u.first_name, l.color AS lightsaber_color
FROM users u
JOIN lightsaber_colors l ON u.lightsaber_color_id = l.color_id;
```
Purely for those curious, Google Sheets does have a REST API, and you could (at least in theory) use it to migrate data into Postgres. You’d need some sort of server, or at least a function, to do it, and it’s probably more work than it’s worth.
Method 2: using a 3rd party service or GUI
If the CSV method doesn’t tickle your fancy, there are several GUIs you can use to import data into a Postgres database. In pgAdmin, for example, you can right-click a table and use the Import/Export dialog. The GUI lets you choose your delimiter, header options, etc.—basically just some buttons on top of that same `COPY` command.
Outside of pgAdmin, whatever GUI you’re using—DBeaver, Navicat, PopSQL, etc.—chances are it has some sort of upload functionality.
Method 3: Retool and Retool Database
Retool also now offers Retool Database, a hosted Postgres database with a generous free tier to get started with a production-ready database right out of the box. It also has a spreadsheet-like UI to make quick edits to data or to your schema.