Generated columns allow you to store automatically generated data in a table without using the INSERT and UPDATE clauses. This useful feature has been part of the MySQL language since the 5.7 version, and it represents an alternative approach to triggers when it comes to generating data. Also, generated columns can help you make your query easier and more efficient.
Let’s now learn everything you need to know to master generated columns in MySQL.
What is a MySQL generated column?
A generated column is similar to a normal column, but you cannot change its value manually. This is because an expression defines how to generate the value of a generated column based on the other values read from the other columns of the same row. So, a generated column works within the domain of a table, and its definition cannot involve JOIN statements.
In other words, you can think of a generated column as a sort of view but limited to columns. Notice that generated columns are different from SQL triggers, and you can define them only in CREATE TABLE
or ALTER TABLE
statements with the syntax below:
generate_column_ame column_type [GENERATED ALWAYS] AS (generation_expression) [VIRTUAL | STORED] [NOT NULL | NULL] [UNIQUE [KEY]] [[PRIMARY] KEY] [COMMENT 'string']
The AS (generated_column_expression)
clause specifies that the column you are adding or updating to a table is a generated column. The generation_expression
defines the expression that MySQL will use to compute the column values, and it cannot reference another generated column or anything other than the columns of the current table. Also, notice that the generation expression can only involve immutable functions. For example, you cannot use the CURDATE()
function that returns the current date in a generated column expression definition because it is a mutable function.
You can also precede AS with the GENERATED ALWAYS
keywords to make the generated nature of the column more explicit, but this is optional. Then, you can indicate whether the type of the generated column is VIRTUAL
or STORED
. You will learn the difference between the two types in the chapter below. By default, if not explicitly specified in the query, MySQL marks a generated column as VIRTUAL
.
Let’s now see the generated column syntax in action in a CREATE TABLE
query:
CREATE TABLE users ( id INT AUTO_INCREMENT PRIMARY KEY, email VARCHAR(100) NOT NULL, first_name VARCHAR(60) NOT NULL, last_name VARCHAR(60) NOT NULL, full_name varchar(120) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) );
In this example, the full_name
column will automatically store the concatenation of the first_name
and last_name
columns.
What types of generated columns exist in MySQL?
As explained earlier, you can define a generated column as VIRTUAL
or STORED
. Let’s now take a closer look at these two types.
Virtual generated columns
MySQL does not store a generated column marked as VIRTUAL. This means that MySQL evaluates its value on the fly when required, immediately after any BEFORE
triggers. In other terms, a virtual generated takes no storage space.
Stored generated columns
MySQL stores any generated column marked as STORED
. This means that MySQL takes care of evaluating its value and storing it on the disk every time you insert or update a row. In other terms, a stored column requires storage space as if it were a normal column.
Virtual generated columns vs Stored generated columns
Let’s now learn more about the pros and cons of virtual and stored generated columns.
Virtual generated columns
Pros
- Their creation is instantaneous because MySQL only has to change the table metadata
- They take no disk space
-
INSERT
andUPDATE
come with no overhead because MySQL does not need to generate them
Cons
- MySQL has to evaluate them when reading a table, making
SELECT
queries involving them slower
Stored generated columns
Pros
- MySQL can read them as if they were normal columns, which assures a fast retrieval with no overhead
Cons
- When added to a new table, MySQL has to rebuild the entire table
-
INSERT
orUPDATE
comes with an overhead because MySQL has to generate them - They take disk space
Also, notice that you can mix VIRTUAL and STORED columns within a table, and they both support MySQL indexes and secondary indexes. However, as explained in the official documentation, secondary indexes on virtual columns take up less space and memory compared to stored generated columns. So, virtual generated columns are more efficient when it comes to secondary indexes.
Why adopt generated columns?
There are several reasons to adopt generated columns, but the three below are the most important ones.
- They provide you with cache capabilities to make your queries faster: generated columns give you the possibility to define columns containing useful data you can then efficiently retrieve whenever you need.
- They allow you to simplify query expressions: instead of making your queries complex, you can spread the complexity over the generated columns and then use them in simple filter operations.
- They empower you to define functional indexes: MySQL implements functional indexes as hidden virtual generated columns. In other terms, generated columns give you the possibility to define efficient and advanced indexes involving MySQL functions.
MySQL generated columns in action in real-world examples
Let’s now see generated columns in action in some examples coming from my experience as a backend developer collaborating with data-driven startups.
Using a generated column to concatenate columns for consistency reasons
When developing the frontend of your application, you may notice certain patterns in data representation. For example, in sports, players on a team are generally identified with the following format:
first_name [first_carachter_middle_name.] last_name [(jersey_number)]
As you can see, you can easily generate this data field by aggregating the four columns with the following generated column:
string_identifier VARCHAR(255) GENERATED ALWAYS AS (CONCAT(first_name, ' ', IF(middle_name IS NULL, '', CONCAT(LEFT(middle_name, 1), '. ')), last_name, IF(jersey_number IS NULL, '', CONCAT(' (', jersey_number, ')')))) STORED
This would produce:
Cristiano Ronaldo (7)
Lionel A. Messi (10)
In this case, a generated column empowers you to standardize the data filed format directly at the database level. In addition, a stored generated column avoids the inevitable overhead of constructing this field every time you need it.
Employing a generated column to automatically generate hashed IDs
Typically, you use the IDs of your resources in the URL of your website or REST APIs to retrieve the data you need. But publicly exposing your IDs could pose a security problem. Especially, when using autoincremental IDs, which are easy to predict and make scraping or bot attacks easier.
To avoid this, you can think about hiding your original IDs through the use of automatically generated, random, more secure public IDs. You can achieve this with a virtual generated column by hashing your IDs as follows:
public_id VARCHAR(40) GENERATED ALWAYS AS SHA1(CONCAT("PLAYER", id)) VIRTUAL
Notice that to avoid generating known hash values, you can concatenate your ID with a special keyword. Learn more about MySQL encryption and compression functions here.
Defining a generated column to simplify data filtering
When filtering data, some columns are more useful than others. Also, it is often necessary to change the representation of the data to make filtering simpler or more intuitive. Instead of doing this in every filter query, you can define a helpful generated column storing the information you need to perform the filtering, in the desired format.
For example, you could define a generated column to more easily find the players on a basketball team as follows:
filter_string VARCHAR(255) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name, IF(birthdate IS NULL, '', CONCAT(' ', DATE_FORMAT(birthdate, "%m-%d-%Y")))) STORED
Such a column would produce:
LeBron James 12-30-1984
Stephen Curry 03-14-1988
This contains data useful for filtering and with the player’s birthdate in the US format.
Generated columns vs. Triggers
As explained previously, you can only use generated columns within a table. Also, they can only involve immutable functions, and MySQL generates their value in response to an INSERT
or UPDATE
query.
On the other hand, a trigger is a stored program MySQL automatically executes whenever an INSERT
, UPDATE
or DELETE
event associated with a particular table occurs. In other terms, triggers can involve several tables and all MySQL functions. This makes them a complete solution compared to generated columns. At the same time, MySQL triggers are inherently more complex to use and define and also slower than generated columns.
Final Thoughts
MySQL generated columns are undoubtedly a great tool that can serve many uses and bring numerous benefits. Adopting them is easy and effective, although not many database developers use them. Either way, knowing how to take advantage of them may become essential to making your backend application faster, cleaner, and less complex. Explaining what they are and how to use them through real-world examples was why I wrote this article!