
Don’t miss the great overview of PostgreSQL 14 that Umair Shahid recently put together, PostgreSQL 14 – Performance, Security, Usability, and Observability!
Range data types have been in Postgresql for some time now, however, as of PostgreSQL 14, we have seen the introduction of multi-range data types. Before we cover that, let’s cover the basics of what the heck kind of voodoo black magic a range data type is, and why would you ever want to use it before diving into the multi-range variety.
Range data types store a beginning and end value. The stored data could be an integer or a DateTime (timestamp). Note: if you are interested, you can also create your own custom range types as well. Why would you use this? I can think of a few interesting use cases, but the most generic one is keeping track of state changes and durations. For example:
- This machine was running from X time to Y time.
- This room was occupied between X and Y.
- This sale/price is active within these time frames only.
Range data has been stored for ages without a specific “range” data type. Let’s show you how we used to do it and how range and multi-range types help us.
For our example, let us assume you want to track the days and times the HOSS wears his Cleveland Browns hat -vs- his elephant hat (practical, right?).
Using Classic SQL to Interact with Ranges
Here is how you would classically do something like this:
create table matts_hats_1 ( id serial primary key, matts_hat varchar(100), start_ts timestamp, end_ts timestamp ); insert into matts_hats_1 (matts_hat,start_ts,end_ts) values ('Browns Hat', '2021-10-01 6:00', '2021-10-01 10:00'); insert into matts_hats_1 (matts_hat,start_ts,end_ts) values ('Elephant Hat', '2021-10-01 10:00', '2021-10-01 12:00'); insert into matts_hats_1 (matts_hat,start_ts,end_ts) values ('Browns Hat', '2021-10-01 14:00', '2021-10-01 20:00'); insert into matts_hats_1 (matts_hat,start_ts,end_ts) values ('Elephant Hat', '2021-10-01 22:00', '2021-10-01 23:00');
You can insert which hat I am wearing and when I start and stop wearing that hat. To see what hat I am wearing at any one time, I would use:
yonk=# select id, matts_hat from matts_hats_1 where start_ts <= '2021-10-01 7:35'::timestamp and end_ts >= '2021-10-01 7:35'::timestamp; id | matts_hat ----+------------ 1 | Browns Hat
To see what hat I am wearing from 7 am to 11 am, I would need to do a couple of adjustments. First, I may have started wearing the hat before 7 am and potentially wore the hat after 11 am. In order to find these, I will need to look at start times before 7 am with end times after 7 am and start_times before 11 am with an end time after 11 am. That looks like this:
yonk=# select id, matts_hat from matts_hats_1 where (start_ts <= '2021-10-01 07:00'::timestamp and end_ts >= '2021-10-01 07:00'::timestamp) or (start_ts <= '2021-10-01 11:00'::timestamp and end_ts >= '2021-10-01 11:00'::timestamp); id | matts_hat ----+-------------- 1 | Browns Hat 2 | Elephant Hat (2 rows)
The issue with this approach is that it quickly gets pretty complicated and it negates proper indexes. See as we try and add an index:
yonk=# create index matts_hats_1_idx on matts_hats_1 (start_ts, end_ts); CREATE INDEX yonk=# explain select id, matts_hat from matts_hats_1 where (start_ts <= '2021-10-01 07:00'::timestamp and end_ts >= '2021-10-01 07:00'::timestamp) or (start_ts <= '2021-10-01 11:00'::timestamp and end_ts >= '2021-10-01 11:00'::timestamp); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------- Seq Scan on matts_hats_1 (cost=0.00..1.08 rows=1 width=222) Filter: (((start_ts <= '2021-10-01 07:00:00'::timestamp without time zone) AND (end_ts >= '2021-10-01 07:00:00'::timestamp without time zone)) OR ((start_ts <= '2021-10-01 11:00:00'::timestamp without time zone) AN D (end_ts >= '2021-10-01 11:00:00'::timestamp without time zone))) (2 rows)
Introduction to Ranges in PostgreSQL
This is where the “range” data types come into play. Let’s take a look at this data and query using a range.
create table matts_hats_2 ( id serial primary key, matts_hat varchar(100), hat_ts tsrange ); insert into matts_hats_2 (matts_hat,hat_ts) values ('Browns Hat', '[2021-10-01 6:00, 2021-10-01 10:00]'); insert into matts_hats_2 (matts_hat,hat_ts) values ('Elephant Hat', '[2021-10-01 10:00, 2021-10-01 12:00]'); insert into matts_hats_2 (matts_hat,hat_ts) values ('Browns Hat', '[2021-10-01 14:00, 2021-10-01 20:00]'); insert into matts_hats_2 (matts_hat,hat_ts) values ('Elephant Hat', '[2021-10-01 22:00, 2021-10-01 23:00]');
Out of the box, you can create ranges with integers, numerics, timestamps, or dates, and if you need other data (like a float) you can add these custom ones (check the docs for more info). You can also put constraints on the ranges to prevent data from overlapping or enforce certain rules, but I won’t cover that in this blog.
Now if we wanted to find what hat I was wearing at 7:35 am like above, I would find that with the following:
yonk=# select id, matts_hat from matts_hats_2 where hat_ts @> '2021-10-01 7:35'::timestamp; id | matts_hat ----+------------ 1 | Browns Hat
Notice the different operator when checking for the range? Instead of “=” I am using the “@>” (the containment operator). PostgreSQL has a set of operators that are used when interacting with ranges. In this case, the @> is checking if the range to the left ( hat_ts ) contains the value ‘2021-10-01 7:35’. The most common operators which I use in this blog are:
@> , <@ | See if an element or range is part of or contains the other value |
&& | Do the ranges overlap |
+ | Creates a union between two ranges |
– | Removes one range from another |
There are other operators, so check out the docs for the complete list in the docs.
Now to get the hats I was wearing between 7 am and 11 am with a tsrange data type, I would issue the following command:
yonk=# select id, matts_hat from matts_hats_2 where hat_ts && '[2021-10-01 7:00, 2021-10-01 11:00]'; id | matts_hat ----+-------------- 1 | Browns Hat 2 | Elephant Hat
Notice how that looks much cleaner. In this case, we are checking if the range 7 am-11 am overlaps with the hat_ts field. Indexing will also work with range data types via gist.
CREATE INDEX matts_hats_2_idx ON matts_hats_2 USING GIST (hat_ts); yonk=# explain select id, matts_hat from matts_hats_2 where hat_ts && '["2021-10-01 7:00","2021-10-01 11:00"]'; QUERY PLAN ------------------------------------------------------------------------------------------ Bitmap Heap Scan on matts_hats_2 (cost=4.17..14.51 rows=3 width=15) Recheck Cond: (hat_ts && '["2021-10-01 07:00:00","2021-10-01 11:00:00"]'::tsrange) -> Bitmap Index Scan on matts_hats_2_idx (cost=0.00..4.17 rows=3 width=0) Index Cond: (hat_ts && '["2021-10-01 07:00:00","2021-10-01 11:00:00"]'::tsrange)
I can modify the ranges as well, adding or removing time from what is already stored. Let’s go ahead and remove 30 minutes from my hat-wearing time.
yonk=# update matts_hats_2 set hat_ts = hat_ts - '[2021-10-01 14:00,2021-10-01 14:30]' where id = 3; UPDATE 1 yonk=# select * from matts_hats_2 where id = 3; id | matts_hat | hat_ts ----+------------+----------------------------------------------- 3 | Browns Hat | ("2021-10-01 14:30:00","2021-10-01 20:00:00"] (1 row)
This method works as long as you are shrinking or extending the range. However, removing or adding ranges gets tricky (at least before PostgreSQL 14) when removing the middle of a range or adding a non-contiguous hat-wearing time.
yonk=# update matts_hats_2 set hat_ts = hat_ts - '[2021-10-01 15:00,2021-10-01 15:30]' where id = 3; ERROR: result of range difference would not be contiguous
Introduction to Multi-Ranges in PostgreSQL 14
Before PostgreSQL 14, you would need to add new records to facilitate having multiple non-contiguous blocks. However, in PG14 you have the option to use a “Multi Range” type. Let’s see how this works:
create table matts_hats_3 ( id serial primary key, matts_hat varchar(100), hat_date date, hat_ts tsmultirange ); insert into matts_hats_3 (matts_hat,hat_date,hat_ts) values ('Browns Hat', '2021-10-01', '{[2021-10-01 6:00, 2021-10-01 10:00],[2021-10-01 14:00, 2021-10-01 20:00]}'); insert into matts_hats_3 (matts_hat,hat_date,hat_ts) values ('Elephant Hat', '2021-10-01', '{[2021-10-01 10:00, 2021-10-01 12:00],[2021-10-01 22:00, 2021-10-01 23:00]}');
You can see I consolidated all my time wearing each hat into one record for each. I can run the same queries I used for the range examples here with the same results. Now, however, I can add or remove additional non-contiguous ranges into that record:
yonk=# select * from matts_hats_3 where id = 1; id | matts_hat | hat_date | hat_ts ----+------------+------------+----------------------------------------------------------------------------------------------- 1 | Browns Hat | 2021-10-01 | {["2021-10-01 06:00:00","2021-10-01 10:00:00"],["2021-10-01 14:00:00","2021-10-01 20:00:00"]} (1 row) yonk=# update matts_hats_3 set hat_ts = hat_ts - '{[2021-10-01 15:00,2021-10-01 15:30]}' where id = 1; UPDATE 1 yonk=# select * from matts_hats_3 where id = 1; id | matts_hat | hat_date | hat_ts ----+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------- 1 | Browns Hat | 2021-10-01 | {["2021-10-01 06:00:00","2021-10-01 10:00:00"],["2021-10-01 14:00:00","2021-10-01 15:00:00"),("2021-10-01 15:30:00","2021-10-01 20:00:00"]} (1 row)4
Removing a block of time from the middle of the range now creates a third range. This functionality can be an incredibly efficient and powerful way of storing and seeing when something is active or not. This can also be useful for auditing and determining what conditions were active during a certain period.
Word of caution: Just because you can do something does not mean you should. I have not been able to find any hard limit on the number of distinct ranges you can store in a single field. In fact, I was able to add over 100K distinct ranges into one field without PostgreSQL complaining. That said, the performance of modifying or using that many different ranges is potentially very limiting; this was especially telling when adding or splitting ranges.
More 101 on Using Ranges in PostgreSQL
You may have noticed the ranges are bracketed with “[]“; these have a special meaning in PostgreSQL. They specify whether the range should include the lower/upper bounds value when evaluating the data. Here is a quick primer:
Range | Description |
( | Exclusive lower range value, think of this like a > ( will not include the specified lower value) |
) | Exclusive upper range value, think of this like a < ( will not include the specified upper value) |
[ | Inclusive lower range value, think of this like a >= ( will include the specified lower value) |
] | Inclusive upper range value, think of this like a <= ( will include the specified upper value) |
{} | These are reserved for multi-range data types |
You can visualize this if you load data into a test table:
create table test_range ( int_range int4range, int_mrange int4multirange, ts_range tsrange, ts_mrange tsmultirange, my_desc varchar(100) ); insert into test_range values ('[1,10]','{[1,4],[6,10]}','[2021-10-01 10:30,2021-10-01 22:30]','{[2021-10-01 10:30,2021-10-01 12:30],[2021-10-01 16:30,2021-10-01 22:30]}','Inclusive upper/ Inclusive lower'); insert into test_range values ('(1,10]','{(1,4],(6,10]}','(2021-10-01 10:30,2021-10-01 22:30]','{(2021-10-01 10:30,2021-10-01 12:30],(2021-10-01 16:30,2021-10-01 22:30]}','Inclusive upper/ Exclusive lower'); insert into test_range values ('(1,10)','{(1,4),(6,10)}','(2021-10-01 10:30,2021-10-01 22:30)','{(2021-10-01 10:30,2021-10-01 12:30),(2021-10-01 16:30,2021-10-01 22:30)}','Exclusive upper/ Exclusive lower'); insert into test_range values ('[1,10)','{[1,4),[6,10)}','[2021-10-01 10:30,2021-10-01 22:30)','{[2021-10-01 10:30,2021-10-01 12:30),[2021-10-01 16:30,2021-10-01 22:30)}','Exclusive upper/ Inclusive lower'); yonk=# select * from test_range; int_range | int_mrange | ts_range | ts_mrange | my_desc -----------+----------------+-----------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------- [1,11) | {[1,5),[6,11)} | ["2021-10-01 10:30:00","2021-10-01 22:30:00"] | {["2021-10-01 10:30:00","2021-10-01 12:30:00"],["2021-10-01 16:30:00","2021-10-01 22:30:00"]} | Inclusive upper/ Inclusive lower [2,11) | {[2,5),[7,11)} | ("2021-10-01 10:30:00","2021-10-01 22:30:00"] | {("2021-10-01 10:30:00","2021-10-01 12:30:00"],("2021-10-01 16:30:00","2021-10-01 22:30:00"]} | Inclusive upper/ Exclusive lower [2,10) | {[2,4),[7,10)} | ("2021-10-01 10:30:00","2021-10-01 22:30:00") | {("2021-10-01 10:30:00","2021-10-01 12:30:00"),("2021-10-01 16:30:00","2021-10-01 22:30:00")} | Exclusive upper/ Exclusive lower [1,10) | {[1,4),[6,10)} | ["2021-10-01 10:30:00","2021-10-01 22:30:00") | {["2021-10-01 10:30:00","2021-10-01 12:30:00"),["2021-10-01 16:30:00","2021-10-01 22:30:00")} | Exclusive upper/ Inclusive lower (4 rows)
Final Thoughts
The range data type was already a unique and powerful feature to store multiple non-contiguous ranges. With this addition, developers will have yet another tool in their toolbox to get even more out of PostgreSQL.
As more companies look at migrating away from Oracle or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.
Read Our New White Paper: