Jan Veen

Optimising a PostgreSQL Query

As part of using the stocks bitemporal data model productively I recently encountered a SQL query whose runtime steadily deteriorated to a point where it was very noticeable on end clients. This is how I managed to optimise it.

What is the query?

In a bitemporal data model, every entity is valid for a period of time. During its lifetime it is subject to updates. Every time such an update happens, a new row is inserted into the database stating which values are valid from valid_time_start until valid_time_end. To keep the data consistent it is important that this history has no gaps: each entity has exactly one insertion, some updates and a final deletion. Two separate insertions would mess up the model and complicate other queries, so we restrict the data model here. Hence the model needs a constraint that checks for gaps.
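For reference, here is a minimal sketch of the part of the food table that matters here; the real stocks schema carries more columns, so treat this as an illustration only:

-- Illustrative sketch only, not the real stocks schema.
create table food (
    id                   integer     not null, -- groups all rows of one entity
    valid_time_start     timestamptz not null, -- interval [start, end)
    valid_time_end       timestamptz not null,
    transaction_time_end timestamptz not null  -- 'infinity' marks current rows
);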

Formally, we have a list of intervals for each entity. We have to make sure that for each interval u1 and each interval u2, where u2 ends chronologically after u1, there exists a third interval u3 which either overlaps u1 and extends strictly beyond its end, or overlaps u2 and starts strictly before it. It is also valid for u3 to be equal to either of the other intervals if they are adjacent to one another. For example, given u1 = [1, 3) and u2 = [5, 8), an interval u3 = [3, 5) would bridge the gap and satisfy the condition. Validating this property for all pairs of u1 and u2 implies there are no gaps.

The query below counts pairs of intervals violating this property. If the count is 0, the data is consistent.

To keep the example close to the original we use the entity food and also query for transaction_time_end; both are irrelevant for our discussion. The id identifies all rows belonging to the same entity. Note that start and end form a left-closed, right-open interval [start, end).

select count(*)
from food u1, food u2
where u1.valid_time_end < u2.valid_time_end
and u1.id = u2.id
and u1.transaction_time_end = 'infinity'
and u2.transaction_time_end = 'infinity'
and not exists(
  select *
  from food u3
  where u3.id = u1.id
  and u3.transaction_time_end = 'infinity'
  and (
       (u3.valid_time_start <= u1.valid_time_end and u1.valid_time_end < u3.valid_time_end)
    or (u3.valid_time_start < u2.valid_time_start and u2.valid_time_start <= u3.valid_time_end))
)
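To see the query in action, consider this made-up history for a single entity with a hole between 2024-01-05 and 2024-01-07. For the pair formed by the two rows, no third row bridges the hole, so the query counts one violation:

-- Illustrative data only: entity 1 is valid for [01-01, 01-05) and
-- again for [01-07, 01-10), leaving a gap in between.
insert into food (id, valid_time_start, valid_time_end, transaction_time_end)
values (1, '2024-01-01', '2024-01-05', 'infinity'),
       (1, '2024-01-07', '2024-01-10', 'infinity');
-- The query above now returns a count of 1.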

Why is it slow?

To analyse the performance problem I looked at the query plan, generated with explain analyze on a test system loaded with a production dump:

 1  Aggregate  (cost=3544.51..3544.52 rows=1 width=8) (actual time=1821.258..1821.265 rows=1 loops=1)
 2    ->  Hash Anti Join  (cost=187.12..3380.82 rows=65476 width=0) (actual time=1821.251..1821.256 rows=0 loops=1)
 3          Hash Cond: (u1.id = u3.id)
 4          Join Filter: (((u3.valid_time_start <= u1.valid_time_end) AND (u1.valid_time_end < u3.valid_time_end)) OR ((u3.valid_time_start < u2.valid_time_start) AND (u2.valid_time_start <= u3.valid_time_end)))
 5          Rows Removed by Join Filter: 4845498
 6          ->  Nested Loop  (cost=0.57..1386.78 rows=82868 width=20) (actual time=0.074..206.128 rows=114340 loops=1)
 7                ->  Index Only Scan using food_current on food u2  (cost=0.28..140.06 rows=3719 width=20) (actual time=0.033..1.555 rows=3719 loops=1)
 8                      Heap Fetches: 0
 9                ->  Memoize  (cost=0.29..0.39 rows=4 width=12) (actual time=0.008..0.044 rows=31 loops=3719)
10                      Cache Key: u2.valid_time_end, u2.id
11                      Cache Mode: binary
12                      Hits: 0  Misses: 3719  Evictions: 229  Overflows: 0  Memory Usage: 4097kB
13                      ->  Index Only Scan using test5 on food u1  (cost=0.28..0.38 rows=4 width=12) (actual time=0.007..0.019 rows=31 loops=3719)
14                            Index Cond: ((id = u2.id) AND (valid_time_end < u2.valid_time_end))
15                            Heap Fetches: 0
16          ->  Hash  (cost=140.06..140.06 rows=3719 width=20) (actual time=4.362..4.363 rows=3719 loops=1)
17                Buckets: 4096  Batches: 1  Memory Usage: 236kB
18                ->  Index Only Scan using food_current on food u3  (cost=0.28..140.06 rows=3719 width=20) (actual time=0.023..1.371 rows=3719 loops=1)
19                      Heap Fetches: 0
20  Planning Time: 8.084 ms
21  Execution Time: 1822.818 ms

The test system is much slower than the production system, but for optimising queries a slow system is ideal, as differences in execution time are more observable and extreme.

Helpfully, PostgreSQL prints the execution time (21) right below the plan: 1822 ms is not that fast. So how was the result computed? First (7) all rows for u2 were fetched using an index, which takes a negligible 1.555 ms and yields 3,719 rows. Then (13) u1 is joined with u2, forming all pairs where u2 ends after u1. Stepping one level out (6) we see that this nested loop yields 114,340 rows and took 206 ms.

The Memoize node (9) caches the rows of the inner index scan for repeated cache-key values; I didn't dig into it further, as it wasn't needed to get a rough understanding of the query.

Separately (16) we do another index scan to get u3, again yielding 3,719 rows. This result set is finally (2) anti-joined with the pairs from before, filtering out a total of 4,845,498 rows (5).

To summarise, this looked to me like a naive implementation. Assuming a single id with n intervals, the runtime is $O(n^3)$: $O(n^2)$ candidate pairs, each probed against up to n rows of u3. That doesn't scale well.

What can we do?

First I ran some experiments to see whether any kind of index would accelerate the query, using the nice tool hypopg, which fools PostgreSQL's planner into believing a fast index exists while it is only a no-op dummy, saving us from actually building and storing the potentially large index. Unfortunately none of the indices I came up with made a huge difference; the runtime was still $O(n^3)$ and it was slooow.
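For illustration, such an experiment looks roughly like this; the index definition shown is just one plausible candidate, not necessarily one I actually tried:

create extension if not exists hypopg;

-- Register a hypothetical index: the planner sees it, but it is never built.
select * from hypopg_create_index(
  'create index on food (id, valid_time_end)');

-- A plain explain (without analyze) now shows whether the planner would use it.
explain
select count(*)
from food u1, food u2
where u1.id = u2.id and u1.valid_time_end < u2.valid_time_end;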

So I stepped back and thought about the problem more abstractly. We are given n intervals and want to find a gap. This can easily be solved by sorting all intervals by their start value and then, for each consecutive pair, checking whether the end of the first reaches the start of the second. A pair failing this check constitutes a gap. This algorithm intuitively has a runtime of $O(n \log n)$, dominated by sorting the intervals.

The careful reader might notice this procedure is wrong in general. What I didn't state before is that when checking this gap constraint we may assume that intervals don't overlap. Under that assumption the procedure works fine, as the sketch below illustrates.
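The idea translates almost directly into SQL with a window function. This is only a sketch of the sort-and-scan approach under the no-overlap assumption, not the query I ended up with:

-- Sort-and-scan sketch: compare each interval's end with the start of
-- the chronologically next interval of the same entity. Assumes the
-- intervals of one entity don't overlap.
select id, valid_time_end as gap_start, next_start as gap_end
from (
  select id, valid_time_end,
         lead(valid_time_start) over (partition by id
                                      order by valid_time_start) as next_start
  from food
  where transaction_time_end = 'infinity') consecutive
where next_start > valid_time_end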

Thinking even more abstractly: forming the union of all intervals should again result in a single interval rather than a set of non-contiguous intervals. Is there a way to express any of these ideas in PostgreSQL?

Crunching the documentation shows that we can utilise PostgreSQL's mighty type system, which has a native datatype for intervals ("ranges") and even multiranges representing a set of intervals that may have gaps. It also has a comprehensive list of functions and operators on them. Very interesting.
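A tiny illustration of the concept with made-up timestamps: aggregating two non-adjacent ranges yields a multirange consisting of two separate parts, so the gap is directly visible in the result:

-- range_agg() unions ranges into a multirange; non-contiguous inputs
-- remain separate parts of the result.
select range_agg(r)
from (values
  (tstzrange('2024-01-01', '2024-01-05')),
  (tstzrange('2024-01-07', '2024-01-09'))) as intervals(r);
-- yields a multirange with two parts: {[...01-01, ...01-05), [...01-07, ...01-09)}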

The solution can be expressed elegantly as follows: group the entities by id, aggregate all their intervals into a multirange union, and then ask: is the interval formed by the start and end of the multirange completely contained in the multirange? If not, there must be a gap. This is the query:

select versions_as_range.id
from (
  select id, tstzrange(valid_time_start, valid_time_end) part
  from food
  where transaction_time_end = 'infinity') versions_as_range
group by id
having not range_agg(versions_as_range.part) @> tstzrange(lower(range_agg(versions_as_range.part)), upper(range_agg(versions_as_range.part)))
limit 1

tstzrange() forms a range from two values and range_agg() aggregates a set of ranges into their union, a multirange. The @> operator tests containment. Note that instead of counting violations, this query returns the id of one violating entity, which serves equally well as a consistency check.
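The logic of the having clause can be tried in isolation on a multirange literal, using int4 ranges for brevity:

-- The span from lower() to upper() covers the hole [3,5), but the
-- multirange itself does not, so containment fails and the gap is found.
select m @> int4range(lower(m), upper(m)) as contiguous
from (select '{[1,3),[5,8)}'::int4multirange as m) as example;
-- returns false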

How does this query perform? Let's look at the query plan:

1  Limit  (cost=0.28..0.85 rows=1 width=4) (actual time=7.731..7.733 rows=0 loops=1)
2    ->  GroupAggregate  (cost=0.28..174.96 rows=309 width=4) (actual time=7.728..7.729 rows=0 loops=1)
3          Group Key: food.id
4          Filter: (NOT (range_agg(tstzrange(food.valid_time_start, food.valid_time_end)) @> tstzrange(lower(range_agg(tstzrange(food.valid_time_start, food.valid_time_end))), upper(range_agg(tstzrange(food.valid_time_start, food.valid_time_end))))))
5          Rows Removed by Filter: 311
6          ->  Index Only Scan using food_current on food  (cost=0.28..140.06 rows=3719 width=20) (actual time=0.044..1.479 rows=3719 loops=1)
7                Heap Fetches: 0
8  Planning Time: 0.467 ms
9  Execution Time: 7.809 ms

First (6) it fetches the entities as before using an index. But apart from that there is only a grouping by id (3), and all the work of computing unions and checking containment happens inside the built-in range functions (4). The overall execution time has been reduced to about 7.8 ms, a speed-up of roughly 230. Amazing.

And in the end this made stocks much faster when putting food onto its shopping list, which is one of the use cases that checks this property.