Bitemporal Struggles in SQLite

2021-01-23 on this blog

I want to share some of the (partially painful) experience I made while incorporating a bitemporal data model into the Android client of stocks. This mostly involves certain individualities of the SQLite database engine and its interface to Android.

How old are you?

The first bitter pill to swallow was that Android ties the SDK version (think operating system version) to the version of SQLite it offers. At the moment stocks targets API version 24 (Android 7.0 Nougat). This fixes the supported SQLite version of stocks down to 3.9.2 which is as old as November 2015 (also thanks to this stackoverflow answer for collecting the mapping of versions). No there are no window functions in this version and a ton of useful features is missing. But increasing the supported Android version of my app lowers the potential user base and is bad for backward compatibility so I had to stick to this version.

What type should you be?

The first question was what data type to pick to model the bitemporal fields. SQLite offers no dedicated time types, so up to now I have modelled them as strings with a fixed format. This was fine as the dates only had to be stored and no computation was done in SQL. But this was about to change as bitemporal queries often compare time values. If I stick to strings this technically works as my data format allows that the lexicographic ordering of timestamp strings equals the chronological ordering of the represented dates. But it is rather inefficient as the worst-case number of character comparisons of timestamp strings is 26 in my case. However, a timestamp can be encoded in as few as 12 bytes at the same information content, using a Java Instant which internally is a long for the epoch seconds and an int for fractional nanoseconds.

So I evaluated different options and found the SQLite blob datatype which was quite suited to my needs: Comparing blobs was done with memcmp() (see Section 4.1) so embedding the bytes correctly would give me a chronological ordering of the values. Also, it is much more efficient as it only needs 12 "character" comparisons compared to the 26 for string timestamps. I could also write a custom datatype mapping for my Java ORM so I can pass in Instant values and the DB keeps blob values. But it didn't work out for a very simple reason: I could not express the SQL expression "now" to work with my custom date format.

This was a strong requirement for good user experience: Assume a SQL query is run at time "now", say t to display data to the user. If t can't be expressed in SQL one can easily bind it as parameter into the statement. Now the background synchronisation updates the data, identifies the "live" query from before which still shows the data to the user . To update the UI automatically the SQL query is run again. But the data is fully bitemporal, and the query will be run with the same t. So it will give the same answer as before although "now" has progressed and is now a different value t'. So binding a fix value as "now" is not flexible enough. You could argue that a new query must be built using a more recent value for t. However this would require a tight coupling between the UI and all potential backend tables from which it draws its data and leads to much more complexity. Doing this kind of ping-pong notifications like

Background Sync: "Hey DB, here is the new data"
DB: "Thanks. Hey UI, $table has changed"
UI: "Oh good you mention, I actually use $table. Please run this new query"
DB: "Here is your new data."

is neither supported nor encouraged as the UI knows about what tables it draws its data from and also that it has to update it if a notification arrives. Android Jetpack is all about having only communication from DB to UI and not backward again. So this is not viable.

Other ways I tried were to use SQLite's datetime('now') which returns a string timestamp and convert it into my blob format. Bad news ahead: There are no functions that might help to convert the string to a blob. There is a feature to bind custom Java functions into the SQLite context but only for API level 30, so not for me. Another idea was to form a "now" value from the number of Julian Days julianday('now') but this yields a real which again I cannot convert into a blob. All was lost. So I used the already mentioned string timestamps as they have all the features I need while being less efficient.

How infinite are you?

Another peculiarity of my code is that now it knows 3 different values for infinity. First there is 'infinity' in the PostgreSQL server code which is a valid time literal there. Since Java doesn't offer a similar literal, PostgreSQL JDBC driver maps this to 9223372036825200000L milliseconds since epoch (see org.postgresql.PGStatement.DATE_POSITIVE_INFINITY). The corresponding timestamp string +292278994.08.16-23:00:00.000000-+0000 is the value the REST API reports to clients and which clients use in their code. However, SQLite doesn't like years beyond 9999 (see Section 5), so the client DB uses 9999-12-31 23:59:59.999999 as its infinity value and they have to be carefully mapped. Luckily my Java ORM allows for custom types to be mapped into SQL by the client (me), see here.

How present are you?

Now everything is nice and we can start querying, can't we? E.g. with this nice query:

select * from Food
where valid_time_start <= datetime('now')
and datetime('now') < valid_time_end
and transaction_time_end = datetime('9999-12-31 23:59:59.999999')

which just selects the Food now as best known. It turns out that this is still not enough. datetime('now') evaluates to 2021-01-23 08:24:44 right now and as you might note it doesn't contain any fractional seconds. Internally SQLite can operate on up to 3 fractional digits (look for "only the first three digits are significant to the result") but won't give them to the user when evaluating this expression. It can also operate in the presence of more digits, but will exercise strong ignorance on them (i.e. doesn't care). Is this a problem? Yes. Consider the case when an update to Food happens and the query above is run as a reaction. If update and query happen in the same second the query will return a state in time before the update happened because fractional digits in the query are cut off, thus the update will occur only later in time. So the user won't see the update having occurred. This is such a common scenario that I could observe it in the very first pilot versions while implementing the feature. So it is unbearable for users and I needed a more present value than SQLite can offer.

Luckily the data itself is full of recent timestamps. For each table the client already has a record in the Updates table which records when the last update on that table happened. Updates itself is equivalent to selecting the latest transaction_time_start value from each table. Such a value is always at least as recent as any update which happened to the data. Originally this table has been useful for server synchronisation and is useful now for telling the time. So the real "now" expression is: (select max(t) from (select datetime('now') as t union select max(last_update) as t from updates)). What an effort!

If you have paid close attention you might now wonder: "You rely on the data to compute 'now'. Why don't you return to the initial blob datatype as now you can express 'now' by just querying Updates?". The answer is: Because the data is bitemporal. It is possible to store data like "In two weeks from now, the Food "Apple" will be gone". The update representing this data happens today, so the latest timestamp I get from Updates is of today. If the query runs again in two weeks without any intermediate updates it will still assert the presence of the "Apple" as 'now' hasn't progressed for two weeks. So we still need a more recent value for 'now' which datetime('now') luckily is just good enough for.

Summary

These are the struggles I encountered while implementing bitemporal data in SQLite. If I had a free wish to improve on this situation, I would like to have a dedicated, efficiently encoded time type in SQLite which allows for precise statements on time. It should be accompanied by functions which leverage the full potential of such a type. As long as this is not the case one has to work around all the implied shortcomings.