Jan Veen

Tracing my way with tracing-rs

Some time ago, I read about the Rust crate tracing and decided it would be an interesting addition to my application tiny-auth. It provides more powerful ways than the log crate to give operations teams insight into the inner workings of an application, and it is about to replace log as the go-to standard Rust logging framework. Here I want to share some insights, patterns and troubles I had while adding the library to my project. I will not introduce the crate itself here; there are already some recommendable articles and videos available.

While I feel mostly comfortable writing Rust, there are still some corners that scare me every time I encounter them. Any troubles I am presenting might have easy solutions that I am simply not aware of. Feel free to reach out to me if you have something to share.

tracing in general

Ways to log fields

A useful shortcut to keep in your head is the way fields are referenced in log messages. If a field is mentioned without any symbol in front, it must implement tracing's Value trait. The trait is sealed, so it can't be implemented on your own types. To print other values, you can prefix the field with % to delegate to Display or with ? to delegate to Debug. In short:

info!(request_id);  // impl Value
info!(%request_id);  // impl Display
info!(?request_id);  // impl Debug

Context is King

I really like the concept of spans in general. For meaningful logging messages it is key to imagine what information a message conveys without the code context around it. Most importantly: are all variables included that are needed to trace (pun intended) the way the code took to get here? When the application checks state it requires from its environment (a database, files), are the necessary data included (primary keys, search keys)?

Having spans really eases that burden by allowing you to add a value to a span as soon as it becomes relevant. This way it is impossible to forget the value in any event inside the span. For long-term log users it might take a while to adapt and stop including the context in each and every event.

Another quality-of-life aspect are the names of logged variables being captured implicitly. Have you ever encountered a message like this?

WARN could not load 53

Is 53 an order, customer or address id? With tracing you get the context for free:

warn!(customer_id, "could not load");
---
WARN could not load customer_id=53

Have conventions for meaningful spans

Adding to the previous point, a question that quickly arises with span fields is responsibility. Nobody wants to read spans like this:

INFO http_post{user=john}:login{user=john}:user_repo_get{user=john} found

To me it doesn't seem sensible to add fields to a span which are already present in a parent span. So a project should establish conventions to ensure none (or few) of these redundant cases occur. My convention as of now is to add fields of business relevance at the outermost call to a business-logic function. In the example above this would be the login span. Doing it earlier might lead to situations where it is too difficult to determine the username at request entry, e.g. when the username is encoded in a basic authentication header. Doing it too late, on the other hand, will lead to events lacking the information required to comprehend them, rendering them useless.

I might add more conventions when encountering different situations. But I think the important takeaway here is to have any conventions at all.

Logging privacy-sensitive data

For any program processing personally identifiable information (PII), privacy legislation like the GDPR is highly relevant. In the context of tracing, this makes events containing PII subject to these laws. E.g. a user can request deletion of all their data if the entity storing it has no legitimate interest in keeping it. So programmers and system administrators need to keep control of what PII is stored where in their system landscape.

Tracing makes it very easy to assign event data to categories, namely spans. Making sure all PII is only contained in spans of a particular name (talking of conventions again) makes it easy to filter for this data. That way, administrators have full control over where the data goes, or can even keep it out of the logs entirely.

Span events beat artificial log statements

While instrumenting my application, I more than once felt the urge to add a log statement just to ensure the data in the span I had just opened was really logged. The log statement itself didn't convey much meaningful information, so I was not very satisfied with this technique. But I wanted to give the user a chance to get hold of the span data somehow, so I accepted it.

Until I realised that a subscriber is also notified when spans are created, entered, left or dropped. Moreover, tracing_subscriber::fmt supports emitting events for these lifecycle moments, rendering all my artificial log statements superfluous. So I got rid of them again in favour of letting interested users emit span events from the subscriber.

Instrumented instrumentation

It is perfectly possible to instrument the same function more than once. Why would anybody want that? Remember the privacy issue around PII above. If one function parameter contains PII, I want it contained in a span of a special name, while another parameter can go into a different span. This leaves my application with functions like this:

#[instrument(skip(username))]
#[instrument(name = "cid", skip(client_id))]
fn authorize(username: &str, client_id: &str) { ... }

Accessing the span hierarchy?

I came across some cases where I forward-declared a span field, only to span.record() the actual value later, when it became available. This is a useful and supported pattern. However, for it to work, the span needs to be in scope, which required me to pass the span down one or two functions in the call hierarchy, polluting function APIs with a cross-cutting concern (borrowing from AOP terminology).

I would welcome some way to make this goal easier to achieve. One way would be giving the application access not only to the Span::current() but also to the entire stack of active spans. Then the app could search the correct span by name, fields or other attributes.

Another way could be to make span.record() more powerful. Instead of only allowing recording of its own fields, any unknown field should be passed to the parent for recording, eventually bubbling up to a span that is able to store the field.

I want to point out that I take a pure user perspective here. There might be serious design or performance problems that prevent these ideas from being implemented.

tracing_subscriber Usage

Coming from log4rs is a bit disappointing

The most concerning reason which initially held me back from immediately switching to tracing was the lack of simple, user-facing configuration. Coming from log4rs, which offers a config-file format to initialise logging right from application start, this was a major drawback. Especially hot-reloading the logging configuration without restarting the application is a very appealing feature to me.

Switching to tracing meant reimplementing this myself. While tracing-subscriber offers a lot of building blocks for this, it is still not as plug-and-play an experience as it was with log4rs. To reimplement hot-reloading, the notify crate and the reload module of tracing-subscriber were very useful.

Another problem in that regard: now that the logging configuration comes from my own file, that file has to be parsed before logging is initialised. Obviously the parse can fail, so to emit useful information about the error there are now two log formats in the application. One, which appears sensible to me as the author, takes effect right after argument parsing and allows config parsing to use logging already. The second format is configured by the user and is only applied after the configuration could be parsed and logging is set up accordingly.

Unhelpful generics

There is one single point that almost made me cry: the struct tracing_subscriber::fmt::format::Format. It allows configuring the way log lines look (JSON, multiline, compact, or full) and the time format. The root cause of my dissatisfaction is that it is generic over these two dimensions, Format<LogLineLook, TimeFormat>. When trying to build a configurable log line format, this makes it impossible to hold a format that just doesn't care which time format it has:

let format: ??? = match config.time_format {
    TimeFormat::Uptime => format.with_timer(Uptime::default()),
    TimeFormat::None => format.without_time(),
};

There is no common supertype the Rust compiler can infer for the two match branches. Ultimately the format is passed to fmt::layer().event_format(format), which is generic enough to accept all formats. But dealing with the intermediate forms during configuration is impossible. This left me with the only choice of massively duplicating the code: an outer match statement over the format dimension, with each branch containing an almost identical match statement over the time format. I really felt like monomorphising generic code by hand.

If there is any trick that allows deduplicating this configuration, I'd be glad to hear about it.

JSON needs additional configuration

A little curiosity occurs when using a JSON format. Setting a custom event format is not enough to get proper JSON output.

let format = tracing_subscriber::fmt::format::json();
tracing_subscriber::fmt()
    .event_format(format)
    .finish();

This leads to a panic about malformed fields and requires setting .fmt_fields() to JSON, too:

tracing_subscriber::fmt()
    .event_format(format)
    .fmt_fields(tracing_subscriber::fmt::format::JsonFields::new())
    .finish();

Thanks to davidbarsky for pointing this out repeatedly.

Logging inside the reload handler deadlocks

A funny curiosity when implementing the reload mechanism was a little deadlock I produced. The Handle::modify() function takes a closure to modify the layer in place. It is not a good idea to log inside this closure, as it leads to a deadlock, presumably because all layers are blocked while the reloading takes place. This fact is not documented, but it appears a sensible explanation to me.

Projecting span values?

As an end user configuring the log via EnvFilter, it would be nice to have a way to express interest only in certain fields of a span. E.g. the HTTP request span contains many fields which might not be relevant to the user, and it would be nice if she could project them out of the log. Currently it is only possible to filter for spans which contain a field, possibly with a certain value.
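What EnvFilter can do today is select whole spans by name, field presence and field value; the granularity stops at the span (the field value POST here is just an example):

```shell
# Enable debug events only inside "HTTP request" spans whose
# method field equals POST. Individual fields of the span cannot
# be projected out - the span is logged with all its fields.
export RUST_LOG='[HTTP request{method=POST}]=debug'
```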

Integrating actix-web

Another small stumbling block was activating the tracing_actix_web crate. After setting it up according to the documentation, my logs still didn't contain any spans from the HTTP requests. Only after adding an EnvFilter directive like [HTTP request]=trace were the spans added to the output. Much to my confusion, the spans were also present when setting the level to off or error, although the default level of the span is INFO. The only way to disable the span is not to mention it at all.