dateo. Coding Blog

Coding, Tech and Developers Blog

OpenTelemetry
.NET
Observability
Tracing

Best practices for OpenTelemetry in .NET

Dennis Frühauff on August 9th, 2023

Whoop, this is a different article than the original newsletter stated. Human errors do happen! :-)


This article attempts to collect some of the best practices for integrating observability-related concepts into your application.


Specifically, the integration is done via the standard defined by OpenTelemetry. Code examples will be mainly in .NET, but the general concepts are language-agnostic. OpenTelemetry provides integrations for many languages and frameworks.


Over the past months, I have successfully integrated and used OpenTelemetry in various .NET applications. While there are many good resources out there, I had to find answers to some of my questions the hard way. That is why I want to share my findings in this article with you. But let's cover the basics first.


What is observability?

Observability answers the question of whether you can assess the internal state of your system by asking questions from the outside. This becomes especially useful when debugging distributed systems.


Typically, there are three independent tools to improve the observability of your system:


  • Metrics: Often scalar measures that indicate the current (or better: past and present) state of your system. This includes health checks, workload, resource consumption, etc. Metrics are helpful to indicate that there is a problem.
  • Logging: The standard way to debug applications for many years. Logs are helpful in limited scenarios, but in distributed systems they lack important information, or at least make it hard to analyze.
  • Traces: Traces introduce the idea of causality to your debugging. What message triggered this activity and how long did that take? Where did that request come from initially? Tracing will help you understand the bigger picture in your environment.

What is a trace? What is a span?

The terminology between OpenTelemetry, Microsoft's System.Diagnostics, and Elastic can be a bit confusing, so let's take a look at this first:


OpenTelemetry      .NET              Elastic
Trace(r)           ActivitySource    Trace
(Telemetry)Span    Activity          Transaction
SpanContext        ActivityContext   Metadata

To make things easier, we will use the terms Trace and Span to conform with the OpenTelemetry standard. In .NET code, the corresponding type is Activity, so just remember that Span and Activity are interchangeable.


Trace

A trace is the thing that you want to do in your application:


  • Making a POST request and inserting new data via an API.
  • Running the calculations for a series of entities in your system.
  • Ordering a product when clicking a specific button in the frontend.

It is (usually) not something that we create ourselves, but rather something that is automatically created for us, e.g., by existing HTTP instrumentation.


A trace consists of


  • a unique identifier (Trace ID),
  • child spans,
  • tags (i.e., metadata)

Span

A span (or Activity) is a small portion of the work that is being done inside of the trace, e.g.,


  • retrieving data via HTTP
  • calling out to the database

Data-wise, it is only a set of structured data that can have:


  • a unique identifier,
  • a correlation ID (the Trace ID),
  • a duration,
  • a timestamp,
  • the identifier of its parent span,
  • tags (i.e., metadata)

These concepts differ greatly from logs in that they introduce the concept of causality between operations. The complete hierarchy of operations (and also the duration) can be investigated after the ingestion of the full trace.
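To make the fields listed above a bit more tangible, here is a minimal sketch that uses nothing but System.Diagnostics. It assumes .NET 6 or later (where W3C trace IDs are the default), and both the source name demo.anatomy and the class SpanAnatomyDemo are made up for this example. It starts a parent and a child span and prints the identifiers that tie them together.

```csharp
using System;
using System.Diagnostics;

public static class SpanAnatomyDemo
{
    // Hypothetical source name, used only in this sketch.
    private const string SourceName = "demo.anatomy";
    private static readonly ActivitySource Source = new(SourceName);

    // Starts a parent and a child span and reports whether the child
    // shares the trace ID and points back to the parent.
    public static (bool SameTrace, bool ChildPointsToParent) Run()
    {
        // Without a listener, StartActivity returns null, so we register one
        // that records everything coming from our source.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        using var parent = Source.StartActivity("parent");
        using var child = Source.StartActivity("child");
        if (parent is null || child is null)
        {
            throw new InvalidOperationException("The listener was not registered.");
        }

        child.SetTag("category", "food");

        Console.WriteLine($"trace id:       {child.TraceId}");
        Console.WriteLine($"span id:        {child.SpanId}");
        Console.WriteLine($"parent span id: {child.ParentSpanId}");
        Console.WriteLine($"started (UTC):  {child.StartTimeUtc:O}");

        return (child.TraceId == parent.TraceId,
                child.ParentSpanId == parent.SpanId);
    }

    public static void Main() => Run();
}
```

The duration is only final once a span has been stopped (here: disposed), which is why the sketch prints the start timestamp rather than a duration.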


OpenTelemetry in .NET

The integration packages in OpenTelemetry make use of the System.Diagnostics namespace in .NET. That makes it easy to upgrade existing applications to use tracing, because your application code will not depend on the actual OpenTelemetry packages. You only have to set up OpenTelemetry once, during startup.


Setup

The main NuGet package for your application to use OpenTelemetry is


dotnet add package OpenTelemetry.Extensions.Hosting

Then, in an ASP.NET application, you can configure tracing like so:


var builder = WebApplication.CreateBuilder(args);
builder.Services
	.AddOpenTelemetry()
	.WithTracing(
	    configurator =>
	    {
	        configurator.AddSource("myapplication");
	    });

This is the complete application setup to tell OpenTelemetry to collect traces from your application.


The most important line is the part .AddSource(...): You can create as many activities ( == spans) in your code as you like. As long as you do not tell OpenTelemetry to listen to them in this line, it will not collect and publish them.


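In ASP.NET applications, you can also add auto-instrumentation for infrastructural code, such as all HTTP requests or database calls via Entity Framework Core. The two instrumentation calls below come from the OpenTelemetry.Instrumentation.AspNetCore and OpenTelemetry.Instrumentation.EntityFrameworkCore packages: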


builder.Services  
    .AddOpenTelemetry()  
    .WithTracing(  
        configurator =>  
        {  
            configurator.AddSource("myapplication");  
            configurator.AddAspNetCoreInstrumentation();  
            configurator.AddEntityFrameworkCoreInstrumentation();  
        });

There are many more auto-instrumentation packages available on NuGet.


Exporting and viewing traces

In order to view the collected traces somewhere, the easiest thing we can do is to export them to the console. While that is not a good solution to visualize traces, at least we will see that the actual code is working. The console exporter can be found in the OpenTelemetry.Exporter.Console NuGet package:


builder.Services  
    .AddOpenTelemetry()  
    .WithTracing(  
        configurator =>  
        {  
            configurator.AddSource("myapplication");  
            configurator.AddConsoleExporter();  
        });

Creating activities/spans in your application code

First of all, we need a class to provide us with an ActivitySource for our whole application:


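using System.Diagnostics;

public static class DiagnosticsConfig
{
    public const string SourceName = "myapplication";

    // One shared source for the whole application; the name must match AddSource(...).
    public static readonly ActivitySource Source = new ActivitySource(SourceName);
}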

Note: For the question of whether or not to use static, please refer to the FAQ below.


With this in place, we can actually create a new span in our code:


[HttpGet]
public async Task<IActionResult> GetProducts(string category)
{
    // activity is null when nobody listens to the source.
    using var activity = DiagnosticsConfig.Source.StartActivity("Retrieving the entities");
    activity?.SetTag("category", category);
    await Task.Delay(TimeSpan.FromSeconds(0.5));
    // ... heavy work
    return Ok(...);
}

Calling StartActivity will create a new span and set it as the current span, right below the one automatically created by the HTTP request. It will also display the name we provided in the method call.


Via SetTag we added custom payload as metadata to this span which we can filter on later.


Please note that we are using the null-conditional operator (?.) on the activity object: If no one is listening to our source, StartActivity will return null, which is a performance consideration.
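The null behavior is easy to verify without OpenTelemetry at all. The following sketch assumes .NET 6 or later and uses a plain ActivityListener as a stand-in for what AddSource(...) does behind the scenes; the source name demo.unobserved and the class name are invented for this example.

```csharp
using System;
using System.Diagnostics;

public static class NullActivityDemo
{
    // Hypothetical source name, used only in this sketch.
    private const string SourceName = "demo.unobserved";
    private static readonly ActivitySource Source = new(SourceName);

    // Returns whether an Activity was created before and after a listener subscribed.
    public static (bool WithoutListener, bool WithListener) Run()
    {
        // Nobody has subscribed to the source yet, so nothing is allocated.
        using var unobserved = Source.StartActivity("unobserved work");
        bool createdWithoutListener = unobserved is not null;

        // Stand-in for AddSource(...): subscribe to the source by name.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        using var observed = Source.StartActivity("observed work");
        observed?.SetTag("category", "food");

        return (createdWithoutListener, observed is not null);
    }

    public static void Main()
    {
        var (without, with) = Run();
        Console.WriteLine($"created without listener: {without}");
        Console.WriteLine($"created with listener:    {with}");
    }
}
```

This is also why forgetting the AddSource(...) call fails silently: your code keeps running, it just never produces a span.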


Wherever we visualize this trace (Console, Kibana, Grafana, Jaeger, etc.) it will display something like this:


HTTP GET ------------------------550ms--|
    Retrieving the entities ----512ms--|
        "category": "food"

There are different kinds of spans that you can create using various overloads on the StartActivity method:


  • Internal: Code that is purely related to your application.
  • Client: Outgoing and synchronous request to an external component/application.
  • Server: Incoming and synchronous request from an external component/application.
  • Producer: Asynchronous output provided to an external component/application.
  • Consumer: Asynchronous input received from an external component/application.
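As a small illustration of the kinds listed above, the overload StartActivity(name, kind) takes an ActivityKind as its second argument. The sketch below assumes .NET 6 or later; the source name demo.kinds, the class name, and the span names are placeholders.

```csharp
using System;
using System.Diagnostics;

public static class ActivityKindDemo
{
    // Hypothetical source name, used only in this sketch.
    private const string SourceName = "demo.kinds";
    private static readonly ActivitySource Source = new(SourceName);

    // Returns the kind of a span started without and with an explicit ActivityKind.
    public static (ActivityKind Default, ActivityKind Explicit) Run()
    {
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        // No kind given: the span is Internal.
        using var handler = Source.StartActivity("Retrieving the entities");

        // An outgoing, synchronous call to some other service.
        using var outgoing = Source.StartActivity("Calling the pricing service", ActivityKind.Client);

        if (handler is null || outgoing is null)
        {
            throw new InvalidOperationException("The listener was not registered.");
        }

        return (handler.Kind, outgoing.Kind);
    }

    public static void Main()
    {
        var (defaultKind, explicitKind) = Run();
        Console.WriteLine($"default kind:  {defaultKind}");
        Console.WriteLine($"explicit kind: {explicitKind}");
    }
}
```

In practice, the auto-instrumentation packages already set Server and Client for you, so in your own code Internal (the default) is usually the right choice.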

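You can also use methods on the activity, for example to set a custom status or report exceptions that happened during the processing of your requests. Note that RecordException is an extension method from the OpenTelemetry.Trace namespace, so this one call does require a reference to the OpenTelemetry API package: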


activity?.SetStatus(...);
activity?.RecordException(ex);
activity?.AddEvent(...);
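Filled in, such calls could look like the following sketch. It stays within System.Diagnostics (assuming .NET 6 or later) and therefore records the exception as a plain event instead of calling RecordException. The event name exception and the attribute names exception.type and exception.message follow the OpenTelemetry semantic conventions; the source name, the class name, and the thrown exception are invented for this example.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ErrorReportingDemo
{
    // Hypothetical source name, used only in this sketch.
    private const string SourceName = "demo.errors";
    private static readonly ActivitySource Source = new(SourceName);

    // Simulates a failing operation and returns what ended up on the span.
    public static (ActivityStatusCode Status, string Description, int EventCount) Run()
    {
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        using var activity = Source.StartActivity("Processing the order");
        try
        {
            throw new InvalidOperationException("Out of stock");
        }
        catch (InvalidOperationException ex)
        {
            // Mark the span as failed and attach the details as an event.
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            activity?.AddEvent(new ActivityEvent("exception", tags: new ActivityTagsCollection
            {
                ["exception.type"] = ex.GetType().FullName,
                ["exception.message"] = ex.Message,
            }));
        }

        return (activity?.Status ?? ActivityStatusCode.Unset,
                activity?.StatusDescription ?? string.Empty,
                activity?.Events.Count() ?? 0);
    }

    public static void Main()
    {
        var (status, description, events) = Run();
        Console.WriteLine($"{status}: {description} ({events} event(s))");
    }
}
```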

In this way, it is very much possible to get rid of logs as your primary debugging source.



With the basics out of the way, let's cover more topics in a Q&A style below:


FAQ


How many ActivitySources does my application need?

Strictly speaking, it only needs a single one. While it is generally possible to create many activity sources in your code, it is absolutely not necessary, because there is no benefit from it. That is also why the easiest way to go about it is to have a single static instance per application somewhere in your domain logic.


Still, other instrumentation libraries will create their own activity sources, like Hangfire and MassTransit (for which you will have to add the source names during startup).



Should I inject the ActivitySource via DI?

Please don't. There is absolutely no benefit from doing this.
ActivitySources are tracked by name and not by instance, so you might as well define them static on class level rather than in a separate class, but why would you?
It does not increase readability, it does not improve testability, it just creates boilerplate code.


So please do not inject ActivitySource or any kind of wrapper object into your code.
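If you want to convince yourself that sources are matched by name, here is a small sketch (assuming .NET 6 or later). Two independent ActivitySource instances share the made-up name demo.byname, and a single listener that subscribes to that name hears both of them. The listener again stands in for what AddSource(...) does.

```csharp
using System;
using System.Diagnostics;

public static class SourceByNameDemo
{
    // Hypothetical source name, used only in this sketch.
    private const string SourceName = "demo.byname";

    // Returns how many stopped activities the listener observed.
    public static int Run()
    {
        var heard = 0;

        using var listener = new ActivityListener
        {
            // The subscription is based on the name, not on a specific instance.
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = _ => heard++,
        };
        ActivitySource.AddActivityListener(listener);

        // Two unrelated instances that merely share a name.
        using var first = new ActivitySource(SourceName);
        using var second = new ActivitySource(SourceName);

        first.StartActivity("from the first instance")?.Dispose();
        second.StartActivity("from the second instance")?.Dispose();

        return heard;
    }

    public static void Main() =>
        Console.WriteLine($"activities observed: {Run()}");
}
```

Since the instance does not matter, there is nothing to gain from letting a DI container manage it.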



How much metadata should I add to my spans?

Add as much as you need. It is very likely that you do not yet know what you might want to know, so it is best to have as much metadata available as possible. Just put all the information via tags inside of your span.


If you think that this will have a significant effect on performance, you are optimizing on a microsecond scale. It is questionable if .NET is the right ecosystem for you in that case.



Should I follow naming conventions for tags?

Yes, absolutely. It is a good idea to have a constants class defining what you want your tags to look like:


public static class DiagnosticsConfig
{
    public const string ProductId = "product.id";
    public const string ProductCategory = "product.category";
}

This will make your code cleaner and more maintainable. Using dot-notation will also enforce structuring on your spans.


It might even be a good idea to share these conventions across applications and teams.


Also, there are semantic conventions defined by the OpenTelemetry project.



Can I enrich spans with static data?

Yes, in the setup call, you can define what information should be added to every trace your application emits:


builder.Services
    .AddOpenTelemetry()
    .ConfigureResource(resourceBuilder =>
    {
        resourceBuilder.AddAttributes(new List<KeyValuePair<string, object>>
        {
            new("node.id", 123)
        });
    })
    .WithTracing(
        configurator =>
        {
            configurator.AddSource("myapplication");
        });

Can I enrich spans with non-static data?

Yes, this is done via processors, which are very much a pipeline-like approach:


public class UserIdProcessor : BaseProcessor<Activity>
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public UserIdProcessor(IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor;
    }

    // Runs for every activity that is started, so keep it cheap.
    public override void OnStart(Activity data)
    {
        data.SetTag("user.name", _httpContextAccessor.HttpContext?.User...);
    }

    public override void OnStop(Activity data)
    {
        
    }
}

Processors can be added during the setup phase of your application.
Please be aware that, since they are affecting every single activity in your code, you should not perform expensive operations in these methods or you might run into performance problems.
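For completeness, registering such a processor could look like the sketch below. It is a configuration fragment rather than a full program: it assumes an ASP.NET Core project with implicit usings, the OpenTelemetry.Extensions.Hosting and OpenTelemetry.Instrumentation.AspNetCore packages, and the UserIdProcessor class from above. The generic AddProcessor<T>() overload resolves the processor through the service provider, which is how it gets its IHttpContextAccessor.

```csharp
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// The processor depends on IHttpContextAccessor, so it has to be registered.
builder.Services.AddHttpContextAccessor();

builder.Services
    .AddOpenTelemetry()
    .WithTracing(
        configurator =>
        {
            configurator.AddSource("myapplication");
            configurator.AddAspNetCoreInstrumentation();

            // Created via dependency injection and invoked for every span.
            configurator.AddProcessor<UserIdProcessor>();
        });

var app = builder.Build();
app.Run();
```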


Processors are the right choice if you are sure you want to add non-static data to every single span.


If you only want to add information to some root span, for example the one related to the HTTP GET request, you can do it like this:


Activity.Current?.SetTag("product.id", productId);

In this way, you have access to the currently active span without the need to create a separate one below it.
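A self-contained sketch of this pattern, assuming .NET 6 or later: the span named HTTP GET stands in for the one that the ASP.NET Core instrumentation would normally create, and the helper AddProductId represents code further down the call stack that never receives the activity as a parameter. The source name and the class name are made up.

```csharp
using System;
using System.Diagnostics;

public static class CurrentActivityDemo
{
    // Hypothetical source name, used only in this sketch.
    private const string SourceName = "demo.current";
    private static readonly ActivitySource Source = new(SourceName);

    // Returns the value of the "product.id" tag found on the root span.
    public static object Run()
    {
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        // Stand-in for the span created by the HTTP instrumentation.
        using var root = Source.StartActivity("HTTP GET");

        AddProductId(42);

        return root?.GetTagItem("product.id");
    }

    // Code deep in the call stack: no activity is passed in,
    // it simply tags whatever span is currently active.
    private static void AddProductId(int productId) =>
        Activity.Current?.SetTag("product.id", productId);

    public static void Main() =>
        Console.WriteLine($"product.id on root span: {Run()}");
}
```

Keep in mind that Activity.Current is whichever span is active at that moment. If you have started your own span in between, the tag ends up there and not on the HTTP span.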



Should my application send everything to the OpenTelemetry endpoint?

In general: Yes. But there are different kinds of sampling strategies within OpenTelemetry:


  • Head sampling: Sampling within your application, e.g., keeping only a fixed percentage of all traces.
  • Tail sampling: Sampling the data in an OpenTelemetry Collector, which is basically a proxy sitting between all applications and the actual data ingestion into your storage provider.

While head sampling is easy to use, it bears the problem that you might lose data. The sampler will decide during creation of the root span whether or not this specific trace should be sampled. At that point, you do not know whether your code might fail somewhere downstream. If it does, the information is lost completely.


Therefore, it is generally a good idea to use a collector and let that do the sampling (or filtering) of your data before it reaches your storage. That way, you have much more control over it.
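If you do want to use head sampling, for example because there is no collector in your setup yet, a sketch of the configuration could look like this. It is a configuration fragment that assumes an ASP.NET Core project with implicit usings and the OpenTelemetry.Extensions.Hosting package; the ratio of 0.1 is an arbitrary example value.

```csharp
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddOpenTelemetry()
    .WithTracing(
        configurator =>
        {
            configurator.AddSource("myapplication");

            // Keep roughly 10% of all new traces. Spans that already have a
            // parent follow the decision that was made for that parent, so a
            // trace is either kept completely or dropped completely.
            configurator.SetSampler(
                new ParentBasedSampler(new TraceIdRatioBasedSampler(0.1)));
        });

var app = builder.Build();
app.Run();
```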



Can I try out and use anything else locally (rather than using the console)?

Of course. One very simple solution is to run an instance of Jaeger inside of Docker and to use the package OpenTelemetry.Exporter.OpenTelemetryProtocol.


The docker-compose file looks like this:


version: '3.7'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "13133:13133"
      - "16686:16686"
      - "4317:4317"
    environment:
      - COLLECTOR_OTLP_ENABLED=true
      - LOG_LEVEL=debug

Then, in your application startup, add a line like this:


builder.Services  
    .AddOpenTelemetry()  
    .WithTracing(  
        configurator =>  
        {  
            configurator.AddSource("myapplication");  
            configurator.AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317"));  
        });

You can access the Jaeger dashboard via http://localhost:16686.


Conclusion

Of course, this article does not serve as a full tutorial on how to set up OpenTelemetry in your production applications. Rather, I hope it is a quick reference for you, sparing you some of the research effort I had to go through to get started with distributed tracing in my projects.


Feel free to reach out to me in case you'd like to know more on this topic.


