Ken Muse

An Intro to SourceLink


In the previous posts, we explored the nature of PDBs and symbol servers. Using PDBs, we have a way to debug our code. Symbol servers provide a way to associate the PDBs with the original code and to distribute the PDBs without including them with the compiled application. This was especially important in the early days of the internet. The nature of this separation, however, prevents PDBs from working in important situations (such as capturing memory minidumps or creating stack traces on a server). Almost as challenging, the primary tool used for source indexing is closed-source and Windows-based; without that indexing step, it is difficult to step into the source code used to compile the application.

Developers needed a way to make debugging easy, eliminating the need for a symbol server while still allowing them to step into the related source code. Several years ago, the .NET community rallied around a standardized approach: SourceLink. SourceLink uses NuGet packages that are specific to the source control provider. The build tasks in these packages insert additional details into the PDB according to an extensible, open specification. This is similar to traditional source indexing, but it is an open standard. It also allows multiple source code providers to coexist, unlike the original source indexing. This makes it possible to correctly link to code from Git submodules.

Out of the box, SourceLink can support:

  • Azure DevOps (Git and TFVC)
  • Azure DevOps Server (formerly Team Foundation Server)
  • GitHub and GitHub Enterprise Server
  • GitLab
  • Bitbucket
  • GitWeb
  • Gitea

In addition, Visual Studio, JetBrains Rider, and other IDEs have been enhanced to support SourceLink. Visual Studio can even use SourceLink to show the appropriate source code when a user chooses to “Go to Definition”. JetBrains Rider offers a similar feature, “Navigate to Sources from Symbol Files”.

Limitations

The current implementation supports mapping the source code for .NET and .NET Framework assemblies, but only supports mapping to the repository for Microsoft Visual C++ packages. Embedding the source code, commit SHA, or specific URL is not yet supported for native binaries. Native AOT symbols are also limited: at present, the SourceLink details are removed from the final generated debug files. There is an open issue to improve this, with support currently targeted for .NET 9.0.

The SourceLinked PDB

The system relies on a simple premise – every file path should be composable as a URL that can retrieve the file (with proper authentication). The NuGet package for each provider is responsible for providing the URL format and customizing it if necessary. For example:

  • Azure DevOps TFVC repos use https://{collectionUri}/{projectId}/_versionControl?version={revision}&path=/*
  • Azure DevOps Git repos use https://dev.azure.com/{org}/{project}/_apis/git/repositories/{repoName}/items?versionDescriptor.version={commitId}&versionDescriptor.versionType=commit&api-version=4.1-preview&scopePath=/*
  • GitHub repos use https://raw.githubusercontent.com/{owner}/{repo}/{commitId}/*
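To make the template idea concrete, here is a minimal Python sketch that fills in the GitHub format from the list above. The function name and the sample owner, repository, and commit values are hypothetical and chosen only for illustration; the real provider packages do this work as MSBuild tasks during the build.

```python
# Template copied from the GitHub entry in the list above.
GITHUB_TEMPLATE = "https://raw.githubusercontent.com/{owner}/{repo}/{commitId}/*"


def github_url_prefix(owner: str, repo: str, commit_id: str) -> str:
    """Fill in the placeholders (hypothetical helper, not part of SourceLink).

    The trailing * is left untouched; it stands in for the file path
    that is supplied later, at debug time.
    """
    return GITHUB_TEMPLATE.format(owner=owner, repo=repo, commitId=commit_id)


# Sample values are assumptions used only for illustration.
print(github_url_prefix("mycorp", "myapp", "cc51fed2"))
```

Pinning the URL to a commit ID rather than a branch name is what lets the debugger retrieve the exact version of the file that was compiled.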

To minimize the need to rewrite all of the paths in the PDB, SourceLink maps path prefixes to a URL prefix. For cases where a generic path mapping does not make sense, SourceLink can map a specific file path to a specific URL.

A JSON file containing these mappings is embedded in the PDB file’s CustomDebugInformation table (0x37). At debug time, the JSON file is used to construct a complete URL for retrieving the source code. The source code is then downloaded, assuming the user is able to authenticate with the remote system.

As a practical example, consider a build using source code from https://github.com/mycorp/myapp at commit cc51fed2. The generated JSON might look like this:

{
    "documents": {
        "/_/*": "https://raw.githubusercontent.com/mycorp/myapp/cc51fed2/src/MyApp/*"
    }
}

Every file path in the PDB that starts with /_/ will instead have that prefix replaced by the provided URL prefix. For example, /_/Controllers/HelloController.cs would become https://raw.githubusercontent.com/mycorp/myapp/cc51fed2/src/MyApp/Controllers/HelloController.cs.
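The lookup a debugger performs can be sketched in a few lines of Python. This is an illustration rather than the algorithm any particular debugger uses. The function name is hypothetical, and the sketch assumes forward-slash paths, case-sensitive comparison, wildcards only at the end of a key, and that the longest matching prefix wins when several keys match.

```python
import json

# The example mapping from above, as it might be read back out of the PDB.
SOURCE_LINK_JSON = """
{
    "documents": {
        "/_/*": "https://raw.githubusercontent.com/mycorp/myapp/cc51fed2/src/MyApp/*"
    }
}
"""


def resolve_source_url(path, documents):
    """Return the URL for a PDB document path, or None if nothing matches.

    Hypothetical helper: exact entries are checked first, then wildcard
    entries. Preferring the longest prefix is an assumption of this sketch.
    """
    # A key without a wildcard maps one specific file to one specific URL.
    exact = documents.get(path)
    if exact is not None and "*" not in exact:
        return exact

    best = None
    for pattern, url in documents.items():
        if not pattern.endswith("*"):
            continue
        prefix = pattern[:-1]
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, url)

    if best is None:
        return None
    prefix, url = best
    # Substitute the part of the path after the prefix for the * in the URL.
    return url.replace("*", path[len(prefix):], 1)


documents = json.loads(SOURCE_LINK_JSON)["documents"]
print(resolve_source_url("/_/Controllers/HelloController.cs", documents))
```

A path that matches no key returns None here; in that situation a real debugger would fall back to other ways of locating the source.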

SourceLink is not specific to standalone PDBs. It also works with embedded PDBs. If you’re interested in a bit more technical depth, the Microsoft DevBlogs provide a deeper exploration of SourceLink.

With all of these changes, it’s common to wonder which system to use. SourceLink is now recommended for all projects and PDBs to optimize the debugging experience. It’s important to know that the two are not mutually exclusive – symbol servers (including NuGet) can use SourceLink-enabled PDBs. As you’ve seen, this has some advantages over the traditional source-indexed PDBs.

The general recommendation today is to use embedded PDBs. This ensures the symbols are always available with the binary. It eliminates the need for a symbol server and provides maximum debugging support for Native AOT. For production applications, it is important to ensure that you are using portable or embedded PDBs to have access to stack traces, snapshot debugging, and minidumps. You’ll find the minor increase in size is worth it the first time you try to track down a memory leak or race condition!

Ahead-of-time native compilation requires PDBs for all dependencies to be available in order to generate the final native code debugging information. In this case, utilizing SourceLink and local PDBs ensures that the generated executable can create usable stack traces and support full debugging. This toolchain does not support symbol servers and requires local symbol files. Without those files, profiling and debugging can have “degraded or broken results.” If your code can be used in Native AOT applications and you want to support debugging, consider embedded PDBs.

If you’re distributing libraries using NuGet, you’ll need to weigh the options. Traditionally, the recommendation has been to use a .snupkg to contain the symbol files. This makes the library package smaller and exposes the symbol files via the NuGet symbol server. This may be optimal if you expect a large number of users, a limited need to debug into the code, and no need to retrieve a complete stack trace. If you want to allow the library to be debugged when included in a Native AOT application, you’ll need to include a portable PDB file in the main package or use an embedded PDB. Remember that developers can always choose to trim or strip the symbols at build time, but they cannot recreate missing symbols.

What about Microsoft?

So why does the .NET runtime use a symbol server? Its code is broadly incorporated into all .NET executables, so embedding or including the debug symbols with every distribution would substantially increase the size of all applications. It’s also incredibly rare for developers to need to step into the framework source code. Typically, it suffices to know which framework API was called from the application and the call stack that led to that call. Consequently, it makes sense to only download the debug symbols when needed rather than embedding them in every application.

That said, several prominent voices at Microsoft offer their recommendations for creating debugger-friendly applications. Their approaches are generally aligned with the recommendations above.

Vance Morrison, an Architect at Microsoft (and a Performance Architect for the .NET Framework), recommends enabling SourceLink. “Most Microsoft code is SourceLinked now and if its is not, it is a bug.” He also recommends that developers “should include their (portable) PDBs in their nuget packages. Microsoft does not do this but only because of special Microsoft circumstances (given in my previous comment). We strongly encourage people to take these steps. Ultimately we want to have a secure symbol server story that everyone can use, but that that is far enough away that people should NOT wait. They should be including PDBs now.”

Tomáš Matoušek, a Principal Software Development Engineer at Microsoft, offers slightly broader recommendations. He observed, “I don’t think there is a silver bullet approach that works for all cases. It really depends on what the DLLs and PDBs are used for.” He suggested using portable PDBs in a .snupkg for all libraries being published to NuGet. Developers will then only pull the symbols if they require them. However, for internal applications, CI builds, or applications being deployed to production, he suggested using embedded PDBs with SourceLink. It keeps the debugging components self-contained, avoids managing a symbol server, and ensures that stack traces and debug information are available. He observes that this is also the approach used by the Bing team (and others) with thousands of assemblies.

Finally, Claire Novotny, a PM on the NuGet team, summarizes, “our recommendation is to use Embedded PDB’s largely for the simplicity and compatibility aspects. There’s no additional package required (the .snupkg), no symbol server required (client config or server), and you’ll always have it for diagnostics since you’re far less likely to lose your running binary than some other artifact.”

Understanding the tradeoffs

In his seminal article on PDBs, John Robbins observed that PDB files are as important as source code. He highlighted that it was important to “love, hold, and protect your PDBs.” After a decade, these observations still hold true. At the same time, a few things have changed. Instead of relying on symbol servers, we can now embed the PDBs. We can also use SourceLink to make it possible to securely access the source code without requiring a symbol server.

The most important thing is to have a clear strategy for enabling developers to fully use PDBs; any modern strategy should include SourceLink. Without a strategy, you could be spending far more time than necessary chasing down issues! Worse, you won’t be able to take advantage of solutions like snapshot debugging to handle hard-to-reproduce issues. Hopefully having an understanding of the issues and the benefits of these different approaches will help you to find a path that’s right for you and your teams.