Understanding Symbol Servers

Category: DevOps, Programming

Tags: #DevOps, #.NET

Published: August 10, 2023 Reading Time: 7 min

This is a post in the series Mastering PDBs and Debugging. The posts in this series include:

Aug 3, 2023 - What Every Developer Should Know About PDBs
Aug 10, 2023 - Understanding Symbol Servers
Aug 17, 2023 - An Introduction to SourceLink
Aug 24, 2023 - Understanding .NET Debug vs Release
Sep 7, 2023 - Forcing .NET Into Debug Mode

In the past, a symbol server was an essential part of the debugging process. In fact, when John Robbins wrote his original articles, symbol servers were an important (and recommended) part of the development process. A symbol server is simply a centralized system that stores the PDB files (debugging symbols) for later access. This allows the compiled code to be shipped separately from the symbols required for debugging. By pointing a debugger to the symbol server, the necessary symbols can be downloaded and pulled into the debugger on demand. It is essentially nothing more than a well-defined file structure served from a file share or over HTTP. Some of the most common debugging environments, including Visual Studio, WinDbg, and JetBrains Rider and CLion, understand how to query and retrieve files from a symbol server.

First, PDB files undergo a process called source indexing. This process embeds the path to the source code and details about the source control system into the PDB, associating the file with the specific changesets/commits used to compile the code. This saves you from having to locate the source code associated with a specific build. It also makes it easy for developers to step directly into the code, assuming they have access to the related source control system. Historically, this relied on the PDBStr tool (from the Windows SDK’s Debugging Tools) to embed the required details.

Next, the files are moved into a symbol store. This is typically a series of structured folders based on the name of the assembly and a key-based structure built using components of the executable/PDB. The key concatenates the MVID GUID and the age field into a hexadecimal string. This allows a specific build of the PDB to be quickly located using a known path. From there, the debugger can retrieve a pointer to a different location, a compressed file, or the uncompressed file. Historically, this processing was done using the symstore tool.
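As a rough illustration of the key-based layout described above, here is a minimal Python sketch that builds the lookup path for a PDB. It assumes the symstore-style layout of file name, then key, then file name, with the GUID written as 32 hexadecimal digits followed by the age in hexadecimal. The function names, the sample values in the usage note, and the probing order are hypothetical, and some servers expect lowercase keys instead.

```python
import uuid


def pdb_key(guid: str, age: int) -> str:
    """Concatenate the GUID (32 hex digits, no dashes) and the age (hex)."""
    return f"{uuid.UUID(guid).hex.upper()}{age:X}"


def pdb_store_path(pdb_name: str, guid: str, age: int) -> str:
    """Relative path of the uncompressed PDB inside the store."""
    return f"{pdb_name}/{pdb_key(guid, age)}/{pdb_name}"


def candidate_paths(pdb_name: str, guid: str, age: int) -> list:
    """The three things a store folder may hold for one build.

    Assumed order: the uncompressed file, the compressed file (last
    character of the name replaced by an underscore), then a pointer file.
    """
    base = f"{pdb_name}/{pdb_key(guid, age)}"
    return [
        f"{base}/{pdb_name}",
        f"{base}/{pdb_name[:-1]}_",
        f"{base}/file.ptr",
    ]
```

For a made-up file named MyLib.pdb with an age of 1, pdb_store_path yields a path of the form MyLib.pdb/<32 hex digits>1/MyLib.pdb, which can be appended to either a file share root or an HTTP base URL.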

Both the PDB and its assembly are copied into the symbol store. This allows memory minidumps to store minimal header information, reducing the file size. This information is then used to retrieve both the binary and its symbols for debugging. It also supports cases where the binary file may not be directly available on the file system; the debugger can still access the original file by its name and “key”.
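To make the name-and-key lookup for binaries concrete, here is a small Python sketch. It relies on an assumption not stated in this post: that the store follows the symstore-style convention where the key for a PE binary combines the TimeDateStamp and SizeOfImage fields from the PE header. The helper names and sample values are hypothetical, and the letter case of the key can vary between servers.

```python
def binary_key(time_date_stamp: int, size_of_image: int) -> str:
    """Build the key for a PE binary.

    Assumed format: timestamp as 8 zero-padded uppercase hex digits,
    followed by the image size in hex with no padding.
    """
    return f"{time_date_stamp:08X}{size_of_image:x}"


def binary_store_path(file_name: str, time_date_stamp: int, size_of_image: int) -> str:
    """Relative path of the binary inside the store."""
    key = binary_key(time_date_stamp, size_of_image)
    return f"{file_name}/{key}/{file_name}"


if __name__ == "__main__":
    # Hypothetical header values, as they might be read from a minidump.
    print(binary_store_path("MyLib.dll", 0x542D5742, 0x32000))
```

Because a minidump only needs to record the module name and these two header fields, the debugger can rebuild the path and fetch the matching binary even when it is not on disk.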

If you’re using Azure DevOps, the PublishSymbols task can index Microsoft v7 (full) PDBs and add them to the package feed’s symbol store, automating the process. If you’re publishing to NuGet and want to use its symbol server, you can create a .snupkg with Portable PDB symbols compiled using Visual Studio 15.9 or higher. NuGet will add the PDB files to its public symbol server and make them available for download. The files must be source-indexed (or SourceLinked) before uploading.

Public and private symbols

There are two broad types of symbols. The first is public symbols. These provide function names and global variables, the minimum requirements for debugging and stack traces. The other is more detailed and contains local variables, line number details, and data structure information. These are called private symbols. When both are present in the same PDB file, it’s referred to as a full symbol file. This is the nature of managed code PDBs and the origin of calling a Microsoft PDB a “full” PDB. When only the public symbols are present, it’s referred to as a stripped symbol file.

Note

At one point in time, Microsoft embedded debugging information into the native code executables. This information was normally stripped out and placed in a separate file which contained the public symbols. These legacy files had the extension .dbg and are still mentioned in the documentation for symbol servers.

For native code, a full symbol file may be measured in MB while a stripped symbol file can be KB. This significant difference in file sizes (and the ability to hide some of the implementation details) is sometimes necessary. A prominent example of this is the Windows operating system. Microsoft provides symbols for the native binaries, but they do not want to share enough details to recreate the underlying code and they need to minimize the download size for the symbols. In fact, it’s the sheer size of these symbols that led to the need for a symbol server. Including the files locally would potentially double the size of the operating system to support an uncommon debugging need. The binplace tool is used to convert a full symbol file to a stripped symbol file. For most first-party application scenarios, this is not needed or recommended.

Using a symbol server

To use a symbol server, developers need to be able to reach the symbols. This typically requires registering the server with the debugger. As discussed in the previous post, the debugger tries to resolve symbols for the assemblies that are currently in memory. If the files cannot be found locally, the debugger will attempt to retrieve the files from the symbol server. If the developer has access to the source code, then the related source will be downloaded and displayed to support the debugging experience.
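One common way to register servers with Windows debuggers is the _NT_SYMBOL_PATH environment variable, which both WinDbg and Visual Studio read. The Python sketch below assembles such a value, assuming the srv*<local cache>*<server> syntax with entries separated by semicolons. The helper name and the cache directory are illustrative; the two URLs are the public Microsoft and NuGet symbol servers.

```python
from typing import Iterable


def build_symbol_path(cache_dir: str, servers: Iterable[str]) -> str:
    """Build an _NT_SYMBOL_PATH value.

    Each entry tells the debugger to check the local cache first and to
    download from the server (into that cache) when the file is missing.
    """
    return ";".join(f"srv*{cache_dir}*{server}" for server in servers)


if __name__ == "__main__":
    value = build_symbol_path(
        r"C:\Symbols",  # hypothetical local cache folder
        [
            "https://msdl.microsoft.com/download/symbols",
            "https://symbols.nuget.org/download/symbols",
        ],
    )
    print(value)
```

The local cache in each entry mirrors the lookup order described above: files already on disk are used as-is, and only missing files trigger a request to the server.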

Limitations of symbol servers

Unfortunately, symbol servers suffer from several challenges. For example, the servers themselves can become overwhelmed. When the same symbols are frequently requested, it can lead to bottlenecks reading the files. The issue is more pronounced when using file server shares. The biggest problem, however, is that many scenarios don’t support symbol servers. For example:

Snapshot Debugging can take a snapshot (minidump) of the memory state at the moment an exception occurs, allowing a debugger to step into the code as if it were active. Azure Application Insights supports this functionality. It requires the PDBs to be available during the snapshot in order to resolve the just-in-time (JIT) native code to the related .NET symbols and line numbers. Without those files, you may be required to debug the native code directly.
For similar reasons, .NET Native and NativeAOT expect the symbols to be locally available and cannot perform on-demand loading from a symbol server. In addition, to create a usable PDB, the original PDBs for the assemblies (and any dependencies) must be available at compile time. Without those, the native PDB that is generated may be incomplete.
Production systems may need stack traces with line numbers to troubleshoot errors. Full stack traces are only possible when the symbols are co-located with the binaries.
Mirrored package management feeds do not mirror the symbol server. This rules out symbol servers for enterprises that choose to block or limit access to NuGet’s feeds and mirror an approved package list locally.
Symbol servers are not accessible from air-gapped networks.

Despite all of this, the biggest limiting factor is that most package management systems lack symbol server support. Azure DevOps does not support source indexing portable PDBs (source). NuGet supports indexing portable PDBs, but not full (Windows) PDBs. Artifactory supports Microsoft PDB v7 (C/C++ multi-stream file) and portable PDBs (source).

Microsoft has been working for years on a longer-term (and open source) solution to this problem. At the moment, however, there are limited options. All of these issues make it substantially more difficult to consider a symbol server for most application needs.

Considering the alternatives

One of the primary advantages of a fully-configured symbol server is the source indexed PDBs. In fact, it’s often a driving consideration when teams implement a symbol server. It allows directly stepping into the code being debugged. With managed code, all of the metadata exists in the executables, so there are fewer reasons to need to provide a public endpoint with stripped symbols.

This led to an interesting discussion. With managed code, having PDBs that are source indexed has value without having a symbol server present. By having the PDBs embedded or distributed with the assemblies, we get the best support for troubleshooting any problems that might arise. Having the files reference the source code allows for a complete debugging experience. Having a documented and standardized way to source index the PDBs also means that it is possible to publish PDBs to NuGet that can be mapped back to the original code.

The result of this line of thinking was an improved way of source indexing files and, in many cases, a replacement for the symbol server: Source Link.

But that’s a topic for another post. 🤔