Detect and Remove Dead Code with Roslyn

Jason Ge
Jan 19, 2022

Any large project accumulates dead code (code that the application never uses) over time. Identifying and removing it is challenging work.

NDepend, a static analyzer, can scan a whole solution and report potentially dead types. However, this report has several limitations:

  1. By default, it only scans internal types (classes, interfaces, structs and enums); public types are not included in the report. To get better coverage, we need to include the public types as well, which can be done by commenting out !t.IsPublic in the dead-type CQLinq (Code Query LINQ) query.
  2. The report is not fully accurate, especially for enum types, types that contain only constants, types referenced only in attributes, types referenced only in “is” expressions, etc.
  3. The report is based on a scan of the compiled DLLs, not the source code. Some information, such as the number of lines of source code, is therefore not the real figure.

The .NET Compiler Platform SDK (Roslyn APIs) can load a Visual Studio solution and perform analysis over the entire solution. In this article, we will demonstrate how to use the Roslyn APIs to find and remove the dead code in a given solution.

Note, you can download the whole solution discussed in this article from GitHub repository: https://github.com/jason-ge/DeadCodeRemover.

Install Roslyn packages

First, we need to create a .NET Framework console application and install the following packages:

  • Microsoft.Build.Framework Version 17.0.0
  • Microsoft.Build.Locator Version 1.4.1
  • Microsoft.CodeAnalysis Version 4.0.1
  • Microsoft.CodeAnalysis.Workspaces.MSBuild Version 4.0.1

Load solution into workspace

using (var workspace = MSBuildWorkspace.Create())
{
var solution = await workspace.OpenSolutionAsync(
@"your solution file path",
new ConsoleProgressReporter());
......
}
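The snippet above leaves out two pieces: ConsoleProgressReporter is not a Roslyn type, and an MSBuild instance normally has to be registered through Microsoft.Build.Locator before MSBuildWorkspace is used. The following is a minimal sketch of both, not the repository's actual code. The reporter body, the SolutionLoader class and its method names are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Build.Locator;
using Microsoft.CodeAnalysis.MSBuild;

// Hypothetical stand-in for the article's ConsoleProgressReporter.
public sealed class ConsoleProgressReporter : IProgress<ProjectLoadProgress>
{
    public void Report(ProjectLoadProgress progress)
    {
        // Operation is Evaluate, Build or Resolve; FilePath is the project being loaded.
        Console.WriteLine(
            $"{progress.Operation,-10} {progress.ElapsedTime.TotalSeconds,6:F1}s  {Path.GetFileName(progress.FilePath)}");
    }
}

// Hypothetical helper; the names are illustrative only.
public static class SolutionLoader
{
    // Call once at startup, before any method that touches MSBuildWorkspace runs.
    public static void EnsureMSBuildRegistered()
    {
        if (!MSBuildLocator.IsRegistered)
        {
            MSBuildLocator.RegisterDefaults();
        }
    }

    // The caller owns the returned workspace and should dispose it when done.
    public static async Task<MSBuildWorkspace> OpenAsync(string solutionPath)
    {
        var workspace = MSBuildWorkspace.Create();
        // Load problems (missing SDKs, unresolved references) are reported here
        // instead of being thrown, so it is worth logging them.
        workspace.WorkspaceFailed += (sender, e) =>
            Console.Error.WriteLine($"{e.Diagnostic.Kind}: {e.Diagnostic.Message}");
        await workspace.OpenSolutionAsync(solutionPath, new ConsoleProgressReporter());
        return workspace;
    }
}
```

A caller would invoke SolutionLoader.EnsureMSBuildRegistered() first, then await SolutionLoader.OpenAsync(path) inside a using block and read the loaded solution from workspace.CurrentSolution.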

Once you have the Solution instance, you can loop through all the projects in the solution using its Projects property.

Build Types in Project

First, we need to extract all the type declarations from the project. We use the following class to store the type information we extract.

internal class TypeInfo
{
public string FullName
{
get { return $"{Symbol.ContainingNamespace}.{Symbol.Name}"; }
}
public Project ContainingProject { get; set; }
public Document ContainingDocument { get; set; }
public INamedTypeSymbol Symbol { get; set; }
public SyntaxNode Node { get; set; }
public int NumberOfLines { get; set; }
public IEnumerable<ISymbol> TypesUsingMe { get; set; }
public bool? IsDead { get; set; }
public string RemovalAction { get; set; }
public int Depth { get; set; }
}

We iterate over each document (source code file) in the project and store the information in a collection of the TypeInfo objects defined above.

foreach (var doc in project.Documents)
{
    var model = await doc.GetSemanticModelAsync();
    var declarations = (await doc.GetSyntaxRootAsync()).DescendantNodes()
        .Where(n => n is TypeDeclarationSyntax || n is EnumDeclarationSyntax);
    foreach (var declaration in declarations)
    {
        var lineSpan = declaration.GetLocation().GetMappedLineSpan();
        int lines = lineSpan.EndLinePosition.Line - lineSpan.StartLinePosition.Line + 1;
        var typeSymbol = (INamedTypeSymbol)model.GetDeclaredSymbol(declaration);
        IEnumerable<ISymbol> typesUsingMe = await GetTypesUsingMe(solution, project, typeSymbol);
        var typeInfo = new TypeInfo()
        {
            Symbol = typeSymbol,
            Node = declaration,
            TypesUsingMe = typesUsingMe,
            ContainingDocument = doc,
            ContainingProject = project,
            NumberOfLines = lines,
        };
        types.Add(typeInfo);
    }
}

# Get Document’s SemanticModel

var model = await doc.GetSemanticModelAsync();

With the SemanticModel, we can get the ISymbol for the type declaration and thus find out the references to the type.

# Get Declared Symbol

var typeSymbol = (INamedTypeSymbol)model.GetDeclaredSymbol(declaration);

The GetDeclaredSymbol() method of the SemanticModel class returns an ISymbol object. Once we have the ISymbol instance, we can loop through all the references to this symbol and determine whether it is dead or not. Also, since the declaration syntax node is a class, interface, struct or enum, we can cast the result to INamedTypeSymbol. With the INamedTypeSymbol object, we can check its base type, which interfaces it implements, etc.

# Get SyntaxTree

await doc.GetSyntaxRootAsync()

# Filter out only class, interface, struct and enum SyntaxNode

var syntaxNodes = (await doc.GetSyntaxRootAsync()).DescendantNodes().Where(n => n is TypeDeclarationSyntax || n is EnumDeclarationSyntax);

# Get number of lines in source code of the type

var lineSpan = syntaxNode.GetLocation().GetMappedLineSpan();
int lines = lineSpan.EndLinePosition.Line - lineSpan.StartLinePosition.Line + 1;
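The steps above can be tried without a solution file by compiling a source string in memory. This is a minimal sketch under that assumption. DeclarationDemo and ListTypes are hypothetical names, and BaseTypeDeclarationSyntax is used as a shorthand because it is the common base class of TypeDeclarationSyntax and EnumDeclarationSyntax.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class; not part of the DeadCodeRemover repository.
public static class DeclarationDemo
{
    // Returns the display name and line count of every type declared in 'source'.
    public static List<(string Name, int Lines)> ListTypes(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        var model = compilation.GetSemanticModel(tree);

        var result = new List<(string Name, int Lines)>();
        foreach (var declaration in tree.GetRoot().DescendantNodes().OfType<BaseTypeDeclarationSyntax>())
        {
            // For a type declaration the declared symbol is an INamedTypeSymbol.
            INamedTypeSymbol symbol = model.GetDeclaredSymbol(declaration);

            // Line positions are zero-based, so the count is end - start + 1.
            var lineSpan = declaration.GetLocation().GetLineSpan();
            int lines = lineSpan.EndLinePosition.Line - lineSpan.StartLinePosition.Line + 1;

            result.Add((symbol.ToDisplayString(), lines));
        }
        return result;
    }
}
```

ToDisplayString() is used here instead of the FullName formula from TypeInfo because it also includes the containing type of a nested type, which ContainingNamespace plus Name leaves out.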

# Find References

We use the SymbolFinder.FindReferencesAsync() method to get all the references to the symbol. It returns a collection of ReferencedSymbol objects. The ReferencedSymbol class has two properties:

  1. Definition: the ISymbol definition that is being referenced.
  2. Locations: the list of locations where that definition is referenced. It can be empty, which means the definition is not used.

Normally, each class has at least two records in the “Find References” result: the class definition itself and its default constructor. The Locations property of each record tells us where the definition is used. Again, if the Locations property is empty, the definition is not used.
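To see what these records look like, the following sketch runs SymbolFinder.FindReferencesAsync() against a single in-memory document. It assumes an AdhocWorkspace in place of the MSBuildWorkspace used in the article, and FindReferencesDemo and CountReferencesAsync are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;

// Hypothetical demo class; not part of the DeadCodeRemover repository.
public static class FindReferencesDemo
{
    // Returns one entry per ReferencedSymbol: the definition and its number of locations.
    public static async Task<List<(string Definition, int Count)>> CountReferencesAsync(
        string source, string typeName)
    {
        using (var workspace = new AdhocWorkspace())
        {
            // Build a throwaway project with one document; nothing is written to disk.
            var project = workspace.AddProject("Demo", LanguageNames.CSharp)
                .AddMetadataReference(MetadataReference.CreateFromFile(typeof(object).Assembly.Location))
                .AddDocument("Demo.cs", source)
                .Project;

            var compilation = await project.GetCompilationAsync();
            var type = compilation.GetSymbolsWithName(typeName, SymbolFilter.Type).First();

            // Searching for a type also yields records for its constructors.
            var references = await SymbolFinder.FindReferencesAsync(type, project.Solution);
            return references
                .Select(r => (r.Definition.ToDisplayString(), r.Locations.Count()))
                .ToList();
        }
    }
}
```

Printing each entry of the returned list gives a textual version of the “Find References” window: one line per definition with its location count.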

# Get Types using Me

The most difficult part is getting all the external references to a given ISymbol. Even though you can use the SymbolFinder.FindReferencesAsync() method to find all the references to the type, some of these references may be self references. For example, suppose you have a class definition like the following:

1  public sealed class ClassWithSelfReferences
2 {
3 private readonly string _name;
4 public ClassWithSelfReferences() : this("DefaultClass"){}
5 public ClassWithSelfReferences(string name)
6 {
7 _name = name;
8 }
9 public ClassWithSelfReferences GetNewInstance()
10 {
11 return new ClassWithSelfReferences("MyClass");
12 }
13 }

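The result from the SymbolFinder.FindReferencesAsync() method would contain three records: one for the type itself and one for each of its two constructors (listed as .ctor, each with its reference count, in the original screenshot captioned “3 self references”).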

The ClassWithSelfReferences type is referenced in one location:

  • GetNewInstance() method return type in line number 9.

The .ctor entries are constructor references. The first .ctor record is for the default constructor; its “0 refs” count means it is not used anywhere. The second .ctor record is for the constructor with a string parameter. It is referenced in two places:

  • this() constructor in line number 4.
  • The new statement inside the GetNewInstance() method in line number 11.

All the references in the above results are self references. If there are no other, external references to this class, it should be classified as “dead”. We need a way to filter out these self references in order to get the correct reference count.

For each reference, we need to check each location in the Locations property. We can get the SyntaxNode at that location using the following code:

var node = loc.Location.SourceTree?.GetRoot()?.FindNode(loc.Location.SourceSpan);

After that, we walk up the syntax node’s parents until we reach a type declaration (class, interface or struct).

while (node != null && 
!node.IsKind(SyntaxKind.ClassDeclaration) &&
!node.IsKind(SyntaxKind.InterfaceDeclaration) &&
!node.IsKind(SyntaxKind.StructDeclaration))
{
node = node.Parent;
}

Once we have reached the declaration SyntaxNode of one of these three kinds, we can get its ISymbol and compare it with the original ISymbol instance using SymbolEqualityComparer. If they are the same, the reference is a self reference; otherwise, it is a real reference.

var compilation = await loc.Document.Project.GetCompilationAsync();
// Symbol of the type declaration that contains this reference
ISymbol refSymbol = compilation.GetSemanticModel(loc.Location.SourceTree).GetDeclaredSymbol(node);
if (refSymbol != null &&
    !SymbolEqualityComparer.Default.Equals(refSymbol, typeSymbol))
{
    // Found a reference from another type
    typesUsingMe.Add(refSymbol);
}
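Put together, the filtering steps might look like the sketch below. It again assumes an in-memory AdhocWorkspace with a single document, and it uses FirstAncestorOrSelf<BaseTypeDeclarationSyntax>() in place of the explicit while loop, so enum declarations are also treated as containing types. ExternalUsersDemo and GetExternalUsersAsync are hypothetical names, not the repository's GetTypesUsingMe implementation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.FindSymbols;

// Hypothetical demo class; not part of the DeadCodeRemover repository.
public static class ExternalUsersDemo
{
    // Returns the names of the types that reference 'typeName' from outside its own declaration.
    public static async Task<List<string>> GetExternalUsersAsync(string source, string typeName)
    {
        using (var workspace = new AdhocWorkspace())
        {
            var project = workspace.AddProject("Demo", LanguageNames.CSharp)
                .AddMetadataReference(MetadataReference.CreateFromFile(typeof(object).Assembly.Location))
                .AddDocument("Demo.cs", source)
                .Project;
            var compilation = await project.GetCompilationAsync();
            var target = compilation.GetSymbolsWithName(typeName, SymbolFilter.Type)
                .OfType<INamedTypeSymbol>().First();

            var users = new HashSet<INamedTypeSymbol>(SymbolEqualityComparer.Default);
            foreach (var reference in await SymbolFinder.FindReferencesAsync(target, project.Solution))
            {
                foreach (var loc in reference.Locations)
                {
                    // Find the type declaration that encloses the reference.
                    var root = await loc.Location.SourceTree.GetRootAsync();
                    var declaration = root.FindNode(loc.Location.SourceSpan)
                        .FirstAncestorOrSelf<BaseTypeDeclarationSyntax>();
                    if (declaration == null)
                    {
                        continue; // e.g. a reference in an assembly-level attribute
                    }

                    var model = await loc.Document.GetSemanticModelAsync();
                    INamedTypeSymbol user = model.GetDeclaredSymbol(declaration);

                    // Keep the reference only if it comes from a different type.
                    if (user != null && !SymbolEqualityComparer.Default.Equals(user, target))
                    {
                        users.Add(user);
                    }
                }
            }
            return users.Select(u => u.ToDisplayString()).OrderBy(n => n).ToList();
        }
    }
}
```

With this sketch a reference made from a type nested inside the target counts as external, because the nearest enclosing declaration is the nested type. Whether that is the desired behaviour is a design decision.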

# Dealing with Extension Methods

A static class that contains extension methods may have 0 references to the class itself, while the extension methods inside it are referenced. Therefore, for a static class with extension methods, we need to check the references to each extension method. The reference count of the class is then the sum of the reference counts of its extension methods.

if (type.IsStatic)
{
    var extMethods = ((INamedTypeSymbol)type).GetMembers()
        .Where(s => s.Kind == SymbolKind.Method)
        .Where(m => ((IMethodSymbol)m).IsExtensionMethod);
    if (extMethods.Any())
    {
        List<ISymbol> typesUsingMe = new List<ISymbol>();
        foreach (var extMethod in extMethods)
        {
            // Collect the external references to this extension method
            typesUsingMe.AddRange(await FilterSelfReferences(solution, project, extMethod));
        }
        return typesUsingMe;
    }
}

# Known Types Repository

Now we have built the list of all the types in the project. Some types have references (the TypesUsingMe property is not empty) and some do not. However, we cannot determine whether a type is dead based solely on this property because:

  1. Some public types may be used by a different application, for example a class library that provides common functions to other applications.
  2. Some types may be created by reflection, so there are no direct references to them.
  3. Some types may be used directly by the .NET Framework, for example a custom configuration section class that inherits from the ConfigurationSection class.

Therefore, we need to run each type through a custom “Known Types Repository” (an implementation of the IKnownTypesRepository interface). The interface has the following methods:

internal interface IKnownTypesRepository
{
bool IsKnownType(INamedTypeSymbol type);
void LoadKnownTypes(string path);
}

It is up to you to provide an implementation for this interface and filter out all the types that you know are not dead but would have no references for the above mentioned reasons.

If the type has 0 references and it is not one of the known types, it is dead.
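One possible implementation, given here only as a sketch, keeps the known types in a text file. The file format (one fully qualified type name per line, with lines starting with # treated as comments), the class name FileKnownTypesRepository and the rule that a type is known when it or any of its base classes is listed are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.CodeAnalysis;

// Interface as shown in the article.
internal interface IKnownTypesRepository
{
    bool IsKnownType(INamedTypeSymbol type);
    void LoadKnownTypes(string path);
}

// Hypothetical implementation backed by a plain text file.
internal sealed class FileKnownTypesRepository : IKnownTypesRepository
{
    private readonly HashSet<string> _names = new HashSet<string>(StringComparer.Ordinal);

    public void LoadKnownTypes(string path)
    {
        foreach (var line in File.ReadLines(path))
        {
            var name = line.Trim();
            // Skip blank lines and comment lines.
            if (name.Length > 0 && !name.StartsWith("#", StringComparison.Ordinal))
            {
                _names.Add(name);
            }
        }
    }

    public bool IsKnownType(INamedTypeSymbol type)
    {
        // Walk up the base-class chain so that listing a base class such as
        // System.Configuration.ConfigurationSection also covers its subclasses.
        for (INamedTypeSymbol current = type; current != null; current = current.BaseType)
        {
            if (_names.Contains(current.ToDisplayString()))
            {
                return true;
            }
        }
        return false;
    }
}
```

Checking the base-class chain is a design choice that keeps the file short for framework-driven types. Public API types and reflection-created types still need to be listed one by one, or matched by a rule of your own.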

# Nested Dead Types

Even if a type has references, it may still be dead if all the types that reference it are dead. Therefore, we introduce the concept of death “depth”. Depth 0 means the dead type has no references. Depth 1 means the dead type is referenced only by dead types with depth 0; depth 2 means it is referenced only by dead types with depth 1; and so on.

# Find Dead Types

To find dead types, we use a recursive function. First we divide the list of types into two groups:

  1. The types that have not been checked yet (IsDead property is null)
  2. The dead types at the current depth - 1

For each type in group 1, if all its references come from types in the dead types group, it is a dead type (unless it is a known type).

private void FindDeadTypes(IEnumerable<TypeInfo> types,
    KnownTypesRepository knownTypes, int depth)
{
    IEnumerable<TypeInfo> unknownTypes = types.Where(t => !t.IsDead.HasValue);
    IEnumerable<TypeInfo> deadTypes = types.Where(t => t.IsDead == true && t.Depth == depth - 1);
    if (!unknownTypes.Any())
    {
        return;
    }
    bool found = false;
    foreach (var type in unknownTypes)
    {
        // At depth 0 deadTypes is empty, so only types without any references match
        if (type.TypesUsingMe.All(t => deadTypes.Select(dt => dt.Symbol).Contains(t, SymbolEqualityComparer.Default)))
        {
            type.IsDead = !knownTypes.IsKnownType(type.Symbol);
            type.Depth = depth;
            found = true;
        }
    }
    if (!found)
    {
        // Nothing new was resolved at this depth, so deeper levels cannot resolve anything either
        return;
    }
    FindDeadTypes(types, knownTypes, depth + 1);
}
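The depth idea is easier to follow on a small graph that does not involve Roslyn at all. The sketch below is an illustration rather than the article's function. Types are plain strings, the loop is iterative instead of recursive, and a type is considered dead when all of its users are dead at any earlier depth, not only at exactly depth - 1. DeadDepthDemo and ComputeDeadDepths are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the DeadCodeRemover repository.
public static class DeadDepthDemo
{
    // usedBy[t] lists the types that reference t (the article's TypesUsingMe).
    // Returns the death depth of every type found to be dead.
    public static Dictionary<string, int> ComputeDeadDepths(
        IReadOnlyDictionary<string, string[]> usedBy,
        ISet<string> knownTypes)
    {
        var depthOf = new Dictionary<string, int>();
        for (int depth = 0; ; depth++)
        {
            // A type dies at this depth if it is not a known type, has not been
            // resolved yet, and every type using it died at an earlier depth.
            // ToList() snapshots the round before depthOf is modified.
            var newlyDead = usedBy
                .Where(kv => !knownTypes.Contains(kv.Key) && !depthOf.ContainsKey(kv.Key))
                .Where(kv => kv.Value.All(depthOf.ContainsKey))
                .Select(kv => kv.Key)
                .ToList();

            if (newlyDead.Count == 0)
            {
                return depthOf; // nothing changed, so deeper rounds cannot find more
            }
            foreach (var name in newlyDead)
            {
                depthOf[name] = depth;
            }
        }
    }
}
```

For example, if A has no users, B is used only by A, and C is used by A and B, the sketch assigns depths 0, 1 and 2. Two types that reference only each other are never marked dead by this approach, since neither can be resolved first.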

Remove Dead Types

Once we have built the list of types and determined which of them are dead, the next step is to remove them from the source code.

The order in which we remove these dead types matters; removing them in random order may cause errors. We follow these rules:

  1. We remove types document by document. Only when all the dead types in one document have been removed do we proceed to the next document.
  2. Inside each document, we remove the dead types with depth 0 first, then depth 1, and so on.

foreach (var document in types.Select(t => t.ContainingDocument).Distinct())
{
await RemoveTypesFromDocument(document, types);
}

# RemoveTypesFromDocument()

To remove a type declaration from a document, we use a DocumentEditor object. The DocumentEditor.RemoveNode(SyntaxNode node) method takes a SyntaxNode as its parameter and removes that node from the document’s syntax tree.

private async Task<int> RemoveTypesFromDocument(Document document, IEnumerable<TypeInfo> types)
{
int count = 0;
DocumentEditor docEditor = await DocumentEditor.CreateAsync(document);
foreach (var typeInfo in types.Where(t => t.ContainingDocument == document).OrderBy(t => t.Depth))
{
foreach (var syntaxRef in typeInfo.Symbol.DeclaringSyntaxReferences)
{
var syntaxNode = await syntaxRef.GetSyntaxAsync();
docEditor.RemoveNode(syntaxNode);
}
count++;
}
return count;
}

# Save Changes to File

Roslyn’s workspace and syntax APIs are immutable: an edit never changes an existing object but produces a new one. DocumentEditor collects the edits made to a document and produces the changed Document.

The DocumentEditor.RemoveNode(SyntaxNode node) method only removes the type in memory. It does not persist the change to the source code file.

We use docEditor.GetChangedDocument() to get the changed document, then check whether it still contains any type declarations (class, interface, struct or enum). If not, the document is effectively empty (for a .cs file, only the using directives and the namespace are left) and we can safely remove the file by calling project.RemoveDocument(document.Id). Otherwise, we get the changed document content and use a StreamWriter to write it to the file.

private async Task<Project> SaveChanges(Project project,
    Document document,
    DocumentEditor docEditor)
{
    var newDoc = docEditor.GetChangedDocument();
    // Await the root instead of blocking on .Result inside an async method
    var declarations = (await newDoc.GetSyntaxRootAsync()).DescendantNodes();
    if (!declarations.Any(d =>
        d.Kind() == SyntaxKind.ClassDeclaration ||
        d.Kind() == SyntaxKind.StructDeclaration ||
        d.Kind() == SyntaxKind.InterfaceDeclaration ||
        d.Kind() == SyntaxKind.EnumDeclaration))
    {
        // No type is left: return a new project without this document
        return project.RemoveDocument(document.Id);
    }
    else
    {
        // NormalizeWhitespace() re-formats the whole file, not only the edited part
        var newContent = (await newDoc.GetSyntaxTreeAsync())
            .GetCompilationUnitRoot()
            .NormalizeWhitespace()
            .GetText()
            .ToString();
        using (var fs = new StreamWriter(newDoc.FilePath))
        {
            fs.Write(newContent);
        }
        // null tells the caller that the project itself did not change
        return null;
    }
}
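That DocumentEditor edits stay in memory until the text is written out can be seen in isolation with the sketch below. It assumes an in-memory AdhocWorkspace document, and RemoveNodeDemo and RemoveTypeAsync are hypothetical names. The changed text is returned to the caller and nothing is written to disk.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Editing;

// Hypothetical demo class; not part of the DeadCodeRemover repository.
public static class RemoveNodeDemo
{
    // Removes every declaration of 'typeName' from 'source' and returns the new text.
    public static async Task<string> RemoveTypeAsync(string source, string typeName)
    {
        using (var workspace = new AdhocWorkspace())
        {
            var document = workspace.AddProject("Demo", LanguageNames.CSharp)
                .AddDocument("Demo.cs", source);

            var editor = await DocumentEditor.CreateAsync(document);
            var doomed = editor.OriginalRoot.DescendantNodes()
                .OfType<BaseTypeDeclarationSyntax>()
                .Where(d => d.Identifier.Text == typeName)
                .ToList();
            foreach (var declaration in doomed)
            {
                editor.RemoveNode(declaration); // recorded in the editor only
            }

            // 'document' is unchanged; the edits exist only in the new Document.
            var changed = editor.GetChangedDocument();
            return (await changed.GetTextAsync()).ToString();
        }
    }
}
```

Reading the text with GetTextAsync(), as done here, keeps the formatting of the untouched code, whereas NormalizeWhitespace() rewrites the layout of the entire file. Which of the two fits better depends on how much formatting churn is acceptable in the resulting diff.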

# Save Project Changes

The call project.RemoveDocument(document.Id) returns a new Project object; the original one stays the same (it is immutable). To save the new project, you need to use the Workspace API:

var newProject = await SaveChanges(project, document, docEditor);
if (newProject != null)
{
workspace.TryApplyChanges(newProject.Solution);
}

However, the call workspace.TryApplyChanges(newProject.Solution) fails if the project is an SDK-style project. I have not been able to find out the reason yet. If anyone knows how to fix it, please let me know.

As a workaround, I have to manually remove the document and modify the project file if needed. Some documents have an associated framework-generated designer file, and some may have both a designer file and a resource file. We have to delete these associated files as well and remove the relationship information (mainly for the resource file) from the project file.

Again, you can download the whole solution from GitHub repository: https://github.com/jason-ge/DeadCodeRemover.

The solution contains 3 projects:

  1. DeadCodeRemover: This is the main project and contains the code that detects and removes dead code.
  2. UnitTestDeadCodeRemover: This is a unit test project that tests certain functions of the DeadCodeRemover project.
  3. ClassLibrary4Roslyn: This is a sample project used by the DeadCodeRemover and UnitTestDeadCodeRemover projects. The UnitTestDeadCodeRemover project runs its unit tests against the types in this project. You can run the DeadCodeRemover console application against the ClassLibrary4Roslyn project, or you can test with your own solution.

The DeadCodeRemover is a console application and has two command line parameters:

  1. -s <Solution file path>: This parameter specifies where the solution file is. It is mandatory.
  2. -p <Project file path>: This parameter specifies where the project file is. It is optional. If it is not provided, the application performs the dead code analysis and removal on all the projects in the solution.

Just a note, I have received an email from NDepend lead developer claiming the issues I mentioned in this article have been fixed in their latest or coming releases. If you have the latest NDepend, you can give it a try.

Happy coding!
