Optimize file enumeration to skip build directories during traversal#14
Conversation
…traversal Co-authored-by: HandyS11 <62420910+HandyS11@users.noreply.github.com>
Co-authored-by: HandyS11 <62420910+HandyS11@users.noreply.github.com>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
There was a problem hiding this comment.
Pull request overview
This PR optimizes file enumeration in EntityFileDiscovery.cs by implementing manual directory recursion that filters out build and version control directories (bin, obj, .git, node_modules) before traversing them, rather than enumerating all files first and filtering afterward.
Changes:
- Replaced
RecurseSubdirectories = truewith manual recursion in both entity and base class file discovery methods - Introduced
MaxSearchDepthconstant to limit recursion depth in base class file searches - Added early exit optimizations in
SearchForBaseClassFilesRecursiveto stop searching once all files are found
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| HashSet<string> entityTypeNames, | ||
| string normalizedContextPath, | ||
| Dictionary<string, string> entityFiles) | ||
| { |
There was a problem hiding this comment.
The SearchDirectoryForEntitiesAsync method is now just a thin wrapper that directly calls SearchDirectoryRecursiveAsync without adding any additional logic. Consider either removing this wrapper and calling SearchDirectoryRecursiveAsync directly from DiscoverEntityFilesAsync (line 49), or add the depth parameters to SearchDirectoryRecursiveAsync and initialize them in this wrapper method to make it serve a clear purpose.
| { | |
| { | |
| // Basic validation to ensure we do not attempt to search with invalid inputs. | |
| if (string.IsNullOrWhiteSpace(searchDir) || | |
| entityTypeNames == null || | |
| entityFiles == null) | |
| { | |
| return; | |
| } | |
| // Avoid starting a recursive search on a directory that does not exist. | |
| if (!System.IO.Directory.Exists(searchDir)) | |
| { | |
| return; | |
| } |
| private static async Task SearchDirectoryRecursiveAsync( | ||
| string currentDir, | ||
| HashSet<string> entityTypeNames, | ||
| string normalizedContextPath, | ||
| Dictionary<string, string> entityFiles) |
There was a problem hiding this comment.
The SearchDirectoryRecursiveAsync method lacks depth tracking and maximum depth enforcement, unlike SearchForBaseClassFilesRecursive which has these safeguards. This creates an inconsistency and potential risk of unbounded recursion if the file system contains symbolic links or deeply nested directory structures. Consider adding currentDepth and maxDepth parameters to match the pattern used in SearchForBaseClassFilesRecursive, and add an early return check similar to line 461.
| // Recursively process subdirectories, skipping excluded directories | ||
| foreach (var subDir in Directory.EnumerateDirectories(currentDir, "*", options)) | ||
| { | ||
| var dirName = Path.GetFileName(subDir); | ||
| if (dirName is "bin" or "obj" or ".git" or "node_modules") | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| await ProcessSourceFileAsync(fullPath, entityTypeNames, entityFiles); | ||
| await SearchDirectoryRecursiveAsync(subDir, entityTypeNames, normalizedContextPath, entityFiles); | ||
| } |
There was a problem hiding this comment.
SearchDirectoryRecursiveAsync should check if all entities have been found and return early, similar to the optimization in SearchForBaseClassFilesRecursive (lines 484-487, 502-505). This would prevent unnecessary traversal once all entity files have been discovered. Consider adding a check after processing files and before recursing into subdirectories: if (entityTypeNames.All(entityFiles.ContainsKey)) return;
| public static class EntityFileDiscovery | ||
| { | ||
| /// <summary> | ||
| /// Maximum recursion depth when searching for base class files to prevent infinite recursion |
There was a problem hiding this comment.
The documentation for MaxSearchDepth indicates it's only used for "base class files", but this constant should also apply to entity file searching in SearchDirectoryRecursiveAsync to prevent unbounded recursion. Update the documentation to reflect that this constant applies to all recursive file search operations in this class.
| /// Maximum recursion depth when searching for base class files to prevent infinite recursion | |
| /// Maximum recursion depth for recursive file search operations in this class to prevent infinite recursion |
| private static async Task SearchDirectoryRecursiveAsync( | ||
| string currentDir, | ||
| HashSet<string> entityTypeNames, | ||
| string normalizedContextPath, | ||
| Dictionary<string, string> entityFiles) | ||
| { | ||
| try | ||
| { | ||
| // Process files in the current directory | ||
| var options = new EnumerationOptions | ||
| { | ||
| RecurseSubdirectories = true, IgnoreInaccessible = true, AttributesToSkip = FileAttributes.System | ||
| RecurseSubdirectories = false, | ||
| IgnoreInaccessible = true, | ||
| AttributesToSkip = FileAttributes.System | ||
| }; | ||
|
|
||
| foreach (var csFile in Directory.EnumerateFiles(searchDir, EfAnalysisConstants.FilePatterns.CSharpFiles, | ||
| options)) | ||
| foreach (var csFile in Directory.EnumerateFiles(currentDir, EfAnalysisConstants.FilePatterns.CSharpFiles, options)) | ||
| { | ||
| var fullPath = Path.GetFullPath(csFile); | ||
| if (fullPath.Equals(normalizedContextPath, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| // Skip common non-source directories that can be large | ||
| var pathSegments = fullPath.Split(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar); | ||
| if (pathSegments.Any(s => s is "bin" or "obj" or ".git" or "node_modules")) | ||
| await ProcessSourceFileAsync(fullPath, entityTypeNames, entityFiles); | ||
| } | ||
|
|
||
| // Recursively process subdirectories, skipping excluded directories | ||
| foreach (var subDir in Directory.EnumerateDirectories(currentDir, "*", options)) | ||
| { | ||
| var dirName = Path.GetFileName(subDir); | ||
| if (dirName is "bin" or "obj" or ".git" or "node_modules") | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| await ProcessSourceFileAsync(fullPath, entityTypeNames, entityFiles); | ||
| await SearchDirectoryRecursiveAsync(subDir, entityTypeNames, normalizedContextPath, entityFiles); |
There was a problem hiding this comment.
The file WorkspaceTypeDiscovery.cs (lines 61-71) still uses the old pattern of RecurseSubdirectories = true with post-traversal filtering. For consistency and performance, consider applying the same optimization pattern used in this PR to that file as well.
| foreach (var subDir in Directory.EnumerateDirectories(currentDir, "*", options)) | ||
| { | ||
| var dirName = Path.GetFileName(subDir); | ||
| if (dirName is "bin" or "obj" or ".git" or "node_modules") |
There was a problem hiding this comment.
Directory name filtering uses pattern matching which is case-sensitive ("bin" or "obj" or ".git" or "node_modules"). On Windows, directory names are case-insensitive, so "Bin", "BIN", "Obj", "OBJ" would not be filtered out. Consider using case-insensitive comparison, similar to how path comparisons are handled elsewhere in the codebase (e.g., line 226 uses StringComparison.OrdinalIgnoreCase). Example: if (dirName.Equals("bin", StringComparison.OrdinalIgnoreCase) || dirName.Equals("obj", StringComparison.OrdinalIgnoreCase) || ...)
| if (dirName is "bin" or "obj" or ".git" or "node_modules") | |
| if (string.Equals(dirName, "bin", System.StringComparison.OrdinalIgnoreCase) | |
| || string.Equals(dirName, "obj", System.StringComparison.OrdinalIgnoreCase) | |
| || string.Equals(dirName, ".git", System.StringComparison.OrdinalIgnoreCase) | |
| || string.Equals(dirName, "node_modules", System.StringComparison.OrdinalIgnoreCase)) |
| foreach (var subDir in Directory.EnumerateDirectories(currentDir, "*", options)) | ||
| { | ||
| var dirName = Path.GetFileName(subDir); | ||
| if (dirName is "bin" or "obj" or ".git" or "node_modules") |
There was a problem hiding this comment.
Directory name filtering uses pattern matching which is case-sensitive ("bin" or "obj" or ".git" or "node_modules"). On Windows, directory names are case-insensitive, so "Bin", "BIN", "Obj", "OBJ" would not be filtered out. Consider using case-insensitive comparison, similar to how path comparisons are handled elsewhere in the codebase (e.g., line 226 uses StringComparison.OrdinalIgnoreCase). Example: if (dirName.Equals("bin", StringComparison.OrdinalIgnoreCase) || dirName.Equals("obj", StringComparison.OrdinalIgnoreCase) || ...)
| if (dirName is "bin" or "obj" or ".git" or "node_modules") | |
| if (dirName.Equals("bin", System.StringComparison.OrdinalIgnoreCase) | |
| || dirName.Equals("obj", System.StringComparison.OrdinalIgnoreCase) | |
| || dirName.Equals(".git", System.StringComparison.OrdinalIgnoreCase) | |
| || dirName.Equals("node_modules", System.StringComparison.OrdinalIgnoreCase)) |
| foreach (var csFile in Directory.EnumerateFiles(currentDir, EfAnalysisConstants.FilePatterns.CSharpFiles, options)) | ||
| { | ||
| var fullPath = Path.GetFullPath(csFile); | ||
| if (fullPath.Equals(normalizedContextPath, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| // Skip common non-source directories that can be large | ||
| var pathSegments = fullPath.Split(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar); | ||
| if (pathSegments.Any(s => s is "bin" or "obj" or ".git" or "node_modules")) | ||
| await ProcessSourceFileAsync(fullPath, entityTypeNames, entityFiles); | ||
| } |
There was a problem hiding this comment.
This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
The current file enumeration in
EntityFileDiscovery.csenumerates all files including those inbin,obj,.git, andnode_modules, then filters them out post-traversal. This is inefficient for large projects.Changes
RecurseSubdirectories = truewith manual recursion that filters directories before descendingSearchDirectoryForEntitiesAsyncandSearchForBaseClassFilesMaxSearchDepthBefore
After
This prevents the file system from enumerating potentially thousands of files in build artifacts and dependency directories.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.