Enhance ghidra backend with existing project feature by saniyafatima07 · Pull Request #3087 · mandiant/capa

saniyafatima07 · 2026-05-25T12:53:24Z

This PR adds support for analyzing existing Ghidra projects directly using .gpr project input.

Users can now provide input in the format:

capa /path/to/project.gpr

For multi-program projects:

CAPA_GHIDRA_PROGRAM_PATH=/folder/program capa /path/to/project.gpr

Motivation & Context

Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:

- increases analysis time
- ignores previously analyzed projects and annotations
- duplicates existing analysis work

This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.

Implementation Details

- Added automatic .gpr detection to select the Ghidra backend when a Ghidra project file is provided as input.
- Added recursive Ghidra project file enumeration using domain_file.getPathname() to discover programs within the project.
- Added automatic program selection for single-program projects.
- Added CAPA_GHIDRA_PROGRAM_PATH support for selecting the target program in multi-program projects.
- Added informative error handling that lists available project program paths when disambiguation is required.
- Updated Ghidra loader flow to:
  - open existing projects using create=False
  - reuse already analyzed programs via consume_program
  - skip temporary project creation/import flow for .gpr input

Default behavior remains unchanged for non-.gpr inputs.
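
For illustration, the program selection rules listed above could be sketched roughly as follows. This is a standalone sketch and not the code in this PR: `select_program_path` and `select_from_environment` are hypothetical names, and the list of program paths is assumed to have been collected already (in the PR this comes from enumerating the project with `domain_file.getPathname()`). Only the `CAPA_GHIDRA_PROGRAM_PATH` variable name is taken from the description.

```python
import os
from typing import Optional, Sequence


def select_program_path(program_paths: Sequence[str], requested: Optional[str] = None) -> str:
    """Pick one program path from a project (hypothetical helper, not capa's API)."""
    if not program_paths:
        raise ValueError("Ghidra project contains no programs")

    # Used in error messages so the user can see the valid choices.
    available = "\n".join(f"  {path}" for path in sorted(program_paths))

    if requested is None:
        # Single-program project: nothing to disambiguate.
        if len(program_paths) == 1:
            return program_paths[0]
        raise ValueError(
            "project contains multiple programs; "
            "set CAPA_GHIDRA_PROGRAM_PATH to one of:\n" + available
        )

    if requested not in program_paths:
        raise ValueError(
            f"program {requested!r} not found in project; available programs:\n" + available
        )
    return requested


def select_from_environment(program_paths: Sequence[str]) -> str:
    """Read the requested program path from the environment, if set."""
    return select_program_path(program_paths, os.environ.get("CAPA_GHIDRA_PROGRAM_PATH"))
```

Keeping the selection logic free of Ghidra calls, as in this sketch, would make it testable without a Ghidra installation; whether the PR structures it this way is not stated here.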

Tests

Added tests for:

- automatic Ghidra backend selection for .gpr input
- skipping generic file extractor probing for Ghidra project input

Closes #3004

Checklist

CHANGELOG updated
Added a few tests
Documentation updated
This submission includes AI-generated code and I have provided details in the description.

Parts of this implementation were assisted by AI tools (GitHub Copilot, ChatGPT).
AI was used for:

refining implementation approach
improving edge case handling

All code was reviewed, modified and tested manually before submission.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist

Code Review

This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.

CHANGELOG updated or no update needed, thanks! 😄

saniyafatima07 · 2026-05-25T13:42:29Z

@mike-hunhoff
I have implemented this feature using the new approach discussed in #3066.
Could you please review it?
Thank you for your time!

mike-hunhoff

Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

mike-hunhoff · 2026-05-28T15:33:23Z

Great work splitting up the code into helper functions to keep things concise.

saniyafatima07 · 2026-05-28T18:26:38Z

> Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

Thank you for the review, Mike.
I will address all the comments.
Sure.

saniyafatima07 · 2026-06-01T14:35:41Z

@mike-hunhoff @larchchen @Maijin I have made all the requested changes.
Could you please review it?

mike-hunhoff

Thank you @saniyafatima07 ! I've left comments for your review.

saniyafatima07 · 2026-06-05T14:16:04Z

Thank you for the review @mike-hunhoff . I have addressed all the requested changes.

mike-hunhoff

Thanks @saniyafatima07 , I've left comments for your review!

saniyafatima07 · 2026-06-06T11:20:44Z

Done @mike-hunhoff .
Could you please review?

mike-hunhoff

Thanks @saniyafatima07 ! I've left additional comments for your review.

mike-hunhoff · 2026-06-22T20:39:47Z



-def get_extractor_filters_from_cli(args, input_format) -> FilterConfig:
+def get_extractor_filters_from_cli(args, input_format, backend: Optional[str] = None) -> FilterConfig:


Is this still needed?

Nope, I've removed it.

mike-hunhoff · 2026-06-22T20:41:56Z

+### Ghidra project support
+capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`.
+


Let's remove this here, and add it to the documentation here.

mike-hunhoff · 2026-06-22T20:43:28Z

Let's remove these for now. We can add in a more robust test for this specific feature if it becomes a problem in the future.

mike-hunhoff · 2026-06-22T20:43:52Z

+
+
+def test_gpr_uses_ghidra_backend(tmp_path):
+    args = SimpleNamespace(input_file=tmp_path / "sample.gpr", backend=capa.main.BACKEND_AUTO)
+
+    assert capa.main.get_backend_from_cli(args, FORMAT_AUTO) == capa.main.BACKEND_GHIDRA
+    assert capa.main.get_sample_path_from_cli(args, capa.main.BACKEND_GHIDRA) is None
+
+
+def test_gpr_skips_generic_file_extractor_probe(tmp_path):
+    args = SimpleNamespace(input_file=tmp_path / "sample.gpr")
+
+    assert capa.main.get_file_extractors_from_cli(args, FORMAT_AUTO) == []


Let's remove these; see my earlier comment.

mike-hunhoff · 2026-06-22T20:53:15Z

+    if args.input_file.suffix.lower() == ".gpr":
+        return FORMAT_GHIDRA_PROJECT
+


Move this to

capa/capa/helpers.py

Line 223 in f850024

def get_format_from_extension(sample: Path) -> str:

mike-hunhoff · 2026-06-22T20:56:45Z

    if args.backend != BACKEND_AUTO:
        return args.backend

+    if args.input_file.suffix.lower() == ".gpr":


Suggested change

-    if args.input_file.suffix.lower() == ".gpr":
+    if input_format == FORMAT_GHIDRA_PROJECT:
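
Taken together with the earlier request to move the `.gpr` check into `get_format_from_extension`, the suggested flow could look roughly like the sketch below: the extension is mapped to a format once, and backend selection then keys on the format rather than re-checking the suffix. This is an illustrative, self-contained sketch only. The constant values, the `BACKEND_DEFAULT` fallback, and the function names `format_from_extension` and `backend_from_format` are assumptions made for the example and are not capa's actual definitions.

```python
from pathlib import Path

# Placeholder values for illustration only; capa defines its own constants.
FORMAT_AUTO = "auto"
FORMAT_GHIDRA_PROJECT = "ghidra_project"
BACKEND_AUTO = "auto"
BACKEND_GHIDRA = "ghidra"
BACKEND_DEFAULT = "default"  # hypothetical fallback backend


def format_from_extension(sample: Path) -> str:
    """Map a file extension to an input format (the check lives in one place)."""
    if sample.suffix.lower() == ".gpr":
        return FORMAT_GHIDRA_PROJECT
    return FORMAT_AUTO


def backend_from_format(requested_backend: str, input_format: str) -> str:
    """Choose a backend, honoring an explicit request before auto-detection."""
    if requested_backend != BACKEND_AUTO:
        return requested_backend
    if input_format == FORMAT_GHIDRA_PROJECT:
        return BACKEND_GHIDRA
    return BACKEND_DEFAULT
```

With this split, later stages only need to compare against the format constant, so the suffix comparison is not repeated across functions.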

mike-hunhoff · 2026-06-22T20:58:18Z

+    elif backend == BACKEND_GHIDRA:
+        return None


Can you explain why this is needed versus the other disassembly backends like IDA and Binary Ninja?

More specifically, I added this to address the .gpr files since those are not the actual binary samples. But checking BACKEND_GHIDRA was too broad, so I changed it to check the .gpr format and return None only for .gpr inputs.

mike-hunhoff · 2026-06-22T21:01:52Z

+    if args.input_file.suffix.lower() == ".gpr":
+        logger.debug("skipping generic file extractor probe for Ghidra project input")
+        return []
+


Is this needed? Doesn't capa.loader.get_file_extractors return [] if input_format is FORMAT_GHIDRA_PROJECT?

No, it's not needed. I verified it again and it already returns []. I will remove it.

saniyafatima07 · 2026-06-23T20:25:14Z

Thank you for the review @mike-hunhoff . I have addressed all the requested changes.

github-actions Bot previously requested changes May 25, 2026

gemini-code-assist Bot reviewed May 25, 2026

saniyafatima07 marked this pull request as ready for review May 25, 2026 13:42

saniyafatima07 mentioned this pull request May 26, 2026

Add sample.gpr mandiant/capa-testfiles#312

Open

mike-hunhoff requested changes May 28, 2026

mike-hunhoff requested a review from a team May 28, 2026 16:27

saniyafatima07 mentioned this pull request May 29, 2026

Enhance ghidra backend with existing project feature #3066

Closed

4 tasks

mike-hunhoff requested changes Jun 4, 2026

mike-hunhoff requested changes Jun 5, 2026

mike-hunhoff requested changes Jun 22, 2026

saniyafatima07 added 13 commits June 24, 2026 01:03

Enhance ghidra backend with existing project feature

6669204

Update changelog with new feature

b388cb1

Fix errors

1525906

Update documentation

957b854

Minor fix

45fbf0b

Fix: Changelog and doc

9338532

Fix: Formatting issues

c58bb2b

Handle cleanup during exception

9f7ee5a

Add additional logs for lock exception handling

58f211f

Fix: import assert_never error

c4b5ab5

Add exception and new return value

b8622c9

Improve cleanup, add ghidra to static format

ef70aa8

Address comments

4ba5dd7

saniyafatima07 force-pushed the ghidra-feature-new branch from 031ed53 to 4ba5dd7 Compare June 23, 2026 19:37

Update tests data submodule

5ee451b


