Enhance ghidra backend with existing project feature #3087

Open
saniyafatima07 wants to merge 14 commits into mandiant:master from saniyafatima07:ghidra-feature-new
Conversation

@saniyafatima07 saniyafatima07 commented May 25, 2026

Collaborator

This PR adds support for analyzing existing Ghidra projects directly using .gpr project input.

Users can now provide input in the format:

capa /path/to/project.gpr

For multi-program projects:

CAPA_GHIDRA_PROGRAM_PATH=/folder/program capa /path/to/project.gpr

Motivation & Context

Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:

  • increases analysis time
  • ignores previously analyzed projects and annotations
  • duplicates existing analysis work

This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.

Implementation Details

  • Added automatic .gpr detection to select the Ghidra backend when a Ghidra project file is provided as input.
  • Added recursive Ghidra project file enumeration using domain_file.getPathname() to discover programs within the project.
  • Added automatic program selection for single-program projects.
  • Added CAPA_GHIDRA_PROGRAM_PATH support for selecting the target program in multi-program projects.
  • Added informative error handling that lists available project program paths when disambiguation is required.
  • Updated Ghidra loader flow to:
    • open existing projects using create=False
    • reuse already analyzed programs via consume_program
    • skip temporary project creation/import flow for .gpr input
  • Default behavior remains unchanged for non-.gpr inputs.
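
The enumeration and selection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeFile` and `FakeFolder` are hypothetical stand-ins for Ghidra's `DomainFile` and `DomainFolder` so the example runs without Ghidra, and `iter_program_paths` and `select_program_path` are illustrative names. A real implementation would likely also filter by content type so that only programs are returned.

```python
from typing import Iterator, List, Optional


class FakeFile:
    """Hypothetical stand-in for a Ghidra DomainFile."""

    def __init__(self, pathname: str):
        self._pathname = pathname

    def getPathname(self) -> str:
        return self._pathname


class FakeFolder:
    """Hypothetical stand-in for a Ghidra DomainFolder."""

    def __init__(self, files=(), folders=()):
        self._files = list(files)
        self._folders = list(folders)

    def getFiles(self):
        return self._files

    def getFolders(self):
        return self._folders


def iter_program_paths(folder) -> Iterator[str]:
    """Yield the pathname of every file in `folder`, then recurse into subfolders."""
    for domain_file in folder.getFiles():
        yield domain_file.getPathname()
    for subfolder in folder.getFolders():
        yield from iter_program_paths(subfolder)


def select_program_path(paths: List[str], requested: Optional[str]) -> str:
    """Pick the program to analyze, or raise an error listing the available paths."""
    if requested:
        if requested in paths:
            return requested
        raise ValueError(f"program {requested!r} not found; available: {sorted(paths)}")
    if not paths:
        raise ValueError("project contains no programs")
    if len(paths) == 1:
        # single-program project: no disambiguation needed
        return paths[0]
    raise ValueError(f"multiple programs found; set CAPA_GHIDRA_PROGRAM_PATH to one of: {sorted(paths)}")


root = FakeFolder(
    files=[FakeFile("/a.exe")],
    folders=[FakeFolder(files=[FakeFile("/folder/program")])],
)
paths = list(iter_program_paths(root))
# In the real flow, `requested` would come from os.environ.get("CAPA_GHIDRA_PROGRAM_PATH").
print(select_program_path(paths, "/folder/program"))
```

Keeping the selection logic free of Ghidra types makes it possible to unit test the disambiguation rules without a Ghidra installation.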

Tests

Added tests for:

  • automatic Ghidra backend selection for .gpr input
  • skipping generic file extractor probing for Ghidra project input

Closes #3004

Checklist

  • CHANGELOG updated
  • Added a few tests
  • Documentation updated
  • This submission includes AI-generated code and I have provided details in the description.

Parts of this implementation were assisted using AI tools (GitHub Copilot, ChatGPT).
AI was used for:

  • refining implementation approach
  • improving edge case handling

All code was reviewed, modified and tested manually before submission.

@github-actions github-actions Bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.

Comment thread capa/ghidra/helpers.py
Comment thread capa/ghidra/helpers.py
Comment thread capa/loader.py
Comment thread capa/loader.py
Comment thread capa/main.py Outdated
@github-actions github-actions Bot dismissed their stale review May 25, 2026 13:10

CHANGELOG updated or no update needed, thanks! 😄

@saniyafatima07

Collaborator Author

@mike-hunhoff
I have implemented this feature using the new approach discussed in #3066.
Could you please review it?
Thank you for your time!

@saniyafatima07 saniyafatima07 marked this pull request as ready for review May 25, 2026 13:42

@mike-hunhoff mike-hunhoff left a comment

Collaborator

Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

Comment thread capa/ghidra/helpers.py

Collaborator

Great work splitting up the code into helper functions to keep things concise.

Comment thread capa/loader.py
Comment thread capa/loader.py
Comment thread capa/main.py Outdated
Comment thread doc/usage.md Outdated
Comment thread CHANGELOG.md Outdated
Comment thread capa/ghidra/helpers.py Outdated
@mike-hunhoff mike-hunhoff requested a review from a team May 28, 2026 16:27
@saniyafatima07

Collaborator Author

> Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

Thank you for the review Mike.
I will address all the comments.
Sure.

@saniyafatima07

Collaborator Author

@mike-hunhoff @larchchen @Maijin I have made all the requested changes.
Could you please review it?

@mike-hunhoff mike-hunhoff left a comment

Collaborator

Thank you @saniyafatima07 ! I've left comments for your review.

Comment thread capa/main.py Outdated
Comment thread capa/main.py Outdated
@saniyafatima07

Collaborator Author

Thank you for the review @mike-hunhoff . I have addressed all the requested changes.

@mike-hunhoff mike-hunhoff left a comment

Collaborator

Thanks @saniyafatima07 , I've left comments for your review!

Comment thread capa/loader.py
Comment thread capa/main.py Outdated
Comment thread capa/main.py Outdated
@saniyafatima07

Collaborator Author

Done @mike-hunhoff .
Could you please review?

@mike-hunhoff mike-hunhoff left a comment

Collaborator

Thanks @saniyafatima07 ! I've left additional comments for your review.

Comment thread capa/main.py Outdated


- def get_extractor_filters_from_cli(args, input_format) -> FilterConfig:
+ def get_extractor_filters_from_cli(args, input_format, backend: Optional[str] = None) -> FilterConfig:

Collaborator

Is this still needed?

Collaborator Author

Nope, I've removed it.

Comment thread doc/usage.md Outdated
Comment on lines +46 to +48
### Ghidra project support
capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`.

Collaborator

Let's remove this here, and add it to the documentation here.

Comment thread tests/data

Collaborator

Let's remove these for now. We can add in a more robust test for this specific feature if it becomes a problem in the future.

Comment thread tests/test_main.py Outdated
Comment on lines +357 to +369


def test_gpr_uses_ghidra_backend(tmp_path):
    args = SimpleNamespace(input_file=tmp_path / "sample.gpr", backend=capa.main.BACKEND_AUTO)

    assert capa.main.get_backend_from_cli(args, FORMAT_AUTO) == capa.main.BACKEND_GHIDRA
    assert capa.main.get_sample_path_from_cli(args, capa.main.BACKEND_GHIDRA) is None


def test_gpr_skips_generic_file_extractor_probe(tmp_path):
    args = SimpleNamespace(input_file=tmp_path / "sample.gpr")

    assert capa.main.get_file_extractors_from_cli(args, FORMAT_AUTO) == []

Collaborator

Let's remove these; see my earlier comment.

Comment thread capa/main.py Outdated
Comment on lines +562 to +564
if args.input_file.suffix.lower() == ".gpr":
    return FORMAT_GHIDRA_PROJECT

Collaborator

Move this to

def get_format_from_extension(sample: Path) -> str:
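
A minimal sketch of what the relocated check might look like inside that function. The constant values are assumptions for illustration, and the existing extension checks in the real function are omitted here.

```python
from pathlib import Path

# Assumed values, for illustration only; the real constants live in capa.
FORMAT_GHIDRA_PROJECT = "ghidra_project"
FORMAT_UNKNOWN = "unknown"


def get_format_from_extension(sample: Path) -> str:
    """Map a file extension to an input format (only .gpr is handled in this sketch)."""
    # lower() makes the check case-insensitive, e.g. "Project.GPR"
    if sample.suffix.lower() == ".gpr":
        return FORMAT_GHIDRA_PROJECT
    return FORMAT_UNKNOWN


print(get_format_from_extension(Path("/path/to/project.gpr")))
```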

Comment thread capa/main.py Outdated
if args.backend != BACKEND_AUTO:
    return args.backend

if args.input_file.suffix.lower() == ".gpr":

Collaborator

Suggested change
- if args.input_file.suffix.lower() == ".gpr":
+ if input_format == FORMAT_GHIDRA_PROJECT:
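
For context, a rough sketch of backend selection keyed on the detected format rather than the file suffix. The constant values and the `BACKEND_VIV` fallback are assumptions for illustration; the real `get_backend_from_cli` handles more formats than shown here.

```python
from types import SimpleNamespace

# Assumed values, for illustration only.
BACKEND_AUTO = "auto"
BACKEND_GHIDRA = "ghidra"
BACKEND_VIV = "vivisect"
FORMAT_AUTO = "auto"
FORMAT_GHIDRA_PROJECT = "ghidra_project"


def get_backend_from_cli(args, input_format: str) -> str:
    """Choose a backend: an explicit CLI choice wins, then the input format decides."""
    if args.backend != BACKEND_AUTO:
        return args.backend
    if input_format == FORMAT_GHIDRA_PROJECT:
        return BACKEND_GHIDRA
    # assumed fallback for every other format in this sketch
    return BACKEND_VIV


args = SimpleNamespace(backend=BACKEND_AUTO)
print(get_backend_from_cli(args, FORMAT_GHIDRA_PROJECT))
```

Checking `input_format` keeps the `.gpr` suffix test in one place, so the backend logic does not need to know about file extensions.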

Comment thread capa/main.py Outdated
Comment on lines +631 to +632
elif backend == BACKEND_GHIDRA:
    return None

Collaborator

Can you explain why this is needed versus the other disassembly backends like IDA and Binary Ninja?

Collaborator Author

I added this to handle .gpr files, since those are not the actual binary samples. But checking BACKEND_GHIDRA was too broad, so I changed it to check for the .gpr format and return None only for .gpr inputs.

Comment thread capa/main.py Outdated
Comment on lines +741 to +744
if args.input_file.suffix.lower() == ".gpr":
    logger.debug("skipping generic file extractor probe for Ghidra project input")
    return []

Collaborator

Is this needed? Doesn't capa.loader.get_file_extractors return [] if input_format is FORMAT_GHIDRA_PROJECT?

Collaborator Author

No, it's not needed. I verified it again and it already returns []. I will remove it.

@saniyafatima07

Collaborator Author

Thank you for the review @mike-hunhoff . I have addressed all the requested changes.

Development

Successfully merging this pull request may close these issues.

ghidra: enable feature extraction from existing Ghidra project binary

2 participants