Enhance ghidra backend with existing project feature #3087
saniyafatima07 wants to merge 14 commits into
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Code Review
This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.
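The resource-management point in the summary above can be illustrated with a small sketch. It assumes two hypothetical callables, `open_program` and `release_program`, standing in for whatever the Ghidra backend uses to open and release a program; neither name is taken from capa or Ghidra. The idea is only that the release runs even when analysis raises.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def opened_program(
    open_program: Callable[[], T],
    release_program: Callable[[T], None],
) -> Iterator[T]:
    """Yield an opened program and always release it afterwards.

    Both callables are hypothetical placeholders for the backend's
    real open/release logic.
    """
    program = open_program()
    try:
        yield program
    finally:
        # Runs on normal exit and when the body raises, so the
        # program handle is not leaked on error paths.
        release_program(program)
```

Wrapping the analysis in `with opened_program(...) as program:` keeps the cleanup in one place instead of repeating it in each error branch.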
CHANGELOG updated or no update needed, thanks! 😄
@mike-hunhoff
mike-hunhoff left a comment
Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.
Great work splitting up the code into helper functions to keep things concise.
Thank you for the review Mike.
@mike-hunhoff @larchchen @Maijin I have made all the requested changes.
mike-hunhoff left a comment
Thank you @saniyafatima07 ! I've left comments for your review.
Thank you for the review @mike-hunhoff. I have addressed all the requested changes.
mike-hunhoff left a comment
Thanks @saniyafatima07 , I've left comments for your review!
Done @mike-hunhoff.
mike-hunhoff left a comment
Thanks @saniyafatima07 ! I've left additional comments for your review.
-def get_extractor_filters_from_cli(args, input_format) -> FilterConfig:
+def get_extractor_filters_from_cli(args, input_format, backend: Optional[str] = None) -> FilterConfig:
Is this still needed?
Nope, I've removed it.
### Ghidra project support
capa can analyze programs directly from Ghidra projects by specifying the project file path (`.gpr`). If the project contains multiple programs, set the `CAPA_GHIDRA_PROGRAM_PATH` environment variable to specify which program to analyze. For example: `CAPA_GHIDRA_PROGRAM_PATH=/myprogram capa /path/to/project.gpr`.
Let's remove this here, and add it to the documentation here.
Let's remove these for now. We can add in a more robust test for this specific feature if it becomes a problem in the future.
def test_gpr_uses_ghidra_backend(tmp_path):
    args = SimpleNamespace(input_file=tmp_path / "sample.gpr", backend=capa.main.BACKEND_AUTO)

    assert capa.main.get_backend_from_cli(args, FORMAT_AUTO) == capa.main.BACKEND_GHIDRA
    assert capa.main.get_sample_path_from_cli(args, capa.main.BACKEND_GHIDRA) is None


def test_gpr_skips_generic_file_extractor_probe(tmp_path):
    args = SimpleNamespace(input_file=tmp_path / "sample.gpr")

    assert capa.main.get_file_extractors_from_cli(args, FORMAT_AUTO) == []
Let's remove these; see my earlier comment.
if args.input_file.suffix.lower() == ".gpr":
    return FORMAT_GHIDRA_PROJECT

if args.backend != BACKEND_AUTO:
    return args.backend

if args.input_file.suffix.lower() == ".gpr":

-if args.input_file.suffix.lower() == ".gpr":
+if input_format == FORMAT_GHIDRA_PROJECT:
elif backend == BACKEND_GHIDRA:
    return None
Can you explain why this is needed versus the other disassembly backends like IDA and Binary Ninja?
More specifically, I added this to address the .gpr files since those are not the actual binary samples. But checking BACKEND_GHIDRA was too broad, so I changed it to check the .gpr format and return None only for .gpr inputs.
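A minimal sketch of the narrowed check, assuming simplified stand-ins for capa's constants and helpers. The constant values and the names `detect_format`, `select_backend`, `sample_path`, and `BACKEND_DEFAULT` are hypothetical and not capa's actual API. The format is derived from the `.gpr` suffix once, and later decisions key on the format rather than on the backend, so other Ghidra inputs still get a sample path.

```python
from pathlib import Path
from typing import Optional

# Hypothetical stand-ins for capa's constants; the real values may differ.
FORMAT_AUTO = "auto"
FORMAT_GHIDRA_PROJECT = "ghidra_project"
BACKEND_AUTO = "auto"
BACKEND_GHIDRA = "ghidra"
BACKEND_DEFAULT = "default"  # placeholder for whatever auto-detection would pick


def detect_format(input_file: Path, requested_format: str = FORMAT_AUTO) -> str:
    """Derive the input format; a .gpr suffix marks a Ghidra project."""
    if input_file.suffix.lower() == ".gpr":
        return FORMAT_GHIDRA_PROJECT
    return requested_format


def select_backend(requested_backend: str, input_format: str) -> str:
    """Honor an explicit backend, otherwise route Ghidra projects to Ghidra."""
    if requested_backend != BACKEND_AUTO:
        return requested_backend
    if input_format == FORMAT_GHIDRA_PROJECT:
        return BACKEND_GHIDRA
    return BACKEND_DEFAULT


def sample_path(input_file: Path, input_format: str) -> Optional[Path]:
    """Return None only for project input: a .gpr file is not the sample itself."""
    if input_format == FORMAT_GHIDRA_PROJECT:
        return None
    return input_file
```

Keying on the format means a regular binary analyzed with the Ghidra backend is treated like it would be with any other disassembler backend.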
if args.input_file.suffix.lower() == ".gpr":
    logger.debug("skipping generic file extractor probe for Ghidra project input")
    return []
Is this needed? Doesn't capa.loader.get_file_extractors return [] if input_format is FORMAT_GHIDRA_PROJECT?
No, it's not needed. I verified it again and it already returns []. I will remove it.
031ed53 to 4ba5dd7
Thank you for the review @mike-hunhoff. I have addressed all the requested changes.
This PR adds support for analyzing existing Ghidra projects directly using `.gpr` project input.

Users can now provide input in the format:

For multi-program projects:
Motivation & Context
Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:
This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.
Implementation Details
`.gpr` detection to select the Ghidra backend when a Ghidra project file is provided as input.
`domain_file.getPathname()` to discover programs within the project.
`CAPA_GHIDRA_PROGRAM_PATH` support for selecting the target program in multi-program projects.
`create=False` `consume_program` `.gpr` input `.gpr` inputs.

Tests
Added tests for:
`.gpr` input

Closes #3004
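The program selection described under Implementation Details could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `select_program_path` is a hypothetical helper, and `program_paths` is assumed to hold the pathnames collected from `domain_file.getPathname()` for each program in the project (for example `/myprogram`).

```python
import os
from typing import Mapping, Optional, Sequence

ENV_PROGRAM_PATH = "CAPA_GHIDRA_PROGRAM_PATH"


def select_program_path(
    program_paths: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the program to analyze from a project's program pathnames.

    Hypothetical helper: an explicit environment variable wins, a single
    program is chosen automatically, anything else is an error.
    """
    env = os.environ if environ is None else environ
    requested = env.get(ENV_PROGRAM_PATH)

    if requested:
        if requested not in program_paths:
            raise ValueError(f"{ENV_PROGRAM_PATH}={requested} not found in project")
        return requested

    if not program_paths:
        raise ValueError("project contains no programs")

    if len(program_paths) == 1:
        return program_paths[0]

    raise ValueError(
        f"project contains multiple programs; set {ENV_PROGRAM_PATH} "
        f"to one of: {', '.join(program_paths)}"
    )
```

Passing `environ` explicitly keeps the helper easy to test without modifying the process environment.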
Checklist
Parts of this implementation were assisted using AI tools (GitHub Copilot, ChatGPT).
AI was used for:
All code was reviewed, modified and tested manually before submission.