fix(fuzz): prevent fatal concurrent map-write crash in MultiPartForm when scanning multiple targets#7126
Conversation
…ent data race
MultiPartForm is the only stateful DataFormat implementation — it stores
a mutable `boundary` string and a `filesMetadata` map that are written
during every Decode/ParseBoundary call. Registering it as a singleton in
dataformat.init() and returning the same pointer from Get() caused a
fatal concurrent map-write crash (and -race DATA RACE) when multiple
goroutines scanned different multipart/form-data targets simultaneously.
The pattern:
dataformat.Get("multipart/form-data") ← all goroutines get same pointer
singleton.ParseBoundary(...) ← goroutine A writes m.boundary
singleton.Decode(...) ← goroutine B writes m.filesMetadata
→ fatal error: concurrent map writes
Fix: add a RegisterDataFormatFactory() alongside RegisterDataFormat().
Get() checks the factory map first; if a factory is registered it calls
it to create a fresh instance per invocation. MultiPartForm is
re-registered as a factory. All other DataFormats remain singletons
(they have no mutable state so sharing is safe).
Tests pass with -race flag:
ok github.com/projectdiscovery/nuclei/v3/pkg/fuzz/dataformat 1.655s
Fixes projectdiscovery#7028
Neo - PR Security ReviewNo security issues found Highlights
Hardening Notes
Comment |
WalkthroughA factory mechanism for data formats was added and MultiPartForm was migrated from a singleton to a per-call factory. Get() now returns a fresh instance when a factory is registered, avoiding shared state across concurrent callers. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/fuzz/dataformat/dataformat.go`:
- Around line 32-36: The package registers the multipart/form-data format only
via RegisterDataFormatFactory (RegisterDataFormatFactory,
MultiPartFormDataFormat, NewMultiPartForm) which breaks package-level Encode()
because Encode() resolves encoders from the dataformats registry and will report
"dataformat multipart/form-data is not supported"; update registration so
multipart is available in both registries: call
RegisterDataFormat(MultiPartFormDataFormat, NewMultiPartForm()) or otherwise
ensure dataformats map gets a concrete DataFormat instance for
MultiPartFormDataFormat in addition to the factory registration (also apply the
same dual-registration fix to the other affected entries around the
RegisterDataFormatFactory calls referenced in the diff).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9139bb44-450e-4cfd-9417-2e509986ce98
📒 Files selected for processing (1)
pkg/fuzz/dataformat/dataformat.go
|
Addressed CodeRabbit feedback: ✅ Fixed Encode() to use Get() function for factory lookup Build verified successfully. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/fuzz/dataformat/dataformat.go`:
- Around line 69-73: RegisterDataFormatFactory currently only stores the
constructor in dataformatFactories so Get() can create instances, but Decode()
iterates only over dataformats and therefore never sees factory-only formats;
update RegisterDataFormatFactory to also ensure Decode() can auto-detect by
either instantiating a fresh DataFormat (via the provided factory) and
registering it into the package-level dataformats map (ensuring thread-safety)
or modify Decode() to consult dataformatFactories when iterating candidates;
refer to RegisterDataFormatFactory, dataformatFactories, Get(), Decode(), and
dataformats when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b9c7c3ff-de57-42a6-88ec-933bb9d2d8bc
📒 Files selected for processing (1)
pkg/fuzz/dataformat/dataformat.go
| // RegisterDataFormatFactory registers a constructor for a stateful dataformat. | ||
| // Each call to Get() will invoke the factory and return a fresh instance, | ||
| // preventing data races between concurrent goroutines. | ||
| func RegisterDataFormatFactory(name string, factory func() DataFormat) { | ||
| dataformatFactories[name] = factory |
There was a problem hiding this comment.
Factory-registered formats are still invisible to package-level Decode().
RegisterDataFormatFactory only affects Get(), but Decode() still walks dataformats only. That means any format registered exclusively through the new API can be encoded/resolved but will never participate in auto-detection/decoding. Either instantiate factory-backed formats there as well, or document that factory registrations are intentionally Get/Encode-only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/dataformat/dataformat.go` around lines 69 - 73,
RegisterDataFormatFactory currently only stores the constructor in
dataformatFactories so Get() can create instances, but Decode() iterates only
over dataformats and therefore never sees factory-only formats; update
RegisterDataFormatFactory to also ensure Decode() can auto-detect by either
instantiating a fresh DataFormat (via the provided factory) and registering it
into the package-level dataformats map (ensuring thread-safety) or modify
Decode() to consult dataformatFactories when iterating candidates; refer to
RegisterDataFormatFactory, dataformatFactories, Get(), Decode(), and dataformats
when making the change.
Problem
Fixes #7028
Scanning multiple targets with a
multipart/form-datatemplate crashes with:or (with
-race):Root Cause
MultiPartFormis the only statefulDataFormat— it stores a mutableboundarystring and afilesMetadatamap that are written on everyDecode/ParseBoundarycall. It is registered as a singleton indataformat.init(), so every goroutine that callsdataformat.Get("multipart/form-data")receives the same pointer.When N > 1 targets are scanned concurrently, goroutines race to write the same fields:
Fix
Add a
RegisterDataFormatFactory(name, func() DataFormat)alongside the existingRegisterDataFormat().Get()checks the factory map first and calls the constructor to return a fresh instance per invocation.MultiPartForm is re-registered as a factory. All other DataFormats (JSON, XML, Raw, Form) remain singletons — they have no mutable state.
Verification
The SDK reproduction test from #7028 also passes without
-racedetecting any race.Summary by CodeRabbit