step.request_parse: support application/x-www-form-urlencoded body parsing #258

Merged
intel352 merged 4 commits into main from copilot/support-urlencoded-body-parsing on Mar 4, 2026

Contributor

Copilot AI commented Mar 3, 2026

step.request_parse with parse_body: true only handled JSON bodies — form-encoded payloads (e.g. Twilio webhooks) were silently dropped and the body stream was consumed, breaking downstream steps like step.webhook_verify.

Changes

  • Content-Type dispatch: After reading body bytes, strip any ; parameters from Content-Type and branch on application/x-www-form-urlencoded vs. the existing JSON path
  • Form parsing: Use url.ParseQuery() to parse the body; single-value fields are exposed as string, multi-value fields as []string
  • Raw body cache: Raw bytes are now stored in pc.Metadata["_raw_body"] whenever the body is freshly read from req.Body, regardless of Content-Type — ensuring downstream steps (e.g. HMAC-SHA1 verification) can always access the original payload
  • Cache-first body read: If pc.Metadata["_raw_body"] is already populated by a prior step (e.g. step.webhook_verify), the cached bytes are used directly instead of reading the (possibly exhausted) req.Body
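
The Content-Type dispatch and form parsing described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `parseBody` is a hypothetical helper name, and returning `nil` on a parse failure is an assumption made for the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// parseBody is a hypothetical helper (not the PR's actual function).
// It strips Content-Type parameters, then parses the body either as
// form-urlencoded data or as JSON. It returns nil if parsing fails.
func parseBody(contentType string, body []byte) map[string]any {
	// Drop parameters such as "; charset=utf-8" before comparing.
	mediaType := contentType
	if i := strings.Index(contentType, ";"); i >= 0 {
		mediaType = contentType[:i]
	}
	mediaType = strings.TrimSpace(mediaType)

	if strings.EqualFold(mediaType, "application/x-www-form-urlencoded") {
		values, err := url.ParseQuery(string(body))
		if err != nil {
			return nil
		}
		out := make(map[string]any, len(values))
		for key, vals := range values {
			if len(vals) == 1 {
				out[key] = vals[0] // single value: plain string
			} else {
				out[key] = vals // repeated field: []string
			}
		}
		return out
	}

	// Everything else falls through to the JSON path.
	var out map[string]any
	if json.Unmarshal(body, &out) != nil {
		return nil
	}
	return out
}

func main() {
	body := []byte("Body=Hello&From=%2B15551234567&tag=a&tag=b")
	parsed := parseBody("application/x-www-form-urlencoded; charset=utf-8", body)
	fmt.Println(parsed["Body"], parsed["From"], parsed["tag"])
}
```

Note that `url.ParseQuery` percent-decodes values, which is why `%2B15551234567` surfaces as `+15551234567` in the parsed output while the raw bytes stay untouched.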

Example

Given a Twilio POST body Body=Hello&From=%2B15551234567&To=%2B15559876543:

- name: parse-request
  type: step.request_parse
  config:
    parse_body: true

Produces:

steps.parse-request.body.Body  = "Hello"
steps.parse-request.body.From  = "+15551234567"
steps.parse-request.body.To    = "+15559876543"
pc.Metadata["_raw_body"]       = []byte("Body=Hello&From=...")
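
To illustrate why the byte-exact `_raw_body` matters to a verification step, here is a generic HMAC-SHA1 sketch. It is an assumption-laden example: `sign` and `verify` are hypothetical helpers, the hex encoding and the choice to sign only the body are illustrative, and neither the real `step.webhook_verify` logic nor any particular provider's signing scheme is shown here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA1 of rawBody (hypothetical helper).
func sign(secret, rawBody []byte) string {
	mac := hmac.New(sha1.New, secret)
	mac.Write(rawBody)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC over rawBody and compares it in constant time.
func verify(secret, rawBody []byte, signatureHex string) bool {
	want, err := hex.DecodeString(signatureHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha1.New, secret)
	mac.Write(rawBody)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	secret := []byte("example-secret") // placeholder value for the sketch

	// A plain map stands in for pc.Metadata here.
	metadata := map[string]any{
		"_raw_body": []byte("Body=Hello&From=%2B15551234567"),
	}
	raw, _ := metadata["_raw_body"].([]byte)
	sig := sign(secret, raw)

	// The original bytes verify; a decoded variant of the same data does not.
	fmt.Println(verify(secret, raw, sig))
	fmt.Println(verify(secret, []byte("Body=Hello&From=+15551234567"), sig))
}
```

The second check fails because the decoded form differs byte-for-byte from what was signed, which is the reason the step caches the original payload rather than re-encoding the parsed fields.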

Original prompt

This section details the original issue you should resolve

<issue_title>step.request_parse: support application/x-www-form-urlencoded body parsing</issue_title>
<issue_description>## Current Behavior

step.request_parse with parse_body: true only supports JSON bodies. When the Content-Type is application/x-www-form-urlencoded, the step reads the body bytes via io.ReadAll() then calls json.Unmarshal(), which fails silently. As a result:

  • output["body"] is never populated — the form field data is lost
  • The body stream is consumed and unavailable for subsequent steps (e.g. step.webhook_verify)

This breaks webhook integrations that send form-encoded POST bodies, such as Twilio.

Requested Behavior

When parse_body: true is set and the Content-Type header is application/x-www-form-urlencoded, step.request_parse should:

  1. Parse the body using url.ParseQuery() instead of json.Unmarshal()
  2. Expose each form field in output["body"] as a string (first value) or []string (multiple values)
  3. Cache the raw body bytes in pc.Metadata["_raw_body"] so subsequent steps (e.g. step.webhook_verify) can still access the original payload

Example

Given a Twilio webhook POST with body:

Body=Hello&From=%2B15551234567&To=%2B15559876543&MessageSid=SM1234

After step.request_parse with parse_body: true:

steps.<name>.body.Body       = "Hello"
steps.<name>.body.From       = "+15551234567"
steps.<name>.body.To         = "+15559876543"
steps.<name>.body.MessageSid = "SM1234"

And pc.Metadata["_raw_body"] should contain the original raw bytes so step.webhook_verify (which needs to re-read the body for HMAC-SHA1 validation) can still function.

Workaround

A custom step.form_parse step that reads _raw_body from pipeline metadata and parses via url.ParseQuery().</issue_description>

Comments on the Issue (you are @copilot in this section)


…rsing

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Support application/x-www-form-urlencoded body parsing in request_parse" to "step.request_parse: support application/x-www-form-urlencoded body parsing" on Mar 3, 2026
intel352 marked this pull request as ready for review on March 4, 2026, 00:08
Copilot AI review requested due to automatic review settings March 4, 2026 00:08
Contributor

Copilot AI left a comment

Pull request overview

This PR extends step.request_parse to correctly parse application/x-www-form-urlencoded request bodies (common for webhook providers like Twilio) and adds tests for the new parsing behavior.

Changes:

  • Add Content-Type dispatch to parse application/x-www-form-urlencoded bodies via url.ParseQuery (with single vs multi-value handling).
  • Cache raw request body bytes into pc.Metadata["_raw_body"] (currently only on the form path).
  • Add unit tests covering form parsing, multi-value fields, and Content-Type parameters (e.g. charset).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
module/pipeline_step_request_parse.go | Adds form-urlencoded parsing branch and attempts to cache raw body bytes.
module/pipeline_step_request_parse_test.go | Adds coverage for form-urlencoded parsing and raw body caching for that path.

Comment on lines 158 to +162
 	req, _ := pc.Metadata["_http_request"].(*http.Request)
 	if req != nil && req.Body != nil {
 		bodyBytes, err := io.ReadAll(req.Body)
 		if err == nil && len(bodyBytes) > 0 {
-			var bodyData map[string]any
-			if json.Unmarshal(bodyBytes, &bodyData) == nil {
-				output["body"] = bodyData
+			ct := req.Header.Get("Content-Type")

Copilot AI Mar 4, 2026

RequestParseStep always reads from req.Body when parse_body is enabled, but it never checks for an existing pc.Metadata["_raw_body"]. If another step (e.g. step.webhook_verify) has already read and cached the body, req.Body will be at EOF and this step will fail to populate output["body"]. Prefer using the cached _raw_body when present, and only fall back to reading req.Body when it is not cached.

Comment on lines +166 to +183
+	if strings.EqualFold(ct, "application/x-www-form-urlencoded") {
+		pc.Metadata["_raw_body"] = bodyBytes
+		if formValues, parseErr := url.ParseQuery(string(bodyBytes)); parseErr == nil {
+			bodyData := make(map[string]any)
+			for k, v := range formValues {
+				if len(v) == 1 {
+					bodyData[k] = v[0]
+				} else {
+					bodyData[k] = v
+				}
+			}
+			output["body"] = bodyData
+		}
+	} else {
+		var bodyData map[string]any
+		if json.Unmarshal(bodyBytes, &bodyData) == nil {
+			output["body"] = bodyData
+		}

Copilot AI Mar 4, 2026

The raw body is cached into pc.Metadata["_raw_body"] only for application/x-www-form-urlencoded. For JSON (and any other Content-Type), this step still consumes req.Body and leaves _raw_body unset, which can break downstream steps that rely on the raw body cache (including step.webhook_verify / gitlab_parse_webhook patterns). Cache bodyBytes into _raw_body whenever you read it (before Content-Type dispatch), regardless of parsing success.

Comment on lines +230 to +238
+	// Raw body should be cached in metadata
+	rawBody, ok := pc.Metadata["_raw_body"].([]byte)
+	if !ok {
+		t.Fatal("expected _raw_body in metadata")
+	}
+	if string(rawBody) != `Body=Hello&From=%2B15551234567&To=%2B15559876543&MessageSid=SM1234` {
+		t.Errorf("unexpected _raw_body: %s", rawBody)
+	}
+}

Copilot AI Mar 4, 2026

There’s no test covering the common “body was already read and cached” flow: when pc.Metadata["_raw_body"] is set (e.g. by step.webhook_verify) and req.Body is empty/consumed, request_parse should still parse from the cached bytes. Adding a test for that scenario would prevent regressions and validate the intended raw-body-cache interoperability.

Copilot generated this review using guidance from organization custom instructions.
Contributor

intel352 commented Mar 4, 2026

@copilot apply changes based on the comments in this thread

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 4, 2026

@copilot apply changes based on the comments in this thread

Applied all three review items in commit 18f1d93:

  1. Use cached _raw_body first: Before reading req.Body, the step now checks pc.Metadata["_raw_body"] and uses it if present — so a prior step consuming req.Body won't break parsing.
  2. Always cache raw bytes: _raw_body is now set whenever body bytes are freshly read from req.Body, regardless of Content-Type (JSON or form-urlencoded).
  3. New tests added: TestRequestParseStep_ParseBody_FormURLEncoded_CachedRawBody (parses from cached bytes when req.Body is empty) and TestRequestParseStep_ParseBody_JSON_CachesRawBody (verifies JSON path also populates _raw_body).
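
Items 1 and 2 above amount to a cache-first read that also populates the cache on a fresh read. A minimal sketch of that pattern, assuming a plain `map[string]any` in place of `pc.Metadata` and using the hypothetical names `readBody` and `demo` (commit 18f1d93 itself is not reproduced here):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// readBody is a hypothetical helper: it prefers bytes already cached
// under "_raw_body" and only falls back to reading req.Body. Freshly
// read bytes are cached regardless of Content-Type.
func readBody(metadata map[string]any, req *http.Request) ([]byte, error) {
	if cached, ok := metadata["_raw_body"].([]byte); ok && len(cached) > 0 {
		return cached, nil
	}
	if req == nil || req.Body == nil {
		return nil, nil
	}
	body, err := io.ReadAll(req.Body)
	if err != nil {
		return nil, err
	}
	if len(body) > 0 {
		metadata["_raw_body"] = body // make the bytes available to later steps
	}
	return body, nil
}

// demo reads the same request twice. The second call would see an
// exhausted req.Body, so it relies on the cache.
func demo() (first, second string, cached bool) {
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/hook",
		strings.NewReader("Body=Hello&To=%2B15559876543"))
	if err != nil {
		panic(err)
	}
	metadata := map[string]any{}
	a, _ := readBody(metadata, req)
	b, _ := readBody(metadata, req)
	_, cached = metadata["_raw_body"].([]byte)
	return string(a), string(b), cached
}

func main() {
	first, second, cached := demo()
	fmt.Println(first == second, cached)
}
```

Both reads return the same bytes even though the request body is consumed by the first call, which mirrors the ordering problem between `step.request_parse` and `step.webhook_verify` raised in the review.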

intel352 merged commit a692591 into main on Mar 4, 2026
14 checks passed
intel352 deleted the copilot/support-urlencoded-body-parsing branch on March 4, 2026, 02:35

Development

Successfully merging this pull request may close these issues.

step.request_parse: support application/x-www-form-urlencoded body parsing

3 participants