Proposal: Binary Store Building Block by WhitWaldo · Pull Request #88 · dapr/proposals

WhitWaldo · 2025-08-23T23:15:11Z

While writing applications that use Dapr, I increasingly run into the need to persist data that's too large to reasonably store through Dapr today: often because it would exhaust the memory resources of the sidecar, and frequently because it's simply too large to store in a key/value store.

It doesn't make a ton of sense to rely exclusively on bindings for this, since a binding really just provides a Dapr-hosted alternative to the provider's SDK for something that we should increasingly have broad provider support for. "Object store" and "blob store" are heavily overloaded terms that mean all manner of things depending on the provider. I think there's a fine opportunity to tackle those in the future, but this proposal isn't that.

Here, I propose an API devoid of List and even Metadata operations so that it can accommodate the broadest possible range of storage providers. I also suggest that we increasingly lean on the SDKs to provide the state management rather than putting all that weight on the runtime and the components. It's a slim implementation that should be fairly easy to add, yet it would provide immediate benefits for popular Dapr features: Workflows and the new Agentic operations come to mind, but it would benefit Actor and Cryptographic operations as well.

I look forward to your feedback!

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

olitomlinson · 2025-09-01T21:31:48Z

I'm massively in support, but how does this differ from the Object Store proposal? (Other than no support for metadata, anything else?)

WhitWaldo · 2025-09-02T09:46:40Z

There are a few differences:

- This proposal does not anticipate ever supporting a list operation, so that it can be more readily and broadly supported by providers without that capability. Most dedicated object and blob stores that come to mind do offer such a feature. Leaving listing as a possible differentiator for a future object/blob store API, even though it would limit that API to a smaller set of matching providers, is a fine trade-off; here we simply do without it altogether. This is meant to be little more than a way to store large files that the current state store cannot handle, without all the other state management add-ons.
- This does not purport to offer behaviors that are more specific to object and blob stores, such as performing operations on data through signed URLs. Again, that might be a fine feature for a future store that's more narrowly tailored to that sort of operation. This isn't that.
- As you indicated, object and blob stores often persist and maintain a lot of metadata. In my experience, blob stores mostly just store it, but object stores will often act on it (e.g. checksum validation). There's no need to deal with any of that here, including several of the points brought up in your linked discussion (e.g. Content-Length, Content-Hash, ETag, and other metadata being used for other extraneous purposes).
- We talked about my goal of not having the SDKs deal with serialization here. An object or blob store often handles unstructured data in some format or another, and I think we should absolutely create more specialized data stores that support operations suited to one type or another (that could certainly be useful from an agentic tooling and pluggable component perspective). Here, though, in the name of simplification and starting with a low threshold, I would like to make the developer responsible for ensuring that their data can be serialized and encoded, and have the API exclusively persist, retrieve and delete that data with no room for any other possibilities.
- Object and blob stores often support hierarchical or operational permission structures such as append-only writes, write-only permissions (e.g. no deletion via API), etc. That's also intentionally excluded from consideration here.

Put more simply - those other stores anticipate the developer wanting to do both simple and far more advanced operations with their data. I'd certainly like to build more specialized data stores to accommodate such requirements, but this proposal seeks to do away with any complexities and do one thing really well: manage the reading, writing and deletion of large files in a resource-limiting and highly performant manner which is not possible in today's Dapr state management.
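To make the "persist, retrieve and delete, nothing else" shape concrete, here is a minimal sketch in C#. It is not the proposal's actual API: `IBinaryStore`, `InMemoryBinaryStore` and the method names are hypothetical, chosen only to illustrate a surface with no List and no Metadata operations. The in-memory implementation buffers everything, which is exactly what a real component would avoid; it exists only so the interface can be exercised.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical surface: three operations, no List, no Metadata.
public interface IBinaryStore
{
    Task WriteAsync(string key, Stream content, CancellationToken ct = default);

    // Returns null when the key does not exist.
    Task<Stream?> ReadAsync(string key, CancellationToken ct = default);

    Task DeleteAsync(string key, CancellationToken ct = default);
}

// In-memory stand-in for illustration only; a real component would stream
// to a provider rather than buffer the payload.
public sealed class InMemoryBinaryStore : IBinaryStore
{
    private readonly ConcurrentDictionary<string, byte[]> _blobs = new();

    public async Task WriteAsync(string key, Stream content, CancellationToken ct = default)
    {
        using var buffer = new MemoryStream();
        await content.CopyToAsync(buffer, ct);
        _blobs[key] = buffer.ToArray();
    }

    public Task<Stream?> ReadAsync(string key, CancellationToken ct = default) =>
        Task.FromResult<Stream?>(_blobs.TryGetValue(key, out var bytes)
            ? new MemoryStream(bytes, writable: false)
            : null);

    public Task DeleteAsync(string key, CancellationToken ct = default)
    {
        _blobs.TryRemove(key, out _);
        return Task.CompletedTask;
    }
}

public static class Program
{
    public static async Task Main()
    {
        IBinaryStore store = new InMemoryBinaryStore();
        var key = "wf/2c0882d7/output";

        // The caller owns serialization and encoding; the store only sees bytes.
        var payload = Encoding.UTF8.GetBytes("large activity output");
        await store.WriteAsync(key, new MemoryStream(payload));

        var stream = await store.ReadAsync(key);
        using var reader = new StreamReader(stream!);
        Console.WriteLine(await reader.ReadToEndAsync()); // large activity output

        await store.DeleteAsync(key);
        Console.WriteLine(await store.ReadAsync(key) is null); // True
    }
}
```

Taking and returning `Stream` rather than `byte[]` is the design choice that matters here: it lets a real implementation move data in chunks instead of holding a whole payload in sidecar memory.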

olitomlinson · 2026-02-13T20:44:38Z

Adding for how we might use this in Workflows for storing large activity inputs / outputs

lindner · 2026-02-16T21:44:09Z

Seems like the existing s3 and other existing bindings could be mapped. Have you tried a PoC?

WhitWaldo · 2026-02-17T02:24:17Z

@lindner The first step is proposing the shape of the block (as I've done here), soliciting public feedback on the API shape, and trying to discern whether anything else seems necessary within the described purpose of the API.

Out of the box, I'd certainly like to target support for Azure Blob Storage and provide an S3-compatible component (as this would facilitate connectivity with S3 itself, but also the many providers that offer S3-compatible APIs).

Next steps are getting tentative maintainer sign-off (no point building a POC if it's not going to be accepted) and then starting development of it - as I indicated in Discord, I intend to build this out as part of the next Dapr release (1.18).

olitomlinson · 2026-04-08T23:54:27Z

In the context of its usage for storing large activity inputs and outputs in Workflows, I would strongly recommend that this design allows a workflow author to programmatically choose the path/directory to the binary file.

This is to support multi-tenant use cases where each tenant's data MUST be stored in a different location.

/store/tenant-a/

/store/tenant-b/

/store/tenant-c/

Having this location set at the time of scheduling the workflow (not registering the workflow) gives a good level of flexibility.

    builder.Services.AddDaprWorkflow(options =>
    {
        // Proposed: binaryStoreName is a suggested parameter, not part of today's SDK.
        options.RegisterWorkflow<MyWorkflow>(binaryStoreName: "my-binary-store");
    });

    ...

    var tenantId = "tenant-a";
    var workflowId = "2c0882d7";

    await workflowClient.ScheduleNewWorkflowAsync(
        name: nameof(MyWorkflow),
        instanceId: workflowId,
        input: orderInfo,
        // Proposed: per-instance root path for the activity input/output blobs.
        inputOutputBinaryStorePath: $"/store/{tenantId}/wf/{workflowId}");

In the example above, assuming we're using an S3 Binary Store, the Activity input / output blobs would be stored in the following locations:

/store/tenant-a/wf/2c0882d7/activity/{activity-id}/output/
/store/tenant-a/wf/2c0882d7/activity/{activity-id}/input/

There is an assumption that workflows have an implicit Activity Id which uniquely identifies each activity call. We use that Activity Id in the path above.
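As a rough sketch of how those locations could be derived from the per-instance root path and the Activity Id, consider the helper below. `ActivityBlobPaths` and `PayloadKind` are hypothetical names introduced for illustration; nothing like them exists in the SDK today.

```csharp
using System;

public enum PayloadKind { Input, Output }

public static class ActivityBlobPaths
{
    // rootPath is the value supplied when the workflow is scheduled,
    // e.g. "/store/tenant-a/wf/2c0882d7".
    public static string For(string rootPath, string activityId, PayloadKind kind)
    {
        if (string.IsNullOrWhiteSpace(rootPath))
            throw new ArgumentException("A root path is required.", nameof(rootPath));
        if (string.IsNullOrWhiteSpace(activityId) || activityId.Contains('/'))
            throw new ArgumentException("Activity id must be a single path segment.", nameof(activityId));

        var segment = kind == PayloadKind.Input ? "input" : "output";

        // Trim a trailing slash so "/root" and "/root/" produce the same location.
        return $"{rootPath.TrimEnd('/')}/activity/{activityId}/{segment}/";
    }
}

public static class Program
{
    public static void Main()
    {
        var root = "/store/tenant-a/wf/2c0882d7";
        Console.WriteLine(ActivityBlobPaths.For(root, "123", PayloadKind.Input));
        // /store/tenant-a/wf/2c0882d7/activity/123/input/
        Console.WriteLine(ActivityBlobPaths.For(root, "123", PayloadKind.Output));
        // /store/tenant-a/wf/2c0882d7/activity/123/output/
    }
}
```

Rejecting a `/` inside the Activity Id keeps one activity from writing into another activity's directory, which matters when the root path is the tenant boundary.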

Building on the above example, the Reference to the blob becomes {app-id}||{binary-store-name}||{location}||{file-id}

myApp||my-binary-store||/store/tenant-a/wf/2c0882d7/activity/123/input/||xyz

The Reference is what is encoded in the Workflow History, rather than the blob contents.

The SDK can then dereference the data whenever the user demands it throughout the workflow. The data may even never be dereferenced until the end of the workflow, when someone requests the output of the completed workflow, which may be one (or more) large blobs!
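The reference format described above could be encoded and decoded along these lines. This is a sketch only: the `BlobReference` type is hypothetical, it follows the four-field `{app-id}||{binary-store-name}||{location}||{file-id}` template, and it assumes no field ever contains the `||` separator.

```csharp
using System;

// Hypothetical value type for the reference that would be written to the
// workflow history in place of the blob contents.
public sealed record BlobReference(string AppId, string StoreName, string Location, string FileId)
{
    private const string Separator = "||";

    public override string ToString() =>
        string.Join(Separator, AppId, StoreName, Location, FileId);

    public static BlobReference Parse(string value)
    {
        var parts = value.Split(Separator);
        if (parts.Length != 4)
            throw new FormatException($"Expected 4 '{Separator}'-separated fields but found {parts.Length}.");

        return new BlobReference(parts[0], parts[1], parts[2], parts[3]);
    }
}

public static class Program
{
    public static void Main()
    {
        var encoded = "myApp||my-binary-store||/store/tenant-a/wf/2c0882d7/activity/123/input/||xyz";

        var reference = BlobReference.Parse(encoded);
        Console.WriteLine(reference.StoreName);             // my-binary-store
        Console.WriteLine(reference.FileId);                // xyz
        Console.WriteLine(reference.ToString() == encoded); // True
    }
}
```

Because only this short string is stored in the history, the SDK can defer fetching the blob until a caller actually asks for the value.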

WhitWaldo · 2026-04-10T21:28:52Z

Might this instead be done more like how actors currently store state in key/value stores? Set a path on the component at registration time that's used as the root, and defer to the workflow to pick an appropriate path, relative to that root, to save the reference to. Presumably the runtime would pick a path referencing the workflow ID and any namespace values itself, so the user needn't figure out how to specify their own paths?
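A minimal sketch of that alternative, assuming the component carries a root path and the runtime derives everything beneath it, might look like this. `RuntimeBlobPath` is a hypothetical helper, and the segment layout (namespace, then workflow ID, then activity ID) is an assumption made for illustration rather than anything Dapr defines.

```csharp
using System;
using System.Linq;

public static class RuntimeBlobPath
{
    // componentRoot comes from the component registration; the remaining
    // values are supplied by the runtime rather than the workflow author.
    public static string Combine(string componentRoot, string @namespace, string workflowId, string activityId)
    {
        var segments = new[] { @namespace, "wf", workflowId, "activity", activityId };

        if (segments.Any(s => string.IsNullOrWhiteSpace(s) || s.Contains('/')))
            throw new ArgumentException("Path segments must be non-empty and must not contain '/'.");

        return $"{componentRoot.TrimEnd('/')}/{string.Join('/', segments)}";
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(RuntimeBlobPath.Combine("/store", "tenant-a", "2c0882d7", "123"));
        // /store/tenant-a/wf/2c0882d7/activity/123
    }
}
```

The trade-off compared with a caller-supplied path is that isolation then depends on what the runtime knows (for example a namespace per tenant) instead of on a value chosen when the workflow is scheduled.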

WhitWaldo added 2 commits August 23, 2025 18:04

Initial file store proposal

bb19f99

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

Spelling, grammar and formatting check

98d5253

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

WhitWaldo self-assigned this Aug 23, 2025

WhitWaldo added the enhancement New feature or request label Aug 23, 2025

WhitWaldo added 4 commits August 23, 2025 18:18

Added reference to data reference issue in .NET SDK

f71bc2a

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

Added another source for inspiration

061e410

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

Fixed checkbox formatting at bottom

76986f3

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

Added alternative name 'BinaryStore' to thoughts at the bottom

04c6712

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

WhitWaldo mentioned this pull request Jun 25, 2025

[Proposal] Specialty Store Building Blocks (Key/Value, Document, Relational, Queue, etc.) dapr/dapr#7339

Open

WhitWaldo added 2 commits August 23, 2025 19:12

Added link back to my specialized store issue in dapr/dapr

73fdd2a

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

Updated to named this the BinaryStore instead of FileStore, reworded a few details, removed an extraneous bullet and generally cleaned it up some

06a30ba

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

WhitWaldo changed the title ~~Proposal: File Store Building Block~~ Proposal: Binary Store Building Block Aug 24, 2025

Fixed quote marks

f4f57cc

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

mikeee mentioned this pull request Sep 30, 2025

v1.17 Release Planning dapr/dapr#9096

Closed

WhitWaldo mentioned this pull request Sep 30, 2025

Workflows: Add first class support for input/output data references dapr/dotnet-sdk#1533

Open

WhitWaldo mentioned this pull request Jan 6, 2026

Workflows: First class support for input/output db referencing dapr/dapr#8706

Open

WhitWaldo wants to merge 9 commits into dapr:main from WhitWaldo:filestore
