NIFI-15682: Add Bulk Replay feature for provenance events #11016
Scrooge-McDucks wants to merge 1 commit into apache:main
Checking if anyone has had a chance to review this PR.
exceptionfactory left a comment
Thanks for proposing this new feature @Scrooge-McDucks.
The concept sounds useful, but the level of complexity means this will need significant review. As a result, it may not receive the attention needed to be ready for inclusion. It may be better to close the issue for now, seek collaboration on the Jira issue, and open it again when there is sufficient commitment from the multiple reviewers needed to bring this to completion.
As an example area for consideration, introducing multiple new application properties raises some immediate concerns about the level of complexity and configurability needed. The project has made significant effort in recent years to reduce the number of configuration options, so adding anything new requires some planning. It may be the case that some property values should simply be hard-coded as they might never need to change.
At another level, the complexity around the asynchronous request and replay execution is understandable, but also requires careful inspection. These are the types of features that can benefit from one or more system integration tests.
Another potential way forward is to reduce the scope of the pull request, separating framework and frontend changes. That is not a guarantee for review, but it could be an option.
Summary
This PR adds Bulk Replay support for provenance events.
NiFi already allows replaying individual provenance events, but recovering from a broader issue can still be slow and repetitive. When a processor is misconfigured, a downstream dependency fails, or a large set of FlowFiles is processed incorrectly, users may need to replay many related events rather than one at a time. This change introduces a bulk replay workflow so users can search for provenance events for a processor, select the events they want, and submit them as a server-side replay job.
This is also my first frontend feature in NiFi, so I would especially welcome feedback on the UI approach and overall user experience.
Motivation
Replay is an important recovery mechanism in NiFi, but the current experience is centered on single-event replay. That works well for targeted recovery, but it becomes inefficient when operators need to recover from larger operational issues affecting many FlowFiles.
This feature is intended to make that recovery workflow more practical by:
What changed
User experience
Users can right-click a processor and select Bulk Replay to open a replay search dialog scoped to that processor. From there they can:
Submitted jobs are visible from the Bulk Replay Status page, where users can:
Permissions
Bulk Replay is permission-bound by provenance access. A user must have permission to query and view provenance events for the target component before they can discover eligible events and submit them for replay. This keeps bulk replay aligned with NiFi’s existing provenance security model.
Cluster behavior
Bulk replay jobs execute on the primary node.
If a replay item depends on content located on a disconnected node, the worker waits up to the configured timeout for that node to reconnect before marking the item as failed. This gives the cluster a chance to recover before replay is abandoned for affected items.
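The wait-then-fail behavior described above can be illustrated with a small polling loop. This is only a sketch under stated assumptions, not the PR's implementation: the class name `ReconnectWait`, the `awaitNode` method, the `Outcome` enum, and the poll interval parameter are all hypothetical, and the connectivity check is passed in as a plain `BooleanSupplier` rather than any real NiFi cluster API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not taken from the PR. */
final class ReconnectWait {

    /** READY: the node holding the content is connected. FAILED: the timeout elapsed first. */
    enum Outcome { READY, FAILED }

    /**
     * Polls the supplied connectivity check until it reports true or the timeout elapses.
     * The timeout stands in for the "configured timeout" mentioned in the description.
     */
    static Outcome awaitNode(BooleanSupplier nodeConnected, Duration timeout, Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (nodeConnected.getAsBoolean()) {
                return Outcome.READY;
            }
            final long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                // Node did not come back in time: the caller would mark the item as failed.
                return Outcome.FAILED;
            }
            // Sleep for the poll interval, but never past the deadline.
            final long sleepMillis = Math.min(pollInterval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and treat the wait as abandoned.
                Thread.currentThread().interrupt();
                return Outcome.FAILED;
            }
        }
    }
}
```

In this sketch a worker would call `ReconnectWait.awaitNode(check, timeout, pollInterval)` once per affected item and only fail that item on `Outcome.FAILED`, leaving the rest of the job unaffected.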
If the primary node is lost during execution, the job can be resumed by the newly elected primary node.
Configuration
This PR adds configuration for bulk replay limits and behavior, including:
Design notes
This implementation keeps the replay workflow close to the processor and provenance experience users already know, while moving execution into a managed server-side job model. That provides better visibility, better operational control, and a more scalable workflow for replaying many events.
Bulk Replay jobs are currently stored in memory. This keeps the initial implementation simpler, but it also means job state and job history are lost after a full restart. A future enhancement could add a persistent job store so replay jobs and history survive restart and provide stronger operational durability.
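To make the in-memory trade-off concrete, here is a minimal sketch of what such a job store could look like. It is an assumption-laden illustration rather than the code in this PR: `InMemoryReplayJobStore`, its `Job` record, the `Status` values, and the method names are all hypothetical. Because the state lives only in a map on the heap, everything is gone after a restart, which is the limitation noted above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory job store for illustration; not taken from the PR. */
final class InMemoryReplayJobStore {

    /** Assumed job lifecycle states. */
    enum Status { QUEUED, RUNNING, COMPLETED, FAILED }

    /** Immutable snapshot of a job; the field names are assumptions. */
    record Job(String id, String processorId, int eventCount, Status status) {}

    // All state lives on the heap, so it does not survive a process restart.
    private final Map<String, Job> jobs = new ConcurrentHashMap<>();

    /** Registers a new job in the QUEUED state and returns it. */
    Job submit(String processorId, int eventCount) {
        final Job job = new Job(UUID.randomUUID().toString(), processorId, eventCount, Status.QUEUED);
        jobs.put(job.id(), job);
        return job;
    }

    /** Looks up a job by identifier. */
    Optional<Job> find(String id) {
        return Optional.ofNullable(jobs.get(id));
    }

    /** Atomically replaces the stored job with a copy carrying the new status. */
    Optional<Job> updateStatus(String id, Status status) {
        return Optional.ofNullable(jobs.computeIfPresent(id,
                (key, job) -> new Job(job.id(), job.processorId(), job.eventCount(), status)));
    }

    /** Removes all jobs, similar in spirit to clearing replay jobs from a status page. */
    void clear() {
        jobs.clear();
    }
}
```

A persistent store, as suggested for a future enhancement, could sit behind the same small set of operations so that callers would not need to change.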
Testing
Testing performed includes:
Feedback welcome
This is my first frontend feature in NiFi, so I would particularly welcome feedback on the UI flow, terminology, and any areas where the user experience could be improved.
Screenshots
Processor context menu
Bulk Replay search and selection dialog
Bulk Replay status page
Bulk Replay job details
Clear replay jobs dialog
Summary
NIFI-15682
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Verified status
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
./mvnw clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation