diff --git a/.cspell.json b/.cspell.json index 9c99675846..775e500c82 100644 --- a/.cspell.json +++ b/.cspell.json @@ -184,6 +184,8 @@ "paas", "Pantothenic", "parallelization", + "Partitioner", + "partitioner", "pbcopy", "pflag", "pgoutput", diff --git a/docs/products/fast_data_v2/best_practices/img/full-refresh-architecture.png b/docs/products/fast_data_v2/best_practices/img/full-refresh-architecture.png new file mode 100644 index 0000000000..ec503c5cbd Binary files /dev/null and b/docs/products/fast_data_v2/best_practices/img/full-refresh-architecture.png differ diff --git a/docs/products/fast_data_v2/best_practices/initial_load_full_refresh.md b/docs/products/fast_data_v2/best_practices/initial_load_full_refresh.md new file mode 100644 index 0000000000..9ca00b2d84 --- /dev/null +++ b/docs/products/fast_data_v2/best_practices/initial_load_full_refresh.md @@ -0,0 +1,75 @@ +--- +id: initial_load_full_refresh +title: Initial Load & Full Refresh Operations +sidebar_label: Initial Load & Full Refresh Operations +--- + +While evolving your Fast Data pipelines, you may need to re-ingest all the messages previously ingested into the system. +For example, you may need to update filter logic to refine data subsets, restructure how aggregations are organized, optimize storage by pruning obsolete records, fix transformation bugs, or generally evolve your Single View schema. + +Especially in production environments, during Initial Load / Full Refresh processes it is extremely important not to lose **Near Real-Time (NRT) operational continuity** with the changes occurring on the data sources ingested by your Fast Data Pipeline. + +## Full Refresh architectural pattern + +**To guarantee business continuity** despite the need for a full re-ingestion of events, consider the following example of a **Full Refresh Architecture** (taken from a screenshot of the **Control Plane UI**).
+ +![Full Refresh Architecture](img/full-refresh-architecture.png) + +As shown in the diagram, the messages from the _topic.input_ topic are consumed by two different flows: + +- **NRT (Near-Real-Time) Layer**: the flow in the upper half of the pipeline shows a [Stream Processor service](/products/fast_data_v2/stream_processor/10_Overview.md), which simply forwards each message to the next stage of the pipeline, ensuring business continuity +- **Backup Layer**: the flow in the lower half of the pipeline shows several processes responsible for backing up the messages in a backup store: in the example, the messages inside _topic.input_ are consumed by the [Kango service](/products/fast_data_v2/kango/10_Overview.md), which compacts them and generates MongoDB documents. These documents are then stored in a **MongoDB collection**, which can be used **as a backup**. A [Mongezium service](/products/fast_data_v2/mongezium_cdc/10_Overview.md) is then configured to read these MongoDB document changes and generate the corresponding Kafka messages published to the _topic.backup_ topic, which can be read by a _Stream Processor_ that stays **paused and is activated only when you need to reingest messages** into the pipeline. + +These operations can be easily executed by leveraging the **Fast Data Control Plane UI** to govern and orchestrate every stage of **Initial Load** or **Full Refresh** operations with precision and zero friction. + +:::note +Thanks to the backup layer and the full refresh architectural pattern, you can eliminate some critical operational constraints: instead of requesting full refreshes from external data sources or relying on infinite topic retention, you maintain a controlled backup flow managed entirely within your pipeline architecture, minimizing time loss, exposure to external systems, and organizational overhead.
::: + +To configure this **routing pattern**, which enables the two different flows representing the regular processing of the messages (upper flow) and the backup management (lower flow), the _Stream Processor_ services of both layers must be configured with the **Custom Partitioner** settings, so that each produces messages on a segregated subset of the partitions of the _topic.merge_ topic. For more information about the custom partitioner settings, visit the dedicated [page](/products/fast_data_v2/stream_processor/20_Configuration.mdx). +By dedicating a set of topic partitions to the backup flow and the remaining ones to the regular flow, you achieve a clearer separation of the two layers and can better balance the reingestion speed of the backup messages against the ingestion speed of the regular flow. + +In the last Process step of the picture shown above, a _Stream Processor_ can include dedicated logic to further guard the system against reintroducing messages that should no longer be included: for example, messages from the backup flow that are now stale because the regular flow, still processing, has already produced newer messages for the same identifier in the output stream. This guard can be implemented, for instance, by comparing the timestamps of the createdAt / updatedAt fields of the events coming from the source database against an internal cache maintained by the service logic.
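A minimal sketch of such a staleness guard follows. This is illustrative only: the `identifier` and `updatedAt` field names and the in-memory cache are assumptions for the example, not part of the Stream Processor API.

```python
# Hypothetical staleness guard: drop backup-flow events that are older than
# what the regular flow has already produced for the same identifier.
# Field names ("identifier", "updatedAt") are assumptions for this sketch.

latest_seen: dict[str, int] = {}  # identifier -> newest updatedAt already emitted


def should_forward(event: dict) -> bool:
    """Return True if the event is newer than anything already emitted
    for its identifier, and record it; return False if it is stale."""
    key = event["identifier"]
    ts = event["updatedAt"]
    if ts <= latest_seen.get(key, -1):
        return False  # stale: a newer version was already produced downstream
    latest_seen[key] = ts
    return True
```

In a real deployment the cache would typically need to be bounded or persisted, since restarting the service would otherwise lose the already-seen timestamps.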
+ +Some final considerations: + +- you can choose whether the backup store should contain messages already refined by a transformation logic layer, giving you a ready-to-use backup that is faster to reingest into the pipeline, or the raw messages, giving you a more complete backup that can be reingested even with different transformation logics +- you can opt for a faster **backup store based on a Kafka topic with infinite retention**, without the MongoDB persistency layer: this speeds up the reingestion of the messages and lets Kafka itself deal with retention and compaction, if you do not need a more efficient and durable storage + +## Controlled Initialization + +When performing an _Initial Load_ process, you can use the same architecture shown in the previous diagram. +During pipeline initialization, every Fast Data workload can be configured with a default **paused** runtime state. This is managed via the **`onCreate`** parameter within each microservice's **ConfigMap**. By initializing flows in a paused state, you ensure that no workload begins consuming data immediately after deployment, allowing for manual orchestration. +Then, start resuming the first execution steps: the NRT layer will start consuming messages from the input topic; the backup layer will start too, but remember to keep its final process in a paused state (it is not useful during a pipeline initialization). + +## Iterative Pipeline Activation + +Whenever it is necessary to start a _Full Refresh_ process or an _Initial Load_, you can simply resume the consumption from the UI, allowing the messages in the backup topic to be reingested into the pipeline in a controlled way. +Typically, this first step involves executing transformation logic to ensure incoming data is compliant with Fast Data formats (e.g., casting, mapping, and data quality enhancements).
+Once processed, these messages are produced into the output streams, ready for the subsequent stages of the pipeline. + +You can monitor the flow of the pipeline from the UI, quickly identify bottlenecks or issues, and perform quick operations to fix them (e.g. pausing the regular flow to allow the backup flow to process its messages and catch up, before resuming it again). + +## Ingestion and Lag Monitoring + +Whether during the regular flow of the pipeline or during an _Initial Load_ or _Full Refresh_ operation, you have full visibility into the state of the pipeline and full control over it. + +Once the environment is ready, you can regulate message loading into the ingestion layer of your pipeline, pausing and resuming the consumption of topic messages in services. As the queues fill, the Control Plane provides real-time visibility into **Consumer Lag** across every pipeline edge, allowing you to monitor the volume of data awaiting processing. + +## Advanced Aggregation Management + +When dealing with **Aggregate execution steps**, the **Aggregation Graph Canvas** provides a centralized strategic view. This interface is specifically designed to manage complex scenarios where multiple data streams must be merged. + +**Best Practice: The Leaf-to-Head Strategy** +For efficient ingestion, it is recommended to resume consumption following a "bottom-up" approach: + +1. **Start from the Leaves**: Resume consumption at the leaf nodes of the aggregation graph. +2. **Monitor Lag**: Observe the incremental decrease in consumer lag. +3. **Progression**: Once the lag approaches zero, move to the next level of the graph. +4. **Activate the Head Node**: Finally, resume the head node of the aggregation. + +:::note +By keeping the head node in a **Paused** state while the leaves process data, you prevent the production of premature events in the output stream.
Once the head is resumed, it will produce the final aggregated output, significantly reducing redundant processing load on downstream stages. +::: + +By combining real-time **Consumer Lag monitoring** with granular **runtime state control**, the Control Plane transforms complex Initial Load and Full Refresh operations into a manageable, transparent, and highly efficient process. diff --git a/docs/products/fast_data_v2/best_practices/overview.md b/docs/products/fast_data_v2/best_practices/overview.md new file mode 100644 index 0000000000..3663c0f29b --- /dev/null +++ b/docs/products/fast_data_v2/best_practices/overview.md @@ -0,0 +1,44 @@ +--- +id: overview +title: Best Practices +sidebar_label: Overview +--- + +This section provides best practices and operational strategies for effectively designing and managing Fast Data v2 pipelines. + +## How to navigate this section + +The Fast Data v2 Best Practices are organized into three main areas to guide you through different stages of your data pipeline lifecycle: + +### [Pipeline Development & Testing](/products/fast_data_v2/best_practices/pipeline_development_testing.md) + +Start here during the development phase of your Fast Data pipelines. Learn how to: +- Visualize pipeline architecture as you build it +- Simulate performance scenarios with pause/resume controls +- Test system behavior under different load patterns before promoting to production + +### [Initial Load & Full Refresh Operations](/products/fast_data_v2/best_practices/initial_load_full_refresh.md) + +Master the operational strategies for managing data re-ingestion in production. 
Understand: +- How to maintain Near Real-Time operational continuity during complex pipeline changes +- The Full Refresh architectural pattern with NRT and Backup layers +- Controlled initialization and iterative pipeline activation +- Consumer lag monitoring and the Leaf-to-Head strategy for aggregations + +### [System Optimization & Reliability](/products/fast_data_v2/best_practices/system_optimization_reliability.md) + +Ensure your Fast Data infrastructure runs efficiently and reliably. Discover: +- Strategic resource allocation through granular runtime controls +- Performance optimization techniques +- Enhanced system reliability and fault isolation +- Maintenance strategies and graceful degradation patterns + +--- + +## Key Concepts + +**Runtime Control**: The ability to pause and resume message consumption at any pipeline stage, enabling precise orchestration of data flows without stopping the entire pipeline. + +**Near Real-Time (NRT) Continuity**: Maintaining continuous processing of new incoming data while performing full refreshes or data reprocessing operations on historical data. + +**Backup Layer**: A dedicated flow that maintains a controlled backup of your messages, enabling full refresh operations without requiring infinite topic retention or direct access to source databases. diff --git a/docs/products/fast_data_v2/best_practices/pipeline_development_testing.md b/docs/products/fast_data_v2/best_practices/pipeline_development_testing.md new file mode 100644 index 0000000000..7173a7ec2d --- /dev/null +++ b/docs/products/fast_data_v2/best_practices/pipeline_development_testing.md @@ -0,0 +1,15 @@ +--- +id: pipeline_development_testing +title: Pipeline Development & Testing +sidebar_label: Pipeline Development & Testing +--- + +This section covers best practices for developing and testing Fast Data v2 pipelines during the development phase, where you can safely experiment and validate your architecture before promoting to production. 
+ +## Visualize Fast Data Pipelines while Building Them + +During the Fast Data development phase, users can iteratively configure new Fast Data pipeline steps and continuously deploy them to the development environment. The Control Plane UI renders the new architecture steps incrementally, offering immediate visual feedback as the pipeline evolves. + +## Performance Testing and Simulation + +During the Fast Data development phase, users can simulate different scenarios for performance testing by pausing and resuming message consumption along the pipeline. In this way, users can test system behavior under different load patterns before promoting to production. diff --git a/docs/products/fast_data_v2/best_practices/system_optimization_reliability.md b/docs/products/fast_data_v2/best_practices/system_optimization_reliability.md new file mode 100644 index 0000000000..65b022b420 --- /dev/null +++ b/docs/products/fast_data_v2/best_practices/system_optimization_reliability.md @@ -0,0 +1,17 @@ +--- +id: system_optimization_reliability +title: System Optimization & Reliability +sidebar_label: System Optimization & Reliability +--- + +This section covers strategies for optimizing Fast Data v2 system performance and ensuring reliability through granular runtime controls and architectural best practices. + +## Strategic Resource Allocation and Performance Optimization + +By leveraging the ability to pause and resume message-consuming microservices in real time, and by verifying the lag of your topics and the stability of your services, the Control Plane ensures that computing power is strategically directed toward high-priority tasks during peak demand periods. +These granular runtime controls facilitate a balanced distribution of processing loads across every stage of the architecture, effectively mitigating bottlenecks and ensuring maximum resource utilization throughout your entire Fast Data v2 infrastructure.
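As a rough illustration of lag-driven resource allocation, the decision of when to pause a low-priority step can be modeled as a simple hysteresis rule. This is a hypothetical sketch: the thresholds and state names are assumptions, and in practice these pause/resume decisions are taken by an operator through the Control Plane UI rather than by code like this.

```python
# Hypothetical lag-based pause/resume policy for a low-priority pipeline step.
# Thresholds are illustrative assumptions, not Control Plane defaults.

PAUSE_THRESHOLD = 100_000   # lag above which the low-priority step should yield
RESUME_THRESHOLD = 10_000   # lag below which it is safe to resume it


def next_state(current_state: str, high_priority_lag: int) -> str:
    """Decide whether a low-priority consumer should run, given the consumer
    lag accumulated by a high-priority flow sharing the same resources."""
    if current_state == "running" and high_priority_lag > PAUSE_THRESHOLD:
        return "paused"   # free resources for the high-priority flow
    if current_state == "paused" and high_priority_lag < RESUME_THRESHOLD:
        return "running"  # backlog drained, low-priority work may continue
    return current_state  # hysteresis: avoid flapping between states
```

Using two thresholds instead of one prevents the step from oscillating between paused and running while the lag hovers around a single cutoff value.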
+ +## Enhanced System Reliability + +When faced with scheduled maintenance or unforeseen anomalies, the Control Plane allows for precise intervention by pausing specific pipeline segments, ensuring that controlled troubleshooting occurs without compromising the broader system workflow. +This systematic approach extends into post-maintenance phases, where operations can be resumed gradually to verify stability and minimize recovery time. Beyond routine maintenance, these runtime controls facilitate effective fault isolation, enabling you to contain issues within localized segments to protect the integrity of the overall infrastructure. By implementing graceful degradation through precise shutdown and startup procedures, you ensure that your Fast Data v2 environment maintains absolute operational integrity even in challenging circumstances. diff --git a/docs/products/fast_data_v2/runtime_management/best_practices.md b/docs/products/fast_data_v2/runtime_management/best_practices.md deleted file mode 100644 index a4efac11d5..0000000000 --- a/docs/products/fast_data_v2/runtime_management/best_practices.md +++ /dev/null @@ -1,65 +0,0 @@ ---- -id: best_practices -title: Best Practices -sidebar_label: Best Practices ---- - -This page provides best practices and operational strategies for effectively managing Fast Data v2 pipelines using the Control Plane UI runtime controls. - -## Development Data Pipelines Best Practices - -### Visualize Fast Data Pipelines while Building Them - -During the Fast Data development phase, users can iteratively configure and continuously deploy in the development environment new Fast Data pipeline steps. Control Plane UI will provide the new architecture steps incrementally rendered, offering immediate visual feedback as the pipeline evolves. 
- -### Performance Testing and Simulation - -During the Fast Data development phase, users can simulate different scenarios for performance testing by pausing and resuming messages consumption along the pipeline. In this way, user can pause and resume operations to test system behavior under different load patterns before to promote to production. - -## Operational Management Strategies - -### Initial Load and Full Refresh Processes Management - -The **Control Plane UI** allows you to govern and orchestrate every stage of **Initial Load** or **Full Refresh** operations with precision and zero friction. - -#### 1. Controlled Initialization - -To ensure a stable start, every Fast Data workload can be configured with a default **Paused** runtime state. This is managed via the **`onCreate`** parameter within each microservice's **ConfigMap**. By initializing flows in a paused state, you ensure that no workload begins consuming data immediately after deployment, allowing for manual orchestration. - -#### 2. Ingestion and Lag Monitoring - -Once the environment is ready, you can initiate message loading into the ingestion layer of your pipeline. As the queues fill, the Control Plane provides real-time visibility into **Consumer Lag** across every pipeline edge, allowing you to monitor the volume of data awaiting processing. - -#### 3. Iterative Pipeline Activation - -After the initial data load, you can trigger consumption for the first stage of the pipeline using the **Resume** button. - -* **Transformation Stage**: Typically, this first step involves executing transformation logic to ensure incoming data is compliant with Fast Data formats (e.g., casting, mapping, and data quality enhancements). -* **Downstream Flow**: Once processed, these messages are produced into the output streams, ready for the subsequent stages of the pipeline. - -#### 4. 
Advanced Aggregation Management - -When dealing with **Aggregate execution steps**, the **Aggregation Graph Canvas** provides a centralized strategic view. This interface is specifically designed to manage complex scenarios where multiple data streams must be merged. - -**Best Practice: The Leaf-to-Head Strategy** -For efficient ingestion, it is recommended to resume consumption following a "bottom-up" approach: - -1. **Start from the Leaves**: Resume consumption at the leaf nodes of the aggregation graph. -2. **Monitor Lag**: Observe the incremental decrease in consumer lag. -3. **Progression**: Once the lag approaches zero, move to the next level of the graph. -4. **Activate the Head Node**: Finally, resume the head node of the aggregation. - -:::note -By keeping the head node in a **Paused** state while the leaves process data, you prevent the production of premature events in the output stream. Once the head is resumed, it will produce the final aggregated output, significantly reducing redundant processing load on downstream stages. -::: - -By combining real-time **Consumer Lag monitoring** with granular **runtime state control**, the Control Plane transforms complex Initial Load and Full Refresh operations into a manageable, transparent, and highly efficient process. - -### Strategic Resource Allocation and Performance Optimization - -By leveraging the ability to pause and resume message-consuming microservices in real-time, the Control Plane ensures that computing power is strategically directed toward high-priority tasks during peak demand periods. These granular runtime controls facilitate a balanced distribution of processing loads across every stage of the architecture, effectively mitigating bottlenecks and ensuring maximum resource utilization throughout your entire Fast Data v2 infrastructure. 
- -### Enhanced System Reliability - -When faced with scheduled maintenance or unforeseen anomalies, the Control Plane allows for precise intervention by pausing specific pipeline segments, ensuring that controlled troubleshooting occurs without compromising the broader system workflow. -This systematic approach extends into post-maintenance phases, where operations can be resumed gradually to verify stability and minimize recovery time. Beyond routine maintenance, these runtime controls facilitate effective fault isolation, enabling you to contain issues within localized segments to protect the integrity of the overall infrastructure. By implementing graceful degradation through precise shutdown and startup procedures, you ensure that your Fast Data v2 environment maintains absolute operational integrity even in challenging circumstances. diff --git a/docs/products/fast_data_v2/runtime_management/control_plane_ui.md b/docs/products/fast_data_v2/runtime_management/control_plane_ui.md index 5e12bca771..43b4633d57 100644 --- a/docs/products/fast_data_v2/runtime_management/control_plane_ui.md +++ b/docs/products/fast_data_v2/runtime_management/control_plane_ui.md @@ -156,7 +156,7 @@ The pipeline provides **Pause Data Consumption** and **Resume Data Consumption** Pause and Resume buttons are available whenever you click on a pipeline step that supports runtime state control for specific data flows. Additionally, for the Aggregate execution step, these same controls are also available directly within the Aggregation Graph Canvas, providing enhanced utility for managing Initial Load and Full Refresh scenarios, allowing for more efficient and optimized runtime control in these and other operational scenarios. -For more detailed operational strategies and best practices on using these runtime controls effectively, visit the [Best Practices documentation](/products/fast_data_v2/runtime_management/best_practices.md). 
+For more detailed operational strategies and best practices on using these runtime controls effectively, visit the [Best Practices documentation](/products/fast_data_v2/best_practices/overview.md). ## Navigating UI diff --git a/docs/products/fast_data_v2/runtime_management/img/full-refresh-architecture.png b/docs/products/fast_data_v2/runtime_management/img/full-refresh-architecture.png new file mode 100644 index 0000000000..ec503c5cbd Binary files /dev/null and b/docs/products/fast_data_v2/runtime_management/img/full-refresh-architecture.png differ diff --git a/docs/products/fast_data_v2/runtime_management/overview.md b/docs/products/fast_data_v2/runtime_management/overview.md index 93ac6bd31b..083d9397ba 100644 --- a/docs/products/fast_data_v2/runtime_management/overview.md +++ b/docs/products/fast_data_v2/runtime_management/overview.md @@ -62,4 +62,4 @@ Here are some useful links to start adopting Runtime Management features into yo - visit the [Control Plane UI documentation](/products/fast_data_v2/runtime_management/control_plane_ui.md) to learn how to interact with the Control Plane frontend interface and manage your Fast Data pipelines; - visit the [Application Configuration documentation](/products/fast_data_v2/runtime_management/application_configuration.md) to understand how to configure the Control Plane application and to enable the communication with the Fast Data Engine workloads; - visit the [Compatibility Matrix](/products/fast_data_v2/runtime_management/compatibility_matrix.md) to check whether your infrastructure and Fast Data v2 services are compatible with the Runtime Management features; -- visit the [Best Practices documentation](/products/fast_data_v2/runtime_management/best_practices.md) for recommendations on initial load strategies, monitoring approaches, and optimization techniques. 
+- visit the [Best Practices documentation](/products/fast_data_v2/best_practices/overview.md) for recommendations on initial load strategies, monitoring approaches, and optimization techniques. diff --git a/sidebars.json b/sidebars.json index a017d49b38..4a669162b2 100644 --- a/sidebars.json +++ b/sidebars.json @@ -2314,9 +2314,28 @@ { "id": "products/fast_data_v2/runtime_management/compatibility_matrix", "type": "doc" + } + ] + }, + { + "label": "Best Practices", + "type": "category", + "collapsed": true, + "link": { + "id": "products/fast_data_v2/best_practices/overview", + "type": "doc" + }, + "items": [ + { + "id": "products/fast_data_v2/best_practices/pipeline_development_testing", + "type": "doc" + }, + { + "id": "products/fast_data_v2/best_practices/initial_load_full_refresh", + "type": "doc" }, { - "id": "products/fast_data_v2/runtime_management/best_practices", + "id": "products/fast_data_v2/best_practices/system_optimization_reliability", "type": "doc" } ]