> configured persistence layer and, analyzing the aggregation graph, automatically
> generate the recommended indexes for your use case.
> - Kafka topics must exist on the Kafka cluster with the appropriate configuration (partitions, retention, replication factor); see [Topics](/products/fast_data_v2/kafka.md#topics) in the Kafka Reference;
> - MongoDB collections must be defined on the MongoDB cluster with the necessary indexes.
I think this is no longer true: Farm Data can automatically create sink collections, and the proper indexes for them, based on the configured aggregation graph.
> :::warning
>
> All Fast Data v2 workloads have `allow.auto.create.topics` hardcoded to `"false"`. Topics must be created with the proper configuration before starting the services.
>
> :::
I would not mention this hardcoded setting; I think the previous sentence is enough.
> ## Consumer Configuration
>
> ### Required Properties
This paragraph doesn't seem to be about required properties, given that `client.id` is not a required property. Maybe change the paragraph title.
> ### Fixed Properties
>
> The following producer properties are hardcoded across all Fast Data v2 workloads and cannot be overridden:
>
> | Property | Value | Reason |
> | -------- | ----- | ------ |
> | `allow.auto.create.topics` | `"false"` | Topics must be created manually with the correct partition, retention, and replication settings. |
> | `enable.idempotence` | `"true"` | Prevents duplicate messages from being produced to the broker. |
> | `acks` | `"all"` | Requires acknowledgement from all in-sync replicas before a write is considered successful. |
>
> The first parameter is included to enforce user responsibility over topics creation, so that the proper configurations, such as number of partitions, replication factor and retention policy are set. In addition, the latter properties ensure that no duplicated messages are produced on Kafka brokers.
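For reference, a minimal sketch of what "fixed properties" means in practice. The property names and values come from the table above; the merge helper itself is illustrative, not the workloads' actual code:

```javascript
// Producer properties pinned by Fast Data v2 workloads (per the table above),
// shown as a plain key/value map in Kafka's dotted property notation.
const fixedKafkaProperties = {
  "allow.auto.create.topics": "false", // topics must be created up front with the right settings
  "enable.idempotence": "true",        // broker deduplicates retried produce requests
  "acks": "all",                       // wait for all in-sync replicas before acking a write
};

// Sketch: fixed entries are merged last over any user-supplied configuration,
// so user overrides cannot win.
function applyFixedProperties(userConfig) {
  return { ...userConfig, ...fixedKafkaProperties };
}
```

For example, a user passing `{ "acks": "1" }` would still end up with `"acks": "all"`.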
I'm not sure I want to specify which properties are hardcoded.
> Where:
>
> - `__sink` is a constant prefix that signals the collection is used internally by Farm Data;
> - `<aggregation_id>` is the value of the `id` configuration field identifying the aggregation process. This identifier **must be between 8 and 16 characters** and must comply with MongoDB [collection name restrictions](https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Restriction-on-Collection-Names);
Mention here that the property can also be found inside the JSON schema of the Farm Data service.
> - `<aggregation_node_name>` is the name of a node in the aggregation graph.
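A small sketch of how those pieces compose into a collection name, including the length check on the identifier. The underscore separator is an assumption taken for illustration; the actual pattern is the one documented above, and the helper is not Farm Data's real code:

```javascript
// Illustrative builder for a sink collection name from its documented parts:
// the constant `__sink` prefix, the aggregation `id`, and a graph node name.
// Separator (`_`) is assumed for this sketch.
function sinkCollectionName(aggregationId, nodeName) {
  // The aggregation id must be between 8 and 16 characters (see above).
  if (aggregationId.length < 8 || aggregationId.length > 16) {
    throw new Error("aggregation id must be between 8 and 16 characters");
  }
  return `__sink_${aggregationId}_${nodeName}`;
}
```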
> ### Required Indexes
Here I'd mention that Farm Data automatically creates the necessary indexes based on the configured aggregation graph.
I'd also add an example of aggregation graph → list of automatically created indexes in the various sink collections (one for each data stream).
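As a sketch of what such an example could look like: the graph shape, the join-key fields, and the rule "one unique index on the join key per sink collection" are all hypothetical here, chosen only to illustrate the idea of deriving an index list from the graph:

```javascript
// Hypothetical aggregation graph: one aggregation with two nodes (data streams).
const aggregationGraph = {
  id: "ordersAggr01",
  nodes: [
    { name: "orders", joinKey: "ORDER_ID" },
    { name: "customers", joinKey: "CUSTOMER_ID" },
  ],
};

// Derive one index spec per sink collection: a unique index on the node's
// join key. Purely illustrative of "graph in, index list out".
function derivedIndexes(graph) {
  return graph.nodes.map((node) => ({
    collection: `__sink_${graph.id}_${node.name}`,
    key: { [node.joinKey]: 1 },
    unique: true,
  }));
}
```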
> "config": {
>   "url": "mongodb://localhost:27017/farm-data",
>   "database": "farm-data",
>   "appName": "eu.miaplatfor.farm-data.lakes"
> }
> ```

```suggestion
"appName": "eu.miaplatform.farm-data.lakes"
```
> For full persistence configuration details, see [Farm Data Configuration — Persistence](/products/fast_data_v2/farm_data/20_Configuration.mdx#persistence).
I can't find this link.
> Kango reads Kafka records and persists them into MongoDB collections. It acts as the final persistence step of a Fast Data pipeline, writing processed and aggregated data into the operational data store.
>
> ### Write Modes
This part overlaps a bit with what is already written here: /docs/next/products/fast_data_v2/kango/30_usage#write-mode
> | `strict` *(default)* | Only fields from the `after` payload are **retained**. Insert operations act as _replace_ (unknown fields are discarded). Update operations _unset_ fields that existed in `before` but are absent from `after`. |
> | `partial` | Fields from the `after` payload are **merged** onto the stored document. Insert operations act as _upserts_; updates apply only the changed fields. |
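The semantics of the two modes can be sketched as plain object operations. This is an illustration of the table's description only, assuming `before`/`after` are change-event payloads and `stored` is the current document; it is not Kango's actual implementation:

```javascript
// strict: `after` fully replaces the stored document, so any field that is
// not in `after` (including fields that existed in `before`) is dropped.
function strictWrite(stored, before, after) {
  return { ...after };
}

// partial: `after` is merged onto the stored document; untouched fields survive.
function partialWrite(stored, before, after) {
  return { ...stored, ...after };
}
```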
> ### Required Indexes
Here (or maybe better inside the Kango usage docs page?) it could be useful to point out how Kango behaves based on the operation type `op` of the consumed message.
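A sketch of what that explanation could look like. The op codes (`c`/`u`/`d`) and the mapping to MongoDB writes are assumptions made for illustration, not Kango's documented behavior:

```javascript
// Hypothetical mapping from a message's operation type `op` to the MongoDB
// write the sink could perform. Op codes assumed Debezium-style for the sketch.
function plannedWrite(op, key, after) {
  switch (op) {
    case "c": return { kind: "insert", filter: key, doc: after };  // create
    case "u": return { kind: "replace", filter: key, doc: after }; // update
    case "d": return { kind: "delete", filter: key };              // delete: no payload needed
    default: throw new Error(`unsupported op: ${op}`);
  }
}
```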
> :::
>
> ## Processing Function
Inside this section, and inside the cache section, I'd insert several different example functions, pointing out the importance of the signature and of handling the `op` type of the messages, to really help users understand how to write JS functions for the stream processor.
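Two sketches of the kind of examples the comment asks for. The message shape (`op`, `key`, `after`) and the op codes are assumptions for illustration only, not the stream processor's documented contract:

```javascript
// 1. Transform: normalize a field while preserving the op type.
//    Delete events carry no payload, so they are passed through untouched.
function uppercaseCity(message) {
  const { op, key, after } = message;
  if (op === "d" || after == null) return message;
  return { op, key, after: { ...after, CITY: String(after.CITY).toUpperCase() } };
}

// 2. Filter: drop delete events entirely by returning null.
function dropDeletes(message) {
  return message.op === "d" ? null : message;
}
```

Examples like these would let users see at a glance both the expected signature (one message in, one message or `null` out, in this sketch) and why every function must account for the `op` type.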
Description

Various improvements to Fast Data v2 documentation.

- Architecture > Kafka: add new page with Kafka reference and link other pages to it
- Architecture > MongoDB: add new page with MongoDB reference and link other pages to it

Pull Request Type

PR Checklist