
Edit: Fast Data v2 improvements #2488

Draft

nicola88 wants to merge 7 commits into main from edit/fast-data-v2-improvements

Conversation

nicola88 (Contributor) commented Mar 19, 2026

Description

Various improvements to Fast Data v2 documentation.

  • Architecture > Kafka: add new page with Kafka reference and link other pages to it
  • Architecture > MongoDB: add new page with MongoDB reference and link other pages to it

Pull Request Type

  • Documentation content changes
  • Bugfix / Missing Redirects
  • Docusaurus site code changes

PR Checklist

  • The commit message follows our guidelines included in the CONTRIBUTING.md
  • All Lint, cspell, and check-content checks pass. (How to launch them?)
  • No sensitive content has been committed

nicola88 changed the title from "Remove references to indexer CLI" to "Edit: Fast Data v2 improvements" on Mar 19, 2026
configured persistence layer and, analyzing the aggregation graph, automatically
generate the recommended indexes for your use case.
- Kafka topics must exist on the Kafka cluster with the appropriate configuration (partitions, retention, replication factor); see [Topics](/products/fast_data_v2/kafka.md#topics) in the Kafka Reference;
- MongoDB collections must be defined on the MongoDB cluster with the necessary indexes.

I think this is no longer the case: Farm Data can automatically create sink collections, and also the proper indexes for them, based on the configured aggregation graph.

Comment on lines +32 to +36
:::warning

All Fast Data v2 workloads have `allow.auto.create.topics` hardcoded to `"false"`. Topics must be created with the proper configuration before starting the services.

:::

I should not mention this hardcoded setting; I think the previous sentence is enough.


## Consumer Configuration

### Required Properties

This paragraph seems not to be about required properties, given that `client.id` is not a required property. Maybe the paragraph title should be changed.

Comment on lines +169 to +179
### Fixed Properties

The following producer properties are hardcoded across all Fast Data v2 workloads and cannot be overridden:

| Property | Value | Reason |
| -------------------------- | --------- | ------------------------------------------------------------------------------------------------ |
| `allow.auto.create.topics` | `"false"` | Topics must be created manually with the correct partition, retention, and replication settings. |
| `enable.idempotence` | `"true"` | Prevents duplicate messages from being produced to the broker. |
| `acks` | `"all"` | Requires acknowledgement from all in-sync replicas before a write is considered successful. |

The first property is included to enforce user responsibility over topic creation, so that the proper configurations, such as the number of partitions, replication factor, and retention policy, are set. The latter two properties ensure that no duplicate messages are produced on Kafka brokers.

albertotessarotto commented Mar 24, 2026


I'm not sure I want to specify certain properties that are hardcoded

Where:

- `__sink` is a constant prefix that signals the collection is used internally by Farm Data;
- `<aggregation_id>` is the value of the `id` configuration field identifying the aggregation process. This identifier **must be between 8 and 16 characters** and must comply with MongoDB [collection name restrictions](https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Restriction-on-Collection-Names);

Here we should mention that this property can also be found inside the JSON schema of the Farm Data service.

- `<aggregation_id>` is the value of the `id` configuration field identifying the aggregation process. This identifier **must be between 8 and 16 characters** and must comply with MongoDB [collection name restrictions](https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Restriction-on-Collection-Names);
- `<aggregation_node_name>` is the name of a node in the aggregation graph.
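To make the convention concrete, a minimal sketch could accompany the docs (the aggregation id, node name, and the underscore-joined format are assumptions for illustration, not taken from this PR):

```javascript
// Sketch of the sink collection naming convention described above.
// Assumes the three components are joined with underscores; the real
// format should be checked against the Farm Data reference.
function sinkCollectionName(aggregationId, nodeName) {
  // the docs state the aggregation id must be between 8 and 16 characters
  if (aggregationId.length < 8 || aggregationId.length > 16) {
    throw new Error("aggregation id must be between 8 and 16 characters");
  }
  return `__sink_${aggregationId}_${nodeName}`;
}

// made-up aggregation id and node name, used only for illustration
console.log(sinkCollectionName("orders-v2", "customers"));
// → "__sink_orders-v2_customers"
```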

### Required Indexes

Here I'd mention that Farm Data automatically creates the necessary indexes based on the configured aggregation graph.


I'd also add an example mapping an aggregation graph to the list of automatically created indexes in the various sink collections (one for each data stream).
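A sketch of what such an example could look like (the graph shape, key names, and derivation rule are all hypothetical; the real mapping comes from Farm Data's configuration):

```javascript
// Hypothetical example: deriving the indexes that would be created in
// each sink collection (one per data stream) from an aggregation graph.
// Graph shape, field names, and rules are illustrative only.
const aggregationGraph = {
  nodes: {
    customers: { joinKey: "customerId" },
    orders: { joinKey: "orderId", parent: "customers", foreignKey: "customerId" },
  },
};

function derivedIndexes(graph) {
  const indexes = {};
  for (const [name, node] of Object.entries(graph.nodes)) {
    // each sink collection gets a unique index on its own join key
    indexes[name] = [{ key: { [node.joinKey]: 1 }, unique: true }];
    // child nodes also get an index on the key pointing at their parent
    if (node.foreignKey) {
      indexes[name].push({ key: { [node.foreignKey]: 1 } });
    }
  }
  return indexes;
}

console.log(derivedIndexes(aggregationGraph));
```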

"config": {
"url": "mongodb://localhost:27017/farm-data",
"database": "farm-data",
"appName": "eu.miaplatfor.farm-data.lakes"

Suggested change:
-    "appName": "eu.miaplatfor.farm-data.lakes"
+    "appName": "eu.miaplatform.farm-data.lakes"

}
```

For full persistence configuration details, see [Farm Data Configuration — Persistence](/products/fast_data_v2/farm_data/20_Configuration.mdx#persistence).

I can't find this link.


Kango reads Kafka records and persists them into MongoDB collections. It acts as the final persistence step of a Fast Data pipeline, writing processed and aggregated data into the operational data store.

### Write Modes

This part overlaps a bit with what is written here: /docs/next/products/fast_data_v2/kango/30_usage#write-mode

| `strict` *(default)* | Only fields from the `after` payload are **retained**. Insert operations act as _replace_ (unknown fields are discarded). Update operations _unset_ fields that existed in `before` but are absent from `after`. |
| `partial` | Fields from the `after` payload are **merged** onto the stored document. Insert operations act as _upserts_; updates apply only the changed fields. |
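To make the difference between the two modes easier to grasp, a minimal sketch of the two behaviors on an update could help (field names are made up; this is not Kango's actual implementation, only an illustration of the table above):

```javascript
// Illustrative-only sketch of the two write modes on an update.
function applyUpdate(stored, before, after, mode) {
  if (mode === "strict") {
    // only fields from `after` are retained; fields that existed in
    // `before` but are absent from `after` are effectively unset
    return { ...after };
  }
  // partial: merge the changed fields onto the stored document
  return { ...stored, ...after };
}

const stored = { _id: 1, name: "Ada", city: "Turin" };
const before = { _id: 1, name: "Ada", city: "Turin" };
const after = { _id: 1, name: "Ada Lovelace" };

console.log(applyUpdate(stored, before, after, "strict"));
// → { _id: 1, name: "Ada Lovelace" }            (city is unset)
console.log(applyUpdate(stored, before, after, "partial"));
// → { _id: 1, name: "Ada Lovelace", city: "Turin" }
```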

### Required Indexes

Here (or maybe better inside the Kango usage docs page?) it could be useful to point out how Kango behaves based on the operation type (`op`) of the consumed message.


:::

## Processing Function

Inside this section and the cache section I'd insert several different examples of functions, pointing out the importance of the signature and of handling the message `op` type, to really help users understand how to deal with JS functions for the stream processor.
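As a starting point for such examples, one possible sketch (the function signature and the Debezium-style `op` codes `"c"`/`"u"`/`"d"` are assumptions, not the documented contract):

```javascript
// Hypothetical processing function that branches on the message `op`
// type. Signature and op codes are assumed (Debezium-style) and should
// be replaced with the actual stream-processor contract.
function processMessage(message) {
  switch (message.op) {
    case "c": // create: pass the new document through
      return { op: "c", doc: message.after };
    case "u": { // update: emit only the fields that actually changed
      const changed = {};
      for (const key of Object.keys(message.after)) {
        if (!message.before || message.before[key] !== message.after[key]) {
          changed[key] = message.after[key];
        }
      }
      return { op: "u", doc: changed };
    }
    case "d": // delete: emit a tombstone carrying only the old key
      return { op: "d", doc: { _id: message.before._id } };
    default:
      throw new Error(`unsupported op: ${message.op}`);
  }
}
```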
