feat(comments): Add runner for comments migration separately #380

sakshamarora1 · 2026-02-02T16:02:30Z

closes: #286

Steps

Update the collection queries for the target collection, retrieve all comments for the records found, and create a JSON metadata file.

ipython ./scripts/dump_comments_to_migrate.py

Output file: comments_metadata.json
A second output file, users_metadata.json, is generated for the missing users.
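The dump step above produces two files: the comments grouped per record, and the list of commenters. The sketch below shows that shape. It assumes a flat list of legacy comments with a hypothetical `recid` key and the `created_by` email field seen in `load.py`. It is not the actual `dump_comments_to_migrate.py` logic, which also runs the collection queries.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_metadata(comments):
    """Group comments by record and collect the distinct commenter emails.

    Assumption: each comment is a dict with a hypothetical "recid" key,
    a "created_by" email and a "content" body.
    """
    by_record = defaultdict(list)
    emails = set()
    for comment in comments:
        by_record[str(comment["recid"])].append(comment)
        emails.add(comment["created_by"])
    return dict(by_record), sorted(emails)


def write_metadata(comments, out_dir):
    """Write comments_metadata.json and users_metadata.json into out_dir."""
    comments_by_record, emails = build_metadata(comments)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    comments_path = out / "comments_metadata.json"
    users_path = out / "users_metadata.json"
    comments_path.write_text(json.dumps(comments_by_record, indent=2))
    users_path.write_text(json.dumps(emails, indent=2))
    return comments_path, users_path
```

Keying records by string ID keeps the JSON keys stable, since JSON object keys are always strings anyway.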

For missing users:
users_metadata.json is read, and this script (with some tweaks) can then be run to find the users that are missing from the new system:
https://gitlab.cern.ch/cds-team/production_scripts/-/blob/master/cds-rdm/migration/dump_users.py?ref_type=heads
Create those users using:

cds-migrator-kit comments commenters-run --filepath /comments/users_metadata.json --missing-users-filepath /eos/media/cds/cds-rdm/dev/migration/users/people.csv --dry-run
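The missing-user check above is essentially a set difference between the commenters in users_metadata.json and the accounts that already exist. The sketch assumes the file holds a JSON list of emails and that the existing emails are available as an iterable. Lower-casing emails before comparing is a design choice made here, not something the linked script is known to do.

```python
import json
from pathlib import Path


def find_missing_users(users_metadata_path, known_emails):
    """Return commenter emails that have no account in the new system yet.

    Assumptions: users_metadata.json is a JSON list of email strings, and
    known_emails is an iterable of emails already present in the new system.
    Emails are compared case-insensitively.
    """
    wanted = json.loads(Path(users_metadata_path).read_text())
    known = {email.strip().lower() for email in known_emails}
    return sorted({email.strip().lower() for email in wanted} - known)
```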

Place comments_metadata.json in /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/
Finally, migrate the comments:

invenio migration comments --filepath /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/comments_metadata.json --dirpath /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/ --dry-run

kpsherva · 2026-02-09T11:05:04Z

cds_migrator_kit/rdm/comments/load.py

+        self.all_record_versions = {
+            str(hit["versions"]["index"]): hit for hit in search_result
+        }
+        oldest_version = min(


wouldn't it be faster via record._record.versions[-1]? I mean instead of scan_versions etc.

That returns an instance of VersionsManager, which doesn't have the other versions stored in it. We have to use scan_versions to find all the versions and select the oldest non-deleted version available.
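A minimal sketch of picking the oldest non-deleted version from the scan_versions hits, mirroring the `all_record_versions` dict in the diff. The `is_deleted` flag is a hypothetical field name. One detail worth checking in the real code: because the dict keys are strings, `min()` must run over integers, otherwise `"10"` sorts before `"2"`.

```python
def oldest_live_version(search_result):
    """Return the hit with the lowest version index that is not deleted.

    Assumption: each hit is a dict with hit["versions"]["index"], as in
    load.py; "is_deleted" is a hypothetical deletion flag.
    """
    all_record_versions = {
        str(hit["versions"]["index"]): hit
        for hit in search_result
        if not hit.get("is_deleted", False)
    }
    if not all_record_versions:
        return None
    # Compare numerically: min() over the string keys would pick "10" over "2".
    oldest_version = min(int(index) for index in all_record_versions)
    return all_record_versions[str(oldest_version)]
```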

kpsherva · 2026-02-09T11:07:58Z

cds_migrator_kit/rdm/comments/load.py

+        elif comment_status == "dm":
+            comment_payload["payload"].update(
+                {
+                    "content": "comment was deleted by the moderator.",


in RDM we do not have the "moderator" - it would be good to align it with what we display when we delete a comment in RDM (I don't remember the exact text). ping @zzacharo for more opinions

We do not display this content for deleted comments; that is handled in the frontend.

kpsherva · 2026-02-09T15:40:30Z

cds_migrator_kit/rdm/comments/load.py

+            {}, request=request.model, request_id=str(request.id), type=event_type
+        )
+
+        if data.get("file_relation"):


can you add a small comment on why we are doing this?

kpsherva · 2026-02-09T16:05:55Z

cds_migrator_kit/rdm/comments/load.py

+        )
+        return self.all_record_versions[str(oldest_version)]
+
+    def create_event(self, request, data, community, record, parent_comment_id=None):


is there any way we can make this function more readable? there are a lot of conditional statements, some with repeated conditions; it would also be good to avoid nesting

I did some more optimisations
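One common way to flatten an `if`/`elif` chain on `comment_status` is a lookup table of payload overrides. Only the `"dm"` status and its text come from the diff; `"da"` and the `"format": "html"` default are hypothetical placeholders, so treat this as a sketch of the pattern rather than the PR's code.

```python
# Payload fields to override per legacy comment status.
# "dm" is taken from the diff; "da" is a hypothetical example entry.
STATUS_OVERRIDES = {
    "dm": {"content": "comment was deleted by the moderator."},
    "da": {"content": "comment was deleted by the author."},
}


def build_payload(content, comment_status):
    """Build a comment payload without nested conditionals.

    Unknown statuses fall through to the default payload unchanged.
    """
    payload = {"content": content, "format": "html"}  # "format" is assumed
    payload.update(STATUS_OVERRIDES.get(comment_status, {}))
    return {"payload": payload}
```

Adding a new status then means adding one dict entry instead of another branch.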

kpsherva · 2026-02-09T16:07:14Z

cds_migrator_kit/rdm/comments/load.py

+                {"user": str(user.id)}, raise_=True
+            )
+        else:
+            print("User not found for email: ", data.get("created_by"))


the print is redundant if you raise.
what will happen if you raise? will the whole script halt and need to be re-run?

No, it gets caught and logged in _load(). Now that I have put it under the UnitOfWork context as you suggested, the transaction will roll back when this is raised.
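The raise/rollback/log flow described in the reply can be sketched without Invenio. `FakeSession` and `unit_of_work` are hypothetical stand-ins that imitate the rollback-on-exception behaviour of a unit of work; `load_all` plays the role of `_load()`, logging the failure and moving on to the next comment.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("comments-migration")


class FakeSession:
    """Hypothetical DB session: stages writes and applies them on commit."""

    def __init__(self):
        self.committed, self.pending = [], []

    def add(self, item):
        self.pending.append(item)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


@contextmanager
def unit_of_work(session):
    """Commit on success; roll back and re-raise on any exception."""
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise


def load_comment(session, comment, users_by_email):
    with unit_of_work(session) as uow:
        uow.add(("event", comment["id"]))
        user = users_by_email.get(comment["created_by"])
        if user is None:
            # Raising (instead of printing) discards the event staged above.
            raise LookupError(f"User not found for email: {comment['created_by']}")
        uow.add(("creator", user))


def load_all(session, comments, users_by_email):
    """Like _load(): log each failure and continue with the next comment."""
    failed = []
    for comment in comments:
        try:
            load_comment(session, comment, users_by_email)
        except LookupError as exc:
            logger.error("Skipping comment %s: %s", comment["id"], exc)
            failed.append(comment["id"])
    return failed
```

The script does not halt on a missing user; only that comment is skipped, and its partial writes never reach the database.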

kpsherva · 2026-02-09T16:49:15Z

cds_migrator_kit/rdm/comments/load.py

+        event.model.version_id = 0
+
+        event.commit()
+        db.session.commit()


would it be better to use the uow instead? otherwise you will need to re-index all requests.
plus, from the records migration experience I can tell you uow is faster
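The re-indexing point can be illustrated with a hypothetical `DeferredUnitOfWork`: writes register the request to index, one DB commit happens at the end, and each touched request is indexed exactly once after that commit. This is a sketch of the idea, not Invenio's `UnitOfWork` API; `commit_db` and `index` are placeholder callables.

```python
class DeferredUnitOfWork:
    """Hypothetical unit of work that defers indexing until after commit.

    commit_db: callable that commits the DB transaction (placeholder).
    index: callable that indexes one request by id (placeholder).
    """

    def __init__(self, commit_db, index):
        self._commit_db = commit_db
        self._index = index
        self._to_index = []

    def register(self, request_id):
        self._to_index.append(request_id)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._commit_db()
            # dict.fromkeys keeps order and drops duplicates, so a request
            # touched by many comment events is indexed only once.
            for request_id in dict.fromkeys(self._to_index):
                self._index(request_id)
        # On error nothing is committed or indexed; the exception propagates.
        self._to_index.clear()
        return False
```

Compared with `event.commit()` plus `db.session.commit()` per event, this keeps the index in sync without a separate full re-index.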

kpsherva · 2026-02-09T16:49:52Z

cds_migrator_kit/rdm/comments/load.py

+        created_at = datetime.fromisoformat(record["created"])
+        request.model.created = created_at
+
+        request.commit()


this part would also benefit from uow

kpsherva · 2026-02-09T16:51:00Z

scripts/copy_comments_attached_files.py

+    environment, collection
+)
+"""
+collection_name/


nice, this docstring is very helpful, thank you!

kpsherva · 2026-02-09T16:51:46Z

scripts/dump_comments_to_migrate.py

+
+
+# Function to flatten arbitrarily nested comment replies into a 1-level replies list
+def flatten_replies(comments_list):


let's do the rubber duck exercise on this one :)
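As a reference point for the rubber duck exercise, here is one possible `flatten_replies`. It assumes each comment is a dict with an optional `"replies"` list of the same shape, which may not match the script's real input. Every descendant is attached to its top-level comment in depth-first (thread) order, and an explicit stack replaces recursion so very deep threads cannot hit Python's recursion limit.

```python
def flatten_replies(comments_list):
    """Flatten arbitrarily nested replies into one level per top-level comment.

    Assumption: each comment is a dict with an optional "replies" list of
    comments with the same shape.
    """
    flattened = []
    for comment in comments_list:
        collected = []
        # Reverse onto the stack so pop() yields replies in original order.
        stack = list(reversed(comment.get("replies", [])))
        while stack:
            reply = stack.pop()
            collected.append({k: v for k, v in reply.items() if k != "replies"})
            # Push children so they are visited before the reply's siblings.
            stack.extend(reversed(reply.get("replies", [])))
        top = {k: v for k, v in comment.items() if k != "replies"}
        top["replies"] = collected
        flattened.append(top)
    return flattened
```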

sakshamarora1 added 2 commits February 2, 2026 17:01

feat(comments): Add runner for comments migration separately

70119d4

feat(comments): Add link in comment content for linked files

a9b2e14

sakshamarora1 marked this pull request as ready for review February 4, 2026 16:33

kpsherva reviewed Feb 9, 2026
