-
Notifications
You must be signed in to change notification settings - Fork 10
feat(comments): Add runner for comments migration separately #380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(comments): Add runner for comments migration separately #380
Conversation
| self.all_record_versions = { | ||
| str(hit["versions"]["index"]): hit for hit in search_result | ||
| } | ||
| oldest_version = min( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be faster via record._record.versions[-1]? I mean instead of scan_versions etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That returns an instance of VersionsManager and it doesn't have other versions stored in it. We will have to do scan_versions to find all the versions and select the minimum un-deleted version available
| elif comment_status == "dm": | ||
| comment_payload["payload"].update( | ||
| { | ||
| "content": "comment was deleted by the moderator.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in RDM we do not have the "moderator" - it would be good to align it with what we display when we delete a comment in RDM (I don't remember the exact text). ping @zzacharo for more opinions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We not display this content for the deleted comments, we do this in the frontend
| {}, request=request.model, request_id=str(request.id), type=event_type | ||
| ) | ||
|
|
||
| if data.get("file_relation"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a small comment on why we are doing this?
| ) | ||
| return self.all_record_versions[str(oldest_version)] | ||
|
|
||
| def create_event(self, request, data, community, record, parent_comment_id=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any way we can optimise this function to be more readable? there are a lot of conditional statements, some with repeated conditions, also it would be good if we avoid nesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some more optimisations
| {"user": str(user.id)}, raise_=True | ||
| ) | ||
| else: | ||
| print("User not found for email: ", data.get("created_by")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the print is redundant if you raise.
what will happen if you raise? will the whole script halt? and need to be re-run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it gets caught and logged in _load() and now that I have put it under the UnitOfWork context as you suggested, it will rollback when this is raised.
| event.model.version_id = 0 | ||
|
|
||
| event.commit() | ||
| db.session.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better if we do the uow instead? otherwise you will need to re-index all requests
plus, from records migration experience I can tell you uow is faster
| created_at = datetime.fromisoformat(record["created"]) | ||
| request.model.created = created_at | ||
|
|
||
| request.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part would also benefic from uow
| environment, collection | ||
| ) | ||
| """ | ||
| collection_name/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, this docstring very helpful, thank you!
|
|
||
|
|
||
| # Function to flatten arbitrarily nested comment replies into a 1-level replies list | ||
| def flatten_replies(comments_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do the rubber duck excersise on this one :)
closes: #286
Steps
Output file:
comments_metadata.jsonAnother output file generated for the missing users:
users_metadata.jsonFor missing users:
users_metadata.jsonfile will be read and then this script (with some tweaks) can be run to find out the missing users in the new system.https://gitlab.cern.ch/cds-team/production_scripts/-/blob/master/cds-rdm/migration/dump_users.py?ref_type=heads
Create those users using:
Place
comments_metadata.jsonin/eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/Finally migrate the comments: