Adiciona tarefa para migrar dados de Institution para RawOrganization por coleção#1284
Conversation
…sks.py Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adiciona uma nova task Celery para executar, em lote e por lista de coleções (e opcionalmente por ISSN), a migração de dados de Institution para os campos raw_* (RawOrganization) nos históricos (PublisherHistory, OwnerHistory, SponsorHistory, CopyrightHolderHistory) associados aos journals.
Changes:
- Cria a task
task_migrate_institution_history_to_raw_institutioncom filtros por coleção e ISSN e retorno de estatísticas. - Implementa contadores por tipo de histórico e registra erros via
UnexpectedEvent. - Adiciona uma suíte de testes cobrindo migração, filtros e cenário sem instituição.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
journal/tasks.py |
Nova task Celery para migração em lote de institution → raw_* em históricos, com filtros e estatísticas. |
journal/tests.py |
Novos testes unitários para validar migração, filtros por coleção/ISSN e contadores. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Create mock self with request attribute for Celery task | ||
| mock_self = MagicMock() | ||
| mock_self.request = MagicMock() | ||
|
|
||
| # Run the task | ||
| result = task_migrate_institution_history_to_raw_institution( | ||
| mock_self, |
There was a problem hiding this comment.
These tests call the Celery bound task with an explicit positional self (mock_self). For bind=True tasks, Celery supplies the task instance automatically when the task is called; passing a positional argument here will be treated as username and then conflicts with username="testuser" (TypeError: multiple values for argument 'username'). Call the task without the extra positional argument (or use .run(...) / .apply(kwargs=...)) and avoid mocking request in a way that makes _get_user attempt User.objects.get(pk=<MagicMock>).
| # Create mock self with request attribute for Celery task | |
| mock_self = MagicMock() | |
| mock_self.request = MagicMock() | |
| # Run the task | |
| result = task_migrate_institution_history_to_raw_institution( | |
| mock_self, | |
| # Run the task | |
| result = task_migrate_institution_history_to_raw_institution.run( |
| # Create mock self with request attribute for Celery task | ||
| mock_self = MagicMock() | ||
| mock_self.request = MagicMock() | ||
|
|
||
| # Run task only for "scl" collection | ||
| result = task_migrate_institution_history_to_raw_institution( | ||
| mock_self, | ||
| username="testuser", | ||
| collection_acron_list=["scl"], | ||
| ) |
There was a problem hiding this comment.
Same issue as above: this bound Celery task should not receive an explicit positional self argument in the test. Passing mock_self will be interpreted as username and will conflict with the username= kwarg, causing the test to error before exercising the task logic.
| # Create mock self with request attribute for Celery task | ||
| mock_self = MagicMock() | ||
| mock_self.request = MagicMock() | ||
|
|
||
| # Run task only for specific ISSN | ||
| result = task_migrate_institution_history_to_raw_institution( | ||
| mock_self, | ||
| username="testuser", | ||
| collection_acron_list=["scl"], | ||
| journal_issns=["1234-5678"], | ||
| ) |
There was a problem hiding this comment.
Same issue as above: remove the positional mock_self argument when invoking the bound Celery task (or call .run / .apply). As written, this invocation will raise due to multiple values for username and won't test the ISSN filtering behavior.
| # Create mock self with request attribute for Celery task | ||
| mock_self = MagicMock() | ||
| mock_self.request = MagicMock() | ||
|
|
||
| # Run the task | ||
| result = task_migrate_institution_history_to_raw_institution( | ||
| mock_self, | ||
| username="testuser", | ||
| collection_acron_list=["scl"], | ||
| ) |
There was a problem hiding this comment.
Same issue as above: invoking a bind=True Celery task in tests should not pass an explicit positional self. This currently prevents the test from running and can also make _get_user try to query with a MagicMock pk depending on how request is mocked.
| if journal.publisher_history.filter(institution__isnull=False).exists(): | ||
| migrated_items = journal.migrate_publisher_history_to_raw() | ||
| migrated_publishers += len(migrated_items) | ||
| logger.info( | ||
| f"Migrated {len(migrated_items)} publisher history records " | ||
| f"for journal {journal.id}" | ||
| ) | ||
|
|
||
| # Check and migrate owner_history records with institution != None | ||
| if journal.owner_history.filter(institution__isnull=False).exists(): | ||
| migrated_items = journal.migrate_owner_history_to_raw() | ||
| migrated_owners += len(migrated_items) | ||
| logger.info( | ||
| f"Migrated {len(migrated_items)} owner history records " | ||
| f"for journal {journal.id}" | ||
| ) | ||
|
|
||
| # Check and migrate sponsor_history records with institution != None | ||
| if journal.sponsor_history.filter(institution__isnull=False).exists(): | ||
| migrated_items = journal.migrate_sponsor_history_to_raw() | ||
| migrated_sponsors += len(migrated_items) | ||
| logger.info( | ||
| f"Migrated {len(migrated_items)} sponsor history records " | ||
| f"for journal {journal.id}" | ||
| ) | ||
|
|
||
| # Check and migrate copyright_holder_history records with institution != None | ||
| if journal.copyright_holder_history.filter(institution__isnull=False).exists(): | ||
| migrated_items = journal.migrate_copyright_holder_history_to_raw() | ||
| migrated_copyright_holders += len(migrated_items) | ||
| logger.info( | ||
| f"Migrated {len(migrated_items)} copyright holder history records " |
There was a problem hiding this comment.
The counters here use len(migrated_items), but Journal.migrate_*_history_to_raw() appends all history items (including those where institution is already null and therefore nothing is migrated). This will inflate the "migrated_*" stats whenever a journal has a mix of null/non-null institutions. Prefer counting the queryset that actually needs migration (e.g., publisher_history.filter(institution__isnull=False).count()) and increment by that value (and you can use that count to decide whether to call the migration method, avoiding the extra .exists() query too).
| if journal.publisher_history.filter(institution__isnull=False).exists(): | |
| migrated_items = journal.migrate_publisher_history_to_raw() | |
| migrated_publishers += len(migrated_items) | |
| logger.info( | |
| f"Migrated {len(migrated_items)} publisher history records " | |
| f"for journal {journal.id}" | |
| ) | |
| # Check and migrate owner_history records with institution != None | |
| if journal.owner_history.filter(institution__isnull=False).exists(): | |
| migrated_items = journal.migrate_owner_history_to_raw() | |
| migrated_owners += len(migrated_items) | |
| logger.info( | |
| f"Migrated {len(migrated_items)} owner history records " | |
| f"for journal {journal.id}" | |
| ) | |
| # Check and migrate sponsor_history records with institution != None | |
| if journal.sponsor_history.filter(institution__isnull=False).exists(): | |
| migrated_items = journal.migrate_sponsor_history_to_raw() | |
| migrated_sponsors += len(migrated_items) | |
| logger.info( | |
| f"Migrated {len(migrated_items)} sponsor history records " | |
| f"for journal {journal.id}" | |
| ) | |
| # Check and migrate copyright_holder_history records with institution != None | |
| if journal.copyright_holder_history.filter(institution__isnull=False).exists(): | |
| migrated_items = journal.migrate_copyright_holder_history_to_raw() | |
| migrated_copyright_holders += len(migrated_items) | |
| logger.info( | |
| f"Migrated {len(migrated_items)} copyright holder history records " | |
| publisher_to_migrate = journal.publisher_history.filter( | |
| institution__isnull=False | |
| ).count() | |
| if publisher_to_migrate: | |
| journal.migrate_publisher_history_to_raw() | |
| migrated_publishers += publisher_to_migrate | |
| logger.info( | |
| f"Migrated {publisher_to_migrate} publisher history records " | |
| f"for journal {journal.id}" | |
| ) | |
| # Check and migrate owner_history records with institution != None | |
| owner_to_migrate = journal.owner_history.filter( | |
| institution__isnull=False | |
| ).count() | |
| if owner_to_migrate: | |
| journal.migrate_owner_history_to_raw() | |
| migrated_owners += owner_to_migrate | |
| logger.info( | |
| f"Migrated {owner_to_migrate} owner history records " | |
| f"for journal {journal.id}" | |
| ) | |
| # Check and migrate sponsor_history records with institution != None | |
| sponsor_to_migrate = journal.sponsor_history.filter( | |
| institution__isnull=False | |
| ).count() | |
| if sponsor_to_migrate: | |
| journal.migrate_sponsor_history_to_raw() | |
| migrated_sponsors += sponsor_to_migrate | |
| logger.info( | |
| f"Migrated {sponsor_to_migrate} sponsor history records " | |
| f"for journal {journal.id}" | |
| ) | |
| # Check and migrate copyright_holder_history records with institution != None | |
| copyright_holder_to_migrate = journal.copyright_holder_history.filter( | |
| institution__isnull=False | |
| ).count() | |
| if copyright_holder_to_migrate: | |
| journal.migrate_copyright_holder_history_to_raw() | |
| migrated_copyright_holders += copyright_holder_to_migrate | |
| logger.info( | |
| f"Migrated {copyright_holder_to_migrate} copyright holder history records " |
| - migrated_copyright_holders: Total copyright holder history records migrated | ||
| - error_count: Number of errors encountered | ||
| """ | ||
| user = _get_user(self.request, username=username, user_id=user_id) |
There was a problem hiding this comment.
Variable user is not used.
| user = _get_user(self.request, username=username, user_id=user_id) | |
| _get_user(self.request, username=username, user_id=user_id) |
O que esse PR faz?
Implementa
task_migrate_institution_history_to_raw_institutionque executa migração de dados de Institution para RawOrganization em PublisherHistory, OwnerHistory, SponsorHistory e CopyrightHolderHistory para uma lista de coleções.A tarefa:
collection_acron_liste opcionalmente porjournal_issnsinstitution__isnull=FalseOnde a revisão poderia começar?
journal/tasks.pylinha 638 - função principal da tarefajournal/tests.pylinha 687 - testes cobrindo todos os cenários (filtros, migração, contadores)Como este poderia ser testado manualmente?
institutionforam setados para None e camposraw_*populadosOu executar os testes:
Algum cenário de contexto que queira dar?
Os métodos de migração já existem no modelo Journal desde a implementação de RawOrganizationMixin. Esta tarefa permite executar a migração em lote para múltiplas coleções via Celery, seguindo o padrão de outras tarefas similares no módulo (ex:
task_export_journals_to_articlemeta,task_replace_institution_by_raw_institution).A implementação usa
.iterator()para otimizar uso de memória ao processar grandes volumes de journals.Screenshots
N/A - tarefa backend sem interface gráfica
Quais são tickets relevantes?
Relacionado à issue sobre migração de dados Institution → RawOrganization
Referências
journal/models.pylinhas 1313-1375journal/tasks.pyOriginal prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.