Skip to content

Adiciona tarefa para migrar dados de Institution para RawOrganization por coleção#1284

Merged
robertatakenaka merged 6 commits intomainfrom
copilot/migrate-data-institution-to-raworganization
Feb 7, 2026
Merged

Adiciona tarefa para migrar dados de Institution para RawOrganization por coleção#1284
robertatakenaka merged 6 commits intomainfrom
copilot/migrate-data-institution-to-raworganization

Conversation

Copy link
Contributor

Copilot AI commented Feb 6, 2026

O que esse PR faz?

Implementa task_migrate_institution_history_to_raw_institution que executa migração de dados de Institution para RawOrganization em PublisherHistory, OwnerHistory, SponsorHistory e CopyrightHolderHistory para uma lista de coleções.

A tarefa:

  • Filtra journals por collection_acron_list e opcionalmente por journal_issns
  • Processa apenas registros com institution__isnull=False
  • Chama os métodos de migração existentes do modelo Journal
  • Retorna estatísticas detalhadas (total_journals, migrated_publishers, migrated_owners, migrated_sponsors, migrated_copyright_holders, error_count)
# Exemplo de uso
task_migrate_institution_history_to_raw_institution.delay(
    username="user",
    collection_acron_list=["scl", "mex"],
    journal_issns=["1234-5678"]  # opcional
)

Onde a revisão poderia começar?

journal/tasks.py linha 638 - função principal da tarefa
journal/tests.py linha 687 - testes cobrindo todos os cenários (filtros, migração, contadores)

Como este poderia ser testado manualmente?

  1. Executar via Django shell ou Celery worker:
from journal.tasks import task_migrate_institution_history_to_raw_institution
result = task_migrate_institution_history_to_raw_institution.delay(
    username="admin",
    collection_acron_list=["scl"]
)
  1. Verificar logs para acompanhar progresso
  2. Conferir que campos institution foram setados para None e campos raw_* populados

Ou executar os testes:

pytest journal/tests.py::TaskMigrateInstitutionHistoryTestCase -v

Algum cenário de contexto que queira dar?

Os métodos de migração já existem no modelo Journal desde a implementação de RawOrganizationMixin. Esta tarefa permite executar a migração em lote para múltiplas coleções via Celery, seguindo o padrão de outras tarefas similares no módulo (ex: task_export_journals_to_articlemeta, task_replace_institution_by_raw_institution).

A implementação usa .iterator() para otimizar uso de memória ao processar grandes volumes de journals.

Screenshots

N/A - tarefa backend sem interface gráfica

Quais são tickets relevantes?

Relacionado à issue sobre migração de dados Institution → RawOrganization

Referências

  • Métodos de migração em journal/models.py linhas 1313-1375
  • Padrão de tarefas existentes em journal/tasks.py
Original prompt

This section details on the original issue you should resolve

<issue_title>Crie uma tarefa que execute para uma dada lista de coleções a migração de dados de Institution para RawOrganization</issue_title>
<issue_description>### Descrição da tarefa
Crie uma tarefa que execute para uma dada lista de coleções a migração de dados de Institution para RawOrganization

Subtarefas

  • Observe journal/models, o modelo Journal contém métodos migrate_publisher_history_to_raw, migrate_owner_history_to_raw, migrate_sponsor_history_to_raw, and migrate_copyright_holder_history_to_raw
  • Crie uma tarefa em journal/tasks que se chame task_migrate_institution_history_to_raw_institution que tenha como parametros username, user_id, collection_acron_list, journal_issns (opcional), siga o padrão das demais tarefas, e que esta tarefa a partir dos filtros collection_acron_list, journal_issns, e também condicionado à institution_history correspondente se o institution é diferente de None, por exemplo, publisher_history__institution__isnull=False, faça a chamada aos métodos migrate_publisher_history_to_raw, migrate_owner_history_to_raw, migrate_sponsor_history_to_raw, and migrate_copyright_holder_history_to_raw

Considerações e notas

  • A implementação destas mudanças implica em aumentar o consumo de disco..
    </issue_description>

Comments on the Issue (you are @copilot in this section)


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 5 commits February 6, 2026 11:52
…sks.py

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title [WIP] Create task to migrate data from Institution to RawOrganization Adiciona tarefa para migrar dados de Institution para RawOrganization por coleção Feb 6, 2026
Copilot AI requested a review from robertatakenaka February 6, 2026 11:59
@robertatakenaka robertatakenaka marked this pull request as ready for review February 7, 2026 17:37
Copilot AI review requested due to automatic review settings February 7, 2026 17:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adiciona uma nova task Celery para executar, em lote e por lista de coleções (e opcionalmente por ISSN), a migração de dados de Institution para os campos raw_* (RawOrganization) nos históricos (PublisherHistory, OwnerHistory, SponsorHistory, CopyrightHolderHistory) associados aos journals.

Changes:

  • Cria a task task_migrate_institution_history_to_raw_institution com filtros por coleção e ISSN e retorno de estatísticas.
  • Implementa contadores por tipo de histórico e registra erros via UnexpectedEvent.
  • Adiciona uma suíte de testes cobrindo migração, filtros e cenário sem instituição.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
journal/tasks.py Nova task Celery para migração em lote de institutionraw_* em históricos, com filtros e estatísticas.
journal/tests.py Novos testes unitários para validar migração, filtros por coleção/ISSN e contadores.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +791 to +797
# Create mock self with request attribute for Celery task
mock_self = MagicMock()
mock_self.request = MagicMock()

# Run the task
result = task_migrate_institution_history_to_raw_institution(
mock_self,
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests call the Celery bound task with an explicit positional self (mock_self). For bind=True tasks, Celery supplies the task instance automatically when the task is called; passing a positional argument here will be treated as username and then conflicts with username="testuser" (TypeError: multiple values for argument 'username'). Call the task without the extra positional argument (or use .run(...) / .apply(kwargs=...)) and avoid mocking request in a way that makes _get_user attempt User.objects.get(pk=<MagicMock>).

Suggested change
# Create mock self with request attribute for Celery task
mock_self = MagicMock()
mock_self.request = MagicMock()
# Run the task
result = task_migrate_institution_history_to_raw_institution(
mock_self,
# Run the task
result = task_migrate_institution_history_to_raw_institution.run(

Copilot uses AI. Check for mistakes.
Comment on lines +856 to +865
# Create mock self with request attribute for Celery task
mock_self = MagicMock()
mock_self.request = MagicMock()

# Run task only for "scl" collection
result = task_migrate_institution_history_to_raw_institution(
mock_self,
username="testuser",
collection_acron_list=["scl"],
)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: this bound Celery task should not receive an explicit positional self argument in the test. Passing mock_self will be interpreted as username and will conflict with the username= kwarg, causing the test to error before exercising the task logic.

Copilot uses AI. Check for mistakes.
Comment on lines +904 to +914
# Create mock self with request attribute for Celery task
mock_self = MagicMock()
mock_self.request = MagicMock()

# Run task only for specific ISSN
result = task_migrate_institution_history_to_raw_institution(
mock_self,
username="testuser",
collection_acron_list=["scl"],
journal_issns=["1234-5678"],
)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: remove the positional mock_self argument when invoking the bound Celery task (or call .run / .apply). As written, this invocation will raise due to multiple values for username and won't test the ISSN filtering behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +935 to +944
# Create mock self with request attribute for Celery task
mock_self = MagicMock()
mock_self.request = MagicMock()

# Run the task
result = task_migrate_institution_history_to_raw_institution(
mock_self,
username="testuser",
collection_acron_list=["scl"],
)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: invoking a bind=True Celery task in tests should not pass an explicit positional self. This currently prevents the test from running and can also make _get_user try to query with a MagicMock pk depending on how request is mocked.

Copilot uses AI. Check for mistakes.
Comment on lines +696 to +727
if journal.publisher_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_publisher_history_to_raw()
migrated_publishers += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} publisher history records "
f"for journal {journal.id}"
)

# Check and migrate owner_history records with institution != None
if journal.owner_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_owner_history_to_raw()
migrated_owners += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} owner history records "
f"for journal {journal.id}"
)

# Check and migrate sponsor_history records with institution != None
if journal.sponsor_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_sponsor_history_to_raw()
migrated_sponsors += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} sponsor history records "
f"for journal {journal.id}"
)

# Check and migrate copyright_holder_history records with institution != None
if journal.copyright_holder_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_copyright_holder_history_to_raw()
migrated_copyright_holders += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} copyright holder history records "
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counters here use len(migrated_items), but Journal.migrate_*_history_to_raw() appends all history items (including those where institution is already null and therefore nothing is migrated). This will inflate the "migrated_*" stats whenever a journal has a mix of null/non-null institutions. Prefer counting the queryset that actually needs migration (e.g., publisher_history.filter(institution__isnull=False).count()) and increment by that value (and you can use that count to decide whether to call the migration method, avoiding the extra .exists() query too).

Suggested change
if journal.publisher_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_publisher_history_to_raw()
migrated_publishers += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} publisher history records "
f"for journal {journal.id}"
)
# Check and migrate owner_history records with institution != None
if journal.owner_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_owner_history_to_raw()
migrated_owners += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} owner history records "
f"for journal {journal.id}"
)
# Check and migrate sponsor_history records with institution != None
if journal.sponsor_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_sponsor_history_to_raw()
migrated_sponsors += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} sponsor history records "
f"for journal {journal.id}"
)
# Check and migrate copyright_holder_history records with institution != None
if journal.copyright_holder_history.filter(institution__isnull=False).exists():
migrated_items = journal.migrate_copyright_holder_history_to_raw()
migrated_copyright_holders += len(migrated_items)
logger.info(
f"Migrated {len(migrated_items)} copyright holder history records "
publisher_to_migrate = journal.publisher_history.filter(
institution__isnull=False
).count()
if publisher_to_migrate:
journal.migrate_publisher_history_to_raw()
migrated_publishers += publisher_to_migrate
logger.info(
f"Migrated {publisher_to_migrate} publisher history records "
f"for journal {journal.id}"
)
# Check and migrate owner_history records with institution != None
owner_to_migrate = journal.owner_history.filter(
institution__isnull=False
).count()
if owner_to_migrate:
journal.migrate_owner_history_to_raw()
migrated_owners += owner_to_migrate
logger.info(
f"Migrated {owner_to_migrate} owner history records "
f"for journal {journal.id}"
)
# Check and migrate sponsor_history records with institution != None
sponsor_to_migrate = journal.sponsor_history.filter(
institution__isnull=False
).count()
if sponsor_to_migrate:
journal.migrate_sponsor_history_to_raw()
migrated_sponsors += sponsor_to_migrate
logger.info(
f"Migrated {sponsor_to_migrate} sponsor history records "
f"for journal {journal.id}"
)
# Check and migrate copyright_holder_history records with institution != None
copyright_holder_to_migrate = journal.copyright_holder_history.filter(
institution__isnull=False
).count()
if copyright_holder_to_migrate:
journal.migrate_copyright_holder_history_to_raw()
migrated_copyright_holders += copyright_holder_to_migrate
logger.info(
f"Migrated {copyright_holder_to_migrate} copyright holder history records "

Copilot uses AI. Check for mistakes.
- migrated_copyright_holders: Total copyright holder history records migrated
- error_count: Number of errors encountered
"""
user = _get_user(self.request, username=username, user_id=user_id)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable user is not used.

Suggested change
user = _get_user(self.request, username=username, user_id=user_id)
_get_user(self.request, username=username, user_id=user_id)

Copilot uses AI. Check for mistakes.
@robertatakenaka robertatakenaka merged commit ac00698 into main Feb 7, 2026
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crie uma tarefa que execute para uma dada lista de coleções a migração de dados de Institution para RawOrganization

2 participants