
Add SciELO Books translator for COUNTER R5 metrics#56

Draft
Copilot wants to merge 5 commits into main from copilot/add-scie-lo-livros-logs

Copilot AI commented Feb 6, 2026

Extends the URL translation pipeline to support SciELO Books platform logs when computing COUNTER R5 metrics. All methods, variables, and comments use English names.

Implementation

New translator class (translator/books.py)

  • URLTranslatorBooksSite handles book and chapter URL patterns
  • Generates PIDs in format BOOK:{id}/CHAPTER:{id} for hierarchical content
  • Uses DEFAULT_SCIELO_ISSN (0000-0000) since books lack ISSNs

Supported URL patterns:

/b/{book_id}                        → abstract (investigations only)
/c/{book_id}/{chapter_id}           → full_text (requests + investigations)
/pdf/{book_id}[/{chapter_id}]       → full_text
/epub/{book_id}                     → full_text
/download/{book_id}[/{chapter_id}]  → full_text

Legacy formats (/book/, /chapter/) also supported for backward compatibility.
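
The pattern table above can be sketched as a small regex classifier. This is only an illustration under stated assumptions: the pattern list, the `classify_path` helper, and the allowed ID characters are hypothetical and are not the PR's actual `REGEX_BOOKS_SITE_*` definitions; only the URL shapes, the content types, and the PID format come from the description.

```python
import re

# Assumption: IDs consist of letters, digits, underscores, and hyphens.
_ID = r"[A-Za-z0-9_-]+"

# Hypothetical pattern list mirroring the seven URL shapes described above
# (five current forms plus the legacy /book/ and /chapter/ forms).
BOOK_URL_PATTERNS = [
    (re.compile(rf"^/(?:b|book)/(?P<book>{_ID})/?$"), "abstract"),
    (re.compile(rf"^/(?:c|chapter)/(?P<book>{_ID})/(?P<chapter>{_ID})/?$"), "full_text"),
    (re.compile(rf"^/pdf/(?P<book>{_ID})(?:/(?P<chapter>{_ID}))?/?$"), "full_text"),
    (re.compile(rf"^/epub/(?P<book>{_ID})/?$"), "full_text"),
    (re.compile(rf"^/download/(?P<book>{_ID})(?:/(?P<chapter>{_ID}))?/?$"), "full_text"),
]


def classify_path(path):
    """Return the PID and content type for a URL path, or None if unmatched."""
    for regex, content_type in BOOK_URL_PATTERNS:
        match = regex.match(path)
        if match:
            groups = match.groupdict()
            pid = f"BOOK:{groups['book'].upper()}"
            # Chapter is optional (or absent) depending on the pattern.
            if groups.get("chapter"):
                pid += f"/CHAPTER:{groups['chapter'].upper()}"
            return {"pid_generic": pid, "content_type": content_type}
    return None
```

Keeping the content type next to each pattern means a new URL shape needs only one new table row, at the cost of evaluating patterns in order until one matches.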

Integration (url_translator.py)

  • Added PATTERNS_BOOKS_SITE with 7 regex patterns
  • Registered in translator identification pipeline alongside Classic, OPAC, Preprints, and Dataverse

Tests

  • 12 translator tests covering ID extraction, PID generation, format detection, content classification
  • 2 COUNTER R5 metrics tests for book landing pages and chapters
  • Sample fixture with 10 Apache log entries
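
For context on what a fixture entry has to yield, here is a minimal sketch of pulling the request path out of an Apache log line. It assumes the standard Apache Combined Log Format; the `LOG_LINE` regex, the `parse_line` helper, and the sample line are made up for illustration and are not taken from the PR's fixture or the repository's parser.

```python
import re

# Assumption: entries follow the Apache Combined Log Format:
# host ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the parsed fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Made-up sample entry (documentation-range IP), not an official fixture line.
sample = (
    '192.0.2.10 - - [06/Feb/2026:01:01:00 +0000] '
    '"GET /c/book001/chap01 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"'
)
fields = parse_line(sample)
```

The `path` field is what would then be handed to the URL translator.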

Example Usage

from scielo_usage_counter.url_translator import URLTranslationManager

tm = URLTranslationManager(journals_metadata, articles_metadata)
result = tm.translate('https://books.scielo.org/c/book001/chap01')

# Returns:
# {
#   'pid_generic': 'BOOK:BOOK001/CHAPTER:CHAP01',
#   'content_type': 'full_text',
#   'media_format': 'html',
#   'scielo_issn': '0000-0000',
#   ...
# }
Original prompt

This section details the original issue you should resolve.

<issue_title>Add support for logs originating from SciELO Livros</issue_title>
<issue_description>### Description of the new feature
The SciELO Livros site will have its accesses counted using the COUNTER R5 approach. This repository reads Apache log files (or bunnynet logs, which are covered by another issue in progress) and turns them into countable units. Part of the conversion strategy relies on translate classes, which currently support logs originating from SciELO Classic, the new SciELO (OPAC), the alpha version of the new SciELO (SciELO Saúde Pública), SciELO Preprints, and SciELO Data.

Acceptance criteria

  • Implement the translate class for SciELO Livros
  • Create tests for the new translate class
  • Adopt official fixtures (SciELO Livros logs) to be shared by user @pitangainnovare
  • Create SciELO Livros-specific tests for the COUNTER calculator, in counter.py

Attachments

  • The main module is the file url_translator.py
  • translate contains the translation classes for the other sites
  • Utility modules should be reused and are in utils

References

N/A</issue_description>

<agent_instructions>Follow coding best practices. Generate tests for all created methods using official fixtures. Add clear, concise English docstrings for every generated function or class. Keep code concise, non-repetitive, and well-organized. Match the style of the existing repository unless it conflicts with best practices.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


@pitangainnovare
Contributor

@copilot, I will provide fixtures in the format of SciELO Books logs. You will likely need to wait for these resources, but you can start working with the knowledge you already have. Crawling the SciELO Books site and understanding which links should be classified as Item Request or Item Investigation is a good task. The goal is to count accesses for books and book chapters according to COUNTER R5 (Unique Item Requests, Unique Item Investigations, Total Item Requests, and Total Item Investigations). Each URL in the logs should therefore be mapped to a specific Book or Chapter object and, depending on the page, marked as Request or Investigation. See https://cop5.countermetrics.org/en/5.0.3/03-specifications/03-counter-report-common-attributes-and-elements.html for more information.

@pitangainnovare
Contributor

@copilot Use English for method names, variable names, and comments.

Copilot AI and others added 3 commits February 6, 2026 01:01
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
…arity

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for logs from SciELO Livros" to "Add SciELO Livros (Books) translator for COUNTER R5 metrics" Feb 6, 2026
Copilot AI requested a review from pitangainnovare February 6, 2026 01:13
Author

Copilot AI commented Feb 6, 2026

@copilot Use English for method names, variable names, and comments.

Understood. I'll rename all Portuguese terms to English. This includes:

  • Class name: URLTranslatorLivrosSite → URLTranslatorBooksSite
  • Pattern names: REGEX_LIVROS_SITE_* → REGEX_BOOKS_SITE_* and PATTERNS_LIVROS_SITE → PATTERNS_BOOKS_SITE
  • File name: livros.py → books.py
  • Test class: TestTranslatorLivros → TestTranslatorBooks
  • Test file: test_livros.py → test_books.py

Will keep "SciELO Livros" only in comments where referring to the actual product name.

@copilot, I will provide fixtures in the format of SciELO Books logs. You will likely need to wait for these resources, but you can start working with the knowledge you already have. Crawling the SciE...

The implementation follows COUNTER R5 specifications for Item Requests and Item Investigations:

Item Investigations (abstract content):

  • Book landing pages: /b/{book_id}, /book/{book_id}

Item Requests (full-text content):

  • Book chapters: /c/{book_id}/{chapter_id}, /chapter/{book_id}/{chapter_id}
  • PDF downloads: /pdf/{book_id}[/{chapter_id}]
  • EPUB downloads: /epub/{book_id}
  • Generic downloads: /download/{book_id}[/{chapter_id}]

Each URL is mapped to a book or chapter PID (e.g., BOOK:BOOK001 or BOOK:BOOK001/CHAPTER:CHAP01). The translator computes all four metrics (Unique/Total Item Requests and Unique/Total Item Investigations) according to COUNTER R5 standards.
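
As a rough illustration of how the four metrics relate, the sketch below counts them from already-translated hits. It is a simplification under stated assumptions, not the repository's counter.py: the `Hit` tuple, the `compute_metrics` helper, and the session identifier are hypothetical, and COUNTER's double-click filtering and robot exclusion are left out. It relies on two COUNTER R5 rules: every request also counts as an investigation, and unique metrics count an item at most once per user session.

```python
from collections import namedtuple

# Hypothetical record for one translated log hit.
Hit = namedtuple("Hit", ["session_id", "pid", "content_type"])

METRIC_NAMES = (
    "total_item_investigations",
    "unique_item_investigations",
    "total_item_requests",
    "unique_item_requests",
)


def compute_metrics(hits):
    """Aggregate COUNTER R5 item metrics per PID (no double-click filtering)."""
    metrics = {}
    seen_investigations = set()
    seen_requests = set()
    for hit in hits:
        counts = metrics.setdefault(hit.pid, dict.fromkeys(METRIC_NAMES, 0))
        key = (hit.session_id, hit.pid)
        # Every hit on an item, abstract or full text, is an investigation.
        counts["total_item_investigations"] += 1
        if key not in seen_investigations:
            seen_investigations.add(key)
            counts["unique_item_investigations"] += 1
        # Only full-text hits are additionally counted as requests.
        if hit.content_type == "full_text":
            counts["total_item_requests"] += 1
            if key not in seen_requests:
                seen_requests.add(key)
                counts["unique_item_requests"] += 1
    return metrics


hits = [
    Hit("s1", "BOOK:BOOK001", "abstract"),
    Hit("s1", "BOOK:BOOK001/CHAPTER:CHAP01", "full_text"),
    Hit("s1", "BOOK:BOOK001/CHAPTER:CHAP01", "full_text"),
    Hit("s2", "BOOK:BOOK001/CHAPTER:CHAP01", "full_text"),
]
result = compute_metrics(hits)
```

In this example the landing page contributes only investigations, while the chapter accumulates three total and two unique counts for both requests and investigations.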

Ready to integrate official fixtures when available. Commit: 2fd6f0e

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Copilot AI changed the title from "Add SciELO Livros (Books) translator for COUNTER R5 metrics" to "Add SciELO Books translator for COUNTER R5 metrics" Feb 6, 2026
Development

Successfully merging this pull request may close these issues.

Add support for logs originating from SciELO Livros

2 participants