Add BunnyCDN log format support for SciELO Brasil collection #54
@copilot Use the fixture available at https://github.com/pitangainnovare/scielo_usage_counter/blob/v0.6.1/tests/fixtures/usage.bunny.log for the solution and for part of the tests. Keep in mind that the repository https://github.com/pitangainnovare/scielo_usage_counter/blob/v0.6.1 implemented a solution for a legacy system, so its patterns and anything related to them can be reused if you find it appropriate.
- Updated requirements.txt to use scielo_log_validator@0.5.1
- Added PATTERN_BUNNYCDN_LOG_FORMAT to values.py for pipe-delimited logs
- Created opac_bunnynet.py translator (delegates to OPAC translator)
- Updated log_handler.py to detect and parse bunnynet format
- Added Unix timestamp date handling in format_date()
- Enhanced match_with_best_pattern() to detect pipe-delimited format
- Modified parse_line() to handle bunnynet-specific fields
- Created test fixture usage.bunnynet.log with sample logs
- Added comprehensive tests in test_opac_bunnynet.py and test_log_handler.py
- All new tests passing (6/6)

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
@copilot Look at this content formatted as bunnynet lines: MISS 200 1755473649 29321 4339610 185.29.10.0 - http://www.scielo.br/j/rbb/a/qvkmfPDpQk4zZfSnWXJHrVQ/?lang=pt SE Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0 5dc9f2b1416a10af31321b8aad30b8f4 SE
SciELO Brasil Apache logs now arrive in BunnyCDN format (pipe-delimited, Unix timestamps) for files dated after 2025-08-16. This PR adds parsing and translation support for the new format while maintaining backward compatibility with the existing formats.
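As a rough illustration of the Unix timestamp handling mentioned above, the sketch below converts a timestamp field to a timezone-aware UTC datetime. The helper name `unix_ts_to_utc` is hypothetical and is not the `format_date()` implementation from this PR; the timestamp value is the one from the sample line quoted in this conversation.

```python
from datetime import datetime, timezone


def unix_ts_to_utc(raw: str) -> datetime:
    """Convert a Unix-seconds field (given as text) to an aware UTC datetime.

    Hypothetical helper for illustration; the PR's format_date() may differ.
    """
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


if __name__ == "__main__":
    # Timestamp taken from the sample bunnynet line in the conversation.
    print(unix_ts_to_utc("1755473649").isoformat())
```

Passing `tz=timezone.utc` keeps the result independent of the server's local timezone, which matters when log files are grouped by date.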
Changes
Log Format Detection & Parsing
- Added PATTERN_BUNNYCDN_LOG_FORMAT to values.py - matches the 12-field pipe-delimited format: CACHE|STATUS|UNIX_TS|BYTES|ZONE|IP|REFERER|URL|GEO|UA|REQ_ID|GEO
- Updated log_handler.py with auto-detection (counts pipe delimiters) and Unix timestamp parsing
URL Translation
- Created BunnynetOPACBridge translator in translator/opac_bunnynet.py
- Delegates to URLTranslatorOPACSite since the URL structure is identical between formats
Testing
- Added tests/fixtures/usage.bunnynet.log with representative samples
- Added tests/translator/test_opac_bunnynet.py for translator validation
Dependencies
- Updated scielo_log_validator from 0.4.0 to 0.5.1
Example bunnynet log line:
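The detection and parsing described in this summary can be sketched roughly as follows. This is a minimal illustration, not the code from log_handler.py or the actual PATTERN_BUNNYCDN_LOG_FORMAT: the field names, the helper names, and the use of `str.split` instead of a regex are all assumptions. The sample line reuses the values quoted in the conversation, rejoined with pipes on the assumption that they follow the 12-field order listed in the summary.

```python
from typing import Dict, Optional

# Hypothetical field names, following the column order given in the summary.
# The second GEO column is renamed so that dictionary keys stay unique.
FIELDS = (
    "cache", "status", "unix_ts", "bytes", "zone", "ip",
    "referer", "url", "geo", "user_agent", "request_id", "geo_2",
)


def looks_like_bunnynet(line: str) -> bool:
    """Detect the format by counting delimiters: 12 fields means 11 pipes."""
    return line.count("|") == len(FIELDS) - 1


def parse_bunnynet_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields as a dict, or None if the line is not in this format."""
    line = line.rstrip("\n")
    if not looks_like_bunnynet(line):
        return None
    return dict(zip(FIELDS, line.split("|")))


# Assumed pipe-delimited form of the sample values quoted in the conversation.
SAMPLE = (
    "MISS|200|1755473649|29321|4339610|185.29.10.0|-|"
    "http://www.scielo.br/j/rbb/a/qvkmfPDpQk4zZfSnWXJHrVQ/?lang=pt|SE|"
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/111.0|5dc9f2b1416a10af31321b8aad30b8f4|SE"
)

if __name__ == "__main__":
    record = parse_bunnynet_line(SAMPLE)
    print(record["url"] if record else "not a bunnynet line")
```

Counting delimiters is cheap, so it can run before any regex is tried, and lines in the older space-delimited Apache format fall through to the existing patterns. The sketch assumes no field contains a literal pipe.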
Warning
Firewall rules blocked me from connecting to one or more addresses
I tried to connect to the following addresses, but was blocked by firewall rules:
- docs.bunny.net: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)
- https://api.github.com/repos/pitangainnovare/scielo_usage_counter/git/trees/impl/add-bunny-format: /usr/bin/curl curl -s REDACTED 53 -j ACCEPT (http block)
- https://api.github.com/repos/scieloorg/scielo_log_validator/contents/: /usr/bin/curl curl -s REDACTED (http block)
If you need me to access, download, or install something from one of these locations, you can either: