- The master code is implemented at https://citations.toolforge.org/, and is intended for public use.
- When needed, the development branch is intended for major restructuring and testing.
This is some basic documentation about what this bot is and how some of the parts connect.
This is more properly a bot-gadget-tool combination. The parts are:
- Citation Bot, found in index.html (web frontend) and process_page.php (information is POSTed to this and it does the citation expansion; backend). This automatically posts a new page revision with expanded citations and thus requires a bot account. All activity takes place on Tool Labs. Single pages can be GETed.
- Citation expander (https://en.wikipedia.org/wiki/MediaWiki:Gadget-citations.js) + gadgetapi.php. This comprises an Ajax front-end in the on-wiki gadget and a PHP backend API.
- generate_template.php creates the wiki reference given an identifier (for example: https://citations.toolforge.org/generate_template.php?doi=10.1109/SCAM.2013.6648183)
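
As a rough illustration of the generate_template.php request shown above, the sketch below builds the URL for a given DOI. The helper name template_url is hypothetical, and only the doi parameter from the example is used; the DOI is passed through without URL encoding, which matches the example but may not be right for every identifier.

```shell
#!/bin/sh
# Base URL taken from the example in this README.
BASE_URL="https://citations.toolforge.org/generate_template.php"

# template_url is a hypothetical helper: it prints the request URL for one DOI.
# The DOI is inserted as-is (no URL encoding), matching the README example.
template_url() {
    printf '%s?doi=%s\n' "$BASE_URL" "$1"
}

template_url "10.1109/SCAM.2013.6648183"
# prints https://citations.toolforge.org/generate_template.php?doi=10.1109/SCAM.2013.6648183

# To fetch the generated reference (needs network access):
# curl -s "$(template_url "10.1109/SCAM.2013.6648183")"
```
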
Bugs and requested changes are listed here: https://en.wikipedia.org/wiki/User_talk:Citation_bot.
The Citation Bot has two main user-facing interfaces with different performance characteristics:
Web interface (index.html and process_page.php):
- Default mode: Thorough mode (slow mode enabled via checkbox, checked by default)
- Slow mode operations: Searches for new bibcodes and expands URLs via external APIs
- Use case: Users who want comprehensive citation expansion and can wait longer
- Timeout limit: Typically completes for all pages, even if the web interface times out

Citation expander gadget (gadgetapi.php):
- Default mode: Fast mode only (slow mode is always disabled)
- Operations performed:
  - ✓ Expands PMIDs, DOIs, arXiv, JSTOR IDs to full citations
  - ✓ Adds missing citation parameters (authors, title, journal, date, pages, etc.)
  - ✓ Cleans up citation formatting and fixes template types
- Operations skipped:
  - ✗ Searching for new bibcodes
  - ✗ Expanding URLs via Zotero
- Why fast mode only: The gadget is designed for quick, in-browser citation expansion. Slow mode operations (bibcode searches and URL expansions) can exceed the web browser's connection timeout limit, causing the gadget to fail.
- Use case: Quick citation cleanup and expansion while editing Wikipedia articles

Note: Both interfaces perform core citation expansion effectively. The gadget sacrifices some thoroughness for speed and reliability to provide a better in-browser editing experience.
Basic structure of a Citation bot script:
- the src/env.php that defines configuration constants (you can create it from src/env.php.example)
- the src/includes/setup.php that sets up the functions needed (usually, you don't need to modify this file)
- the Page functions to fetch/expand/post the page's text
A quick tour of the main files:
Entry points (under src/):
- src/index.html: web frontend
- src/process_page.php: backend; POSTed page information triggers citation expansion
- src/gadgetapi.php: PHP backend API for the on-wiki Citation Expander gadget
- src/generate_template.php: creates a wiki reference given an identifier
Includes (under src/includes/):
- src/includes/constants.php: constants defined; further constants are split into files under src/includes/constants/
- src/includes/WikipediaBot.php: functions to facilitate HTTP access to the Wikipedia API.
- src/includes/NameTools.php: defines name functions
- src/includes/MathTools.php: converts MathML notation to LaTeX for Wikipedia citations
- src/includes/setup.php: sets up needed functions, requires most of the other files listed here
- src/includes/miscTools.php: a variety of functions
- src/includes/URLtools.php: normalize urls and extract information from urls
- src/includes/TextTools.php: string manipulation functions including converting to wiki
- src/includes/WebTools.php: things unique to the web interface
- src/includes/bot_curl.php: curl wrapper with bot-appropriate defaults and timeouts
- src/includes/user_messages.php: functions for reporting bot activity to users
- src/includes/doiTools.php: DOI-specific validation and normalization functions
- src/includes/big_jobs.php: handling for large batch jobs
- src/includes/api/API*.php: sets up needed functions for expanding pmid/doi/url/etc
- src/includes/Page.php: Represents an individual page to expand citations on. Key methods are Page::get_text_from(), Page::expand_text(), and Page::write().
- src/includes/Template.php: most of the actual expansion happens here. Template::add_if_new() is generally (but not always) used to add parameters to the updated template; Template::tidy() cleans up the template, but may add parameters as well and have side effects.
- src/includes/WikiThings.php: Handles comments, nowiki, etc. tags
- src/includes/Parameter.php: contains information about template parameter names, values, and metadata, and methods to parse template parameters.
- Constants and definitions should be provided in constants.php.
- A good balance between splitting functionality into single files and avoiding too many files should be maintained.
- The code is generally NOT written densely.
- Beware assignments in conditionals, one-line if/foreach/else statements, and action taking place through method calls that take place in assignments or equality checks.
- Also beware the difference between else if and elseif.
The bot requires PHP >= 8.4.
To run the bot from a new environment, you will need to create an src/env.php file (if one doesn't already exist) that sets the needed authentication tokens as environment variables. To do this, you can rename src/env.php.example to src/env.php, set the variables in the file, and then make sure the file is not world readable or writable:
chmod go-rwx env.php
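
The setup step above can be sketched as a small shell function. The name setup_env is hypothetical, and copying the example file (rather than renaming it) is a choice made here so that the example stays in place; the tokens still have to be filled in by hand afterwards.

```shell
#!/bin/sh
# setup_env is a hypothetical helper. Its single argument is the path to the
# repository's src directory.
setup_env() {
    src_dir="$1"
    # Create env.php from the example only if it does not exist yet.
    # Copying instead of renaming is an assumption, not the README's wording.
    [ -f "$src_dir/env.php" ] || cp "$src_dir/env.php.example" "$src_dir/env.php"
    # Remove every permission for group and others; owner bits are unchanged.
    chmod go-rwx "$src_dir/env.php"
}

# Example, run from the repository root:
#   setup_env ./src
```

After running it, edit src/env.php and set the variables before starting the bot.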
To run the bot as a webservice from WM Toolforge:
become citations[-dev]
webservice stop
webservice --backend=kubernetes php8.4 start
Or for testing in the shell:
webservice --backend=kubernetes php8.4 shell
Before entering the k8s shell, it may be necessary to install phpunit (as wget is not available in the k8s shell).
In order to run on the command line, one needs OAuth tokens as documented in src/env.php.example (additional API keys are needed to run some functions). Change BOT_USER_AGENT in src/includes/setup.php to something else. Install the OAuth client with Composer: composer require mediawiki/oauthclient:2.3.0. The bot can then be run like this:
/usr/bin/php ./src/process_page.php "Covid Watch|Water|COVID-19_apps" --slow --savetofiles
The command line tool will also accept page_list.txt and page_list2.txt as page names. In those cases the bot expects a file of that name containing a single line of |-separated page names.
This code requires PHP 8.4 with these optional packages included: php84-mbstring php84-sockets php84-opcache php84-openssl php84-xmlrpc php84-gettext php84-curl php84-intl php84-iconv
Command line parameters:
- --slow - retrieve bibcodes and expand urls
- --savetofiles - save processed pages as files (with .md extension) instead of submitting them to Wikipedia
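
A page list file in the format described above (a single line of | separated page names) could be produced with a sketch like the following. The helper name write_page_list is hypothetical, and the trailing newline it writes is an assumption about what the bot tolerates.

```shell
#!/bin/sh
# write_page_list is a hypothetical helper: the first argument is the output
# file, the remaining arguments are page names.
write_page_list() {
    out_file="$1"
    shift
    # With IFS set to "|", "$*" joins all remaining arguments with "|".
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS='|'; printf '%s\n' "$*" ) > "$out_file"
}

# Example:
#   write_page_list page_list.txt "Covid Watch" "Water" "COVID-19_apps"
#   /usr/bin/php ./src/process_page.php page_list.txt --savetofiles
```

Running the bot with --savetofiles, as in the commented example, keeps a first trial from submitting anything to Wikipedia. The example assumes the bot looks for page_list.txt in the current working directory.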
One way to set up a localhost that runs in your web browser is to use Docker. Install Docker Desktop on your computer, open a shell, cd to the root directory of this repo, type docker compose up -d, then visit http://localhost:8081.
To install Composer dependencies, start the container as noted above, then type docker exec -it citation-bot-php-1 composer update.
To do most bot tasks, you'll need to create a src/env.php file and populate it with API keys. See src/env.php.example for the expected variables.
If the Citation Bot is currently blocked (i.e. Citation_bot is not a valid user on the target wiki), it will normally halt and display an error message. For developers who need to test or debug the bot's behaviour during a block without writing to Wikipedia, the ignore_block URL parameter can be passed in the request.
When ignore_block is present, the bot displays the warning "Running bot anyway, but it will fail to write." and continues processing. This is useful for inspecting what the bot would do without risking any edits to Wikipedia.
Example URL:
https://citations.toolforge.org/process_page.php?page=Example&ignore_block=1
Separately, even when the bot is blocked, a user can run it on their own User: pages; in that case the bot edits as the user.
Note: With ignore_block, all citation expansion runs normally, but the bot will fail when it attempts to write the results back to Wikipedia. Use this only for debugging purposes.
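
For repeated debugging runs, the example URL can be assembled with a small helper. This is only a sketch: debug_url is a hypothetical name, and the only encoding it does is replacing spaces with underscores (which MediaWiki treats as equivalent in titles). Titles containing characters such as & or ? would need proper URL encoding.

```shell
#!/bin/sh
# debug_url is a hypothetical helper: it prints a process_page.php URL with
# ignore_block set, following the example URL in this README.
debug_url() {
    # Replace spaces with underscores; no other encoding is attempted.
    page=$(printf '%s' "$1" | tr ' ' '_')
    printf 'https://citations.toolforge.org/process_page.php?page=%s&ignore_block=1\n' "$page"
}

debug_url "Example"
# prints https://citations.toolforge.org/process_page.php?page=Example&ignore_block=1
```
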
Where issues require consensus on Wikipedia policy, they are discussed on the Citation Bot Talk Page. Most other issues should also be discussed there. The issues on GitHub are primarily for the developers' internal use.
