This directory contains scripts for the data pipeline.

To install dependencies:

```bash
cd python
uv sync
```

When a new sitting is added to the Hansard, we need to:
- Ingest that sitting's transcript in our desired format using the Hansard API
- Generate summaries for the questions, bills, and motions in that sitting
- Update the MP summaries to reflect any new contributions from this sitting
For example, if the sitting on 27 February 2026 has just been added, we would run:

```bash
uv run batch_process_sqlite.py 27-02-2026
uv run generate_summaries_sqlite.py --sittings 27-02-2026
uv run generate_summaries_sqlite.py --members
```

These scripts are described in more detail below.
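The three steps have to run in order, since the sitting summaries and MP summaries both draw on the newly ingested sitting. A hypothetical helper for running them in one go might look like the following. `pipeline_commands` and `run_pipeline` are illustrative names, not part of this directory, and the sketch assumes it is run from the `python` directory with `uv` on the `PATH`.

```python
import subprocess
from datetime import datetime


def pipeline_commands(sitting_date: str) -> list[list[str]]:
    """Return the three commands to run for a newly added sitting.

    sitting_date uses the DD-MM-YYYY format expected by the scripts.
    """
    # Fail early on a malformed date rather than partway through the pipeline.
    datetime.strptime(sitting_date, "%d-%m-%Y")
    return [
        ["uv", "run", "batch_process_sqlite.py", sitting_date],
        ["uv", "run", "generate_summaries_sqlite.py", "--sittings", sitting_date],
        ["uv", "run", "generate_summaries_sqlite.py", "--members"],
    ]


def run_pipeline(sitting_date: str) -> None:
    """Run each step in order, stopping at the first command that fails."""
    for command in pipeline_commands(sitting_date):
        print("$", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed ingestion.
        subprocess.run(command, check=True)
```

For example, `run_pipeline("27-02-2026")` would execute the same three commands as the example above, one after another.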
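`batch_process_sqlite.py` ingests parliament sitting data for a given date range (inclusive of both the start and end dates) into the SQLite database at `data/parliament.db`. Dates use the DD-MM-YYYY format, and the end date is optional.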
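To make the inclusive range concrete: a range from 12-01-2026 to 14-01-2026 covers three dates. The sketch below shows one way to expand a start date and optional end date into a list of dates. It is an illustration rather than the script's actual code, the helper names `parse_sitting_date` and `dates_in_range` are hypothetical, and it only enumerates calendar dates without knowing which of them are sitting days.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def parse_sitting_date(text: str) -> date:
    """Parse a DD-MM-YYYY argument such as "14-01-2026"."""
    return datetime.strptime(text, "%d-%m-%Y").date()


def dates_in_range(start: str, end: Optional[str] = None) -> list[date]:
    """Return every calendar date from start to end, inclusive of both."""
    first = parse_sitting_date(start)
    # With no end date, the range is just the single start date.
    last = parse_sitting_date(end) if end is not None else first
    if last < first:
        raise ValueError(f"end date {end} is before start date {start}")
    span = (last - first).days
    # range(span + 1) so that the end date itself is included.
    return [first + timedelta(days=offset) for offset in range(span + 1)]
```

Rejecting an end date earlier than the start date is a choice made for this sketch; how the real script handles that case is not documented here.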
```bash
uv run batch_process_sqlite.py START_DATE [END_DATE]
```

For example:

```bash
# Single date
uv run batch_process_sqlite.py 14-01-2026
# Range of dates
uv run batch_process_sqlite.py 12-01-2026 14-01-2026
```

`generate_summaries_sqlite.py` generates AI summaries for sitting sections and MP profiles using Gemini. The `--only-blank` flag generates summaries only for entries that don't have one yet.
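One way to express the `--only-blank` selection is as a SQL filter on missing summaries. This sketch assumes a hypothetical `sections` table with `id`, `title` and `summary` columns and treats both `NULL` and whitespace-only summaries as blank; the real schema in `db_sqlite.py` may differ, and `rows_to_summarise` is an illustrative name.

```python
import sqlite3


def rows_to_summarise(conn: sqlite3.Connection, only_blank: bool) -> list[tuple[int, str]]:
    """Return (id, title) pairs for the sections that should be summarised.

    Assumes a hypothetical table: sections(id, title, summary).
    """
    query = "SELECT id, title FROM sections"
    if only_blank:
        # Treat NULL and whitespace-only summaries as missing.
        query += " WHERE summary IS NULL OR TRIM(summary) = ''"
    query += " ORDER BY id"
    return conn.execute(query).fetchall()
```

Without the flag every row is returned, so existing summaries would be regenerated; with it, rows that already have a summary are skipped.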
```bash
# For sittings
uv run generate_summaries_sqlite.py --sittings START_DATE [END_DATE] [--only-blank]
# For MPs
uv run generate_summaries_sqlite.py --members [--only-blank]
```

For example:

```bash
# Range of dates
uv run generate_summaries_sqlite.py --sittings 12-01-2026 14-01-2026
# MPs (based on their last 20 contributions)
uv run generate_summaries_sqlite.py --members
# Only fill in missing summaries
uv run generate_summaries_sqlite.py --sittings 12-01-2026 --only-blank
```

Other files in this directory:

| File | Description |
|---|---|
| `db_sqlite.py` | Database connection and CRUD operations for SQLite |
| `hansard_api.py` | Client for fetching data from the Hansard API |
| `parliament_sitting.py` | Parsing and structuring of sitting data |
| `prompts.py` | Prompt templates for AI summary generation |
| `util.py` | Shared utility functions |
Below are the manual changes that were made after ingestion.

Two ministries were renamed during the period covered by the data:
- Before 8 July 2024, the Ministry of Digital Development and Information was known as the Ministry for Communications and Information.
- Before 25 July 2020, the Ministry of Sustainability and the Environment was known as the Ministry for the Environment and Water Resources.
Sections dated before each renaming were re-categorised under the new ministry name after ingestion, using an ad hoc script.
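The re-categorisation can be sketched as one SQL `UPDATE` per renamed ministry. This is not the ad hoc script that was actually used: the `sections` table and its `ministry` and `sitting_date` columns are assumptions about the schema, with dates assumed to be stored as ISO `YYYY-MM-DD` strings so that they compare correctly as text. `MINISTRY_RENAMES` and `recategorise_ministries` are hypothetical names.

```python
import sqlite3

# (old name, new name, date the new name took effect), from the list above.
MINISTRY_RENAMES = [
    ("Ministry for Communications and Information",
     "Ministry of Digital Development and Information", "2024-07-08"),
    ("Ministry for the Environment and Water Resources",
     "Ministry of Sustainability and the Environment", "2020-07-25"),
]


def recategorise_ministries(conn: sqlite3.Connection) -> int:
    """Relabel pre-rename sections under the current ministry name.

    Assumes a hypothetical table: sections(id, sitting_date, ministry).
    Returns the total number of rows updated.
    """
    updated = 0
    with conn:  # commit on success, roll back if any statement fails
        for old, new, effective in MINISTRY_RENAMES:
            cursor = conn.execute(
                "UPDATE sections SET ministry = ? "
                "WHERE ministry = ? AND sitting_date < ?",
                (new, old, effective),
            )
            updated += cursor.rowcount
    return updated
```

Keeping the renames in a single list means a future renaming only needs one new entry.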
The remaining changes correct errors in the data returned by the Hansard API. Note that this list only covers the errors we have found so far, and there may well be others we are not aware of. Feel free to contact us or raise an issue if you find more that need correcting!
- In the sitting on 14 October 2025, Senior Minister of State for Finance Jeffrey Siow is mistakenly referred to by the Speaker of Parliament as "Second Minister for Defence" (an appointment that did not exist at the time). As a result, the ingestion script categorised the Finance (Income Taxes) Bill and the Corporate and Accounting Laws (Amendment) Bill under MINDEF, and they had to be manually re-categorised under MOF in the SQLite database.
- In the sitting on 3 March 2020, the section "Committee of Supply Reporting Progress" is mistakenly classified as a bill introduction (`sectionType: "BI"`) instead of an oral statement (`sectionType: "OS"`). This was fixed by manually updating the section type in the SQLite database.
- The following bills have inconsistent titles between readings in the Hansard API. For each, the correct name was taken from the Attorney-General's Chambers (AGC) website, and the relevant section in the database was manually updated.
| Bill | First reading | Second reading | Issue |
|---|---|---|---|
| Anti-Money Laundering and Other Matters Bill | 2 July 2024 | 6 August 2024 | First reading title: "Anti-money Laundering and Other Matters Bill" |
| Society of Saint Maur Incorporation (Amendment) Bill | 3 August 2023 | 6 February 2024 | Second reading title: "The Society of Saint Maur Incorporation (Amendment) Bill" |
| Post-appeal Applications in Capital Cases Bill | 7 November 2022 | 29 November 2022 | Second reading title: "Post-Appeal Applications in Capital Cases Bill" |
| Second Supplementary Supply (FY 2021) Bill | 28 February 2022 | 11 March 2022 | Second reading title: "Second Supplementary Supply (2021) Bill" (missing FY) |
| Economic Expansion Incentives (Relief from Income Tax) (Amendment) Bill | 10 January 2022 | 14 February 2022 | Second reading title: "Economic Expansion Incentives (Relief From Income Tax) (Amendment) Bill" |
| Statute Law Reform Bill | 3 November 2020 | 5 January 2021 | First reading title: "Statue Law Reform Bill" (statute misspelt as statue) |
| Housing and Development (Amendment) Bill | 3 September 2020 | 6 October 2020 | Second reading title: "Housing and Development Board (Amendment) Bill" |
| Supplementary Supply (FY 2019) Bill | 26 February 2020 | 6 March 2020 | Second reading title: "Supplementary Supply (FY2019) Bill" (no space between FY and 2019) |
| Goods and Services Tax (Amendment) Bill | 7 October 2019 | 4 November 2019 | First reading title: "Good and Services Tax (Amendment) Bill" (missing s in Goods) |
| Supply Bill | 26 February 2019 | 8 March 2019 | First reading title: "Supply BIll" (capitalised I in Bill) |
| Tobacco (Control of Advertisements and Sale) (Amendment) Bill | 14 January 2019 | 11 February 2019 | First reading title: "Tobacco (Control of Advertisements and Sale (Amendment) Bill" (missing closing bracket after Sale) |
| Supplementary Supply (FY 2016) Bill | 28 February 2017 | 9 March 2017 | Second reading title: "Supplementary Supply (FY2016) Bill" (no space between FY and 2016) |
| Income Tax (Amendment No. 3) Bill | 10 October 2016 | 10 November 2016 | First reading title: "Income Tax (Amendment) (No 3) Bill"; second reading title: "Income Tax (Amendment No 3) Bill" (missing period) |
| Income Tax (Amendment No. 2) Bill | 14 April 2016 | 9 May 2016 | First reading title: "Income Tax (Amendment No 2) Bill" (missing period) |
| Final Supply (FY 2015) Bill | 4 April 2016 | 14 April 2016 | Second reading title: "Final Supply (FY2015) Bill" (no space between FY and 2015) |
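Because each of these corrections is a one-off, an explicit lookup from the mistaken title to the AGC title is a natural way to record them. The sketch below covers only a few rows of the table above, and `BILL_TITLE_FIXES` and `correct_bill_title` are hypothetical names, not code from this directory.

```python
# Mistaken Hansard API title -> correct title from the AGC website.
# A subset of the table above, for illustration only.
BILL_TITLE_FIXES = {
    "Anti-money Laundering and Other Matters Bill":
        "Anti-Money Laundering and Other Matters Bill",
    "Post-Appeal Applications in Capital Cases Bill":
        "Post-appeal Applications in Capital Cases Bill",
    "Housing and Development Board (Amendment) Bill":
        "Housing and Development (Amendment) Bill",
    "Supply BIll": "Supply Bill",
    # Both readings of this bill were mis-titled, in different ways.
    "Income Tax (Amendment) (No 3) Bill": "Income Tax (Amendment No. 3) Bill",
    "Income Tax (Amendment No 3) Bill": "Income Tax (Amendment No. 3) Bill",
}


def correct_bill_title(title: str) -> str:
    """Return the corrected title, or the title unchanged if it has no known fix."""
    return BILL_TITLE_FIXES.get(title, title)
```

An exact-match dictionary is deliberately conservative. Case-insensitive or fuzzy matching would not tell you which capitalisation is correct (compare "Post-appeal" and "Post-Appeal"), and it would miss word-level errors such as the extra "Board" in the Housing and Development (Amendment) Bill.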