Congressional Data Scrape: 98th through 117th Congresses
This project scrapes data from the annual Resumes of Congressional Activity, which are available in PDF format.
| Source | Description | Link |
| --- | --- | --- |
| US Senate Web Site | Resumes of Congressional Activity, Session Dates | US Senate Web Site |
Data was scraped using tabula, formatted in Excel using VBA, and tidied in JupyterLab using Python.
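As a rough illustration of the final tidying step, the sketch below reshapes a wide table (one column per chamber) into long form (one row per observation) using only the Python standard library. The column names, the `tidy` helper, and the sample values are hypothetical placeholders, not the project's actual schema or figures from the Resumes; the notebooks in this repository may take a different approach.

```python
import csv
import io

# Hypothetical wide-format export: one row per measure, one column per chamber.
# Column names and values are placeholders, not figures from the Resumes.
WIDE_CSV = """congress,session,measure,senate,house
100,1,Days in session,10,20
100,1,Measures passed,30,
"""


def tidy(wide_text, value_columns=("senate", "house")):
    """Reshape wide rows into long rows: one observation per chamber."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(wide_text)):
        for chamber in value_columns:
            # Strip whitespace and thousands separators left over from the PDF.
            raw = (row.get(chamber) or "").strip().replace(",", "")
            long_rows.append({
                "congress": int(row["congress"]),
                "session": int(row["session"]),
                "measure": row["measure"].strip(),
                "chamber": chamber,
                # Blank cells become None rather than 0, so gaps stay visible.
                "value": int(raw) if raw else None,
            })
    return long_rows


rows = tidy(WIDE_CSV)
for r in rows:
    print(r)
```

Keeping blank cells as `None` instead of zero is a deliberate choice in this sketch: it lets a later validation pass distinguish a value missing from the scrape from a genuine zero.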
| Tool / Library | Version |
| --- | --- |
| Adobe Acrobat Pro | 2024.001.20629 |
| JupyterLab | 4.1.2 |
| Microsoft Office 365, Excel | 2403 |
| Microsoft Visual Basic for Applications | 7.1 |
| Python | 3.12.2 |
| tabula | 1.2.1 |
| Name | Description |
| --- | --- |
| data | Folder containing original data files and scrubbed output |
| code | Folder containing Jupyter notebooks and VBA exports |
| documentation | Folder containing test results and data integrity issues |
| Data Scrape and Validation Presentation | PowerPoint recap of project and findings, saved as PDF |
| Asset | License / Use Policy |
| --- | --- |
| Original Code | MIT License |
| Congressional Activity | Federal Open Data Policy |