This script extracts invoice data from a PDF file and converts it to JSON. It is useful when you have multiple invoices in PDF format and need to organize them into a structured format for further analysis or processing.
Before running the script, ensure you have the following installed:
- Python (version 3 or above)
- pypdf library
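- re, Python's regular expression module (part of the standard library, so no separate installation is needed)
You can install pypdf and any other dependencies listed in requirements.txt using pip: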
pip install -r requirements.txt
Download the Script: Save the provided Python script (main.py) to your local machine.
Open your terminal or command prompt.
Navigate to the directory where the script main.py is saved.
Provide the path and file name of the PDF to extract data from, and the path and file name of the JSON file in which the extracted data will be organized (as tuples, which JSON stores as arrays).
Run the script using the following command:
python main.py
Review Output: After it runs, the script creates a file named data.json in the same directory. This JSON file contains the extracted invoice data in a structured format.
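To make the PDF-to-JSON flow described above more concrete, here is a minimal sketch of how such an extraction could work. It is not the actual contents of main.py: the field names, the regular expressions, and the helper functions (read_pdf_text, extract_fields, write_json) are assumptions chosen for illustration, and real invoices will almost certainly need different patterns. The only pypdf calls used are PdfReader, reader.pages, and page.extract_text().

```python
import json
import re
from pathlib import Path

# Hypothetical field patterns; adjust them to the layout of your invoices.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Za-z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def read_pdf_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    # Imported here so the rest of the sketch runs even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_fields(text):
    """Apply each pattern to the text; fields with no match are set to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def write_json(records, json_path):
    """Write a list of extracted records to a JSON file."""
    Path(json_path).write_text(json.dumps(records, indent=2), encoding="utf-8")


# Sample text standing in for the output of read_pdf_text().
sample_text = "Invoice No: INV-1001\nDate: 2024-03-15\nTotal: $1,250.00"
record = extract_fields(sample_text)
print(json.dumps(record, indent=2))
```

With a real file, the same pieces would be chained as write_json([extract_fields(read_pdf_text("invoice.pdf"))], "data.json"), where invoice.pdf is a placeholder name. Keeping the patterns in one dictionary means a new field can be added without touching the extraction logic.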