AI-Powered Document Understanding and Processing Pipeline

Note that you need Docker installed to run the setup, as this is a containerized app.

This project uses the python:3.10-slim Docker image.

Installing and Running

First, clone the repository. Then read the setup instructions below.

Docker containers don't have a GUI, so we need to set up an X server on Windows and allow Docker to connect to it to display the GUI. For this I'm using VcXsrv Windows X Server. You can install it by running the command:

choco install vcxsrv

After that, open the docker-compose.yml file and set YOUR_IP to your local IP address, which can be found by typing ipconfig in cmd and reading the IPv4 Address entry.
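As an alternative to reading the ipconfig output by hand, the lookup can be sketched in Python. This is a minimal sketch, not part of the project: the function names `local_ipv4` and `display_value` are hypothetical, and it assumes that YOUR_IP ends up in an X11 display string of the form `IP:0.0`, which the docker-compose file may or may not use.

```python
import ipaddress
import socket


def local_ipv4(probe_host: str = "8.8.8.8", probe_port: int = 80) -> str:
    """Return the IPv4 address of the interface used for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, an offline machine): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def display_value(ip: str, display: int = 0, screen: int = 0) -> str:
    """Format an X11 display string (assumed form: IP:display.screen)."""
    ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return f"{ip}:{display}.{screen}"


if __name__ == "__main__":
    print(display_value(local_ipv4()))
```

On machines with several adapters (VPN, virtual switches), the address found this way may differ from the one you want, so it is worth comparing it against the ipconfig output.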

Then run XLaunch from the Start menu and disable access control to allow the Docker container to connect. You can follow this tutorial to help you set it up.

To run the app, type the command below in PowerShell or a terminal opened in the root directory of the repository.

docker-compose up --build

Put all the PDF files that you want to process in the folder named test.
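To check which files the app would be able to pick up, the folder can be scanned with a few lines of Python. This is only an illustrative sketch: `list_pdfs` is a hypothetical helper, and it assumes the folder is named test and sits in the directory you run the script from.

```python
from pathlib import Path


def list_pdfs(folder: str = "test") -> list[Path]:
    """Return the PDF files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        # Missing folder: nothing to process.
        return []
    # Match the extension case-insensitively so that .PDF is also found.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for pdf in list_pdfs():
        print(pdf.name)
```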

Result

After browsing to a PDF, you can get all of its text from the out_text file. By asking a question in the chat, you can get answers about specific details.

You can also see the accuracy in the output of the terminal where you started the Docker container, along with the word number and page number.
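How the app computes its accuracy figure is not described here. One plausible reading, shown below as a sketch under that assumption, is that it reports Tesseract's per-word confidence. The dictionary layout matches what `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns (keys such as `text`, `conf`, `word_num`); the helper names `word_confidences` and `mean_confidence` are hypothetical, and the page number is passed in by the caller because each page image is OCR'd separately.

```python
from statistics import mean


def word_confidences(data: dict, page: int) -> list[tuple[int, int, str, float]]:
    """Return (page, word_num, text, confidence) for each recognized word."""
    rows = []
    for text, conf, word_num in zip(data["text"], data["conf"], data["word_num"]):
        conf = float(conf)  # conf may arrive as str, int, or float
        # Tesseract marks non-word layout entries with a confidence of -1.
        if conf < 0 or not str(text).strip():
            continue
        rows.append((page, int(word_num), str(text).strip(), conf))
    return rows


def mean_confidence(rows: list[tuple[int, int, str, float]]) -> float:
    """Average confidence over the words, or 0.0 when there are none."""
    return mean(r[3] for r in rows) if rows else 0.0


if __name__ == "__main__":
    # Hand-written sample in the image_to_data dictionary layout.
    sample = {
        "text": ["", "Invoice", "total"],
        "conf": ["-1", "96", "88.5"],
        "word_num": [0, 1, 2],
    }
    rows = word_confidences(sample, page=1)
    for page, word_num, text, conf in rows:
        print(f"page {page} word {word_num}: {text!r} ({conf:.1f})")
    print(f"mean confidence: {mean_confidence(rows):.2f}")
```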

Libraries Used

tesseract-ocr
pytesseract
python3-tk (tkinter)
pdf2image
LayoutXLM
DocQuery
