OCR web (Software Engineering Course Project)

This project is an OCR (Optical Character Recognition) web interface for images and PDFs. Its purpose is to study technologies such as Python, Django, Tesseract (OCR), and Continuous Integration.

How to install and Run

First, set up a virtual environment by following the steps below:

Open Windows PowerShell with Run as Administrator and enter the command Set-ExecutionPolicy RemoteSigned.
Install virtualenv with the command pip install virtualenv.
Go to the folder in which you want to create the folder for your website (call it 'VirtualEnv'; any name works).
Open Windows PowerShell in that folder and run the command virtualenv <folder_name> (here, <folder_name> is 'venv_folder').
Enter the newly created folder (venv_folder) by running the command cd <folder_name>.
Run the command ./Scripts/activate to activate the virtual environment.
Download the zip file of the code and extract it inside this folder (<folder_name>, here 'venv_folder').
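Once the steps above are done, a quick check that the environment is really active can save confusion later. The following is a minimal Python sketch; the helper name in_virtualenv is illustrative and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and the standard venv module make
    # sys.prefix differ from sys.base_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints False after running ./Scripts/activate, the activation step most likely did not take effect in the current shell.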

Then run the following commands:

pip install -r requirements.txt

python manage.py migrate

python manage.py runserver

You can then access the site at the local URL: localhost:8000
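To confirm from a script that the development server is answering, something like the sketch below could be used. It relies only on the Python standard library; the function name server_is_up and the default timeout are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:8000/", timeout: float = 2.0) -> bool:
    """Return True if any HTTP response comes back from url."""
    # An empty ProxyHandler bypasses system proxies, which should not be
    # involved when talking to a local development server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status such as 404 or 500.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```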

requirements.txt includes a package called pytesseract. It is a wrapper for communicating with the Tesseract library (C/C++ code). The next step is therefore to install Tesseract itself.

For this, please follow the instructions below for your OS:

If an additional language is required, download it from here and move it to $TESSERACT_PATH/tessdata/.
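As a rough illustration of how the pieces fit together, the sketch below lists the language files found in a tessdata folder and runs OCR on one image through pytesseract. The helper names installed_languages and ocr_image are hypothetical, the tessdata path is passed in by the caller, and this is not necessarily how the project's own code calls Tesseract.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """List the codes that have a .traineddata file in tessdata_dir."""
    # Tesseract stores each language as <code>.traineddata.
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def ocr_image(image_path, lang="eng"):
    """Return the text Tesseract recognises in an image file."""
    # Imported here so installed_languages() works without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

For example, ocr_image("scan.png", lang="por") would only work if "por" appears in the list returned by installed_languages for your tessdata folder; the file name scan.png is a placeholder.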

How to use

After running the commands above (up to python manage.py runserver), open the local URL (localhost:8000).
There is a file upload section in the centre of the screen. Upload a file by clicking the 'Browse' button or anywhere on the file input bar.
This opens a dialog box for selecting the file to upload from your local machine.
Select the file (accepted file extensions are .jpeg, .jpg, .png, and .pdf; a file with any other extension triggers an alert) and click 'Submit'.
You will be presented with two links for downloading the output: one for a .docx file and one for a .pdf file.
The website is ready to use again as soon as those links have been shown.
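The extension rule described above can be expressed in a few lines of Python. This is only a sketch of an equivalent server-side check; the project's own validation is not shown here, and the names ALLOWED_EXTENSIONS and is_allowed_upload are hypothetical.

```python
from pathlib import Path

# Extensions the upload form accepts, as listed in the usage steps.
ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".pdf"}


def is_allowed_upload(filename: str) -> bool:
    """Return True if filename ends with one of the accepted extensions."""
    # Compare in lower case so that "SCAN.PNG" is treated like "scan.png".
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

Checking only the final suffix means a name such as archive.tar.gz is rejected, while scan.PNG is accepted.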

Libraries Used

Django==4.0.4
django-widget-tweaks==1.4.12
Pillow==9.1.0
pytesseract==0.3.9
python-docx==0.8.11
fpdf==1.7.2
PyMuPDF==1.19.6
Bootstrap
jQuery
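To verify that the Python packages installed in the virtual environment match the versions pinned above, a small standard-library check along these lines may help. The function name version_mismatches is an assumption for illustration; Bootstrap and jQuery are front-end libraries and are not covered by it.

```python
from importlib import metadata

# Versions pinned in the list above.
PINNED = {
    "Django": "4.0.4",
    "django-widget-tweaks": "1.4.12",
    "Pillow": "9.1.0",
    "pytesseract": "0.3.9",
    "python-docx": "0.8.11",
    "fpdf": "1.7.2",
    "PyMuPDF": "1.19.6",
}


def version_mismatches(pinned):
    """Map each package that differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in version_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```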

Screenshots

Home Page

Image Zoom
