PiLabs Medical Information Scraper

By Gwen Kiler and Ivan Neto

Follow the template in scrapeTemplate.py, replacing site-specific values as needed. The scraper needs one for loop for each level of pages it must iterate through. The template assumes a first-level index (A, B, C, ...) followed by a sub-index (Aa, Ab, Ac, and so on). For a site with a single index level, remove the subletter loop and rename the variables in the deeper loop accordingly.
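As a rough illustration of that loop structure, the sketch below builds the URL of every index page for a two-level alphabetized index. The `?letter=` query format is an assumption (it imitates the Mayo Clinic extension `drug-list?letter=A` used later), and `index_urls` is a hypothetical helper, not code from scrapeTemplate.py. Passing `two_level=False` corresponds to removing the subletter loop.

```python
from string import ascii_lowercase, ascii_uppercase


def index_urls(base_url: str, two_level: bool = True) -> list[str]:
    """Hypothetical helper: list the index pages of an alphabetized site.

    Assumed URL scheme: {base_url}?letter=A for the first-level index,
    {base_url}?letter=Aa for each sub-index page.
    """
    urls = []
    for letter in ascii_uppercase:  # first-level index: A, B, C, ...
        if not two_level:
            urls.append(f"{base_url}?letter={letter}")
            continue
        for subletter in ascii_lowercase:  # sub-index: Aa, Ab, Ac, ...
            urls.append(f"{base_url}?letter={letter}{subletter}")
    return urls


urls = index_urls("https://example.com/drug-list")
print(len(urls), urls[0], urls[-1])
```

For a two-level index this yields 26 * 26 = 676 URLs, from `...letter=Aa` through `...letter=Zz`; with `two_level=False` it yields the 26 single-letter pages.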

Note: This template only works if the given website has an alphabetized index.

How to modify

Step 1: Create your parser.

You can follow the "How to write parser" section to create your parser.

Step 2: Create a WebsiteThread class.

The program runs each website's scraper in its own thread: every website gets a class that inherits from threading.Thread and overrides its run() method.
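A minimal, self-contained sketch of that pattern follows. `ExampleThread` and its `site_name`/`results` attributes are hypothetical; the real classes in ClientThreads.py only override run() and call WebsiteClient there. A subclass that adds its own __init__ must call `super().__init__()` first, and should avoid reusing the attribute `name`, which Thread already defines as the thread's name.

```python
from threading import Thread


class ExampleThread(Thread):
    """Hypothetical stand-in for a WebsiteThread class."""

    def __init__(self, site_name: str, results: list):
        super().__init__()  # required when overriding Thread.__init__
        self.site_name = site_name
        self.results = results

    def run(self):
        # In ClientThreads.py, this is where WebsiteClient(...).run(...) is called.
        self.results.append(f"scraped {self.site_name}")


results = []
threads = [ExampleThread(site, results) for site in ("Mayoclinic", "drugs.com")]
for t in threads:
    t.start()  # start() calls run() in a new thread
for t in threads:
    t.join()   # wait until every thread has finished
print(sorted(results))  # ['scraped Mayoclinic', 'scraped drugs.com']
```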

Open ClientThreads/ClientThreads.py in the project directory. It should look like the following:

from threading import Thread

# Scrapers
from scrapers.DrugsComScraper import DrugsComScraper
from scrapers.MayoclinicScraper import MayoclinicScraper

# Clients
from clients.WebsiteClient import WebsiteClient

class MayoClinicThread(Thread):
    def run(self):
        print("[START] MayoClinicClient")
        mayoClinicClient = WebsiteClient(
            name="Mayoclinic",
            base_url="https://www.mayoclinic.org/",
            ext=["", "drugs-supplements", "drug-list?letter=A"],
            verbose=False
        ).run(MayoclinicScraper)
        print("[END] MayoClinicClient")

class DrugsComThread(Thread):
    def run(self):
        print("[START] DrugsComClient")
        drugsComClient = WebsiteClient(
            name="drugs.com",
            base_url="https://www.drugs.com",
            ext=["", "drug_information.html"],
            verbose=False
        ).run(DrugsComScraper)
        print("[END] DrugsComClient")

Then create your own WebsiteThread class. A placeholder example is already included in ClientThreads.py:

class WikiThread(Thread):
    def run(self):
        # Empty for now
        pass

Populate your class's run() method by calling WebsiteClient with your website's specific information. Depending on how you defined your parser's parse() function in "How to write parser", ext (the list of URL extensions) may contain nothing beyond the empty string. In every case, ext must start with an empty string element.
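The rule about ext can be checked before a client is built. The sketch below is a hypothetical helper, not part of WebsiteClient; it only encodes the two requirements stated above: ext is a list of strings, and its first element is the empty string.

```python
def check_ext(ext):
    """Hypothetical pre-flight check for WebsiteClient's ext argument."""
    if not isinstance(ext, list) or not all(isinstance(e, str) for e in ext):
        raise TypeError("ext must be a list of strings")
    if not ext or ext[0] != "":
        raise ValueError('ext must start with an empty string, e.g. [""]')
    return ext


check_ext([""])                                              # Wikipedia example: OK
check_ext(["", "drugs-supplements", "drug-list?letter=A"])   # Mayo Clinic example: OK
```

Calling `check_ext(["drug_information.html"])` raises ValueError because the leading empty string is missing.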

Your code will look like the following:

class WikiThread(Thread):
    def run(self):
        print("[START] WikiClient")
        wikiClient = WebsiteClient(
            name="Wikipedia",
            base_url="https://en.wikipedia.org/wiki/Medicine",
            ext=[""],  # example of an ext list with no extensions beyond ""
            verbose=False
        ).run(WikiScraper)
        print("[END] WikiClient")

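Notice WikiScraper: this is the class you defined in the "How to write parser" section. Import it at the top of ClientThreads.py alongside the other scrapers (for example, from scrapers.WikiScraper import WikiScraper if you saved it as scrapers/WikiScraper.py). Your parser is now almost ready to run.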
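While debugging a new thread class, it can be convenient to call run() directly instead of start(): run() then executes in the calling thread, so print() output appears in order and any exception propagates straight to the caller. With start(), run() executes in a new thread. The sketch below uses a hypothetical `WikiThreadStub` that only records which thread executed it, instead of calling WebsiteClient, so it runs without network access.

```python
import threading
from threading import Thread


class WikiThreadStub(Thread):
    """Hypothetical stand-in for WikiThread; records which thread ran run()."""

    def run(self):
        # current_thread() returns the Thread object of whichever thread is executing.
        self.ran_in = threading.current_thread()


# Calling run() directly: executes synchronously in the current thread.
direct = WikiThreadStub()
direct.run()
print(direct.ran_in is threading.current_thread())  # True

# Calling start(): run() executes in a new thread (the stub object itself).
threaded = WikiThreadStub()
threaded.start()
threaded.join()
print(threaded.ran_in is threaded)  # True
```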

Step 3: Add your class to the THREADS list in main.py.

Before:

from ClientThreads.ClientThreads import *

THREADS = [DrugsComThread]

def main():
    # Start every thread; they all run at the same time.
    for thread_cls in THREADS:
        thread_cls().start()

if __name__ == "__main__":
    main()

After:

from ClientThreads.ClientThreads import *

THREADS = [DrugsComThread, WikiThread]

def main():
    # Start every thread; they all run at the same time.
    for thread_cls in THREADS:
        thread_cls().start()

if __name__ == "__main__":
    main()
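main.py starts every thread and then returns; the interpreter still waits for these (non-daemon) threads before exiting, so the scrapers run to completion. If main() should do something once every scraper has finished, such as printing a summary, one common variant keeps the thread objects and calls join() on each. `FastThread` and `SlowThread` below are hypothetical placeholders for DrugsComThread and WikiThread.

```python
import time
from threading import Thread


class FastThread(Thread):  # hypothetical placeholder for DrugsComThread
    def run(self):
        time.sleep(0.01)


class SlowThread(Thread):  # hypothetical placeholder for WikiThread
    def run(self):
        time.sleep(0.05)


THREADS = [FastThread, SlowThread]


def main():
    # Start every thread first so they run concurrently...
    threads = [thread_cls() for thread_cls in THREADS]
    for t in threads:
        t.start()
    # ...then wait for all of them, so the line below runs only after
    # every scraper has finished.
    for t in threads:
        t.join()
    print("[DONE] all clients finished")
    return threads


if __name__ == "__main__":
    main()
```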

How to run

Writing the parser and wiring it into a thread are the complicated parts. To run the scraper, simply execute main.py from the project directory:

(venv) PS C:\documents\project> python main.py
