Company Intelligence Platform

A Streamlit-based platform for collecting, exploring, and analyzing company data from Yellow Pages, enriched with website content extraction and an AI-powered chat assistant.

Features

Scrape company listings from Yellow Pages Indonesia
Extract detailed company info and website content using NeuScraper
Save and manage datasets
Explore company data interactively
Chat with a Google Gemini-powered AI assistant about any company

Setup Instructions

1. Clone the Repository

git clone https://github.com/dejanazul/caprae_capital_interview_pre-work
cd caprae_capital_interview_pre-work

2. Install Python Dependencies

It is recommended to use a virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3. Set Up Environment Variables and Download Checkpoint

Create a .env file in the project root and add your Google Gemini API key:

GEMINI_API_KEY=your_gemini_api_key_here
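To illustrate how the key in .env might be read at startup, here is a minimal sketch using only the standard library. The helper names load_env_file and get_gemini_api_key are hypothetical and are not taken from app.py; the project may load the key differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_gemini_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear error makes a missing key easier to spot than a failed chat request later on.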

Download the NeuScraper checkpoint (requires Git LFS):

git lfs install
git clone https://huggingface.co/Vincero/neural_scrapper_fixed

4. Start the NeuScraper Service

1️⃣ Open the deployment directory

cd NeuScraper/app

2️⃣ Set the NeuScraper checkpoint path in the app module:

args.model_path = "path/to/your/model/fixed_training_state_checkpoint.tar"

3️⃣ Deploy NeuScraper

uvicorn app:app --reload --host 0.0.0.0 --port 1688
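Before starting the Streamlit app, it can help to confirm that the NeuScraper service is accepting connections on port 1688. The sketch below only tests TCP reachability and does not call any NeuScraper endpoint; the function name is hypothetical.

```python
import socket


def is_service_listening(host="127.0.0.1", port=1688, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_service_listening():
        print("NeuScraper service is reachable on port 1688")
    else:
        print("Nothing is listening on port 1688; check the uvicorn terminal")
```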

5. Run the Streamlit App

In a new terminal, from the project root:

streamlit run app.py
