
dejanazul/caprae_capital_interview_pre-work


Company Intelligence Platform

A Streamlit-based platform for collecting, exploring, and analyzing company data from Yellow Pages, enriched with website content extraction and an AI-powered chat assistant.

Features

  • Scrape company listings from Yellow Pages Indonesia
  • Extract detailed company info and website content using NeuScraper
  • Save and manage datasets
  • Explore company data interactively
  • Chat with a Google Gemini-powered AI assistant about any company

Setup Instructions

1. Clone the Repository

git clone https://github.com/dejanazul/caprae_capital_interview_pre-work
cd caprae_capital_interview_pre-work

2. Install Python Dependencies

It is recommended to use a virtual environment:

python -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
pip install -r requirements.txt
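To confirm that pip is installing into the virtual environment rather than into the system interpreter, a quick check like the one below can help. This is a minimal stdlib sketch, not part of the project; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Hypothetical helper, not part of this project. Inside a venv created
    by `python -m venv`, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("Not in a virtual environment; activate venv before pip install.")
```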

3. Set Up Environment Variables and Download Checkpoint

Create a .env file in the project root directory and add your Google Gemini API key:

GEMINI_API_KEY=your_gemini_api_key_here
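How the app reads this key at startup is not documented here (it may use a package such as python-dotenv). The sketch below is a hypothetical stdlib-only loader, useful for checking that the .env file parses and that GEMINI_API_KEY becomes visible. `load_env_file` and `get_gemini_key` are illustrative names, not the project's API.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Hypothetical helper, not the project's own loader. Blank lines,
    '#' comments, and lines without '=' are skipped; surrounding quotes
    on values are stripped. Existing environment variables are kept.
    """
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        # setdefault: do not overwrite a variable already set in the shell
        os.environ.setdefault(key, value)
    return values


def get_gemini_key(path: str = ".env") -> Optional[str]:
    """Load the .env file and return GEMINI_API_KEY, or None if unset."""
    load_env_file(path)
    return os.environ.get("GEMINI_API_KEY")
```

Calling `get_gemini_key()` from the project root should return your key; a `None` result suggests the file is missing or the variable name is misspelled.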

Download the NeuScraper checkpoint from Hugging Face (Git LFS must be installed):

git lfs install
git clone https://huggingface.co/Vincero/neural_scrapper_fixed
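If Git LFS is not active when the repository is cloned, the checkpoint may arrive as a small text pointer file instead of the actual weights, and NeuScraper will then fail to load it. The hypothetical helpers below flag that case by looking for the standard Git LFS pointer header; the function names are assumptions, and the path you pass in should be wherever your checkpoint actually lives.

```python
from pathlib import Path

# First line of every Git LFS pointer file (LFS spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file looks like an un-fetched Git LFS pointer.

    Hypothetical helper, not part of this project.
    """
    p = Path(path)
    # Pointer files are tiny text stubs, so anything over 1 KB is real data.
    if p.stat().st_size > 1024:
        return False
    with p.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_checkpoint(path: str) -> str:
    """Describe whether the checkpoint at `path` looks usable."""
    p = Path(path)
    if not p.is_file():
        return f"missing: {p}"
    if is_lfs_pointer(str(p)):
        return f"LFS pointer only (run `git lfs pull` in {p.parent}): {p}"
    return f"ok ({p.stat().st_size} bytes): {p}"
```

For example, assuming the checkpoint sits at the top level of the default clone directory, `print(check_checkpoint("neural_scrapper_fixed/fixed_training_state_checkpoint.tar"))` reports whether the weights were actually fetched.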

4. Start the NeuScraper Service

1️⃣ Change to the deployment directory

cd NeuScraper/app

2️⃣ Set the NeuScraper checkpoint path in app.py

args.model_path = "path/to/your/model/fixed_training_state_checkpoint.tar"

3️⃣ Deploy NeuScraper

uvicorn app:app --reload --host 0.0.0.0 --port 1688
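Once uvicorn reports that it is running, you can confirm the service is accepting connections on port 1688 before starting the Streamlit app. The sketch below only checks that the TCP port is open; it does not call any NeuScraper endpoint, since those are not documented here, and `is_port_open` is a hypothetical helper.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 1688, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper, not part of this project. A True result only
    means something is listening; it does not prove the model loaded.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Because uvicorn is started with `--host 0.0.0.0`, connecting via `127.0.0.1` from the same machine should work.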

5. Run the Streamlit App

In a new terminal, activate the virtual environment and run from the project root:

streamlit run app.py
