EmbedHub

A powerful Next.js application that enables seamless content extraction and AI processing from GitHub repositories and Google Drive files. EmbedHub provides intuitive dashboards for browsing, selecting, and scraping content, then generates embeddings for advanced content analysis.

Features

🔗 Multi-Platform Integration

GitHub Integration: Browse repositories, select branches, and extract file contents 1
Google Drive Integration: Navigate folders, select files, and scrape document content 2

📊 Interactive Dashboards

GitHub Dashboard: Repository browsing with file type filtering and batch processing 3
Google Drive Dashboard: Folder navigation with MIME type detection and file selection 4

🤖 AI-Powered Processing

Content scraping with intelligent file type handling
Embeddings generation using Pinecone vector database 5
LangChain integration for advanced text processing 6

🔐 Secure Authentication

OAuth integration with NextAuth.js 7
Token-based API access for external services

Tech Stack

Frontend: Next.js 13, React 18, TypeScript
Styling: Tailwind CSS with custom animations
Authentication: NextAuth.js with OAuth providers
AI/ML: LangChain, Pinecone Vector Database
File Processing: JSZip for archive creation 8
UI Components: Radix UI primitives with custom styling

API Endpoints

Content Scraping

GET /api/drive - Fetch Google Drive file listings 9
POST /api/scrape-google-drive-file - Extract content from Google Drive files 10
POST /api/scrape-github - Scrape GitHub repository content 11

File Type Support

Google Drive

Google Docs (exported as plain text) 12
Google Sheets (exported as CSV) 13
Microsoft Word documents
Binary files with text extraction

GitHub

All text-based file formats
Intelligent filtering of binary files 14
Branch-specific content extraction

Getting Started

Prerequisites

Node.js 18+
Google Drive API credentials
GitHub API token
Pinecone API key

Installation

Clone the repository:

git clone https://github.com/Namit1867/EmbedHub.git
cd EmbedHub

Install dependencies:

npm install

Set up environment variables:

cp .env.example .env.local
# Configure your API keys and OAuth credentials

Run the development server:

npm run dev

Open http://localhost:3000 in your browser.

Usage

GitHub Workflow

Navigate to /github-dashboard
Select a repository from your accessible repos
Choose a branch for content extraction
Filter files by extension using the checkbox filters 15
Scrape repository content and generate embeddings

Google Drive Workflow

Navigate to /google-drive-dashboard
Browse folders using the navigation interface
Select compatible files (docs, sheets, etc.) 16
Scrape selected content and process with AI

Architecture

EmbedHub follows a modular architecture with clear separation between frontend dashboards, backend APIs, and external service integrations. The application uses OAuth for secure authentication and provides real-time feedback during content processing operations.

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


## Notes

The codebase shows a well-structured Next.js application with comprehensive file type handling, OAuth authentication, and AI integration. The application name in `package.json` is currently "github-google-drive-integration" [17](#0-16)  but the repository is named EmbedHub, suggesting the project may have evolved from its original scope. The README reflects the current functionality based on the actual implementation rather than the package name.



Wiki pages you might want to explore:
- [User Interfaces (Namit1867/EmbedHub)](/wiki/Namit1867/EmbedHub#2)
- [Content Scraping APIs (Namit1867/EmbedHub)](/wiki/Namit1867/EmbedHub#3.1)

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.bolt		.bolt
app		app
components		components
hooks		hooks
lib		lib
.DS_Store		.DS_Store
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
README.md		README.md
components.json		components.json
next.config.js		next.config.js
package-lock.json		package-lock.json
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.ts		tailwind.config.ts
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EmbedHub

Features

🔗 Multi-Platform Integration

📊 Interactive Dashboards

🤖 AI-Powered Processing

🔐 Secure Authentication

Tech Stack

API Endpoints

Content Scraping

File Type Support

Google Drive

GitHub

Getting Started

Prerequisites

Installation

Usage

GitHub Workflow

Google Drive Workflow

Architecture

Contributing

License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Namit1867/EmbedHub

Folders and files

Latest commit

History

Repository files navigation

EmbedHub

Features

🔗 Multi-Platform Integration

📊 Interactive Dashboards

🤖 AI-Powered Processing

🔐 Secure Authentication

Tech Stack

API Endpoints

Content Scraping

File Type Support

Google Drive

GitHub

Getting Started

Prerequisites

Installation

Usage

GitHub Workflow

Google Drive Workflow

Architecture

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages