EmbedHub is a Next.js application that extracts content from GitHub repositories and Google Drive files and prepares it for AI processing. It provides dashboards for browsing, selecting, and scraping content, then generates embeddings for content analysis.
- GitHub Integration: Browse repositories, select branches, and extract file contents
- Google Drive Integration: Navigate folders, select files, and scrape document content
- GitHub Dashboard: Repository browsing with file type filtering and batch processing
- Google Drive Dashboard: Folder navigation with MIME type detection and file selection
- Content scraping with intelligent file type handling
- Embeddings generation using Pinecone vector database
- LangChain integration for advanced text processing
- OAuth integration with NextAuth.js
- Token-based API access for external services
- Frontend: Next.js 13, React 18, TypeScript
- Styling: Tailwind CSS with custom animations
- Authentication: NextAuth.js with OAuth providers
- AI/ML: LangChain, Pinecone Vector Database
- File Processing: JSZip for archive creation
- UI Components: Radix UI primitives with custom styling
- `GET /api/drive`: Fetch Google Drive file listings
- `POST /api/scrape-google-drive-file`: Extract content from Google Drive files
- `POST /api/scrape-github`: Scrape GitHub repository content
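As a rough illustration of how a client might call these routes, the sketch below builds the request for each one. Only the three route paths and their HTTP methods come from the list above; the payload field names (`owner`, `repo`, `branch`, `fileId`, `mimeType`) and the `prepareRequest` helper are hypothetical and may not match what the handlers actually expect.

```typescript
// Hypothetical payload shapes: these field names are assumptions,
// not taken from the EmbedHub source.
type ScrapeGithubPayload = { owner: string; repo: string; branch: string };
type ScrapeDrivePayload = { fileId: string; mimeType: string };

type Route = "/api/drive" | "/api/scrape-google-drive-file" | "/api/scrape-github";

interface PreparedRequest {
  url: string;
  init: { method: "GET" | "POST"; headers: Record<string, string>; body?: string };
}

// Build the URL and fetch options for one of the three documented routes.
function prepareRequest(
  baseUrl: string,
  route: Route,
  payload?: ScrapeGithubPayload | ScrapeDrivePayload,
): PreparedRequest {
  // Per the endpoint list, /api/drive is a GET and the scrape routes are POSTs.
  const method = route === "/api/drive" ? "GET" : "POST";
  const headers: Record<string, string> = { Accept: "application/json" };
  const init: PreparedRequest["init"] = { method, headers };
  if (method === "POST") {
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(payload ?? {});
  }
  // Strip a trailing slash so the joined URL has exactly one separator.
  return { url: baseUrl.replace(/\/$/, "") + route, init };
}

// Usage (needs a running dev server and an authenticated session;
// the payload values are illustrative only):
// const { url, init } = prepareRequest("http://localhost:3000", "/api/scrape-github",
//   { owner: "Namit1867", repo: "EmbedHub", branch: "<branch>" });
// const res = await fetch(url, init);
```

Keeping request construction separate from the `fetch` call makes the routes easy to exercise from scripts or tests without a live server.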
- Google Docs (exported as plain text)
- Google Sheets (exported as CSV)
- Microsoft Word documents
- Binary files with text extraction
- All text-based file formats
- Intelligent filtering of binary files
- Branch-specific content extraction
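The Google Drive handling listed above can be pictured as a lookup from a file's MIME type to an action. The following is a minimal sketch, not the project's actual code: the Google Workspace MIME type strings are the standard ones used by the Drive API, while `planDriveAction` and the `DriveAction` type are hypothetical names introduced for illustration.

```typescript
// Google Workspace MIME types mapped to the export formats named in the list above.
const EXPORT_FORMATS: Record<string, string> = {
  "application/vnd.google-apps.document": "text/plain", // Google Docs -> plain text
  "application/vnd.google-apps.spreadsheet": "text/csv", // Google Sheets -> CSV
};

const FOLDER_MIME = "application/vnd.google-apps.folder";

// Hypothetical result type: what a dashboard might do with a selected item.
type DriveAction =
  | { kind: "navigate" } // folders are opened, not scraped
  | { kind: "export"; exportMimeType: string } // native Google formats must be exported
  | { kind: "download" }; // everything else is fetched as stored

function planDriveAction(mimeType: string): DriveAction {
  if (mimeType === FOLDER_MIME) return { kind: "navigate" };
  const exportMimeType = EXPORT_FORMATS[mimeType];
  if (exportMimeType !== undefined) return { kind: "export", exportMimeType };
  return { kind: "download" };
}
```

Native Google formats have no downloadable bytes of their own, which is why they go through an export step, whereas uploaded files such as Word documents can be downloaded directly and parsed afterwards.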
- Node.js 18+
- Google Drive API credentials
- GitHub API token
- Pinecone API key
- Clone the repository: `git clone https://github.com/Namit1867/EmbedHub.git`, then `cd EmbedHub`
- Install dependencies: `npm install`
- Set up environment variables: `cp .env.example .env.local`, then configure your API keys and OAuth credentials in `.env.local`
- Run the development server: `npm run dev`
- Open http://localhost:3000 in your browser.
- Navigate to `/github-dashboard`
- Select a repository from your accessible repos
- Choose a branch for content extraction
- Filter files by extension using the checkbox filters
- Scrape repository content and generate embeddings
- Navigate to `/google-drive-dashboard`
- Browse folders using the navigation interface
- Select compatible files (docs, sheets, etc.)
- Scrape selected content and process with AI
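The extension filter step in the GitHub dashboard workflow above, combined with the binary-file filtering mentioned earlier, could look roughly like the sketch below. It is an illustration under assumptions: the `BINARY_EXTENSIONS` list is a hypothetical, non-exhaustive sample, and `filterFiles` and `extensionOf` are names made up for this example rather than functions from the project.

```typescript
// Hypothetical, non-exhaustive sample of extensions treated as binary.
const BINARY_EXTENSIONS = new Set(["png", "jpg", "jpeg", "gif", "pdf", "zip", "exe", "woff", "ico"]);

// Return the lowercase extension of a path, or "" for files without one
// (including dotfiles such as ".gitignore").
function extensionOf(path: string): string {
  const name = path.split("/").pop() ?? "";
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Keep non-binary files whose extension is among the selected ones.
// An empty selection means "no extension filter".
function filterFiles(paths: string[], selected: string[]): string[] {
  const wanted = new Set(selected.map((e) => e.toLowerCase().replace(/^\./, "")));
  return paths.filter((p) => {
    const ext = extensionOf(p);
    if (BINARY_EXTENSIONS.has(ext)) return false;
    return wanted.size === 0 || wanted.has(ext);
  });
}

// Example: only TypeScript and Markdown files survive the filter.
const kept = filterFiles(["src/app.ts", "logo.png", "README.md", ".gitignore"], [".ts", "md"]);
console.log(kept);
```

Filtering on the file path before fetching contents avoids downloading files that would be discarded anyway.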
EmbedHub follows a modular architecture with clear separation between frontend dashboards, backend APIs, and external service integrations. The application uses OAuth for secure authentication and provides real-time feedback during content processing operations.
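One step that commonly sits between scraping and embedding in a pipeline like this is splitting long text into overlapping chunks. The sketch below shows that idea in plain TypeScript. It is an assumption about how such a stage could work, not a description of EmbedHub's implementation, which may rely on LangChain's own text splitters; `chunkText` and its parameters are hypothetical.

```typescript
// Split text into fixed-size chunks where consecutive chunks share
// `overlap` characters, so content cut at a boundary appears in both.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Example: windows of 4 characters that overlap by 1.
console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk would then be passed to an embedding model and stored in the vector database along with metadata such as the source file path.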
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
## Notes
The codebase is a Next.js application with file type handling, OAuth authentication, and AI integration. The application name in `package.json` is currently "github-google-drive-integration", but the repository is named EmbedHub, suggesting the project may have evolved from its original scope. This README describes the current functionality based on the actual implementation rather than the package name.