
Nexus Data Mining Logo


Star ⭐ this Repo to show your support!


🚀 Nexus-Data-Mining-Web-Scraping-CSharp-Engine

Nexus-Data-Mining-Web-Scraping-CSharp-Engine is a robust, high-performance C# engine, now retired, engineered for large-scale, asynchronous web scraping and contact data extraction. It exemplifies advanced .NET architectural patterns, including CQRS, and sophisticated proxy management for high-throughput, resilient data processing.

This archived project serves as a comprehensive reference for building scalable data mining solutions using modern C# and .NET techniques, showcasing a mature codebase designed for reliability and performance under significant load.

🏛️ Architecture Overview

This project adheres to a Hexagonal Architecture (Ports & Adapters) combined with CQRS (Command Query Responsibility Segregation) for robust domain modeling and optimized data flow. Key components are logically separated to enhance maintainability, testability, and scalability.

```
├── Nexus.Application        # Application services, commands, queries, handlers
│   ├── Commands             # Business logic for write operations
│   └── Queries              # Business logic for read operations
├── Nexus.Core               # Domain entities, value objects, aggregates, interfaces (ports)
│   ├── Entities
│   ├── Interfaces           # Repository and external service contracts
│   └── Specifications
├── Nexus.Infrastructure     # Implementations of Core interfaces (adapters)
│   ├── Data                 # Entity Framework Core DbContext, migrations
│   ├── HttpClients          # Web scraping HTTP client, proxy management
│   └── Services             # External service integrations
├── Nexus.Presentation       # Entry point (e.g., console application, Web API)
│   └── Program.cs
└── Tests                    # Unit and integration tests
    ├── Nexus.Application.Tests
    ├── Nexus.Core.Tests
    └── Nexus.Infrastructure.Tests
```
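As a minimal sketch of the write side of this CQRS flow (the command, handler, and repository names are hypothetical, not taken from the repository), a MediatR-based command under Nexus.Application/Commands might look like:

```csharp
using System.Threading;
using System.Threading.Tasks;
using MediatR;

namespace Nexus.Application.Commands;

// Hypothetical port; in this layout it would normally live in Nexus.Core/Interfaces.
public interface IContactRepository
{
    Task<int> SaveBatchAsync(string sourceUrl, CancellationToken ct);
}

// A command models a single write operation and carries all of its inputs.
public sealed record ExtractContactsCommand(string TargetUrl, int CrawlDepth)
    : IRequest<int>;

// The handler holds the write-side business logic and depends only on ports.
public sealed class ExtractContactsHandler
    : IRequestHandler<ExtractContactsCommand, int>
{
    private readonly IContactRepository _contacts;

    public ExtractContactsHandler(IContactRepository contacts)
        => _contacts = contacts;

    public async Task<int> Handle(ExtractContactsCommand command, CancellationToken ct)
    {
        // ... crawl command.TargetUrl up to command.CrawlDepth and collect contacts ...
        return await _contacts.SaveBatchAsync(command.TargetUrl, ct);
    }
}
```

Queries follow the same shape via `IRequest<TResult>` handlers under Nexus.Application/Queries, keeping reads and writes on separate paths.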


🤖 AI Agent Directives


SYSTEM: APEX TECHNICAL AUTHORITY & ELITE ARCHITECT (DECEMBER 2025 EDITION)

1. IDENTITY & PRIME DIRECTIVE

  • Role: You are a Senior Principal Software Architect and Master Technical Copywriter with 40+ years of elite industry experience. You operate with absolute precision, enforcing FAANG-level standards and the wisdom of "Managing the Unmanageable."
  • Context: The current date is December 2025. You are building for the 2026 standard.
  • Output Standard: Deliver EXECUTION-ONLY results. No plans, no "reporting"; only executed code, updated docs, and applied fixes.
  • Philosophy: "Zero-Defect, High-Velocity, Future-Proof."


2. INPUT PROCESSING & COGNITION

  • SPEECH-TO-TEXT INTERPRETATION PROTOCOL:
    • Context: User inputs may contain phonetic errors (homophones, typos).
    • Semantic Correction: You are STRICTLY FORBIDDEN from executing literal typos; you must INFER the technical intent based on the project context.
    • Logic Anchor: Treat the README.md as the Single Source of Truth (SSOT).
  • MANDATORY MCP INSTRUMENTATION:
    • No Guessing: Do not hallucinate APIs.
    • Research First: Use linkup/brave to search for December 2025 Industry Standards, Security Threats, and 2026 UI Trends.
    • Validation: Use docfork to verify every external API signature.
    • Reasoning: Engage clear-thought-two to architect complex flows before writing code.

3. CONTEXT-AWARE APEX TECH STACKS (LATE 2025 STANDARDS)

Directives: Detect the project type and apply the corresponding Apex Toolchain. This repository, Nexus-Data-Mining-Web-Scraping-CSharp-Engine, is a C#/.NET-based data mining and web scraping engine.

  • PRIMARY SCENARIO: SYSTEMS / PERFORMANCE (C# / .NET)

    • Stack: This project leverages C# 12 and .NET 8 (LTS). Key tools include MSBuild (for project compilation and build), NuGet (for package management), Entity Framework Core (for data access), and MediatR (for implementing CQRS).
    • Architecture: Adheres to a Hexagonal Architecture (Ports & Adapters) combined with CQRS for clear separation of concerns, domain logic, and optimized data flow. Emphasis on asynchronous programming (async/await) for high-performance I/O operations.
    • Lint/Test: Uses Roslyn Analyzers and StyleCop for code quality and consistency. Testing is performed with xUnit or NUnit for unit and integration tests, often leveraging Moq for mocking dependencies (a minimal test sketch follows this list).
  • SECONDARY SCENARIO A: WEB / APP / EXTENSION (TypeScript) - Not applicable for this project's primary function. Reference only for potential future web-based extensions.

    • Stack: TypeScript 6.x (Strict), Vite 7 (Rolldown), Tauri v2.x (Native), WXT (Extensions).
    • State: Signals (Standardized).
    • Lint/Test: Biome (Speed) + Vitest (Unit) + Playwright (E2E).
    • Architecture: Feature-Sliced Design (FSD).
  • SECONDARY SCENARIO B: DATA / AI / SCRIPTS (Python) - Not applicable for this project's primary function. Reference only for potential future Python-based tooling.

    • Stack: uv (Manager), Ruff (Linter), Pytest (Test).
    • Architecture: Modular Monolith or Microservices.
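To make the primary scenario's test guidance concrete, here is a hedged xUnit + Moq sketch; it reuses the hypothetical `ExtractContactsCommand` handler from the architecture section above, so the type names are illustrative only:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Moq;
using Xunit;
using Nexus.Application.Commands;

public class ExtractContactsHandlerTests
{
    [Fact]
    public async Task Handle_PersistsBatch_AndReturnsCount()
    {
        // Arrange: mock the repository port so no real database is touched.
        var repo = new Mock<IContactRepository>();
        repo.Setup(r => r.SaveBatchAsync("https://example.com", It.IsAny<CancellationToken>()))
            .ReturnsAsync(3);

        var handler = new ExtractContactsHandler(repo.Object);

        // Act
        var count = await handler.Handle(
            new ExtractContactsCommand("https://example.com", CrawlDepth: 2),
            CancellationToken.None);

        // Assert: the handler returns the persisted count and saves exactly once.
        Assert.Equal(3, count);
        repo.Verify(
            r => r.SaveBatchAsync("https://example.com", It.IsAny<CancellationToken>()),
            Times.Once);
    }
}
```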

4. GENERAL AGENT DIRECTIVES

  1. Code Style & Quality: Adhere to established .NET coding standards, Roslyn Analyzer rules, and StyleCop conventions. Ensure all new code integrates seamlessly with the existing codebase's quality.
  2. Performance Optimization: For data mining and web scraping, performance is paramount. Prioritize async/await for I/O-bound operations, use efficient data structures, and minimize unnecessary allocations. Employ Span<T> and Memory<T> where appropriate in performance-critical sections (a Span-based sketch follows this list).
  3. Security: Implement robust error handling, input validation, and secure communication protocols. Be vigilant about common web scraping obstacles (e.g., IP blocking, CAPTCHAs, bot detection). When handling proxy credentials, ensure secure storage and access (a proxy-configuration sketch also follows this list).
  4. Test-Driven Development (TDD): New features and bug fixes must be accompanied by comprehensive unit and integration tests. Achieve high test coverage, especially for core logic and external integrations.
  5. Documentation: All public APIs, complex algorithms, and architectural decisions must be clearly documented using XML documentation comments in C#.
  6. Dependency Management: Utilize NuGet for package management. Keep dependencies updated and regularly check for vulnerabilities.
  7. Maintainability: Write clean, readable, and modular code. Avoid premature optimization and prioritize clarity. Ensure consistent naming conventions throughout the codebase.
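As a minimal sketch of points 2 and 3 above (the environment-variable names are invented for the example, and the engine's real proxy rotation in Nexus.Infrastructure/HttpClients is more elaborate), an authenticated proxy can be wired into HttpClient like this:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ProxyScraperSketch
{
    public static async Task<string> FetchAsync(Uri target)
    {
        // Credentials come from the environment, never from source control.
        var proxyUrl  = Environment.GetEnvironmentVariable("NEXUS_PROXY_URL");
        var proxyUser = Environment.GetEnvironmentVariable("NEXUS_PROXY_USER");
        var proxyPass = Environment.GetEnvironmentVariable("NEXUS_PROXY_PASS");

        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyUrl)
            {
                Credentials = new NetworkCredential(proxyUser, proxyPass),
            },
            UseProxy = true,
        };

        using var client = new HttpClient(handler)
        {
            Timeout = TimeSpan.FromSeconds(30), // fail fast on slow targets
        };

        // async/await keeps the thread free while waiting on network I/O.
        using var response = await client.GetAsync(target);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```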

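And for point 2's Span<T> guidance, a hedged allocation-free scan; the `mailto:` heuristic is invented for the example:

```csharp
using System;

public static class MailtoCounterSketch
{
    // Scans a scraped page for "mailto:" links with ReadOnlySpan<char>,
    // avoiding the substring allocations that string.Split / Substring
    // would incur in a hot loop.
    public static int CountMailtoLinks(ReadOnlySpan<char> html)
    {
        ReadOnlySpan<char> needle = "mailto:";
        var count = 0;

        // IndexOf on a span does not allocate; slicing is O(1).
        for (var i = html.IndexOf(needle); i >= 0; i = html.IndexOf(needle))
        {
            count++;
            html = html[(i + needle.Length)..];
        }
        return count;
    }
}

// Usage: MailtoCounterSketch.CountMailtoLinks("<a href=\"mailto:a@b.com\">") returns 1.
```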
5. VERIFICATION COMMANDS (C# / .NET Specific)

To ensure codebase integrity and functionality, execute the following commands:

  • Restore dependencies: `dotnet restore`
  • Build the project: `dotnet build`
  • Run all tests: `dotnet test`
  • Run a specific project (e.g., the presentation layer): `dotnet run --project src/Nexus.Presentation`
  • Format code (driven by .editorconfig): `dotnet format whitespace --folder` for whitespace fixes, or `dotnet format style --severity warn` to also apply StyleCop/Roslyn style rules
These commands ensure the project adheres to quality standards before any deployment or integration.

⚙️ Development Standards

Prerequisites

Ensure you have the following installed:

  • .NET 8 SDK (the project targets .NET 8 LTS, per the stack described above)
  • Git

Setup

To get the project up and running on your local machine, follow these steps:

  1. Clone the repository:

     ```bash
     git clone https://github.com/chirag127/Nexus-Data-Mining-Web-Scraping-CSharp-Engine.git
     cd Nexus-Data-Mining-Web-Scraping-CSharp-Engine
     ```

  2. Restore NuGet packages:

     ```bash
     dotnet restore
     ```

  3. Build the project:

     ```bash
     dotnet build
     ```

Available Scripts

| Script | Description | Command |
| --- | --- | --- |
| build | Compiles the entire solution. | `dotnet build` |
| test | Runs all unit and integration tests. | `dotnet test` |
| run | Executes the primary application (e.g., the console app). | `dotnet run --project src/Nexus.Presentation` |
| restore | Restores project dependencies. | `dotnet restore` |
| format | Applies code formatting based on .editorconfig. | `dotnet format` |

Architectural Principles

This project was developed adhering to the following core architectural and development principles:

  • SOLID Principles: Ensuring maintainable, scalable, and understandable code.
  • DRY (Don't Repeat Yourself): Promoting reusability and reducing redundancy.
  • YAGNI (You Aren't Gonna Need It): Avoiding unnecessary complexity and features.
  • CQRS (Command Query Responsibility Segregation): Separating read and write operations for improved scalability and maintainability.
  • Hexagonal Architecture (Ports & Adapters): Decoupling domain logic from external concerns and infrastructure (a minimal port/adapter sketch follows this list).
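As a minimal sketch of that last principle (type names are illustrative; in this layout the real contracts would live in Nexus.Core/Interfaces with EF Core implementations in Nexus.Infrastructure/Data):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Port: declared in Nexus.Core, with no infrastructure dependencies.
namespace Nexus.Core.Interfaces
{
    public interface IContactRepository
    {
        Task<int> SaveBatchAsync(string sourceUrl, CancellationToken ct);
    }
}

// Adapter: implemented in Nexus.Infrastructure on top of EF Core.
namespace Nexus.Infrastructure.Data
{
    using Nexus.Core.Interfaces;

    // Hypothetical DbContext; a real one would declare DbSet<T> properties.
    public sealed class NexusDbContext : DbContext { }

    public sealed class EfContactRepository : IContactRepository
    {
        private readonly NexusDbContext _db;

        public EfContactRepository(NexusDbContext db) => _db = db;

        public async Task<int> SaveBatchAsync(string sourceUrl, CancellationToken ct)
        {
            // ... map the contacts scraped from sourceUrl onto tracked entities ...
            return await _db.SaveChangesAsync(ct);
        }
    }
}
```

Swapping the adapter (e.g., for an in-memory fake in tests) never touches Nexus.Core or Nexus.Application.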

🛡️ License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. See the LICENSE file for details.

🤝 Contributing

This project is archived, so new contributions are no longer accepted, though historical contributions and insights remain appreciated. For details on how the project was managed and how to engage with its historical context, please refer to the CONTRIBUTING.md guidelines.

🐛 Reporting Issues

For any historical issues or architectural discussions related to this archived project, please refer to the Bug Report Template for how issues were structured.

About

A legacy C# utility designed for efficient data collection and contact information retrieval from various web sources. Features included multi-keyword processing, proxy support, and configurable crawling depth. Archived as a historical reference for data processing techniques.
