An advanced AI-powered search platform featuring three core capabilities: Search & Recommendation, Context Engineering, and Image Search. Built with modern MLOps practices for production-ready deployment.
- Intelligent Indexing: TF-IDF based inverted index with Chinese word segmentation (see the sketch after this feature list)
- CTR Prediction: Logistic Regression and Wide & Deep models for click-through rate prediction
- Real-time Ranking: Dynamic ranking strategy adjustment based on user behavior
- Knowledge Graph: LLM-based NER technology for enhanced semantic search
- A/B Testing: Experiment management for ranking algorithm comparison
- Hybrid Retrieval: Combines inverted index and knowledge graph for comprehensive information retrieval
- LLM Integration: Seamless integration with Ollama for local LLM inference
- Prompt Engineering: Optimized prompt templates with full transparency
- Context Management: Intelligent context selection and ranking for accurate responses
- Multi-source Context: Retrieval from documents, knowledge graphs, and structured data
- CLIP-powered: OpenAI CLIP model via Hugging Face Transformers
- Multi-modal Search: Image-to-image and text-to-image search capabilities
- Semantic Understanding: 512-dimensional embedding vectors for precise similarity matching
- Real-time Processing: Sub-second search response with efficient similarity calculation
- Scalable Storage: Image library bounded only by available disk space, with optimized storage management
- Microservice Architecture: Decoupled services (Data, Index, Model, Image, Experiment)
- Unified Service Management: Centralized service discovery and management
- MLOps Pipeline: Complete workflow from data collection to model deployment
- Monitoring & Observability: Real-time performance tracking and health checks
- Web Interface: Modern Gradio-based UI with responsive design
- Production Ready: Comprehensive error handling, logging, and scalability features
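To make the indexing bullet above concrete, here is a minimal sketch of a jieba-based TF-IDF inverted index. It is illustrative only: the corpus is made up, and the project's IndexService may segment and weight terms differently.

```python
import math
from collections import Counter, defaultdict

import jieba  # the Chinese segmenter listed in the acknowledgments

# Toy corpus (made-up documents, for illustration only)
docs = {1: "搜索引擎使用倒排索引", 2: "移动应用属于软件"}

# Inverted index: term -> {doc_id: term frequency}
inverted = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(jieba.cut(text)).items():
        inverted[term][doc_id] = tf

def tfidf(term: str, doc_id: int) -> float:
    # Smoothed IDF; the exact weighting scheme here is an assumption
    idf = math.log(len(docs) / (1 + len(inverted.get(term, {})))) + 1.0
    return inverted.get(term, {}).get(doc_id, 0) * idf

print(sorted(inverted))  # segmented vocabulary across both documents
```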
- Search & Recommendation: docs/SEARCH_GUIDE.md
- Context Engineering: docs/CONTEXT_ENGINEERING_GUIDE.md
- Image Search: docs/IMAGE_SEARCH_GUIDE.md
- Python 3.8+
- Memory: At least 2GB
- Storage: At least 1GB available space
- GPU (optional): For better CLIP model performance
- Ollama (for Context Engineering/KG): local LLM inference service, default at http://localhost:11434
- datasets (for data tools): pip install datasets, used by tools/wikipedia_downloader.py
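To verify the Ollama prerequisite before starting the system, a quick reachability check can help (Ollama's /api/tags endpoint lists locally pulled models):

```python
import requests

# Ollama's default local endpoint; /api/tags lists pulled models
try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=2)
    resp.raise_for_status()
    print("Ollama is up; models:", [m["name"] for m in resp.json().get("models", [])])
except requests.RequestException as exc:
    print("Ollama not reachable:", exc)
```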
```bash
# Clone the repository
git clone https://github.com/tylerelyt/test_bed.git
cd test_bed

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

If data/preloaded_documents.json exists, the system loads these Chinese Wikipedia documents as a read-only core dataset:
- Immutable: Preloaded documents are read-only in the UI
- Auto-loading: Automatically loads data/preloaded_documents.json at startup (if present)
- User Documents: Importing/editing via the UI is not supported in this version
- Data Source: Typically generated from Hugging Face fjcanyue/wikipedia-zh-cn via tooling
Note: If no preloaded file is present, the system will still start but the text index may be empty until data is provided offline.
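Loading the preloaded file by hand looks roughly like this (the file is treated as an opaque JSON collection here; inspect it for the actual per-document schema):

```python
import json
from pathlib import Path

path = Path("data/preloaded_documents.json")
if path.exists():
    with path.open(encoding="utf-8") as f:
        documents = json.load(f)
    print(f"Loaded {len(documents)} preloaded documents")
else:
    print("No preloaded documents; the text index may start empty")
```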
The system automatically loads a preloaded Chinese knowledge graph if available:
- Primary Source: data/openkg_triples.tsv - real OpenKG concept hierarchy data (290 entities, 254 relations)
- Fallback: data/preloaded_knowledge_graph.json - alternative format if the TSV is not available
- Auto-generation: Run python tools/openkg_generator.py to download fresh OpenKG sample data
- Format: TSV with concept-category relationships (e.g., "移动应用 属于 软件" - "mobile apps belong to software")
- Data Source: OpenKG OpenConcepts project from GitHub
The knowledge graph powers entity recognition and context engineering features.
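Reading the triples file is straightforward; a sketch assuming one tab-separated (subject, predicate, object) triple per line, matching the example above:

```python
from pathlib import Path

triples = []
for line in Path("data/openkg_triples.tsv").read_text(encoding="utf-8").splitlines():
    parts = line.strip().split("\t")
    if len(parts) == 3:  # e.g. ("移动应用", "属于", "软件")
        triples.append(tuple(parts))
print(f"Loaded {len(triples)} triples")
```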
```bash
# Method 1: Using startup script
./quick_start.sh

# Method 2: Direct startup
python start_system.py
```

After the system starts, visit http://localhost:7861 to use the interface.
Basic configuration is done in code. Optional environment variables include LLM provider credentials used by NER/RAG (see comments in src/search_engine/index_tab/ner_service.py).
The platform is organized into three main functional areas with shared infrastructure:
- Index Building Tab: Offline index construction, document management, and knowledge graph building
- Search Tab: Online retrieval and ranking with CTR-based optimization
- Training Tab: CTR data collection and Wide & Deep model training
- Context Q&A Tab: Context-augmented answering with Ollama integration
- Knowledge Graph Integration: Semantic search with LLM-based entity recognition
- Multi-source Retrieval: Documents, graphs, and structured data integration
Note: Context Engineering / KG rely on a locally running Ollama service and available models. If Ollama is not running or the model hasn't been pulled, the page will show a connection error, but other parts of the system remain available.
- Image Search Tab: CLIP-based image retrieval supporting image-to-image and text-to-image search
- Image Management: Upload, indexing, and library management
- Multi-modal Understanding: Cross-modal semantic search capabilities
- Service Management: Unified service discovery and orchestration
- Monitoring Tab: System performance monitoring and health checks
- Data Pipeline: Centralized data processing and storage
- Web Interface: Modern responsive UI with Gradio framework
The image search system leverages OpenAI's CLIP model to provide intelligent image retrieval capabilities:
- 📤 Image Upload: Store images with descriptions and tags
- 🔍 Image-to-Image Search: Find visually similar images using query images
- 💬 Text-to-Image Search: Search images using natural language descriptions
- 📋 Image Management: Comprehensive image library management
- Model: OpenAI CLIP ViT-B/32 via Hugging Face Transformers
- Embedding Dimension: 512-dimensional vectors
- Similarity Metric: Cosine similarity
- Supported Formats: JPG, PNG, GIF, BMP, and more
- Performance: Sub-second search response times
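The following sketch mirrors the pipeline described above using the Hugging Face Transformers API directly; it is illustrative, and the project's ImageService wraps these steps with its own indexing and storage:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Encode one image and one text query into the shared 512-d embedding space
image = Image.open("example.jpg")  # any local image file
inputs = processor(text=["a red car on the street"], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# image_embeds and text_embeds are (1, 512); cosine similarity ranks matches
score = torch.nn.functional.cosine_similarity(out.image_embeds, out.text_embeds)
print(float(score))
```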
```
# Examples of search queries
"a red car on the street"
"cat sleeping on a bed"
"beautiful sunset landscape"
"person running"  # Non-English queries are also supported
```

- Navigate to "🖼️ Image Search System" → "📤 Image Upload"
- Select image files and add descriptions/tags
- Click "📤 Upload Image" to index
- Go to "🔍 Image-to-Image" tab
- Upload a query image
- Adjust the number of results (1-20)
- View results in table and gallery format
For detailed usage instructions, see docs/IMAGE_SEARCH_GUIDE.md.
- Index Building: The system automatically loads preloaded documents (if present) and builds the index on startup; manual document addition via UI is not supported
- Search Testing: Enter queries in the search box to retrieve relevant documents
- Click Feedback: Clicking search results records user behavior for model training
- Model Training: After collecting sufficient data, train CTR prediction models
```python
# Import CTR training data
from src.search_engine.data_utils import import_ctr_data

result = import_ctr_data("path/to/your/data.json")
```

```python
# Programmatic search via the unified service manager
from src.search_engine.service_manager import get_index_service

index_service = get_index_service()
results = index_service.search("query terms", top_k=10)
```

The system supports A/B testing with configurable ranking strategies for comparison in the monitoring interface.
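For the A/B split itself, deterministic hash-based bucketing is the usual approach; the helper below is illustrative and not the ExperimentService's actual API:

```python
import hashlib

def assign_arm(user_id: str) -> str:
    # Deterministic 50/50 split; each arm can be served a different
    # ranking strategy and compared on CTR in the monitoring interface
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_arm("user-42"))
```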
```mermaid
graph TB
    subgraph "🖥️ Web Interface Layer"
        Portal["Portal<br/>🚪 Main Entry"]
    end
    subgraph "📱 Application Layer"
        SearchMod["🔍 Search & Recommendation<br/>• Index Building<br/>• Text Search<br/>• CTR Training"]
        RAGMod["🤖 Context Engineering<br/>• Context Q&A<br/>• Knowledge Graph<br/>• Multi-source Retrieval"]
        ImageMod["🖼️ Image Search<br/>• Image Upload<br/>• Image-to-Image<br/>• Text-to-Image"]
    end
    subgraph "🏗️ Service Layer"
        DataSvc["DataService<br/>📊 CTR Data Management"]
        IndexSvc["IndexService<br/>📚 Text Indexing & Search"]
        ModelSvc["ModelService<br/>🤖 ML Model Management"]
        ImageSvc["ImageService<br/>🖼️ CLIP-based Search"]
        ExpSvc["ExperimentService<br/>🧪 A/B Testing"]
    end
    subgraph "📊 Infrastructure Layer"
        Monitor["Monitoring<br/>📈 Performance Tracking"]
        Storage["Storage<br/>💾 Data Persistence"]
        ServiceMgr["ServiceManager<br/>🔧 Service Orchestration"]
    end
    Portal --> SearchMod
    Portal --> RAGMod
    Portal --> ImageMod
    SearchMod --> DataSvc
    SearchMod --> IndexSvc
    SearchMod --> ModelSvc
    RAGMod --> IndexSvc
    RAGMod --> ModelSvc
    ImageMod --> ImageSvc
    DataSvc --> ServiceMgr
    IndexSvc --> ServiceMgr
    ModelSvc --> ServiceMgr
    ImageSvc --> ServiceMgr
    ExpSvc --> ServiceMgr
    ServiceMgr --> Monitor
    ServiceMgr --> Storage
```
```mermaid
graph LR
    subgraph "🔍 Search & Recommendation Flow"
        A1[User Query] --> A2[Index Retrieval]
        A2 --> A3[Initial Ranking]
        A3 --> A4[CTR Prediction]
        A4 --> A5[Re-ranking]
        A5 --> A6[Results Display]
        A6 --> A7[User Click]
        A7 --> A8[Behavior Recording]
        A8 --> A9[Model Training]
        A9 --> A4
    end
    subgraph "🤖 Context Engineering Flow"
        B1[User Question] --> B2[Document Retrieval]
        B2 --> B3[Knowledge Graph Query]
        B3 --> B4[Context Assembly]
        B4 --> B5[LLM Generation]
        B5 --> B6[Response Display]
    end
    subgraph "🖼️ Image Search Flow"
        C1[Image/Text Query] --> C2[CLIP Encoding]
        C2 --> C3[Similarity Calculation]
        C3 --> C4[Result Ranking]
        C4 --> C5[Image Gallery Display]
        C5 --> C6[User Interaction]
        C6 --> C7[Usage Analytics]
    end
```
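The CTR re-ranking loop in the first flow can be pictured with a toy model (the features and data below are made up; the real feature set lives in the training tab code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy CTR model fit on made-up (features, clicked) pairs;
# features here are [relevance score, result position]
X = np.array([[0.9, 1], [0.2, 5], [0.7, 2], [0.1, 8]])
y = np.array([1, 0, 1, 0])
ctr_model = LogisticRegression().fit(X, y)

# Re-rank retrieved candidates by predicted click probability
candidates = [("doc1", [0.8, 1]), ("doc2", [0.3, 2]), ("doc3", [0.6, 3])]
scored = [(doc, ctr_model.predict_proba([feats])[0, 1]) for doc, feats in candidates]
for doc, p in sorted(scored, key=lambda x: x[1], reverse=True):
    print(doc, round(p, 3))
```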
This project is a testbed for learning and experimentation. Any performance numbers depend on environment, data size, and configuration and are not guaranteed.
```
Testbed/
├── src/                              # Source code
│   └── search_engine/
│       ├── data_service.py           # Data service (CTR data management)
│       ├── index_service.py          # Index service (text search & indexing)
│       ├── model_service.py          # Model service (CTR & Wide&Deep models)
│       ├── image_service.py          # Image service (CLIP-based image search)
│       ├── experiment_service.py     # Experiment management service
│       ├── service_manager.py        # Service manager (unified service access)
│       ├── data_utils.py             # Data processing utilities
│       ├── portal.py                 # Main UI entry point
│       ├── index_tab/                # Index building & knowledge graph UI
│       │   ├── index_tab.py
│       │   ├── knowledge_graph.py
│       │   ├── ner_service.py
│       │   └── offline_index.py
│       ├── search_tab/               # Text search UI
│       │   ├── search_tab.py
│       │   └── search_engine.py
│       ├── image_tab/                # Image search UI
│       │   └── image_tab.py
│       ├── training_tab/             # Model training UI
│       │   ├── training_tab.py
│       │   ├── ctr_model.py
│       │   ├── ctr_wide_deep_model.py
│       │   └── ctr_config.py
│       ├── rag_tab/                  # RAG Q&A system UI
│       │   ├── rag_tab.py
│       │   └── rag_service.py
│       └── monitoring_tab/           # System monitoring UI
│           └── monitoring_tab.py
├── models/                           # Model files and data storage
│   ├── ctr_model.pkl                 # Trained CTR model
│   ├── wide_deep_ctr_model.h5        # Wide & Deep model
│   ├── index_data.json               # Text search index
│   ├── knowledge_graph.pkl           # Knowledge graph data
│   └── images/                       # Image storage and embeddings
│       ├── image_index.json
│       └── image_embeddings.npy
├── data/                             # Training and experiment data
│   └── preloaded_documents.json      # Preloaded Chinese Wikipedia documents
├── docs/                             # Documentation (simplified)
│   ├── SEARCH_GUIDE.md               # Search & Recommendation guide
│   ├── CONTEXT_ENGINEERING_GUIDE.md  # Context Engineering guide
│   └── IMAGE_SEARCH_GUIDE.md         # Image search guide
├── examples/                         # Example scripts
├── tools/                            # Utility and monitoring tools
├── test/ & tests/                    # Test suites
├── start_system.py                   # System startup script
├── quick_start.sh                    # Quick start script
└── requirements.txt                  # Python dependencies
```
- Create a new ranking module in src/search_engine/ranking/
- Implement the RankingInterface interface
- Register the new algorithm in IndexService
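The real RankingInterface definition lives in the codebase; purely as a hypothetical sketch of what an implementation could look like:

```python
from typing import List, Tuple

class ScoreDescendingRanker:
    """Hypothetical implementation; the real RankingInterface signature
    should be taken from src/search_engine/ranking/ once created."""

    def rank(self, query: str, candidates: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
        # Trivial strategy: order candidates by their retrieval score
        return sorted(candidates, key=lambda c: c[1], reverse=True)
```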
- Define new features in CTRSampleConfig
- Calculate feature values in DataService.record_impression
- Update the model training logic
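The actual fields of CTRSampleConfig are defined in src/search_engine/training_tab/ctr_config.py; the snippet below only illustrates the pattern of declaring a new feature (all names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MySampleConfig:  # hypothetical stand-in for CTRSampleConfig
    feature_names: List[str] = field(default_factory=lambda: [
        "position",      # rank of the result in the list
        "tfidf_score",   # retrieval relevance score
        "query_length",  # the newly added feature: number of query terms
    ])
```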
- Extend the ImageService class with new methods
- Update image_tab.py UI components
- Test with various image types and queries
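A hypothetical extension following that recipe (the attribute assumed here is illustrative and should be checked against image_service.py):

```python
from src.search_engine.image_service import ImageService

class ExtendedImageService(ImageService):
    # Hypothetical new method; assumes the base class keeps its metadata
    # in an attribute like image_index (verify names in image_service.py)
    def count_images(self) -> int:
        return len(getattr(self, "image_index", {}))
```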
```bash
# Run unit tests (if present)
python -m pytest tests/
```

The system provides multi-dimensional monitoring:
- System Monitoring: CPU, memory, disk usage
- Business Monitoring: Search QPS, click-through rate, response time
- Data Monitoring: Data quality, model performance metrics
- Image Search Monitoring: CLIP model performance, search accuracy
- Alert Mechanism: Anomaly detection and automatic alerting
- Fork the project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- jieba - Chinese word segmentation
- scikit-learn - Machine learning library
- Gradio - Web interface framework
- pandas - Data processing
- Hugging Face Transformers - CLIP model implementation
- OpenAI CLIP - Original CLIP model
- Project Homepage: https://github.com/tylerelyt/test_bed
- Issue Tracker: https://github.com/tylerelyt/test_bed/issues
- Email: tylerelyt@gmail.com