# Vi-Notes: Behavioral Authorship Verification System

A production-grade system that analyzes typing behavior to verify authorship and detect potential security threats through behavioral biometrics.

## 🚀 Features

- **Real-time Behavioral Analysis**: Extracts 8+ behavioral features from typing patterns
- **Statistical Baseline Tracking**: Learns user behavior using Welford's algorithm
- **Anomaly Detection**: Z-score-based detection of behavioral deviations
- **Session Management**: Complete typing session lifecycle with persistent storage
- **Comprehensive Reporting**: Detailed analysis reports with risk assessments
- **Production Ready**: MongoDB persistence, error handling, and monitoring
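The baseline and anomaly features above rely on Welford's algorithm, which updates a running mean and variance in constant time and memory per observation, with no need to store past sessions. The following is a minimal sketch under stated assumptions: the `RunningStats` name, the sample delays, and the |z| > 3 threshold are illustrative and are not the project's actual code.

```javascript
// Minimal sketch of an online baseline using Welford's algorithm.
// RunningStats and the |z| > 3 threshold are hypothetical, not the project's API.
class RunningStats {
  constructor() {
    this.count = 0;
    this.mean = 0;
    this.m2 = 0; // running sum of squared deviations from the mean
  }

  // Incorporate one observation in O(1) time and memory.
  update(x) {
    this.count += 1;
    const delta = x - this.mean;
    this.mean += delta / this.count;
    this.m2 += delta * (x - this.mean); // uses the *updated* mean
  }

  // Sample variance (n - 1 denominator); NaN until two samples exist.
  get variance() {
    return this.count > 1 ? this.m2 / (this.count - 1) : NaN;
  }

  get stdDev() {
    return Math.sqrt(this.variance);
  }

  // Number of standard deviations x lies from the baseline mean.
  // Returns 0 when the spread is undefined or zero (not enough history).
  zScore(x) {
    const sd = this.stdDev;
    if (!Number.isFinite(sd) || sd === 0) return 0;
    return (x - this.mean) / sd;
  }
}

// Usage: a baseline of per-session average inter-key delays (ms).
const baseline = new RunningStats();
for (const avgDelay of [180, 200, 190, 210, 195]) baseline.update(avgDelay);

const z = baseline.zScore(320);
const isAnomalous = Math.abs(z) > 3; // threshold is an assumption
console.log(baseline.mean, baseline.variance.toFixed(1), z.toFixed(2), isAnomalous);
```

Welford's update avoids the catastrophic cancellation that the naive "sum of squares minus square of sum" formula suffers from, which matters once a profile accumulates many sessions.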

## 🏗️ Architecture

### Backend (Node.js + Express)
- **Feature Engine**: Pure functions for behavioral feature extraction
- **Detection Engine**: Rule-based scoring system with confidence levels
- **Baseline Service**: Statistical profiling with anomaly detection
- **Database Layer**: MongoDB with Mongoose ODM
- **REST API**: Complete session and analysis endpoints

### Frontend (React + TypeScript)
- **ContentEditable Editor**: Real-time typing capture
- **Event Buffer**: Batched event transmission
- **Session Integration**: Automatic session lifecycle management
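The Event Buffer's batching can be sketched as a queue that flushes either when it reaches a size limit or after a short delay, whichever comes first. This is a hedged illustration: the `EventBuffer` class, `maxSize`, and `flushIntervalMs` defaults are assumptions, and only the `/events/batch` payload shape comes from the API section.

```javascript
// Sketch of a batched event buffer. Class name and defaults are assumptions;
// `send` is injected so it can post to /events/batch or be stubbed in tests.
class EventBuffer {
  constructor(send, { maxSize = 50, flushIntervalMs = 2000 } = {}) {
    this.send = send;
    this.maxSize = maxSize;
    this.flushIntervalMs = flushIntervalMs;
    this.events = [];
    this.timer = null;
  }

  push(event) {
    this.events.push(event);
    if (this.events.length >= this.maxSize) {
      return this.flush(); // size-triggered flush
    }
    // Time-triggered flush: start one timer when the buffer becomes non-empty.
    if (this.timer === null) {
      this.timer = setTimeout(() => {
        this.flush().catch(() => {
          /* batch stays queued and is retried on the next flush */
        });
      }, this.flushIntervalMs);
    }
    return Promise.resolve();
  }

  async flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.events.length === 0) return;
    const batch = this.events;
    this.events = [];
    try {
      await this.send(batch);
    } catch (err) {
      // Re-queue the batch ahead of newer events so nothing is lost.
      this.events = batch.concat(this.events);
      throw err;
    }
  }
}

// In the browser, `send` could wrap the documented endpoint:
const postBatch = (events) =>
  fetch('/events/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ events }),
  });
```

A typical wiring is `const buffer = new EventBuffer(postBatch)` with `buffer.push(...)` called from the editor's key handlers, plus a final `buffer.flush()` before calling `/session/end`.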

## 📊 Behavioral Features Analyzed

1. **Inter-Key Delays**: Timing between keystrokes
2. **Pause Patterns**: Long pauses indicating thinking/hesitation
3. **Backspace Rate**: Error correction frequency
4. **Paste Detection**: External content insertion
5. **Typing Speed**: Overall input velocity
6. **Rhythm Consistency**: Timing pattern stability
7. **Error Patterns**: Correction behavior analysis
8. **Session Duration**: Total typing time analysis
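Several of these features can be computed by a single pure function over the event list documented for `POST /events/batch`. The sketch below covers a subset (delays, pauses, corrections, pastes, speed, rhythm, duration); the function name, the 2000 ms pause threshold, and each formula are assumptions rather than the Feature Engine's actual definitions.

```javascript
// Sketch: pure feature extraction over the documented event shape.
// Feature formulas and the 2000 ms pause threshold are assumptions.
function extractFeatures(events, { pauseThresholdMs = 2000 } = {}) {
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);
  const keydowns = sorted.filter((e) => e.type === 'keydown');
  const mean = (xs) => (xs.length ? xs.reduce((s, x) => s + x, 0) / xs.length : 0);

  // Inter-key delays between consecutive keydown events.
  const delays = [];
  for (let i = 1; i < keydowns.length; i++) {
    delays.push(keydowns[i].timestamp - keydowns[i - 1].timestamp);
  }
  const meanDelay = mean(delays);
  const delayStdDev = Math.sqrt(mean(delays.map((d) => (d - meanDelay) ** 2)));

  // Session span from first to last event, in milliseconds.
  const durationMs =
    sorted.length > 1 ? sorted[sorted.length - 1].timestamp - sorted[0].timestamp : 0;
  const deletes = sorted.filter((e) => e.type === 'delete').length;
  const pastes = sorted.filter((e) => e.type === 'paste');

  return {
    meanInterKeyDelayMs: meanDelay,
    pauseCount: delays.filter((d) => d > pauseThresholdMs).length,
    // Corrections per keystroke (assumes one "delete" event per correction).
    backspaceRate: keydowns.length ? deletes / keydowns.length : 0,
    pasteCount: pastes.length,
    pastedChars: pastes.reduce((s, e) => s + (e.pasteLength ?? 0), 0),
    // Keystrokes per minute over the session span.
    typingSpeedKpm: durationMs > 0 ? (keydowns.length * 60000) / durationMs : 0,
    // Coefficient of variation of delays: lower means a steadier rhythm.
    rhythmVariation: meanDelay > 0 ? delayStdDev / meanDelay : 0,
    durationMs,
  };
}

const features = extractFeatures([
  { type: 'keydown', key: 'H', timestamp: 0 },
  { type: 'keydown', key: 'i', timestamp: 150 },
  { type: 'keydown', key: ' ', timestamp: 300 },
  { type: 'keydown', key: 'a', timestamp: 2800 }, // 2.5 s pause
  { type: 'delete', timestamp: 2900 },
  { type: 'paste', pasteLength: 120, timestamp: 3000 },
]);
console.log(features);
```

Keeping extraction free of I/O and shared state makes it easy to test and lets the same function run on a live batch or on a stored session.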

## 🛠️ Installation & Setup

### Prerequisites
- Node.js 18+
- MongoDB 4.4+
- npm or yarn

### Backend Setup
```bash
cd server
npm install
npm start
```

### Frontend Setup
```bash
cd client
npm install
npm run dev
```

**Vi-Notes** is an authenticity verification platform designed to distinguish genuine human-written content from AI-generated or AI-assisted text. The system focuses on analyzing **writing behavior** alongside **statistical and linguistic characteristics** of the text to establish reliable authorship verification.

This repository represents the **design and conceptual foundation** for the Vi-Notes system.

---

## Motivation

With the widespread availability of AI writing tools, verifying true human authorship has become increasingly challenging. Most existing detection methods rely primarily on textual analysis, which can be inconsistent and easy to bypass.

Vi-Notes approaches this problem by combining:
- Behavioral signals from the writing process
- Statistical analysis of the written content
- Correlation between how content is written and what is written

---

## Core Idea

Human writing naturally includes:
- Variable typing speeds
- Pauses during thinking
- Revisions during idea formation
- Irregular sentence structures
- A relationship between content complexity and editing frequency

AI-generated or pasted text often lacks these behavioral signatures.

Vi-Notes is designed to capture and analyze these characteristics to assess authorship authenticity.

---

## Key Features

### Writing Session Monitoring
- Capture keystroke timing metadata (not raw key content)
- Track pauses, deletions, edits, and writing flow
- Detect pasted or externally inserted text blocks
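Capturing timing metadata without raw key content can be approximated by reducing each key to a coarse class on the client, keeping just enough information (corrections, punctuation, boundaries) for the behavioral analysis. This is a sketch: the class names and helper functions are hypothetical, and the documented `/events/batch` payload currently carries a raw `key` string.

```javascript
// Sketch: map a raw key to a coarse class so timing can be analyzed
// without recording what was typed. Class names are hypothetical.
function keyClass(key) {
  if (key === 'Backspace') return 'backspace';
  if (key === 'Delete') return 'delete';
  if (key === 'Enter') return 'enter';
  if (key === ' ') return 'space';
  if (key.length === 1) {
    return /[.,;:!?]/.test(key) ? 'punctuation' : 'char';
  }
  return 'other'; // Shift, arrow keys, function keys, etc.
}

// Build a metadata-only event from a KeyboardEvent-like object.
function toTimingEvent(sessionId, e, now = Date.now()) {
  return { sessionId, type: e.type, key: keyClass(e.key), timestamp: now };
}

// Record only the length of pasted text, never the text itself.
function toPasteEvent(sessionId, pastedText, now = Date.now()) {
  return { sessionId, type: 'paste', pasteLength: pastedText.length, timestamp: now };
}
```

In a browser, a paste listener could call `toPasteEvent(sessionId, e.clipboardData.getData('text/plain'))`, using the standard `ClipboardEvent.clipboardData` API, and discard the text immediately afterwards.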

### Behavioral Pattern Analysis
- Pause distribution before sentences and paragraphs
- Typing speed variance
- Revision frequency relative to text complexity
- Micro-pauses around punctuation and structural boundaries
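Pause distribution by position can be sketched by grouping each inter-key delay according to where the following keystroke falls. The boundary rules here are simplifying assumptions: a sentence start is the first character after ".", "!" or "?" plus a space, and a paragraph start is the first key after Enter. The `pauseProfile` name is hypothetical.

```javascript
// Sketch: mean pause (ms) before keystrokes in different structural positions.
// Boundary detection rules are simplifying assumptions.
function pauseProfile(keystrokes) {
  const groups = { paragraphStart: [], sentenceStart: [], punctuation: [], other: [] };
  for (let i = 1; i < keystrokes.length; i++) {
    const delay = keystrokes[i].timestamp - keystrokes[i - 1].timestamp;
    const key = keystrokes[i].key;
    const prev = keystrokes[i - 1].key;
    const prev2 = i >= 2 ? keystrokes[i - 2].key : null;

    if (prev === 'Enter' && key !== 'Enter') {
      groups.paragraphStart.push(delay);
    } else if (prev === ' ' && /^[.!?]$/.test(prev2 ?? '') && key.length === 1 && key !== ' ') {
      groups.sentenceStart.push(delay);
    } else if (/^[.,;:!?]$/.test(key)) {
      groups.punctuation.push(delay); // micro-pause before punctuation
    } else {
      groups.other.push(delay);
    }
  }
  // Mean per group; null when a group has no samples.
  const mean = (xs) => (xs.length ? xs.reduce((s, x) => s + x, 0) / xs.length : null);
  return Object.fromEntries(Object.entries(groups).map(([k, xs]) => [k, mean(xs)]));
}
```

Genuine drafting tends to show longer pauses at sentence and paragraph starts than mid-word; comparing these group means against the "other" group is one simple way to surface that pattern.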

### Textual Statistical Analysis
- Sentence length variation
- Vocabulary diversity metrics
- Stylistic consistency analysis
- Linguistic irregularities typical of human writing

### Database
MongoDB will automatically create the `vi-notes` database and required collections.

## 🔌 API Endpoints

### Session Management
```http
POST /session/start
Content-Type: application/json

{
  "userId": "string"
}

Response:
{
  "status": "ok",
  "sessionId": "string",
  "baseline": {
    "sessionCount": number,
    "status": "no_baseline|new|developing|mature",
    "features": {...}
  }
}
```

```http
POST /session/end
Content-Type: application/json

{
  "sessionId": "string"
}

Response:
{
  "status": "ok",
  "session": {
    "sessionId": "string",
    "userId": "string",
    "duration": number,
    "eventCount": number,
    "startTime": "ISO string",
    "endTime": "ISO string"
  },
  "finalBaseline": {...}
}
```

### Event Processing
```http
POST /events/batch
Content-Type: application/json

{
  "events": [
    {
      "sessionId": "string",
      "type": "keydown|keyup|paste|delete",
      "key": "string",
      "timestamp": number,
      "pasteLength": number
    }
  ]
}

Response:
{
  "status": "ok",
  "features": {...},
  "detection": {
    "score": number,
    "confidence": "low|medium|high",
    "flags": ["array of flags"],
    "explanation": "string"
  },
  "baseline": {
    "comparison": {...},
    "summary": {...}
  }
}
```
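A server handling `POST /events/batch` would typically reject malformed payloads before feature extraction. The validator below is a sketch based only on the payload shape documented above; requiring `pasteLength` for `paste` events and treating `key` as optional are assumptions, and `validateBatch` is a hypothetical name.

```javascript
// Sketch: validate a /events/batch body against the documented shape.
// Returns a list of error messages; an empty list means the body is valid.
const EVENT_TYPES = new Set(['keydown', 'keyup', 'paste', 'delete']);

function validateBatch(body) {
  if (!body || !Array.isArray(body.events)) {
    return ['body.events must be an array'];
  }
  const errors = [];
  body.events.forEach((e, i) => {
    const where = `events[${i}]`;
    if (typeof e !== 'object' || e === null) {
      errors.push(`${where} must be an object`);
      return;
    }
    if (typeof e.sessionId !== 'string' || e.sessionId === '') {
      errors.push(`${where}.sessionId must be a non-empty string`);
    }
    if (!EVENT_TYPES.has(e.type)) {
      errors.push(`${where}.type must be one of ${[...EVENT_TYPES].join('|')}`);
    }
    if (!Number.isFinite(e.timestamp)) {
      errors.push(`${where}.timestamp must be a finite number`);
    }
    if (e.key !== undefined && typeof e.key !== 'string') {
      errors.push(`${where}.key must be a string`);
    }
    if (e.type === 'paste' && !(Number.isInteger(e.pasteLength) && e.pasteLength >= 0)) {
      errors.push(`${where}.pasteLength must be a non-negative integer`);
    }
  });
  return errors;
}
```

In an Express route, a handler could call `validateBatch(req.body)` and respond with `res.status(400).json(...)` when the list is non-empty; the exact error response shape is not specified by this README.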

### Reporting
```http
GET /report/:sessionId

Response:
{
  "status": "ok",
  "report": {
    "sessionId": "string",
    "userId": "string",
    "sessionInfo": {...},
    "analysis": {
      "features": {...},
      "detection": {...},
      "baselineComparison": {...},
      "overallRisk": "low|medium|high",
      "confidence": number
    },
    "events": number,
    "generatedAt": "ISO string"
  }
}
```

### Health Check
```http
GET /health

Response:
{
  "status": "ok",
  "database": "connected|disconnected"
}
```

## 🎯 Usage Example

```javascript
// Uses top-level await: run as an ES module or wrap in an async function.
// Start a session
const sessionResponse = await fetch('/session/start', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ userId: 'alice' })
});
if (!sessionResponse.ok) throw new Error(`session/start failed: ${sessionResponse.status}`);
const { sessionId } = await sessionResponse.json();

// Send typing events (timestamps are milliseconds since the Unix epoch)
const t0 = Date.now();
const events = [
  { sessionId, type: 'keydown', key: 'H', timestamp: t0 },
  { sessionId, type: 'keydown', key: 'i', timestamp: t0 + 150 },
  // ... more events
];

await fetch('/events/batch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ events })
});

// End the session, then fetch the report
await fetch('/session/end', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ sessionId })
});

const reportResponse = await fetch(`/report/${sessionId}`);
const { report } = await reportResponse.json();
console.log('Risk Level:', report.analysis.overallRisk);
```

## 🔍 Analysis Results

### Detection Scores
- **0-30**: Genuine behavior (high confidence)
- **31-60**: Mixed indicators (medium confidence)
- **61+**: Suspicious behavior (low confidence)

### Risk Levels
- **Low**: Authorship appears genuine
- **Medium**: Additional verification recommended
- **High**: Significant behavioral deviations detected
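The score bands and risk levels above can be expressed as a small lookup. This sketch assumes the bands are inclusive at their upper bounds (so 30.5 counts as "mixed") and that each band pairs with the risk level in the same position; the README does not state either rule explicitly, and `classifyScore` is a hypothetical name.

```javascript
// Sketch: map a detection score to the documented band, confidence and risk.
// Inclusive upper bounds and the band-to-risk pairing are assumptions.
function classifyScore(score) {
  if (!Number.isFinite(score) || score < 0) {
    throw new RangeError('score must be a non-negative number');
  }
  if (score <= 30) return { band: 'genuine', confidence: 'high', risk: 'low' };
  if (score <= 60) return { band: 'mixed', confidence: 'medium', risk: 'medium' };
  return { band: 'suspicious', confidence: 'low', risk: 'high' };
}
```

Keeping the thresholds in one function makes it straightforward to tune them later without touching the detection rules that produce the score.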

### Baseline Status
- **no_baseline**: First session, establishing profile
- **new**: Building initial behavioral profile
- **developing**: Profile maturing with more sessions
- **mature**: Stable profile for reliable anomaly detection
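Baseline status is naturally derived from the number of completed sessions. The README does not give the real cut-offs, so the thresholds (3 and 10) and the weighting policy below are hypothetical placeholders that show the idea of trusting a profile more as it matures.

```javascript
// Sketch: derive the documented baseline status from a session count.
// Thresholds 3 and 10 are hypothetical placeholders.
function baselineStatus(sessionCount) {
  if (!Number.isInteger(sessionCount) || sessionCount < 0) {
    throw new RangeError('sessionCount must be a non-negative integer');
  }
  if (sessionCount === 0) return 'no_baseline';
  if (sessionCount < 3) return 'new';
  if (sessionCount < 10) return 'developing';
  return 'mature';
}

// One possible policy (assumption): scale how much baseline deviations
// contribute to risk by how established the profile is.
function baselineWeight(status) {
  return { no_baseline: 0, new: 0.25, developing: 0.5, mature: 1 }[status] ?? 0;
}
```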

## 🧪 Testing

### Run All Tests
```bash
# Backend tests
cd server
node test-e2e.js # End-to-end integration
node test-integration.js # API endpoint tests
node test-db.js # Database operations

# Frontend development server (from server/, switch to client/)
cd ../client
npm run dev
```

### Manual Testing
1. Start backend: `cd server && npm start`
2. Start frontend: `cd client && npm run dev`
3. Open http://localhost:5173
4. Start typing in the editor
5. Check server logs for real-time analysis

## 📁 Project Structure

```
vi-notes/
├── client/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── hooks/ # React hooks
│ │ ├── services/ # API services
│ │ └── types/ # TypeScript types
│ └── package.json
├── server/ # Node.js backend
│ ├── database/ # MongoDB models & service
│ ├── detection-engine/ # Behavioral analysis
│ ├── feature-engine/ # Feature extraction
│ ├── baseline/ # Statistical profiling
│ ├── index.js # Express server
│ └── package.json
└── README.md
```

## 🔒 Security Considerations

- **Behavioral Biometrics**: Uses typing patterns as biometric signatures
- **Anomaly Detection**: Identifies deviations from established baselines
- **Session Isolation**: Each session is cryptographically unique
- **Data Persistence**: Secure storage of behavioral profiles
- **Privacy**: No sensitive content is stored, only behavioral metadata

## 🚀 Production Deployment

### Environment Variables
```bash
MONGODB_URI=mongodb://localhost:27017/vi-notes
NODE_ENV=production
PORT=5000
```
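On the server, these variables can be read once at startup with fallbacks. The sketch below mirrors the example values above as defaults; the `loadConfig` name, the `development` fallback for `NODE_ENV`, and the port validation are assumptions rather than the server's actual code.

```javascript
// Sketch: read the documented environment variables with defaults.
// Defaults mirror the example values; validation rules are assumptions.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '5000', 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    mongodbUri: env.MONGODB_URI ?? 'mongodb://localhost:27017/vi-notes',
    nodeEnv: env.NODE_ENV ?? 'development',
    port,
  };
}
```

With Express, startup would then look like `const { port } = loadConfig(); app.listen(port);`, failing fast on a bad `PORT` instead of binding to an unexpected value.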

### Docker Deployment
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 5000
CMD ["npm", "start"]
```

### Scaling Considerations
- **Database**: MongoDB with proper indexing
- **Caching**: Redis for session caching (future enhancement)
- **Load Balancing**: Multiple backend instances
- **Monitoring**: Application performance monitoring

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Grounded in behavioral biometrics research
- Uses Welford's algorithm for online statistical computation
- Inspired by keystroke dynamics and behavioral authentication literature
