Pull request overview
This PR adds migration verifier monitoring functionality to Mongosync Insights, allowing users to visualize and track data verification progress when using the MongoDB migration-verifier tool. The feature provides a dashboard showing verification task status, generation history, namespace statistics, and mismatch details.
Changes:
- Added migration verifier monitoring with real-time dashboard showing verification task status, failures, and mismatch details
- Integrated verifier metrics endpoint with session-based authentication and auto-refresh capabilities
- Enhanced home page with new form for verifier database connection configuration
Reviewed changes
Copilot reviewed 17 out of 40 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| migration_verifier.py | Core logic for gathering and visualizing verifier metrics from MongoDB |
| mongosync_insights.py | Integration of verifier routes and session management |
| verifier_metrics.html | Dashboard template with auto-refresh and error handling |
| home.html | Added verifier form with client-side validation |
| requirements.txt | Updated dependencies (no new packages added) |
| README.md, CONFIGURATION.md, etc. | Comprehensive documentation for new feature |
Comments suppressed due to low confidence (6)
migration/mongosync_insights/templates/VALIDATION.md:196
- Documentation files (`VALIDATION.md`, `README.md`, `HTTPS_SETUP.md`, `CONFIGURATION.md`) should not be placed in the `templates/` directory. These belong in the project root or a dedicated `docs/` directory. The `templates/` directory should only contain HTML template files.
# Connection String Validation
This document describes the connection string handling in Mongosync Insights.
## Overview
Mongosync Insights uses PyMongo's built-in validation for connection strings, which provides:
- URI format validation
- Connection testing
- Authentication verification
## Validation Process
### 1. Empty String Check
The application first checks if a connection string was provided:
```python
if not TARGET_MONGO_URI or not TARGET_MONGO_URI.strip():
    return error("Please provide a valid MongoDB connection string.")
```
### 2. PyMongo URI Parsing
PyMongo's `parse_uri()` function validates the connection string format and raises `InvalidURI` if the format is invalid. This checks:
- Proper URI scheme (`mongodb://` or `mongodb+srv://`)
- Valid URI syntax
- Proper host and port format
- Valid URI components
### 3. Connection Test
The application attempts to connect to MongoDB using `validate_connection()`, which:
- Creates a MongoDB client
- Tests connectivity with a `ping` command
- Validates authentication credentials
- Raises `PyMongoError` if connection fails
## Display Sanitization
Connection strings are sanitized before display to protect credentials.
### `sanitize_for_display(connection_string)`
This function removes credentials from connection strings for safe display in the UI.
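A minimal, standard-library-only sketch of how such a function could work is shown here. It is an assumption about the behavior described in this document, not the project's actual implementation: the default port of 27017, the comma-separated host handling, and the `FALLBACK` and `DEFAULT_PORT` names are illustrative, and IPv6 hosts are not handled.

```python
import html
from urllib.parse import urlsplit

FALLBACK = "[Connection String Provided]"  # shown when the URI cannot be parsed
DEFAULT_PORT = 27017                       # assumed default when no port is given


def sanitize_for_display(connection_string: str) -> str:
    """Return host(s) and database only, with credentials removed."""
    try:
        parts = urlsplit(connection_string)
    except ValueError:
        return FALLBACK
    if parts.scheme not in ("mongodb", "mongodb+srv") or not parts.netloc:
        return FALLBACK

    # Everything before the last '@' is userinfo (username:password); drop it.
    host_part = parts.netloc.rpartition("@")[2]
    hosts = []
    for host in host_part.split(","):
        if not host:
            return FALLBACK
        if ":" not in host:
            host = f"{host}:{DEFAULT_PORT}"
        hosts.append(host)

    display = ", ".join(hosts)
    database = parts.path.lstrip("/")
    if database:
        display += f" (database: {database})"
    # Escape HTML special characters before the value reaches a template.
    return html.escape(display)
```

Only the host list and database name ever reach the return value, so the username and password cannot leak into the rendered page even if a later template forgets to escape its input.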
Example:
# Input
connection_string = "mongodb+srv://user:password@cluster.mongodb.net/mydb"
# Output
sanitized = "cluster.mongodb.net:27017 (database: mydb)"
Implementation:
- Parses the connection string to extract hosts and database
- Escapes HTML special characters
- Returns only non-sensitive information
- Returns `"[Connection String Provided]"` if parsing fails
## Error Handling
The application provides clear error messages for common issues:
### Invalid URI Format
Error Title: "Invalid Connection String"
Error Message: "The connection string format is invalid. Please check your MongoDB connection string and try again."
Common causes:
- Incorrect URI scheme
- Missing required components
- Invalid characters in URI
### Connection Failed
Error Title: "Connection Failed"
Error Message: "Could not connect to MongoDB. Please verify your credentials, network connectivity, and that the cluster is accessible."
Common causes:
- Incorrect username or password
- Network connectivity issues
- Firewall blocking connection
- MongoDB server not running
- Incorrect host or port
### Unexpected Error
Error Title: "Connection Error"
Error Message: "An unexpected error occurred. Please try again."
Common causes:
- Timeout issues
- DNS resolution failures
- Unexpected server responses
## Logging
All connection attempts and errors are logged to insights.log:
logger.error(f"Invalid connection string format: {e}")
logger.error(f"Failed to connect: {e}")
logger.error(f"Unexpected error during connection validation: {e}")
Note: Connection strings with credentials are not logged to prevent credential exposure.
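Because the logged messages interpolate the exception text, one way to back up this guarantee is to redact any URI credentials before a message is written. The sketch below assumes that an exception message could echo the connection string; `redact_credentials` is a hypothetical helper, not something the project is known to contain.

```python
import re

# Matches the userinfo part of a MongoDB URI: everything between the scheme
# and the last '@' that appears before the first '/'.
_CREDENTIALS = re.compile(r"(mongodb(?:\+srv)?://)[^/\s]*@")


def redact_credentials(text: str) -> str:
    """Replace any embedded MongoDB credentials in text with '***'."""
    return _CREDENTIALS.sub(r"\1***@", text)
```

A call such as `logger.error("Failed to connect: %s", redact_credentials(str(e)))` would then keep the host visible for debugging while hiding the username and password.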
## Security Considerations
### Credential Protection
- Never displayed: Credentials are always removed before displaying connection information
- Not logged: Connection strings with passwords are never written to logs
- Sanitized output: Only host, port, and database name are shown in the UI
### HTTPS Recommended
For production deployments, always use HTTPS to protect connection strings in transit. See HTTPS_SETUP.md for setup instructions.
### Secure Cookies
Enable secure cookies when using HTTPS:
MI_SECURE_COOKIES=true
This ensures session cookies are only transmitted over encrypted connections.
## Connection String Best Practices
### MongoDB Atlas
Use the SRV connection string format:
mongodb+srv://username:password@cluster.mongodb.net/
### Credentials in Environment Variables
For production, store the connection string in an environment variable:
export MI_CONNECTION_STRING="mongodb+srv://user:pass@cluster.mongodb.net/"
python3 mongosync_insights.py
This prevents credentials from being entered through the web UI.
### URL Encoding
Special characters in passwords must be URL-encoded:
- `@` becomes `%40`
- `:` becomes `%3A`
- `/` becomes `%2F`
- `?` becomes `%3F`
- `#` becomes `%23`
Example:
# Password: p@ss:word
mongodb://user:p%40ss%3Aword@cluster.mongodb.net/
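The encoding does not have to be done by hand. As a small sketch, Python's standard `urllib.parse.quote_plus` produces the escapes listed above; `build_uri` is a hypothetical helper used only for illustration.

```python
from urllib.parse import quote_plus


def build_uri(username: str, password: str, host: str) -> str:
    """Build a mongodb:// URI with percent-encoded credentials."""
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}/"


print(build_uri("user", "p@ss:word", "cluster.mongodb.net"))
# mongodb://user:p%40ss%3Aword@cluster.mongodb.net/
```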
## Troubleshooting
### "Invalid Connection String" Error
- Check the URI format starts with `mongodb://` or `mongodb+srv://`
- Verify all components are properly formatted
- Ensure special characters in password are URL-encoded
- Check for typos in the connection string
### "Connection Failed" Error
- Verify credentials are correct
- Check network connectivity to MongoDB server
- Ensure MongoDB server is running
- Verify firewall allows outbound connections on MongoDB port
- For Atlas, ensure IP address is whitelisted
### Connection Hangs
- Check for network timeouts (default: 5 seconds)
- Verify DNS resolution for hostname
- Ensure no proxy blocking MongoDB traffic
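To separate network problems from MongoDB-level problems, a plain TCP check against the host and port can help. The sketch below uses only the standard library; `can_reach` is a hypothetical helper, and its 5-second default merely mirrors the timeout mentioned above. It does not apply to `mongodb+srv://` names, which resolve through DNS SRV records rather than a single host and port.

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

If this returns `False`, the cause is likely DNS, a firewall, or a proxy; if it returns `True` but the application still hangs, the problem is more likely in authentication or server selection.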
## Support
For connection issues:
- Check logs: `insights.log`
- Verify connection string format
- Test connection using MongoDB shell or Compass
- Review MongoDB server logs for authentication failures
migration/mongosync_insights/migration_verifier.py:606
* This conditional expression has a logic error. The ternary operator at the end `if "dst:" in details_str else False` is applied to the entire expression rather than just the split operation. This causes the expression to evaluate to either the result of the complex check or `False`, which is then used in an `if` statement. The intended logic appears to be checking if "unique" is not in the second part after splitting by "dst:", but the current structure is confusing and potentially incorrect. Consider refactoring for clarity:
```python
has_dst = "dst:" in details_str
if "unique\": true" in details_str:
    if has_dst and "unique" not in details_str.split("dst:")[1][:50]:
        coll_details.append(f"Index '{idx_name}': unique constraint missing on {cluster}")
    else:
        coll_details.append(f"Index '{idx_name}' ({field_type}): property mismatch - {cluster}")
```
if "unique\": true" in details_str and "unique" not in details_str.split("dst:")[1] if "dst:" in details_str else False:
    coll_details.append(f"Index '{idx_name}': unique constraint missing on {cluster}")
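The grouping described in this comment can be checked with Python's `ast` module. The sketch below parses a simplified stand-in for the flagged condition; `a`, `b`, and `c` are placeholder names for the three sub-expressions, not the real operands.

```python
import ast

# Simplified stand-in for the flagged condition: A and B if C else False
expr = ast.parse("a and b if c else False", mode="eval").body

# The outermost node is the conditional expression, so `a and b` as a whole
# is its "true" branch: (a and b) if c else False.
print(type(expr).__name__)       # IfExp
print(type(expr.body).__name__)  # BoolOp
print(expr.test.id)              # c
```

Because the conditional expression binds more loosely than `and`, the `"dst:"` test is evaluated first and guards the whole `and` chain, which is why a named `has_dst` variable makes the intent easier to read.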
migration/mongosync_insights/mongosync_insights.py:339
- The session data is retrieved twice in this function. Lines 337-338 duplicate the session retrieval already performed on lines 328-329. This is inefficient and could lead to inconsistency if the session changes between calls (though unlikely with the current timeout). Consider retrieving the session once and reusing it:
session_id = request.cookies.get(SESSION_COOKIE_NAME)
session_data = session_store.get_session(session_id)
connection_string = session_data.get('verifier_connection_string')
db_name = session_data.get('verifier_db_name', 'migration_verification_metadata') # Get database name from session
session_id = request.cookies.get(SESSION_COOKIE_NAME)
session_data = session_store.get_session(session_id)
db_name = session_data.get('verifier_db_name', 'migration_verification_metadata')
migration/mongosync_insights/templates/migration_verifier.py:723
- The file `migration_verifier.py` is duplicated in both the root `migration/mongosync_insights/` directory and the `migration/mongosync_insights/templates/` directory. This creates code duplication and maintenance issues. The `templates/` directory should typically only contain HTML template files, not Python source code. Consider removing the duplicate and keeping the Python source file only in the main directory.
migration/mongosync_insights/templates/mongosync_plot_utils.py:44
- Multiple Python source files (`mongosync_plot_utils.py`, `mongosync_plot_logs.py`, `mongosync_insights.py`, `file_decompressor.py`, `connection_validator.py`, `app_config.py`) are located in the `templates/` directory. The `templates/` directory should contain only template files (HTML). All Python source code should be moved to the parent directory or an appropriate subdirectory. This violates standard project structure conventions.
migration/mongosync_insights/templates/requirements.txt:31
- The `requirements.txt` file should not be placed in the `templates/` directory. This file belongs in the project root or the main application directory (`migration/mongosync_insights/`). Having it in `templates/` violates standard Python project structure.
This PR adds migration verifier-related changes that allow data verification when the embedded verifier is not in use and mongosync is running. They also help with data verification when source and sync connectors between clusters are used in a rollback strategy.
It shows the first page.
This is the view while the initial sync runs.
During the recheck, the namespace colour shown at the bottom reflects the final check.
Once it is green, the data in the source and destination match.
