forked from timothygrant80/cisTEM
Create Template Matching Reproducibility Test #3
Open: Copilot wants to merge 21 commits into master from copilot/fix-2
21 commits
d478435 Initial plan for issue (Copilot)
931fcaa Create template matching reproducibility test script (Copilot)
a8ca406 Improve error handling and documentation in template matching test (Copilot)
1a3b456 Optimize template matching reproducibility test (Copilot)
6548b4a Add --fast_fft flag to template matching tests (Copilot)
9ac2680 Implement temp directory management and update FastFFT help text (Copilot)
b3c7f60 Add threshold extraction and relative error reporting to template mat… (Copilot)
dde8ba1 Fix threshold extraction regex pattern in template matching test (Copilot)
327a37a Address PR feedback: improve threshold printing, temp dir management,… (Copilot)
0556a95 Implement PR review comments for template matching test (Copilot)
3250509 Implement template matching test improvements (Copilot)
dcc9c5e Remove binning CLI flag and add as default config parameter (Copilot)
51450c6 Improved Python module imports to replace annoying_hack.py (Copilot)
e57c12b Modify template matching test to run with multiple binning values (Copilot)
eb0b6a0 Create reusable image replicate analysis tools (Copilot)
9004eb5 Modify test_template_reproducibility.py to add fast development mode (Copilot)
baf1bce Update template matching test to provide bash commands for analysis (Copilot)
a5a12c0 Add --max-threads CLI arg and fix threshold capitalization (Copilot)
2ebdcd4 Implement CLI vs default config comparison and remove problematic file (Copilot)
59ae785 Simplify CLI argument comparison in args.py (Copilot)
0c03161 Remove Python package related files and instructions (Copilot)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| # cisTEM Testing Tools | ||
| | ||
| This directory contains test scripts and utilities for testing cisTEM functionality. | ||
| | ||
| ## Running Tests | ||
| | ||
| Each test can be run individually from its respective directory. For example: | ||
| | ||
| ```bash | ||
| # Run template matching reproducibility test | ||
| python /path/to/cisTEM/scripts/testing/programs/match_template/test_template_reproducibility.py --binary-path /path/to/binaries | ||
| ``` | ||
| | ||
| ## Temporary Directory Management | ||
| | ||
| Test scripts that create temporary files use a centralized tracking system to make cleanup easier. | ||
| You can list and clean up temporary directories using the following options: | ||
| | ||
| - `--list-temp-dirs`: List all tracked temporary directories | ||
| - `--rm-temp-dir INDEX`: Remove a specific temporary directory by index | ||
| - `--rm-all-temp-dirs`: Remove all tracked temporary directories |
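The tracking system behind these options is not shown in this part of the diff. As a rough illustration only, the sketch below shows one way such a tracker could work: it records each temporary directory in a JSON list on disk and offers list, remove-by-index, and remove-all operations. The class name `TempDirTracker`, the JSON file layout, the `cistem_test_` prefix, and the 0-based index are all assumptions made for this example, not the PR's actual `temp_dir_manager` API.

```python
import json
import shutil
import tempfile
from pathlib import Path


class TempDirTracker:
    """Hypothetical tracker that records temp directories in a JSON list on disk."""

    def __init__(self, tracking_file):
        self.tracking_file = Path(tracking_file)

    def _read(self) -> list:
        # A missing tracking file simply means nothing is tracked yet.
        if not self.tracking_file.exists():
            return []
        return json.loads(self.tracking_file.read_text())

    def _write(self, dirs: list) -> None:
        self.tracking_file.write_text(json.dumps(dirs, indent=2))

    def create(self, prefix: str = "cistem_test_") -> str:
        """Create a temp directory and append its path to the tracking file."""
        path = tempfile.mkdtemp(prefix=prefix)
        dirs = self._read()
        dirs.append(path)
        self._write(dirs)
        return path

    def list_dirs(self) -> list:
        """Return all tracked directories, in creation order."""
        return self._read()

    def remove(self, index: int) -> None:
        """Remove one tracked directory by 0-based index (assumed convention)."""
        dirs = self._read()
        path = dirs.pop(index)  # raises IndexError for an unknown index
        shutil.rmtree(path, ignore_errors=True)
        self._write(dirs)

    def remove_all(self) -> None:
        """Remove every tracked directory and reset the tracking file."""
        for path in self._read():
            shutil.rmtree(path, ignore_errors=True)
        self._write([])
```

Keeping the list in a file rather than in memory is what would let a later invocation of a test script find and clean up directories left behind by an earlier run.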
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| # cisTEM testing utilities package | ||
| # Import commonly used modules to make them easily accessible | ||
| from . import args | ||
| from . import make_tmp_runfile | ||
| from . import run_job | ||
| from . import temp_dir_manager | ||
| from . import threshold_utils | ||
| from . import image_replicate_analysis |
scripts/testing/programs/cistem_test_utils/image_replicate_analysis.py (182 changes: 182 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,182 @@ | ||
| """ | ||
| Image Replicate Analysis utilities for cisTEM tests. | ||
| | ||
| This module provides a class for comparing multiple replicate MRC images and calculating | ||
| similarity metrics between them. | ||
| """ | ||
| | ||
| import numpy as np | ||
| import mrcfile | ||
| import os | ||
| from typing import List, Dict, Tuple, Optional, Union | ||
| | ||
| | ||
| class ImageReplicateAnalysis: | ||
|     """Class for analyzing replicate MRC images and calculating similarity metrics.""" | ||
| | ||
|     def __init__(self, image_filenames: List[str], threshold_value: Optional[float] = None): | ||
|         """ | ||
|         Initialize the ImageReplicateAnalysis with a list of image filenames and threshold value. | ||
| | ||
|         Args: | ||
|             image_filenames: List of MRC image files to analyze | ||
|             threshold_value: Optional threshold value for relative error calculations (must be greater than 0 and at most 100) | ||
| | ||
|         Raises: | ||
|             ValueError: If threshold_value is provided and is not greater than 0 and at most 100 | ||
|             ValueError: If fewer than 2 image filenames are provided | ||
|         """ | ||
|         if len(image_filenames) < 2: | ||
|             raise ValueError("At least 2 image filenames are required for comparison") | ||
| | ||
|         self.image_filenames = image_filenames | ||
| | ||
|         # Validate threshold value if provided | ||
|         if threshold_value is not None: | ||
|             if not isinstance(threshold_value, (int, float)) or threshold_value <= 0 or threshold_value > 100: | ||
|                 raise ValueError("Threshold value must be a positive number between 0 and 100") | ||
| | ||
|         self.threshold_value = threshold_value | ||
|         self.image_data = [] | ||
|         self.image_shapes = [] | ||
|         self.image_dtypes = [] | ||
| | ||
|     def load_images(self) -> bool: | ||
|         """ | ||
|         Load all image files and verify they have the same dimensions. | ||
| | ||
|         Returns: | ||
|             bool: True if all images were loaded successfully with matching dimensions | ||
| | ||
|         Raises: | ||
|             FileNotFoundError: If any image file cannot be found | ||
|             IOError: If an image file exists but cannot be read | ||
|             ValueError: If image dimensions do not match | ||
|         """ | ||
|         self.image_data = [] | ||
|         self.image_shapes = [] | ||
|         self.image_dtypes = [] | ||
| | ||
|         # Load all images | ||
|         for filename in self.image_filenames: | ||
|             if not os.path.exists(filename): | ||
|                 raise FileNotFoundError(f"Image file not found: {filename}") | ||
| | ||
|             try: | ||
|                 with mrcfile.open(filename) as mrc: | ||
|                     self.image_data.append(mrc.data) | ||
|                     self.image_shapes.append(mrc.data.shape) | ||
|                     self.image_dtypes.append(mrc.data.dtype) | ||
|             except Exception as e: | ||
|                 # Chain the original exception so the underlying cause stays visible | ||
|                 raise IOError(f"Error loading {filename}: {str(e)}") from e | ||
| | ||
|         # Check that all images have the same dimensions | ||
|         if len(set(str(shape) for shape in self.image_shapes)) > 1: | ||
|             raise ValueError(f"Image dimensions do not match: {self.image_shapes}") | ||
| | ||
|         return True | ||
| | ||
|     def analyze_replicates(self) -> Dict: | ||
|         """ | ||
|         Analyze all replicate images and calculate similarity metrics. | ||
| | ||
|         Returns: | ||
|             Dict: Dictionary containing pairwise and overall similarity metrics | ||
|         """ | ||
|         if not self.image_data: | ||
|             self.load_images() | ||
| | ||
|         num_replicates = len(self.image_data) | ||
| | ||
|         # Generate all pairwise comparisons | ||
|         pairs = [(i, j) for i in range(num_replicates) for j in range(i+1, num_replicates)] | ||
| | ||
|         results = { | ||
|             "num_replicates": num_replicates, | ||
|             "threshold_value": self.threshold_value, | ||
|             "pairwise_comparisons": [], | ||
|             "overall": {} | ||
|         } | ||
| | ||
|         all_mean_abs_diffs = [] | ||
| | ||
|         # Calculate metrics for each pair of images | ||
|         for i, j in pairs: | ||
|             try: | ||
|                 # Calculate mean absolute difference in float64 so that | ||
|                 # integer-typed image data cannot wrap around on subtraction | ||
|                 diff = self.image_data[i].astype(np.float64) - self.image_data[j].astype(np.float64) | ||
|                 mean_abs_diff = np.mean(np.abs(diff)) | ||
|                 all_mean_abs_diffs.append(mean_abs_diff) | ||
| | ||
|                 # Calculate relative error if threshold value is available | ||
|                 if self.threshold_value and self.threshold_value > 0: | ||
|                     relative_error_ppm = (mean_abs_diff / self.threshold_value) * 1e6 # Parts per million | ||
|                 else: | ||
|                     relative_error_ppm = None | ||
| | ||
|                 comparison_result = { | ||
|                     "replicate_1": i + 1, | ||
|                     "replicate_2": j + 1, | ||
|                     "mean_abs_diff": mean_abs_diff, | ||
|                     "relative_error_ppm": relative_error_ppm | ||
|                 } | ||
| | ||
|                 results["pairwise_comparisons"].append(comparison_result) | ||
| | ||
|             except Exception as e: | ||
|                 print(f"Error comparing replicates {i+1} and {j+1}: {str(e)}") | ||
| | ||
|         # Calculate overall metrics across all comparisons | ||
|         if all_mean_abs_diffs: | ||
|             results["overall"]["mean_abs_diff_avg"] = np.mean(all_mean_abs_diffs) | ||
|             results["overall"]["mean_abs_diff_min"] = np.min(all_mean_abs_diffs) | ||
|             results["overall"]["mean_abs_diff_max"] = np.max(all_mean_abs_diffs) | ||
| | ||
|             # Calculate average relative error if threshold is available | ||
|             if self.threshold_value and self.threshold_value > 0: | ||
|                 results["overall"]["relative_error_ppm_avg"] = (np.mean(all_mean_abs_diffs) / self.threshold_value) * 1e6 | ||
|                 results["overall"]["relative_error_ppm_min"] = (np.min(all_mean_abs_diffs) / self.threshold_value) * 1e6 | ||
|                 results["overall"]["relative_error_ppm_max"] = (np.max(all_mean_abs_diffs) / self.threshold_value) * 1e6 | ||
| | ||
|         return results | ||
| | ||
|     def print_analysis(self, results: Optional[Dict] = None) -> None: | ||
|         """ | ||
|         Print the replicate analysis results in a formatted way. | ||
| | ||
|         Args: | ||
|             results: Optional results dictionary from analyze_replicates(). | ||
|                 If None, will run analyze_replicates() internally. | ||
|         """ | ||
|         if results is None: | ||
|             results = self.analyze_replicates() | ||
| | ||
|         num_replicates = results["num_replicates"] | ||
|         threshold_value = results["threshold_value"] | ||
| | ||
|         print("\nReproducibility Analysis:") | ||
|         print("========================") | ||
|         print(f"Number of replicates analyzed: {num_replicates}") | ||
| | ||
|         if threshold_value is not None: | ||
|             print(f"Threshold value: {threshold_value:.3f}") | ||
| | ||
|         # Print pairwise comparisons | ||
|         for comparison in results["pairwise_comparisons"]: | ||
|             i = comparison["replicate_1"] | ||
|             j = comparison["replicate_2"] | ||
|             mean_abs_diff = comparison["mean_abs_diff"] | ||
|             relative_error_ppm = comparison["relative_error_ppm"] | ||
| | ||
|             print(f"\nComparing replicate {i} vs {j}:") | ||
|             print(f" Mean absolute difference: {mean_abs_diff:.6f}") | ||
| | ||
|             if relative_error_ppm is not None: | ||
|                 print(f" Relative error: {relative_error_ppm:.2f} ppm (relative to threshold value: {threshold_value:.3f})") | ||
| | ||
|         # Print overall metrics | ||
|         if "overall" in results and results["overall"]: | ||
|             print("\nOverall reproducibility:") | ||
|             print(f" Mean absolute diff (avg): {results['overall']['mean_abs_diff_avg']:.6f}") | ||
| | ||
|             if threshold_value is not None: | ||
|                 print(f" Relative error (avg): {results['overall']['relative_error_ppm_avg']:.2f} ppm " | ||
|                       f"(relative to threshold value: {threshold_value:.3f})") |
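The metric reported by this module is the mean absolute difference between two replicates, divided by the threshold value and scaled to parts per million. The sketch below reproduces that arithmetic on small synthetic arrays, so it runs without `mrcfile` or any MRC files. The function name `pairwise_relative_error_ppm`, the array values, and the threshold of 7.5 are made up for illustration; this is a standalone approximation of the calculation, not the class from the diff.

```python
from itertools import combinations

import numpy as np


def pairwise_relative_error_ppm(images, threshold_value):
    """Return {(i, j): (mean_abs_diff, ppm)} for every replicate pair (1-based)."""
    results = {}
    for (i, a), (j, b) in combinations(enumerate(images, start=1), 2):
        # Work in float64 so integer-typed images cannot wrap around.
        diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
        mean_abs_diff = float(diff.mean())
        # Relative error in parts per million of the threshold value.
        ppm = mean_abs_diff / threshold_value * 1e6
        results[(i, j)] = (mean_abs_diff, ppm)
    return results


# Synthetic stand-ins for three replicate outputs (hypothetical values).
base = np.zeros((4, 4), dtype=np.float32)
replicates = [base, base + 1e-5, base + 3e-5]

metrics = pairwise_relative_error_ppm(replicates, threshold_value=7.5)
for pair, (mad, ppm) in metrics.items():
    print(pair, f"mean abs diff = {mad:.6f}", f"relative error = {ppm:.2f} ppm")
```

Expressing the difference relative to the threshold puts the noise between replicates on the same scale as the value used to accept or reject a detection, which is easier to judge than a raw absolute difference.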
While you are handling string vs float in the following lines, you are already doing a comparison here. So either this will break, or the subsequent comparison is overkill.
Let's just get rid of the extra checks under the comment "Handle string vs numeric comparison". Instead, do keep the check on None and not None.
I've simplified the comparison logic in 59ae785 by removing the redundant type checking code and keeping only the check for None values, as you suggested.
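The `args.py` code under discussion is not included in this part of the diff. As a minimal sketch of the simplification described, the example below compares a CLI value against a config default using only a None check followed by plain equality, with no string-versus-numeric handling. The names `differs_from_default` and `find_overrides`, and the sample keys, are hypothetical and chosen only for this illustration.

```python
def differs_from_default(cli_value, default_value) -> bool:
    """Hypothetical helper: True when a CLI value differs from the config default."""
    # One side unset and the other set always counts as a difference.
    if (cli_value is None) != (default_value is None):
        return True
    # Both None or both set: plain equality, no type coercion.
    return cli_value != default_value


def find_overrides(cli_args: dict, defaults: dict) -> dict:
    """Map each differing key to a (default, cli_value) pair."""
    return {
        key: (defaults.get(key), value)
        for key, value in cli_args.items()
        if differs_from_default(value, defaults.get(key))
    }


# Hypothetical argument sets for illustration.
cli_args = {"binning": 2, "max_threads": None, "fast_fft": True}
defaults = {"binning": 1, "max_threads": None, "fast_fft": True}
print(find_overrides(cli_args, defaults))
```

This relies on the argument parser having already converted values to their final types, so both sides of the comparison are the same type by the time they are compared.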