Conversation
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
|
I tested this locally and it generally looks okay. It turns out there is a new argument that is needed to run the binary. This will require a few files to be modified. We need to add a new CLI flag similar to --gpu_idx or --cpu, this one should be --fast_fft. The default value should be true in case the user does not specify. in make_tmp_runfile.py the definition for match_template will now need to check to see if --fast_fft is configured true. If it is, it will also need to make sure that use_gpu is true, otherwise throw an error. The value of use_fast_fft should be set to a string "yes" if --fast_fft is true and "no" otherwise. This should be added to the run command between the existing use_gpu and max_threads variable. Please check other related files to see if anything needs to be modified to properly configure the --fast_fft option. |
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've implemented the requested changes. I added a new |
bHimes
left a comment
There was a problem hiding this comment.
Nicely done, two small requests.
-
In the help strings, please change the library name to FastFFT. My fault. The cli flags and variable names are okay as is.
-
I would like to make cleanup a bit easier. Even though the files will be deleted on reboot when /tmp is cleared, it would be great to add the following functionality to all the test programs.
- When a temporary directory is created lets keep a log of that information in the current working dir in a hidden file, including the path, the time and date of creation and a unique key in the log that can be used to later delete that directory.
- Add an option to check the list of existing temp dirs
- Add an option to rm the temp dir (by number) or to remove all temp dirs
- The log file should be updated to remove the entry. Something similar to git stash
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've implemented both of your requests in commit 9ac2680:
The system logs the path, creation time, and a unique key for each temporary directory, and provides commands to manage them easily. |
bHimes
left a comment
There was a problem hiding this comment.
That all is great. I thought of another thing.
The binary that is being run produces several images, the mip is the one we are using. It also makes a textfile that has a histogram in it.
The first line is a comment that contains a threshold value that is calculated during the search. It will be the same for all replicates. I would like to retrieve that threshold value, print it as part of the error analysis, and in addition to the current mean absolute error, report the relative error between the MAD and the threshold value. This may be in scientific notation or in parts per million as it should be very small.
…ching test Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've implemented the requested changes in commit b3c7f60. The test now:
The relative error metrics provide a more meaningful measure of the variation between replicates relative to the magnitude of the threshold value itself. |
|
|
||
| # Use regex to extract the threshold value from the line | ||
| # Looking for a pattern like "# Expected max is: 0.12345" | ||
| match = re.search(r'# Expected max is: ([\d.e+-]+)', first_line) |
There was a problem hiding this comment.
I think this line is breaking things. The pattern is actually like this code block
# Expected threshold = 6.90
So we just need the value after the equals sign. Normally I would do this in something simple like awk, but I'm not sure what is best in python. I've also used numpy get text in python before.
There was a problem hiding this comment.
I've fixed the threshold extraction regex pattern to correctly match the format you described. The pattern was updated to # Expected threshold = ([\d.e+-]+) which properly extracts the value after the equals sign. The fix can be found in commit dde8ba1.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
There was a problem hiding this comment.
That fixed the regex, let's clean up some formatting though.
- Since the threshold is the same for all replicates, let's only print it once, and do that before the line starting with "Completed replicate" for replicate 1
- The threshold value does not need to be re-printed just prior to the reproducibility analysis
- the cli option --binary_path is currently listed as optional, however, when I try to run with just --list-temp-dirs the script fails. We should make --binary_path optional when any of the Temporary Directory Management options are provided.
- Let's also tidy up the cli options, where some are seperated by "" and some by "-". Lets use the "-" rather than the ""
EDIT: them quotes that appear empty should have an underscore, I goofed up the markdown syntax
bHimes
left a comment
There was a problem hiding this comment.
Good, that made most of the changes I wanted.
Let's modify test_template_reproducibility to have a define called FAST_DEVELOPMENT_CONDITIONS. This should be set to true for now. When true, let's only use the binning at 1.2 and not the other two binning. Lets also set the angular step to 5 rather than 3. Both of these will allow us to iterate development faster. When we are done, we should be able to just change FAST_DEVELOPMENT_CONDITIONS to false, reverting to the current angular sampling of 3 and checking all three binning values.
The other change I would like to see is to NOT instantiate or run the reproducibility analysis from within test_template_reproducibility. Instead I want this program to print out a suggested command the user can run after test_template_reproducibility completes on the CLI to fetch the list of mip image names just produced to then use as later input to image_replicate_analysis.
We should also print a suggested awk command the user can copy and past that fetches the histogram value.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've implemented the requested changes in commit 9004eb5. Added a |
| traceback.print_exc() | ||
| return 1 | ||
| # Write MIP filenames to a file in the binning directory | ||
| mip_list_file = join(binning_dir, "mip_files.txt") |
There was a problem hiding this comment.
I don't want to actually create the list here. I want to provide an example bash command that the user could list the files using a wild card and save to the text file list. This way, instead of just analyzing the mip file, we could look at theta for example.
There was a problem hiding this comment.
I've updated the script in baf1bce to provide bash commands for file analysis instead of creating the file directly. Now it suggests commands using find with wildcards that users can adapt for different file types (like 'mip.mrc' or 'theta.mrc'). This allows for flexibility in analyzing various image types produced by the test.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
bHimes
left a comment
There was a problem hiding this comment.
In addition to fixing the capitalization error, lets make the following two changes.
- Add a CLI arg --max-threads that should alter the corresponding config parameter that currently defaults to 2.
- After parsing the command line args, lets add a line that prints out any config parameters we have changed from default values via the CLI as well as the configured max_threads.
| print(f"python3 scripts/testing/programs/image_replicate_analysis.py --image-list {mip_list_file} --threshold {threshold_value:.3f}") | ||
|
|
||
| print("\nOr to extract the threshold value directly from histogram files:") | ||
| print(f"threshold=$(awk '/Threshold/{{print $NF; exit}}' {hist_filenames[0]})") |
There was a problem hiding this comment.
The search word "Threshold" should be "threshold"
There was a problem hiding this comment.
Fixed the capitalization in commit a5a12c0. Changed "Threshold" to "threshold" in the awk command.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've implemented the first two requested features in commit a5a12c0:
For the follow-up PR, I can implement the additional features you mentioned in the partial comment.
Implemented both requested changes in commit a5a12c0:
|
bHimes
left a comment
There was a problem hiding this comment.
In line 64 of args.py we loop over args_to_check which are set when parsing the CLI args.
Before setting the config value for that arg as it is done in line 65, I want to compare the value from args_to_check to the default value already set in config. Only if we have set something other than the default value, print out a statment to the effect of:
User has set CLI arg to value val. Changing from default defaultval
Where the values in italics should be replaced with the listed values.
Be careful of str vs numeric comparisons.
| # Parse command-line arguments | ||
| args = tmArgs.parse_TM_args(wanted_binary_name) | ||
|
|
||
| # Print non-default configuration parameters |
There was a problem hiding this comment.
This is not what I intended, please remove this block. I will add a comment in the file where things should have been changed.
There was a problem hiding this comment.
I've removed the print block as requested in commit 2ebdcd4. The configuration parameter printing will now happen in the args.py file when CLI arguments differ from defaults.
| @@ -0,0 +1 @@ | |||
| [] No newline at end of file | |||
There was a problem hiding this comment.
I've deleted the .cistem_temp_dirs.log file in commit 2ebdcd4.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've implemented the config parameter comparison in commit 2ebdcd4. The code now compares each CLI argument against its default value in the config and prints a message when there's a difference, properly handling string vs. numeric comparisons. |
| arg_val_value = getattr(args, arg_val) | ||
|
|
||
| # Compare with default value if it exists in config and the type matches | ||
| if arg_val in config and arg_val_value != default_val: |
There was a problem hiding this comment.
While you are handling string vs float in the following lines, you are already doing a comparison here. So either this will break, or the subsequent comparison is overkill.
Let's just get rid of the extra checks under the commend "Handle string vs numeric comparison". Instead, do keep the check on None and not None.
There was a problem hiding this comment.
I've simplified the comparison logic in 59ae785 by removing the redundant type checking code and keeping only the check for None values, as you suggested.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
bHimes
left a comment
There was a problem hiding this comment.
Thanks that looks great. I noticed that a while ago, I asked you to revert the changes implemented to try to get rid of annoying_hack.py. These included adding cistem_test_utils.egg-info among other related changes.
It looks like not everything from that commit has been reverted. Since it looks like we are still using annoying_hack.py, I would like to get rid of any pythong package related files and code here.
If I have misunderstood, and we are actively using something like cistem_path_setup.py please correct me.
Also, in the README we have added instructions for installing mrcfile and a few other modules using PIP. It is safe to assume that the end user has these installed in the developmental container. The instructions to add them were just to be for COPILOT AGENT not for the end user.
Co-authored-by: bHimes <3077528+bHimes@users.noreply.github.com>
I've removed all the Python package related files and directories in commit 0c03161. This includes:
The code is now back to using annoying_hack.py for imports, and all package-related files introduced in the previous commit have been removed.
I've implemented the requested image replicate analysis changes in commit eb0b6a0. This includes:
|
Overview
This PR adds a new test that verifies the reproducibility of the template matching functionality in cisTEM. The test runs the template matching GPU binary 3 times on the same input data and compares the resulting Maximum Intensity Projection (MIP) images to measure consistency between runs.
Implementation Details
The new test script
/scripts/testing/programs/match_template/test_template_reproducibility.py:match_template_gpubinary on the Apoferritin dataset 3 timesmrcfilePython packageExample Output
The test produces output detailing the similarity between replicate runs:
Usage
To run the test:
cd /path/to/cisTEM python3 scripts/testing/programs/match_template/test_template_reproducibility.py --binary_path /path/to/binariesThe test requires:
match_template_gpuandmake_template_result)mrcfilePython package (pip install mrcfile)Fixes #2.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.