Stuff to clone onto Amazon AWS SageMaker instances for different models: different notebooks with different models on different platforms (TF, Torch).
You might want to go down to the discussion of models, Jupyter Notebooks, drafts of papers to publish, etc. For now, though, you'd have to wait for me to write that part. A good place to look until I get that done is my repo for rib-wrist-in-bin-din, one of the funding-attracting names I've come up with, standing for Reused Information Bearing Writing Surface Traces in Bin-Din-gs. Especially interesting (though very drafty) is a Google Drive hosted draft of the more-technical paper: for now, a stream-of-consciousness discussion that incorporates my literature search, the types of models I plan to use or have already scaffolded, and some goals for classification metrics. Something that goes even more into the "why" of the project, discussing how many documents I want to search through to find these Reused Manuscript Fragments in Bindings (RMFBs) and giving visualizations and references for some of the Deep Learning architectures I want to use, is a Jupyter notebook with some of my vision and the plans for my baseline (a Vanilla CNN for classification on the CIFAR-10 dataset).
For now, I'm putting in READMEs for the models to be used, analyzed, and perhaps voted on by a much later ensemble model. They're somewhat in the order of complexity, but don't quote me on that.
Just doing some review and getting something running on Google Colab, specifically one of what I believe are exact copies (differing only in whether output is included) in my rib-wrist-in-bin-din repository, which are best viewed using the big button at the top: GO TO THIS ONE → Paper_Code_Prep_01.ipynb ← GO TO THAT ONE! and Paper_Code_Prep_01_-_no_output.ipynb. The two should be the same as each other and the same as the notebook in this repo. The one here in this repo with a similar name doesn't have output, and is thus not recommended; I'm not even going to link it. Go for the first link under this zeroth step.
- Timebox: 30–45 min, stop if any single blocker >15 min.
- Reuse-only: Minimal glue only; no net-new features.
- Deliverables: printed `[DONE] test_acc=…` plus two artifacts in `outputs/`: `outputs/test_summary_seed137_<ts>.json` and `outputs/csv_logs/train_history_seed137_<ts>.csv`
- The file `requirements_vanilla_cnn.yml` for the `conda` environment
- Conda env: `vanilla_cnn` (kernel registered, or pass `-k`)
- Repo: `~/my_repos_dwb/fhtw-paper-code-prep/`
- Scripts: `structure.sh`, `bin/start_cifar_lab.sh`
- Run once per machine: `python ~/my_repos_dwb/fhtw-paper-code-prep/verify_env.py`
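I'm not reproducing `verify_env.py` here; purely as a hypothetical illustration (not the repo's actual code) of the kind of one-time sanity checks such a script might run:

```python
# Hypothetical sketch only -- the real checks live in verify_env.py in the repo.
import importlib
import os
import sys

def check_env():
    print("python:", sys.version.split()[0])
    print("conda env:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))
    for mod in ("tensorflow", "numpy", "pandas"):
        try:
            m = importlib.import_module(mod)
            print(f"{mod}: {getattr(m, '__version__', '?')}")
        except ImportError:
            print(f"{mod}: MISSING")

if __name__ == "__main__":
    check_env()
```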
# 1) Scaffold tag
WITH_NB_STUBS=1 ./structure.sh test_project_bash p_03_e2e
# 2) Activate env & runtime vars
cd test_project_bash/p_03_e2e
conda activate vanilla_cnn
export OMP_NUM_THREADS=4
export TF_NUM_INTEROP_THREADS=2
export TF_NUM_INTRAOP_THREADS=4
export CUDA_VISIBLE_DEVICES=""
export TAGDIR="$(pwd)"
# (optional) seed project cache
[ -z "$(ls -A "$TAGDIR/datasets" 2>/dev/null)" ] && [ -d "$HOME/.keras/datasets" ] && cp -r "$HOME/.keras/datasets"/ "$TAGDIR/datasets"/# 3) Launch JupyterLab
cd ~/my_repos_dwb/fhtw-paper-code-prep
bin/start_cifar_lab.sh -p "$TAGDIR" -e vanilla_cnn -k
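# (Aside, and an assumption on my part rather than part of start_cifar_lab.sh: if the
#  kernel isn't registered and you don't want to rely on -k, the standard ipykernel
#  route is something like the line below, run with the vanilla_cnn env active.)
# python -m ipykernel install --user --name vanilla_cnn --display-name "Python (vanilla_cnn)"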
# open: test_project_bash/p_03_e2e/notebooks/02_training_p_03_e2e.ipynb

import os, sys
from pathlib import Path
tagdir = Path(os.environ.get("TAGDIR", Path.cwd().parent)).resolve()
os.environ["TAGDIR"] = str(tagdir)
(tagdir / "outputs" / "csv_logs").mkdir(parents=True, exist_ok=True)
for p in (tagdir, tagdir.parent):
sp = str(p)
if sp not in sys.path:
sys.path.insert(0, sp)
os.environ.setdefault("OMP_NUM_THREADS","4")
os.environ.setdefault("TF_NUM_INTEROP_THREADS","2")
os.environ.setdefault("TF_NUM_INTRAOP_THREADS","4")
os.environ.setdefault("CUDA_VISIBLE_DEVICES","")
print("TAGDIR =", tagdir)import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32")/255.0
x_test = x_test.astype("float32")/255.0
from tensorflow.keras import layers, models
m = models.Sequential([
layers.Input((32,32,3)),
layers.Conv2D(32,3,activation="relu"),
layers.Conv2D(32,3,activation="relu"),
layers.MaxPooling2D(),
layers.Conv2D(64,3,activation="relu"),
layers.Conv2D(64,3,activation="relu"),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation="relu"),
layers.Dense(10, activation="softmax"),
])
m.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
m.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1, verbose=1)
loss, acc = m.evaluate(x_test, y_test, verbose=0)
print(f"[DONE] test_acc={acc:.4f}")
from scripts.py_utils_p_03_e2e import log_test_summary # ensure this function exists
import os
log_test_summary(acc, loss=float(loss), seed=137, tagdir=os.environ["TAGDIR"])

- Mirror env `vanilla_cnn` on the instance or SageMaker Studio.
- Use tag: `aws_s3_cifar/transfer_try_1`
- Dataset on S3 (example): `s3://<bucket>/datasets/cifar10/`
- Minimal S3 download snippet (boto3) before loading:
import boto3, os
from pathlib import Path
s3 = boto3.client('s3')
cache = Path(os.environ.get('TAGDIR', '.'))/ 'datasets'
cache.mkdir(parents=True, exist_ok=True)
s3.download_file('<bucket>', 'datasets/cifar10/cifar-10-batches-py.tar.gz', str(cache/'cifar-10-batches-py.tar.gz'))
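# Optional, and an assumption on my part rather than something the repo requires:
# recent tf.keras caches CIFAR-10 as ~/.keras/datasets/cifar-10-batches-py.tar.gz, so
# copying the S3 download there should let cifar10.load_data() skip its own download.
# Verify the expected filename for your TF version before relying on this.
import shutil
keras_cache = Path.home() / '.keras' / 'datasets'
keras_cache.mkdir(parents=True, exist_ok=True)
shutil.copy(cache / 'cifar-10-batches-py.tar.gz', keras_cache / 'cifar-10-batches-py.tar.gz')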
# proceed with keras CIFAR-10 load or your own loader

python scripts/normalize_eol.py --root "$TAGDIR" --map "sh=lf,ps1=crlf,cmd=crlf,py=lf,ipynb=lf,md=lf"

Coming soon!
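While the write-up for that step is still to come, here is a purely hypothetical illustration of what a script with that command line appears intended to do (this is not the repo's actual `scripts/normalize_eol.py`): rewrite line endings per file extension according to the `--map` argument.

```python
# Hypothetical illustration only -- not the repo's normalize_eol.py.
import argparse
from pathlib import Path

EOL = {"lf": b"\n", "crlf": b"\r\n"}

def normalize(root: Path, mapping: dict) -> None:
    for ext, eol in mapping.items():
        for path in root.rglob(f"*.{ext}"):
            data = path.read_bytes().replace(b"\r\n", b"\n")   # normalize to LF first
            path.write_bytes(data.replace(b"\n", EOL[eol]))    # then apply the target EOL

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--root", type=Path, required=True)
    ap.add_argument("--map", required=True, help="e.g. sh=lf,ps1=crlf,py=lf")
    args = ap.parse_args()
    mapping = dict(pair.split("=") for pair in args.map.split(","))
    normalize(args.root, mapping)
```

Run against a scaffolded tag directory, a script like that would, e.g., force .ps1 and .cmd files to CRLF while keeping shell, Python, notebook, and markdown sources LF.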
identify \
-format "
\n\n%f\n%[magick] %[colorspace] %[type] \
%[extension] %[bit-depth] %[channels]\n
" \
*.tiff | tr -s ' '    # TIFF as example

I often run this with the output redirected to a file, as shown below.
An example of my most-used '0' command, then (using the aliases I detail below), is
ttdate && \
time \
identify -format "
\n\n%f\n%[magick] %[colorspace] %[type] \
%[extension] %[bit-depth] %[channels]\n" \
*.tiff | tr -s ' ' \
> informative_filename_$(ttdate).out \
&& ttdatechk

where the following two aliases have been created:
alias ttdate="date +'%s_%Y-%m-%dT%H%M%S%z'"
alias ttdatechk=' echo -e "$(ttdate)\nExitedNormally" '\
' || echo -e "$(ttdate)\nExitedNonZero" '\
'; ttdate; echo " (this may be a BONUS timestamp)";'
These are included in (something sourced by) my $HOME/.bashrc, though typing them at the terminal prompt works, too.
mogrify -colorspace srgb -type truecolor *.jpg

IMPORTANT!!!
ALWAYS USE A COMMAND LIKE THIS, WITHOUT ANY -quality PERCENT
FLAG, WHEN DOING CONVERSIONS FROM JPEG TO (DIFFERENT-FORMAT)
JPEG.
(As I understand it, when no -quality is given and the input is a JPEG, ImageMagick reuses the input's estimated quality, which avoids needlessly re-quantizing the image or inflating the file.)
An example of my most-used 1 command, then (using aliases I detailed in '0', above), is
ttdate && \
time \
mogrify -colorspace srgb -type truecolor *.jpg \
&& ttdatechk

mogrify -format jpg -quality 92 -colorspace srgb -type truecolor *.png

A quality of 92 is pretty standard¹.
An example of my most-used 2 command, then (using aliases I detailed in '0', above), is
ttdate && \
time \
mogrify -format jpg -quality 92 -colorspace srgb -type truecolor *.png \
&& ttdatechk

mogrify -format jpg -quality 92 *.tiff # *.tiff for an example

An example of my most-used 3 command, then (using aliases I detailed in '0', above), is
ttdate && \
time \
mogrify -format jpg -quality 92 *.tiff \
&& ttdatechk

# hopefully it's faster when not working on files on the external hard drive
#+ [ ... working with stuff ... ]
#+ yes, it goes much faster with the files on the local machine

¹ I'm talking loose and fast by saying "pretty standard". I think the best way to see why I call this standard is to look at the default settings for a few widely used programs.
I'll put in links to more formal research later, though I'll note that I like the analysis at Lenspiration (that's an archived version) about JPEG quality in (Adobe) Lightroom (that's not an archived version). From what I can gather, the actual pixel output (in Lightroom) is the same (or more likely only trivially different) anywhere in the 93-100 range, and similarly the same in the 85-92 range. It keeps going like that for 11 more ranges or bands of quality value. Though I didn't see the details of the experimental method used for the "(Subjective) JPG quality at 100% zoom" at each quality setting, it seems to differ very little between 92 (or 93) and 100, while the file size grows markedly (following a previously visible exponential trend) between about 90 and 95.
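The file-size half of that observation is easy to reproduce. Below is a small sketch of my own (using Pillow and a synthetic gradient-plus-noise image rather than a real scan, so the absolute numbers won't match any of the studies linked here) that saves one image at several quality settings and prints the resulting sizes:

```python
# Rough sketch: byte size of one synthetic image saved at several JPEG qualities.
# Absolute numbers depend heavily on the image content; the trend is the point.
import io
import numpy as np
from PIL import Image

# a smooth gradient plus a little noise, standing in for a photographed page
rng = np.random.default_rng(137)
grad = np.linspace(0, 255, 512, dtype=np.uint8)
arr = np.stack([np.tile(grad, (512, 1))] * 3, axis=-1)
arr = np.clip(arr.astype(np.int16) + rng.integers(-8, 8, arr.shape), 0, 255).astype(np.uint8)
img = Image.fromarray(arr)

for q in (75, 85, 90, 92, 95, 100):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=q)
    print(f"quality={q:3d}  size={buf.tell():7d} bytes")
```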
- 92, per the documentation (archived version); it used to be 85 (source from 2010, which is after the change had been made).
- 90. Found just now, after I downloaded the newest stable GIMP 3.0.4, by exporting an image with a .jpg extension and pressing the Reset to Factory Defaults button. Right after I finished, I got the UNIX timestamp as 1752957414, which is the same as Sat Jul 19 20:36:54 UTC 2025 (from date -u). I found the same value when following similar steps in version 2.10.
- 85, per Google Sites. Note that the formatting/style went a little wacky for the archived version (at least in my view of it); scroll past the big images of an X and a magnifying glass, some other stuff, and another magnifying glass, and then you should be able to see the content just fine.
- 80. This isn't a standard/default value (it seems the user picks their default when installing), but the value I most often saw in Google results was 80.
- 92, per a GitHub issue on the Photoprism repo.
- No default.
- 100, per the Affinity Forum. Someone should get fired; that's taking up a bunch of extra, unneeded storage (though I guess it's less than the RAW version).
- 92 (sometimes stated as 0.92) ??? StackOverflow post 2, which gives source a as well as referencing the other SO post, above: "For best results, we recommend that you upload uncompressed JPEGs – at least 92% quality or above." (a: [sic] for the intimation that "uncompressed" is equivalent to "92% ([sic] on the '%', too) quality or above.)
Maybe **@TODO**: Look for suggestions from other websites.
Image Quality Assessment Using the SSIM and the Just Noticeable Difference Paradigm. PDF viewable here: https://doi.org/10.1007/978-3-642-39360-0_3
Perceptual Visual Quality Assessment: Principles, Methods, and Future Directions (arXiv). http://dx.doi.org/10.48550/arXiv.2503.00625
Multiple just-noticeable-difference-based no-reference stereoscopic image quality assessment. https://doi.org/10.1364/AO.58.000340
Image Quality Assessment: From Error Visibility to Structural Similarity. This link is for PDF download: https://doi.org/10.1109/TIP.2003.819861
Post from fstoppers.com which I include solely because of the statement that begins the article, "You lose information when an image is saved in JPEG format. This is acceptable, unless you save the same image more than once. Let’s have a look at how much information you really lose."
You should, however, see Wikipedia's JPEG article, specifically the Lossless Editing section, archived version here
An Analysis of Lightroom JPEG Export Quality Settings
Post from darkroomphotos concerning Lightroom, with scientific measurements
A discussion of what quality means
Quality has to do with a matrix of coefficients (sometimes; it depends on how the encoder works). The matrix is called the Quantization Matrix, or sometimes the Quantization Table. (The coefficients correspond to the spatial-frequency terms of the DCT.)
Photo StackExchange Discussion
Something else from Photo StackExchange about what quality means
There's a good discussion in the comments:
But possibly it is the case that quality number 0-100 isn't actually part of the jpeg standard (i.e. there aren't quantization tables specified in the standard by a given quality number) and so there IS no direct translation between adobes and libjpegs quality numbers because they actually use different quantization tables altogether. If that is the case, then there really isn't a translation between them and the answer you pointed to is as good as it is going to get. –John Robertson | Dec 3, 2014 at 19:15
It actually is defined in the standard but many encoders use a 0-100 scale which doesn't correspond to this. –mattdm | Dec 3, 2014 at 19:25
Link for 'the standard', https://datatracker.ietf.org/doc/html/rfc2435#section-4.2
FotoForensics Post, very understandable
https://superuser.com/questions/62730/how-to-find-the-jpg-quality#comment1346047_62730
Just to make sure that it is known: the quality setting of different applications is not comparable, in general: faqs.org/faqs/jpeg-faq/part1/section-5.html. Both GIMP and ImageMagick should use the IJG quality scale, though. –Michael Schumacher | Sep 29, 2015 at 12:36
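To make the "IJG quality scale" remark concrete: as I understand it, libjpeg (see jcparam.c, and the similar formula in the RFC 2435 link above) converts the 0-100 quality value into a percentage scale factor and applies it to a base quantization table, so "quality" is really just a knob on those quantization coefficients. Here is a sketch of that mapping using the standard Annex K luminance table; treat it as an illustration, not a spec-certified implementation.

```python
# Sketch of the IJG libjpeg quality -> quantization-table scaling (per jcparam.c / RFC 2435).
# BASE_LUMA is the standard JPEG Annex K luminance quantization table (row-major, 8x8).
BASE_LUMA = [
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99,
]

def ijg_scale(quality: int) -> int:
    """Map quality 1-100 to the libjpeg percentage scaling factor."""
    quality = min(max(quality, 1), 100)
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def scaled_table(quality: int, base=BASE_LUMA):
    s = ijg_scale(quality)
    # each entry is scaled, rounded, and clamped to the valid 1..255 range
    return [min(max((b * s + 50) // 100, 1), 255) for b in base]

for q in (50, 75, 90, 92, 100):
    print(q, ijg_scale(q), scaled_table(q)[:8])  # first row of the luminance table
```

Note that at quality 100 the scale factor is 0, so every table entry clamps to 1 (essentially no quantization), which is part of why file size balloons at the top of the scale.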
https://www.numberanalytics.com/blog/jpeg-compression-essentials
https://www.adobe.com/uk/creativecloud/photography/discover/lossy-compression.html
https://mjanja.ch/2023/05/evaluating-jpeg-webp-and-avif-for-pdf-thumbnails/
https://flothemes.com/flothemes-image-sizes/ Possible calculator? Seems not.
LLM prompt, still rough
I remember having heard that a JPEG quality of 92 is ideal, because it gives a visual result not much different from anything in the 93 to 100 range, but obviously produces smaller files, since the higher quality settings apply less compression. Is there some kind of standard that exists in a document, or a well-respected study, that backs up this advice?
I would appreciate something that discusses perceptual quality having some kind of sweet spot after which increased quality is barely visually perceptible, but file size continues to grow.
And let's finish with this comment from libjpeg's README, quoted
here
on Stoyan's phpied.com
FILE FORMAT WARS

The ISO JPEG standards committee actually promotes different formats like JPEG-2000 or JPEG-XR which are incompatible with original DCT-based JPEG and which are based on faulty technologies. IJG therefore does not and will not support such momentary mistakes (see REFERENCES). We have little or no sympathy for the promotion of these formats. Indeed, one of the original reasons for developing this free software was to help force convergence on common, interoperable format standards for JPEG files. Don't use an incompatible file format! (In any case, our decoder will remain capable of reading existing JPEG image files indefinitely.)
It turns out that comes from the
README of v.8,
though the current version is v.9. An active fork called libjpeg-turbo,
used by some programs, including ImageMagick, still uses v.8.