Hybrid Bash + Python pipeline for processing OneDrive-hosted videos with batching and cloud file handling.
I did my best to generalize it for public viewing, but many parts are hardcoded for our specific use case. For example, the frequency band we care most about is hardcoded into the horn reference sound matching.
This project came out of a real workflow problem. A teammate had previously built scaffolding for an automated video-cutting pipeline that compared the raw waveform of a reference horn sound (loaded with librosa) against the waveform of the video in which we were trying to detect the horn.
The goal was to crop each video, keeping the 10 seconds before the horn sound and the 120 seconds after it.
I refactored the pipeline in several ways, including adding a Bash orchestration layer to scale processing of videos hosted on OneDrive. I also engineered better features than raw-waveform comparison, added a sliding-window comparison component, and then trained a logistic regression on the same window features, using correctly extracted videos versus incorrectly extracted ones as labels, to improve detection further.
Before my changes, 194 of 407 processed videos failed, a failure rate of roughly 48%.
After my changes, 49 of 407 processed videos failed, a failure rate of roughly 12%.
Look in:
- `vid_processing_modules/feature_extraction.py`
- `vid_processing_modules/vid_detection_utils.py`
The core idea is: instead of comparing raw waveforms, compare the frequency content of a 1-second window in a specific band.
What’s hardcoded here (and where):
- Band-pass region: 640–3400 Hz (see the band mask inside both `prepare_horn_template()` and `detect_horn()`).
- Template: a 1.0 s clip from the reference horn file (`reference_audio/reference_event.wav`).
- Harmonics: multipliers `[1, 2, 3]` used to build a "harmonic index set" from the strongest bins of the horn template.
What actually gets computed (features):
- `peak_match`: dot product between the horn-template FFT bins and the window FFT bins at the selected harmonic indices.
- `peak_energy`: total energy in those harmonic bins.
- `total_band_energy`: total energy in the whole 640–3400 Hz band.
- `concentration`: `peak_energy / total_band_energy`.
- `raw_score`: `peak_match * concentration` (this is the "no-model" score).
(See extract_detector_features() in feature_extraction.py and extract_window_features() in vid_detection_utils.py.)
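To make the feature definitions concrete, here is a minimal sketch of how the four features plus `raw_score` could be computed for one window. The function and variable names are illustrative, not the actual code in `feature_extraction.py`:

```python
import numpy as np

def band_bins(n_samples, sr=16000, lo=640, hi=3400):
    """Boolean mask selecting the rfft bins inside the 640-3400 Hz band."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sr)
    return (freqs >= lo) & (freqs <= hi)

def window_features(window, template_spec, harmonic_idx, mask):
    """Illustrative detector features for one 1.0 s audio window.

    template_spec : magnitude spectrum of the 1.0 s horn template
    harmonic_idx  : indices of the strongest template bins and their [1, 2, 3] multiples
    mask          : boolean band mask from band_bins()
    """
    spec = np.abs(np.fft.rfft(window))                       # magnitude spectrum of the window
    peak_match = float(spec[harmonic_idx] @ template_spec[harmonic_idx])
    peak_energy = float(np.sum(spec[harmonic_idx] ** 2))
    total_band_energy = float(np.sum(spec[mask] ** 2))
    concentration = peak_energy / total_band_energy if total_band_energy > 0 else 0.0
    raw_score = peak_match * concentration                   # the "no-model" score
    return {
        "peak_match": peak_match,
        "peak_energy": peak_energy,
        "total_band_energy": total_band_energy,
        "concentration": concentration,
        "raw_score": raw_score,
    }
```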
Look in:
`vid_processing_modules/vid_detection_utils.py` (`sliding_window_detection()` and `detect_horn()`)
Detection scans the audio with:
- window size: 1.0s
- hop size: 0.05s (50ms)
It scores every window and keeps the best one (highest raw_score if no model, or highest model probability if a model is loaded).
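A rough sketch of that scan loop (the real version is `sliding_window_detection()` in `vid_detection_utils.py`; this only illustrates the window/hop logic):

```python
import numpy as np

def sliding_window_detection(audio, sr, score_fn, win_s=1.0, hop_s=0.05):
    """Scan the audio in 1.0 s windows with a 50 ms hop and keep the best-scoring window."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    best_score, best_time = -np.inf, 0.0
    for start in range(0, max(1, len(audio) - win + 1), hop):
        score = score_fn(audio[start:start + win])   # raw_score or model probability
        if score > best_score:
            best_score, best_time = score, start / sr
    return best_time, best_score
```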
Look in:
`vid_processing_modules/vid_detection_utils.py` (`score_window()`)
If model_path is provided, the model is loaded with joblib and score_window() switches from raw_score to:
model.predict_proba([[peak_match, peak_energy, total_band_energy, concentration]])
The feature columns it expects are hardcoded as:
FEATURE_COLUMNS = ["peak_match", "peak_energy", "total_band_energy", "concentration"]
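As a sketch, the model-backed scoring path looks roughly like this (the actual `score_window()` may be structured differently):

```python
import joblib
import numpy as np

FEATURE_COLUMNS = ["peak_match", "peak_energy", "total_band_energy", "concentration"]

def make_scorer(model_path=None):
    """Return a window scorer: raw_score when no model, model probability otherwise."""
    model = joblib.load(model_path) if model_path else None

    def score_window(features):
        if model is None:
            return features["raw_score"]
        row = np.array([[features[c] for c in FEATURE_COLUMNS]])
        return float(model.predict_proba(row)[0, 1])   # probability of the positive class

    return score_window
```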
Model file paths to pay attention to:
- `batch_vid_processing.sh` points `MODEL` at: `models/event_logistic_model.joblib`
- `vid_processing_modules/model_training.py` saves to: `feature_sets/horn_logistic_model_v2.joblib`
So if you’re training with model_training.py and then running the batch crop pipeline, either:
- move/rename the trained joblib into `models/event_logistic_model.joblib`, or
- update `MODEL=...` in `batch_vid_processing.sh` to point at `feature_sets/horn_logistic_model_v2.joblib`
Look in:
- `vid_processing_modules/feature_matrix_extraction.py`
- `batch_feature_extraction.sh`
feature_matrix_extraction.py builds the window-level feature rows + labels and writes them to a CSV.
batch_feature_extraction.sh is the “run it across everything in batches” wrapper; it calls:
python vid_processing_modules/feature_matrix_extraction.py "$BATCH" "$REF" "$TRAINING_CSV"
and appends into the master CSV at:
`feature_sets/event_training_features_master.csv` (set by `TRAINING_CSV=...` in `batch_feature_extraction.sh`)
Hardcoded parts inside feature_matrix_extraction.py:
- `WINDOWS = [(9.5, 10.5), (10.5, 11.5)]`
- `TARGET_SR = 16000`
- the example "failure list" (`FAILED_CUTS_new`) used to assign `label`
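A condensed sketch of what the row-building could look like, assuming the audio has already been extracted from each video and using a hypothetical `featurize()` helper for the four detector features (label polarity here is illustrative):

```python
import csv
import librosa

TARGET_SR = 16000
WINDOWS = [(9.5, 10.5), (10.5, 11.5)]   # seconds: where the horn should sit in a good cut
FEATURE_COLUMNS = ["peak_match", "peak_energy", "total_band_energy", "concentration"]

def append_training_rows(audio_paths, failed_cuts, csv_path, featurize):
    """Append one labeled row per (file, window) to the training CSV."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for path in audio_paths:
            audio, sr = librosa.load(path, sr=TARGET_SR)
            label = 0 if path in failed_cuts else 1            # illustrative label assignment
            for start, end in WINDOWS:
                seg = audio[int(start * sr):int(end * sr)]
                feats = featurize(seg, sr)                      # hypothetical feature helper
                writer.writerow([path, start, end, label] + [feats[c] for c in FEATURE_COLUMNS])
```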
Look in:
vid_processing_modules/model_training.py
What it does:
- reads: `feature_sets/horn_training_features_master.csv`
- uses the same four features: `peak_match, peak_energy, total_band_energy, concentration`
- splits train/test with `train_test_split(..., test_size=0.2, random_state=42, stratify=y)`
- pipeline: `StandardScaler()` + `LogisticRegression(class_weight="balanced", max_iter=1000)`
- converts probabilities to a class label using a hard threshold: `pred = (prob > 0.3).astype(int)`
- note: the 0.3 threshold is only used in the training/evaluation script to classify test windows; the main detection code uses model probabilities as scores and selects the highest-scoring window
- prints confusion matrix + classification report
- saves error slices for manual review: `feature_sets/false_negatives.csv` and `feature_sets/false_positives.csv`
- saves the trained model: `feature_sets/horn_logistic_model_v2.joblib`
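Put together, the training step is roughly equivalent to the following sketch (error-slice export omitted; this mirrors the steps listed above rather than reproducing the script verbatim):

```python
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURE_COLUMNS = ["peak_match", "peak_energy", "total_band_energy", "concentration"]

df = pd.read_csv("feature_sets/horn_training_features_master.csv")
X, y = df[FEATURE_COLUMNS], df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# evaluate on the held-out windows using the hard 0.3 threshold
prob = model.predict_proba(X_test)[:, 1]
pred = (prob > 0.3).astype(int)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

joblib.dump(model, "feature_sets/horn_logistic_model_v2.joblib")
```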
Note: batch_feature_extraction.sh writes to feature_sets/event_training_features_master.csv by default, but model_training.py reads feature_sets/horn_training_features_master.csv. Either rename the file, or change csv_path in model_training.py (or change TRAINING_CSV in the bash script) so they match.
If you’ve ever worked with OneDrive in a production setting, you already know the main issue: files aren’t always actually local. Between cloud-only states, inconsistent syncing, and large file sizes, just “looping over files” stops being reliable pretty quickly.
My solution was to build a pipeline that treats OneDrive like a semi-remote storage layer and processes files locally in controlled batches.
The workflow looks like this:
- Force files to download locally (`attrib -U`)
- Copy them into a local working directory (scratch space)
- Process them in batches using the existing Python script
- Move results back to OneDrive
- Clean up local files to avoid storage issues
- Log any failures for later inspection
The pipeline uses:
- Bash for orchestration, batching, and file/system operations
- Python for the actual signal-based video processing
This split keeps the system simple while still handling a pretty messy environment.
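For illustration only, the control flow the bash scripts implement looks roughly like this when written as Python pseudocode. Paths and arguments are simplified, and the real orchestration lives in `batch_vid_processing.sh`, not in Python:

```python
import shutil
import subprocess
from pathlib import Path

def process_in_batches(source_dir, work_dir, dest_dir, batch_size=10):
    """Sketch of the hydrate -> copy -> process -> move back -> clean up loop."""
    work_dir, failed = Path(work_dir), []
    work_dir.mkdir(exist_ok=True)
    videos = sorted(Path(source_dir).rglob("*.mp4"))
    for i in range(0, len(videos), batch_size):
        batch = videos[i:i + batch_size]
        for vid in batch:
            subprocess.run(["attrib", "-U", str(vid)], check=False)   # hydrate cloud-only file (Windows)
            shutil.copy2(vid, work_dir)
        # run the existing Python processing step on the local copies
        result = subprocess.run(
            ["python", "vid_processing_modules/video_event_detection.py", str(work_dir)]
        )
        if result.returncode != 0:
            failed.extend(v.name for v in batch)
        # move outputs back to OneDrive, then clear local scratch space
        for out in (work_dir / "cropped_videos").glob("*.mp4"):
            shutil.move(str(out), dest_dir)
        for tmp in work_dir.glob("*.mp4"):
            tmp.unlink()
    Path("logs").mkdir(exist_ok=True)
    Path("logs/failed_files.txt").write_text("\n".join(failed))
    return failed
```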
Look in:
- `batch_vid_processing.sh` (cropping pipeline)
- `batch_feature_extraction.sh` (feature extraction pipeline)
batch_vid_processing.sh does:
- collects videos with `find "$SRC" -type f -iname "*.mp4"`
- batches with `get_next_batch` (from `pipeline_utils.sh`)
- hydrates + copies each file into `local_batch/` via `local_download` (uses `attrib -U` + `wait_for_stable_file` + `cp` retries)
- runs the python entrypoint: `python vid_processing_modules/video_event_detection.py "$BATCH" "$REF" "$MODEL"`
- copies outputs from `$BATCH/cropped_videos/` into `$DEST_PROCESSED/` via `move_and_wait_outputs`
- unpins outputs and input files (`attrib +U`)
- logs anything missing via `identify_unprocessed_files` into `logs/failed_files.txt`
- cleans up `local_batch/` (mp4 + horn_audios + cropped_videos + csv)
Important:
`attrib -U` / `attrib +U` is Windows-specific. This is meant to be run in something like Git Bash on Windows, or another Windows environment where those commands exist.
batch_feature_extraction.sh does:
- batches over `input_videos/`
- hydrates + copies to `local_batch/`
- runs: `python vid_processing_modules/feature_matrix_extraction.py "$BATCH" "$REF" "$TRAINING_CSV"`
- appends into `feature_sets/event_training_features_master.csv`
The goal here wasn’t just to “get it working,” but to make the workflow reliable when dealing with:
- cloud-backed file systems
- large datasets
- limited local storage
I also wanted to show how shell scripting can still be useful for system-level orchestration alongside Python.
- The included Python script is a simplified version of the original.
- In theory there would be a virtual environment inside `venv/` that provides the Python packages needed to run modules such as librosa (the bash scripts assume `source venv/Scripts/activate`).
- `ffmpeg` is required because cropping is done via a direct ffmpeg call (see `crop_video()` in `vid_detection_utils.py`); a rough sketch of such a call follows this list.
- The file `failed_files.txt` and the directory `local_batch/` are meant to simulate the kind of output you would get when running the pipeline.
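As an illustration of the cropping step (keeping 10 s before the detected horn and 120 s after it), a direct ffmpeg call could look like this. This is a hypothetical sketch, not the actual `crop_video()` implementation:

```python
import subprocess

def crop_around_horn(input_path, output_path, horn_time_s, pre_s=10.0, post_s=120.0):
    """Cut out the [horn - 10 s, horn + 120 s] region using an ffmpeg stream copy."""
    start = max(0.0, horn_time_s - pre_s)
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", f"{start:.2f}",            # seek to 10 s before the horn
            "-i", str(input_path),
            "-t", f"{pre_s + post_s:.2f}",    # keep 130 s total
            "-c", "copy",                     # stream copy; re-encode if frame-accurate cuts matter
            str(output_path),
        ],
        check=True,
    )
```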
DISCLAIMER: I am not an audio expert. These features were based on methods I found were common practice and worked for my purpose. I am sure there are better alternatives.