From 66f361f1699661a43d4a02d6b6cd74629436ee07 Mon Sep 17 00:00:00 2001
From: Oscar V
Date: Mon, 24 Nov 2025 23:32:23 -0800
Subject: [PATCH] docs: reorganize documentation structure and clean up examples

- Replace outdated SUMMARY.md with comprehensive INDEX.md
- Add organized documentation index with sections for different user types
- Remove unused libcom_err.so binary from examples directory
---
 docs/INDEX.md          |  62 ++++++++++++++++++++++
 docs/SUMMARY.md        | 113 -----------------------------------------
 examples/libcom_err.so | Bin 22376 -> 0 bytes
 3 files changed, 62 insertions(+), 113 deletions(-)
 create mode 100644 docs/INDEX.md
 delete mode 100644 docs/SUMMARY.md
 delete mode 100644 examples/libcom_err.so

diff --git a/docs/INDEX.md b/docs/INDEX.md
new file mode 100644
index 0000000..911bd32
--- /dev/null
+++ b/docs/INDEX.md
@@ -0,0 +1,62 @@
+# BinarySniffer Documentation
+
+Welcome to the BinarySniffer documentation. This index will help you find the information you need.
+
+## Getting Started
+
+- **[Installation Guide](INSTALLATION.md)** - Step-by-step installation instructions for BinarySniffer
+- **[User Guide](USER_GUIDE.md)** - Complete guide to using BinarySniffer CLI and library API
+
+## Core Features
+
+- **[Detailed Features](DETAILED_FEATURES.md)** - Comprehensive overview of BinarySniffer's capabilities
+- **[Architecture](ARCHITECTURE.md)** - System architecture and design principles
+- **[Signature Management](SIGNATURE_MANAGEMENT.md)** - Managing and updating signature databases
+
+## Signature Creation
+
+- **[Creating Signatures](CREATING_SIGNATURES.md)** - Guide to creating new component signatures
+- **[Signature Creation](SIGNATURE_CREATION.md)** - Advanced signature creation techniques and best practices
+
+## Advanced Topics
+
+- **[ML Security Analysis](ML_SECURITY.md)** - Security scanning and analysis for ML frameworks
+- **[TLSH Fuzzy Matching](TLSH_FUZZY_MATCHING.md)** - Using TLSH for fuzzy hash matching
+- **[TLSH Setup Guide](TLSH_SETUP_GUIDE.md)** - Installing and configuring TLSH support
+
+## Reference
+
+- **[API Reference](API_REFERENCE.md)** - Python API documentation and examples
+- **[Package Verification](PACKAGE_VERIFICATION.md)** - Verifying BinarySniffer packages and integrity
+
+## Scripts and Examples
+
+- **[create_tlsh_example.py](create_tlsh_example.py)** - Example script for TLSH hash creation
+
+---
+
+## Quick Links
+
+### For New Users
+1. Start with [Installation Guide](INSTALLATION.md)
+2. Follow the [User Guide](USER_GUIDE.md)
+3. Explore [Detailed Features](DETAILED_FEATURES.md)
+
+### For Developers
+1. Review [Architecture](ARCHITECTURE.md)
+2. Check [API Reference](API_REFERENCE.md)
+3. Learn about [Signature Creation](SIGNATURE_CREATION.md)
+
+### For Security Analysis
+1. Read [ML Security Analysis](ML_SECURITY.md)
+2. Understand [TLSH Fuzzy Matching](TLSH_FUZZY_MATCHING.md)
+3. Configure with [TLSH Setup Guide](TLSH_SETUP_GUIDE.md)
+
+---
+
+## Getting Help
+
+If you need help or have questions:
+- Check the relevant documentation section above
+- Review the main [README.md](../README.md) in the project root
+- Report issues at [GitHub Issues](https://github.com/SemClone/binarysniffer/issues)
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
deleted file mode 100644
index 709c655..0000000
--- a/docs/SUMMARY.md
+++ /dev/null
@@ -1,113 +0,0 @@
-# Semantic Copycat BinarySniffer - Implementation Summary
-
-## Overview
-
-Successfully restructured the xmonkey-curator project into a new, efficient implementation called **binarysniffer** v1.0.0.
-
-## Key Achievements
-
-### 1. **Complete Architecture Redesign**
-- Replaced naive Trie-based string matching with progressive three-tier matching
-- Implemented SQLite-based signature storage with compression (90% size reduction)
-- Created memory-efficient analysis using <100MB RAM (vs 500MB-2GB previously)
-
-### 2. **Core Implementation**
-- **Dual Interface**: CLI tool (`binarysniffer`) and Python library API
-- **Progressive Matching**:
-  - Tier 1: Bloom filters for quick elimination (microseconds)
-  - Tier 2: MinHash LSH for similarity search (milliseconds)
-  - Tier 3: Detailed database matching (seconds)
-- **Feature Extraction**: Binary string extraction with categorization (functions, constants, imports)
-
-### 3. **Performance Improvements**
-- Analysis speed: 10-50ms per file (20-100x faster)
-- Parallel processing support
-- Streaming analysis for large files
-- Efficient indexing with trigrams and MinHash
-
-### 4. **Testing & Validation**
-- Created comprehensive test suite (32 tests, 29 passing)
-- Successfully built distribution packages (wheel and tarball)
-- Validated detection of real components:
-  - Test binary: Detected OpenSSL and curl signatures
-  - Real curl binary: Successfully identified curl with 82.5% confidence
-
-## Package Structure
-
-```
-binarysniffer/
-├── binarysniffer/          # Core library
-│   ├── core/               # Analyzer, config, results
-│   ├── extractors/         # Feature extraction
-│   ├── matchers/           # Progressive matching
-│   ├── storage/            # Database and updates
-│   ├── index/              # Bloom filters, MinHash
-│   └── utils/              # Hashing utilities
-├── tests/                  # Test suite
-├── examples/               # Usage examples
-└── dist/                   # Built packages
-    ├── semantic_copycat_binarysniffer-1.0.0-py3-none-any.whl
-    └── semantic_copycat_binarysniffer-1.0.0.tar.gz
-```
-
-## Usage Examples
-
-### CLI
-```bash
-# Analyze single file
-binarysniffer analyze /path/to/binary
-
-# Analyze directory
-binarysniffer analyze /path/to/project -r
-
-# Update signatures
-binarysniffer update
-```
-
-### Python API
-```python
-from binarysniffer import BinarySniffer
-
-sniffer = BinarySniffer()
-result = sniffer.analyze_file("/path/to/binary")
-for match in result.matches:
-    print(f"{match.component}: {match.confidence:.1%}")
-```
-
-## Technical Features
-
-1. **Smart Signature Storage**
-   - SQLite with ZSTD compression
-   - Trigram indexing for substring matching
-   - Pre-computed MinHash for similarity search
-
-2. **Efficient Matching Algorithms**
-   - MinHash with 128 permutations
-   - LSH with 16 bands for candidate selection
-   - Tiered bloom filters (0.1%, 1%, 10% false positive rates)
-
-3. **Extensible Design**
-   - Plugin-based extractor system
-   - Configurable analysis parameters
-   - Support for signature updates
-
-## Migration from XMonkey-Curator
-
-- Created migration script (`migrate_from_xmonkey.py`)
-- Addresses all major pain points:
-  - Memory usage: 500MB-2GB → <100MB
-  - Analysis speed: 1-5s → 10-50ms per file
-  - False positives: ~25% → <5%
-  - Signature storage: 100MB+ JSON → 50MB SQLite
-
-## Next Steps
-
-The implementation is ready for:
-- PyPI publication
-- Integration with CI/CD pipelines
-- Extension with additional extractors (source code, archives)
-- Machine learning enhancements for signature quality
-
-## Conclusion
-
-Successfully transformed an inefficient prototype into a production-ready tool with 20-100x performance improvements while maintaining the core functionality of detecting open source components in binaries.
\ No newline at end of file diff --git a/examples/libcom_err.so b/examples/libcom_err.so deleted file mode 100644 index 4a8cd177b0742ecae648885cb43634715bade41d..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 22376 zcmeHPeRNdSwLkfSRLD$FhEh=(B(#ajm;kY$MlxXncgg?}f*)0T8D=IQl4R2PK;p*( zR}vP}A)5BJulk*J6Uc*OofV59wn6Iq(2Bw`8P#>U^C-Pt#UHIYQ>*LH%u~drdByh+Htv= zzQdPEhlv$Q|Ux2GhW#a(-BCc7u<|ynOMTt;;RBx5I&c}5&E)M7M zO%6I0*txhS<2nP^JY3UpS%a!_L8hK%O~Cz`xV~h;rl|X`C`#p-forac6Pc~<_}JUdUeZE4PW!Gj3fpyRPat@49E4dh4vf$AP66l`P%S@z6+P ze{kCW%z5O|y_>36tRH`O;P2PH1hJf<1qM zevpU%>v`~p^5nZD4}K~S{y-l60{q#}4%zxh9zDOwgCEL+-<2od$#_7|6zJ9FQ*J^$ zYZ6hS@Pi89r*MbVKUI{ghweP;=e3yFIE;V71RQQnU;Qk$P3LOfsriT*$9Ub97Dk*A7vT6*5IzlmIQmcof zdS^Tq+5)Siu}HWl1B0qf89=;CLf{NVBH@UL#!%P@>}u(bz}gl)(6&j)B)u~fjrv>7 zJh}psvX~wb!2qmAwz0ISu}DXtyGKdK$Ypw$zcU18|Hg16Cj7ymZmO1vVq<$(kTe9t zo&NSN5eapMH-|DfEZkC+Xz7l}qN3%dNIMFz>zl1s>)tgwk^`cpr6V3~le_^D2zQc7 zeY2l$A&1%yY@(u?^$-cc%{p_og~OXfNBc%{cQoGJ4bxrGu&Y{Pp`LiGy(3CM4FQ2j zPj@U#(S;O+rf|n*iY}0;ba#ooGtDp=Ij4pCDsQvUydF>EYUxImon9>yR`|RP%k%}V zMS|}ZTJ9IPYDLAGwcZup)fcyw%1vGD@IcW%e6$R?*3h{UPZ9W^PQ9nk3(u(l+hdZ7x=J_V| zi5y7pOvHIye)z?2W6Cy3d@ChiB(78GBs0Ce9bM&QagV}cUuwq`vPAq?#p7pFyjc7~ z$@6;#&k0Y?F>@aDnUwK83co80Z*rP^qA8P~pbT!3vQji4OwvOJJkIaoSy^AW0wV|cRvohEjYbbaTvDX^q$D!84GTGmpW*{ zt?zFyTX5?;=fmK-I$ z2vRPZlJm;=2;$e>Fj}$|Otiq|zJ4SzYB-(%>`A=-BthDMqth;gH!-XwOI{+y$;CCp zhT}tgvl*igcO*deSOkJ8fqT;eqa<(v1X>2kkI?ojAR0ylarQsGd5LA)>Qx>Tl%SCjP)3XaiC-!;srO1jF8lkWV+7C?ixc&)<&Uv zw4{XCpmDThDJe!U90{Z(N{&$0)JD>>RNu1P17&a8gxIqMlG;G+Ps@nZ)xcx$JH8jC zC6u2A%6>%reI|d8$zLG(cV_XMZ3jjgjvgd=6Z^>*)5sEE;_wZq-4^?7j}T8`%@x^o zf=5d{O+2rnScwCM<4hQiRyD($IBJpNHY?lAO^`bBfYjr@+I^LKy}Nn6miRjgaSSQz z%(|_88~29eGW>pGOveS`ZO zZrzyoE|ToCMPtn|vovGr#1Y`Wfx4f<5MRPD9E-u~PCVT|Z0|ppvQL&giX`|Z?iddO zkoA`&z~0v^8QMyd>q~u0-ieP%1z$`nApT@y`I49I+p8q&dg!}+pD%gQAGChM-WNtm zOoNW^p>^N1RmQw)q{(IzY+6V7zd&lSY4p(c5mv*c$+`S*3 zwRy8U@eEbR3!cPYE%A(&cme!-wY_6=wI#1?)+EnbcVhU+YqH+yk-rQ1dJ^N@M|fe3 
z*+)Et1l30z!uQg#$$vKtWsjRatc&qZ6kqD;9jQLBqCo7JJ$?g<*)k~o_*)3ciikU@ z{E#-wANTqO=6@H<)#KG*mhv8|etV`NW8g3i`EuHjm1M|+XhvrH_YQQHu@`RvzmN2^ zf=5dn_a#0s909Un+ihhcRwnD+czWAy=s7+(&?ubRz-7~e*=Dy)XRRa_IJ279JBG4TH8NEJn8rq#a^`c6;jhF`45>E zzvZcNBx3x4HuFKX_tEwKo$;p-FT4FwZv(^%g z0MSQ2i)Q!?nqX|2u}@1bI;i!(ADd)2njtS0tz$*;={3XDLd}+mXJs4o%C3G0ES9dm z=L_ua$B>}v{Z)L2A6xpjRJ~JsY0H%RQc7-XTJAoQ`|ppXYX`xwhYAbV{=Ae(-5wCb z@qIAuo(gNmuR}XZcb9mclVy9LtpC^c{y!t#-He{P9l&~31*>pxIDP_o^7~_8Nje%y z0Df^nrzN0g0kBRugDb*5XiWK$^~ zR1Mw5l0U@9wM2p8sD}!v>jX9gXQ|*s7&oI2kuzN?7(K|k=_oI*Fue%SBPi6lCfya@xF>f7oqkYk6=S@8C zZuTY)Va$7P9VTDHNTMfdu9jSAIKD%%9fTSFS>xOxKc^`4Z#4 z#GfUT;aDK$7Cwd19YwiS#n6=?)+J+?@gQN$3jvV!`H~A|mG7^hQjPi&?;DPCCF_*3 z`;c;EAT@qVf0w-pfARdIssy3~bw8Ibyd2!=DoW2kh+oY=J}Q>{w@d!1S^P8~@M6$` zlYU1|`owe+a%~+eM~_L5)*je}arc`rQ+D^m*yEEJL5(-ysp^&JTRs}Hj$`JTq;9r< zShIWf`4WFM4oZm!Az@B_m!`WY8t>)Yl{PFUUg2Ts7ckcx4Q2vY&9x2+#!f=Dd@_EE znB~d1KPB-OBq)g~BypUKNtV1uitv=XU?q%_L&jT}!48!D0KVMwPbeSSO|$9Ot3eFM z8|PDD)7)df#He=U`9baWqc0PC!dJ1_m+(~7`*5fA4adG}ukuth*{ggN8|+oh72Wo# zdn&ep(_VF7#h|1QR18V_aK$c3pQsoH4Qk2adFYa$(o-=LUHCv>#bFtJeYCiOJ~dJT zo;gp2lkUn#Jr%`xdw`8IVWXCWR8YIn)DbV$Zr@M#jo>{7sn+pKR%smuRI1Wh@6wY$ z|A*u+8jg<5;RwG zeD}B&D#k8O&nsj-PRsTBk6_w-KcVsz!PyAGlc_~Y& z9B-L0Ua4Ne9R1@zH2gS17EOfzR}Z*c7vq0>SD+&v3|+jbc7Cia67mOKZDM`b zrmpZ!T~4`&%ozv=Lz(;9T-anMH|Vv`zarQk)FbgO=bSocbxM##oNPQc+L^oTX#3oy z&b6UHsC{!NsI<=uIxFWzt1vjE_{@!VpTTKr=B;4ZJJ=p=)A2XPSxTB?Q(Q)R_MSR( z#~g3nJI$W7eR3NhI)#R1p2p=XH1F3|`c|!8({$z9bi(+uCp3 z)X~`$?*3*Z8jEkfX-m(|)inziE~>rctBaTD;!>wjMSwq%xwDR3m=0O~I4yrJKep$I zf>}kE4MUXnWq6Jo#;c&spv48~MnUWG+}#LTiSeKtGzQuS`dy?iI%yc^fW|;+DsmsF z26_nedQi{XhOr+sc*-y~K+mD~41@MS^?hI%hd|!|Jq23vp<$F3K_BQjpr=5u2d%(V zsvEQ!v=6ioQ{sc5i_lRW1ue#t?}0tMH>Kd_wW454dBLoylS>DYZw1Oi`#v@tGmLUj zVJly5oB1{Sl$%PoiKS;>cEQ4m^9d$-4=xRSn0KZE;n(9TKVcXP2=&;?J8h+M-xsx* z#lR^K!b$%QT+J425#smY>ISw6R5aMicNQ+Q&D>t(u{qbCVVeoT23zThDeb9ydV$0X zw>X4r?5JV9PjXkKZvryLg zbhNAc(eAD!o8ab-GnBIsrMg;#^f9FK-f6-#VAIh?Y4%Csa$MH~D+jil0N8#9**;Lz zXmjo?_Sh=7m%xj?lL|Xc>r=|Hk4|cl#kk3GI 
zBh?@B**?j`^>-br12vjee`pOeDPKGr54>*}58_^|upKTa9FnD{y0{2w^?2@{rLy60R#1r+nyQOsrrtKlUiw$A=jz^)?ZrEb1`1_+rRQ@u^zO@{w{RueEAm1`dB`th zTp;G>faPpWiUg_Xf+5}rSN*JZ|e4>RxUpJl@MJ+v8kFEj7&r#Te` znmBq8g>oJSq)7?V`WXegiqjZOfyQA9N{u+Y+KgkN zPKFJtAZ+KLV#JcP3@ndnxU}bfOJ&#zfhrEX8p zIgLuV)GLu^nu zKd(XHq=(k>C{&n4^neziSE;Ts0q$S!M3nfuv-p3k_??QM`-LYI|B&L}WU3P{D4hF4 z`3?!}Fs=gp;(eR4lgdfq7%sy3yas;W-;=malxNvF1&$^=tDRs#;Qr=3;8ZS=Whbo- z5PrDYLUhWs<8{Ko#_@(wG0?pJSDc3$E%4P^gpRruvu`1kXa_g=-X&rtUrQ~f4#CFQ5jy5_?d#o9p;CVrJs3xWqc~|0{k+V$1%n)1fHwDY9+r-)GPng zD}gH{K11wM_OO5Q`N1Rk%fO6il73prO}7I?C_U7I1TuuC;Ru{v~J+cR)Y9!$$xr6Lmr&g?nr<3@5DQS=lWfF zERUXl&4WLZ2meDJ{8%2G=0jv>_V3fb2t1da^A!Is)vkE$XtBbFzGC`Fu2tc7rNmG7 z+a|@I{k#4i;Lg+8#rN{)c_v36Hj1aJ(2 zA2w1grlYf_mDuyo*L2VH*t zP?4N@FU&4-xFeX8aY0VBCMR0W7U5$FKCbxV9qD5Sx+CEL&OpdMiNF=0j8QB2`qI)O z<@huSJ=UoQ=y-}KG7EC2sKG78`h`o^ zF%#DtHQi*?JZptCMd&MhYZ}}>ea-Ua>l&N&W_N?HF|C45W-x0`*Ksn3$^d1LhV?do zSCG!=@Lapvy~?{Rl`TFj$4Pz>CqK|hA(mr6a_G%GEh9s*)hhG5V-9)qV3ABYbwmk{^`MhG zatNu)OL@~gu_C)5IN>A5cS=~5J})GP?#xe>pLOsGl@EPL4jptJNFbw3boNRHAQkf? zlX;#?#`z&BA!-IzkE(N3a^$XO9Q%?|Ky}R}&j76lxUfYz)XGCnG6YkHtz@9t=a{4t zVZY_@8S|)^9QmBy*{I`SQi`Y~=3y%t5TCS?0h*_-WI$|%`f6Omjp*Gd^wFM#1F;m{`1BH~}dnosJ;_x1qDZVip73jA+JJF}+ z7~B6Z5S@pHp&xT`75GL`xs4Y|m|n`j?}eC7$w2FT_zV|jFw`=;C`OVeo@{wa=QP|3Ue9F8>zQmn%d`J( zgbeuv(>60WGbyjzC{*Uf_e8sKS>?-}gkoLSDM@O5*Nmmtd&?CN2eryOon``4wpX#I z%Fj6M!L!O&D|x0HvkJiZvcA47d44};%Im<)PrkN>Zy`eYr|Pd-r7BA2s#$~68o~W{ za9QQ^orlR{+4&D;$qz0tIhh{hidPEQZz?K&h8uDbv+=qIpO=}<&vB-YLe_4P=XI7r z+E{>)?QfQ2x(__m_BelDC*X5M>y=}<+$_g>&2`4KJg@H`zJhWW!kOiNmS_4K@DK~< zpIX1D2Z&H+RH{M+O;X*AU_m~qaRd8gqfm;A^YljAjJR5V-u Fe*pxUC`|wW