@@ -6,11 +6,12 @@ This file provides guidance to WARP (warp.dev) when working with code in this re
 
 **matchy-java** is a Java wrapper for [matchy](https://github.com/matchylabs/matchy), providing JNA bindings to the native matchy library for fast IoC (Indicator of Compromise) matching.
 
-**Status**: 🚧 Work in Progress
+**Status**: ✅ Core Features Complete
 - JNA bindings implemented (NativeLoader, MatchyLibrary, NativeStructs)
 - Core wrapper classes implemented (QueryResult, DatabaseStats, OpenOptions, MatchyException)
-- Database and DatabaseBuilder classes pending
-- No tests or CI/CD yet
+- Database and DatabaseBuilder classes implemented
+- Extractor API implemented (Extractor, ExtractedMatch, ExtractFlags, ItemType)
+- Unit tests for Database, DatabaseBuilder, and Extractor
 
 ### Architecture
 
@@ -23,8 +24,12 @@ matchy-java/
 │ │ ├── NativeLoader.java # Platform detection & native library loading
 │ │ ├── MatchyLibrary.java # JNA interface to matchy C API
 │ │ └── NativeStructs.java # JNA structure mappings (MatchyResult, etc.)
-│ ├── Database.java # Main public API (TODO)
-│ ├── DatabaseBuilder.java # Database builder API (TODO)
+│ ├── Database.java # Main public API for querying
+│ ├── DatabaseBuilder.java # Database builder API
+│ ├── Extractor.java # IoC extraction from text
+│ ├── ExtractedMatch.java # Single extracted match
+│ ├── ExtractFlags.java # Extraction type flags
+│ ├── ItemType.java # Enum of extractable item types
 │ ├── QueryResult.java # Query result wrapper
 │ ├── DatabaseStats.java # Database statistics
 │ ├── OpenOptions.java # Database open configuration
@@ -222,59 +227,42 @@ OpenOptions options = OpenOptions.defaults()
 Database db = Database.open("threats.mxy", options);
 ```
 
-## Key TODOs
-
-Based on README.md, these are the next implementation priorities:
-
-### 1. Database Class (High Priority)
-
-Implement the main Database API:
-- `static Database open(String path)` and `open(String path, OpenOptions options)`
-- `static Database fromBuffer(byte[] buffer)`
-- `QueryResult query(String text)` - main query method
-- `void clearCache()` - clear LRU cache
-- `DatabaseStats getStats()` - get query statistics
-- `String getMetadata()` - database metadata
-- `boolean hasIpData()`, `hasStringData()`, `hasGlobData()`, `hasLiteralData()`
-- `String getFormat()` - get database format version
-- `void close()` - free native resources
-- Implement `AutoCloseable` for try-with-resources
-
-Reference the C API in MatchyLibrary for all available functions.
-
-### 2. DatabaseBuilder Class (High Priority)
-
-Builder for creating databases programmatically:
-- `DatabaseBuilder()` constructor
-- `DatabaseBuilder add(String key, JsonObject data)` - add entry with metadata
-- `DatabaseBuilder add(String key, String jsonData)` - add entry with JSON string
-- `DatabaseBuilder setDescription(String description)` - set database description
-- `Database build()` - build in-memory database
-- `void save(String path)` - save to file
-- `byte[] toBytes()` - serialize to byte array
+### Extractor Pattern
 
-### 3. Unit Tests (High Priority)
+Use the Extractor to find IoCs (Indicators of Compromise) in text:
 
-Create test files in `src/test/java/com/matchylabs/matchy/`:
-- `DatabaseTest.java` - test open, query, close
-- `DatabaseBuilderTest.java` - test database creation
-- `QueryResultTest.java` - test result parsing
-- `OpenOptionsTest.java` - test configuration
-- `ExceptionTest.java` - test error handling
+```java
+// Extract all supported types
+try (Extractor extractor = Extractor.create(ExtractFlags.ALL)) {
+    List<ExtractedMatch> matches = extractor.extract(
+        "Contact user@example.com at 192.168.1.1 about evil.com");
+
+    for (ExtractedMatch match : matches) {
+        System.out.println(match.getItemType() + ": " + match.getValue());
+    }
+}
 
-Use JUnit 5 (already in pom.xml dependencies).
+// Extract only specific types
+int flags = ExtractFlags.DOMAINS | ExtractFlags.IPV4 | ExtractFlags.IPV6;
+try (Extractor extractor = Extractor.create(flags)) {
+    List<ExtractedMatch> matches = extractor.extract(text);
+    // Only domains and IPs are extracted
+}
+```
 
-### 4. Processing API (Medium Priority)
+Supported extraction types (see ExtractFlags):
+- DOMAINS - domain names (e.g., "example.com")
+- EMAILS - email addresses
+- IPV4 / IPV6 - IP addresses
+- HASHES - file hashes (MD5, SHA1, SHA256, SHA384, SHA512)
+- BITCOIN / ETHEREUM / MONERO - cryptocurrency addresses
+- ALL - extract everything
 
-Implement batch processing utilities (matching Rust processing module):
-- `Worker.java` - batch processing with extractor + multiple databases
-- `FileReader.java` - streaming file I/O with gzip support
-- `MatchResult.java` - match results without file context
-- `LineMatch.java` - match results with line numbers
+## Key TODOs
 
-See `native/matchy/WARP.md` "Processing Module API" section for design.
+Based on README.md, these are the next implementation priorities:
 
-### 5. CI/CD (Medium Priority)
+### 1. CI/CD (Medium Priority)
 
 Create `.github/workflows/ci.yml`:
 - Build native library (Rust) for Linux/macOS/Windows
@@ -283,7 +271,7 @@ Create `.github/workflows/ci.yml`:
 - Generate Javadoc
 - Create release artifacts with platform-specific native libraries
 
-### 6. Examples and Documentation (Low Priority)
+### 2. Examples and Documentation (Low Priority)
 
 Create example programs in `examples/`:
 - `BasicQuery.java` - simple query example
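
Note on the flag combination in the new "Extractor Pattern" section: `ExtractFlags.DOMAINS | ExtractFlags.IPV4 | ExtractFlags.IPV6` only works as described if each flag occupies its own bit. The diff does not show the actual constants, so the following is a minimal, self-contained sketch of that bitmask idea. The class `ExtractFlagsSketch`, its bit values, and the helpers `isEnabled` and `describe` are all hypothetical stand-ins and are not taken from matchy-java.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the flag names listed in the diff.
 * ASSUMPTION: the bit positions below are invented for this sketch;
 * the real values live in matchy-java's ExtractFlags and may differ.
 */
final class ExtractFlagsSketch {
    static final int DOMAINS  = 1 << 0;
    static final int EMAILS   = 1 << 1;
    static final int IPV4     = 1 << 2;
    static final int IPV6     = 1 << 3;
    static final int HASHES   = 1 << 4;
    static final int BITCOIN  = 1 << 5;
    static final int ETHEREUM = 1 << 6;
    static final int MONERO   = 1 << 7;

    // ALL is the union of every individual flag.
    static final int ALL =
        DOMAINS | EMAILS | IPV4 | IPV6 | HASHES | BITCOIN | ETHEREUM | MONERO;

    // Names indexed by bit position, matching the constants above.
    private static final String[] NAMES = {
        "DOMAINS", "EMAILS", "IPV4", "IPV6", "HASHES", "BITCOIN", "ETHEREUM", "MONERO"
    };

    private ExtractFlagsSketch() {}

    /** Returns true when every bit of {@code flag} is set in {@code flags}. */
    static boolean isEnabled(int flags, int flag) {
        return (flags & flag) == flag;
    }

    /** Lists the names of all flags set in {@code flags}, in bit order. */
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((flags & (1 << bit)) != 0) {
                names.add(NAMES[bit]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Same combination as in the diff's second example.
        int flags = DOMAINS | IPV4 | IPV6;
        System.out.println(describe(flags));          // prints [DOMAINS, IPV4, IPV6]
        System.out.println(isEnabled(flags, EMAILS)); // prints false
        System.out.println(isEnabled(ALL, MONERO));   // prints true
    }
}
```

Under these assumed values, OR-ing flags selects exactly the listed types, which is consistent with the diff's comment that only domains and IPs are extracted when `EMAILS` is left out.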