Commit 4954672

Add Extractor API for IoC extraction
- Add Extractor, ExtractedMatch, ExtractFlags, ItemType classes
- Add JNA bindings for extractor C API (MatchyLibrary, NativeStructs)
- Add ExtractorTest with comprehensive test coverage
- Update submodule to include new extractor C API
- Update WARP.md and README.md documentation
- Clean up .gitignore (remove vestigial entry)
1 parent 9075148 commit 4954672

11 files changed

Lines changed: 923 additions & 67 deletions

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -29,8 +29,7 @@ hs_err_pid*
 # Java
 *.log
 
-# Native libraries (bundled in CI, not committed)
-java/src/main/resources/native/
+# Native library files (bundled at build time into target/classes)
 *.dylib
 *.so
 *.dll

README.md

Lines changed: 150 additions & 12 deletions
@@ -2,24 +2,162 @@
 
 Java wrapper for [matchy](https://github.com/matchylabs/matchy) - fast IoC matching for threat intelligence.
 
-## Status
+## Installation
+
+Download the JAR from [Releases](https://github.com/matchylabs/matchy-java/releases) and add it to your classpath.
+
+## Quick Start
+
+### Querying a Database
+
+```java
+import com.matchylabs.matchy.*;
+import java.nio.file.Paths;
+
+try (Database db = Database.open(Paths.get("threats.mxy"))) {
+    // Query an IP address
+    QueryResult result = db.query("192.168.1.1");
+
+    if (result.isMatch()) {
+        System.out.println("Match found!");
+        System.out.println("Data: " + result.getData());
+        System.out.println("Prefix length: " + result.getPrefixLength());
+    }
+
+    // Query a domain
+    QueryResult domainResult = db.query("evil.example.com");
+}
+```
+
+### Building a Database
+
+```java
+import com.matchylabs.matchy.*;
+import java.nio.file.Paths;
+import java.util.Map;
+
+try (DatabaseBuilder builder = new DatabaseBuilder()) {
+    // Add IP addresses and CIDRs
+    builder.add("1.2.3.4", Map.of("threat", "malware", "confidence", 95));
+    builder.add("10.0.0.0/8", Map.of("type", "internal"));
+
+    // Add patterns (glob syntax)
+    builder.add("*.evil.com", Map.of("category", "phishing"));
+    builder.add("malware-*.example.org", Map.of("category", "c2"));
+
+    // Set metadata
+    builder.setDescription("Threat database v1.0");
+
+    // Save to file
+    builder.save(Paths.get("threats.mxy"));
+
+    // Or build in-memory
+    try (Database db = builder.build()) {
+        QueryResult result = db.query("1.2.3.4");
+    }
+}
+```
+
+### Configuration Options
+
+```java
+import com.matchylabs.matchy.*;
+import java.nio.file.Paths;
+
+// Custom cache size
+OpenOptions options = OpenOptions.defaults()
+    .cacheCapacity(100_000); // LRU cache for query results
+
+try (Database db = Database.open(Paths.get("threats.mxy"), options)) {
+    // Queries are cached for faster repeated lookups
+}
+
+// Auto-reload on file changes
+OpenOptions watchOptions = OpenOptions.defaults()
+    .autoReload(true); // Automatically reload when file changes
+
+// Disable caching entirely
+OpenOptions noCacheOptions = OpenOptions.defaults()
+    .noCache();
+```
+
+### Error Handling
+
+```java
+import com.matchylabs.matchy.*;
+import java.nio.file.Paths;
 
-**Core functionality complete** - Ready for use. Download fat JARs from [Releases](https://github.com/matchylabs/matchy-java/releases).
+try (Database db = Database.open(Paths.get("threats.mxy"))) {
+    QueryResult result = db.query("192.168.1.1");
+    // ...
+} catch (MatchyException e) {
+    // Handle database errors (file not found, corrupt data, etc.)
+    System.err.println("Matchy error: " + e.getMessage());
+}
+```
 
-## Completed
+## API Reference
+
+### Database
+
+Main class for querying matchy databases.
+
+| Method | Description |
+|--------|-------------|
+| `Database.open(Path)` | Open a database file |
+| `Database.open(Path, OpenOptions)` | Open with custom options |
+| `Database.fromBuffer(byte[])` | Open from memory |
+| `query(String)` | Query IP address or pattern |
+| `getStats()` | Get query statistics |
+| `clearCache()` | Clear the LRU cache |
+| `getMetadata()` | Get database metadata as JSON |
+| `hasIpData()` / `hasStringData()` | Check data types |
+| `close()` | Free resources (use try-with-resources) |
+
+### DatabaseBuilder
+
+Create databases programmatically.
+
+| Method | Description |
+|--------|-------------|
+| `add(String, Map)` | Add entry with data |
+| `addJson(String, String)` | Add entry with JSON string |
+| `setDescription(String)` | Set database description |
+| `save(Path)` | Save to file |
+| `build()` | Build in-memory Database |
+| `toBytes()` | Build as byte array |
+
+### QueryResult
+
+| Method | Description |
+|--------|-------------|
+| `isMatch()` | Whether query matched |
+| `getData()` | Match data as JsonObject |
+| `getDataAsJson()` | Match data as JSON string |
+| `getPrefixLength()` | Network prefix (for IP matches) |
+
+## Requirements
+
+- **Java**: 11+
+- **Platforms**: Linux (x86_64, aarch64), macOS (x86_64, aarch64), Windows (x86_64)
+
+## Thread Safety
+
+- `Database` instances are thread-safe for queries
+- `DatabaseBuilder` is NOT thread-safe (use from a single thread)
+
+## More Information
+
+- [matchy documentation](https://matchylabs.github.io/matchy/) - concepts, file format, CLI
+- [matchy repository](https://github.com/matchylabs/matchy) - native library
+
+## Status
 
-- ✅ JNA bindings (NativeLoader, MatchyLibrary, NativeStructs)
-- ✅ Core wrapper classes (QueryResult, DatabaseStats, OpenOptions)
-- ✅ Database class (open, query, close, stats)
-- ✅ DatabaseBuilder class (create databases programmatically)
-- ✅ Exception handling (MatchyException)
-- ✅ Unit tests
-- ✅ GitHub Actions CI/CD (multi-platform, Java 11)
-- ✅ Fat JAR releases with bundled native libraries
+**Core functionality complete** - Ready for use.
 
 ## TODO
 
-- [ ] Documentation and examples
+- [ ] Extractor API (extract IPs, domains, hashes from text)
 - [ ] Maven Central deployment
 
 ## License
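
The README's `cacheCapacity(100_000)` option is described as an LRU cache for query results. For readers unfamiliar with the term, here is a minimal sketch of least-recently-used eviction built on `java.util.LinkedHashMap`. It is not matchy's cache (that lives in the native library and is not shown in this commit); the class name `LruSketch` and the sample keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of LRU eviction; not part of matchy-java.
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the least recently used entry
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("1.2.3.4", "malware");
        cache.put("10.0.0.1", "internal");
        cache.get("1.2.3.4");                       // touch: now the most recently used
        cache.put("evil.example.com", "phishing");  // evicts "10.0.0.1"
        System.out.println(cache.keySet());         // [1.2.3.4, evil.example.com]
    }
}
```

A larger capacity trades memory for a higher hit rate on repeated lookups, which is presumably why the README exposes it as a tunable.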

WARP.md

Lines changed: 40 additions & 52 deletions
@@ -6,11 +6,12 @@ This file provides guidance to WARP (warp.dev) when working with code in this re
 
 **matchy-java** is a Java wrapper for [matchy](https://github.com/matchylabs/matchy), providing JNA bindings to the native matchy library for fast IoC (Indicator of Compromise) matching.
 
-**Status**: 🚧 Work in Progress
+**Status**: ✅ Core Features Complete
 - JNA bindings implemented (NativeLoader, MatchyLibrary, NativeStructs)
 - Core wrapper classes implemented (QueryResult, DatabaseStats, OpenOptions, MatchyException)
-- Database and DatabaseBuilder classes pending
-- No tests or CI/CD yet
+- Database and DatabaseBuilder classes implemented
+- Extractor API implemented (Extractor, ExtractedMatch, ExtractFlags, ItemType)
+- Unit tests for Database, DatabaseBuilder, and Extractor
 
 ### Architecture
 
@@ -23,8 +24,12 @@ matchy-java/
 │ │ ├── NativeLoader.java # Platform detection & native library loading
 │ │ ├── MatchyLibrary.java # JNA interface to matchy C API
 │ │ └── NativeStructs.java # JNA structure mappings (MatchyResult, etc.)
-│ ├── Database.java # Main public API (TODO)
-│ ├── DatabaseBuilder.java # Database builder API (TODO)
+│ ├── Database.java # Main public API for querying
+│ ├── DatabaseBuilder.java # Database builder API
+│ ├── Extractor.java # IoC extraction from text
+│ ├── ExtractedMatch.java # Single extracted match
+│ ├── ExtractFlags.java # Extraction type flags
+│ ├── ItemType.java # Enum of extractable item types
 │ ├── QueryResult.java # Query result wrapper
 │ ├── DatabaseStats.java # Database statistics
 │ ├── OpenOptions.java # Database open configuration
@@ -222,59 +227,42 @@ OpenOptions options = OpenOptions.defaults()
 Database db = Database.open("threats.mxy", options);
 ```
 
-## Key TODOs
-
-Based on README.md, these are the next implementation priorities:
-
-### 1. Database Class (High Priority)
-
-Implement the main Database API:
-- `static Database open(String path)` and `open(String path, OpenOptions options)`
-- `static Database fromBuffer(byte[] buffer)`
-- `QueryResult query(String text)` - main query method
-- `void clearCache()` - clear LRU cache
-- `DatabaseStats getStats()` - get query statistics
-- `String getMetadata()` - database metadata
-- `boolean hasIpData()`, `hasStringData()`, `hasGlobData()`, `hasLiteralData()`
-- `String getFormat()` - get database format version
-- `void close()` - free native resources
-- Implement `AutoCloseable` for try-with-resources
-
-Reference the C API in MatchyLibrary for all available functions.
-
-### 2. DatabaseBuilder Class (High Priority)
-
-Builder for creating databases programmatically:
-- `DatabaseBuilder()` constructor
-- `DatabaseBuilder add(String key, JsonObject data)` - add entry with metadata
-- `DatabaseBuilder add(String key, String jsonData)` - add entry with JSON string
-- `DatabaseBuilder setDescription(String description)` - set database description
-- `Database build()` - build in-memory database
-- `void save(String path)` - save to file
-- `byte[] toBytes()` - serialize to byte array
+### Extractor Pattern
 
-### 3. Unit Tests (High Priority)
+Use the Extractor to find IoCs (Indicators of Compromise) in text:
 
-Create test files in `src/test/java/com/matchylabs/matchy/`:
-- `DatabaseTest.java` - test open, query, close
-- `DatabaseBuilderTest.java` - test database creation
-- `QueryResultTest.java` - test result parsing
-- `OpenOptionsTest.java` - test configuration
-- `ExceptionTest.java` - test error handling
+```java
+// Extract all supported types
+try (Extractor extractor = Extractor.create(ExtractFlags.ALL)) {
+    List<ExtractedMatch> matches = extractor.extract(
+        "Contact user@example.com at 192.168.1.1 about evil.com");
+
+    for (ExtractedMatch match : matches) {
+        System.out.println(match.getItemType() + ": " + match.getValue());
+    }
+}
 
-Use JUnit 5 (already in pom.xml dependencies).
+// Extract only specific types
+int flags = ExtractFlags.DOMAINS | ExtractFlags.IPV4 | ExtractFlags.IPV6;
+try (Extractor extractor = Extractor.create(flags)) {
+    List<ExtractedMatch> matches = extractor.extract(text);
+    // Only domains and IPs are extracted
+}
+```
 
-### 4. Processing API (Medium Priority)
+Supported extraction types (see ExtractFlags):
+- DOMAINS - domain names (e.g., "example.com")
+- EMAILS - email addresses
+- IPV4 / IPV6 - IP addresses
+- HASHES - file hashes (MD5, SHA1, SHA256, SHA384, SHA512)
+- BITCOIN / ETHEREUM / MONERO - cryptocurrency addresses
+- ALL - extract everything
 
-Implement batch processing utilities (matching Rust processing module):
-- `Worker.java` - batch processing with extractor + multiple databases
-- `FileReader.java` - streaming file I/O with gzip support
-- `MatchResult.java` - match results without file context
-- `LineMatch.java` - match results with line numbers
+## Key TODOs
 
-See `native/matchy/WARP.md` "Processing Module API" section for design.
+Based on README.md, these are the next implementation priorities:
 
-### 5. CI/CD (Medium Priority)
+### 1. CI/CD (Medium Priority)
 
 Create `.github/workflows/ci.yml`:
 - Build native library (Rust) for Linux/macOS/Windows
@@ -283,7 +271,7 @@ Create `.github/workflows/ci.yml`:
 - Generate Javadoc
 - Create release artifacts with platform-specific native libraries
 
-### 6. Examples and Documentation (Low Priority)
+### 2. Examples and Documentation (Low Priority)
 
 Create example programs in `examples/`:
 - `BasicQuery.java` - simple query example
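
To make the idea of IoC extraction from text concrete, here is a rough pure-Java sketch that pulls IPv4 addresses and email addresses out of the sample sentence used in WARP.md. It is a stand-in built on `java.util.regex`, not the native extractor this commit binds to; the class name `RegexExtractSketch` and both patterns are assumptions made for illustration, and the IPv4 pattern does not check that each octet is at most 255.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; matchy's real extractor is native code.
final class RegexExtractSketch {
    // Simplified patterns: good enough for a demo, not for production use
    static final Pattern IPV4 =
        Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private RegexExtractSketch() {}

    /** Returns every substring of {@code text} that matches {@code pattern}, in order. */
    static List<String> find(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Contact user@example.com at 192.168.1.1 about evil.com";
        for (String ip : find(IPV4, text)) {
            System.out.println("IPV4: " + ip);
        }
        for (String email : find(EMAIL, text)) {
            System.out.println("EMAIL: " + email);
        }
    }
}
```

A regex pass like this scans the text once per type, whereas the flag-based `Extractor.create(...)` shown in WARP.md selects types up front for a single native call.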
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+package com.matchylabs.matchy;
+
+import com.matchylabs.matchy.jna.NativeStructs;
+
+/**
+ * Flags to configure what types of items to extract.
+ *
+ * <p>Combine multiple flags with bitwise OR:
+ * <pre>{@code
+ * int flags = ExtractFlags.DOMAINS | ExtractFlags.IPV4 | ExtractFlags.IPV6;
+ * Extractor extractor = Extractor.create(flags);
+ * }</pre>
+ */
+public final class ExtractFlags {
+
+    private ExtractFlags() {
+        // Non-instantiable
+    }
+
+    /** Extract domain names (e.g., "example.com") */
+    public static final int DOMAINS = NativeStructs.MATCHY_EXTRACT_DOMAINS;
+
+    /** Extract email addresses (e.g., "user@example.com") */
+    public static final int EMAILS = NativeStructs.MATCHY_EXTRACT_EMAILS;
+
+    /** Extract IPv4 addresses */
+    public static final int IPV4 = NativeStructs.MATCHY_EXTRACT_IPV4;
+
+    /** Extract IPv6 addresses */
+    public static final int IPV6 = NativeStructs.MATCHY_EXTRACT_IPV6;
+
+    /** Extract file hashes (MD5, SHA1, SHA256, SHA384, SHA512) */
+    public static final int HASHES = NativeStructs.MATCHY_EXTRACT_HASHES;
+
+    /** Extract Bitcoin addresses */
+    public static final int BITCOIN = NativeStructs.MATCHY_EXTRACT_BITCOIN;
+
+    /** Extract Ethereum addresses */
+    public static final int ETHEREUM = NativeStructs.MATCHY_EXTRACT_ETHEREUM;
+
+    /** Extract Monero addresses */
+    public static final int MONERO = NativeStructs.MATCHY_EXTRACT_MONERO;
+
+    /** Extract all supported types */
+    public static final int ALL = NativeStructs.MATCHY_EXTRACT_ALL;
+}
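
The Javadoc above says flags are combined with bitwise OR. The following self-contained sketch shows how such `int` bit flags compose and how a single flag can be tested in a combined mask. The bit values here are assumptions chosen for illustration (the real ones come from `NativeStructs`, which this page does not show), and `FlagSketch`, `isSet`, and `describe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ExtractFlags; the bit values are assumed, not matchy's.
final class FlagSketch {
    static final int DOMAINS = 1 << 0;
    static final int EMAILS  = 1 << 1;
    static final int IPV4    = 1 << 2;
    static final int IPV6    = 1 << 3;

    private FlagSketch() {}

    /** Returns true when every bit of {@code flag} is set in {@code mask}. */
    static boolean isSet(int mask, int flag) {
        return (mask & flag) == flag;
    }

    /** Lists the names of the flags enabled in {@code mask}. */
    static List<String> describe(int mask) {
        List<String> names = new ArrayList<>();
        if (isSet(mask, DOMAINS)) names.add("DOMAINS");
        if (isSet(mask, EMAILS))  names.add("EMAILS");
        if (isSet(mask, IPV4))    names.add("IPV4");
        if (isSet(mask, IPV6))    names.add("IPV6");
        return names;
    }

    public static void main(String[] args) {
        int flags = DOMAINS | IPV4 | IPV6;   // same combination as the Javadoc example
        System.out.println(describe(flags)); // [DOMAINS, IPV4, IPV6]
    }
}
```

Plain `int` constants keep the values interchangeable with the C API's bitmask; a Java-only API might instead use an `EnumSet`, at the cost of a conversion step before the native call.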
