
Commit cbca7be

FWC scraper
1 parent 93b2169 commit cbca7be

11 files changed

Lines changed: 2837 additions & 471 deletions
Lines changed: 396 additions & 0 deletions
@@ -0,0 +1,396 @@
# Australian Legal Databases - Complete Roadmap

**Strategic Focus:** Legal evidence for worker advocacy
**Timeline:** 4 weeks to complete priority databases
**Current Status:** 1/7 databases complete (FWC); WorkSafe VIC in progress

---

## 🎯 Priority Order (Based on Client Value)

### **TIER 1: CRITICAL (Weeks 1-2)** ⭐⭐⭐⭐⭐

#### 1. Fair Work Commission (FWC) ✅ **COMPLETE**
- **Status:** Working, tested, integrated
- **Coverage:** Employment law cases Australia-wide
- **Value:** Unfair dismissal, discrimination, wage disputes
- **Tool:** `tools/fwc.py`

#### 2. WorkSafe Victoria 🔄 **IN PROGRESS**
- **Status:** Scraper built, needs testing
- **Coverage:** Victoria safety prosecutions (2012+)
- **Value:** Safety violations, penalties, incidents
- **Tool:** `tools/worksafe_vic.py`
- **Next:** Test and refine

#### 3. SafeWork NSW ⏳ **NEXT**
- **URL:** https://www.safework.nsw.gov.au/resource-library/prosecutions
- **Coverage:** NSW safety prosecutions (largest state)
- **Value:** Same as WorkSafe VIC
- **Approach:** Replicate WorkSafe VIC pattern
- **Timeline:** 2-3 days after VIC complete

---

### **TIER 2: HIGH VALUE (Weeks 2-3)** ⭐⭐⭐⭐⭐

#### 4. AustLII (Australasian Legal Information Institute)
- **URL:** https://www.austlii.edu.au/
- **Coverage:** Decisions from most Australian courts and tribunals (free database)
- **Value:** Federal + State courts, comprehensive
- **Unique:** One search covers all jurisdictions
- **Search Types:**
  - Company name
  - Case type (employment, civil, commercial)
  - Date range
  - Court level (Federal, Supreme, etc.)

**Why CRITICAL:**
- **Free comprehensive legal database**
- **Covers gaps between FWC and courts**
- **Includes appeals and major decisions**
- **Historical coverage back decades**

**Implementation:**
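```python
from typing import Dict, List, Optional


async def search_austlii(
    company_name: str,
    case_types: Optional[List[str]] = None,  # default: ["employment", "workplace"]
    courts: Optional[List[str]] = None,      # default: ["federal", "supreme"]
    years: int = 10,
) -> Dict:
    """Search AustLII for court cases involving a company."""
    # Build the defaults per call to avoid shared mutable default arguments.
    case_types = case_types or ["employment", "workplace"]
    courts = courts or ["federal", "supreme"]
    # TODO: confirm how AustLII search can be queried (Week 2 research task).
    # Planned return value: case citations, summaries, full-text links.
    raise NotImplementedError("AustLII search is not implemented yet")
```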

#### 5. Fair Work Ombudsman (FWO)
- **URL:** https://www.fairwork.gov.au/about-us/our-role/enforcing-the-legislation
- **Coverage:** Wage recovery actions, compliance notices
- **Value:** Direct wage theft evidence
- **Data:**
  - Enforcement outcomes
  - Penalties imposed
  - Amounts recovered
  - Court-enforceable undertakings

**Why HIGH VALUE:**
- **Wage theft is common** - many investigations involve this
- **Complements FWC** - Shows enforcement actions
- **Public register** - Searchable database

#### 6. Federal Court of Australia
- **URL:** https://www.fedcourt.gov.au/
- **Coverage:** Federal litigation (employment law precedents)
- **Value:** Major cases, important decisions
- **Search:**
  - Case search by party name
  - Full text decisions
  - Judgment summaries

---

### **TIER 3: VALUABLE (Weeks 3-4)** ⭐⭐⭐⭐

#### 7. ASIC (Australian Securities & Investments Commission)
- **Director Ban Register:** Disqualified directors
- **Insolvency Notices:** Company liquidations
- **Why:** Red flags for dodgy operators

**Searches:**
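```python
# Planned ASIC lookups (stubs only; none are implemented yet)

def asic_director_ban(director_name: str) -> bool:
    """Director ban search: is this person banned?"""
    raise NotImplementedError

def asic_insolvency(company_name: str) -> bool:
    """Insolvency search: has the company been liquidated?"""
    raise NotImplementedError

def asic_officers(company_name: str) -> list[str]:
    """Officer history: who are/were the directors?"""
    raise NotImplementedError
```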

#### 8. Additional SafeWork States
- **WorkSafe Queensland**
- **WorkSafe WA**
- **SafeWork SA**

**Why Lower Priority:**
- Smaller populations than VIC/NSW
- Same data structure as VIC/NSW
- Can add incrementally

---

## 📊 Coverage Analysis

### By Database Type:

| Type | Databases | Coverage | Status |
|------|-----------|----------|--------|
| Employment Law | FWC | Australia-wide | ✅ Complete |
| Safety | WorkSafe VIC, NSW, QLD, WA, SA | State-by-state | 🔄 1/5 |
| Courts | AustLII, Federal Court | All jurisdictions | ⏳ Planned |
| Wage Enforcement | Fair Work Ombudsman | Australia-wide | ⏳ Planned |
| Corporate | ASIC | Australia-wide | ⏳ Planned |

### By Worker Coverage:

| Database | Workers Covered | % of Australia |
|----------|----------------|----------------|
| FWC | All Australian workers | 100% |
| WorkSafe VIC | ~3M workers | 25% |
| SafeWork NSW | ~3.8M workers | 32% |
| **VIC + NSW** | **~6.8M workers** | **57%** |
| All SafeWork | ~12M workers | 100% |

**Strategy:** VIC + NSW gives us 57% coverage with just 2 databases!
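
The 57% figure follows directly from the table's own estimates. A quick check, assuming the approximate worker counts above and taking the ~12M "All SafeWork" row as the denominator (the function name is only illustrative):

```python
# Approximate worker counts from the table above, in millions.
WORKERS_M = {"WorkSafe VIC": 3.0, "SafeWork NSW": 3.8}
TOTAL_M = 12.0  # the "All SafeWork" row


def coverage_share(databases: list[str]) -> float:
    """Return the share of workers covered by the given databases, in percent."""
    covered = sum(WORKERS_M[name] for name in databases)
    return 100 * covered / TOTAL_M


share = coverage_share(["WorkSafe VIC", "SafeWork NSW"])
print(f"VIC + NSW cover about {share:.0f}% of workers")
```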

---

## 🛠️ Implementation Pattern

### Reusable Code Structure:

```python
from typing import Dict, List


# Pattern for all safety databases
async def search_safety_database(
    database_name: str,
    database_url: str,
    company_name: str,
    years: int = 10,
) -> Dict:
    """Generic safety database scraper (skeleton; the steps are not implemented yet)"""
    prosecutions: List[Dict] = []

    # 1. Navigate to database
    # 2. Find search interface
    # 3. Submit search
    # 4. Extract results
    # 5. Parse prosecution details (append each one to `prosecutions`)

    return {
        "database": database_name,
        "prosecutions": prosecutions,
        "total_found": len(prosecutions),
    }


# Customize selectors per database
SELECTORS = {
    "worksafe_vic": {
        "search_box": "input[name='search']",
        "results": "article.prosecution",
    },
    "safework_nsw": {
        "search_box": "input#company",
        "results": "div.result-card",
    },
}
```
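
One way the generic scraper could pick up its per-database selectors is a small lookup helper that fails loudly on an unknown database name. This is only a sketch: `get_selectors` is a hypothetical helper, and the `SELECTORS` table is copied from the pattern above so the snippet runs on its own.

```python
# Copied from the pattern above so this sketch is self-contained.
SELECTORS = {
    "worksafe_vic": {
        "search_box": "input[name='search']",
        "results": "article.prosecution",
    },
    "safework_nsw": {
        "search_box": "input#company",
        "results": "div.result-card",
    },
}


def get_selectors(database_name: str) -> dict[str, str]:
    """Return the CSS selectors for one database, or raise for an unknown name."""
    try:
        return SELECTORS[database_name]
    except KeyError:
        known = ", ".join(sorted(SELECTORS))
        raise ValueError(
            f"No selectors for {database_name!r}; known databases: {known}"
        ) from None


print(get_selectors("worksafe_vic")["search_box"])
```

Raising early keeps a typo in a database name from silently producing an empty result set.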

---

## 📅 4-Week Timeline

### **Week 1: Safety Databases (VIC + NSW)**

**Days 1-3: WorkSafe Victoria**
- [x] Build scraper
- [ ] Test and refine
- [ ] Validate on 10 companies
- [ ] Integration

**Days 4-7: SafeWork NSW**
- [ ] Adapt VIC code for NSW
- [ ] Test and validate
- [ ] Integration

**Deliverable:** 57% of Australian workers covered for safety data

---

### **Week 2: Comprehensive Courts (AustLII + FWO)**

**Days 1-4: AustLII**
- [ ] Research API/search interface
- [ ] Build case search
- [ ] Filter for employment cases
- [ ] Test on 20 companies

**Days 5-7: Fair Work Ombudsman**
- [ ] Research enforcement database
- [ ] Build scraper
- [ ] Test and validate

**Deliverable:** Court cases + wage theft enforcement

---

### **Week 3: Federal Court + ASIC**

**Days 1-3: Federal Court**
- [ ] Build case search
- [ ] Parse decisions
- [ ] Test and validate

**Days 4-7: ASIC Integration**
- [ ] Director ban register
- [ ] Insolvency notices
- [ ] Officer history

**Deliverable:** Corporate red flags

---

### **Week 4: Additional States + Polish**

**Days 1-3: More SafeWork States**
- [ ] Queensland
- [ ] Western Australia
- [ ] South Australia

**Days 4-7: Testing & Documentation**
- [ ] End-to-end testing
- [ ] Performance optimization
- [ ] Documentation
- [ ] Sample investigations

**Deliverable:** Complete Australian legal evidence platform

---

## 🎯 Success Metrics

### Functionality:
- [ ] Searches 7 databases in <2 minutes
- [ ] Finds cases/prosecutions other tools miss
- [ ] Accurate data extraction (>95%)
- [ ] Handles "no results" gracefully
- [ ] Error recovery works

### Client Value:
- [ ] Answers "Has this employer been prosecuted?"
- [ ] Shows safety violation patterns
- [ ] Provides court case history
- [ ] Identifies director red flags
- [ ] Generates credible evidence for organizing

### Coverage:
- [ ] 100% of Australian workers (employment law)
- [ ] 100% of Australian workers (safety)
- [ ] Federal + state court coverage
- [ ] Wage theft enforcement
- [ ] Corporate compliance

---

## 💰 Client Value Proposition

### What We Can Tell Clients:

**Before (just FWC):**
> "We found 1 employment law case against this company."

**After (7 databases):**
> "We searched 7 legal databases and found:
> - 3 Fair Work Commission cases (unfair dismissal)
> - 2 SafeWork prosecutions ($150K in fines)
> - 1 Federal Court case (pending)
> - 1 Fair Work Ombudsman wage recovery ($85K)
> - Clean ASIC record (no banned directors)
>
> Total penalties: $235,000
> Pattern: Repeat safety violations (2021, 2023)
> Risk Assessment: HIGH"
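
The totals in the "After" example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the record layout, the `summarize_findings` name, and the risk thresholds are assumptions, and the split of the $150K across the two SafeWork prosecutions is made up because the example only gives the combined figure.

```python
# Illustrative findings mirroring the "After" example (amounts in AUD).
# Assumption: the 70K/80K split of the $150K SafeWork fines is invented.
FINDINGS = [
    {"source": "SafeWork", "kind": "safety", "year": 2021, "amount": 70_000},
    {"source": "SafeWork", "kind": "safety", "year": 2023, "amount": 80_000},
    {"source": "FWO", "kind": "wage_recovery", "year": None, "amount": 85_000},
]


def summarize_findings(findings: list[dict]) -> dict:
    """Total the amounts and flag repeat safety violations (hypothetical rules)."""
    total = sum(f["amount"] for f in findings)
    safety_years = sorted(f["year"] for f in findings if f["kind"] == "safety")
    repeat_safety = len(set(safety_years)) >= 2

    # Hypothetical thresholds, not taken from the roadmap.
    if repeat_safety or total >= 200_000:
        risk = "HIGH"
    elif total > 0:
        risk = "MEDIUM"
    else:
        risk = "LOW"

    return {"total_penalties": total, "safety_years": safety_years, "risk": risk}


summary = summarize_findings(FINDINGS)
print(summary)
```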

---

## 🔍 Investigation Workflow

### Complete Australian Investigation:

```python
async def investigate_company_australia(company_name: str, abn: str):
    """Complete legal investigation - Australia"""
    # The search_* coroutines and the two helpers at the end are assumed to be
    # defined elsewhere; `abn` is accepted but not used by any search yet.

    results = {}

    # PHASE 1: Employment Law
    results["fwc"] = await search_fwc(company_name)
    results["fwo"] = await search_fwo(company_name)

    # PHASE 2: Safety
    results["worksafe_vic"] = await search_worksafe_vic(company_name)
    results["safework_nsw"] = await search_safework_nsw(company_name)

    # PHASE 3: Courts
    results["austlii"] = await search_austlii(company_name)
    results["federal_court"] = await search_federal_court(company_name)

    # PHASE 4: Corporate
    results["asic"] = await search_asic(company_name)

    # Generate risk assessment
    risk = assess_compliance_risk(results)

    return {
        "legal_findings": results,
        "risk_assessment": risk,
        "summary": generate_summary(results),
    }
```
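
As written, one failing search aborts the whole investigation. A possible variation, sketched here under assumptions, swaps the sequential awaits for `asyncio.gather(..., return_exceptions=True)` so that each database fails independently and partial results are still returned. `run_searches` and the two stand-in coroutines are hypothetical and are not part of the existing tools.

```python
import asyncio


async def run_searches(searches: dict, company_name: str) -> dict:
    """Run every search concurrently; one failure does not stop the others."""
    names = list(searches)
    outcomes = await asyncio.gather(
        *(searches[name](company_name) for name in names),
        return_exceptions=True,  # exceptions come back as values instead of raising
    )
    results, errors = {}, {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            errors[name] = repr(outcome)  # keep the error for later debugging
        else:
            results[name] = outcome
    return {"legal_findings": results, "errors": errors}


# Stand-in coroutines for illustration; the real search_* functions differ.
async def fake_fwc(company_name: str) -> dict:
    return {"cases": 1}


async def fake_asic(company_name: str) -> dict:
    raise TimeoutError("ASIC search timed out")


report = asyncio.run(
    run_searches({"fwc": fake_fwc, "asic": fake_asic}, "Example Pty Ltd")
)
print(report)
```

Because each search targets a different site, running them concurrently does not by itself shorten the delay between requests to any single site.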

---

## 🚀 Current Status

**Completed:** 1/7 databases (14%)
**In Progress:** 1/7 databases (WorkSafe VIC)
**Next Up:** SafeWork NSW

**Timeline to Complete:** 4 weeks
**Current Week:** Week 1, Day 1

---

## 📝 Notes for Implementation

### Technical Decisions:

1. **Playwright vs Requests**
   - Using Playwright for all (consistency)
   - Handles JavaScript-heavy sites
   - Screenshot debugging built in

2. **Error Handling**
   - Continue on individual database failures
   - Return partial results
   - Log all errors for debugging

3. **Rate Limiting**
   - 2-3 second delays between requests
   - Respect robots.txt
   - Random user agents

4. **Data Storage**
   - JSON format for now
   - Consider SQLite if volume grows
   - Evidence preservation format
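
The 2-3 second delay from the rate-limiting item could look roughly like this. It is a sketch under assumptions: `polite_fetch_all` and `next_delay` are hypothetical names, and `fetch` stands in for whatever coroutine actually loads a page (for example a Playwright wrapper).

```python
import asyncio
import random


def next_delay(min_s: float = 2.0, max_s: float = 3.0) -> float:
    """Pick a random pause within the 2-3 second range."""
    return random.uniform(min_s, max_s)


async def polite_fetch_all(urls, fetch, sleep=asyncio.sleep):
    """Call `fetch` for each URL in turn, pausing before every request after the first."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            await sleep(next_delay())
        pages.append(await fetch(url))
    return pages
```

Passing `sleep` as a parameter lets tests substitute a no-op so the suite does not actually wait.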

### Testing Strategy:

1. **Unit Tests** - Each database individually
2. **Integration Tests** - Full investigation workflow
3. **Known Cases** - Validate against public records
4. **Edge Cases** - No results, name variations, etc.
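
For the name-variation edge case, one option is to normalize company names before comparing them. The sketch below is an assumption about how that might be done: `normalize_company_name` and its short suffix list are hypothetical, and real registers may need more rules.

```python
import re


def normalize_company_name(name: str) -> str:
    """Lower-case, drop punctuation and common legal suffixes, collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^\w\s]", " ", name)  # punctuation becomes a space
    # Hypothetical suffix list; extend as real variations turn up.
    name = re.sub(r"\b(pty|ltd|limited|proprietary)\b", " ", name)
    return " ".join(name.split())


print(normalize_company_name("ACME Constructions Pty Ltd"))  # acme constructions
```

Two spellings of the same employer then compare equal, which makes a useful unit-test assertion for each scraper's matching logic.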

---

**Next Session Start:** Test WorkSafe VIC scraper!

```bash
cd "C:\Projects\osint-mcp-server"
python tools/worksafe_vic.py
```

🎯 **Let's build this!**
