This repo is a local-first blackbox CTF workspace with two core layers:
- retrieval (rag/)
- execution agent (web_agent/)
Directories:
- repos/: upstream knowledge sources cloned from public repositories
- notes/: retrieval strategy and source tagging notes
- scripts/: sync/build/run entry scripts
- rag/: retrieval and index utilities
- web_agent/: interpreter + planner + deliberation + capability manager + logistics layer + shared runtime state
- docs/: architecture and iteration notes (init, pentagi)
File map:
mako/
  README.md
  SKILL.md
  .env.example
  docs/
    init.md
    pentagi.md
    codex.md
    runtime_codex_contract.md
  rag/
    common.py
    agent.py
    index.py
    query.py
  web_agent/
    cmd_agent.py
    solver_shared.py
    planner.py
    deliberation.py
    reflector.py
    capability.py
    logistics.py
    task_interpreter.py
  scripts/
    run_web_agent.sh
    build_rag_index.sh
    ask_rag.sh
  tests/
    test_deliberation.py
    test_policy_control.py
    test_structured_actions.py
- docs/init.md: current baseline thinking after removing legacy sync/
- docs/pentagi.md: PentAGI-driven integration notes and follow-up decisions
  - includes NYU Web smoke rerun notes (2026-04-11) with intended-exploit verification and reliability-evaluation boundaries
- docs/codex.md: research notes on moving this repo to a Codex-backed foundation via the Responses API and function tools
Knowledge sources in repos/:
- PayloadsAllTheThings: payload patterns and bypass tricks
- hacktricks: practical attack checklists and methodology
- nuclei-templates: vulnerability detection templates
- OWASP-CheatSheetSeries: defensive/offensive best practices and protocol references
- fuzzdb: fuzz payloads and probe dictionaries
- SecLists (sparse): targeted wordlists for web content discovery, fuzzing, payloads, credentials, and usernames
- Intent routing:
  - recon -> SecLists, HackTricks, OWASP
  - payload generation -> PayloadsAllTheThings, fuzzdb, SecLists/Payloads
  - vuln validation -> nuclei-templates, HackTricks, OWASP
  - bypass tuning -> PayloadsAllTheThings, HackTricks
- Chunking:
  - markdown/yaml/txt chunk size: 500-1200 tokens
  - keep section title + file path as metadata
  - preserve code blocks as independent chunks
- Retrieval:
  - hybrid retrieval (BM25 + embedding)
  - rerank with query-aware rules (payload-heavy query => prioritize payload repos)
- Feedback loop:
  - store execution result as memory (success/failure, status code, response signature)
  - use self-reflection to avoid repeating failed payload families
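The hybrid retrieval and query-aware rerank described above can be sketched in a few lines. This is not rag/query.py: it assumes alpha weights the dense score and (1 - alpha) the BM25 score, it uses a term-count cosine as a stand-in for real embeddings, and the chunk records, PAYLOAD_HINTS, and payload_boost value are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical chunk records; the real index lives in rag_data/index.jsonl.
CHUNKS = [
    {"repo": "PayloadsAllTheThings", "text": "SSTI payload {{7*7}} jinja2 bypass"},
    {"repo": "hacktricks", "text": "SSTI detection methodology: inject math expression and compare response"},
    {"repo": "SecLists", "text": "wordlist for content discovery paths"},
]
PAYLOAD_REPOS = {"PayloadsAllTheThings", "fuzzdb", "SecLists"}
PAYLOAD_HINTS = {"payload", "bypass", "fuzz"}  # hypothetical "payload-heavy" markers


def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 over whitespace-free tokenized docs."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores


def cosine_scores(query, docs):
    """Stand-in for embedding similarity: cosine over term-count vectors."""
    q = Counter(tokenize(query))
    out = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in v.values()))
        out.append(dot / norm if norm else 0.0)
    return out


def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def hybrid_search(query, chunks, alpha=0.7, top_k=2, payload_boost=0.2):
    docs = [c["text"] for c in chunks]
    dense = min_max(cosine_scores(query, docs))
    sparse = min_max(bm25_scores(query, docs))
    # Assumed convention: alpha weights the dense score, (1 - alpha) the BM25 score.
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    # Query-aware rerank rule: payload-heavy queries favor payload repos.
    if PAYLOAD_HINTS & set(tokenize(query)):
        fused = [f + payload_boost if c["repo"] in PAYLOAD_REPOS else f
                 for f, c in zip(fused, chunks)]
    ranked = sorted(zip(fused, chunks), key=lambda p: p[0], reverse=True)
    return [c["repo"] for _, c in ranked[:top_k]]


print(hybrid_search("ssti payload bypass", CHUNKS))
print(hybrid_search("ssti detection methodology", CHUNKS))
```

Min-max normalization puts both scores on a 0-1 scale before mixing, so alpha behaves as a weight rather than being swamped by BM25's unbounded range.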
- Sync sources:
cd mako
./scripts/sync_sources.sh
- Create env file:
cd mako
cp .env.example .env
- Edit .env and set at least:
OPENAI_API_KEY=your_key
- Build index:
./scripts/build_rag_index.sh
- Ask:
./scripts/ask_rag.sh "For a suspected SSTI input point, which black-box checks should I run first?"
Hybrid retrieval is enabled by default. You can tune the mode, top-k, and alpha:
./scripts/ask_rag.sh "How do I verify SSRF and distinguish echoed responses from internal-network probes?" --mode hybrid --top-k 8 --alpha 0.7
./scripts/ask_rag.sh "xss filter bypass payload" --mode bm25 --top-k 10
Build the index first, then run the web agent:
./scripts/build_rag_index.sh
./scripts/run_web_agent.sh "http://127.0.0.1:8080/" "Find SQL injection and retrieve flag"
Architecture:
1. task_interpreter reads objective + hint + observed signals + RAG context
2. interpreter writes task_prior.* into shared SQLite memory
3. planner turns priors and runtime evidence into explicit subtasks
4. solver deliberation reads:
   - task priors
   - active subtask
   - persistent facts
   - hypotheses
   - reflection constraints
5. controller reflection runs in parallel and emits policy constraints
6. validator layer blocks phase drift, low-gain repeats, and semantic-recovery violations
7. in codex / codex_collab mode, a single Codex tactical solver proposes the next executable step
8. in codex_collab mode, a counter-solver / falsifier attacks the current route and proposes a cheap discriminator experiment before the tactical step
9. capability manager evaluates execution gaps (reuse existing action vs write helper vs install dependency vs replan)
10. logistics layer executes environment/setup work requested by capability resolution and records it outside challenge-step counting
11. command output updates:
    - facts
    - reflection state
    - hypothesis lifecycle
    - plan patches for follow-up subtasks
12. interpreter/planner are refreshed periodically or after drift / repeated low-gain failure

codex_collab runs also write artifacts/.../<run_id>/codex_dialogue.jsonl, which records the counter-solver and tactical solver prompts and replies for post-run inspection.
Architecture diagram:
flowchart TD
U[User / Challenge<br/>target + objective + hint] --> I[Task Interpreter<br/>task_interpreter.py]
I -->|write task_prior.*| M[(Shared Memory<br/>SQLite)]
M --> P[Planner<br/>planner.py]
P -->|active subtask| S[Deliberation Layer<br/>recommender -> corrector -> judge]
D[RAG Index<br/>rag_data/index.jsonl] --> I
D --> P
D --> S
S --> C[Capability Manager<br/>reuse -> helper -> install -> replan]
C --> L[Logistics Layer<br/>env setup / helper generation / tool supplementation]
L -->|final action/command| E[Executor / Local Environment<br/>curl / sqlmap / bash / ffuf]
E --> O[Observed Output<br/>stdout / stderr / timing]
O --> F[Fact Extractor<br/>extract_facts]
O --> R1[Execution Reflector<br/>reflect_step in solver_shared.py]
O --> R2[Policy Reflector<br/>reflector.py]
O --> H[Hypothesis Manager<br/>update_hypotheses]
O --> PP[Plan Patch<br/>build_plan_patch]
F -->|facts| M
R1 -->|reflect.*| M
R2 -->|controller.*| M
H -->|hypothesis.*| M
PP -->|plan.current| M
M -->|task_prior + plan + facts + hypotheses + reflection| P
M -->|active subtask + facts + reflection| S
Current runtime stack:
task_interpreter -> planner -> recommender -> corrector -> judge
-> capability -> logistics -> executor
executor -> fact extraction / hypothesis updates / execution reflection / policy reflection
-> plan patch -> planner
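One step of this runtime stack can be sketched as an ordered pass over named stages. Each stage here only appends its name to a trace as a placeholder for the real call, and the refresh trigger (period 5, three low-gain steps) is a hypothetical stand-in for "refreshed periodically or after drift / repeated low-gain failure".

```python
from typing import List

# Stage names follow the runtime stack above; each append stands in for a real call.
PRE_EXECUTION = ["task_interpreter", "planner", "recommender", "corrector",
                 "judge", "capability", "logistics", "executor"]
POST_EXECUTION = ["fact_extraction", "hypothesis_updates",
                  "execution_reflection", "policy_reflection", "plan_patch"]


def should_refresh(step: int, low_gain_streak: int, drift: bool,
                   period: int = 5, max_low_gain: int = 3) -> bool:
    """Hypothetical trigger for re-running the interpreter: first step,
    every `period` steps, on drift, or after repeated low-gain steps."""
    return (step == 0 or step % period == 0 or drift
            or low_gain_streak >= max_low_gain)


def run_step(step: int, trace: List[str],
             low_gain_streak: int = 0, drift: bool = False) -> None:
    refresh = should_refresh(step, low_gain_streak, drift)
    for name in PRE_EXECUTION:
        if name == "task_interpreter" and not refresh:
            continue  # earlier priors remain in shared memory
        trace.append(name)
    # Post-execution updates; plan_patch feeds the planner on the next step.
    trace.extend(POST_EXECUTION)


trace: List[str] = []
for step in range(3):
    run_step(step, trace)
print(trace.count("task_interpreter"), trace.count("planner"))
```

Over three quiet steps the interpreter runs once while the planner runs every step, which matches the split between slow-changing priors and per-step plan patches.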
Shared-memory data model:
task_prior.* -> interpreter-produced priors
facts -> runtime observations and extracted signals
reflect.* -> failure reason, strategy update, next-step constraints
plan.* -> current plan, active subtask title/phase/action hint
controller.* -> policy reflection outputs (must_do / must_avoid / clusters)
hypothesis.* -> candidate / confirmed / weak_candidate / rejected
run.* -> challenge-step and capability-step counters
events -> compact execution trace
flows -> run-level status
tasks_state -> objective-level execution status
subtasks_state -> step-level execution status and outcome
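The dotted namespaces above (task_prior.*, plan.*, reflect.*, ...) map naturally onto keys scoped by run_id. The sketch below uses Python's standard sqlite3 module with a hypothetical single-table layout; the real schema of logs/agent_memory.sqlite is not documented here and likely differs.

```python
import json
import sqlite3

# Hypothetical single-table layout; the real schema of logs/agent_memory.sqlite
# (facts, events, flows, tasks_state, subtasks_state, ...) is not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory (
    run_id TEXT NOT NULL,
    key    TEXT NOT NULL,
    value  TEXT NOT NULL,
    PRIMARY KEY (run_id, key)
)
"""


def put(conn: sqlite3.Connection, run_id: str, key: str, value) -> None:
    # INSERT OR REPLACE lets later steps overwrite keys such as plan.current.
    conn.execute("INSERT OR REPLACE INTO memory (run_id, key, value) VALUES (?, ?, ?)",
                 (run_id, key, json.dumps(value)))


def get_namespace(conn: sqlite3.Connection, run_id: str, prefix: str) -> dict:
    # GLOB is used instead of LIKE because "_" in task_prior is a LIKE wildcard.
    rows = conn.execute(
        "SELECT key, value FROM memory WHERE run_id = ? AND key GLOB ? ORDER BY key",
        (run_id, prefix + ".*"),
    )
    return {key: json.loads(value) for key, value in rows}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
put(conn, "run-1", "task_prior.family", "ssti")
put(conn, "run-1", "task_prior.stack", ["flask", "jinja2"])
put(conn, "run-1", "plan.current", {"phase": "probe"})
put(conn, "run-1", "plan.current", {"phase": "exploit"})
print(get_namespace(conn, "run-1", "task_prior"))
print(get_namespace(conn, "run-1", "plan"))
```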
Optional in .env:
OPENAI_AGENT_MODEL=gpt-5.2
Main modules:
- rag/index.py, rag/query.py, rag/agent.py, rag/common.py
- web_agent/task_interpreter.py
- web_agent/planner.py
- web_agent/deliberation.py
- web_agent/capability.py
- web_agent/logistics.py
- web_agent/reflector.py
- web_agent/solver_shared.py
- web_agent/cmd_agent.py
Execution workflow is phase-based:
- recon
- probe
- exploit
- extract
- verify
- summarize and save run log to logs/cmd_agent_last_run.json
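The phase-based workflow can be read as a small state machine. In this sketch the exit_criteria_met flag and the drift rule (an action may stay in or revisit earlier phases but may not skip ahead) are assumptions for illustration, not the project's validator rules; returning None after verify stands for "summarize and save the run log".

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    RECON = 1
    PROBE = 2
    EXPLOIT = 3
    EXTRACT = 4
    VERIFY = 5


def advance(current: Phase, exit_criteria_met: bool) -> Optional[Phase]:
    """Move to the next phase only when the current one is done.
    Returns None after VERIFY, meaning 'summarize and save the run log'."""
    if not exit_criteria_met:
        return current
    if current is Phase.VERIFY:
        return None
    return Phase(current.value + 1)


def blocks_drift(current: Phase, proposed: Phase) -> bool:
    """Hypothetical drift rule: revisiting earlier phases is fine,
    skipping ahead of the current phase is blocked."""
    return proposed.value > current.value


print(advance(Phase.RECON, True), advance(Phase.VERIFY, True))
print(blocks_drift(Phase.PROBE, Phase.EXTRACT), blocks_drift(Phase.EXPLOIT, Phase.RECON))
```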
Memory database:
- default path: logs/agent_memory.sqlite
- shared by interpreter and solver under one run_id
- stores:
  - task_prior.*
  - extracted facts
  - reflect.*
  - hypothesis.*
- designed to be vuln-agnostic (not SQL-only)
Control policy:
- phase state machine: recon -> probe -> exploit -> extract -> verify
- interpreter priors strongly constrain solver drift
- planner owns an explicit ordered subtask list and exposes one active subtask at a time
- reflector/controller outputs are used to patch the current plan
- action validator uses modular rule registries (semantic rules + controller rules)
- failure reasons are normalized and mapped to canonical failure clusters
- capability acquisition is a separate loop and does not consume challenge-step budget
- each step records an info_gain score from newly discovered facts
- each step generates a reflection
- hypotheses are explicitly tracked as candidate, confirmed, weak_candidate, or rejected
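The info_gain score and the four hypothesis states can be illustrated with a minimal sketch. Both the metric (count of distinct facts not already in memory) and the transition rules (contradiction rejects, support confirms, repeated inconclusive results demote a candidate to weak_candidate) are assumptions; the project's own logic presumably lives in update_hypotheses and may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Set


def info_gain(known: Set[str], new_facts: Iterable[str]) -> int:
    """Hypothetical metric: count of distinct facts not already in memory."""
    return len(set(new_facts) - known)


@dataclass
class Hypothesis:
    name: str
    status: str = "candidate"  # candidate | confirmed | weak_candidate | rejected
    misses: int = 0

    def update(self, supported: bool, contradicted: bool, max_misses: int = 2) -> None:
        """Assumed transition rules, not the project's update_hypotheses logic."""
        if self.status == "rejected":
            return  # rejected routes stay closed
        if contradicted:
            self.status = "rejected"
        elif supported:
            self.status = "confirmed"
        else:
            # Inconclusive evidence: demote after repeated misses.
            self.misses += 1
            if self.status == "candidate" and self.misses >= max_misses:
                self.status = "weak_candidate"


known = {"server: nginx"}
print(info_gain(known, ["server: nginx", "param: q reflected", "param: q reflected"]))
h = Hypothesis("ssti")
h.update(supported=False, contradicted=False)
h.update(supported=False, contradicted=False)
print(h.status)
```

A step that only rediscovers known facts scores zero, which is the signal a validator can use to block low-gain repeats.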
Interpreter behavior:
- converts description + shown information into task_prior
- identifies likely challenge family and tech stack
- proposes:
  - primary hypotheses
  - secondary hypotheses
  - deprioritized routes
  - exploit chain candidates
- prevents the solver from drifting too early into unrelated routes
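A task prior can be pictured as a small record plus a gate on which routes the solver may try. The field names, the tier numbering, and the warm-up gate (only primary or secondary routes during the first few steps) are hypothetical; they illustrate one way a prior could keep the solver from drifting early, not how task_interpreter.py stores it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskPrior:
    # Hypothetical field names mirroring the interpreter outputs listed above.
    challenge_family: str
    tech_stack: List[str]
    primary: List[str] = field(default_factory=list)
    secondary: List[str] = field(default_factory=list)
    deprioritized: List[str] = field(default_factory=list)
    exploit_chains: List[List[str]] = field(default_factory=list)

    def tier(self, route: str) -> int:
        """0 = primary, 1 = secondary, 2 = unlisted, 3 = deprioritized."""
        if route in self.primary:
            return 0
        if route in self.secondary:
            return 1
        if route in self.deprioritized:
            return 3
        return 2

    def allows(self, route: str, step: int, warmup_steps: int = 5) -> bool:
        """During warm-up only primary/secondary routes are allowed;
        afterwards any route is allowed."""
        return step >= warmup_steps or self.tier(route) <= 1


prior = TaskPrior("ssti", ["flask", "jinja2"], primary=["ssti"],
                  secondary=["lfi"], deprioritized=["sqli"],
                  exploit_chains=[["ssti", "rce", "read_flag"]])
print(prior.allows("sqli", step=1), prior.allows("sqli", step=8))
```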
Deliberation behavior:
- recommender proposes one executable step for the active subtask
- corrector aggressively finds weaknesses and may replace that step with a corrected executable proposal
- judge selects the final proposal before execution
- execution results are written back into memory
- reflection and planner patches update subsequent subtasks
- supports structured actions for brittle execution chains:
  - http_probe_with_baseline
  - extract_html_attack_surface
  - cookiejar_flow_fetch
  - service_recovery_probe
  - multipart_upload_with_known_action
  - build_jsp_war
  - tomcat_manager_read_file
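The recommender -> corrector -> judge round can be sketched with the three roles as plain callables. In the project these are model calls; here they are stubs, and the Proposal fields and scores are hypothetical. The corrector stub swaps a bare probe for the http_probe_with_baseline structured action (arguments omitted) because a probe without a baseline gives nothing to compare against.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Proposal:
    command: str
    source: str
    score: float  # hypothetical confidence used by the judge


def deliberate(subtask: str,
               recommend: Callable[[str], Proposal],
               correct: Callable[[Proposal], Optional[Proposal]],
               judge: Callable[[List[Proposal]], Proposal]) -> Proposal:
    """One deliberation round: propose, critique, then select."""
    candidates = [recommend(subtask)]
    corrected = correct(candidates[0])
    if corrected is not None:  # the corrector may replace the step
        candidates.append(corrected)
    return judge(candidates)


# Stub roles; in the project these would be model calls.
def recommend(subtask: str) -> Proposal:
    return Proposal("curl -s 'http://127.0.0.1:8080/?q={{7*7}}'", "recommender", 0.4)


def correct(p: Proposal) -> Optional[Proposal]:
    # Critique: a single probe has no baseline response to compare against.
    if "baseline" not in p.command:
        return Proposal("http_probe_with_baseline", "corrector", 0.7)
    return None


def judge(candidates: List[Proposal]) -> Proposal:
    return max(candidates, key=lambda p: p.score)


final = deliberate("probe suspected SSTI parameter", recommend, correct, judge)
print(final.source, final.command)
```

Keeping the judge separate from the corrector means a critique can be rejected when the original proposal still scores higher.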
Capability and logistics behavior:
- capability resolution detects whether the chosen proposal lacks a required tool or dependency
- capability scoring prefers:
  - existing structured action
  - generated helper script
  - controlled install
  - replan
- logistics strategy selection is model-driven (pip vs system_package_manager vs skip_install) with a deterministic safety fallback
- logistics executes support work for the chosen option
- logistics work is tracked as capability_steps and does not consume challenge_steps
- install targets are dynamic (not a fixed allowlist), but install timing is constrained by capability policy and step accounting
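Capability resolution and step accounting can be sketched as a preference walk plus two separate counters. The availability flags, the "no_gap" result, and the fallback rule (any strategy outside pip / system_package_manager / skip_install becomes skip_install) are hypothetical illustrations of the behavior listed above, not the logic in capability.py or logistics.py.

```python
from dataclasses import dataclass
from typing import Optional

# Preference order from the capability-scoring list above.
PREFERENCE = ["structured_action", "helper_script", "install", "replan"]
INSTALL_STRATEGIES = {"pip", "system_package_manager", "skip_install"}


@dataclass
class Counters:
    challenge_steps: int = 0
    capability_steps: int = 0


def resolve(missing: Optional[str], has_structured_action: bool,
            can_write_helper: bool, install_allowed: bool) -> str:
    """Return the most preferred option that is available for the gap."""
    if missing is None:
        return "no_gap"
    available = {
        "structured_action": has_structured_action,
        "helper_script": can_write_helper,
        "install": install_allowed,
        "replan": True,  # replanning is always possible
    }
    return next(opt for opt in PREFERENCE if available[opt])


def install_strategy(model_choice: str) -> str:
    # Hypothetical safety fallback: an unrecognized model choice never installs.
    return model_choice if model_choice in INSTALL_STRATEGIES else "skip_install"


def charge(counters: Counters, logistics: bool) -> None:
    # Logistics work goes to capability_steps and never to challenge_steps.
    if logistics:
        counters.capability_steps += 1
    else:
        counters.challenge_steps += 1


c = Counters()
option = resolve("sqlmap", has_structured_action=False, can_write_helper=False, install_allowed=True)
if option == "install":
    charge(c, logistics=True)   # the install itself
charge(c, logistics=False)      # the actual challenge step
print(option, install_strategy("curl | sh"), c)
```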
Quick fuzz:
./scripts/run_quick_fuzz.sh "http://127.0.0.1:8080/" path path-small
./scripts/run_quick_fuzz.sh "http://127.0.0.1:8080/search" param-value ssti
Supported default wordlists: path-small, param-names, ssti, xss, cmdi
Q: What does "mako" mean in this project?
A: mako refers to Hitachi Mako (常陆茉子) and has no other meaning.