Standalone PHP checker for syntax validation, duplicate-code detection, public API snapshot checks and comment policy checks.
PHPProbe is the checker runtime. It can be used directly as phpprobe, required by tool-combiner packages such as PHPForge, or called from PHP code through the public gateway classes.
- PHP `>=8.2`
- `nikic/php-parser` `>=5.0 <6.0`
Install it as a Composer tool dependency:
```bash
composer require --dev infocyph/phpprobe
```

The package ships a Composer binary:

```bash
php vendor/bin/phpprobe
```

```bash
php vendor/bin/phpprobe syntax [options] [paths...]
php vendor/bin/phpprobe duplicates [options] [paths...]
php vendor/bin/phpprobe api [options] [paths...]
php vendor/bin/phpprobe comments [options] [paths...]
php vendor/bin/phpprobe presets
php vendor/bin/phpprobe preset <name>
```

Unknown commands print the top-level usage and exit 0. There is no separate `--version` command.
For checker subcommands (syntax, duplicates, api, comments), unknown options fail with exit 2.
```bash
php vendor/bin/phpprobe syntax
php vendor/bin/phpprobe syntax --format=markdown --parallel=4 src
php vendor/bin/phpprobe duplicates
php vendor/bin/phpprobe duplicates --json
php vendor/bin/phpprobe duplicates --summary-json=build/duplicates-summary.json src
php vendor/bin/phpprobe duplicates --preset=strict --json src
php vendor/bin/phpprobe api --write-baseline=.phpprobe-api-baseline.json src
php vendor/bin/phpprobe api --baseline=.phpprobe-api-baseline.json src
php vendor/bin/phpprobe api --fail-on=error --format=markdown --baseline=.phpprobe-api-baseline.json src
php vendor/bin/phpprobe comments --fail-on=warning src
php vendor/bin/phpprobe comments --strict --json src
php vendor/bin/phpprobe comments --policy=strict --format=markdown src
php vendor/bin/phpprobe presets
php vendor/bin/phpprobe preset phpstorm
```

The package-facing checker gateways live directly under `src/`:

- `Infocyph\PHPProbe\SyntaxChecker`
- `Infocyph\PHPProbe\DuplicateChecker`
- `Infocyph\PHPProbe\ApiSnapshotChecker`
- `Infocyph\PHPProbe\CommentChecker`
All expose:
```php
public function run(array $args): int
```

`$args` is the same argument list that follows the CLI subcommand. For example:

```php
use Infocyph\PHPProbe\ApiSnapshotChecker;
use Infocyph\PHPProbe\CommentChecker;
use Infocyph\PHPProbe\DuplicateChecker;
use Infocyph\PHPProbe\SyntaxChecker;

$syntaxCode = (new SyntaxChecker())->run(['--config=phpprobe.json', 'src']);
$duplicateCode = (new DuplicateChecker())->run(['--preset=strict', '--json', 'src']);
$apiCode = (new ApiSnapshotChecker())->run(['--baseline=.phpprobe-api-baseline.json', 'src']);
$commentCode = (new CommentChecker())->run(['--strict', '--fail-on=warning', 'src']);
```

Everything else is internal implementation detail, grouped by role:
| Namespace | Purpose |
|---|---|
| `Api` | Public API snapshot extraction from parser ASTs. |
| `Console` | CLI dispatch for `bin/phpprobe`. |
| `Config` | Config lookup, preset lookup, JSON parsing, config merging and shared CLI option handling. |
| `Detection` | Duplicate-code token indexing, AST block indexing, scoring, grouping and pruning. |
| `Filesystem` | Git-aware PHP file discovery and path exclusion. |
| `Process` | Small `proc_open` runner wrappers. |
| `Util` | Narrow shared helpers. |
The default config filename is phpprobe.json.
When a checker needs a config file and --config was not passed, PHPProbe resolves it in this order:
1. `phpprobe.json` in the current project root, meaning the current working directory.
2. `vendor/infocyph/phpprobe/resources/phpprobe.json` under the current project root.
3. `resources/phpprobe.json` only when the current project itself is `infocyph/phpprobe`.
If no config can be found, PHPProbe throws a runtime config error.
Preset files are bundled resources. They are resolved from:
1. `vendor/infocyph/phpprobe/resources/presets/<name>.json`.
2. `resources/presets/<name>.json` only while developing `infocyph/phpprobe` itself.
Project-root preset files are not looked up automatically.
When --config=FILE is passed explicitly and that file is missing, unreadable, empty, or invalid JSON, PHPProbe treats it as an empty config and continues with internal defaults plus any CLI options.
The bundled resources/phpprobe.json is intentionally small:
```json
{
  "preset": "phpstorm"
}
```

A full project config may override any part of the selected preset:
```json
{
  "preset": "phpstorm",
  "syntax": {
    "paths": ["src"],
    "exclude": ["src/generated"]
  },
  "duplicates": {
    "paths": ["src"],
    "exclude": ["src/generated"],
    "mode": "audit",
    "normalize": true,
    "fuzzy": true,
    "near_miss": true,
    "min_lines": 5,
    "min_tokens": 90,
    "min_statements": 4,
    "min_similarity": 0.85,
    "baseline": "",
    "write_baseline": "",
    "json": false
  },
  "api": {
    "paths": ["src"],
    "exclude": ["src/generated"],
    "include_protected": true,
    "baseline": "",
    "write_baseline": "",
    "json": false
  }
}
```

Config keys accept snake case, kebab case and camel case. For example, `min_tokens`, `min-tokens` and `minTokens` are equivalent. Excludes can be configured as either `exclude` or `exclude_paths`.
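One way to get this key equivalence is to fold every key into snake case before lookup. The sketch below is an illustration under that assumption, not PHPProbe's actual code; `normalizeConfigKey` and `normalizeConfigKeys` are hypothetical helper names.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: fold kebab-case and camelCase keys into snake_case.
function normalizeConfigKey(string $key): string
{
    // Put an underscore before each uppercase letter that follows a lowercase letter or digit.
    $snake = preg_replace('/(?<=[a-z0-9])([A-Z])/', '_$1', $key) ?? $key;

    // Treat hyphens as underscores and lowercase the result.
    return strtolower(str_replace('-', '_', $snake));
}

// Hypothetical helper: normalize every key of one config section.
function normalizeConfigKeys(array $section): array
{
    $normalized = [];
    foreach ($section as $key => $value) {
        $normalized[normalizeConfigKey((string) $key)] = $value;
    }

    return $normalized;
}

$section = normalizeConfigKeys(['minTokens' => 90, 'min-lines' => 5, 'near_miss' => true]);
```

After this call, `$section` is keyed by `min_tokens`, `min_lines` and `near_miss`, so later lookups only need to know the snake-case spelling.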
Internal duplicate defaults, before the bundled phpstorm config is applied, are mode=gate, normalize=true, fuzzy=false, near_miss=false, min_lines=5, min_tokens=70, min_statements=4, min_similarity=0.85, no baseline, no JSON output and no configured paths or excludes.
Internal API defaults are include_protected=true, no baseline, no JSON output and no configured paths or excludes.
Config merge order is:
1. Internal checker defaults.
2. Config-file `preset`, when present.
3. Explicit values in the config file.
4. CLI `--preset=NAME`, when present.
5. Explicit CLI flags and CLI paths.
Local config values override the config-file preset. CLI --preset is a run-level override and can override config-file values. Explicit CLI flags still win after that.
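The merge order can be pictured as layered arrays where each later layer overrides the earlier ones key by key. This is a minimal sketch, not the library's merge code: `mergeConfigLayers` is a hypothetical helper, the layers are trimmed to three keys, and the `min_tokens` values in the config-file and CLI-flag layers are made up for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: later layers win over earlier layers, key by key.
function mergeConfigLayers(array ...$layers): array
{
    return array_replace([], ...$layers);
}

$defaults   = ['mode' => 'gate', 'fuzzy' => false, 'min_tokens' => 70]; // internal duplicate defaults
$filePreset = ['mode' => 'audit', 'fuzzy' => true, 'min_tokens' => 90]; // "preset": "phpstorm" in the config file
$fileValues = ['min_tokens' => 80];                                      // explicit config-file value (made up)
$cliPreset  = ['mode' => 'gate', 'fuzzy' => true, 'min_tokens' => 100];  // --preset=standard (subset of its keys)
$cliFlags   = ['min_tokens' => 120];                                     // --min-tokens=120

$effective = mergeConfigLayers($defaults, $filePreset, $fileValues, $cliPreset, $cliFlags);
```

Here the CLI preset replaces the config-file `mode`, and the explicit CLI flag still decides `min_tokens`.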
Preset templates live in resources/presets/ and are loaded by Infocyph\PHPProbe\Config\PresetRepository.
Available presets:
| Preset | Duplicate policy | API policy |
|---|---|---|
| `phpstorm` | PhpStorm-aligned default. `audit` mode, normalized tokens, fuzzy identifiers, near-miss matching, `min_lines=5`, `min_tokens=90`, `min_statements=4`, `min_similarity=0.85`. | Includes protected members. |
| `standard` | Quieter CI gate. `gate` mode, normalized tokens, fuzzy identifiers, no near-miss matching, `min_lines=6`, `min_tokens=100`, `min_statements=5`, `min_similarity=0.9`. | Includes protected members. |
| `strict` | Sensitive audit. `audit` mode, normalized tokens, fuzzy identifiers, near-miss matching, `min_lines=4`, `min_tokens=70`, `min_statements=3`, `min_similarity=0.8`. | Includes protected members. |
All presets include the same default syntax, duplicate and API excludes:
tests, vendor, node_modules, .git, .idea, .vscode, coverage,
.phpunit.cache, .psalm-cache, build, dist, tmp, .tmp, storage,
bootstrap/cache, var/cache
Duplicate presets also exclude storage/framework/views.
Preset commands:
```bash
php vendor/bin/phpprobe presets
php vendor/bin/phpprobe preset phpstorm
```

`presets` prints one preset name per line. `preset <name>` prints the bundled JSON template. Unknown preset names print an error and exit 2.
The syntax checker discovers PHP files, then runs PHP's native lint command against each file:
```bash
php -d display_errors=1 -l <file>
```

Command:

```bash
php vendor/bin/phpprobe syntax [options] [paths...]
```

Options:
| Option | Form | Meaning |
|---|---|---|
| `--config` | `--config=FILE` or `--config FILE` | Read checker settings from a specific config file. |
| `--preset` | `--preset=NAME` or `--preset NAME` | Apply `phpstorm`, `standard`, or `strict` as a run-level preset. |
| `--exclude` | `--exclude=PATH` or `--exclude PATH` | Exclude a path. Repeatable. |
| `--format` | `--format=text\|json\|...` | Select the output format. |
| `--json` | flag | Alias for `--format=json`. |
| `--summary-json` | `--summary-json=FILE` | Write a machine-readable run summary JSON. |
| `--changed-only` | flag | Scan only changed PHP files from Git diff. |
| `--changed-base` | `--changed-base=REF` | Base ref used with `--changed-only`. |
| `--parallel` | `--parallel=N` | Parallel lint worker count. Default is 1. |
| `--help`, `-h` | flag | Print syntax checker help and exit 0. |
Path behavior:
- CLI paths override `syntax.paths` from config.
- If CLI paths are empty, `syntax.paths` is used.
- If both are empty, discovery starts from `.`.
- Config excludes and CLI excludes are merged.
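The path rules above amount to a short fallback chain. The sketch below is an illustration only; `resolveScanTargets` is a hypothetical helper, and de-duplicating the merged excludes is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper mirroring the documented path behavior.
 *
 * @return array{paths: list<string>, excludes: list<string>}
 */
function resolveScanTargets(array $cliPaths, array $configPaths, array $cliExcludes, array $configExcludes): array
{
    // CLI paths win; otherwise config paths; otherwise the current directory.
    $paths = $cliPaths !== [] ? $cliPaths : ($configPaths !== [] ? $configPaths : ['.']);

    // Excludes from both sources are merged (de-duplication is an assumption).
    $excludes = array_values(array_unique(array_merge($configExcludes, $cliExcludes)));

    return ['paths' => array_values($paths), 'excludes' => $excludes];
}

$targets = resolveScanTargets([], ['src'], ['build'], ['vendor', 'build']);
```

With no CLI paths, `$targets['paths']` falls back to the configured `src`, and the excludes from both sources are combined.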
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No PHP files found | stdout: `No PHP files found.` plus summary | 0 |
| All files pass | stdout: `Syntax OK: N PHP files checked.` plus summary | 0 |
| One or more files fail | stderr: failing file list plus lint output | 1 |
| Unknown option or runtime config error | stderr: error | 2 |
| Unknown preset | stderr: preset error | 2 |
The comment checker scans PHP comments using token_get_all() and reports marker tags and commented-out code policy findings.
Command:
```bash
php vendor/bin/phpprobe comments [options] [paths...]
```

Options:
| Option | Form | Meaning |
|---|---|---|
| `--config` | `--config=FILE` or `--config FILE` | Read checker settings from a specific config file. |
| `--preset` | `--preset=NAME` or `--preset NAME` | Apply `phpstorm`, `standard`, or `strict` as a run-level preset. |
| `--exclude` | `--exclude=PATH` or `--exclude PATH` | Exclude a path. Repeatable. |
| `--format` | `--format=text\|json\|...` | Select the output format. |
| `--json` | flag | Alias for `--format=json`. |
| `--strict` | flag | Escalate commented-out-code policy severities. |
| `--policy` | `--policy=relaxed\|standard\|...` | Select the comment policy. |
| `--fail-on` | `--fail-on=error\|warning\|...` | Severity threshold at which findings fail the run. |
| `--summary-json` | `--summary-json=FILE` | Write a machine-readable run summary JSON. |
| `--changed-only` | flag | Scan only changed PHP files from Git diff. |
| `--changed-base` | `--changed-base=REF` | Base ref used with `--changed-only`. |
| `--tags` | `--tags=TODO,FIXME,...` | Override marker tags for marker detection. |
| `--help`, `-h` | flag | Print comments checker help and exit 0. |
- Marker detection: tags like `TODO`, `FIXME`, `BUG`, `HACK`, `SECURITY`, `REVIEW`, `DEPRECATED`.
- Commented-out code requires a directly attached tagged reason.
- Long commented-out blocks require an issue reference.
- Oversized commented-out blocks are always reported.
Default thresholds:
- `min_reason_length = 12`
- `require_issue_for_blocks_longer_than = 3`
- `max_allowed_block_lines = 10`
Policy-to-finding mapping:
| Policy | Finding types |
|---|---|
| Marker detection | comment_marker |
| Tagged reason required for commented-out code | commented_out_code_without_reason, commented_out_code_without_valid_tag, commented_out_code_without_valid_reason, commented_out_code_with_weak_reason |
| Issue reference required for long blocks | commented_out_code_requires_issue_reference |
| Oversized block disallowed | commented_out_code_block_too_large |
| PHPDoc code without clear example label | commented_out_code_in_phpdoc_without_example_label |
| Invalid suppression directive | invalid_suppression_rule |
| Explicitly valid tagged reason (informational) | commented_out_code_with_valid_reason |
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No failing findings at threshold | stdout: summary (or JSON/markdown/SARIF) | 0 |
| Findings at or above threshold | stderr: text report (or JSON on stdout) | 1 |
| Unknown option or runtime config error | stderr: error | 2 |
The API checker parses PHP files with nikic/php-parser, extracts the package-visible surface and can compare it with a saved snapshot. It is intended for library BC drift checks, not type analysis.
Command:
```bash
php vendor/bin/phpprobe api [options] [paths...]
```

Options:
| Option | Form | Meaning |
|---|---|---|
| `--config` | `--config=FILE` or `--config FILE` | Read checker settings from a specific config file. |
| `--preset` | `--preset=NAME` or `--preset NAME` | Apply `phpstorm`, `standard`, or `strict` as a run-level preset. |
| `--exclude` | `--exclude=PATH` or `--exclude PATH` | Exclude a path. Repeatable. |
| `--public-only` | flag | Ignore protected class members. |
| `--include-protected` | flag | Include protected members. This is the default. |
| `--baseline` | `--baseline=FILE` | Compare the current API against a snapshot file. |
| `--write-baseline` | `--write-baseline`, `--write-baseline=FILE` | Write the current API snapshot and exit 0. Bare flag writes `.phpprobe-api-baseline.json`. |
| `--format` | `--format=text\|json\|...` | Select the output format. |
| `--json` | flag | Alias for `--format=json`. |
| `--fail-on` | `--fail-on=error\|warning\|...` | Severity threshold at which findings fail the run. |
| `--summary-json` | `--summary-json=FILE` | Write a machine-readable run summary JSON. |
| `--changed-only` | flag | Scan only changed PHP files from Git diff. |
| `--changed-base` | `--changed-base=REF` | Base ref used with `--changed-only`. |
| `--help`, `-h` | flag | Print API checker help and exit 0. |
Path behavior:
- CLI paths override `api.paths` from config.
- If CLI paths are empty, `api.paths` is used.
- If both are empty, discovery starts from `.`.
- Config excludes and CLI excludes are merged.
Snapshot contents:
- named classes, interfaces, traits and enums
- top-level namespaced functions
- top-level namespaced constants
- public members always
- protected members unless `--public-only` is used
- class modifiers, inheritance, implemented interfaces, method signatures, property signatures, constants, enum cases, function signatures and stable fingerprints
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No baseline passed | stdout: `Public API snapshot OK: N symbol(s) scanned.` | 0 |
| Baseline matches | stdout: `Public API unchanged: N symbol(s) scanned.` | 0 |
| Baseline differs | stderr: added/removed/changed symbol list | 1 by default, 0 when `--fail-on=error` |
| `--format=json\|markdown\|sarif` | ... | ... |
| `--write-baseline` | stdout: baseline message or JSON result | 0 |
| Unknown option or runtime config/baseline error | stderr: error | 2 |
| Unknown preset | stderr: preset error | 2 |
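The added/removed/changed comparison can be sketched from the `id` and `fingerprint` fields that each snapshot symbol carries (see the `api --json` output later in this document). This is an assumption about how such a diff could work, not the checker's own implementation; `diffApiSymbols` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: compare two symbol lists by id and fingerprint.
 *
 * @param list<array{id: string, fingerprint: string}> $baselineSymbols
 * @param list<array{id: string, fingerprint: string}> $currentSymbols
 * @return array{added: list<string>, removed: list<string>, changed: list<string>}
 */
function diffApiSymbols(array $baselineSymbols, array $currentSymbols): array
{
    // Map symbol id => fingerprint for both sides.
    $baseline = array_column($baselineSymbols, 'fingerprint', 'id');
    $current = array_column($currentSymbols, 'fingerprint', 'id');

    $changed = [];
    foreach (array_intersect_key($current, $baseline) as $id => $fingerprint) {
        // Same id on both sides but a different fingerprint means the symbol changed.
        if ($fingerprint !== $baseline[$id]) {
            $changed[] = (string) $id;
        }
    }

    return [
        'added' => array_map('strval', array_keys(array_diff_key($current, $baseline))),
        'removed' => array_map('strval', array_keys(array_diff_key($baseline, $current))),
        'changed' => $changed,
    ];
}
```

A non-empty result in any of the three lists corresponds to the "Baseline differs" row above.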
The duplicate checker combines token fingerprints, AST block structure, statement windows, near-miss similarity, grouping, pruning, ranking and optional baseline suppression.
Command:
```bash
php vendor/bin/phpprobe duplicates [options] [paths...]
```

Options:
| Option | Form | Meaning |
|---|---|---|
| `--config` | `--config=FILE` or `--config FILE` | Read checker settings from a specific config file. |
| `--preset` | `--preset=NAME` or `--preset NAME` | Apply `phpstorm`, `standard`, or `strict` as a run-level preset. |
| `--exclude` | `--exclude=PATH` or `--exclude PATH` | Exclude a path. Repeatable. |
| `--mode` | `--mode=gate` or `--mode=audit` | `gate` runs token matching; `audit` also enables statement matching and near-miss matching. |
| `--min-lines` | `--min-lines=N` | Minimum duplicated line span. Values below 1 become 1. |
| `--min-tokens` | `--min-tokens=N` | Token fingerprint window size. Values below 1 become 1. |
| `--min-statements` | `--min-statements=N` | Statement window size for audit matching. Values below 1 become 1. |
| `--min-similarity` | `--min-similarity=N` | Near-miss threshold. Accepts `0.0..1.0` or `0..100`; values above 1 are treated as percentages. |
| `--near-miss` | flag | Enable bounded statement/shape similarity matching. |
| `--exact` | flag | Disable variable/literal normalization and disable fuzzy matching. |
| `--fuzzy` | flag | Normalize identifiers/calls as `ID` for renamed-code scans. |
| `--no-fuzzy` | flag | Disable fuzzy identifier/call normalization. |
| `--baseline` | `--baseline=FILE` | Suppress clone groups whose fingerprints are already in a baseline file. |
| `--write-baseline` | `--write-baseline`, `--write-baseline=FILE` | Write current clone fingerprints to a baseline and exit 0. Bare flag writes `.phpprobe-duplicates-baseline.json`. |
| `--format` | `--format=text\|json\|...` | Select the output format. |
| `--json` | flag | Alias for `--format=json`. |
| `--fail-on` | `--fail-on=error\|warning\|...` | Severity threshold at which findings fail the run. |
| `--error-duplicate-percentage` | `--error-duplicate-percentage=N` | Error threshold used when `--fail-on=error`. Default 20. |
| `--summary-json` | `--summary-json=FILE` | Write a machine-readable run summary JSON. |
| `--changed-only` | flag | Scan only changed PHP files from Git diff. |
| `--changed-base` | `--changed-base=REF` | Base ref used with `--changed-only`. |
| `--no-cache` | flag | Disable duplicate result cache. |
| `--cache-file` | `--cache-file=FILE` | Duplicate result cache path. |
| `--help`, `-h` | flag | Print duplicate checker help and exit 0. |
Exact accepted forms matter: numeric options, --mode, --baseline and valued --write-baseline=FILE are parsed in equals form. --config, --preset and --exclude also accept split form. --write-baseline may also be passed as a bare flag.
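The numeric rules in the options table (values below 1 become 1; similarity above 1 is read as a percentage) can be sketched as two small normalizers. Both function names are hypothetical, and the final clamp to `0.0..1.0` is an assumption that the source does not state.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for --min-lines, --min-tokens and --min-statements.
function normalizeMinimum(int $value): int
{
    // Values below 1 become 1.
    return max(1, $value);
}

// Hypothetical helper for --min-similarity.
function normalizeSimilarity(float $value): float
{
    // Values above 1 are treated as percentages, so 85 means 0.85.
    if ($value > 1.0) {
        $value /= 100.0;
    }

    // Assumption: out-of-range input is clamped rather than rejected.
    return max(0.0, min(1.0, $value));
}
```

Under these rules `--min-similarity=85` and `--min-similarity=0.85` describe the same threshold.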
Path behavior:
- CLI paths override `duplicates.paths` from config.
- If CLI paths are empty, `duplicates.paths` is used.
- If both are empty, discovery starts from `.`.
- Config excludes and CLI excludes are merged.
Mode behavior:
- `gate`: token-window duplicate detection only, unless `--near-miss` is explicitly passed.
- `audit`: token-window matching plus statement-window matching; near-miss matching is enabled automatically.
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No clone groups after baseline suppression | stdout: `No new duplicated code found (...)` plus summary | 0 |
| Clone groups found | stderr: text report plus summary | 1 by default |
| `--format=json\|markdown\|sarif` | ... | ... |
| `--write-baseline` | stdout: baseline message or JSON result | 0 |
| Unknown option or runtime config/baseline error | stderr: error | 2 |
| Unknown preset | stderr: preset error | 2 |
File discovery:
- PHPProbe first tries `git ls-files -z --cached --others --exclude-standard`.
- It filters discovered PHP files with `git check-ignore -z --stdin --no-index`.
- If Git discovery is unavailable, it recursively scans the selected paths.
- Recursive fallback skips common infrastructure directories such as `.git`, `.idea`, `.phpunit.cache`, `.psalm-cache`, `.vscode`, `coverage`, `node_modules` and `vendor`.
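The recursive fallback from the last two bullets can be approximated with SPL iterators. This is a sketch under assumptions, not PHPProbe's implementation: `discoverPhpFiles` is a hypothetical name, and matching skipped directories by base name at any depth is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical fallback scanner: collect *.php files below $root,
 * without descending into directories whose name is in $skipDirs.
 *
 * @param list<string> $skipDirs
 * @return list<string>
 */
function discoverPhpFiles(string $root, array $skipDirs): array
{
    $directory = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);

    $filter = new RecursiveCallbackFilterIterator(
        $directory,
        static function (SplFileInfo $current) use ($skipDirs): bool {
            if ($current->isDir()) {
                // Rejected directories are never descended into.
                return !in_array($current->getFilename(), $skipDirs, true);
            }

            return strtolower($current->getExtension()) === 'php';
        }
    );

    $files = [];
    foreach (new RecursiveIteratorIterator($filter) as $file) {
        $files[] = $file->getPathname();
    }
    sort($files);

    return $files;
}
```

A call such as `discoverPhpFiles('.', ['.git', 'vendor', 'node_modules'])` would return a sorted list of PHP file paths outside the skipped directories.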
Token normalization:
- Whitespace, comments, doc comments, PHP open tags and close tags are ignored.
- With `normalize=true`, variables become `VAR`, numbers become `NUM`, strings become `STR`.
- With `fuzzy=true`, identifiers and names become `ID`.
- With `--exact`, token values include token names and original text.
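The normalization rules above can be illustrated with PHP's built-in tokenizer. The sketch below is not PHPProbe's normalizer: `normalizeTokens` is a hypothetical function, the exact set of token types it maps to `NUM`, `STR` and `ID` is an assumption, and it does not model the `--exact` token format.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical normalizer built on PhpToken (PHP 8.0+).
 *
 * @return list<string>
 */
function normalizeTokens(string $code, bool $normalize = true, bool $fuzzy = false): array
{
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO, T_CLOSE_TAG];
    $numbers = [T_LNUMBER, T_DNUMBER];
    $strings = [T_CONSTANT_ENCAPSED_STRING, T_ENCAPSED_AND_WHITESPACE];
    $names = [T_STRING, T_NAME_QUALIFIED, T_NAME_FULLY_QUALIFIED, T_NAME_RELATIVE];

    $stream = [];
    foreach (PhpToken::tokenize($code) as $token) {
        if ($token->is($ignored)) {
            continue; // whitespace, comments and open/close tags carry no signal
        }

        if ($normalize && $token->is(T_VARIABLE)) {
            $stream[] = 'VAR';
        } elseif ($normalize && $token->is($numbers)) {
            $stream[] = 'NUM';
        } elseif ($normalize && $token->is($strings)) {
            $stream[] = 'STR';
        } elseif ($fuzzy && $token->is($names)) {
            $stream[] = 'ID';
        } else {
            $stream[] = $token->text; // keywords, operators and punctuation stay as written
        }
    }

    return $stream;
}
```

With `normalize` enabled, `$a = $b + 1;` and `$x = $y + 42;` produce the same token stream, which is what lets renamed copies match.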
Token clones:
- PHPProbe hashes every normalized token window of `min_tokens` tokens.
- Matching windows are candidate clones.
- Candidates are extended token-by-token to find the full matching region.
- Overlapping windows in the same file are ignored.
- Clone regions below `min_lines` are ignored.
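The first two steps (hash every window, then treat equal hashes as candidates) can be sketched over a single normalized token stream. This is an illustration only: both function names are hypothetical, SHA-256 is an assumed hash choice, and the sketch leaves out extension, overlap filtering and the `min_lines` check.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical index of window hash => start offsets.
 *
 * @param list<string> $tokens
 * @return array<string, list<int>>
 */
function indexTokenWindows(array $tokens, int $minTokens): array
{
    $minTokens = max(1, $minTokens);
    $index = [];
    $lastStart = count($tokens) - $minTokens;

    for ($start = 0; $start <= $lastStart; $start++) {
        $window = array_slice($tokens, $start, $minTokens);
        // A NUL separator keeps adjacent token texts from running together.
        $hash = hash('sha256', implode("\0", $window));
        $index[$hash][] = $start;
    }

    return $index;
}

/**
 * Hypothetical candidate filter: keep only windows seen more than once.
 *
 * @param array<string, list<int>> $index
 * @return list<list<int>>
 */
function findCandidateWindows(array $index): array
{
    return array_values(array_filter($index, static fn (array $starts): bool => count($starts) > 1));
}

$tokens = ['VAR', '=', 'NUM', ';', 'VAR', '=', 'NUM', ';'];
$candidates = findCandidateWindows(indexTokenWindows($tokens, 4));
```

In this toy stream the four-token window starting at offset 0 repeats at offset 4, so `$candidates` holds one group with those two offsets.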
AST and statement matching:
- PHPProbe uses `nikic/php-parser` to index structural blocks.
- Indexed blocks include functions, methods, closures, arrow functions, loops, branches, match arms and try/catch/finally blocks.
- Statement hashes are built from AST shape.
- In `audit` mode, matching statement windows of `min_statements` statements are reported as statement clones.
Near-miss matching:
- Near-miss matching compares blocks with the same block type.
- Similarity is weighted as `72%` statement-hash similarity and `28%` AST-shape similarity.
- Similarity is based on longest-common-subsequence ratio.
- Matches below `min_similarity` are ignored.
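The weighting above can be worked through with a small longest-common-subsequence routine. This is a sketch, not the checker's code: the function names are hypothetical, and defining the ratio as `2 * LCS / (|a| + |b|)` is an assumption, since the source only says the similarity is LCS-based.

```php
<?php
declare(strict_types=1);

// Classic dynamic-programming LCS length, keeping only two rows.
function lcsLength(array $a, array $b): int
{
    $a = array_values($a);
    $b = array_values($b);
    $previous = array_fill(0, count($b) + 1, 0);

    foreach ($a as $itemA) {
        $current = [0];
        foreach ($b as $j => $itemB) {
            $current[$j + 1] = $itemA === $itemB
                ? $previous[$j] + 1
                : max($previous[$j + 1], $current[$j]);
        }
        $previous = $current;
    }

    return $previous[count($b)];
}

// Assumed ratio definition: 2 * LCS / (|a| + |b|).
function lcsRatio(array $a, array $b): float
{
    $total = count($a) + count($b);

    return $total === 0 ? 1.0 : (2 * lcsLength($a, $b)) / $total;
}

// 72% statement-hash similarity plus 28% AST-shape similarity.
function nearMissSimilarity(array $hashesA, array $hashesB, array $shapeA, array $shapeB): float
{
    return 0.72 * lcsRatio($hashesA, $hashesB) + 0.28 * lcsRatio($shapeA, $shapeB);
}

$similarity = nearMissSimilarity(
    ['h1', 'h2', 'h3', 'h4'],
    ['h1', 'hX', 'h3', 'h4'],
    ['if', 'return'],
    ['if', 'return']
);
```

Three of four statement hashes line up (ratio 0.75) and the shapes are identical (ratio 1.0), so the weighted result is 0.72 × 0.75 + 0.28 × 1.0 = 0.82, which would fall below a `min_similarity` of 0.85.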
Grouping, pruning and scoring:
- Duplicate pairs are grouped into clone families.
- Contained/weaker clones are pruned.
- Results are ranked by score, line span and similarity.
- Scoring rewards larger clones, more occurrences, higher similarity, structural completeness and near-miss signal; small trivial clones are penalized.
phpprobe duplicates --json emits:
```json
{
  "files": 2,
  "total_lines": 100,
  "duplicated_lines": 20,
  "duplicate_percentage": 20.0,
  "known_clones": 0,
  "new_clones": 1,
  "clones": [
    {
      "fingerprint": "...",
      "source": "tokens",
      "score": 120.5,
      "similarity": 1.0,
      "tokens": 90,
      "lines": 10,
      "statements": 0,
      "block_type": "function",
      "occurrences": [
        {
          "file": "src/Example.php",
          "start_line": 10,
          "end_line": 20,
          "lines": 11,
          "context": "function"
        }
      ]
    }
  ]
}
```

Clone `source` is one of:

- `tokens`
- `statements`
- `near_miss`
known_clones is populated when a duplicate baseline is read. new_clones is the number of clone groups remaining after baseline suppression.
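A consumer of this report might summarize it as follows. The sketch only reads fields shown in the sample above; `summarizeDuplicateReport` is a hypothetical helper and not part of PHPProbe.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical consumer of the `duplicates --json` report.
 *
 * @return array{new_clones: int, known_clones: int, by_source: array<string, int>}
 */
function summarizeDuplicateReport(string $json): array
{
    $report = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Count clone groups per documented source value.
    $bySource = ['tokens' => 0, 'statements' => 0, 'near_miss' => 0];
    foreach ($report['clones'] ?? [] as $clone) {
        $source = (string) ($clone['source'] ?? '');
        $bySource[$source] = ($bySource[$source] ?? 0) + 1;
    }

    return [
        'new_clones' => (int) ($report['new_clones'] ?? 0),
        'known_clones' => (int) ($report['known_clones'] ?? 0),
        'by_source' => $bySource,
    ];
}
```

In CI, the JSON could be captured from `php vendor/bin/phpprobe duplicates --json src` and passed to this function to decide how to report the result.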
phpprobe api --json emits:
```json
{
  "snapshot": {
    "version": 1,
    "generated_at": "2026-05-02T00:00:00+00:00",
    "symbols": [
      {
        "id": "class App\\Service",
        "kind": "class",
        "name": "App\\Service",
        "file": "src/Service.php",
        "line": 5,
        "modifiers": ["final"],
        "extends": "",
        "implements": [],
        "members": [],
        "fingerprint": "..."
      }
    ]
  },
  "baseline": {
    "version": 1,
    "generated_at": "",
    "symbols": []
  },
  "changed": false,
  "changes": {
    "added": [],
    "removed": [],
    "changed": []
  }
}
```

Write a baseline:
```bash
php vendor/bin/phpprobe duplicates --write-baseline
php vendor/bin/phpprobe duplicates --write-baseline=.phpprobe-duplicates-baseline.json
php vendor/bin/phpprobe api --write-baseline
php vendor/bin/phpprobe api --write-baseline=.phpprobe-api-baseline.json
```

Use a baseline:

```bash
php vendor/bin/phpprobe duplicates --baseline=.phpprobe-duplicates-baseline.json
php vendor/bin/phpprobe api --baseline=.phpprobe-api-baseline.json
```

This repository uses `resources/.phpprobe-duplicates-baseline.json` in `composer duplicates` so CI fails only on newly introduced clone groups.
Duplicate baseline files contain:
```json
{
  "version": 1,
  "generated_at": "2026-05-02T00:00:00+00:00",
  "clones": [
    {
      "fingerprint": "...",
      "source": "tokens",
      "score": 100.0
    }
  ]
}
```

API baseline files use the same top-level `version`, `generated_at` and `symbols` shape emitted under the `snapshot` JSON key. Missing, unreadable, or invalid baseline files now fail with exit code 2.
Duplicate baseline files follow the same strict behavior: missing, unreadable, or invalid baselines fail with exit code 2.
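Baseline suppression by fingerprint, together with the strict failure on bad baseline files, could look roughly like this. It is a sketch under assumptions rather than PHPProbe's code: both function names are hypothetical, and a thrown `RuntimeException` stands in for the exit code 2 path.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical loader: returns a fingerprint => true lookup table.
 *
 * @return array<string, true>
 */
function loadBaselineFingerprints(string $file): array
{
    $json = is_readable($file) ? file_get_contents($file) : false;
    if ($json === false) {
        throw new RuntimeException("Baseline file is missing or unreadable: {$file}");
    }

    try {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new RuntimeException("Baseline file is not valid JSON: {$file}", 0, $e);
    }

    if (!is_array($data) || !is_array($data['clones'] ?? null)) {
        throw new RuntimeException("Baseline file has no clones list: {$file}");
    }

    return array_fill_keys(array_column($data['clones'], 'fingerprint'), true);
}

/**
 * Hypothetical split of clone groups into known (baselined) and new.
 *
 * @return array{known: list<array>, new: list<array>}
 */
function splitClonesByBaseline(array $clones, array $knownFingerprints): array
{
    $result = ['known' => [], 'new' => []];
    foreach ($clones as $clone) {
        $bucket = isset($knownFingerprints[$clone['fingerprint']]) ? 'known' : 'new';
        $result[$bucket][] = $clone;
    }

    return $result;
}
```

The sizes of the two buckets correspond to the `known_clones` and `new_clones` counters in the JSON report.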
Checker text output is colorized on interactive terminals:
- green: successful summaries
- yellow: warning/medium severity lines
- red: error/high/critical summaries
- cyan: baseline write notifications
Color output is automatically disabled for non-TTY streams and when NO_COLOR is set (or TERM=dumb), so CI logs and JSON output stay clean.
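The color decision described above reduces to three checks. The sketch below is illustrative only; `shouldUseColor` is a hypothetical function, and treating any set `NO_COLOR` value (including an empty one) as "disable" is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical color switch.
 *
 * @param array<string, string> $env Environment variables, e.g. from getenv().
 */
function shouldUseColor(bool $isTty, array $env): bool
{
    if (!$isTty) {
        return false; // pipes and CI logs are not interactive terminals
    }

    if (array_key_exists('NO_COLOR', $env)) {
        return false; // NO_COLOR is set
    }

    return ($env['TERM'] ?? '') !== 'dumb';
}

// STDOUT is only defined when PHP runs on the command line.
$useColor = shouldUseColor(defined('STDOUT') && stream_isatty(STDOUT), getenv());
```

Passing the TTY state and environment in as arguments keeps the decision easy to test without touching real streams.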
Workflow: `.github/workflows/ci.yml`

CI runs:

- PHPProbe matrix on PHP `8.2`, `8.3`, `8.4`, `8.5`:
  - `composer validate --strict`
  - `composer test`
  - `composer lint`
  - `composer duplicates`
  - `composer api`
  - `composer comments`
- PHPForge integration:
  - checks out `infocyph/phpforge`
  - injects local `phpprobe` via Composer `path` repository
  - runs PHPForge tests

`workflow_dispatch` supports `phpforge_ref` to test a specific PHPForge branch/tag/SHA.
Composer scripts:
| Script | Command |
|---|---|
| `composer test` | `vendor/bin/pest -c pest.xml` |
| `composer lint` | `php bin/phpprobe syntax src tests` |
| `composer duplicates` | `php bin/phpprobe duplicates --preset=standard --config=resources/phpprobe.json --baseline=resources/.phpprobe-duplicates-baseline.json src tests` |
| `composer api` | `php bin/phpprobe api --config=resources/phpprobe.json src tests` |
| `composer comments` | `php bin/phpprobe comments --config=resources/phpprobe.json src tests` |
Useful local checks:
```bash
composer validate --strict
composer test
composer lint
composer duplicates
composer api
composer comments
git diff --check
```