Merged
12 changes: 8 additions & 4 deletions .github/workflows/pr-fast-ci.yml
@@ -23,7 +23,7 @@ defaults:
 jobs:
   changes:
     name: Detect Relevant Changes
-    runs-on: ['self-hosted', 'synology', 'shell-only', 'public']
+    runs-on: ubuntu-latest
     outputs:
       app: ${{ steps.filter.outputs.app }}
       ci: ${{ steps.filter.outputs.ci }}
@@ -63,7 +63,7 @@ jobs:
 
   fast-checks:
     name: Fast Checks
-    runs-on: ['self-hosted', 'synology', 'shell-only', 'public']
+    runs-on: ubuntu-latest
     timeout-minutes: 15
     needs: changes
     if: >-
@@ -74,12 +74,16 @@ jobs:
         with:
           ref: ${{ github.event.pull_request.head.sha }}
 
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ env.PYTHON_VERSION }}
+
       - name: Run fast checks
         run: bash scripts/ci/run-fast-checks.sh
 
   validate-secrets:
     name: Validate Secrets
-    runs-on: ['self-hosted', 'synology', 'shell-only', 'public']
+    runs-on: ubuntu-latest
     timeout-minutes: 10
     if: github.event.pull_request.draft == false
     steps:
@@ -91,7 +95,7 @@ jobs:
 
   ci-gate:
     name: CI Gate
-    runs-on: ['self-hosted', 'synology', 'shell-only', 'public']
+    runs-on: ubuntu-latest
     if: always()
     needs:
       - changes
7 changes: 7 additions & 0 deletions CONTRIBUTING.md
@@ -18,6 +18,13 @@ Run the fast validation gate before opening a pull request:
 bash scripts/ci/run-fast-checks.sh
 ```
 
+Before staging local MailPlus fixtures, generated metadata, or cache-related
+changes, run the broader local scan so untracked non-ignored files are checked:
+
+```bash
+bash scripts/check-detect-secrets.sh --all-local
+```
+
 ## Pull Requests
 
 - Keep changes focused and reviewable.
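`--all-local` is one of three equivalent spellings of the broader mode in `scripts/check-detect-secrets.sh`; the others are `--all-files-with-untracked` and `--local`. The Python sketch below restates which `git` listings each mode combines, as a reading aid rather than a replacement for the script. `commands_for_mode` and `list_candidate_files` are hypothetical names, and listing files assumes the code runs inside a git checkout.

```python
import os
import subprocess

# git listings used by scripts/check-detect-secrets.sh (all NUL-separated via -z).
TRACKED = ["git", "ls-files", "-z"]
STAGED = ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR", "-z"]
UNTRACKED_NOT_IGNORED = ["git", "ls-files", "--others", "--exclude-standard", "-z"]

# Mode flag -> listings it combines, mirroring the script's case statement.
MODE_COMMANDS = {
    "--staged": [STAGED],
    "--all-files": [TRACKED],  # the script's default and the documented CI mode
    "--all-files-with-untracked": [TRACKED, UNTRACKED_NOT_IGNORED],
    "--all-local": [TRACKED, UNTRACKED_NOT_IGNORED],
    "--local": [TRACKED, UNTRACKED_NOT_IGNORED],
}


def commands_for_mode(mode: str = "--all-files") -> list[list[str]]:
    """Return the git commands a mode combines; unknown modes are rejected like the script's usage error."""
    if mode not in MODE_COMMANDS:
        raise ValueError(f"unsupported mode: {mode}")
    return MODE_COMMANDS[mode]


def list_candidate_files(mode: str = "--all-files") -> list[str]:
    """Run each listing for the mode (inside a git checkout) and decode the NUL-separated paths."""
    files: list[str] = []
    for command in commands_for_mode(mode):
        output = subprocess.run(command, check=True, capture_output=True).stdout
        files.extend(os.fsdecode(name) for name in output.split(b"\0") if name)
    return files
```

Calling `list_candidate_files("--all-local")` from a checkout would return tracked paths followed by untracked, non-ignored ones, the same order in which the script appends them.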
6 changes: 6 additions & 0 deletions SECURITY.md
@@ -14,6 +14,12 @@ available for this repository. If private reporting is unavailable, contact the
 repository owner with a minimal description of the issue and the affected
 version or commit.
 
+Expected response: the maintainer will acknowledge a complete report within
+seven calendar days and will provide a remediation or disclosure plan once the
+impact is understood. Please include enough reproduction detail to verify the
+issue without including live credentials, raw mail, mailbox exports, or other
+private payloads.
+
 ## Data Handling Scope
 
 MailPlus Intelligence must not store raw mail bodies, attachment payloads,
18 changes: 18 additions & 0 deletions docs/privacy-redaction-boundaries.md
@@ -115,6 +115,24 @@ Forbidden in the repo:
 - Production selected text caches or semantic output exports that have not passed promotion review.
 - Machine-local caches, logs, database files, or generated stores that may contain raw message content.
 
+## Local Secret Scan Guardrail
+
+`scripts/check-detect-secrets.sh` is a fast baseline guardrail for CI and local preflight checks. It is not comprehensive DLP and does not replace operator review before live MailPlus integration, selected-text-cache work, or public release.
+
+The default CI mode scans tracked files only:
+
+```bash
+bash scripts/check-detect-secrets.sh --all-files
+```
+
+Before staging or opening a PR that may have generated local artifacts, use the broader local mode:
+
+```bash
+bash scripts/check-detect-secrets.sh --all-files-with-untracked
+```
+
+The broader mode includes untracked, non-ignored files and checks for common local leak shapes, including `.eml` and `.mbox` mailbox exports, MailPlus metadata/cache database filenames, and live OAuth, reset, magic-login, recovery, checkout, invoice, billing, or payment links with token-like query parameters. Synthetic documentation and fixtures should use reserved domains such as `example.com` and redaction markers such as `[REDACTED_TOKEN]` so the scanner can distinguish examples from live artifacts.
+
 ## Fixture Redaction Rules
 
 Fixtures must be synthetic by default. If a real-world shape is needed to reproduce parsing behavior, reduce it to the minimum structure and redact before committing.
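The link check described in the guardrail section combines two regular expressions from `scripts/check-detect-secrets.sh`: `mailplus_content_patterns`, which flags URLs whose path mentions an OAuth, reset/login, or payment flow and that carry a credential-like query parameter, and `allowed_synthetic_pattern`, which suppresses any matching line that names a reserved `example.*` domain or contains a `[REDACTED_…]` marker. The Python sketch below mirrors those patterns, assuming Python's `re` treats them the same way `grep -E` does (the POSIX `[[:space:]]` class is rewritten as `\s`). `find_link_leaks` is a hypothetical name and returns line numbers rather than the script's per-pattern messages. Because the allowlist is applied per line, a live link that shares a line with `example.com` would also be suppressed.

```python
import re

# Credential-like query parameter names shared by all three link patterns.
TOKEN_PARAM = r"(token|code|key|secret|signature|sig|jwt|session|auth)="
# POSIX [^[:space:])>"] written as a Python character class.
URL_CHARS = r'[^\s)>"]*'
# Flow keywords from the three mailplus_content_patterns entries.
FLOWS = (
    r"(oauth|authorize|token)",
    r"(reset|recovery|recover-password|password-reset|magic|login)",
    r"(pay|payment|checkout|invoice|billing)",
)
LINK_LEAK_PATTERNS = [
    re.compile(f"https?://{URL_CHARS}{flow}{URL_CHARS}{TOKEN_PARAM}") for flow in FLOWS
]
# Lines naming a reserved example domain or carrying a redaction marker count as synthetic.
ALLOWED_SYNTHETIC = re.compile(r"(example\.(com|org|net|test)|\[REDACTED_[A-Z_]+\])")


def find_link_leaks(text: str) -> list[int]:
    """Return 1-based numbers of lines with a token-bearing link and no synthetic marker."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ALLOWED_SYNTHETIC.search(line):
            continue  # mirrors `grep -Ev "$allowed_synthetic_pattern"` on each matching line
        if any(pattern.search(line) for pattern in LINK_LEAK_PATTERNS):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = "\n".join(
        [
            "Sign in: https://accounts.example.com/oauth/authorize?code=[REDACTED_TOKEN]",
            "Reset: https://mail.acme.invalid/password-reset?token=abc123",  # not allowlisted
            "Help: https://mail.acme.invalid/login/help",  # no credential-like parameter
        ]
    )
    print(find_link_leaks(sample))  # prints [2]
```

In the script itself these link checks, like the path checks, run over whichever files the selected mode lists.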
9 changes: 9 additions & 0 deletions fixtures/README.md
@@ -0,0 +1,9 @@
+# Fixtures
+
+All fixtures in this tree are synthetic, metadata-only or derived-output examples. Do not add real MailPlus exports, raw message bodies, attachment payloads, credentials, live links, personal names, real domains, generated databases, caches, or logs.
+
+Before extending a fixture corpus, review `docs/privacy-redaction-boundaries.md`, use reserved domains such as `example.com` or `example.test`, and run:
+
+```bash
+bash scripts/check-detect-secrets.sh --all-local
+```
203 changes: 128 additions & 75 deletions scripts/check-detect-secrets.sh
@@ -1,81 +1,134 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-mode="${1:-"--all-files"}"
-ignore_globs=("scripts/check-detect-secrets.sh")
-if [[ -f .detect-secrets-ignore ]]; then
-while IFS= read -r ignore_glob; do
-if [[ -z "$ignore_glob" ]]; then
-continue
-fi
-if [[ "${ignore_glob:0:1}" == "#" ]]; then
-continue
-fi
-ignore_globs+=("$ignore_glob")
-done < .detect-secrets-ignore
+#!/usr/bin/env bash
+set -euo pipefail
+
+mode="${1:-"--all-files"}"
+ignore_globs=("scripts/check-detect-secrets.sh")
+if [[ -f .detect-secrets-ignore ]]; then
+while IFS= read -r ignore_glob; do
+if [[ -z "$ignore_glob" ]]; then
+continue
+fi
+if [[ "${ignore_glob:0:1}" == "#" ]]; then
+continue
+fi
+ignore_globs+=("$ignore_glob")
+done < .detect-secrets-ignore
 fi
 
+should_skip_file() {
+local candidate="$1"
+local ignore_glob
+for ignore_glob in "${ignore_globs[@]}"; do
+case "$candidate" in
+$ignore_glob)
+return 0
+;;
+esac
+done
+return 1
+}
+
+append_tracked_files() {
+while IFS= read -r -d '' file; do
+files+=("$file")
+done < <(git ls-files -z)
+}
+
+append_staged_files() {
+while IFS= read -r -d '' file; do
+files+=("$file")
+done < <(git diff --cached --name-only --diff-filter=ACMR -z)
+}
+
+append_untracked_files() {
+while IFS= read -r -d '' file; do
+files+=("$file")
+done < <(git ls-files --others --exclude-standard -z)
+}
+
+files=()
+case "$mode" in
+--staged)
+append_staged_files
+;;
+--all-files)
+append_tracked_files
+;;
+--all-files-with-untracked | --all-local | --local)
+append_tracked_files
+append_untracked_files
+;;
+*)
+echo "Usage: $0 [--all-files|--staged|--all-files-with-untracked|--all-local|--local]" >&2
+exit 2
+;;
+esac
+
+if [[ "${#files[@]}" -eq 0 ]]; then
+echo "No files to scan."
+exit 0
+fi
+
+content_patterns=(
+'ghp_'
+'github_pat_'
+'sk-live-'
+'sk-proj-'
+'AKIA[0-9A-Z]{16}'
+'BEGIN (RSA|OPENSSH|EC) PRIVATE KEY'
+'ANTHROPIC_API_KEY='
+'OPENAI_API_KEY='
+'SUDO_PASS='
+'BW_SESSION='
+)
+
+mailplus_content_patterns=(
+'https?://[^[:space:])>"]*(oauth|authorize|token)[^[:space:])>"]*(token|code|key|secret|signature|sig|jwt|session|auth)='
+'https?://[^[:space:])>"]*(reset|recovery|recover-password|password-reset|magic|login)[^[:space:])>"]*(token|code|key|secret|signature|sig|jwt|session|auth)='
+'https?://[^[:space:])>"]*(pay|payment|checkout|invoice|billing)[^[:space:])>"]*(token|code|key|secret|signature|sig|jwt|session|auth)='
+)
+
+mailplus_path_patterns=(
+'\.eml$'
+'\.mbox$'
+'(^|/)(mailplus|selected-text|semantic|metadata)[^/]*\.(db|sqlite|sqlite3)$'
+'(^|/)(mailplus|selected-text|semantic|metadata)[^/]*\.(cache|log)$'
+)
+
+allowed_synthetic_pattern='(example\.(com|org|net|test)|\[REDACTED_[A-Z_]+\])'
+
+tmp_file="$(mktemp)"
+trap 'rm -f "$tmp_file"' EXIT
+
+for file in "${files[@]}"; do
+if [[ ! -f "$file" ]] || should_skip_file "$file"; then
+continue
+fi
+printf '%s\n' "$file" >>"$tmp_file"
+done
+
+failed=0
+while IFS= read -r file; do
+for pattern in "${mailplus_path_patterns[@]}"; do
+if [[ "$file" =~ $pattern ]]; then
+echo "Potential MailPlus export/cache artifact '$pattern' found at $file" >&2
+failed=1
+fi
+done
+
-should_skip_file() {
-local candidate="$1"
-local ignore_glob
-for ignore_glob in "${ignore_globs[@]}"; do
-case "$candidate" in
-$ignore_glob)
-return 0
-;;
-esac
-done
-return 1
-}
-
-files=()
-if [[ "$mode" == "--staged" ]]; then
-while IFS= read -r -d '' file; do
-files+=("$file")
-done < <(git diff --cached --name-only --diff-filter=ACMR -z)
-else
-while IFS= read -r -d '' file; do
-files+=("$file")
-done < <(git ls-files -z)
+for pattern in "${content_patterns[@]}"; do
+if grep -E -n "$pattern" "$file" >/dev/null 2>&1; then
+echo "Potential secret pattern '$pattern' found in $file" >&2
+failed=1
 fi
+done
 
-if [[ "${#files[@]}" -eq 0 ]]; then
-echo "No files to scan."
-exit 0
+for pattern in "${mailplus_content_patterns[@]}"; do
+if grep -E -n "$pattern" "$file" | grep -Ev "$allowed_synthetic_pattern" >/dev/null 2>&1; then
+echo "Potential MailPlus link leak pattern '$pattern' found in $file" >&2
+failed=1
 fi
+done
+done <"$tmp_file"
 
-patterns=(
-'ghp_'
-'github_pat_'
-'sk-live-'
-'sk-proj-'
-'AKIA[0-9A-Z]{16}'
-'BEGIN (RSA|OPENSSH|EC) PRIVATE KEY'
-'ANTHROPIC_API_KEY='
-'OPENAI_API_KEY='
-'SUDO_PASS='
-'BW_SESSION='
-)
-
-tmp_file="$(mktemp)"
-trap 'rm -f "$tmp_file"' EXIT
-
-for file in "${files[@]}"; do
-if [[ ! -f "$file" ]] || should_skip_file "$file"; then
-continue
-fi
-printf '%s
-' "$file" >>"$tmp_file"
-done
-
-failed=0
-while IFS= read -r file; do
-for pattern in "${patterns[@]}"; do
-if grep -E -n "$pattern" "$file" >/dev/null 2>&1; then
-echo "Potential secret pattern '$pattern' found in $file" >&2
-failed=1
-fi
-done
-done <"$tmp_file"
-
-exit "$failed"
+exit "$failed"
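Before any content is grepped, the new script narrows the candidate list: it drops anything that is not a regular file and any path matching a glob from `.detect-secrets-ignore` (blank lines and `#` comments skipped, the script itself always ignored), then flags paths that look like mailbox exports or MailPlus databases, caches, or logs. The Python sketch below approximates that filtering stage on path strings only, leaving out the regular-file check. `fnmatch.fnmatchcase` stands in for the Bash `case` glob (in both, `*` also matches `/`), `re.search` stands in for the unanchored `[[ =~ ]]` test, and `load_ignore_globs` and `flag_artifact_paths` are hypothetical names.

```python
import fnmatch
import re
from pathlib import Path

# Same regular expressions as mailplus_path_patterns in the script.
PATH_PATTERNS = [
    r"\.eml$",
    r"\.mbox$",
    r"(^|/)(mailplus|selected-text|semantic|metadata)[^/]*\.(db|sqlite|sqlite3)$",
    r"(^|/)(mailplus|selected-text|semantic|metadata)[^/]*\.(cache|log)$",
]


def load_ignore_globs(ignore_file: Path) -> list[str]:
    """Read ignore globs, skipping empty lines and lines that start with '#'."""
    globs = ["scripts/check-detect-secrets.sh"]  # the script always ignores itself
    if ignore_file.is_file():
        for line in ignore_file.read_text().splitlines():
            if line and not line.startswith("#"):
                globs.append(line)
    return globs


def flag_artifact_paths(paths: list[str], ignore_globs: list[str]) -> list[tuple[str, str]]:
    """Return (path, pattern) pairs for non-ignored paths that look like local artifacts."""
    hits = []
    for path in paths:
        if any(fnmatch.fnmatchcase(path, glob) for glob in ignore_globs):
            continue  # same effect as should_skip_file returning 0
        for pattern in PATH_PATTERNS:
            if re.search(pattern, path):
                hits.append((path, pattern))
    return hits
```

One known divergence: Bash `while read` skips a final ignore-file line that lacks a trailing newline, whereas `splitlines()` keeps it.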
16 changes: 16 additions & 0 deletions src/mailplus_intelligence/sqlite.py
@@ -2,14 +2,30 @@
 
 from __future__ import annotations
 
+import os
 import sqlite3
 from pathlib import Path
 
 
+def _create_owner_only_database_file(database: Path) -> None:
+    if database.exists():
+        return
+
+    try:
+        file_descriptor = os.open(database, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
+    except FileExistsError:
+        return
+
+    os.close(file_descriptor)
+
+
 def connect_sqlite(database: str | Path = ":memory:") -> sqlite3.Connection:
     """Open a SQLite connection with project defaults for index work."""
 
     database_name = str(database)
+    if database_name != ":memory:":
+        _create_owner_only_database_file(Path(database))
+
     connection = sqlite3.connect(database_name)
     connection.row_factory = sqlite3.Row
     connection.execute("PRAGMA foreign_keys = ON")
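The helper exists because `sqlite3.connect()` creates a missing database file itself, using SQLite's default create mode (0644 in stock builds, set by the `SQLITE_DEFAULT_FILE_PERMISSIONS` compile-time option) filtered through the process umask, so on a typical system the file is readable by other local users. Pre-creating it with `os.open(..., O_CREAT | O_EXCL, 0o600)` sets the mode atomically, and SQLite then opens the existing empty file without changing its permissions. The sketch below compares both paths on a POSIX system; the helper names and the fixed umask of `0o022` are illustrative assumptions. As in the patch, a file that already exists keeps whatever permissions it has.

```python
import os
import sqlite3
import stat
import tempfile
from pathlib import Path


def file_mode(path: Path) -> int:
    """Return only the permission bits of a file."""
    return stat.S_IMODE(path.stat().st_mode)


def create_owner_only(path: Path) -> None:
    """Create an empty 0o600 file unless the path already exists (existing modes are left alone)."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return
    os.close(fd)


def write_header(path: Path) -> None:
    """Open the database and force a write so the file certainly exists on disk."""
    connection = sqlite3.connect(path)
    try:
        connection.execute("PRAGMA user_version = 1")
        connection.commit()
    finally:
        connection.close()


def compare_modes() -> tuple[int, int]:
    """Return (mode when SQLite creates the file, mode when it is pre-created)."""
    previous_umask = os.umask(0o022)  # assumed typical umask, fixed for the demo
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            plain = Path(tmpdir) / "plain.db"
            write_header(plain)  # SQLite creates this file itself

            guarded = Path(tmpdir) / "guarded.db"
            create_owner_only(guarded)
            write_header(guarded)  # SQLite reuses the pre-created file

            return file_mode(plain), file_mode(guarded)
    finally:
        os.umask(previous_umask)


if __name__ == "__main__":
    plain_mode, guarded_mode = compare_modes()
    print(f"sqlite3.connect alone: {oct(plain_mode)}, pre-created: {oct(guarded_mode)}")
```

With a stock SQLite build the first value would typically be `0o644`; the second should stay `0o600`, which is what the new `test_sqlite_file_is_owner_only_when_created` test asserts for `connect_sqlite`.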
11 changes: 11 additions & 0 deletions tests/test_runtime.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import stat
 import tempfile
 import unittest
 from pathlib import Path
@@ -40,6 +41,16 @@ def test_sqlite_connection_supports_index_style_round_trip(self) -> None:
         finally:
             connection.close()
 
+    def test_sqlite_file_is_owner_only_when_created(self) -> None:
+        with tempfile.TemporaryDirectory() as tmpdir:
+            database = Path(tmpdir) / "mailplus-intelligence.db"
+            connection = connect_sqlite(database)
+            try:
+                mode = stat.S_IMODE(database.stat().st_mode)
+                self.assertEqual(mode, 0o600)
+            finally:
+                connection.close()
+
 
 if __name__ == "__main__":
     unittest.main()