Add malicious control character coverage#2
Conversation
| self.text_malicious = "pre\u202e.js\u200b\x00post\n" | ||
| self.text_malicious_sanitized = "pre_.js__post\n" | ||
| self.text_malicious_file_sanitized = "pre___.js____post\n" |
There was a problem hiding this comment.
Why... are there two different sanitized end results for sanitizing a string vs sanitizing a file? These should have the exact same result.
| def __init__(self, *args: Any, **kwargs: Any) -> None: | ||
| self.text_dirty = "\x1b[0mTest\x1b[2Kor\x1b]1;is\x1b\n[m" | ||
| self.text_dirty_sanitized = "\x1b[0mTest_[2Kor_]1;is_\n[m" | ||
| self.text_malicious = "pre\u202e.js\u200b\x00post\n" |
There was a problem hiding this comment.
Right-to-left override (U+202e) is interesting, but it's been known about for a bit longer than the Trojan Source disclosure, and isn't as concerning as the use of isolate shuffling. I would argue we should use isolate shuffling here instead.
| pytest_module = sys.modules.pop("pytest", None) | ||
| try: | ||
| with patch.object(sys, "argv", ["stsponge.py", str(output_path)]), patch( | ||
| "sys.stdin", | ||
| io.TextIOWrapper(io.BytesIO(invalid_bytes), encoding="utf-8"), | ||
| ), patch("sys.stdout", stdout_capture): | ||
| self._del_module() | ||
| importlib.import_module("stdisplay.stsponge").main() | ||
| finally: | ||
| if pytest_module is not None: | ||
| sys.modules["pytest"] = pytest_module |
There was a problem hiding this comment.
We have self._test_util() that does a roughly equivalent job to this. No need to rewrite this over and over.
| if len(argv) > 1: | ||
| untrusted_text = " ".join(argv[1:]) | ||
| stdout.write(stdisplay(untrusted_text)) | ||
| stdout.write("\n") | ||
| stdout.flush() | ||
| if len(sys.argv) > 1: | ||
| untrusted_text = " ".join(sys.argv[1:]) | ||
| sys.stdout.write(stdisplay(untrusted_text)) | ||
| sys.stdout.write("\n") | ||
| sys.stdout.flush() |
| for file in argv[1:]: | ||
| shutil.copy2(temp_file.name, file) | ||
| temp_file.close() | ||
| temp_path = temp_file.name | ||
| try: | ||
| with open(temp_path, "rb") as source_file: | ||
| content = source_file.read() | ||
| for file in argv[1:]: | ||
| fd = os.open( | ||
| file, | ||
| os.O_CREAT | os.O_WRONLY | os.O_TRUNC | os.O_NOFOLLOW, | ||
| 0o600, | ||
| ) | ||
| with os.fdopen(fd, "wb") as destination_file: | ||
| destination_file.write(content) | ||
| destination_file.flush() | ||
| os.fchmod(destination_file.fileno(), 0o600) | ||
| finally: | ||
| os.unlink(temp_path) |
There was a problem hiding this comment.
What's wrong with copy2 here? It's almost certainly faster than doing this manually (thanks to https://docs.python.org/3/library/shutil.html#shutil-platform-dependent-efficient-copy-operations). Furthermore, why are we changing the file permissions here? Isn't whatever the active umask provides good here?
I note in similar code below, following symlinks is considered a bad thing, but I disagree; sponge follows symlinks, and stsponge is supposed to mimic that API.
| """Safely print stdin to stdout and file.""" | ||
| """Safely print stdin to stdout and file without following symlinks.""" | ||
| import os |
There was a problem hiding this comment.
tee follows symlinks, this is supposed to mimic that API, thus following symlinks is not a bad thing.
| stdin.reconfigure(errors="ignore") # type: ignore | ||
| untrusted_text_list = [] | ||
| if stdin is not None: | ||
| for untrusted_text in stdin: | ||
| untrusted_text_list.append(untrusted_text) | ||
| stdout.write(stdisplay(untrusted_text)) | ||
| stdout.flush() | ||
| if len(argv) > 1: | ||
| for file_arg in argv[1:]: | ||
| with open(file_arg, mode="w", encoding="ascii") as file: | ||
| file.write(stdisplay("".join(untrusted_text_list))) | ||
| stdin.reconfigure(errors="replace") # type: ignore | ||
| output_files = [] | ||
| try: | ||
| if len(argv) > 1: | ||
| for file_arg in argv[1:]: | ||
| fd = os.open( | ||
| file_arg, | ||
| os.O_CREAT | os.O_WRONLY | os.O_TRUNC | os.O_NOFOLLOW, | ||
| 0o600, | ||
| ) | ||
| output_files.append(os.fdopen(fd, mode="w", encoding="ascii")) | ||
| if stdin is not None: | ||
| for untrusted_text in stdin: | ||
| rendered_text = stdisplay(untrusted_text) | ||
| stdout.write(rendered_text) | ||
| for output_file in output_files: | ||
| output_file.write(rendered_text) | ||
| stdout.flush() | ||
| for output_file in output_files: | ||
| output_file.flush() | ||
| finally: | ||
| for output_file in output_files: | ||
| output_file.close() |
There was a problem hiding this comment.
I like the memory optimization and multiple streaming here. I don't agree with the "don't follow symlinks" approach, but otherwise the basic idea here seems useful.
There was a problem hiding this comment.
This will effectively implement a fix for Kicksecure#18. Will integrate.
| NO_COLOR="" COLORTERM="" TERM="xterm-direct" "${pytest[@]}" "${@}" | ||
| NO_COLOR="" COLORTERM="" TERM="xterm" "${pytest[@]}" "${pytest_args[@]}" |
| pytest_args=() | ||
| for arg in "$@"; do | ||
| test_path="${pythonpath}/stdisplay/tests/${arg}.py" | ||
| if [[ "${arg}" != -* && "${arg}" != */* && -f "${test_path}" ]]; then | ||
| pytest_args+=("tests/${arg}.py") | ||
| else | ||
| pytest_args+=("${arg}") | ||
| fi | ||
| done | ||
|
|
||
| if [[ ${#pytest_args[@]} -eq 0 ]]; then | ||
| pytest_args=("${@}") | ||
| fi |
There was a problem hiding this comment.
I don't see why this is needed. It changes the API of run-tests for no apparent reason (except maybe greater convenience? I personally don't see much convenience benefit here, especially if one wants to run tests for more than just stdisplay). IMO this is undesirable.
|
Useful changes from PR added: ArrayBolt3@e20e454 |
|
Updated stecho to reference sys.argv and sys.stdout directly so its output is captured correctly by test harnesses. Enhanced run-tests to map tool names to their test files and use a supported terminal setting, improving targeted test runs for utilities like stecho. Hardened sttee to write output using O_NOFOLLOW and restrictive permissions, preventing writes through symlinks with world-readable defaults. Updated stsponge to disallow symlink destinations during copies and normalize resulting file permissions to 0o600. Updated sttee to stream input to both stdout and output files while maintaining symlink protections, matching tee-style behavior without buffering the entire input first. Ensured stsponge always cleans up its temporary buffer file after copying to destinations, preventing leftover artifacts. Secured stsponge by copying buffered content through os.open(..., O_NOFOLLOW) and restricted descriptors, preventing writes through symlinked destinations while enforcing 0600 permissions. Switched stdin decoding in stcat, stcatn, stsponge, and sttee to use errors="replace" so malformed bytes remain visible for sanitization instead of being silently dropped. Decoding in stcatn now replaces invalid UTF-8 bytes so malformed file content remains visible for sanitization rather than raising errors. Test fixtures now include an invalid-byte sample file and assertions that stcat and stcatn surface sanitized output instead of hiding undecodable characters. Added regression tests proving sttee and stsponge sanitize invalid stdin bytes both when echoing to stdout and when writing to files. Added shared malicious control/bidi fixtures and corresponding temporary file creation to the test base for reuse across terminal utilities. Expanded sanitization coverage to ensure stcat, stcatn, stecho, sttee, and stsponge handle malicious control and bidirectional characters safely in both stdout and file outputs. |
Rejected, because this was unnecessary for output to be captured correctly by test harnesses.
Rejected, because the Python ncurses library has no trouble understanding
Rejected, because this fundamentally misunderstands what the sttee tool is for and how it will be used. It is the responsibility of the caller to ensure proper permissions of files it uses sttee to write, sttee is not supposed to do this. The caller may also have a legitimate reason to write to a symlink, and is thus responsible for making sure that it isn't writing to a symlink if it doesn't want to write to a symlink.
Rejected, for the same reason as the sttee changes described above were rejected.
Partially accepted; symlink protections were rejected as described above, but simultaneous streaming to stdout and output files was accepted.
Semi-accepted; I elected to rewrite the write routine to omit the use of a temporary file entirely since it did not appear
This is a duplicate of an above rejected change.
Accepted. The Unicode replacement characters end up being converted to underscores, which is good.
Accepted, even though this shouldn't be necessary because stdisplay strips all Unicode and thus won't have malformed file content to write.
Accepted.
Accepted.
Accepted, but with tweaks so that we're using isolate shuffling in the malicious test string rather than bidi overrides. Additions implemented in ArrayBolt3@1967b0d and ArrayBolt3@e20e454. This PR can be closed. |
Summary
Testing
Codex Task