
Log Anonymizer Script

anon edited this page Jun 2, 2025 · 1 revision

Log Anonymizer Script (log_anonymizer.py)

Purpose

The log_anonymizer.py script is a utility designed to anonymize sensitive data within log files, with a particular focus on Postfix mail logs. This is useful for sharing log excerpts for troubleshooting purposes or for archiving logs while minimizing privacy concerns.

Features

  • Temporary Copy: Creates a temporary copy of the input log file before processing to ensure the original file remains untouched.
  • Data Anonymization: Identifies and anonymizes various types of sensitive information:
    • Postfix and Amavis server names (from log line prefixes).
    • Usernames (extracted from email addresses).
    • IP Addresses (IPv4).
    • Hostnames (FQDNs).
    • Full sasl_username= values.
    • The entire content of email subjects found in Subject: "..." lines (replaced with an identifier such as anon_subject_1).
    • Sender identifiers within angle brackets (<...>) in NOQUEUE: reject: RCPT from ... log lines (replaced with an identifier such as anon_rejected_sender_1).
  • Consistent Replacement: Ensures that each unique sensitive value is consistently replaced with the same anonymized identifier (e.g., "1.1.1.1" always becomes "anon_ip_1", "user@example.com" always becomes "anon_user_1@anon_hostname_1").
  • Data Preservation: Timestamps and other non-sensitive parts of log lines are preserved.
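
The Temporary Copy feature could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `copy_to_temp` and the `log_anon_` directory prefix are assumptions made for this example. It relies only on `tempfile` and `shutil`, both of which the script lists as dependencies.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_temp(input_path, temp_dir=None):
    """Copy input_path into a fresh temporary directory and return the copy's path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # mkdtemp creates a uniquely named directory; dir=None means the
    # system's standard temporary directory is used.
    work_dir = Path(tempfile.mkdtemp(prefix="log_anon_", dir=temp_dir))
    temp_copy = work_dir / Path(input_path).name
    # copy2 copies the file contents and, where possible, its metadata.
    shutil.copy2(input_path, temp_copy)
    return temp_copy
```

All later processing would then read from the returned path, so the original log file is never opened for writing.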

Usage

python3 bin/log_anonymizer.py -i <input_log_file> -o <output_anonymized_file> [options]

Command-line Options

  • -i/--input-file <path>: (Required) Path to the original log file.
  • -o/--output-file <path>: (Required) Path where the anonymized log file will be saved.
  • -t/--temp-dir <path>: (Optional) Path to a directory to use for temporary file storage. Defaults to the system's standard temporary directory.
  • --config <json_file_path>: (Optional) Path to a JSON configuration file. This option is reserved for future enhancements, such as customizing anonymization rules or placeholder formatting; it has little practical effect in the current script.
  • --log-level <level>: (Optional) Set the logging level for script output. Options: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: INFO.
  • --script-log-file <path>: (Optional) Path to a file where script execution logs (INFO, DEBUG, ERROR, etc.) will be saved. If not provided, logs are only output to the console.
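
For readers who want to see how the options above fit together, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. The function name `build_parser` and the help strings are assumptions for illustration; the real script may define its parser differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Anonymize sensitive data in log files.")
    parser.add_argument("-i", "--input-file", required=True,
                        help="Path to the original log file.")
    parser.add_argument("-o", "--output-file", required=True,
                        help="Path where the anonymized log file will be saved.")
    parser.add_argument("-t", "--temp-dir", default=None,
                        help="Directory for temporary files (default: system temp dir).")
    parser.add_argument("--config", default=None,
                        help="Path to a JSON configuration file.")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Logging level for script output.")
    parser.add_argument("--script-log-file", default=None,
                        help="File to which script execution logs are also written.")
    return parser


# Passing an explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["-i", "mail.log", "-o", "mail.log.anonymized"])
print(args.input_file, args.output_file, args.log_level)
```

`argparse` converts dashes in long option names to underscores, so `--input-file` is read back as `args.input_file`.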

Example

python3 bin/log_anonymizer.py -i /var/log/mail.log -o /tmp/mail.log.anonymized --script-log-file anonymized_log --log-level DEBUG
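
The --log-level and --script-log-file flags used in this example could be wired to Python's `logging` module along these lines. This is a sketch under assumptions: the function name `configure_logging`, the logger name, and the message format are invented for illustration. It also assumes Python 3.8 or newer for the `force` argument of `logging.basicConfig`.

```python
import logging


def configure_logging(level_name="INFO", log_file=None):
    """Send script logs to the console, and also to log_file if one is given.

    Hypothetical helper: the name, logger name, and format are illustrative.
    """
    handlers = [logging.StreamHandler()]  # console output is always enabled
    if log_file:
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(
        level=getattr(logging, level_name.upper()),
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=handlers,
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return logging.getLogger("log_anonymizer")


logger = configure_logging("DEBUG")
logger.debug("Logging configured.")
```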

How it Works

The script uses a series of regular expressions to identify different categories of sensitive data (IPs, hostnames, usernames, etc.). It maintains internal mapping tables for each category. When a sensitive piece of data is found, it's looked up in its category's map. If found, the existing anonymized replacement is used. If not, a new unique anonymized identifier (e.g., "anon_ip_1", "anon_hostname_5") is generated, stored in the map, and used for replacement. This ensures that the same original value is always replaced by the same anonymized value throughout the log file.
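
The lookup-or-create logic described above can be sketched as follows. This is a simplified illustration and not the script's actual implementation: the class name `Anonymizer`, the two regular expressions, and the sample lines are assumptions, and only the IPv4 and email categories are covered.

```python
import re
from collections import defaultdict

# Simplified patterns for illustration; the real script's expressions may differ.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


class Anonymizer:
    """Keeps one mapping table per category (hypothetical class name)."""

    def __init__(self):
        self.maps = defaultdict(dict)  # category -> {original value: placeholder}

    def placeholder(self, category, value):
        table = self.maps[category]
        if value not in table:
            # First sighting: create the next numbered identifier and store it.
            table[value] = f"anon_{category}_{len(table) + 1}"
        return table[value]

    def anonymize_line(self, line):
        def replace_email(match):
            user = self.placeholder("user", match.group(1))
            host = self.placeholder("hostname", match.group(2))
            return f"{user}@{host}"

        # Emails are handled first so their parts are mapped as user and hostname.
        line = EMAIL_RE.sub(replace_email, line)
        return IPV4_RE.sub(lambda m: self.placeholder("ip", m.group(0)), line)


anon = Anonymizer()
# The input strings below are made-up fragments, not real Postfix output.
print(anon.anonymize_line("from=<user@example.com> client=1.1.1.1"))
# from=<anon_user_1@anon_hostname_1> client=anon_ip_1
print(anon.anonymize_line("client=2.2.2.2 then client=1.1.1.1"))
# client=anon_ip_2 then client=anon_ip_1
```

Because the tables live on the `Anonymizer` instance, the same original value maps to the same placeholder for as long as that instance is reused across lines.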

Dependencies

  • Python 3
  • Standard Python libraries: argparse, logging, re, shutil, tempfile, json.
