Log Anonymizer Script
The log_anonymizer.py script is a utility designed to anonymize sensitive data within log files, with a particular focus on Postfix mail logs. This is useful for sharing log excerpts for troubleshooting purposes or for archiving logs while minimizing privacy concerns.
- Temporary Copy: Creates a temporary copy of the input log file before processing to ensure the original file remains untouched.
- Data Anonymization: Identifies and anonymizes various types of sensitive information:
  - Postfix and Amavis server names (from log line prefixes).
  - Usernames (extracted from email addresses).
  - IP Addresses (IPv4).
  - Hostnames (FQDNs).
  - Full sasl_username= values.
  - Entire content of email subjects found in Subject: "..." lines (e.g., to anon_subject_1).
  - Sender identifiers within angle brackets (<...>) in NOQUEUE: reject: RCPT from ... log lines (e.g., to anon_rejected_sender_1).
- Consistent Replacement: Ensures that each unique sensitive value is consistently replaced with the same anonymized identifier (e.g., "1.1.1.1" always becomes "anon_ip_1", "user@example.com" always becomes "anon_user_1@anon_hostname_1").
- Data Preservation: Timestamps and other non-sensitive parts of log lines are preserved.
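The Temporary Copy feature listed above could be sketched as follows. This is a minimal illustration using shutil and tempfile, not the script's actual code; the function name copy_to_temp and the directory prefix are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_temp(input_path, temp_dir=None):
    """Copy input_path into a fresh temporary directory and return the copy's path.

    Hypothetical helper: temp_dir=None lets tempfile choose the system's
    standard temporary directory.
    """
    work_dir = Path(tempfile.mkdtemp(prefix="log_anonymizer_", dir=temp_dir))
    temp_copy = work_dir / Path(input_path).name
    shutil.copy2(input_path, temp_copy)  # copies file content and metadata
    return temp_copy
```

All later processing would then read from the returned path, so the original file is never opened for writing. The caller is responsible for removing the temporary directory afterwards, for example with shutil.rmtree(temp_copy.parent).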
python3 bin/log_anonymizer.py -i <input_log_file> -o <output_anonymized_file> [options]
- -i/--input-file <path>: (Required) Path to the original log file.
- -o/--output-file <path>: (Required) Path where the anonymized log file will be saved.
- -t/--temp-dir <path>: (Optional) Path to a directory to use for temporary file storage. Defaults to the system's standard temporary directory.
- --config <json_file_path>: (Optional) Path to a JSON configuration file. The --config option is reserved for future enhancements, such as customizing anonymization rules. At present, its main use would be advanced placeholder formatting, if that feature is expanded in the script.
- --log-level <level>: (Optional) Set the logging level for script output. Options: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: INFO.
- --script-log-file <path>: (Optional) Path to a file where script execution logs (INFO, DEBUG, ERROR, etc.) will be saved. If not provided, logs are only output to the console.
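For readers who want to see how the options above map onto argparse, here is a minimal sketch. The flag names and defaults are taken from the list above; the function name build_parser, the help strings, and the use of choices for --log-level are assumptions, and the real script may define its parser differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Anonymize sensitive data in log files."
    )
    parser.add_argument("-i", "--input-file", required=True,
                        help="Path to the original log file.")
    parser.add_argument("-o", "--output-file", required=True,
                        help="Path where the anonymized log file will be saved.")
    parser.add_argument("-t", "--temp-dir", default=None,
                        help="Directory for temporary files (default: system temp dir).")
    parser.add_argument("--config", default=None,
                        help="Path to a JSON configuration file.")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Logging level for script output.")
    parser.add_argument("--script-log-file", default=None,
                        help="File for script execution logs.")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-i", "/var/log/mail.log", "-o", "/tmp/mail.log.anonymized"]
)
print(args.input_file, args.output_file, args.log_level)
```

argparse converts the dashes in long option names to underscores, so --input-file is read back as args.input_file.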
python3 bin/log_anonymizer.py -i /var/log/mail.log -o /tmp/mail.log.anonymized --script-log-file anonymized_log --log-level DEBUG

The script uses a series of regular expressions to identify different categories of sensitive data (IPs, hostnames, usernames, etc.). It maintains internal mapping tables for each category. When a sensitive piece of data is found, it's looked up in its category's map. If found, the existing anonymized replacement is used. If not, a new unique anonymized identifier (e.g., "anon_ip_1", "anon_hostname_5") is generated, stored in the map, and used for replacement. This ensures that the same original value is always replaced by the same anonymized value throughout the log file.
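The lookup-or-create behavior described above can be sketched for a single category as follows. This is an illustration under stated assumptions, not the script's implementation: the Anonymizer class, its method names, the simplified IPv4 pattern, and the sample log lines are all made up for the example.

```python
import re
from collections import defaultdict

# Simplified IPv4 pattern for illustration; it does not validate octet ranges.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class Anonymizer:
    """Hypothetical holder for one mapping table per category."""

    def __init__(self):
        # e.g. {"ip": {"1.1.1.1": "anon_ip_1"}, "hostname": {...}}
        self.maps = defaultdict(dict)

    def replacement_for(self, category, value):
        """Return the existing placeholder for value, or create the next one."""
        table = self.maps[category]
        if value not in table:
            table[value] = f"anon_{category}_{len(table) + 1}"
        return table[value]

    def anonymize_ips(self, line):
        """Replace every IPv4-looking token in line with its placeholder."""
        return IPV4_RE.sub(
            lambda m: self.replacement_for("ip", m.group(0)), line
        )


anonymizer = Anonymizer()
sample_lines = [  # invented log lines, not real Postfix output
    "Jan  1 00:00:01 connect from unknown[1.1.1.1]",
    "Jan  1 00:00:02 connect from unknown[2.2.2.2]",
    "Jan  1 00:00:03 disconnect from unknown[1.1.1.1]",
]
for line in sample_lines:
    print(anonymizer.anonymize_ips(line))
```

In this sketch the first and third lines both end in unknown[anon_ip_1], because the second lookup of 1.1.1.1 finds the entry stored by the first. Timestamps are left alone since they contain no dotted quads.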
- Python 3
- Standard Python libraries: argparse, logging, re, shutil, tempfile, json.