RSFE - Rust Source File Encoding Fixer

Read in English | Leia em Português

A Rust command-line tool for automatic validation and correction of file encodings in projects, designed to run as a pre-commit hook via Husky.

Features

  • Automatic encoding detection: Detects each file's encoding using chardetng
  • Smart conversion: Converts a file only when necessary, so files already in the correct encoding are left untouched
  • Flexible configuration: Supports a configuration file with glob patterns
  • Optimized performance: Parallel processing using Rayon
  • Git integration: Processes only staged files or the entire project
  • Interactive prompt: Asks you to choose a default encoding when no configuration file exists
  • Respects .gitignore: Automatically ignores files listed in .gitignore

Installation

Prerequisites

  • Rust 1.70+ (install via rustup)
  • Git (for pre-commit integration)
  • Node.js and npm (optional, for Husky integration)

Quick installation

# Clone or copy the project
git clone <your-repository>
cd rs-precommit-fix-encode

# Run the installation script
chmod +x install.sh
./install.sh

The installation script will:

  1. Compile the project in release mode
  2. Configure Husky automatically (if a Node.js project is detected)
  3. Create the pre-commit hook

Manual installation

# Compile
cargo build --release

# Binary will be at: ./target/release/rsfe

Configuration

rsfe.conf file

Create an rsfe.conf file in your project root. Each rule is one line containing a glob pattern followed by the target encoding:

# Comments start with #
<glob-pattern> <ENCODING>

# Example:
** UTF-8                    # Default for all files
**/*.js UTF-8               # JavaScript files
**/*.py UTF-8               # Python files
legacy/** AUTO              # Auto-detection for legacy folder
old-data/**/*.txt WINDOWS-1252  # Specific encoding
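
To make the line format concrete, here is a minimal sketch of how such a file could be parsed using only the standard library. It is not taken from the RSFE source: the names Rule, parse_rule and parse_config are hypothetical, and it assumes that # starts a comment anywhere on a line and that encoding labels are case-insensitive.

```rust
/// One rule from rsfe.conf: a glob pattern and a target encoding label.
/// (Hypothetical type for illustration, not RSFE's own.)
#[derive(Debug, PartialEq)]
pub struct Rule {
    pub pattern: String,
    pub encoding: String,
}

/// Parses a single configuration line.
/// Returns None for blank lines, comment-only lines, and lines that lack
/// either the pattern or the encoding.
/// Assumption: '#' starts a comment anywhere on the line, so patterns that
/// contain '#' are not supported by this sketch.
pub fn parse_rule(line: &str) -> Option<Rule> {
    // Keep only the text before the first '#'.
    let content = line.split('#').next().unwrap_or("").trim();
    let mut parts = content.split_whitespace();
    let pattern = parts.next()?;
    let encoding = parts.next()?;
    Some(Rule {
        pattern: pattern.to_string(),
        // Assumption: labels are case-insensitive, so normalize to upper case.
        encoding: encoding.to_ascii_uppercase(),
    })
}

/// Parses a whole configuration file, keeping rules in file order.
pub fn parse_config(text: &str) -> Vec<Rule> {
    text.lines().filter_map(parse_rule).collect()
}
```

Calling parse_config on the example above would yield five rules in file order. How RSFE resolves a file that matches several rules is not specified here, so the sketch deliberately stops at parsing.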

Supported encodings

  • UTF-8 (recommended default)
  • UTF-16LE (UTF-16 Little Endian)
  • UTF-16BE (UTF-16 Big Endian)
  • ISO-8859-1 (Latin-1)
  • WINDOWS-1252 (CP1252)
  • AUTO (automatic detection)
  • Any encoding supported by the encoding_rs library

Configuration example

Copy the example file:

cp rsfe.conf.example rsfe.conf

Edit as needed for your project.

Usage

Standalone mode

# Process entire project
./target/release/rsfe

# If no rsfe.conf exists, you'll be prompted to choose the default encoding

With Git (staged files)

# Stage files
git add .

# Run rsfe (will process only staged files)
./target/release/rsfe

As pre-commit hook with Husky

If you installed via install.sh, the hook is already configured. Each commit will:

  1. Run rsfe on staged files
  2. Convert encodings if necessary
  3. Re-stage modified files
  4. Proceed with the commit

# Normal git usage
git add .
git commit -m "My message"
# rsfe will run automatically

Manual Git hook configuration

If you are not using Husky, add the following to .git/hooks/pre-commit:

#!/bin/sh

# Run rsfe; if it fails, cancel the commit
if ! ./target/release/rsfe; then
    echo "❌ rsfe failed. Fix errors before committing." >&2
    exit 1
fi

# Re-stage modified files
# Note: git add -u stages every modified tracked file, not only the ones rsfe converted
git add -u

exit 0

Make the hook executable:

chmod +x .git/hooks/pre-commit

How it works

Execution flow

  1. Load configuration: Reads rsfe.conf, or prompts for a default encoding if the file does not exist
  2. Collect files:
    • If in a Git repository with staged files: processes only staged files
    • Otherwise: processes the entire project
  3. Filter files: Ignores binaries and respects .gitignore
  4. Process in parallel: Uses Rayon for multi-threaded processing
  5. For each file:
    • Determine the target encoding from the configured rules
    • Detect the current encoding
    • Check whether conversion is needed
    • Convert if necessary
  6. Report results: Shows how many files were converted
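
As an illustration of step 2, the sketch below shows one way staged files could be listed from Rust by calling git diff --cached --name-only. Whether RSFE collects files this way is an assumption; the function names staged_files and parse_name_only are hypothetical, and the --diff-filter=ACM choice (added, copied, modified) is only one reasonable option.

```rust
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Turns the output of `git diff --name-only` into a list of paths,
/// one per non-empty line.
/// Caveat: without `-z`, Git quotes paths containing unusual characters,
/// which this sketch does not undo.
pub fn parse_name_only(output: &str) -> Vec<PathBuf> {
    output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Lists staged files that were added, copied, or modified.
/// Returns an empty list when the git command exits with a failure status
/// (for example, outside a repository), so a caller could fall back to
/// scanning the entire project.
pub fn staged_files() -> io::Result<Vec<PathBuf>> {
    let output = Command::new("git")
        .args(["diff", "--cached", "--name-only", "--diff-filter=ACM"])
        .output()?;
    if !output.status.success() {
        return Ok(Vec::new());
    }
    Ok(parse_name_only(&String::from_utf8_lossy(&output.stdout)))
}
```

An empty result from staged_files corresponds to the "Otherwise" branch of step 2.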

Conversion detection

RSFE converts a file only when necessary. It checks:

  1. Whether decoding the file with the target encoding produces errors
  2. Whether re-encoding the decoded text produces bytes that differ from the original

This avoids unnecessary commits of files already in the correct encoding.
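
The two checks can be sketched for the common case of a UTF-8 target. This is a simplified illustration that uses only the standard library rather than encoding_rs; all function names are hypothetical, and the conversion step assumes the source bytes are ISO-8859-1, which the real tool would determine through detection instead.

```rust
/// Check 1: strict decoding with the target encoding (UTF-8 here) fails.
pub fn has_decode_errors(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_err()
}

/// Check 2: decode leniently, re-encode, and compare with the original.
/// Invalid sequences become U+FFFD during lenient decoding, so the
/// re-encoded bytes differ whenever the input was not clean UTF-8.
pub fn round_trip_differs(bytes: &[u8]) -> bool {
    String::from_utf8_lossy(bytes).as_bytes() != bytes
}

/// A file needs conversion if either check fires.
pub fn needs_conversion(bytes: &[u8]) -> bool {
    has_decode_errors(bytes) || round_trip_differs(bytes)
}

/// Converts ISO-8859-1 bytes to UTF-8. Each Latin-1 byte maps to the
/// Unicode code point with the same numeric value.
pub fn latin1_to_utf8(bytes: &[u8]) -> Vec<u8> {
    bytes.iter().map(|&b| b as char).collect::<String>().into_bytes()
}

/// Returns the converted bytes only when a conversion is needed.
/// Assumption: input that is not valid UTF-8 is treated as ISO-8859-1.
pub fn convert_if_needed(bytes: &[u8]) -> Option<Vec<u8>> {
    if needs_conversion(bytes) {
        Some(latin1_to_utf8(bytes))
    } else {
        None
    }
}
```

A file that is already valid UTF-8 yields None from convert_if_needed, so it is never rewritten and never shows up as a change in the commit.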

Performance

RSFE is optimized for performance:

  • Parallel processing: Uses all CPU cores via Rayon
  • Smart reading: Automatically ignores binary files
  • Fast detection: Uses chardetng for efficient encoding detection
  • Conditional conversion: Only converts when truly necessary

Typical benchmark (hardware dependent):

  • ~10,000 files processed in ~2-5 seconds
  • Each necessary conversion adds ~1-2 ms per file
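
To show the idea behind the parallel step without pulling in a dependency, the following sketch splits a list of items across scoped threads from the standard library. This is a stand-in for Rayon, not RSFE's actual code: count_matching and its parameters are hypothetical, and Rayon's work-stealing scheduler balances load better than the fixed chunks used here.

```rust
use std::thread;

/// Counts the items for which `check` returns true, splitting the slice
/// into roughly equal chunks and handling each chunk on its own thread.
pub fn count_matching<T, F>(items: &[T], workers: usize, check: F) -> usize
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let workers = workers.max(1);
    // Ceiling division; at least 1 so that `chunks` never receives 0.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let check = &check;
    thread::scope(|scope| {
        // Spawn one scoped thread per chunk; scoped threads may borrow `items`.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().filter(|item| check(*item)).count())
            })
            .collect();
        // Wait for every worker and add up the partial counts.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

For example, passing a list of file contents and a closure that tests for invalid UTF-8 returns the number of files that would need conversion.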

Ignored files

By default, RSFE ignores:

Directories

  • node_modules
  • target
  • dist
  • build
  • .git
  • .idea
  • .vscode
  • Any directory listed in .gitignore

Extensions (binaries)

  • Images: .png, .jpg, .jpeg, .gif
  • Documents: .pdf
  • Compressed files: .zip, .tar, .gz
  • Executables: .exe, .dll, .so, .dylib
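
The default ignore lists above could be applied with a check like the one below. It is a sketch under stated assumptions rather than the code in src/main.rs: the function names are hypothetical, extension matching is assumed to be case-insensitive, and .gitignore handling is left out entirely.

```rust
use std::path::Path;

/// Extensions treated as binary, taken from the list above.
const BINARY_EXTENSIONS: &[&str] = &[
    "png", "jpg", "jpeg", "gif", "pdf", "zip", "tar", "gz", "exe", "dll", "so", "dylib",
];

/// Directory names ignored by default, taken from the list above.
const IGNORED_DIRS: &[&str] = &[
    "node_modules", "target", "dist", "build", ".git", ".idea", ".vscode",
];

/// True when the file's final extension is in the binary list.
/// Assumption: matching is case-insensitive, so "LOGO.PNG" also counts.
pub fn has_binary_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| {
            BINARY_EXTENSIONS.iter().any(|b| b.eq_ignore_ascii_case(ext))
        })
}

/// True when any directory component of the path (everything except the
/// final file name) is in the ignored-directory list.
pub fn in_ignored_dir(path: &Path) -> bool {
    path.parent().map_or(false, |dir| {
        dir.components().any(|component| {
            component
                .as_os_str()
                .to_str()
                .map_or(false, |name| IGNORED_DIRS.iter().any(|d| *d == name))
        })
    })
}

/// Combines both default checks.
pub fn is_ignored(path: &Path) -> bool {
    has_binary_extension(path) || in_ignored_dir(path)
}
```

Only the last extension is inspected, so archive.tar.gz is matched through gz.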

Usage examples

JavaScript/TypeScript project

# rsfe.conf
** UTF-8
**/*.js UTF-8
**/*.ts UTF-8
**/*.jsx UTF-8
**/*.tsx UTF-8
**/*.json UTF-8

Python project

# rsfe.conf
** UTF-8
**/*.py UTF-8
**/*.md UTF-8
requirements.txt UTF-8

Project with legacy files

# rsfe.conf
** UTF-8
legacy/** AUTO           # Auto-detect
docs/old/*.txt WINDOWS-1252

Troubleshooting

"Rust is not installed"

Install Rust via rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

"Compilation failed"

Check Rust version:

rustc --version  # Should be 1.70+

Update if necessary:

rustup update

Binary files being processed

Add the extension to the ignore list at src/main.rs:251.

Unsupported encoding

Check the encoding_rs documentation for supported encodings.

Development

Project structure

rs-precommit-fix-encode/
├── src/
│   └── main.rs          # Main code
├── Cargo.toml           # Dependencies
├── rsfe.conf.example    # Configuration example
├── install.sh           # Installation script
├── .husky/
│   └── pre-commit       # Husky hook
├── README.md            # English documentation
└── LEIAME.md            # Portuguese documentation

Run in debug mode

cargo run

Tests

cargo test

Optimized build

cargo build --release --target x86_64-unknown-linux-musl  # Static Linux binary

Contributing

  1. Fork the project
  2. Create a branch for your feature (git checkout -b feature/MyFeature)
  3. Commit your changes (git commit -m 'Add MyFeature')
  4. Push to the branch (git push origin feature/MyFeature)
  5. Open a Pull Request

License

MIT License - see the LICENSE file for details.

Author

Created to ensure encoding consistency in projects and avoid special character issues in commits.

Useful links