# Performance & Best Practices

## Tuning Chunk Size

The chunk size is the most important performance lever:

| Workload | Recommended chunk size |
|---|---|
| Lightweight items (small DTOs) | 500 – 5000 |
| Medium items (with relations) | 100 – 500 |
| Heavy items (with file I/O) | 10 – 100 |
| Single-row tasks | 1 |

Rule of thumb: balance commit frequency (durability/restart granularity) against transaction overhead. Profile to find the sweet spot.

## Memory Management

| Technique | Notes |
|---|---|
| Cursor-based readers | `PdoItemReader` uses unbuffered queries — O(1) memory |
| Streaming file readers | `CsvItemReader` reads line by line; never `file_get_contents()` |
| Generators in `IteratorItemReader` | Pass a generator to avoid loading all data into memory |
| Periodic `gc_collect_cycles()` | Call after every N chunks for very long-running jobs |
| Avoid object retention | Don't hold references to processed items; let GC reclaim them |
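As a sketch of the generator technique: rows are produced lazily on demand instead of being materialized up front, so memory stays flat regardless of the total row count. (`loadRows()` is a hypothetical stand-in for your real data source, and passing the generator to `IteratorItemReader` assumes it accepts any `iterable`.)

```php
<?php

// Lazily yields rows one at a time; nothing is accumulated in memory.
// Hypothetical loadRows() stands in for your real data source.
function loadRows(int $count): \Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'name' => "row-$i"]; // produced on demand
    }
}

// Assumption: IteratorItemReader accepts any iterable, generators included.
// $reader = new IteratorItemReader(loadRows(1_000_000));

// Consuming only the first chunk creates only those items:
$chunk = [];
foreach (loadRows(1_000_000) as $row) {
    $chunk[] = $row;
    if (count($chunk) === 500) {
        break; // only 500 rows were ever generated
    }
}
echo count($chunk), "\n"; // 500
```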

## PDO Tuning

For unbuffered queries (MySQL):

```php
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
```

For batch INSERTs, prefer prepared statements; for very high volumes, consider `PdoBatchItemWriter`, which issues the same prepared statement once per item with an optional update assertion.
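A minimal sketch of the prepare-once, execute-per-item pattern, using an in-memory SQLite database as a stand-in for MySQL (the table and data are illustrative):

```php
<?php

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');

$items = [
    ['id' => 1, 'name' => 'alice'],
    ['id' => 2, 'name' => 'bob'],
    ['id' => 3, 'name' => 'carol'],
];

// Prepare once, execute per item, commit once per chunk.
$stmt = $pdo->prepare('INSERT INTO users (id, name) VALUES (:id, :name)');

$pdo->beginTransaction();
foreach ($items as $item) {
    $stmt->execute($item);
}
$pdo->commit();

echo $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn(), "\n"; // 3
```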

## Transaction Boundaries

Each chunk commits its writes in a single transaction. Keep transactions short:

- `read()` is not inside the chunk transaction.
- Only the `write()` call is wrapped in a transaction.
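The boundary can be sketched as a simplified chunk loop (names are illustrative, not the framework's actual internals; SQLite stands in for a real database):

```php
<?php

// Illustrative chunk loop: reads happen outside the transaction,
// each chunk's write is committed atomically.
$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE sink (v INTEGER)');

$source = new \ArrayIterator(range(1, 10));
$chunkSize = 4;
$written = 0;

while ($source->valid()) {
    // read() — NOT inside the transaction
    $chunk = [];
    while ($source->valid() && count($chunk) < $chunkSize) {
        $chunk[] = $source->current();
        $source->next();
    }

    // write() — the only part wrapped in a transaction
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('INSERT INTO sink (v) VALUES (?)');
    foreach ($chunk as $item) {
        $stmt->execute([$item]);
    }
    $pdo->commit();
    $written += count($chunk);
}

echo $written, "\n"; // 10
```

A crash mid-chunk rolls back only the uncommitted chunk; earlier chunks stay durable, which is the restart granularity the chunk-size table above trades against.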

## Parallel Processing

For CPU-bound or I/O-bound workloads, use `PartitionStep`:

```php
$partitionStep = $stepBuilderFactory->get('parallelImport')
    ->partitioner($partitioner)
    ->workerStep($workerStep)
    ->gridSize(8)
    ->build();
```

Choose the appropriate task executor (`FiberTaskExecutor`, `ProcessTaskExecutor`, `SimpleAsyncTaskExecutor`, `SyncTaskExecutor`) when configuring the partition handler.

| Executor | Best for |
|---|---|
| `FiberTaskExecutor` | I/O-bound (HTTP, DB, file) — light context switches |
| `ProcessTaskExecutor` | CPU-bound — true parallelism via `pcntl_fork` |
| `SimpleAsyncTaskExecutor` | Simple async wrapper with a concurrency limit |
| Symfony Messenger | Distributed — each partition processed on a worker |

## Database Indexing

The metadata schema generated by `PdoJobRepositorySchema` includes the indexes required for the framework's queries. For very large execution histories you may benefit from additional indexes based on your monitoring queries (e.g. on `create_time`, `(job_instance_id, status)`).
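As a sketch of adding such indexes, with an in-memory SQLite stand-in — the `job_execution` table and column names here are assumptions, so verify them against the schema `PdoJobRepositorySchema` actually generates before applying anything in production:

```php
<?php

// Stand-in metadata table (SQLite). Real table/column names come from
// the generated schema — verify before running against production.
$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE job_execution (
    id INTEGER PRIMARY KEY,
    job_instance_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    create_time TEXT NOT NULL
)');

// Additional indexes matching the monitoring queries mentioned above.
$pdo->exec('CREATE INDEX idx_job_execution_create_time
            ON job_execution (create_time)');
$pdo->exec('CREATE INDEX idx_job_execution_instance_status
            ON job_execution (job_instance_id, status)');

// Confirm the indexes were created.
$names = $pdo->query(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
)->fetchAll(\PDO::FETCH_COLUMN);
print_r($names);
```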

## Cleaning Up Old Data

Schedule regular cleanup of old executions:

```bash
# Symfony
php bin/console batch:cleanup

# Laravel — add a custom command or use deleteJobExecution() directly
```

## PHP OPcache (CLI)

For long-running CLI batch jobs:

```ini
; php.ini
opcache.enable_cli=1
opcache.validate_timestamps=0
```

For jobs running longer than 30 minutes, prefer a worker pool (Symfony Messenger / Laravel Queue) over a single long-lived CLI process to avoid memory fragmentation.

## Async Job Launcher

Use `AsyncJobLauncher` to dispatch jobs to a queue:

```php
$env = BatchProcessing::asyncEnvironment(
    dispatcher: function (int $execId, string $jobName, JobParameters $params): void {
        $messageBus->dispatch(new RunJobMessage($execId, $jobName, $params));
    },
);
```

Workers process the actual execution, freeing the request thread.

## Listener Best Practices

- Keep listeners fast — they run synchronously inside the chunk loop.
- Aggregate metrics in the listener and emit them at `afterStep`/`afterJob` rather than per item.
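A sketch of the aggregate-then-emit pattern; the listener class and method names here are illustrative, not the framework's exact listener API:

```php
<?php

// Illustrative listener: count per item cheaply, emit one summary at the end.
final class MetricsListener
{
    private int $items = 0;
    private int $failures = 0;

    public function afterItem(bool $succeeded): void
    {
        // Cheap in-memory aggregation: no I/O in the hot path.
        $this->items++;
        if (!$succeeded) {
            $this->failures++;
        }
    }

    public function afterStep(): string
    {
        // One emission per step instead of one log line per item.
        return sprintf('processed=%d failures=%d', $this->items, $this->failures);
    }
}

$listener = new MetricsListener();
foreach ([true, true, false, true] as $ok) {
    $listener->afterItem($ok);
}
echo $listener->afterStep(), "\n"; // processed=4 failures=1
```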

## Quality Gates

The repository ships with:

- PHPStan at the highest level (`phpstan analyse`)
- PHP-CS-Fixer for consistent style
- PHPUnit with high coverage requirements (`composer test`)
- Infection for mutation testing (`infection.json`)
- PHPMD for complexity checks (`phpmd.xml`)

```bash
composer stan   # PHP-CS-Fixer + PHPStan
composer test   # PHPUnit
```

## Next Steps