[Performance] Optimize CLI startup time via lazy loading and deferred initialization #25

@albatrosary

Description

Startup Performance Improvement Ideas

The CLI does not feel critically slow, but its startup has a slight delay.
At this stage, rewriting the whole project in Go would likely be excessive.
A better approach is to optimize the Python startup path first.

The main idea is simple:

Do not prepare everything at process startup.
Only prepare what is actually needed for the current command.

Goal

Reduce startup latency for multi-ai-cli, especially for lightweight commands and single-shot CLI usage.

Examples:

  • --help
  • --version
  • simple one-agent runs
  • shell-style usage with stdin/stdout
  • editor mode

1. Use lazy imports for heavy modules

Heavy SDK imports should not run unless the selected engine actually needs them.

Candidates:

  • google.generativeai
  • openai
  • anthropic
  • dotenv

Current problem

These modules may be imported at process startup even when they are not used.

Proposed change

Move imports into engine-specific methods or factory functions.

Example

class OpenAIEngine(BaseEngine):
    def _get_client(self):
        # Defer the SDK import to first use instead of module import time.
        if self.client is None:
            from openai import OpenAI
            self.client = OpenAI(api_key=self.api_key)
        return self.client

Expected benefit

  • Faster startup for commands that do not use that engine
  • Faster --help / --version
  • Less unnecessary initialization work

2. Lazily initialize SDK clients

Do not create API clients in __init__() if they are not immediately needed.

Current problem

Client creation may trigger unnecessary internal setup during startup.

Proposed change

Store configuration only in __init__(), and instantiate the client only at first use.

Example

class ClaudeEngine(BaseEngine):
    def __init__(self, api_key):
        # Store configuration only; no SDK work happens here.
        self.api_key = api_key
        self.client = None

    def _get_client(self):
        # Import and construct the client on first use only.
        if self.client is None:
            from anthropic import Anthropic
            self.client = Anthropic(api_key=self.api_key)
        return self.client

Expected benefit

  • Lower startup overhead
  • Engine creation becomes cheaper
  • Only the active engine pays the initialization cost

3. Avoid unconditional .env loading at startup

Loading .env on every process launch may add avoidable overhead.

Proposed change

  • Load .env only when needed
  • Or make it optional
  • Or load it once inside a dedicated helper

Example

_ENV_LOADED = False

def ensure_env_loaded():
    global _ENV_LOADED
    if _ENV_LOADED:
        return
    try:
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # dotenv is optional; shell env vars still work
    _ENV_LOADED = True  # never retry, even when dotenv is absent

Expected benefit

  • Faster startup in environments where shell env vars are already set
  • Less unnecessary file-system work

4. Keep logging initialization minimal

Heavy logging setup at startup can slow down short-lived CLI runs.

Current problem

File handlers, rotating log files, directory creation, and verbose formatters may run even for trivial commands.

Proposed change

  • Use minimal logging by default
  • Enable file-based logging only in debug / verbose mode
  • Delay expensive handler setup until necessary

Example

logger = logging.getLogger("multi_ai")
logger.addHandler(logging.NullHandler())

Expected benefit

  • Faster startup for normal usage
  • Cleaner behavior for shell pipelines
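As a sketch of the deferred-handler idea (the helper name and log format are hypothetical, not part of the current codebase), the file handler is attached only when the user opts into verbose logging:

```python
import logging

# Silent by default: short-lived runs pay no handler-setup cost.
logger = logging.getLogger("multi_ai")
logger.addHandler(logging.NullHandler())

def enable_file_logging(path="multi_ai.log"):
    # Hypothetical helper: attach the comparatively expensive file
    # handler only when --verbose / --debug is requested.
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
```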

5. Parse arguments before expensive initialization

Do not initialize configs, SDKs, or logging before knowing what command was actually requested.

Proposed change

Move expensive setup after parse_args().

Bad

load_config()
setup_logging()
init_all_engines()
args = parse_args()

Better

args = parse_args()

if args.version:
    print(VERSION)
    return

setup_logging_if_needed(args)
config = load_config_if_needed(args)
engine = create_engine_if_needed(args, config)

Expected benefit

  • Lightweight commands return faster
  • Avoids wasted initialization work

6. Instantiate only the selected engine

There is no need to create all engine objects on every launch.

Proposed change

Keep a registry of engine classes and instantiate only the requested one.

Example

ENGINE_MAP = {
    "gpt": OpenAIEngine,
    "claude": ClaudeEngine,
    "gemini": GeminiEngine,
}
engine_cls = ENGINE_MAP[engine_name]
engine = engine_cls(...)

Expected benefit

  • Lower startup cost
  • Easier scaling when more engines are added later

7. Delay config, history, and persona loading

Files should be read only when required for the current command.

Proposed change

  • Load history only for the active engine
  • Load persona only when specified
  • Skip config reads for commands that do not need engine execution

Expected benefit

  • Less file-system overhead
  • Better startup for lightweight commands
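One possible shape for this, assuming JSON files and using `functools.cached_property` so each file is read at most once and only on first access (the `Session` class and file layout are illustrative, not the project's actual structure):

```python
import json
from functools import cached_property
from pathlib import Path

class Session:
    # Hypothetical sketch: __init__ only records paths; commands that
    # never touch config or history skip the file reads entirely.
    def __init__(self, config_path, history_path):
        self.config_path = Path(config_path)
        self.history_path = Path(history_path)

    @cached_property
    def config(self):
        # Read once, on first access.
        return json.loads(self.config_path.read_text())

    @cached_property
    def history(self):
        # Missing history is normal for a fresh session.
        if not self.history_path.exists():
            return []
        return json.loads(self.history_path.read_text())
```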

8. Reduce startup-time file checks and scans

Repeated file existence checks and repeated reads can add unnecessary overhead.

Proposed change

  • Avoid repeated Path.exists() checks
  • Do not scan all related files up front
  • Cache paths or decisions where appropriate

Expected benefit

  • Lower I/O overhead at startup
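A minimal way to cache these decisions is an `lru_cache`-wrapped resolver, so each candidate path is stat-ed at most once per process (the function name is hypothetical):

```python
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def resolved_path(name):
    # Hypothetical sketch: resolve and check each candidate file once,
    # instead of repeating Path.exists() on every call site.
    p = Path(name).expanduser()
    return p if p.exists() else None
```

Note the trade-off: a cached `None` will not notice a file created later in the same process, which is usually acceptable for startup-time lookups.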

9. Avoid compiling unnecessary regex patterns at startup

If some regex patterns are only used in specific modes, they should not be compiled eagerly.

Proposed change

Compile rarely used regexes lazily.

Example

_CODE_FENCE_RE = None

def get_code_fence_re():
    global _CODE_FENCE_RE
    if _CODE_FENCE_RE is None:
        _CODE_FENCE_RE = re.compile(...)
    return _CODE_FENCE_RE

Expected benefit

  • Slightly lower startup overhead
  • Keeps hot path smaller

10. Reduce startup-time output and formatting work

Verbose banners and repeated startup messages may affect perceived responsiveness.

Proposed change

  • Keep normal startup quiet
  • Print detailed status only in debug / verbose mode
  • Avoid expensive preview formatting unless needed

Expected benefit

  • Better perceived performance
  • Cleaner stdout behavior for piping
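A small sketch of the quiet-by-default pattern: status chatter goes to stderr and only when verbose mode is on, so stdout stays reserved for pipeline output (the helper name is hypothetical):

```python
import sys

def emit_status(message, verbose=False):
    # Hypothetical helper: banners and progress notes are printed to
    # stderr, and only when the user opted into verbose mode.
    if verbose:
        print(message, file=sys.stderr)
```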

11. Isolate editor / shell / agent paths

Different modes should not pay for each other’s initialization costs.

Proposed change

Keep main() as a lightweight dispatcher and isolate mode-specific setup inside dedicated functions.

Example

def main():
    args = parse_args()

    if args.version:
        return show_version()

    if args.editor:
        return run_editor_mode(args)

    return run_agent_mode(args)

Expected benefit

  • Editor mode does not need to import AI SDKs
  • Shell-only workflows stay lightweight

12. Measure import cost explicitly

Before rewriting structure aggressively, measure which imports are actually expensive.

Commands

python -X importtime multi_ai_cli.py --version
python -X importtime -c "import multi_ai_cli"

Expected benefit

  • Optimization becomes evidence-based
  • Avoids premature or irrelevant refactoring

13. Add simple startup profiling

Import cost is only part of the story. Initialization code may also be slow.

Candidates to measure

  • config loading
  • history loading
  • persona loading
  • log setup
  • temp file setup
  • engine creation

Example

import time

t0 = time.perf_counter()
config = load_config()
t1 = time.perf_counter()
logger.debug("load_config: %.3f ms", (t1 - t0) * 1000)

Expected benefit

  • Makes bottlenecks visible
  • Helps prioritize the highest-impact fixes

14. Consider a lightweight startup mode

For shell-style usage, a dedicated fast path may be useful.

Possible options

  • --fast
  • --no-history
  • --no-log-file
  • --no-dotenv

Expected benefit

  • Better performance for pipeline-oriented or short-lived runs
  • More UNIX-like behavior when desired
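Wiring these up with argparse could look like the following (the flag names come from the list above but are still a proposal, not the shipped interface):

```python
import argparse

def build_parser():
    # Hypothetical flags from the proposal; the real CLI may differ.
    p = argparse.ArgumentParser(prog="multi-ai-cli")
    p.add_argument("--fast", action="store_true",
                   help="skip history, file logging, and .env loading")
    p.add_argument("--no-history", action="store_true")
    p.add_argument("--no-log-file", action="store_true")
    p.add_argument("--no-dotenv", action="store_true")
    return p

args = build_parser().parse_args(["--fast", "--no-history"])
```

`--fast` could simply imply the three narrower flags, keeping the common case a single switch.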

Recommended implementation order

Highest priority

  1. Lazy imports for heavy SDKs
  2. Lazy client initialization
  3. Parse args before expensive setup
  4. Minimal logging by default

Next

  1. Delay config / history / persona loading
  2. Instantiate only the selected engine
  3. Reduce startup-time messages

Later if needed

  1. Lazy regex compilation
  2. Mode isolation improvements
  3. Lightweight startup mode

Summary

The key optimization is not “rewrite everything for speed.”
It is this:

Stop preparing everything up front.
Prepare only what the current command actually needs.

That approach should improve startup latency while preserving the current Python codebase and keeping the design maintainable.
