Releases: SomeOddCodeGuy/WilmerAI
v0.6 - Multi-user improvements, more memory and consistency improvements, and lots of bug fixes
v0.6 - March 2026
Major New Features
- ContextCompactor Workflow Node — New node type that summarizes conversation messages into two rolling summaries (Old + Oldest) using token-based windowing. Separate from the memory system; designed for recency-aware conversation compaction. Uses XML-style tags and is configurable via a settings file.
- Automatic Memory Condensation — Optional condensation layer for file-based memories. After enough new memories accumulate (configurable threshold), the oldest batch is LLM-summarized into a single condensed entry, reducing file bloat over long conversations.
- Per-Message Image Association — Major refactor replacing synthetic `{"role": "images"}` messages with a per-message `"images"` key. Images now stay associated with their originating message from ingestion through to LLM dispatch. Includes OpenAI multimodal content parsing on ingestion.
- Claude API Image Support — Full image support for the Claude handler. Supports base64, data URIs, and HTTP URLs. Uses PIL/Pillow for format detection (optional; falls back to JPEG). Images are placed before text, per Anthropic's recommendation.
- Per-User Encryption — When an API key is provided via `Authorization: Bearer`, files are stored in isolated per-key directories. Optional Fernet encryption (AES-128-CBC + HMAC-SHA256, PBKDF2 key derivation) can be enabled per user. Transparent plaintext-to-encrypted migration. Includes a re-keying script.
- Multi-User Support — A single WilmerAI instance can now serve multiple users via repeated `--User` flags. Full per-user isolation: per-user config reads, request-scoped user identification, per-user log directories, aggregated models/tags endpoints.
- WSGI Concurrency Limiting Middleware — New `--concurrency` (default: 1) and `--concurrency-timeout` (default: 900s) CLI flags on all entry points. Requests exceeding the limit queue until a slot opens or the timeout expires (503). Implemented at the WSGI layer so the semaphore is held across streaming responses.
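The concurrency limiter has to live at the WSGI layer because a wrapper around the view function would release its slot as soon as the view returns its generator, not when streaming actually finishes. A minimal sketch of the idea (class and variable names here are illustrative, not WilmerAI's actual code):

```python
import threading

class ConcurrencyLimitMiddleware:
    """Cap concurrent requests; queue excess callers until a slot frees or a timeout hits."""

    def __init__(self, app, limit=1, timeout=900):
        self.app = app
        self.semaphore = threading.BoundedSemaphore(limit)
        self.timeout = timeout

    def __call__(self, environ, start_response):
        # Block here until a slot opens; give up with a 503 after the timeout.
        if not self.semaphore.acquire(timeout=self.timeout):
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"Concurrency limit reached; timed out waiting for a slot."]
        try:
            result = self.app(environ, start_response)
        except Exception:
            self.semaphore.release()
            raise
        # Wrap the response so the slot is released only when the server
        # closes the iterable, i.e. after the full stream has been sent.
        return _ClosingIterator(result, self.semaphore.release)


class _ClosingIterator:
    """WSGI servers call close() on the response iterable when done; hook that."""

    def __init__(self, iterable, on_close):
        self._iterable = iterable
        self._on_close = on_close

    def __iter__(self):
        return iter(self._iterable)

    def close(self):
        try:
            close = getattr(self._iterable, "close", None)
            if close is not None:
                close()
        finally:
            self._on_close()
```

Because the semaphore is released from `close()` rather than when the view returns, a long streaming response holds its slot for its entire duration, which is the behavior the release notes describe.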
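For the per-user encryption, the general pattern is to derive the Fernet key from the bearer token with PBKDF2 and to name each user's directory by a hash of the token so the raw key never appears on disk. A sketch using only stdlib primitives (function names, salt handling, and the iteration count are illustrative, not WilmerAI's actual implementation):

```python
import base64
import hashlib
import os

def derive_fernet_key(api_key: str, salt: bytes, iterations: int = 480_000) -> bytes:
    """PBKDF2-HMAC-SHA256 -> 32 raw bytes -> urlsafe base64, which is the
    key format that cryptography.fernet.Fernet expects."""
    raw = hashlib.pbkdf2_hmac("sha256", api_key.encode("utf-8"), salt,
                              iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)

def user_directory(base_dir: str, api_key: str) -> str:
    """Isolated per-key directory, named by a hash so the token itself
    never shows up in a path on disk."""
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:32]
    return os.path.join(base_dir, digest)

# With the `cryptography` package installed, the derived key plugs straight in:
#   from cryptography.fernet import Fernet
#   f = Fernet(derive_fernet_key("my-api-key", salt=b"per-user-salt"))
#   ciphertext = f.encrypt(b"memory text")
```

Fernet internally provides the AES-128-CBC + HMAC-SHA256 construction the release notes mention; only the key derivation and directory isolation need to be supplied around it.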
Bug Fixes
- SillyTavern Streaming Hang — Fixed streaming hang when using SillyTavern as a front end.
- Open WebUI Streaming Error — Restored JSON heartbeat format (was changed to bare newline, causing JSONDecodeError in Open WebUI's NDJSON parser).
- Memory Generation Stalling — Fixed memory generation never triggering after the first run, due to an empty-message hash collision when the front end injects an Author's Note containing only a `[DiscussionId]` tag.
- GetCurrentMemoryFromFile Returning Wrong Data — Was sharing a code path with `GetCurrentSummaryFromFile` and returning the rolling chat summary instead of memory chunks. Now correctly returns memory chunks.
- Image Lookback Default Regression — Restored default lookback window from 5 back to 10 (was silently halved).
- Multi-Word Prefix Detection in Streaming — Fixed `StreamingResponseHandler` failing to strip multi-word response prefixes (e.g., "AI: ").
- Data URI Stripping Before LLM Dispatch — Hardened image key stripping to cover all image formats when `llm_takes_images` is False.
Hardening and Security
- Dependency Pinning — All dependencies pinned to exact versions (`==`) to mitigate supply-chain attacks. Updated several packages, including `requests` (CVE fix) and `cryptography` (reverted to 46.0.5, which predates the supply-chain-attack window).
- Thread Safety — Per-discussion locks in timestamp service, context compactor, and RAG tool. Thread-safe globals via `threading.local()`. Lock dictionaries capped at 500 with LRU eviction. Atomic file writes (temp + rename).
- Sensitive Logging / Prompt Redaction — New `sensitive_logging_utils` module. All log statements exposing user content converted to redactable versions. Redaction activates when encryption is enabled or `redactLogOutput: true` is set.
- JSON Parsing Hardening — Incoming API handlers now use `get_json(force=True, silent=True)`, returning 400 instead of an unhandled 500 on invalid JSON.
- Configurable Categorization Retries — Removed hardcoded 4-retry loop; now configurable via `maxCategorizationAttempts` (default: 1).
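The capped lock dictionary and atomic-write patterns mentioned above look roughly like this (a self-contained sketch with illustrative names, not the project's actual module):

```python
import os
import tempfile
import threading
from collections import OrderedDict

_MAX_LOCKS = 500
_locks = OrderedDict()            # discussion_id -> Lock, in LRU order
_registry_lock = threading.Lock()  # guards the dictionary itself

def get_discussion_lock(discussion_id: str) -> threading.Lock:
    """Return the per-discussion lock, evicting the least recently
    used entry once the dictionary grows past the cap."""
    with _registry_lock:
        if discussion_id in _locks:
            _locks.move_to_end(discussion_id)  # mark as most recently used
        else:
            _locks[discussion_id] = threading.Lock()
            if len(_locks) > _MAX_LOCKS:
                _locks.popitem(last=False)     # evict the oldest entry
        return _locks[discussion_id]

def atomic_write(path: str, data: str) -> None:
    """Write to a temp file in the same directory, then rename over the
    target, so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on both POSIX and Windows
    except Exception:
        os.unlink(tmp_path)
        raise
```

The temp file must live in the same directory as the target: `os.replace` is only atomic within a single filesystem.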
Code Quality
- Optimized variable generation — Conversation-slice variables only computed when referenced in the prompt.
- Lazy-load `time_context_summary` — Skips file I/O when the variable isn't referenced.
v0.5 - Better message variables for prompts, some new nodes, and memory fixes
Summary
NOTE: This introduces new variables to help deprecate variables like "chat_user_prompt_last_twenty". I'm not getting rid of those, for backwards compatibility purposes, but going forward we don't need them as much.
New Workflow Nodes
- JsonExtractor node: extracts fields from JSON in LLM responses without an additional LLM call
- TagTextExtractor node: extracts content between XML/HTML-style tags without an additional LLM call
Configurable Prompt Variables
- nMessagesToIncludeInVariable: node property to control how many messages are included in chat/templated prompt variables
- estimatedTokensToIncludeInVariable: token-budget-based message selection, accumulates recent messages up to a token limit
- minMessagesInVariable + maxEstimatedTokensInVariable: combo mode pulling a minimum message count then filling up to a token budget
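The combo mode above can be sketched as follows (the parameter names mirror the config keys but the function itself is illustrative, and the token estimator here is a simple stand-in):

```python
def select_messages(messages, min_messages=0, max_estimated_tokens=None,
                    estimate=lambda m: int(len(m.split()) * 1.35)):
    """Walk backwards from the newest message, always taking at least
    `min_messages`, then adding older messages until the token budget
    is exhausted. Returns messages in original (oldest-first) order."""
    selected = []
    used = 0
    for i, message in enumerate(reversed(messages)):
        cost = estimate(message)
        if i < min_messages:
            selected.append(message)  # guaranteed minimum, budget ignored
            used += cost
            continue
        if max_estimated_tokens is not None and used + cost > max_estimated_tokens:
            break
        selected.append(message)
        used += cost
    return list(reversed(selected))
```

With `min_messages=0` this degenerates to pure `estimatedTokensToIncludeInVariable` behavior, and with no budget it behaves like `nMessagesToIncludeInVariable`.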
Token Estimation
- Recalibrated rough_estimate_token_length word ratio (1.538 -> 1.35 tokens/word)
- Added configurable safety_margin parameter (default 1.10)
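Using the numbers above, the estimator amounts to something like this (the signature is a guess; only the 1.35 ratio and the 1.10 default margin come from the release notes):

```python
def rough_estimate_token_length(text: str, tokens_per_word: float = 1.35,
                                safety_margin: float = 1.10) -> int:
    """Estimate token count from word count, padded upward by a safety
    margin so budget-based selection errs toward including fewer messages
    rather than overflowing the context window."""
    return int(len(text.split()) * tokens_per_word * safety_margin)
```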
Memory System Fixes
- Fixed file_exists check that was permanently disabling message-threshold triggers for new conversations
- Fixed off-by-one in trigger comparisons (> to >=)
- Added HTTP session cleanup via close() to prevent keep-alive connections from blocking llama.cpp slots
- Split timeouts into (connect, read) tuples
- Added diagnostic logging for memory trigger decisions
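The session-cleanup and timeout changes follow a standard `requests` pattern; a sketch (the helper names and timeout values are illustrative, not WilmerAI's):

```python
import requests

CONNECT_TIMEOUT = 10   # seconds to establish the TCP connection
READ_TIMEOUT = 600     # seconds allowed between bytes of the response

def post_to_llm(session: requests.Session, url: str, payload: dict):
    # A (connect, read) tuple lets a slow generation run for minutes
    # without also waiting minutes just to detect an unreachable host.
    return session.post(url, json=payload,
                        timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))

def run_once(url: str, payload: dict):
    session = requests.Session()
    try:
        return post_to_llm(session, url, payload)
    finally:
        # Explicitly closing the session releases keep-alive connections,
        # so single-slot backends like llama.cpp aren't blocked by idle sockets.
        session.close()
```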
Code Quality
- Fixed bare except clauses to except Exception in cancellation paths
- Added prompt-aware info logging for configurable variable slicing
Example Workflow Configs
- Updated all example workflow JSON files to use new configurable variable syntax
v0.4.1 - Small hotfix for memories
What's Changed
- Corrected an issue with the memory system caused by the recent change removing the image-specific handlers, by @SomeOddCodeGuy in #82
v0.4 - Workflow collections, bug fixes, test UI, and some simplification
What's Changed
- Fix oldest message chunk being silently discarded in memory generation
- Fix incorrect new message count causing duplicate processing of memorized messages
- Fix pytest.ini test path case sensitivity
Features:
- Add shared workflow collections and workflow selection via API model field (/v1/models and /api/tags endpoints)
- Add workflow node execution summary logging with timing info
- Add workflowConfigsSubDirectoryOverride for shared workflow folders
- Add sharedWorkflowsSubDirectoryOverride for custom shared folder names
- Add {Discussion_Id} and {YYYY_MM_DD} variables for file paths
- Add variable substitution support for maxResponseSizeInTokens
- Add web-based setup wizard (setup_wizard_web.py) (this is a WIP and may be temporary/replaced)
- Add vector memory resumability with per-chunk hash logging
Refactors:
- Consolidated image handlers into standard handlers (remove ~700 lines)
- Standardize preset/workflow naming convention (hyphenated)
- Archive legacy workflows to _archive subdirectories
- Add pre-configured shared workflow folders
Simplification:
- Updated preset names to match endpoint names. This makes more sense: you can more easily use presets to make sure each endpoint gets the appropriate settings.
- The _example_general_workflow is the one-stop shop for example productivity workflows, and thanks to the custom workflow system it's easy to spin off more. You can just drop new folders into _shared within workflows and suddenly have new workflows available to you as models. I'll make a video about this later.
- Dropped the image-specific handlers. Finally. Those were something I did early on and I just kept putting off dealing with them, but they always annoyed me. Regular handlers now have the image frameworks built in, where supported.
Tests:
- Update tests for corrected memory hash behavior
- Added tests for new workflow override features
v0.3.1
What's Changed
- Updating urllib3 to correct a dependabot issue by @SomeOddCodeGuy in #79
Full Changelog: v0.3.0...v0.3.1
v0.3.0 - API swapped, Claude Support added, other fixes
- Added support for the Claude llm_api
- Replaced the Flask-exposed runnable API with Eventlet for macOS/Linux and Waitress for Windows
- Fixed the unit tests not running in Windows properly
- Corrected two places where a trailing slash (at the end of the llm_api URL, and at the end of the ConfigDirectory folder name) caused a break
- Added an attempt at proper cancellation, where pressing "stop" in Open WebUI or other front ends will appropriately end a workflow and cascade down to the LLM
- Some LLM APIs work with this, some don't. This should appropriately stop Wilmer and its workflows, but an LLM API in the middle of processing a prompt may not be compelled to stop.
- Added the ability to replace Endpoints and Presets with variables
- Limited to hardcoded variables at top of workflow, or agentXInputs from parent workflows
v0.2.1 - New nodes, bug fixes, new docs, and first recursive workflow
What's Changed
- Added a new LLM-assisted workflow generation document folder. This is still a work in progress, but I have successfully generated a few workflows with it. It's a start in the direction I want to take Wilmer: having its setup and workflow generation be something an LLM can automate easily.
- Fixed streaming on the static response node
- Update partial article wiki node to return N number of results
- Bugfix for thinking-tag cleaning. We had a situation where an LLM (Magistral 2509) was accounting for thinking tags but not generating any, which resulted in completely empty responses going into agentXOutput, as the whole response was being deleted.
- Added ArithmeticProcessor node
- Added Conditional node
- Added StringConcatenator Node
- Updated Conditional Workflows to allow a content passthrough on default instead of having to go into a workflow
- Added POC for recursive workflow, doing a simple coding workflow as an example. There's a wikipedia workflow coming next, but I want to test it a little more before pushing it out.
Full Changelog: v0.2...v0.2.1
v0.2 - 92% Unit Test Code Coverage, and Bug Fixes
What's Changed
- Unit Tests, bug fixes and documentation moving by @SomeOddCodeGuy in #75
Full Changelog: v0.1.8.2...v0.2
v0.1.8.2 - Quickguide and Documentation Update
Just updating some docs and a few tweaks to some configs
v0.1.8.1 - urllib3 version bump
Updated urllib3 to 2.5.0 to satisfy 2 dependabot issues and clear out security notifications.