feat: SwiftBuddy Memory Palace V1 & Omni Audio Edge Stabilization#31
Open
feat: SwiftBuddy Memory Palace V1 & Omni Audio Edge Stabilization#31
Conversation
- Injected export pipeline guaranteeing MLX metal library initialization hooks bypass Github Action test environments natively
- Introduced currentWing target on ChatViewModel for persona routing - Intercepted userText explicitly searching SwiftData native memories - Pre-pended retrieved factual context invisibly inside system prompts ensuring zero-latency, 100% stable context retention across all dumb models seamlessly
…teObject lifecycle bug
…rference and make downloaded models directly tappable to load
…hat bypasses token.isThinking flag
…for finalized chat messages
…r via HubApi for SwiftBuddy
… MLX/GGUF formatting for Hub queries
…ing the Search UI model list
…o prevent macOS layout recursion crashes resulting in blank models
…cursive background querying for HF Hub discovery
…ative Hub cursor pagination
…skeleton constraints for HuggingFace Hub modal layout
…tize absolute cached file size in row view
…ng to RegistryService to trace GitHub API access drops
…hed persona.json and statically request known room txt files
…g preventing successful 404 recovery
… WAL transaction flooding during massive persona corpus ingestion
…lts on SwiftBuddyApp boot sequence
…oops by converting TextEditor blocks to vertical TextFields inside iOS/macOS active ScrollViews
…ine and introduce Native graphical Map hierarchy for memory rooms
…or teardown on macOS modal sheets
…natively into ChatView toolbars for RAG identity mapping
…tly reflect the currently selected memory persona wing
…try and pivot root Navigation to a primary Friends List model
…ectures by forcefully prepending RAG variables linearly against raw User instructions rather than allocating hostile System Role bounds
…s/checkout fatal error
This commit ties the SwiftLM root back to the stabilized mlx-swift-lm papps-ssd-streaming branch containing the architectural RoPE phase bug fixes and LM head initialization fixes. It also updates Server.swift routing logically to match the Omni input payloads.
…ns natively to prevent padding loops
…ving overflow on M1/M2
…aming and Auto-Release CI
…ware test metrics
…o tower The ALMModelFactory used LLMModelFactory (text-only) which never loaded the Gemma4 audio tower or extracted mel features. Switching to OmniModelFactory ensures the VLM weights (audio_tower, embed_audio) are loaded and Gemma4's native prepareForMultimodal path extracts audio features into LMInput.audio, enabling real audio grounding instead of mock token interleaving.
Replace SoundHelix MP3 download with macOS 'say' TTS for a reproducible, network-independent speech transcription test. Update Turn 2 closed-loop question to 'what animal is mentioned' (tests context reasoning on 'fox') instead of music-specific instrument question.
… quants - ALMModelFactory._load now delegates to VLMModelFactory instead of LLMModelFactory, ensuring the Gemma4 audio tower and VLM weights are loaded for both 4-bit and 8-bit quantizations. - Add port-drain wait + log truncation before starting ALM test server to prevent health check from hitting a stale VLM process on port 5431. - ALM payloads: bump max_tokens 100→500 and add thinking:false to suppress reasoning chain tokens in transcription output. - Benchmark: update Turn 2 closed-loop question to test factual comprehension of the transcribed speech content.
ThinkingStateTracker (Server.swift): - Add <|channel|>thought...<channel|> Gemma4 native thinking token format alongside existing Qwen3/DeepSeek <thinking>...</thinking> support - Refactor partial-match into isPartialThinkingTag() helper to avoid Swift type-checker timeout on long || chains ALM benchmark (run_benchmark.sh): - Raise max_tokens 100→500 (Turn 1) and 100→200 (Turn 2) for full output - Add enable_thinking=false to both payloads to disable CoT at request level - Update prompts: 'Transcribe word for word, output only transcription' - Turn 2 closed-loop: 'summarize what the speaker said' (model quality test) - Strip thinking blocks in bash result extraction as belt-and-suspenders fallback
The error was a combination of two bugs: 1. gen_tokens=0 from a stale server left behind between builds (not a code bug — killall SwiftLM in the benchmark now reliably cleans up) 2. Python extractor treated empty content (gen_tokens=0) as 'ERROR' and aborted, masking the real underlying issue with a misleading message Fixes: - Use correct regex <|channel|>thought...<channel|> (not the old broken lookahead pattern that never matched Gemma4's actual thinking format) - Empty Turn 1 response now prints [WARN: gen_tokens=X, empty response] and continues instead of aborting the test run - Empty Turn 2 prints [empty] to make the zero-token case observable - Crash detection (server connection dropped) is now a distinct error message, separate from model-produced-empty-content
- Update README to document the SharpAI/mlx fork and streaming logic - Resolve swift-metrics package bounds
- Fallback softly on string lengths when tokenizing prompt length sizes - Adapt ThinkingStateTracker parsing for generic <think> tags over custom ones - Inline ALM type logic for Whisper registration
- Resolve the onChange parameter deprecations - Safely unroll directory enumerators with sequence unwrapping - Use let instead of var in chat payloads to prevent mutability warnings
The prompt cache stores KV state keyed on text tokens only. When a multimodal request (with image or audio) arrived after text-only requests with shared prefixes (e.g. BOS, system prompt), the cache was hitting and creating a trimmed LMInput that discarded the image/audio from LMInput. This caused Gemma4.prepare() to see input.image == nil and input.audio == nil even though the processor had correctly set processedImage/processedAudio, so getInputEmbeddings() skipped the vision/audio feature injection entirely. Symptom: Model would respond 'There is no audio clip provided to transcribe' despite audio tokens being present in the prompt token sequence. Fix: Add isMultimodalRequest guard before the prompt cache restore call. Multimodal requests always take the full-prefill path so that prepare() receives the complete LMInput with both modalities intact.
Points to fix/audio-fft-packing @ f2ef61f which contains: - ConformerBlock residual/normalization architectural fix - conv_norm corrected to AudioRMSNorm - Debug diagnostics stripped PR open at: SharpAI/mlx-swift-lm#fix/audio-fft-packing → main
Audio conformer pipeline now merged. Submodule tracks: ea87c09 Merge pull request #14 from SharpAI/fix/audio-fft-packing
That model has audio_config=null (VLM-only, no audio tower) and will always respond 'no audio provided' regardless of input. Added inline comment explaining the restriction. Also capture latest profiling results.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
Title:
feat: SwiftBuddy Memory Palace V1 & Omni Audio Edge Stabilization🚀 Summary
This PR merges the entire
feature/swiftbuddy-mempalace-v1architecture intomain. This milestone finalizes our end-to-end multimodal inference pipeline over Gemma 4 (Text + Audio + Vision), heavily refines the SwiftBuddy interface natively for macOS, replaces the legacy persona integration pipeline with lightning-fast RAG vector synthesis (Memory Palace), and hardens the core local-inference framework to cleanly package zero-secret GitHub Action releases.🧠 Core Architectural Upgrades
DraftModelRefspeculative-decoding integration failures inServer.swift. Re-exposed the criticalGemma4Configurationpayload structs so thatMLXVLMprocessing can successfully bind scaling integers across the audio array ingestion layers. NativeinputEmbeddingoverrides are permanently restored!GraphPalaceServicein favor of precise, deterministic semantic searches usingMemoryPalaceService.synthesizePersonaIndex. To secure massive compute continuity, RAG directives are forcefully appended into dynamic UI states to avoid fracturing identical MLX KV-cache hits!pread()blocking when sliding across enormous MoE configurations.🎨 Layout & Discovery Aesthetics (SwiftBuddy)
Toolbarbounds for native UI fluidity. TrappedNSDetectedLayoutRecursionrendering faults inherently caused by Apple'sTextEditorSwiftUI boundaries.ByteCountFormatter, traps undocumented GGUF metadata returns, and natively embeds offline-verified "Interactive Summoning" states directly into individual model rows.🧪 Pipeline Validation
SwiftLMTestSTFTharness evaluation points.swift build -c release.