Golem

Embodied AI agents that learn through experimentation.

Drop a character into a scene. Claude sees it through vision models, experiments with what's possible, remembers what works, and writes new code when needed. No predefined action lists. No hardcoded behaviors. The character discovers its own capabilities.

Golem is open source because the metaverse should not be owned by one company, and neither should foundational AI character systems. Instead of vendor lock-in, Golem defines an open standard for AI-to-character communication so that AI can control characters in any game engine. Golem characters learn through exploration, not pre-programming. They see their world, experiment, remember what works, and become co-contributors to the virtual worlds they inhabit.

Bring your own AI. No vendor lock-in. Contribute to Golem's codebase.

Why Golem?

Traditional AI characters (Convai, Inworld):

  • Developer defines 12 actions the character can do
  • AI picks from the menu
  • Character is limited to what was anticipated
  • Locked into their AI, their pricing, their roadmap

Golem:

  • Developer provides a character and a scene
  • Claude explores through vision and trial-and-error
  • Character discovers what's possible
  • Claude writes new scripts when needed
  • You choose the AI: Claude, GPT, local models, whatever comes next

As AI models improve, Golem characters automatically inherit those improvements. We're not building AI; we're building the embodiment layer for whatever AI becomes.

Core Principles

🔓 Open Source

Golem is MIT licensed. No API keys required to get started. No per-conversation fees. Run it locally, modify it freely, deploy it anywhere.

🔌 Bring Your Own AI

Not locked into any AI provider. Connect Claude for advanced reasoning, GPT for conversation, a local Llama for privacy, or your own fine-tuned model. Swap backends without changing game code.

📡 Standard Protocol

A simple, documented WebSocket protocol for AI-to-character communication. Implement it once in any engine (Unity, Unreal, Godot, web). Any AI that speaks the protocol can control any character that implements it. No proprietary SDKs.

🧠 Learning Over Programming

Characters discover their capabilities through experimentation, not configuration. Vision models see the scene. Trial-and-error finds what works. Memory retains what's learned. Code generation creates new abilities.

How It Works

┌──────────────────────────────────────────────────────────┐
│                    Your AI Backend                       │
│         Claude • GPT • Llama • Your Fine-tune            │
└─────────────────────┬────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────┐
│                 Vision Language Model                    │
│                   Sees the Unity scene                   │
└─────────────────────┬────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────┐
│              Golem Protocol (WebSocket)                  │
│           Standard JSON messages over WS                 │
└─────────────────────┬────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────┐
│                    Golem Runtime                         │
│         Unity • Unreal (soon) • Godot (soon)             │
└─────────────────────┬────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────┐
│                   Feedback Loop                          │
│       Did it work? → Memory → Pattern Recognition        │
└──────────────────────────────────────────────────────────┘

  1. Vision: AI sees the scene through vision language models
  2. Experimentation: Try actions, observe results
  3. Memory: Remember what works, what doesn't
  4. Pattern Recognition: Generalize from experience
  5. Code Generation: Write new capabilities when needed

The character learns its environment like a child learns to walk: through exploration, not instruction.
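The experimentation and memory steps can be pictured with a small Python sketch. This is only an illustration of the idea, not Golem's implementation: `OutcomeMemory`, `choose_action`, and the `explore_prob` parameter are hypothetical names, and the selection rule (mostly reuse the best-known action, occasionally try an untried one) is one simple choice among many.

```python
import random
from collections import defaultdict


class OutcomeMemory:
    """Counts attempts and successes per action (hypothetical helper)."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, action, worked):
        """Store the result of one experiment."""
        self.attempts[action] += 1
        if worked:
            self.successes[action] += 1

    def success_rate(self, action):
        """Fraction of attempts that worked; 0.0 for actions never tried."""
        tried = self.attempts[action]
        return self.successes[action] / tried if tried else 0.0


def choose_action(memory, candidates, explore_prob=0.2, rng=random):
    """Usually pick the best-known action, sometimes an untried one."""
    untried = [a for a in candidates if memory.attempts[a] == 0]
    if untried and rng.random() < explore_prob:
        return rng.choice(untried)
    return max(candidates, key=memory.success_rate)


memory = OutcomeMemory()
memory.record("sit", True)     # the character sat down successfully
memory.record("jump", False)   # nothing happened
best = choose_action(memory, ["sit", "jump"], explore_prob=0.0)
```

With exploration switched off, `best` is the action with the highest recorded success rate. A real backend would replace the boolean outcome with whatever the vision model concludes from the next scene state.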

Quick Start

1. Clone and Open in Unity

git clone https://github.com/TreasureProject/Golem.git

Open the project in Unity 2022.3+.

2. Connect Your AI Backend

Golem connects to any AI server via WebSocket:

ws://localhost:5173/agents/chat/external:{agentId}

Your server receives scene state and sends commands. Use Claude, GPT, a local model, or whatever you want.

3. Run

Press Play. The AI sees the scene, experiments, and learns.

The Golem Protocol

A simple JSON-over-WebSocket protocol. Any AI that produces these messages can control any Golem-compatible character.

Movement

{
  "type": "character_action",
  "data": {
    "action": {
      "type": "moveToLocation",
      "parameters": { "location": "cafe" }
    }
  }
}
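On the backend side, a message like the one above could be assembled as follows. This is a minimal Python sketch; the helper name `move_to_location` is hypothetical, and only the field names are taken from the example.

```python
import json


def move_to_location(location):
    """Build a moveToLocation message shaped like the example above."""
    return {
        "type": "character_action",
        "data": {
            "action": {
                "type": "moveToLocation",
                "parameters": {"location": location},
            }
        },
    }


# Serialize to the JSON text that would be sent over the WebSocket.
payload = json.dumps(move_to_location("cafe"))
```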

Voice + Lip Sync

{
  "type": "emote",
  "data": {
    "type": "voice",
    "audioBase64": "<base64-encoded-audio>"
  }
}
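The `audioBase64` field carries the audio as base64 text. The sketch below shows the encoding step in Python. It assumes the audio is WAV, which this README does not state outright (the project does list a `WavUtility.cs` for audio decoding); `silent_wav` and `voice_emote` are hypothetical helpers, and the generated silence stands in for real text-to-speech output.

```python
import base64
import io
import wave


def silent_wav(seconds=0.1, sample_rate=16000):
    """Return WAV bytes of 16-bit mono silence (placeholder for TTS audio)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def voice_emote(audio_bytes):
    """Wrap raw audio bytes in a voice emote message."""
    return {
        "type": "emote",
        "data": {
            "type": "voice",
            "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


message = voice_emote(silent_wav())
```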

Animations

{
  "type": "emote",
  "data": {
    "type": "animated",
    "animation": { "name": "wave", "duration": 2.0 }
  }
}

Facial Expressions

{
  "type": "facial_expression",
  "data": {
    "expression": "happy",
    "intensity": 0.9
  }
}

Expressions: happy, sad, surprised, angry, neutral, thinking
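A backend may want to validate expression names before sending them. The Python sketch below checks against the list above. The helper name `facial_expression` is hypothetical, and clamping `intensity` to the range 0 to 1 is an assumption, since the README only shows the value 0.9.

```python
EXPRESSIONS = {"happy", "sad", "surprised", "angry", "neutral", "thinking"}


def facial_expression(expression, intensity=1.0):
    """Build a facial_expression message, rejecting unknown expressions."""
    if expression not in EXPRESSIONS:
        raise ValueError(f"unknown expression: {expression!r}")
    # Assumption: intensity is meant to lie in [0, 1].
    clamped = max(0.0, min(1.0, float(intensity)))
    return {
        "type": "facial_expression",
        "data": {"expression": expression, "intensity": clamped},
    }


smile = facial_expression("happy", 0.9)
```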

Dynamic Scripting

{
  "type": "script",
  "data": {
    "code": "<C# code to execute>",
    "target": "character"
  }
}

The AI can write and execute new behaviors at runtime, so it is not limited to predefined actions.
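Because the code travels inside a JSON string, quotes and newlines in the C# source have to be escaped, which a JSON library does automatically. The Python sketch below illustrates this. The helper `script_message` is hypothetical, and the C# snippet is only a placeholder: the README does not say what context the script runs in, so whether names such as `transform` are in scope is an assumption.

```python
import json


def script_message(code, target="character"):
    """Wrap C# source text in a script message."""
    return {"type": "script", "data": {"code": code, "target": target}}


# Placeholder C# source containing quotes and a newline.
csharp = 'Debug.Log("hello from the AI");\ntransform.Rotate(0f, 90f, 0f);'

# json.dumps escapes the quotes and the newline for the wire format.
wire = json.dumps(script_message(csharp))
```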

Scene State (Runtime → AI)

{
  "type": "scene_state",
  "data": {
    "character": { "position": [0, 0, 5], "state": "idle" },
    "objects": [...],
    "screenshot": "<base64-encoded-image>"
  }
}

The AI receives visual and structured feedback to close the learning loop.
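On the receiving side, a backend might unpack a scene state message as sketched below in Python. The function name `parse_scene_state` is hypothetical. The screenshot is returned as raw bytes because the README does not specify an image format, and the sample payload uses made-up bytes in place of a real image.

```python
import base64
import json


def parse_scene_state(raw):
    """Decode a scene_state message into plain Python values."""
    message = json.loads(raw)
    if message.get("type") != "scene_state":
        raise ValueError("not a scene_state message")
    data = message["data"]
    return {
        "position": tuple(data["character"]["position"]),
        "state": data["character"]["state"],
        "objects": data.get("objects", []),
        "screenshot": base64.b64decode(data["screenshot"]),
    }


# Sample message with placeholder bytes standing in for the screenshot.
raw = json.dumps({
    "type": "scene_state",
    "data": {
        "character": {"position": [0, 0, 5], "state": "idle"},
        "objects": [],
        "screenshot": base64.b64encode(b"fake-image-bytes").decode("ascii"),
    },
})
scene = parse_scene_state(raw)
```

The decoded screenshot bytes can then be passed to whichever vision model the backend uses.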

Comparison

| | Convai/Inworld | Golem |
|---|---|---|
| Action space | Predefined by developer | Discovered by AI |
| Vision | None | Vision language models |
| Learning | None | Trial-and-error + memory |
| Code generation | None | Runtime scripting |
| AI backend | Locked to their API | Any (Claude, GPT, local) |
| Protocol | Proprietary SDK | Open WebSocket standard |
| Pricing | Per-API-call | Open source / free |
| Improvement | Their roadmap | Inherits AI advances |

Architecture

Golem/
├── Assets/
│   ├── Scripts/
│   │   ├── Character/
│   │   │   ├── PointClickController.cs       # NavMesh movement
│   │   │   ├── CharacterActionController.cs  # Action routing
│   │   │   └── EmotePlayer.cs                # Voice + lip sync
│   │   ├── Systems/
│   │   │   ├── Networking/
│   │   │   │   └── CFConnector.cs            # WebSocket client
│   │   │   └── Camera/
│   │   │       └── CameraStateMachine.cs     # Camera control
│   │   └── Utils/
│   │       └── WavUtility.cs                 # Audio decoding
│   ├── Plugins/
│   │   └── SALSA LipSync/                    # Lip sync
│   └── Scenes/
│       └── Main.unity
└── README.md

Core Components

| Component | Purpose |
|---|---|
| CFConnector.cs | WebSocket client, connects to any AI backend |
| CharacterActionController.cs | Routes AI commands to character |
| PointClickController.cs | NavMesh movement + interaction states |
| EmotePlayer.cs | Voice playback with SALSA lip sync |

Configuration

In the Unity Inspector, configure CFConnector:

| Setting | Default | Description |
|---|---|---|
| Host | localhost:5173 | AI server address |
| Agent Id | character | Agent identifier |
| Use Secure | false | Use wss:// |
| Query Token | (none) | Auth token |
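These settings presumably combine into the WebSocket URL shown in Quick Start. The Python sketch below shows one plausible mapping, useful when testing a backend without Unity. The function `golem_url` is hypothetical, and sending the auth token as a `token` query parameter is an assumption, not something this README documents.

```python
from urllib.parse import urlencode


def golem_url(host="localhost:5173", agent_id="character",
              use_secure=False, query_token=None):
    """Compose the agent WebSocket URL from the CFConnector settings."""
    scheme = "wss" if use_secure else "ws"
    url = f"{scheme}://{host}/agents/chat/external:{agent_id}"
    if query_token:
        # Assumption: the token is passed as a `token` query parameter.
        url += "?" + urlencode({"token": query_token})
    return url


default_url = golem_url()
secure_url = golem_url("example.com", "npc", use_secure=True, query_token="abc")
```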

Debug Controls

Test actions manually while developing:

| Key | Action |
|---|---|
| 1 | Move to location |
| 2 | Sit at chair |
| 3 | Stand up |
| 4 | Examine display |
| 5 | Play arcade |
| 6 | Change camera |
| 7 | Idle |
| Space | Stand up |

Contributing

We welcome contributions:

  • Protocol improvements
  • New runtime implementations (Unreal, Godot, web)
  • AI backend adapters
  • Documentation

License

MIT. Use it however you want.

Links


Golem is built by Treasure, building the future of interactive IP and AI-driven entertainment experiences.
