LLM Chat is a chatbot that uses industry-standard LLMs (OpenAI, Gemini, Anthropic) to converse with patients and capture their behavior between therapy sessions, giving clinicians a clearer picture of how their patients are doing.
- Clarify the different workflows and states
- The user's active-chat view should show both started and non-started chats
- Add filtering/sorting options for the provider chat window; potentially rearrange the order
- Check whether reports are generated per conversation or per chat window
- Show more information when creating a new chat window
- Add a model for reports
- Add a header and standardize language across pages and chat creation
- Standardize navigation, hide settings, lock the ability to recreate a chat, and show inactive/past conversations
- Open Terminal (press Cmd + Space, type "Terminal", and press Enter).
- Install Homebrew (if you don't have it):

  ```
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```

- Install Python and Git:

  ```
  brew install python@3.11 git
  ```

- Download the project:

  ```
  cd ~/Desktop
  git clone https://github.com/yourusername/cosmos.git
  cd cosmos
  ```

- Set up the Python environment:

  ```
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

  Note: Run `source .venv/bin/activate` each time you open a new Terminal window to work on this project.

- Configure API keys:

  ```
  cp .env.example .env
  open -a TextEdit .env
  ```

  In TextEdit, add at least one API key and save the file. The app will load it automatically when you start.
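  The exact variable names are defined in `.env.example`; purely as an illustration (these key names are hypothetical), the file looks something like:

  ```
  # Hypothetical variable names -- check .env.example for the real ones.
  # Only one provider key is needed to get started.
  OPENAI_API_KEY=sk-...
  ANTHROPIC_API_KEY=sk-ant-...
  GOOGLE_API_KEY=...
  ```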
- Start the application:

  ```
  python manage.py
  ```

  Open your browser and go to http://localhost:5000.
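As a quick sanity check once the server is up, any HTTP request to the root page should succeed (a minimal sketch using Python's `requests`; that the landing page lives at the root route is an assumption):

```python
import requests

# Smoke test: the dev server should answer on port 5000.
resp = requests.get("http://localhost:5000")
print(resp.status_code)  # expect 200 once the app is running
```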
For complete privacy without cloud APIs:

```
brew install ollama
ollama serve              # Keep this running in one Terminal window
ollama pull llama3.2:1b   # Run in a new Terminal window
```

Trade-offs: complete privacy and free to use, but slower responses (30-60s vs 5-10s) and lower quality than cloud models.
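Under the hood, Ollama serves a local HTTP API on port 11434. The project's actual integration lives in `llm_chat/services`; the sketch below only illustrates the raw endpoint call, not the project's code:

```python
import requests

# Call the local Ollama chat endpoint directly (default port 11434).
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2:1b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=120,  # a 1B model on an M1 Air can take 30-60s
)
print(resp.json()["message"]["content"])
```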
To update the project to the latest version:

```
cd ~/Desktop/cosmos
git pull
source .venv/bin/activate
pip install -r requirements.txt
python manage.py
```

All conversations maintain context across messages: the last 20 messages are sent to the LLM with each new message. This allows the AI to remember earlier parts of the conversation and provide contextually relevant responses.
- System prompts are included at the beginning of each request
- Older messages (beyond the last 20) are automatically dropped to manage token limits
- To adjust the context window, modify the `limit(20)` value in `llm_chat/routes/conversations.py:260`; see the sketch after this list
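A minimal sketch of that rolling window, assuming the history is already loaded as a list of role/content dicts (the function name and data shapes are illustrative, not the project's actual code):

```python
def build_context(history, system_prompt, limit=20):
    """Assemble the LLM payload: system prompt first, then the most
    recent `limit` messages in chronological order.

    history: full conversation, e.g. [{"role": "user", "content": "..."}]
    """
    recent = history[-limit:]  # messages beyond the last `limit` are dropped
    return [{"role": "system", "content": system_prompt}] + recent


# Usage: with 30 stored messages, only the last 20 reach the LLM.
history = [{"role": "user", "content": f"message {i}"} for i in range(30)]
payload = build_context(history, "You are a supportive check-in assistant.")
assert len(payload) == 21  # 1 system prompt + 20 recent messages
```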
- Cloud APIs (OpenAI, Anthropic, Google): Fast responses (5-10 seconds)
- Local Ollama on M1 MacBook Air: Slower responses (~30-60 seconds for Llama 3.2 1B)
- Local models prioritize privacy over speed, making them ideal for development and sensitive data
API documentation is implemented with Swagger and available at `/docs`.
```
|-- main
|   |-- llm_chat          # backend
|   |   |-- models        # database models
|   |   |-- routes        # API routes
|   |   |-- services      # LLM interface setup
|   |   |-- utils         # role management
|   |   |-- __init__.py
|   |   |-- extensions.py
|   |-- client            # frontend (soon)
|   |-- templates         # frontend (current)
|   |-- manage.py         # app entry point
|   |-- README.md
```

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.