MajorProjectAI – Offline Therapist Chatbot

1. Why I Built This

During my work on SaaS automation tools, I noticed that most “AI therapy” chatbots send every keystroke to the cloud. That’s a privacy red flag, especially when users are sharing their deepest worries.
So I decided to create MentalAI, an open‑source project that:

  • Runs 100 % on the user’s own PC (no internet after first install).

  • Answers like a CBT‑trained psychologist using a compact Llama‑based model.

  • Detects self‑harm language and injects crisis resources automatically.

  • Retrieves evidence‑based coping tips from a local document store.

I uploaded the full source code and a one‑click Docker stack to GitHub so anyone can clone, study, or improve it.

2. Feature Highlights

| ✔ Feature | What it means for users |
| --- | --- |
| Offline LLM | A 7 B‑parameter Llama model (GGUF‑Q4) runs on CPU or GPU, with no cloud calls. |
| Built‑in safety layer | Distil‑RoBERTa classifiers flag extreme distress and inject helpline info. |
| Local knowledge base | 650 CBT / DBT handouts embedded with MiniLM + FAISS; instantly retrieved. |
| Gradio chat UI | Clean chat page at localhost:7860, streams tokens in real time. |
| JSONL chat logs | Every turn saved in /logs/ for later analysis (or you can delete them). |
| Plug‑and‑play Docker | docker compose up -d spins up API, vector DB, UI, and LLM in one shot. |

3. Repository Layout

mentalAI/
│
├─ docker-compose.yml        # one file, three services
├─ src/
│   ├─ api/                  # FastAPI routes
│   ├─ chains/               # LangChain logic
│   ├─ safety/               # classifiers + filters
│   └─ ui/                   # Gradio interface
│
├─ models/                   # Llama GGUF weights (auto‑download)
├─ vectorstore/              # FAISS + Chroma files (auto‑built)
├─ logs/                     # chat transcripts (.jsonl)
└─ docs/                     # CBT PDFs (feedstock for embeddings)

4. End‑to‑End Architecture (words version)

  1. User → Chat UI
    Your browser (on localhost) sends the message to FastAPI.

  2. FastAPI → LangChain
    Acts like a traffic officer, forwarding the text to my LangChain “conversation” object.

  3. Parallel fork in LangChain
    Path A – Safety: On‑device classifier scores mood + self‑harm risk.
    Path B – Knowledge: Sentence‑transformer embeds the text; FAISS returns the top 3 CBT docs.

  4. Prompt Builder
    Combines: therapist persona + retrieved docs + chat history + risk tags → final prompt string.

  5. llama.cpp
    Loads the 7 B GGUF model from the models/ folder and streams a response.

  6. Safety Filter & Logger
    Censors any forbidden content, then appends both user and bot turns to a JSONL file.

  7. UI → User
    The cleaned answer streams back to the chat page.
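
Steps 3 and 4 above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in src/chains/: the persona text, the function names (score_risk, retrieve_docs, build_prompt, handle_message), the keyword check standing in for the classifier, and the canned snippets standing in for FAISS are all hypothetical.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical persona text; the real prompt lives in the repository.
PERSONA = "You are a supportive assistant trained in CBT techniques."


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


async def score_risk(message: str) -> list[str]:
    """Path A stand-in: a keyword check in place of the on-device classifier."""
    phrases = ("hurt myself", "end my life")
    return ["self_harm_risk"] if any(p in message.lower() for p in phrases) else []


async def retrieve_docs(message: str, k: int = 3) -> list[str]:
    """Path B stand-in: canned snippets instead of an embedding + FAISS lookup."""
    corpus = [
        "5-Step Grounding Exercise",
        "Box Breathing Basics",
        "Reframing Anxious Thoughts",
        "Sleep Hygiene Checklist",
    ]
    return corpus[:k]


def build_prompt(persona, docs, history, risk_tags, message) -> str:
    """Combine persona + risk tags + retrieved docs + chat history into one string."""
    sections = [persona]
    if risk_tags:
        sections.append("Risk tags: " + ", ".join(risk_tags))
    if docs:
        sections.append("Reference material:\n" + "\n".join(f"- {d}" for d in docs))
    sections.extend(f"{turn.role}: {turn.text}" for turn in history)
    sections.append(f"user: {message}")
    sections.append("assistant:")
    return "\n\n".join(sections)


async def handle_message(message: str, history: list[Turn]) -> str:
    # The "parallel fork": both paths run concurrently and are joined here.
    risk_tags, docs = await asyncio.gather(score_risk(message), retrieve_docs(message))
    return build_prompt(PERSONA, docs, history, risk_tags, message)


if __name__ == "__main__":
    print(asyncio.run(handle_message("I'm nervous about public speaking. Help?", [])))
```

Running the two paths with asyncio.gather keeps the safety check off the critical path of retrieval; whether the real chain uses asyncio or LangChain's own parallel primitives is not stated here.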

5. How to Run It (quick start)

# 1. clone my repo
git clone https://github.com/vidit-gupta/mentalAI.git
cd mentalAI

# 2. launch everything
docker compose up -d          # NVIDIA card? first run: docker compose build --build-arg GPU=1

# 3. open in browser
http://localhost:7860          # chat interface

First launch downloads a 4 GB model—grab coffee while it finishes.

Shutdown:

docker compose down

6. Component Details

| Layer | My choices | Why |
| --- | --- | --- |
| LLM Engine | llama.cpp running Llama‑2‑Chat‑7B‑Q4_K_M | Tiny enough for laptops but still fluent. |
| Embeddings | all‑MiniLM‑L6‑v2 (8‑bit) | Fast 384‑D vectors, only 40 MB RAM. |
| Vector DB | FAISS through ChromaDB | Blazing‑fast local ANN search. |
| Safety Model | Distil‑RoBERTa fine‑tuned on GoEmotions + CLPsych | 45 MB; ≈4 ms per message on CPU. |
| UI | Gradio | Zero‑config, easy to customise. |
| API | FastAPI + Uvicorn | Async, OpenAPI docs auto‑generated. |
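
To make the embeddings + vector DB pairing concrete, here is a minimal sketch of what "return the top‑k closest documents" means. It is a brute‑force cosine search in pure Python, used here as a stand‑in for FAISS, and it uses toy 3‑dimensional vectors instead of real sentence embeddings. The function names and the sample titles are assumptions for illustration only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the titles of the k documents most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), title) for title, vec in doc_vecs.items()),
        reverse=True,
    )
    return [title for _, title in scored[:k]]


# Toy "embeddings"; a real setup would get these from the sentence-transformer.
docs = {
    "Grounding": [1.0, 0.0, 0.0],
    "Sleep hygiene": [0.0, 1.0, 0.0],
    "Reframing": [0.9, 0.1, 0.0],
    "Journaling": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

A brute‑force scan like this is fine for a few hundred handouts; an index library earns its keep as the collection grows.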

7. Data & Privacy Notes

  • No cloud calls after the first download. Inspect the traffic with Wireshark: you should see only localhost connections.

  • Erase knowledge by deleting vectorstore/.

  • Erase chat history by deleting the .jsonl files in logs/ (all of them, or only the transcripts you want gone).

  • The Llama model file is read‑only; it never logs anything itself.
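
The erase steps above need nothing beyond the standard library. The sketch below shows one way a turn could be appended to a JSONL transcript and how the logs and vector store could be wiped. The helper names, the one‑file‑per‑session naming, and the record fields ("role", "text") are assumptions, not the project's actual log schema.

```python
import json
import shutil
from pathlib import Path


def log_turn(log_dir: Path, session: str, role: str, text: str) -> Path:
    """Append one chat turn as a single JSON line to <log_dir>/<session>.jsonl."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{session}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"role": role, "text": text}, ensure_ascii=False) + "\n")
    return path


def erase_history(log_dir: Path) -> int:
    """Delete every .jsonl transcript in log_dir; return how many files were removed."""
    removed = 0
    for path in log_dir.glob("*.jsonl"):
        path.unlink()
        removed += 1
    return removed


def erase_knowledge(store_dir: Path) -> None:
    """Remove the vector store directory and everything inside it."""
    shutil.rmtree(store_dir, ignore_errors=True)


if __name__ == "__main__":
    logs = Path("logs")
    log_turn(logs, "demo", "user", "hello")
    print(f"removed {erase_history(logs)} transcript(s)")
```

Note that deleting a file this way does not scrub it from the disk; for stronger guarantees, keep logs/ on an encrypted volume.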

8. Demo Walk‑Through

  1. Ask: “I’m nervous about public speaking. Help?”

  2. Watch retrieved doc titles in the server log (e.g., ‘5‑Step Grounding Exercise’).

  3. See the bot respond with breathing tips + CBT reframing.

  4. Delete vectorstore/, restart, ask again → reply lacks doc excerpts (proof it’s local).

9. Extending My Project

| Want to… | Do this |
| --- | --- |
| Use a bigger model | Replace models/7B.gguf with a 13 B file; bump RAM/GPU. |
| Add your own PDFs | Drop them in docs/ and run python ingest.py. |
| Encrypt logs | Mount logs/ to an encrypted volume (e.g., VeraCrypt). |
| Multi‑user auth | Enable FastAPI OAuth; store user creds in SQLite. |
| Mobile front‑end | Keep FastAPI; build a Flutter app that hits /chat. |

10. Known Limitations

  • Not a licensed therapist—include a disclaimer in production.

  • LLM may hallucinate; retrieval helps but doesn’t fix everything.

  • Slow on very weak CPUs (Raspberry Pi won’t cut it).

  • Safety classifier can mis‑read sarcasm; adjust the risk threshold if needed.
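
As a rough illustration of what adjusting the risk threshold means, the sketch below gates the crisis note on a classifier score between 0 and 1. The default of 0.5, the function names, and the placeholder wording of the note are assumptions; the project's real threshold and helpline text may differ.

```python
# Placeholder wording; a real deployment should list region-appropriate helplines.
CRISIS_NOTE = (
    "If you are in immediate danger, please contact your local emergency "
    "number or a crisis helpline."
)


def needs_crisis_resources(risk_score: float, threshold: float = 0.5) -> bool:
    """True when the classifier's self-harm score meets or exceeds the threshold."""
    return risk_score >= threshold


def apply_safety(reply: str, risk_score: float, threshold: float = 0.5) -> str:
    """Append the crisis note to the bot reply when the message is flagged."""
    if needs_crisis_resources(risk_score, threshold):
        return f"{reply}\n\n{CRISIS_NOTE}"
    return reply


if __name__ == "__main__":
    print(apply_safety("That sounds really hard.", risk_score=0.7))
```

Lowering the threshold flags more messages (more false alarms, for example on sarcasm), while raising it flags fewer and risks missing genuine distress. For a safety feature, erring toward the lower side is usually the more defensible trade‑off.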

11. License & Credits

  • Code: MIT License.

  • Model weights: subject to Meta’s Llama 2 community license.

  • Embedded docs: Public‑domain WHO materials + DBT handouts (permitted for educational use).