MajorProjectAI – Offline Therapist Chatbot
1. Why I Built This
During my work on SaaS automation tools, I noticed that most “AI therapy” chatbots send every message to the cloud. That’s a privacy red flag, especially when users are sharing their deepest worries.
So I decided to create MentalAI, an open‑source project that:
Runs 100% on the user’s own PC (no internet needed after the first install).
Answers like a CBT‑trained psychologist using a compact Llama‑based model.
Detects self‑harm language and injects crisis resources automatically.
Retrieves evidence‑based coping tips from a local document store.
I uploaded the full source code and a one‑click Docker stack to GitHub so anyone can clone, study, or improve it.
2. Feature Highlights
| ✔ Feature | What it means for users |
|---|---|
| Offline LLM | A 7B‑parameter Llama model (GGUF, 4‑bit quantized) runs on CPU or GPU with no cloud calls. |
| Built‑in safety layer | Distil‑RoBERTa classifiers flag signs of extreme distress; the app then injects helpline info. |
| Local knowledge base | 650 CBT / DBT handouts embedded with MiniLM + FAISS and retrieved locally at query time. |
| Gradio chat UI | Clean chat page at localhost:7860 that streams tokens in real time. |
| JSONL chat logs | Every turn is saved in logs/ for later analysis (or you can delete them). |
| Plug‑and‑play Docker | docker compose up -d spins up the API, vector DB, UI, and LLM in one shot. |
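Each line of a JSONL file is one standalone JSON object. Appending a turn is therefore cheap, and individual records are easy to grep or delete. Below is a minimal stdlib sketch of that pattern. The field names (`ts`, `role`, `text`) and the helpers `append_turn`/`read_turns` are hypothetical and may not match the repo's actual log schema.

```python
import json
import time
from pathlib import Path


def append_turn(log_path: Path, role: str, text: str) -> None:
    """Append one chat turn to a JSONL file (one JSON object per line).

    Field names are hypothetical; match them to the real log schema.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "role": role, "text": text}
    # ensure_ascii=False keeps non-English text human-readable in the file;
    # json.dumps escapes embedded newlines, so each record stays on one line.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_turns(log_path: Path) -> list:
    """Load all turns back; every non-empty line is an independent JSON document."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `append_turn` once for the user message and once for the bot reply produces two lines per exchange.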
3. Repository Layout
mentalAI/
│
├─ docker-compose.yml # one file, three services
├─ src/
│ ├─ api/ # FastAPI routes
│ ├─ chains/ # LangChain logic
│ ├─ safety/ # classifiers + filters
│ └─ ui/ # Gradio interface
│
├─ models/ # Llama GGUF weights (auto‑download)
├─ vectorstore/ # FAISS + Chroma files (auto‑built)
├─ logs/ # chat transcripts (.jsonl)
└─ docs/ # CBT PDFs (feedstock for embeddings)
4. End‑to‑End Architecture (words version)
User → Chat UI
Your browser (on localhost) sends the message to FastAPI.
FastAPI → LangChain
FastAPI acts like a traffic officer, forwarding the text to my LangChain “conversation” object.
Parallel fork in LangChain
Path A – Safety: an on‑device classifier scores mood + self‑harm risk.
Path B – Knowledge: a sentence‑transformer embeds the text; FAISS returns the top 3 CBT docs.
Prompt Builder
Combines therapist persona + retrieved docs + chat history + risk tags → final prompt string.
llama.cpp
Loads the 7B GGUF model from the models/ folder and streams a response.
Safety Filter & Logger
Censors any forbidden content, then appends both the user and bot turns to a JSONL file.
UI → User
The cleaned answer streams back to the chat page.
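The flow above can be sketched as a small orchestration class. This is a minimal stdlib sketch, not the repo's LangChain code. The classifier, retriever, and llama.cpp call are injected as plain callables. The persona text, crisis note, `[RISK: ...]` tag format, and the 0.8 threshold are all hypothetical. Output censoring and logging are left out to keep the focus on the fork, the prompt assembly, and the crisis-resource injection.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Iterator

# Hypothetical texts; replace with the project's real persona and local helplines.
PERSONA = ("You are a supportive assistant that uses CBT techniques. "
           "You are not a licensed therapist.")
CRISIS_NOTE = ("If you are thinking about harming yourself, please contact "
               "a local crisis line or emergency services now.")


@dataclass
class Pipeline:
    score_risk: Callable[[str], float]         # Path A stand-in (e.g. the Distil-RoBERTa classifier)
    retrieve: Callable[[str, int], list[str]]  # Path B stand-in (e.g. MiniLM + FAISS), returns doc texts
    generate: Callable[[str], Iterator[str]]   # llama.cpp stand-in, yields tokens
    risk_threshold: float = 0.8                # hypothetical cut-off
    history: list[tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, message: str, docs: list[str], high_risk: bool) -> str:
        """Persona + retrieved docs + chat history + risk tag -> one prompt string."""
        parts = [PERSONA]
        if docs:
            parts.append("Relevant notes:\n" + "\n".join(f"- {d}" for d in docs))
        for user_turn, bot_turn in self.history:
            parts.append(f"User: {user_turn}\nAssistant: {bot_turn}")
        if high_risk:
            parts.append("[RISK: possible self-harm; respond with care and share crisis resources]")
        parts.append(f"User: {message}\nAssistant:")
        return "\n\n".join(parts)

    def respond(self, message: str, k: int = 3) -> str:
        # Parallel fork: safety scoring and retrieval run concurrently.
        with ThreadPoolExecutor(max_workers=2) as pool:
            risk_future = pool.submit(self.score_risk, message)
            docs_future = pool.submit(self.retrieve, message, k)
            risk, docs = risk_future.result(), docs_future.result()
        high_risk = risk >= self.risk_threshold
        prompt = self.build_prompt(message, docs, high_risk)
        # The real UI streams tokens; this sketch simply joins them.
        reply = "".join(self.generate(prompt))
        # Post-generation safety step: guarantee crisis resources on high-risk turns.
        if high_risk and CRISIS_NOTE not in reply:
            reply = f"{reply}\n\n{CRISIS_NOTE}"
        self.history.append((message, reply))
        return reply


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model files.
    def toy_risk(text: str) -> float:
        return 0.95 if "hurt myself" in text.lower() else 0.05

    def toy_retrieve(text: str, k: int) -> list[str]:
        return ["5-Step Grounding Exercise", "Box Breathing"][:k]

    def toy_generate(prompt: str) -> Iterator[str]:
        yield from ["Let's ", "try ", "a grounding exercise."]

    bot = Pipeline(toy_risk, toy_retrieve, toy_generate)
    print(bot.respond("I'm nervous about public speaking. Help?"))
```

Injecting the crisis note after generation, rather than only tagging the prompt, means the helpline text appears even if the model ignores the risk tag.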
5. How to Run It (quick start)
# 1. clone my repo
git clone https://github.com/vidit-gupta/mentalAI.git
cd mentalAI
# 2. launch everything
# with an NVIDIA card, first run: docker compose build --build-arg GPU=1
docker compose up -d
# 3. open in browser
http://localhost:7860 # chat interface
First launch downloads a 4 GB model—grab coffee while it finishes.
Shutdown:
docker compose down
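Where does the ~4 GB figure come from, and what would a bigger model cost? A back-of-envelope estimate answers both: quantized weights take parameters × bits-per-weight / 8 bytes, and llama.cpp also holds a key/value cache that grows with context length. The sketch below assumes about 4.85 bits per weight as an average for Q4_K_M, an fp16 KV cache, and a 4096-token context. These are assumptions, not measurements from this repo.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized model weights in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, d_model: int, n_ctx: int, bytes_per_value: int = 2) -> float:
    """Key/value cache size: one K and one V vector of d_model values per layer per token.

    Assumes an fp16 cache (2 bytes per value) and no grouped-query attention,
    which holds for Llama-2 7B and 13B.
    """
    return 2 * n_layers * n_ctx * d_model * bytes_per_value / 1e9


# Published Llama-2 shapes; ~4.85 bits/weight is an assumed average for Q4_K_M.
MODELS = {
    "Llama-2-7B": dict(n_params=6.74e9, n_layers=32, d_model=4096),
    "Llama-2-13B": dict(n_params=13.0e9, n_layers=40, d_model=5120),
}

if __name__ == "__main__":
    for name, m in MODELS.items():
        w = weights_gb(m["n_params"], 4.85)
        kv = kv_cache_gb(m["n_layers"], m["d_model"], n_ctx=4096)
        print(f"{name}: ~{w:.1f} GB weights + ~{kv:.1f} GB KV cache at 4096 tokens")
```

Under these assumptions the 7B file comes out at roughly 4.1 GB, in line with the first-launch download. A 13B Q4_K_M file comes out near 7.9 GB. A smaller context window shrinks the KV cache proportionally.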
6. Component Details
| Layer | My choices | Why |
|---|---|---|
| LLM Engine | llama.cpp running Llama‑2‑Chat‑7B‑Q4_K_M | Small enough for laptops but still fluent. |
| Embeddings | all‑MiniLM‑L6‑v2 (8‑bit) | Fast 384‑D vectors, only 40 MB RAM. |
| Vector DB | FAISS through ChromaDB | Fast local approximate nearest‑neighbour (ANN) search. |
| Safety Model | Distil‑RoBERTa fine‑tuned on GoEmotions + CLPsych | 45 MB; ≈4 ms per message on CPU. |
| UI | Gradio | Zero‑config, easy to customise. |
| API | FastAPI + Uvicorn | Async, OpenAPI docs auto‑generated. |
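Conceptually, the retrieval step ranks stored chunk vectors by cosine similarity to the query vector and keeps the best three. Here is a brute-force stdlib sketch of that ranking. The tiny 3-D vectors and document titles are made up and stand in for real 384-D MiniLM embeddings. The helper names are hypothetical.

```python
from __future__ import annotations

import heapq
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact brute-force search: return the k (title, cosine score) pairs closest to query."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), title)
        for title, vec in index.items()
    )
    return [(title, score) for score, title in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    # Made-up 3-D vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
    index = {
        "5-Step Grounding Exercise": [0.9, 0.1, 0.0],
        "Box Breathing": [0.7, 0.3, 0.1],
        "Thought Record": [0.5, 0.5, 0.2],
        "Sleep Hygiene": [0.0, 0.2, 0.9],
    }
    query = [1.0, 0.2, 0.0]  # stand-in embedding of "I'm nervous about public speaking"
    for title, score in top_k(query, index):
        print(f"{score:.3f}  {title}")
```

A vector library replaces this Python loop in practice. For example, FAISS's `IndexFlatIP` over L2-normalised vectors gives the same exact cosine ranking, while approximate indexes trade a little recall for speed.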
7. Data & Privacy Notes
No cloud calls after the first download: inspect the traffic with Wireshark and you should see only localhost.
Erase the knowledge base by deleting vectorstore/.
Erase chat history by deleting any file in logs/.
The Llama model file is read‑only and records nothing; conversations are stored only in the JSONL files in logs/.
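The two erase steps are easy to script. This minimal sketch assumes logs/ and vectorstore/ are the host-side folders from the repository layout. The helper names are hypothetical, and the log helper only targets `.jsonl` transcripts.

```python
import shutil
from pathlib import Path


def erase_chat_logs(logs_dir: Path) -> int:
    """Delete every .jsonl transcript in logs_dir; return how many files were removed."""
    removed = 0
    for path in logs_dir.glob("*.jsonl"):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


def erase_knowledge(vectorstore_dir: Path) -> bool:
    """Remove the whole vector store folder; return False if it was already gone."""
    if not vectorstore_dir.exists():
        return False
    shutil.rmtree(vectorstore_dir)
    return True
```

Run it with the stack stopped (docker compose down) so nothing is writing to these folders. A plain delete unlinks files but does not overwrite their contents on disk. That is one reason to consider the encrypted-volume option for logs/.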
8. Demo Walk‑Through
Ask: “I’m nervous about public speaking. Help?”
Watch retrieved doc titles in the server log (e.g., ‘5‑Step Grounding Exercise’).
See the bot respond with breathing tips + CBT reframing.
Delete vectorstore/, restart, and ask again → the reply lacks doc excerpts (proof that retrieval is local).
9. Extending My Project
| Want to… | Do this |
|---|---|
| Use a bigger model | Replace models/7B.gguf with a 13B file; expect to need more RAM or GPU memory. |
| Add your own PDFs | Drop them in docs/ and run python ingest.py. |
| Encrypt logs | Mount logs/ on an encrypted volume (e.g., VeraCrypt). |
| Multi‑user auth | Add FastAPI's OAuth2 security utilities; store hashed user credentials in SQLite. |
| Mobile front‑end | Keep FastAPI; build a Flutter app that calls /chat. |
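Before embedding, an ingest step typically splits each document's text into short overlapping chunks, because all-MiniLM-L6-v2 truncates input beyond 256 word pieces. This is a minimal sketch of word-window chunking. `chunk_words` and its 150-word/30-word defaults are hypothetical choices, not necessarily what ingest.py does, and PDF text extraction is left out.

```python
from __future__ import annotations


def chunk_words(text: str, size: int = 150, overlap: int = 30) -> list[str]:
    """Split text into windows of `size` words; neighbouring windows share `overlap` words.

    The overlap keeps a sentence that straddles a boundary retrievable from either chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, size=4, overlap=1):
        print(chunk)
```

Each chunk would then be embedded and added to the vector store alongside its source title, so retrieved excerpts can be traced back to a handout.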
10. Known Limitations
Not a licensed therapist: include a disclaimer in production.
The LLM may hallucinate; retrieval helps but doesn’t fix everything.
Slow on very weak CPUs (a Raspberry Pi won’t cut it).
The safety classifier can misread sarcasm; adjust the risk threshold if needed.
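One way to adjust that threshold is to use a small labelled validation set. First keep recall on genuinely at-risk messages above a floor, then check how many false alarms (such as sarcastic messages) remain at that setting. This sketch assumes the classifier emits a single risk score in [0, 1]. The helper names and the 0.95 default are hypothetical.

```python
from __future__ import annotations


def pick_threshold(scored: list[tuple[float, bool]], min_recall: float = 0.95) -> float:
    """Highest threshold that still flags at least `min_recall` of the truly at-risk messages.

    scored: (classifier risk score, truly_at_risk) pairs from a labelled validation set.
    A message counts as flagged when its score >= threshold.
    """
    if not 0 < min_recall <= 1:
        raise ValueError("min_recall must be in (0, 1]")
    positives = sorted((s for s, at_risk in scored if at_risk), reverse=True)
    if not positives:
        raise ValueError("need at least one at-risk example")
    # Smallest number of positives that must be flagged to reach the target recall.
    needed = next(n for n in range(1, len(positives) + 1) if n / len(positives) >= min_recall)
    # Any threshold above the needed-th highest positive score would flag fewer positives.
    return positives[needed - 1]


def false_alarm_rate(scored: list[tuple[float, bool]], threshold: float) -> float:
    """Fraction of not-at-risk messages that would still be flagged at this threshold."""
    negatives = [s for s, at_risk in scored if not at_risk]
    return sum(s >= threshold for s in negatives) / len(negatives) if negatives else 0.0
```

Anchoring on recall reflects the asymmetry here: a missed crisis costs far more than an unnecessary helpline message. `false_alarm_rate` shows what that choice costs in annoyance.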
11. License & Credits
Code: MIT License.
Model weights: subject to Meta’s Llama 2 community license.
Embedded docs: Public‑domain WHO materials + DBT handouts (permitted for educational use).

