MajorProjectAI – Offline Therapist Chatbot

1. Why I Built This

During my work on SaaS automation tools I noticed that most “AI therapy” chatbots send every keystroke to the cloud. That’s a privacy red‑flag—especially when users are sharing their deepest worries.
So I decided to create MentalAI, an open‑source project that:

Runs 100 % on the user’s own PC (no internet after first install).
Answers like a CBT‑trained psychologist using a compact Llama‑based model.
Detects self‑harm language and injects crisis resources automatically.
Retrieves evidence‑based coping tips from a local document store.

I uploaded the full source code and a one‑click Docker stack to GitHub so anyone can clone, study, or improve it.

2. Feature Highlights

✔ Feature	What it means for users
Offline LLM	A 7 B‑parameter Llama model (GGUF‑Q4) runs on CPU or GPU—no cloud calls.
Built‑in safety layer	Distil‑RoBERTa classifiers flag extreme distress and inject helpline info.
Local knowledge base	650 CBT / DBT handouts embedded with MiniLM + FAISS; instantly retrieved.
Gradio chat UI	Clean chat page at `localhost:7860`, streams tokens in real time.
JSONL chat logs	Every turn saved in `/logs/` for later analysis (or you can delete them).
Plug‑and‑play Docker	`docker compose up -d` spins up API, vector DB, UI, and LLM in one shot.

3. Repository Layout

mentalAI/
│
├─ docker-compose.yml        # one file, three services
├─ src/
│   ├─ api/                  # FastAPI routes
│   ├─ chains/               # LangChain logic
│   ├─ safety/               # classifiers + filters
│   └─ ui/                   # Gradio interface
│
├─ models/                   # Llama GGUF weights (auto‑download)
├─ vectorstore/              # FAISS + Chroma files (auto‑built)
├─ logs/                     # chat transcripts (.jsonl)
└─ docs/                     # CBT PDFs (feedstock for embeddings)

4. End‑to‑End Architecture (words version)

User → Chat UI
Your browser (on localhost) sends the message to FastAPI.
FastAPI → LangChain
Acts like a traffic officer, forwarding the text to my LangChain “conversation” object.
Parallel fork in LangChain
Path A – Safety: On‑device classifier scores mood + self‑harm risk.
Path B – Knowledge: Sentence‑transformer embeds the text; FAISS returns the top 3 CBT docs.
Prompt Builder
Combines: therapist persona + retrieved docs + chat history + risk tags → final prompt string.
llama.cpp
Loads the 7 B GGUF model from the models/ folder and streams a response.
Safety Filter & Logger
Censors any forbidden content, then appends both user and bot turns to a JSONL file.
UI → User
The cleaned answer streams back to the chat page.

5. How to Run It (quick start)

# 1. clone my repo
git clone https://github.com/vidit-gupta/mentalAI.git
cd mentalAI

# 2. launch everything
docker compose up -d          # add --build-arg GPU=1 if you have an NVIDIA card

# 3. open in browser
http://localhost:7860          # chat interface

First launch downloads a 4 GB model—grab coffee while it finishes.

Shutdown:

docker compose down

6. Component Details

Layer	My choices	Why
LLM Engine	`llama.cpp` running Llama‑2‑Chat‑7B‑Q4_K_M	Tiny enough for laptops but still fluent.
Embeddings	`all‑MiniLM‑L6‑v2` (8‑bit)	Fast 768‑D vectors, only 40 MB RAM.
Vector DB	FAISS through ChromaDB	Blazing‑fast local ANN search.
Safety Model	Distil‑RoBERTa fine‑tuned on GoEmotions + CLPsych	45 MB; ≈4 ms per message on CPU.
UI	Gradio	Zero‑config, easy to customise.
API	FastAPI + Uvicorn	Async, OpenAPI docs auto‑generated.

7. Data & Privacy Notes

No cloud calls after first download. Inspect with Wireshark—only localhost.
Erase knowledge by deleting vectorstore/.
Erase chat history by deleting any file in logs/.
The Llama model file is read‑only; it never logs anything itself.

8. Demo Walk‑Through

Ask: “I’m nervous about public speaking. Help?”
Watch retrieved doc titles in the server log (e.g., ‘5‑Step Grounding Exercise’).
See the bot respond with breathing tips + CBT reframing.
Delete vectorstore/, restart, ask again → reply lacks doc excerpts (proof it’s local).

9. Extending My Project

Want to…	Do this
Use a bigger model	Replace `models/7B.gguf` with a 13 B file; bump RAM/GPU.
Add your own PDFs	Drop them in `docs/` and run `python` `ingest.py`.
Encrypt logs	Mount `logs/` to an encrypted volume (e.g., VeraCrypt).
Multi‑user auth	Enable FastAPI OAuth; store user creds in SQLite.
Mobile front‑end	Keep FastAPI; build a Flutter app that hits `/chat`.

10. Known Limitations

Not a licensed therapist—include a disclaimer in production.
LLM may hallucinate; retrieval helps but doesn’t fix everything.
Slow on very weak CPUs (Raspberry Pi won’t cut it).
Safety classifier can mis‑read sarcasm; adjust the risk threshold if needed.

11. License & Credits

Code: MIT License.
Model weights: subject to Meta’s Llama 2 community license.
Embedded docs: Public‑domain WHO materials + DBT handouts (permitted for educational use).

MajorProjectAI – Offline Therapist Chatbot

1. Why I Built This

2. Feature Highlights

3. Repository Layout

4. End‑to‑End Architecture (words version)

5. How to Run It (quick start)

6. Component Details

7. Data & Privacy Notes

8. Demo Walk‑Through

9. Extending My Project

10. Known Limitations

11. License & Credits

Comments

More from this blog

Blog Post: The Challenges of Building a Full-Stack To-Do List App Using MERN

MERN-Stack Setup in vs code for ubuntu (linux)

Command Palette

1. Why I Built This

2. Feature Highlights

3. Repository Layout

4. End‑to‑End Architecture (words version)

5. How to Run It (quick start)

6. Component Details

7. Data & Privacy Notes

8. Demo Walk‑Through

9. Extending My Project

10. Known Limitations

11. License & Credits

Comments

More from this blog

1. Why I Built This

2. Feature Highlights

3. Repository Layout

4. End‑to‑End Architecture (words version)

5. How to Run It (quick start)

6. Component Details

7. Data & Privacy Notes

8. Demo Walk‑Through

9. Extending My Project

10. Known Limitations

11. License & Credits