title: PDF Chatbot
emoji: 📘
colorFrom: blue
colorTo: purple
sdk: streamlit
app_port: 7860
pinned: false
license: mit

📘 PDF RAG Chatbot (Groq + LangChain)

A Retrieval-Augmented Generation (RAG) application that lets users:

  • Upload a PDF
  • Ask questions that are answered only from the PDF content
  • Get answers generated by Groq-hosted LLMs

The app runs fully on CPU (Hugging Face free tier).

🚀 Features

  • 📄 PDF upload & processing
  • ✂️ Intelligent text chunking
  • 🔍 Semantic search using embeddings
  • 🧠 Context-aware LLM responses
  • 🧹 Memory clear & health endpoints
  • ⚡ Fast inference via Groq API

🧱 Tech Stack

  • Frontend: Streamlit
  • Backend: FastAPI
  • LLM: Groq (llama-3.1-8b-instant)
  • Embeddings: all-MiniLM-L6-v2
  • Vector DB: Chroma (in-memory)
  • Framework: LangChain
  • Deployment: Docker + Hugging Face Spaces

🧪 How It Works (RAG Pipeline)

  1. Upload PDF
  2. Split text into chunks
  3. Generate embeddings
  4. Store in vector database
  5. Retrieve relevant chunks
  6. Generate answer using Groq LLM
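
Steps 2 to 5 can be illustrated with a minimal, dependency-free sketch. This is not the app's actual code: the bag-of-words `embed` function is a toy stand-in for all-MiniLM-L6-v2, the plain Python list is a stand-in for Chroma, and every function name and default value here is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Step 2: cut the text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Step 3 (toy stand-in): word counts instead of a neural embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Step 4: store (chunk, vector) pairs in a list (stand-in for a vector DB)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Step 5: return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up text standing in for the content extracted from an uploaded PDF.
    pdf_text = (
        "Refunds are issued within 30 days of purchase. "
        "Shipping takes five business days. "
        "Support is available by email."
    )
    index = build_index(split_into_chunks(pdf_text, chunk_size=60, overlap=15))
    for chunk in retrieve(index, "How long does shipping take?", k=1):
        print(chunk)
```

The retrieved chunks would then be passed to the LLM as context (step 6). Overlapping chunks reduce the chance that a sentence split across a chunk boundary is lost to retrieval.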

🖥️ Usage

  1. Upload a PDF file
  2. Ask questions related to the document
  3. If the answer is not in the PDF, the assistant will reply:

    "I cannot find this in the PDF."

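One plausible way to get the fixed fallback reply is to state it in the prompt and to skip the model call when retrieval returns nothing. The sketch below assumes this approach. The prompt wording, the function names, and the `llm` callable (any function that maps a prompt string to a reply string) are hypothetical and not taken from the app's source.

```python
FALLBACK_REPLY = "I cannot find this in the PDF."


def build_prompt(context_chunks, question):
    """Combine the retrieved chunks and the question into one grounded prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f'If the answer is not in the context, reply exactly: "{FALLBACK_REPLY}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_or_fallback(context_chunks, question, llm):
    """Return the fallback without calling the model when no context was retrieved."""
    if not context_chunks:
        return FALLBACK_REPLY
    return llm(build_prompt(context_chunks, question))
```

With an empty chunk list, `answer_or_fallback([], "Who wrote this?", llm)` returns the fallback string directly. Otherwise the model receives the instruction together with the retrieved context.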

πŸ” Environment Variables

The following secret must be added in Hugging Face Spaces:

| Variable     | Description  |
|--------------|--------------|
| GROQ_API_KEY | Groq API key |

⚠️ Do NOT commit .env files to the repository.
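
Hugging Face Spaces exposes secrets to the running app as environment variables, so the key can be read at startup. The helper below is a sketch: its name and its `env` parameter (which accepts a dict for testing) are hypothetical. It fails early with a clear message if the secret is missing.

```python
import os


def get_groq_api_key(env=None):
    """Read GROQ_API_KEY from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it as a secret in the Space settings."
        )
    return key
```

Failing at startup surfaces a missing secret in the Space logs immediately, before the first question reaches the Groq API.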


❀️ Notes

  • Runs on CPU only (no GPU required)
  • Free-tier friendly
  • First load may take a few minutes
  • Space may sleep when idle

πŸ‘¨β€πŸ’» Author

Abhishek Saxena
M.Tech Data Science, IIT Roorkee


⭐ If you like this project

Give it a ⭐ on Hugging Face and feel free to fork!