---
title: Reputation Monitor
emoji: πŸ“Š
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---

# πŸ“Š End-to-End MLOps Pipeline for Real-Time Reputation Monitoring


## πŸš€ Project Overview

MachineInnovators Inc. focuses on scalable, production-ready machine learning applications. This project is a comprehensive MLOps solution designed to monitor online company reputation through automated sentiment analysis of real-time news.

Unlike standard static notebooks, this repository demonstrates a full-cycle ML workflow. The system scrapes live data from Google News, analyzes sentiment using a RoBERTa Transformer model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment.

### Key Features

- **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
- **State-of-the-Art NLP:** Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification.
- **Full-Stack Architecture:** Integrates a FastAPI backend for inference and a Streamlit frontend for visualization in a single container.
- **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
- **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
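The NLP step above can be sketched as follows. The checkpoint id (`cardiffnlp/twitter-roberta-base-sentiment`) and its `LABEL_0`/`LABEL_1`/`LABEL_2` output convention come from the public model card; the function names are illustrative, not the repository's actual module API:

```python
MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment"  # assumed checkpoint id
LABEL_MAP = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

def build_classifier():
    """Load the sentiment pipeline (downloads the checkpoint on first use)."""
    from transformers import pipeline  # heavy dependency, imported lazily
    return pipeline("sentiment-analysis", model=MODEL_NAME)

def to_readable(prediction: dict) -> dict:
    """Map the model's raw LABEL_x output to a human-readable sentiment."""
    return {"sentiment": LABEL_MAP[prediction["label"]],
            "score": prediction["score"]}

# Example (requires the model download):
#   classifier = build_classifier()
#   to_readable(classifier("Great quarterly results for the brand")[0])
```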

πŸ› οΈ Tech Stack & Tools

- **Core:** Python 3.9+
- **Machine Learning:** Hugging Face Transformers, PyTorch, scikit-learn
- **Backend:** FastAPI, Uvicorn (REST API)
- **Frontend:** Streamlit (interactive dashboard)
- **Data Ingestion:** GoogleNews library (real-time scraping)
- **DevOps:** Docker, GitHub Actions (CI/CD)
- **Deployment:** Hugging Face Spaces (Docker SDK)
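A minimal sketch of the ingestion layer, assuming the `GoogleNews` package's `search()`/`result()` interface; `fetch_headlines` and `to_documents` are illustrative names, not the repository's actual service code:

```python
def fetch_headlines(keyword: str, lang: str = "en", period: str = "7d") -> list[dict]:
    """Query Google News for recent articles about a keyword."""
    from GoogleNews import GoogleNews  # third-party scraper, imported lazily
    gn = GoogleNews(lang=lang, period=period)
    gn.search(keyword)
    return gn.result()

def to_documents(results: list[dict]) -> list[str]:
    """Combine each article's title and description into one text, skipping empties."""
    docs = []
    for r in results:
        text = " ".join(p for p in (r.get("title"), r.get("desc")) if p).strip()
        if text:
            docs.append(text)
    return docs
```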

βš™οΈ Architecture & MLOps Workflow

The project follows a rigorous MLOps pipeline to ensure reliability and speed of delivery:

1. **Data & Modeling**
   - Input: real-time news titles and descriptions fetched dynamically.
   - Model: pre-trained RoBERTa model optimized for social media and short-text sentiment.
2. **Containerization (Docker)**
   - The application is containerized using a custom `Dockerfile`.
   - A custom `entrypoint.sh` script runs both the FastAPI backend (port 8000) and the Streamlit frontend (port 7860) simultaneously.
3. **CI/CD Pipeline (GitHub Actions)**
   - Trigger: pushes to the `main` branch.
   - Test: executes the `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading.
   - Build: verifies Docker image creation.
   - Deploy: automatically pushes the validated code to Hugging Face Spaces.
4. **Monitoring**
   - The system logs every prediction to a local CSV file, which is visualized in the "Monitoring" tab of the dashboard.
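The logging step in item 4 can be sketched with the standard library alone; the CSV path and column names here are assumptions, not the repository's actual schema:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("monitoring_log.csv")  # assumed location of the prediction log
FIELDS = ["timestamp", "text", "sentiment", "score"]

def log_prediction(text: str, sentiment: str, score: float,
                   path: Path = LOG_PATH) -> None:
    """Append one prediction row, writing the header on first use."""
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text": text,
            "sentiment": sentiment,
            "score": round(score, 4),
        })
```

The dashboard's "Monitoring" tab can then read this file back (e.g. with `pandas.read_csv`) to chart sentiment distribution over time.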

## πŸ“‚ Repository Structure

```
β”œβ”€β”€ .github/workflows/   # CI/CD configurations (GitHub Actions)
β”œβ”€β”€ app/                 # Backend application code
β”‚   β”œβ”€β”€ api/             # FastAPI endpoints (main.py)
β”‚   β”œβ”€β”€ model/           # Model loader logic (RoBERTa)
β”‚   └── services/        # Google News scraping logic
β”œβ”€β”€ streamlit_app/       # Frontend application code (app.py)
β”œβ”€β”€ src/                 # Training simulation scripts
β”œβ”€β”€ tests/               # Unit and integration tests (pytest)
β”œβ”€β”€ Dockerfile           # Container configuration
β”œβ”€β”€ entrypoint.sh        # Startup script for dual-process execution
β”œβ”€β”€ requirements.txt     # Project dependencies
└── README.md            # Project documentation
```

## πŸ’» Installation & Usage

To run this project locally using Docker (recommended):

**1. Clone the repository**

```bash
git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
cd SentimentAnalysis
```

**2. Build the Docker image**

```bash
docker build -t reputation-monitor .
```

**3. Run the container**

```bash
docker run -p 7860:7860 reputation-monitor
```

Access the application at http://localhost:7860.

### Manual Installation (No Docker)

If you prefer running it directly with Python:

**Install dependencies:**

```bash
pip install -r requirements.txt
```

**Start the backend (FastAPI):**

```bash
uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
```

**Start the frontend (Streamlit) in a new terminal:**

```bash
streamlit run streamlit_app/app.py
```
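Once both processes are up, the backend can be exercised directly. The request payload shape below is an assumption inferred from the endpoint name, not the actual schema:

```python
import json
from urllib.request import Request, urlopen

def build_request(text: str, base_url: str = "http://localhost:8000") -> Request:
    """Build a JSON POST request for the /analyze endpoint."""
    return Request(
        f"{base_url}/analyze",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def analyze(text: str) -> dict:
    """Send the request to a running backend and return the parsed JSON reply."""
    with urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```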
## ⚠️ Limitations & Future Roadmap
Data Persistence: Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
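As a stepping stone toward that goal, the same logging interface can target an embedded SQLite database from the standard library (shown here as a stand-in for the PostgreSQL setup mentioned above; the table and column names are illustrative):

```python
import sqlite3
from datetime import datetime, timezone

def init_db(path: str = "monitoring.db") -> sqlite3.Connection:
    """Create the predictions table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               timestamp TEXT NOT NULL,
               text      TEXT NOT NULL,
               sentiment TEXT NOT NULL,
               score     REAL NOT NULL
           )"""
    )
    return conn

def log_prediction(conn: sqlite3.Connection, text: str,
                   sentiment: str, score: float) -> None:
    """Insert one prediction row; the data survives container restarts
    as long as the .db file lives on a mounted volume."""
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), text, sentiment, score),
    )
    conn.commit()
```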

Scalability: The current Google News scraper is synchronous. Future versions will implement asynchronous scraping (aiohttp) or a message queue (RabbitMQ/Celery) for high-volume processing.
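What that asynchronous version could look like, sketched with `asyncio` only; `fetch_one` is a placeholder coroutine standing in for a real `aiohttp` request:

```python
import asyncio

async def fetch_one(keyword: str) -> list[str]:
    """Placeholder: a real implementation would await an aiohttp GET here."""
    await asyncio.sleep(0)  # yield control, simulating network I/O
    return [f"headline about {keyword}"]

async def fetch_all(keywords: list[str]) -> dict[str, list[str]]:
    """Fetch headlines for all keywords concurrently instead of one by one."""
    results = await asyncio.gather(*(fetch_one(k) for k in keywords))
    return dict(zip(keywords, results))
```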

Model Retraining: A placeholder pipeline (`src/train.py`) is included. Full implementation would require GPU resources and a labeled dataset for fine-tuning.

## 🀝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

πŸ“ License
Distributed under the MIT License. See LICENSE for more information.

## πŸ‘€ Author

**Fabio Celaschi**
* [![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/fabio-celaschi-4371bb92)
* [![Instagram](https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white)](https://www.instagram.com/fabiocelaschi/)