---
title: Reputation Monitor
emoji: 📰
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# 📰 End-to-End MLOps Pipeline for Real-Time Reputation Monitoring
## 📌 Project Overview
MachineInnovators Inc. focuses on scalable, production-ready machine learning applications. This project is a comprehensive MLOps solution designed to monitor online company reputation through automated sentiment analysis of real-time news.
Unlike standard static notebooks, this repository demonstrates a full-cycle ML workflow. The system scrapes live data from Google News, analyzes sentiment using a RoBERTa Transformer model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment.
## Key Features
- Real-Time Data Ingestion: Automated scraping of Google News for target brand keywords.
- State-of-the-Art NLP: Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification.
- Full-Stack Architecture: Integrates a FastAPI backend for inference and a Streamlit frontend for visualization in a single container.
- CI/CD Automation: Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
- Embedded Monitoring: Basic logging system to track model predictions and sentiment distribution over time.
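To make the ingestion feature concrete, here is a minimal sketch of how scraping Google News for a brand keyword might look with the third-party `GoogleNews` package. The function names `fetch_headlines` and `dedupe_titles` are illustrative, not the repository's actual API.

```python
"""Hedged sketch of the real-time ingestion step.

Assumes the third-party `GoogleNews` package (`pip install GoogleNews`);
`fetch_headlines` and `dedupe_titles` are illustrative names, not the
repository's actual functions.
"""


def dedupe_titles(titles):
    """Drop duplicate headlines while preserving order."""
    seen = set()
    unique = []
    for title in titles:
        key = title.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


def fetch_headlines(keyword, limit=10):
    """Scrape recent Google News headlines for a brand keyword."""
    from GoogleNews import GoogleNews  # lazy import: network-dependent, heavy

    gn = GoogleNews(lang="en", period="7d")
    gn.search(keyword)
    titles = [r["title"] for r in gn.results()[:limit]]
    return dedupe_titles(titles)
```

A call such as `fetch_headlines("YourBrand")` would return a deduplicated list of recent headlines ready for sentiment scoring.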
## 🛠️ Tech Stack & Tools
- Core: Python 3.9+
- Machine Learning: Hugging Face Transformers, PyTorch, Scikit-learn.
- Backend: FastAPI, Uvicorn (REST API).
- Frontend: Streamlit (Interactive Dashboard).
- Data Ingestion: `GoogleNews` library (real-time scraping).
- DevOps: Docker, GitHub Actions (CI/CD).
- Deployment: Hugging Face Spaces (Docker SDK).
## ⚙️ Architecture & MLOps Workflow
The project follows a rigorous MLOps pipeline to ensure reliability and speed of delivery:
Data & Modeling:
- Input: Real-time news titles and descriptions fetched dynamically.
- Model: Pre-trained RoBERTa model optimized for social media and short-text sentiment.
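As an illustration of the modeling step, the sketch below loads the sentiment model through the Transformers `pipeline` API. The full hub id `cardiffnlp/twitter-roberta-base-sentiment` and the LABEL_0/1/2 mapping are assumptions based on the model family this README names, not verified against the repository.

```python
"""Hedged sketch of the inference step.

Assumes the Hugging Face `transformers` package and the
`cardiffnlp/twitter-roberta-base-sentiment` checkpoint (an assumed full
hub id for the model family named in this README); the label mapping
follows that checkpoint's published convention but is an assumption here.
"""

# The checkpoint emits LABEL_0/1/2; map them to readable classes.
LABELS = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def readable(label: str) -> str:
    """Translate a raw model label into a human-readable sentiment."""
    return LABELS.get(label, "unknown")


def analyze(texts):
    """Classify a batch of news headlines (downloads the model on first call)."""
    from transformers import pipeline  # lazy import: heavy dependency

    clf = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment",
    )
    return [
        {"text": t, "sentiment": readable(r["label"]), "score": r["score"]}
        for t, r in zip(texts, clf(texts))
    ]
```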
Containerization (Docker):
- The application is containerized using a custom `Dockerfile`.
- A custom `entrypoint.sh` script runs both the FastAPI backend (port 8000) and the Streamlit frontend (port 7860) simultaneously.
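A dual-process entrypoint of this kind typically looks like the sketch below; the exact module paths and flags are assumptions, not the repository's actual script.

```bash
#!/bin/sh
# Hedged sketch of a dual-process entrypoint.sh; module paths are assumptions.

# Start the FastAPI backend in the background on port 8000 ...
uvicorn app.api.main:app --host 0.0.0.0 --port 8000 &

# ... then run Streamlit in the foreground on the Space's exposed port 7860,
# so the container stays alive as long as the dashboard does.
streamlit run streamlit_app/app.py --server.port 7860 --server.address 0.0.0.0
```

Running Streamlit in the foreground keeps it as the container's main process, which is why `app_port: 7860` is the only port the Space needs to expose.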
CI/CD Pipeline (GitHub Actions):
- Trigger: Pushes to the `main` branch.
- Test: Executes the `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading.
- Build: Verifies Docker image creation.
- Deploy: Automatically pushes the validated code to Hugging Face Spaces.
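The steps above could be wired up along these lines; the job layout, action versions, deploy command, and `HF_TOKEN` secret name are assumptions, not the repository's actual workflow file.

```yaml
# Hedged sketch of the CI/CD workflow; names and the secret are assumptions.
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  test-build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history, needed to push to the Space
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - run: pip install -r requirements.txt
      - run: pytest tests/        # verify /health, /analyze, model loading
      - run: docker build -t reputation-monitor .   # verify image creation
      - name: Deploy to Hugging Face Spaces
        run: git push https://user:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/YOUR_USERNAME/reputation-monitor main
```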
Monitoring:
- The system logs every prediction to a local CSV file, which is visualized in the "Monitoring" tab of the dashboard.
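The CSV logging described above can be sketched as follows; the file name and column layout are assumptions about the repository's actual log format.

```python
"""Hedged sketch of the embedded monitoring step.

The file name and column layout are assumptions; the idea is that every
prediction is appended to a local CSV that the dashboard's "Monitoring"
tab aggregates.
"""
import csv
import os
from datetime import datetime, timezone

LOG_PATH = "predictions_log.csv"
FIELDS = ["timestamp", "text", "sentiment", "score"]


def log_prediction(text, sentiment, score, path=LOG_PATH):
    """Append one prediction row, writing the header on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text": text,
            "sentiment": sentiment,
            "score": round(score, 4),
        })


def sentiment_distribution(path=LOG_PATH):
    """Count predictions per sentiment class for the dashboard chart."""
    counts = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["sentiment"]] = counts.get(row["sentiment"], 0) + 1
    return counts
```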
## 📂 Repository Structure
```
├── .github/workflows/   # CI/CD configurations (GitHub Actions)
├── app/                 # Backend application code
│   ├── api/             # FastAPI endpoints (main.py)
│   ├── model/           # Model loader logic (RoBERTa)
│   └── services/        # Google News scraping logic
├── streamlit_app/       # Frontend application code (app.py)
├── src/                 # Training simulation scripts
├── tests/               # Unit and integration tests (Pytest)
├── Dockerfile           # Container configuration
├── entrypoint.sh        # Startup script for dual-process execution
├── requirements.txt     # Project dependencies
└── README.md            # Project documentation
```
## 💻 Installation & Usage
To run this project locally using Docker (Recommended):
1. Clone the repository

```bash
git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
cd SentimentAnalysis
```

2. Build the Docker Image

```bash
docker build -t reputation-monitor .
```

3. Run the Container

```bash
docker run -p 7860:7860 reputation-monitor
```

Access the application at http://localhost:7860
### Manual Installation (No Docker)

If you prefer running it directly with Python:

Install dependencies:

```bash
pip install -r requirements.txt
```

Start the backend (FastAPI):

```bash
uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
```

Start the frontend (Streamlit) in a new terminal:

```bash
streamlit run streamlit_app/app.py
```
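Once the backend is up, it can be smoke-tested from the command line. The `/health` and `/analyze` endpoints are named earlier in this README, but the exact request body for `/analyze` is an assumption about the API's schema.

```bash
# Smoke-test the running backend
# (the /analyze request body shape is an assumption, not the repo's schema):
curl http://localhost:8000/health

curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "Great earnings report boosts investor confidence"}'
```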
## ⚠️ Limitations & Future Roadmap
- **Data Persistence:** Monitoring logs are currently stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
- **Scalability:** The current Google News scraper is synchronous. Future versions will implement asynchronous scraping (`aiohttp`) or a message queue (RabbitMQ/Celery) for high-volume processing.
- **Model Retraining:** A placeholder pipeline (`src/train.py`) is included. A full implementation would require GPU resources and a labeled dataset for fine-tuning.
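The asynchronous-scraping direction mentioned in the roadmap boils down to fetching all keywords concurrently instead of one at a time. The sketch below shows the concurrency pattern only; `fetch_keyword` is a stand-in coroutine, where a real version would issue `aiohttp` requests against Google News.

```python
"""Hedged sketch of the asynchronous scraping direction from the roadmap.

`fetch_keyword` is a stand-in coroutine; a real version would perform
aiohttp requests. The point here is the concurrency pattern, not the I/O.
"""
import asyncio


async def fetch_keyword(keyword):
    """Stand-in for a network request; returns fake headlines."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return [f"{keyword}: headline {i}" for i in range(2)]


async def scrape_all(keywords):
    """Fetch every keyword concurrently instead of sequentially."""
    results = await asyncio.gather(*(fetch_keyword(k) for k in keywords))
    return dict(zip(keywords, results))


def run(keywords):
    """Synchronous entry point for the async scraper."""
    return asyncio.run(scrape_all(keywords))
```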
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
Distributed under the MIT License. See LICENSE for more information.
### 👤 Author

**Fabio Celaschi**

* [LinkedIn](https://www.linkedin.com/in/fabio-celaschi-4371bb92)
* [Instagram](https://www.instagram.com/fabiocelaschi/)