# 🤖 AI Sprint Manager — OpenEnv
A reinforcement learning environment where an AI agent acts as a Tech Lead managing agile software sprints.
## 🎯 What Is This?
Modern software teams spend an enormous amount of time on sprint-planning decisions:
- Which developer gets which task?
- What do you do when a developer falls ill mid-sprint?
- How do you handle an urgent production bug that appears on day 5?
This environment simulates these real-world decisions so an AI agent can learn optimal sprint management strategies through reinforcement learning.
The agent plays the role of a Tech Lead. Each step it observes the full sprint state (tasks, developers, workloads, deadlines) and takes an action. The environment responds with a reward signal that guides learning.
## 🏗️ Architecture

```text
┌───────────────────────────────────────────┐
│   RL Agent / LLM / Training Loop          │
│   (uses client.py)                        │
└───────────────────┬───────────────────────┘
                    │ HTTP reset / step / state
                    ▼
┌───────────────────────────────────────────┐
│   FastAPI Server (port 7860)              │
│   /reset  /step  /state  /health          │
└───────────────────┬───────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────┐
│   Sprint Environment (core logic)         │
│   • Task/developer simulation             │
│   • Reward calculation                    │
│   • Random events (bugs, absences)        │
│   • 3 graders: easy / medium / hard       │
└───────────────────┬───────────────────────┘
                    │ data loaded from
                    ▼
┌───────────────────────────────────────────┐
│   data/sprint_data.json                   │
│   (customizable — bring your own data!)   │
└───────────────────────────────────────────┘
```
## 🎮 Live Demo

- Select a sprint scenario (easy / medium / hard)
- Click 🔄 Reset Sprint
- Use the Skill → Dev Guide to assign tasks correctly
- Or click 🤖 Auto-Assign All to let the system decide
- Watch the reward history and task status update in real time
## 🎬 Action Space

| Field | Type | Values |
|---|---|---|
| `action_type` | string | `assign`, `reassign`, `reprioritize`, `unblock`, `skip` |
| `task_id` | string | Task ID, e.g. `"T1"`, `"T6"` |
| `dev_id` | string | Developer ID, e.g. `"dev1"`, `"dev3"` |
| `new_priority` | int | 1–5 (1 = highest; for `reprioritize` only) |
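The table above can be enforced client-side before an action ever reaches the server. The following is an illustrative sketch, not part of the environment's API: field names and allowed values come from the table, while the `validate_action` helper itself is hypothetical.

```python
# Hypothetical client-side validator for the documented action space.
VALID_ACTION_TYPES = {"assign", "reassign", "reprioritize", "unblock", "skip"}

def validate_action(action: dict) -> bool:
    """Return True if the dict looks like a well-formed sprint action."""
    if action.get("action_type") not in VALID_ACTION_TYPES:
        return False
    # reprioritize needs a new_priority in 1-5 (1 = highest)
    if action["action_type"] == "reprioritize":
        p = action.get("new_priority")
        if not isinstance(p, int) or not 1 <= p <= 5:
            return False
    # skip targets nothing; every other action targets a task
    if action["action_type"] != "skip" and not action.get("task_id"):
        return False
    return True
```

Rejecting malformed actions locally keeps a learning agent from burning environment steps on guaranteed-invalid moves.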
## 🔍 Observation Space

| Field | Type | Description |
|---|---|---|
| `current_day` | int | Day in sprint (1–10) |
| `sprint_length` | int | Total sprint length |
| `developers` | list | Each dev's skill, capacity, load, tasks, availability |
| `tasks` | list | Each task's type, priority, effort, deadline, status, progress |
| `reward` | float | Step reward |
| `cumulative_reward` | float | Total reward this episode |
| `tasks_completed/missed/in_progress/backlog` | int | Status counts |
| `workload_balance_score` | float | 0 = unbalanced, 1 = perfect |
| `events` | list | Events that just happened (completions, misses, absences) |
| `done` | bool | Whether episode is complete |
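To make the 0-to-1 semantics of `workload_balance_score` concrete, here is one plausible way such a metric could be computed. The environment's actual formula is internal and may differ; this sketch only illustrates the documented behavior (1 when load is spread evenly, falling toward 0 as it concentrates).

```python
from statistics import pstdev

def workload_balance_score(loads: list[float]) -> float:
    """Illustrative balance metric: 1.0 when all devs carry equal load,
    approaching 0 as load piles onto a few devs. NOT the environment's
    actual formula -- just a demonstration of the 0-to-1 semantics."""
    if not loads or max(loads) == 0:
        return 1.0  # nothing assigned yet counts as perfectly balanced
    mean = sum(loads) / len(loads)
    # penalize spread relative to the mean load, clamped at 0
    return max(0.0, 1.0 - pstdev(loads) / mean)
```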
## 🎯 Tasks (Scenarios)

| ID | Difficulty | Devs | Tasks | Random Events |
|---|---|---|---|---|
| `easy_sprint` | 🟢 Easy | 3 | 5 | None |
| `medium_sprint` | 🟡 Medium | 4 | 8 | Dev absences, bugs expire |
| `hard_sprint` | 🔴 Hard | 5 | 12 | Urgent bugs mid-sprint, cascading failures |
### Baseline Scores (meta-llama/Llama-3.1-8B-Instruct)

| Task | Score |
|---|---|
| `easy_sprint` | 0.01 |
| `medium_sprint` | 0.46 |
| `hard_sprint` | 0.01 |
| **Average** | 0.16 |
## 💰 Reward Function

| Event | Reward |
|---|---|
| Assign task (skill match) | +0.8 to +1.3 |
| Assign task (skill mismatch) | +0.1 to +0.6 |
| Wrong skill / over capacity | -0.15 |
| Task completed on time | +0.5 to +2.5 |
| Task completed late | +0.1 |
| Task missed deadline | -0.3 to -1.5 |
| Urgent bug missed | -0.25 extra |
| Skip (no action) | -0.05 |
| Final score bonus | score × 10.0 |
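The schedule above can be captured as a simple lookup table. The numeric ranges come straight from the README; how the environment chooses a value inside each range (skill fit, timing, priority) is internal, so the midpoint here is purely for illustration, and the event names are made up for this sketch.

```python
# Reward ranges transcribed from the table above (hypothetical event keys).
STEP_REWARDS = {
    "assign_skill_match":    (0.8, 1.3),
    "assign_skill_mismatch": (0.1, 0.6),
    "invalid_assignment":    (-0.15, -0.15),  # wrong skill / over capacity
    "completed_on_time":     (0.5, 2.5),
    "completed_late":        (0.1, 0.1),
    "missed_deadline":       (-1.5, -0.3),
    "skip":                  (-0.05, -0.05),
}

def midpoint_reward(event: str) -> float:
    """Midpoint of the documented range -- illustrative only."""
    lo, hi = STEP_REWARDS[event]
    return (lo + hi) / 2

def final_bonus(score: float) -> float:
    """Final score bonus from the table: score x 10.0."""
    return score * 10.0
```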
## 📡 API Reference

```text
# Health check
GET /health  →  {"status": "ok", "env": "ai-sprint-manager"}

# Start new episode
POST /reset
Body: {"task_name": "easy_sprint", "seed": 42}

# Take one action
POST /step
Body: {"action": {"action_type": "assign", "task_id": "T1", "dev_id": "dev1"}}

# Get full state
GET /state

# List scenarios
GET /tasks
```
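If you are wiring up your own HTTP client rather than using `client.py`, the request shapes follow directly from the reference above. A minimal sketch (the `build_request` helper and `BASE_URL` are illustrative, not part of the project):

```python
import json

BASE_URL = "http://localhost:7860"  # or your hosted Space URL

def build_request(endpoint: str, **body):
    """Return (method, url, json_payload) for one of the documented
    endpoints. GET endpoints take no body; POST endpoints serialize theirs."""
    gets = {"/health", "/state", "/tasks"}
    method = "GET" if endpoint in gets else "POST"
    payload = json.dumps(body) if method == "POST" else None
    return method, BASE_URL + endpoint, payload

# e.g. the documented /reset call:
method, url, payload = build_request("/reset", task_name="easy_sprint", seed=42)
```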
## 🐍 Python Client Usage

```python
from client import SprintEnvClient
from sprint_env.models import SprintAction

# Connect to the live Space
with SprintEnvClient(base_url="https://sejal-k-ai-sprint-manager.hf.space") as env:
    # Reset
    obs = env.reset(task_name="medium_sprint", seed=42)

    # Agent loop
    while not obs["done"]:
        action = SprintAction(
            action_type="assign",
            task_id="T1",
            dev_id="dev1",
        )
        result = env.step(action)
        print(result)  # StepResult(reward=+1.20, done=False, day=2, completed=0)
        obs = result.observation
```
## 🗂️ Project Structure

```text
ai-sprint-manager-openenv/
├── openenv.yaml          # OpenEnv spec metadata
├── pyproject.toml        # Project dependencies
├── Dockerfile            # Container definition
├── requirements.txt      # Python dependencies
├── inference.py          # Baseline LLM agent script
├── client.py             # Typed Python client (for RL training)
├── ui.py                 # Gradio UI + FastAPI combined server
├── start.sh              # Container startup script
│
├── data/
│   └── sprint_data.json  # All scenario data (customizable!)
│
├── sprint_env/
│   ├── __init__.py
│   ├── models.py         # Pydantic Action/Observation/State
│   ├── tasks.py          # Task & Developer dataclasses
│   ├── environment.py    # Core RL environment logic
│   ├── graders.py        # Scoring functions (easy/medium/hard)
│   └── data_loader.py    # JSON data loader with caching
│
└── server/
    ├── __init__.py
    └── app.py            # OpenEnv-compliant FastAPI server entry
```
## 🔧 Bring Your Own Data

Don't want to use our sample scenarios? Edit `data/sprint_data.json`:

```json
{
  "scenarios": {
    "my_custom_sprint": {
      "description": "My team's actual sprint",
      "difficulty": "medium",
      "developers": [
        {"id": "dev1", "name": "Your Name", "skill": "backend", "capacity": 5, "productivity": 1.0}
      ],
      "tasks": [
        {"id": "T1", "name": "Your Task", "task_type": "feature", "priority": 1,
         "effort": 3, "deadline": 5, "required_skill": "backend"}
      ]
    }
  }
}
```
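Before pointing the environment at a hand-written file, it can be worth sanity-checking it. The sketch below uses the field names from the sample JSON above; the required-field sets and the `check_scenario` helper are assumptions for illustration, and the project's own `data_loader.py` may be stricter or looser.

```python
# Hypothetical pre-flight check for custom scenario data.
# Field names follow the sample JSON in this README.
REQUIRED_DEV_FIELDS = {"id", "name", "skill", "capacity", "productivity"}
REQUIRED_TASK_FIELDS = {"id", "name", "task_type", "priority",
                        "effort", "deadline", "required_skill"}

def check_scenario(scenario: dict) -> list[str]:
    """Return a list of problems found in one scenario dict (empty = OK)."""
    problems = []
    for dev in scenario.get("developers", []):
        missing = REQUIRED_DEV_FIELDS - dev.keys()
        if missing:
            problems.append(f"dev {dev.get('id')}: missing {sorted(missing)}")
    for task in scenario.get("tasks", []):
        missing = REQUIRED_TASK_FIELDS - task.keys()
        if missing:
            problems.append(f"task {task.get('id')}: missing {sorted(missing)}")
    return problems
```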
Or point to your own file:

```bash
export SPRINT_DATA_PATH=/path/to/your/data.json
python ui.py
```
## 🚀 Setup & Run

```bash
# Clone
git clone https://github.com/sejalsksagar/ai-sprint-manager-openenv.git
cd ai-sprint-manager-openenv

# Install
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your HF_TOKEN

# Run locally
python ui.py
# Open http://localhost:7860

# Docker
docker build -t ai-sprint-manager .
docker run -p 7860:7860 ai-sprint-manager

# Run inference
python inference.py
```
## 🤖 Can an RL Agent Learn From This?

Yes. The environment is designed for policy-gradient training (GRPO, PPO):

```python
# Example training-loop skeleton (TRL/GRPO compatible)
from client import SprintEnvClient
from sprint_env.models import SprintAction

env = SprintEnvClient(base_url="http://localhost:7860")

for episode in range(1000):
    obs = env.reset(task_name="medium_sprint")
    trajectory = []
    while not obs["done"]:
        action = policy.sample(obs)  # your policy here
        result = env.step(action)
        trajectory.append((obs, action, result.reward))
        obs = result.observation
    policy.update(trajectory)  # GRPO/PPO update
```
The shaped reward function provides a learning signal at every step, not just at episode end, which is critical for efficient RL training.
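A policy-gradient update typically weights each step by the discounted return from that step onward, computed from a trajectory's rewards. A minimal sketch (the discount factor is a conventional choice, not something the environment prescribes):

```python
def returns_to_go(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """Discounted return from each step onward -- the quantity a policy
    gradient update weights log-probs by. gamma=0.99 is a typical default."""
    out = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # accumulate from the end of the episode backward
        out.append(g)
    return out[::-1]        # restore forward (step) order
```

Because the reward is shaped, these per-step returns vary meaningfully within an episode instead of being dominated by a single terminal payoff.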
## 👥 Team

Built for the Meta PyTorch OpenEnv Hackathon x SST | India AI Hackathon '26