🤖 AI Sprint Manager – OpenEnv

A reinforcement learning environment where an AI agent acts as a Tech Lead managing agile software sprints.


🎯 What Is This?

Modern software teams spend an enormous amount of time on sprint planning decisions:

  • Which developer gets which task?
  • What do you do when someone goes sick mid-sprint?
  • How do you handle an urgent production bug that appears on day 5?

This environment simulates these real-world decisions so an AI agent can learn optimal sprint management strategies through reinforcement learning.

The agent plays the role of a Tech Lead. At each step, it observes the full sprint state (tasks, developers, workloads, deadlines) and takes an action. The environment responds with a reward signal that guides learning.


๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────┐
│     RL Agent / LLM / Training Loop      │
│            (uses client.py)             │
└────────────────────┬────────────────────┘
                     │ HTTP  reset / step / state
                     ▼
┌─────────────────────────────────────────┐
│        FastAPI Server (port 7860)       │
│     /reset  /step  /state  /health      │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│     Sprint Environment (core logic)     │
│  • Task/developer simulation            │
│  • Reward calculation                   │
│  • Random events (bugs, absences)       │
│  • 3 graders: easy / medium / hard      │
└────────────────────┬────────────────────┘
                     │ data loaded from
                     ▼
┌─────────────────────────────────────────┐
│          data/sprint_data.json          │
│  (customizable – bring your own data!)  │
└─────────────────────────────────────────┘

🎮 Live Demo

  1. Select a sprint scenario (easy / medium / hard)
  2. Click 🔄 Reset Sprint
  3. Use the Skill → Dev Guide to assign tasks correctly
  4. Or click 🤖 Auto-Assign All to let the system decide
  5. Watch the reward history and task status update in real time

๐Ÿ“ Action Space

Field          Type     Values
action_type    string   assign, reassign, reprioritize, unblock, skip
task_id        string   Task ID, e.g. "T1", "T6"
dev_id         string   Developer ID, e.g. "dev1", "dev3"
new_priority   int      1–5 (1 = highest; for reprioritize only)
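
For example, an assign action and a reprioritize action look like this as JSON (the specific IDs are illustrative):

{"action_type": "assign", "task_id": "T1", "dev_id": "dev1"}
{"action_type": "reprioritize", "task_id": "T6", "new_priority": 1}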

📊 Observation Space

Field                    Type    Description
current_day              int     Day in sprint (1–10)
sprint_length            int     Total sprint length
developers               list    Each dev's skill, capacity, load, tasks, availability
tasks                    list    Each task's type, priority, effort, deadline, status, progress
reward                   float   Step reward
cumulative_reward        float   Total reward this episode
tasks_completed / missed / in_progress / backlog   int   Status counts
workload_balance_score   float   0 = unbalanced, 1 = perfectly balanced
events                   list    Events that just happened (completions, misses, absences)
done                     bool    Whether the episode is complete
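
A truncated observation, for illustration only (the exact key names inside the nested developers and tasks entries are a sketch, not a guarantee):

{
  "current_day": 3,
  "sprint_length": 10,
  "developers": [{"id": "dev1", "skill": "backend", "capacity": 5, "load": 3, ...}],
  "tasks": [{"id": "T1", "priority": 1, "effort": 3, "deadline": 5, "status": "in_progress", ...}],
  "reward": 1.2,
  "cumulative_reward": 4.7,
  "workload_balance_score": 0.8,
  "events": [],
  "done": false
}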

🎯 Tasks (Scenarios)

ID              Difficulty   Devs   Tasks   Random Events
easy_sprint     🟢 Easy       3      5      None
medium_sprint   🟡 Medium     4      8      Dev absences, bugs expire
hard_sprint     🔴 Hard       5     12      Urgent bugs mid-sprint, cascading failures

Baseline Scores (meta-llama/Llama-3.1-8B-Instruct)

Task            Score
easy_sprint     0.01
medium_sprint   0.46  ████████
hard_sprint     0.01
Average         0.16

💰 Reward Function

Event                          Reward
Assign task (skill match)      +0.8 to +1.3
Assign task (skill mismatch)   +0.1 to +0.6
Wrong skill / over capacity    -0.15
Task completed on time         +0.5 to +2.5
Task completed late            +0.1
Task missed deadline           -0.3 to -1.5
Urgent bug missed              -0.25 extra
Skip (no action)               -0.05
Final score bonus              score × 10.0
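
As a worked example with illustrative values from the ranges above: a skill-matched assignment might earn +1.0, completing that task on time might earn +2.0, and one skipped step costs -0.05, for roughly +2.95 cumulative reward before the final score bonus is applied.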

🔌 API Reference

# Health check
GET /health → {"status": "ok", "env": "ai-sprint-manager"}

# Start new episode
POST /reset
Body: {"task_name": "easy_sprint", "seed": 42}

# Take one action
POST /step
Body: {"action": {"action_type": "assign", "task_id": "T1", "dev_id": "dev1"}}

# Get full state
GET /state

# List scenarios
GET /tasks
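
The same flow over raw HTTP, as a minimal Python sketch using the requests library (the request bodies match the reference above; treat the response parsing as an assumption about the JSON shape):

import requests

BASE = "http://localhost:7860"

# Start a new episode
obs = requests.post(f"{BASE}/reset",
                    json={"task_name": "easy_sprint", "seed": 42}).json()

# Take one action
result = requests.post(f"{BASE}/step", json={
    "action": {"action_type": "assign", "task_id": "T1", "dev_id": "dev1"}
}).json()
print(result)

# Inspect the full state at any time
print(requests.get(f"{BASE}/state").json())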

๐Ÿ Python Client Usage

from client import SprintEnvClient
from sprint_env.models import SprintAction

# Connect to live Space
with SprintEnvClient(base_url="https://sejal-k-ai-sprint-manager.hf.space") as env:
    # Reset
    obs = env.reset(task_name="medium_sprint", seed=42)

    # Agent loop
    while not obs["done"]:
        action = SprintAction(
            action_type="assign",
            task_id="T1",
            dev_id="dev1",
        )
        result = env.step(action)
        print(result)  # StepResult(reward=+1.20, done=False, day=2, completed=0)
        obs = result.observation
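
The hard-coded assign above is only a placeholder. A slightly smarter baseline is a greedy skill-matching policy; a sketch reusing the imports above, and assuming the nested field names (status, required_skill, skill, available, id) follow the observation space described earlier:

def greedy_skill_match(obs):
    # Pair the first backlog task with an available dev of matching skill.
    for task in obs["tasks"]:
        if task["status"] != "backlog":
            continue
        for dev in obs["developers"]:
            if dev["available"] and dev["skill"] == task["required_skill"]:
                return SprintAction(action_type="assign",
                                    task_id=task["id"], dev_id=dev["id"])
    # Nothing assignable this step; skipping costs -0.05 per the reward table.
    return SprintAction(action_type="skip")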

๐Ÿ—‚๏ธ Project Structure

ai-sprint-manager-openenv/
├── openenv.yaml              # OpenEnv spec metadata
├── pyproject.toml            # Project dependencies
├── Dockerfile                # Container definition
├── requirements.txt          # Python dependencies
├── inference.py              # Baseline LLM agent script
├── client.py                 # Typed Python client (for RL training)
├── ui.py                     # Gradio UI + FastAPI combined server
├── start.sh                  # Container startup script
│
├── data/
│   └── sprint_data.json      # All scenario data (customizable!)
│
├── sprint_env/
│   ├── __init__.py
│   ├── models.py             # Pydantic Action/Observation/State
│   ├── tasks.py              # Task & Developer dataclasses
│   ├── environment.py        # Core RL environment logic
│   ├── graders.py            # Scoring functions (easy/medium/hard)
│   └── data_loader.py        # JSON data loader with caching
│
└── server/
    ├── __init__.py
    └── app.py                # OpenEnv-compliant FastAPI server entry

🔧 Bring Your Own Data

Don't want to use our sample scenarios? Edit data/sprint_data.json:

{
  "scenarios": {
    "my_custom_sprint": {
      "description": "My team's actual sprint",
      "difficulty": "medium",
      "developers": [
        {"id": "dev1", "name": "Your Name", "skill": "backend", "capacity": 5, "productivity": 1.0}
      ],
      "tasks": [
        {"id": "T1", "name": "Your Task", "task_type": "feature", "priority": 1,
         "effort": 3, "deadline": 5, "required_skill": "backend"}
      ]
    }
  }
}

Or point to your own file:

export SPRINT_DATA_PATH=/path/to/your/data.json
python ui.py
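
Before launching the server against a custom file, a quick sanity check can catch missing fields; a minimal sketch that mirrors the sample structure above (not a formal schema):

import json, os

path = os.environ.get("SPRINT_DATA_PATH", "data/sprint_data.json")
with open(path) as f:
    data = json.load(f)

for name, scenario in data["scenarios"].items():
    assert scenario["developers"], f"{name}: needs at least one developer"
    assert scenario["tasks"], f"{name}: needs at least one task"
    # Warn if a task requires a skill no developer on the team has
    skills = {d["skill"] for d in scenario["developers"]}
    for t in scenario["tasks"]:
        if t["required_skill"] not in skills:
            print(f"{name}: no dev has skill {t['required_skill']!r} for task {t['id']}")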

🚀 Setup & Run

# Clone
git clone https://github.com/sejalsksagar/ai-sprint-manager-openenv.git
cd ai-sprint-manager-openenv

# Install
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your HF_TOKEN

# Run locally
python ui.py
# Open http://localhost:7860

# Docker
docker build -t ai-sprint-manager .
docker run -p 7860:7860 ai-sprint-manager

# Run inference
python inference.py

🤖 Can an RL Agent Learn From This?

Yes. The environment is designed for policy gradient training (GRPO, PPO):

# Example training loop skeleton (TRL/GRPO compatible)
from client import SprintEnvClient
from sprint_env.models import SprintAction

env = SprintEnvClient(base_url="http://localhost:7860")

for episode in range(1000):
    obs = env.reset(task_name="medium_sprint")
    trajectory = []

    while not obs["done"]:
        action = policy.sample(obs)           # your policy here
        result = env.step(action)
        trajectory.append((obs, action, result.reward))
        obs = result.observation

    policy.update(trajectory)                 # GRPO/PPO update

The shaped reward function provides a learning signal at every step, not just at episode end, which is critical for efficient RL training.
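
To smoke-test the loop before plugging in a real policy, a uniform-random baseline is enough; a minimal sketch, assuming the tasks entries expose status and id fields as in the observation table (update is a no-op placeholder where a GRPO/PPO step would go):

import random
from sprint_env.models import SprintAction

class RandomPolicy:
    def sample(self, obs):
        # Assign a random backlog task to a random developer; skip if none remain.
        backlog = [t for t in obs["tasks"] if t["status"] == "backlog"]
        if not backlog:
            return SprintAction(action_type="skip")
        task = random.choice(backlog)
        dev = random.choice(obs["developers"])
        return SprintAction(action_type="assign",
                            task_id=task["id"], dev_id=dev["id"])

    def update(self, trajectory):
        pass  # replace with a real policy-gradient update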


👥 Team

Built for the Meta PyTorch OpenEnv Hackathon x SST | India AI Hackathon '26
