YOLOv8s-Seg for Assistive Indoor Navigation (Humans & Chairs)

Overview

This repository provides an Ultralytics YOLOv8s-Seg instance segmentation model trained to detect and segment humans and chairs in indoor environments. The model is intended as a perception component for an AI-powered smart indoor navigation system for blind and visually impaired users, where accurate understanding of object boundaries can support safer navigation and more reliable accessibility cues.

The repository includes a reproducible inference pipeline for images and videos, and an optional Hugging Face Spaces-ready Gradio demo. For videos, an online tracker is used to maintain chair identities across frames and to compute a real-time “best available chair” recommendation based on confidence, distance proxy, and temporal stability.

Motivation

Indoor navigation in cluttered spaces benefits from more than bounding boxes: navigation and accessibility cues often require object extent and free-space reasoning. Chairs are a common indoor affordance (sitting and resting), but selecting a usable chair requires consistent identification over time and a notion of proximity and stability. This model targets these requirements by combining segmentation with tracking and a chair scoring mechanism.

Model Architecture

  • Base model: Ultralytics YOLOv8s-Seg
  • Task: Instance segmentation (with detection outputs)
  • Framework: Ultralytics YOLO

YOLOv8s-Seg performs object detection and segmentation in a single forward pass. The “s” variant provides a favorable trade-off between real-time throughput and accuracy, which is relevant for assistive indoor applications.

Why Segmentation (Not Only Detection)

Segmentation is preferred over pure detection for indoor assistive navigation because:

  1. Pixel-level boundaries better capture the real occupied area of objects (e.g., chair legs, partially occluded humans).
  2. More reliable spatial reasoning: masks enable estimating relative size/area and mask overlap, which can improve heuristics for distance and free-space inference.
  3. Improved robustness in clutter: indoor scenes often involve occlusion and overlapping objects; segmentation can provide more informative geometry than boxes alone.
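As an illustrative sketch (not code from this repository), the first point can be made concrete: an object with thin parts, such as a chair with legs, fills only a fraction of its bounding box, so mask pixel area is a tighter proxy for the space it actually occupies.

```python
# Illustrative only: compare a binary mask's pixel area to its bounding-box
# area. The mask and helper below are hypothetical, not from the repo.

def mask_area_ratio(mask):
    """mask: 2D list of 0/1 values. Returns (mask_area, box_area, ratio)."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return 0, 0, 0.0
    mask_area = sum(sum(row) for row in mask)
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return mask_area, box_area, mask_area / box_area

# A toy 4x4 "chair" whose legs leave most of the box empty:
chair = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(mask_area_ratio(chair))  # the mask covers only 10 of the 16 box pixels
```

A detector reporting only the box would treat all 16 pixels as occupied; the mask recovers the true footprint, which matters for free-space reasoning.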

Intended Use

  • Indoor perception module for assistive navigation research.
  • Real-time chair selection cueing in videos.

Out of Scope

This model is not a complete navigation system. It does not provide global localization, mapping, path planning, depth sensing, or haptic/audio feedback.

Installation

python -m pip install --upgrade pip
pip install -r requirements.txt

Inference

1) Python (image)

from inference import run_image

run_image(
    model_path="yolov8s-seg.pt",
    source="path/to/image.jpg",
    save_dir="runs/image_demo",
    conf=0.25,
)

2) Python (video)

from inference import run_video

run_video(
    model_path="yolov8s-seg.pt",
    source="path/to/video.mp4",
    save_path="runs/video_demo.mp4",
    conf=0.25,
    track=True,
)

3) CLI

python inference.py --model yolov8s-seg.pt --source path/to/image.jpg --save-dir runs/img
python inference.py --model yolov8s-seg.pt --source path/to/video.mp4 --save-path runs/out.mp4 --track

Chair Tracking and “Best Chair” Scoring

For video input, the repository uses a lightweight online tracker to maintain consistent chair IDs across frames. Each chair track receives a score composed of:

  • Confidence: segmentation/detection confidence from YOLO.
  • Distance proxy: based on mask (or box) size; larger apparent area is treated as closer.
  • Temporal stability: rewards tracks that persist over time with consistent observations.

The system outputs the current best available chair (highest score) per frame and overlays it on the visualization.
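The scoring described above can be sketched as a weighted sum. The weights, field names, and saturation horizon below are hypothetical choices for illustration; the repository's actual implementation may differ.

```python
# Hypothetical sketch of the "best chair" score: a weighted combination of
# confidence, a 2D-area distance proxy, and temporal stability.
from dataclasses import dataclass

@dataclass
class ChairTrack:
    track_id: int
    conf: float       # latest YOLO confidence in [0, 1]
    mask_area: float  # mask area as a fraction of the frame (proximity proxy)
    hits: int         # consecutive frames this track has been observed

def chair_score(t: ChairTrack,
                w_conf: float = 0.4,
                w_dist: float = 0.4,
                w_stab: float = 0.2,
                stab_horizon: int = 30) -> float:
    """Weighted sum; stability saturates after ~1 second at 30 fps."""
    stability = min(t.hits / stab_horizon, 1.0)
    return w_conf * t.conf + w_dist * t.mask_area + w_stab * stability

def best_chair(tracks):
    return max(tracks, key=chair_score) if tracks else None

tracks = [
    ChairTrack(1, conf=0.9, mask_area=0.05, hits=5),
    ChairTrack(2, conf=0.7, mask_area=0.20, hits=40),
]
print(best_chair(tracks).track_id)  # track 2 wins: closer and more stable
```

Note the trade-off the weights encode: a slightly less confident detection can still win if it is apparently closer and has been tracked consistently.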

Limitations

  • Distance estimation is approximate: without calibrated depth, distance is inferred using 2D size heuristics.
  • Domain shift: performance may degrade under unusual lighting, camera lenses, extreme occlusions, or non-standard chair designs.
  • Occlusion and crowded scenes: heavy overlap may lead to fragmented masks or identity switches.
  • Safety-critical use: this research prototype should not be used as the sole safety mechanism.
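The 2D size heuristic mentioned in the first limitation can be sketched under a pinhole-camera assumption: apparent area scales roughly with 1/d², so a relative distance proxy is k / sqrt(area). The constant `k` is a hypothetical per-class value that would require calibration; the number used here is purely illustrative.

```python
import math

# Illustrative sketch: uncalibrated distance proxy from 2D mask area,
# assuming pinhole geometry (apparent area ~ 1/d^2). k = 500 is arbitrary.

def distance_proxy(mask_area_px: float, k: float = 500.0) -> float:
    """Relative (not metric) distance estimate from mask area in pixels."""
    return k / math.sqrt(mask_area_px) if mask_area_px > 0 else float("inf")

near = distance_proxy(10_000)  # large mask -> small proxy (appears close)
far = distance_proxy(100)      # small mask -> large proxy (appears far)
print(near, far)
```

Because `k` is uncalibrated and real chairs vary in size, the proxy only orders objects by apparent proximity; it does not yield metric distance, which is why depth sensing appears under Future Work.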

Future Work

  • Integrate true depth sensing (stereo / ToF / monocular depth) for metric distance.
  • Use re-identification or stronger tracking for long-term consistency.
  • Add affordance/occupancy reasoning (e.g., chair is occupied, blocked, or reachable).
  • Evaluate fairness/robustness across diverse indoor settings and camera viewpoints.

License

  • Code: MIT (see LICENSE)
  • Model weights: released under AGPL-3.0 to align with Ultralytics defaults (update this section if you have a different license for your trained weights and dataset).

Citation

If you use this repository in academic work, cite Ultralytics YOLOv8 and include your thesis/project citation as appropriate.

@software{ultralytics_yolov8,
  title  = {Ultralytics YOLOv8},
  author = {Jocher, Glenn and Ultralytics},
  year   = {2023},
  url    = {https://github.com/ultralytics/ultralytics}
}