YOLOv8s-Seg for Assistive Indoor Navigation (Humans & Chairs)

Overview

This repository provides an Ultralytics YOLOv8s-Seg instance segmentation model trained to detect and segment humans and chairs in indoor environments. The model is intended as a perception component for an AI-powered smart indoor navigation system for blind and visually impaired users, where accurate understanding of object boundaries can support safer navigation and more reliable accessibility cues.

The repository includes a reproducible inference pipeline for images and videos, and an optional Hugging Face Spaces-ready Gradio demo. For videos, an online tracker is used to maintain chair identities across frames and to compute a real-time “best available chair” recommendation based on confidence, distance proxy, and temporal stability.

Motivation

Indoor navigation in cluttered spaces benefits from more than bounding boxes: navigation and accessibility cues often require object extent and free-space reasoning. Chairs are a common indoor affordance (sitting and resting), but selecting a usable chair requires consistent identification over time and a notion of proximity and stability. This model targets these requirements by combining segmentation with tracking and a chair scoring mechanism.

Model Architecture

  • Base model: Ultralytics YOLOv8s-Seg
  • Task: Instance segmentation (with detection outputs)
  • Framework: Ultralytics YOLO

YOLOv8s-Seg performs object detection and segmentation in a single forward pass. The “s” variant provides a favorable trade-off between real-time throughput and accuracy, which is relevant for assistive indoor applications.

Why Segmentation (Not Only Detection)

Segmentation is preferred over pure detection for indoor assistive navigation because:

  1. Pixel-level boundaries better capture the real occupied area of objects (e.g., chair legs, partially occluded humans).
  2. More reliable spatial reasoning: masks enable estimating relative size/area and mask overlap, which can improve heuristics for distance and free-space inference.
  3. Improved robustness in clutter: indoor scenes often involve occlusion and overlapping objects; segmentation can provide more informative geometry than boxes alone.
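As an illustrative sketch (not code from this repository), the first point can be made concrete: an object with thin parts, such as a chair with legs, fills only a fraction of its bounding box, so mask pixel area is a tighter proxy for the space it actually occupies.

```python
# Illustrative only: compare a binary mask's pixel area to its bounding-box
# area. The mask and helper below are hypothetical, not from the repo.

def mask_area_ratio(mask):
    """mask: 2D list of 0/1 values. Returns (mask_area, box_area, ratio)."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return 0, 0, 0.0
    mask_area = sum(sum(row) for row in mask)
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return mask_area, box_area, mask_area / box_area

# A toy 4x4 "chair" whose legs leave most of the box empty:
chair = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(mask_area_ratio(chair))  # the mask covers only 10 of the 16 box pixels
```

A detector reporting only the box would treat all 16 pixels as occupied; the mask recovers the true footprint, which matters for free-space reasoning.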

Intended Use

  • Indoor perception module for assistive navigation research.
  • Real-time chair selection cueing in videos.

Out of Scope

This model is not a complete navigation system. It does not provide global localization, mapping, path planning, depth sensing, or haptic/audio feedback.

Installation

python -m pip install --upgrade pip
pip install -r requirements.txt

Inference

1) Python (image)

from inference import run_image

run_image(
    model_path="yolov8s-seg.pt",
    source="path/to/image.jpg",
    save_dir="runs/image_demo",
    conf=0.25,
)

2) Python (video)

from inference import run_video

run_video(
    model_path="yolov8s-seg.pt",
    source="path/to/video.mp4",
    save_path="runs/video_demo.mp4",
    conf=0.25,
    track=True,
)

3) CLI

python inference.py --model yolov8s-seg.pt --source path/to/image.jpg --save-dir runs/img
python inference.py --model yolov8s-seg.pt --source path/to/video.mp4 --save-path runs/out.mp4 --track

Chair Tracking and “Best Chair” Scoring

For video input, the repository uses a lightweight online tracker to maintain consistent chair IDs across frames. Each chair track receives a score composed of:

  • Confidence: segmentation/detection confidence from YOLO.
  • Distance proxy: based on mask (or box) size; larger apparent area is treated as closer.
  • Temporal stability: rewards tracks that persist over time with consistent observations.

The system outputs the current best available chair (highest score) per frame and overlays it on the visualization.
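The scoring described above can be sketched as a weighted sum. The weights, field names, and saturation horizon below are hypothetical choices for illustration; the repository's actual implementation may differ.

```python
# Hypothetical sketch of the "best chair" score: a weighted combination of
# confidence, a 2D-area distance proxy, and temporal stability.
from dataclasses import dataclass

@dataclass
class ChairTrack:
    track_id: int
    conf: float       # latest YOLO confidence in [0, 1]
    mask_area: float  # mask area as a fraction of the frame (proximity proxy)
    hits: int         # consecutive frames this track has been observed

def chair_score(t: ChairTrack,
                w_conf: float = 0.4,
                w_dist: float = 0.4,
                w_stab: float = 0.2,
                stab_horizon: int = 30) -> float:
    """Weighted sum; stability saturates after ~1 second at 30 fps."""
    stability = min(t.hits / stab_horizon, 1.0)
    return w_conf * t.conf + w_dist * t.mask_area + w_stab * stability

def best_chair(tracks):
    return max(tracks, key=chair_score) if tracks else None

tracks = [
    ChairTrack(1, conf=0.9, mask_area=0.05, hits=5),
    ChairTrack(2, conf=0.7, mask_area=0.20, hits=40),
]
print(best_chair(tracks).track_id)  # track 2 wins: closer and more stable
```

Note the trade-off the weights encode: a slightly less confident detection can still win if it is apparently closer and has been tracked consistently.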

Limitations

  • Distance estimation is approximate: without calibrated depth, distance is inferred using 2D size heuristics.
  • Domain shift: performance may degrade under unusual lighting, camera lenses, extreme occlusions, or non-standard chair designs.
  • Occlusion and crowded scenes: heavy overlap may lead to fragmented masks or identity switches.
  • Safety-critical use: this research prototype should not be used as the sole safety mechanism.
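The 2D size heuristic mentioned in the first limitation can be sketched under a pinhole-camera assumption: apparent area scales roughly with 1/d², so a relative distance proxy is k / sqrt(area). The constant `k` is a hypothetical per-class value that would require calibration; the number used here is purely illustrative.

```python
import math

# Illustrative sketch: uncalibrated distance proxy from 2D mask area,
# assuming pinhole geometry (apparent area ~ 1/d^2). k = 500 is arbitrary.

def distance_proxy(mask_area_px: float, k: float = 500.0) -> float:
    """Relative (not metric) distance estimate from mask area in pixels."""
    return k / math.sqrt(mask_area_px) if mask_area_px > 0 else float("inf")

near = distance_proxy(10_000)  # large mask -> small proxy (appears close)
far = distance_proxy(100)      # small mask -> large proxy (appears far)
print(near, far)
```

Because `k` is uncalibrated and real chairs vary in size, the proxy only orders objects by apparent proximity; it does not yield metric distance, which is why depth sensing appears under Future Work.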

Future Work

  • Integrate true depth sensing (stereo / ToF / monocular depth) for metric distance.
  • Use re-identification or stronger tracking for long-term consistency.
  • Add affordance/occupancy reasoning (e.g., chair is occupied, blocked, or reachable).
  • Evaluate fairness/robustness across diverse indoor settings and camera viewpoints.

License

  • Code: MIT (see LICENSE)
  • Model weights: released under AGPL-3.0 to align with Ultralytics defaults (update this section if you have a different license for your trained weights and dataset).

Citation

If you use this repository in academic work, cite Ultralytics YOLOv8 and include your thesis/project citation as appropriate.

@software{ultralytics_yolov8,
  title  = {Ultralytics YOLOv8},
  author = {Jocher, Glenn and Ultralytics},
  year   = {2023},
  url    = {https://github.com/ultralytics/ultralytics}
}