YOLOv8s-Seg for Assistive Indoor Navigation (Humans & Chairs)
Overview
This repository provides an Ultralytics YOLOv8s-Seg instance segmentation model trained to detect and segment humans and chairs in indoor environments. The model is intended as a perception component for an AI-powered smart indoor navigation system for blind and visually impaired users, where accurate understanding of object boundaries can support safer navigation and more reliable accessibility cues.
The repository includes a reproducible inference pipeline for images and videos, and an optional Hugging Face Spaces-ready Gradio demo. For videos, an online tracker is used to maintain chair identities across frames and to compute a real-time “best available chair” recommendation based on confidence, distance proxy, and temporal stability.
Motivation
Indoor navigation in cluttered spaces benefits from more than bounding boxes: navigation and accessibility cues often require object extent and free-space reasoning. Chairs are a common indoor affordance (sitting and resting), but selecting a usable chair requires consistent identification over time and a notion of proximity and stability. This model targets these requirements by combining segmentation with tracking and a chair scoring mechanism.
Model Architecture
- Base model: Ultralytics YOLOv8s-Seg
- Task: Instance segmentation (with detection outputs)
- Framework: Ultralytics YOLO
YOLOv8s-Seg performs object detection and segmentation in a single forward pass. The “s” variant provides a favorable trade-off between real-time throughput and accuracy, which is relevant for assistive indoor applications.
Why Segmentation (Not Only Detection)
Segmentation is preferred over pure detection for indoor assistive navigation because:
- Pixel-level boundaries better capture the real occupied area of objects (e.g., chair legs, partially occluded humans).
- More reliable spatial reasoning: masks enable estimating relative size/area and mask overlap, which can improve heuristics for distance and free-space inference.
- Improved robustness in clutter: indoor scenes often involve occlusion and overlapping objects; segmentation can provide more informative geometry than boxes alone.
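As a concrete illustration of the first point: for objects like chairs, the tight bounding box is mostly empty space (thin legs, open backrest), so box area overstates the occupied region. The sketch below is a minimal, hypothetical example (not code from this repository) that measures how much of an object's bounding box its mask actually fills, assuming a boolean H×W mask such as a thresholded instance-segmentation output:

```python
import numpy as np

def box_fill_ratio(mask: np.ndarray) -> float:
    """Fraction of the tight bounding box that the mask actually covers.

    A low ratio means the box greatly overstates the object's occupied
    area. `mask` is a boolean HxW array, e.g. a thresholded instance
    mask from a segmentation head.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0
    box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return float(mask.sum() / box_area)

# Toy example: a hollow "chair-like" shape covering little of its box.
mask = np.zeros((100, 100), dtype=bool)
mask[10:90, 10:14] = True   # one thin vertical strut
mask[10:90, 86:90] = True   # another strut
mask[10:14, 10:90] = True   # seat/backrest edge
print(round(box_fill_ratio(mask), 3))  # 0.145 — the box is ~85% empty
```

Heuristics built on mask area (rather than box area) therefore track the object's true extent much more closely, which matters when estimating free space for navigation.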
Intended Use
- Indoor perception module for assistive navigation research.
- Real-time chair selection cueing in videos.
Out of Scope
This model is not a complete navigation system. It does not provide global localization, mapping, path planning, depth sensing, or haptic/audio feedback.
Installation
```bash
python -m pip install --upgrade pip
pip install -r requirements.txt
```
Inference
1) Python (image)
```python
from inference import run_image

run_image(
    model_path="yolov8s-seg.pt",
    source="path/to/image.jpg",
    save_dir="runs/image_demo",
    conf=0.25,
)
```
2) Python (video)
```python
from inference import run_video

run_video(
    model_path="yolov8s-seg.pt",
    source="path/to/video.mp4",
    save_path="runs/video_demo.mp4",
    conf=0.25,
    track=True,
)
```
3) CLI
```bash
python inference.py --model yolov8s-seg.pt --source path/to/image.jpg --save-dir runs/img
python inference.py --model yolov8s-seg.pt --source path/to/video.mp4 --save-path runs/out.mp4 --track
```
Chair Tracking and “Best Chair” Scoring
For video input, the repository uses a lightweight online tracker to maintain consistent chair IDs across frames. Each chair track receives a score composed of:
- Confidence: segmentation/detection confidence from YOLO.
- Distance proxy: based on mask (or box) size; larger apparent area is treated as closer.
- Temporal stability: rewards tracks that persist over time with consistent observations.
The system outputs the current best available chair (highest score) per frame and overlays it on the visualization.
Limitations
- Distance estimation is approximate: without calibrated depth, distance is inferred using 2D size heuristics.
- Domain shift: performance may degrade under unusual lighting, camera lenses, extreme occlusions, or non-standard chair designs.
- Occlusion and crowded scenes: heavy overlap may lead to fragmented masks or identity switches.
- Safety-critical use: this research prototype should not be used as the sole safety mechanism.
Future Work
- Integrate true depth sensing (stereo / ToF / monocular depth) for metric distance.
- Use re-identification or stronger tracking for long-term consistency.
- Add affordance/occupancy reasoning (e.g., chair is occupied, blocked, or reachable).
- Evaluate fairness/robustness across diverse indoor settings and camera viewpoints.
License
- Code: MIT (see LICENSE)
- Model weights: released under AGPL-3.0 to align with Ultralytics defaults (update this section if you have a different license for your trained weights and dataset).
Citation
If you use this repository in academic work, cite Ultralytics YOLOv8 and include your thesis/project citation as appropriate.
```bibtex
@software{ultralytics_yolov8,
  title  = {Ultralytics YOLOv8},
  author = {Jocher, Glenn and Ultralytics},
  year   = {2023},
  url    = {https://github.com/ultralytics/ultralytics}
}
```