
Project Overview
This project develops a real-time obstacle detection and recognition system for a mobile robot using a customized YOLOv8 model. The system is designed to process live video feeds, detect obstacles (such as pedestrians, vehicles, or other objects), and classify them accurately. By leveraging the speed and accuracy of YOLOv8 in Python, the system enhances robot autonomy and safety, enabling the robot to avoid collisions and understand its surroundings effectively. This approach is especially useful in dynamic, real-world environments where quick and accurate object detection is critical.
Key Features
- Customized YOLOv8 Model: The project adapts the YOLOv8 architecture for obstacle detection. The model is fine-tuned on a custom dataset specific to the robot’s operational environment, ensuring high accuracy for the classes of interest.
- Real-Time Detection: Leveraging the efficiency of YOLOv8, the system processes video frames in real time (approximately 20 FPS), drawing bounding boxes and class labels over detected obstacles with minimal latency; a sketch of the detection loop follows this list.
- Obstacle Recognition & Classification: Beyond mere detection, the system classifies obstacles into categories (e.g., person, chair, vehicle), enabling the robot to make informed decisions, such as slowing down for pedestrians while navigating around static objects.
- Distance Estimation: Although YOLOv8 does not directly provide depth, the system incorporates simple geometric calculations based on bounding box sizes and camera calibration data to approximate distances, enhancing the robot’s ability to plan avoidance maneuvers.
- Python-based Implementation: The entire pipeline is implemented in Python, using OpenCV for image processing and PyTorch for running YOLOv8, making it straightforward to integrate into existing Python-based robotics systems.
- User-Friendly Integration: The design focuses on simplicity and flexibility. The code is modular, enabling quick adaptation or extension of the detection system without the need for specialized hardware like Jetson or middleware like ROS.
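
As a concrete illustration of the detection pipeline, the loop below reads camera frames, runs inference, and overlays boxes and labels. It is a minimal sketch assuming the standard Ultralytics Python API (YOLOv8's reference implementation, which runs on PyTorch); the weights path `weights/best.pt`, the camera index, and the 0.5 confidence threshold are illustrative assumptions, not the project's exact settings.

```python
import cv2
from ultralytics import YOLO

# Load the fine-tuned YOLOv8 weights (the path is an assumption).
model = YOLO("weights/best.pt")
cap = cv2.VideoCapture(0)  # live camera feed (index 0 is illustrative)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Single-frame inference; conf filters out low-confidence detections.
    results = model(frame, conf=0.5, verbose=False)
    for box in results[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        label = f"{model.names[int(box.cls[0])]} {float(box.conf[0]):.2f}"
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Obstacle detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()
```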
Development Process & Challenges
- Dataset Preparation: A custom dataset was created by collecting images from the robot’s environment and manually annotating them with tools such as LabelImg. Data augmentation (rotations, brightness adjustments) was applied to improve robustness; illustrative augmentation helpers are sketched after this list.
- Model Customization: We fine-tuned YOLOv8 on the custom dataset (see the training sketch after this list). A key challenge was teaching the model to differentiate between similar objects (e.g., distinguishing a person from a similarly shaped inanimate object). Careful annotation and balanced class weighting helped achieve high precision.
- Real-Time Performance: Ensuring that the detection ran at around 20 FPS on standard hardware required code optimization. We streamlined image preprocessing using OpenCV and optimized the YOLOv8 inference pipeline with PyTorch to reduce latency.
- Integration of Distance Estimation: Since YOLOv8 does not provide depth information, a simple geometric method was implemented to estimate object distance from known camera parameters and bounding-box dimensions; a sketch of this pinhole-geometry estimate appears after this list. This added a layer of intelligence for dynamic obstacle avoidance.
- Robustness Under Varied Conditions: The model was tested under different lighting conditions and backgrounds. Initial tests showed false positives in shadowed areas; refining the training dataset and adjusting confidence thresholds mitigated these issues.
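
A minimal sketch of the augmentation step described above, using plain OpenCV; the rotation angle and brightness delta are illustrative, and rotating a detection image also requires transforming its box annotations (omitted here for brevity).

```python
import cv2
import numpy as np

def rotate(image: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate around the image centre, keeping the original frame size."""
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))

def adjust_brightness(image: np.ndarray, delta: int) -> np.ndarray:
    """Shift pixel intensities by delta, saturating to the [0, 255] range."""
    return cv2.convertScaleAbs(image, alpha=1.0, beta=delta)

# Example: augmented = adjust_brightness(rotate(img, 10.0), 30)
```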
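Fine-tuning itself follows the standard Ultralytics training workflow. The sketch below assumes a dataset config at `data.yaml` (a hypothetical path) listing the custom classes and image folders; the epoch count, image size, and batch size are illustrative defaults, not the project's exact settings.

```python
from ultralytics import YOLO

# Start from pretrained weights and fine-tune on the custom dataset.
model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",  # custom dataset definition (hypothetical path)
    epochs=100,        # illustrative values, not the project's settings
    imgsz=640,
    batch=16,
)
metrics = model.val()  # evaluate mAP on the validation split
```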
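The geometric distance estimate is the standard pinhole relation: distance ≈ real object height × focal length (in pixels) / bounding-box height (in pixels). The helper below is a sketch under assumed per-class height priors and a hypothetical calibrated focal length.

```python
# Assumed real-world height priors per class, in metres (illustrative).
REAL_HEIGHT_M = {"person": 1.7, "chair": 0.9}

def estimate_distance(class_name: str, bbox_height_px: float,
                      focal_length_px: float = 800.0) -> float:
    """Approximate distance in metres from bounding-box height.

    distance = real_height * focal_length / pixel_height
    """
    real_height = REAL_HEIGHT_M.get(class_name, 1.0)  # fallback prior
    return real_height * focal_length_px / bbox_height_px

# Example: a person whose box is 340 px tall with f = 800 px
# gives 1.7 * 800 / 340 = 4.0 m.
print(estimate_distance("person", 340.0))
```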
Technologies Used
- Programming Language: Python
- YOLOv8: Utilized for state-of-the-art object detection and recognition.
- OpenCV: For image handling, preprocessing, and drawing bounding boxes.
- PyTorch: To run and fine-tune the YOLOv8 model.
- Data Augmentation Tools: To expand the custom dataset and improve model robustness.
- Visualization Libraries: Matplotlib and OpenCV for real-time visual feedback during development and testing.
Achievements & Metrics
- High Detection Accuracy: The fine-tuned YOLOv8 model achieves around 92% mean Average Precision (mAP) on the test set, ensuring that most obstacles are detected and classified correctly.
- Real-Time Processing: The optimized pipeline runs at approximately 20 FPS, ensuring the robot can react promptly to dynamic obstacles.
- Robust Classification: The system consistently differentiates between critical classes (e.g., distinguishing a person from inanimate objects) even under varying lighting conditions.
- Effective Distance Estimation: The incorporated geometric approach provides reasonable distance approximations, enabling better obstacle avoidance planning.
- Enhanced Navigation Safety: In simulation and live tests, the robot successfully avoids collisions in scenarios where traditional sensor-based methods might fail, demonstrating the system’s practical benefits.