Building a Real-Time LED Classifier on ESP32 with Computer Vision

How I implemented on-device computer vision on an ESP32-S3 microcontroller using HSV color filtering for real-time LED detection and classification.


Introduction

Embedded computer vision is often associated with powerful single-board computers like Raspberry Pi or NVIDIA Jetson. But what if you could implement real-time image classification on a $10 microcontroller with just 512KB of RAM?

In this project, I built a complete computer vision pipeline that runs entirely on an ESP32-S3 microcontroller. The system captures live camera frames, processes them on-device using HSV color filtering, and classifies colored LEDs (Red, Green, Blue) in real-time—all while controlling physical LEDs to provide immediate feedback.

No cloud processing. No external compute. Just efficient embedded systems engineering.


Table of Contents

  1. The Challenge
  2. System Architecture
  3. The HSV Color Space Advantage
  4. Implementation Details
  5. Experimentation and Tuning
  6. Real-Time Performance
  7. Conclusion

The Challenge

Computer vision on the ESP32-S3 means working with just ~512KB SRAM, no GPU, and pure CPU processing. In embedded environments, optimization isn't optional—it's mandatory.

My first instinct was to reach for OpenCV, but it simply doesn't run on the ESP32: the library's dependencies and memory footprint far exceed what the microcontroller can handle. A machine learning model was also out of the question, since model weights and an inference engine would consume far more memory than is available.

The solution: HSV color filtering—a lightweight classical computer vision technique that requires no model weights, uses simple arithmetic operations, and provides deterministic performance. The goal was to build a system that captures frames, processes them on-device, classifies LEDs (Red, Green, Blue), controls output LEDs, and achieves 5-10 FPS—all on a $10 microcontroller.


System Architecture

Figure: System architecture showing data flow from camera to LED output

Hardware Components

The build is intentionally minimal: an ESP32-S3 development board, an attached camera module that supplies JPEG frames, and red, green, and blue output LEDs that mirror the classification result.

HSV Filtering in Action

Here's how HSV filtering isolates specific LED colors from a scene. By counting the number of white pixels in each filtered image, the classifier determines if each color is present. If the count exceeds a threshold (50 pixels), that LED is classified as "detected."

Figure: Original camera frame
Figure: The same frame with the red filter applied
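
To make the counting step concrete, here is a minimal sketch of the kind of range check behind those masks, assuming the pixels have already been converted to HSV. The HsvRange and HsvPixel structs and the function names are illustrative, not the project's actual code; the idea is simply that a pixel turns white in the mask when its H, S, and V values all fall inside a target window, and a color counts as detected once more than 50 such pixels are found.

```cpp
#include <cstddef>

// Illustrative HSV window; the real bounds come from the tuning step described later.
struct HsvRange {
    float hMin, hMax;   // hue in degrees [0, 360)
    float sMin, sMax;   // saturation [0, 1]
    float vMin, vMax;   // value (brightness) [0, 1]
};

// A pixel that has already been converted from RGB to HSV.
struct HsvPixel {
    float h, s, v;
};

// True if the pixel falls inside the window, i.e. it would be white in the mask.
static bool inRange(const HsvPixel& p, const HsvRange& r) {
    return p.h >= r.hMin && p.h <= r.hMax &&
           p.s >= r.sMin && p.s <= r.sMax &&
           p.v >= r.vMin && p.v <= r.vMax;
}

// Count matching pixels across the frame and apply the 50-pixel detection threshold.
static bool colorDetected(const HsvPixel* frame, std::size_t numPixels,
                          const HsvRange& range, std::size_t threshold = 50) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < numPixels; ++i) {
        if (inRange(frame[i], range)) {
            ++count;
        }
    }
    return count > threshold;
}
```

A green window, for example, would sit roughly around a hue of 120 degrees with moderate-to-high saturation, though the exact on-device bounds came out of the tuning process covered below.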

Implementation Details

The image processing pipeline handles each frame in four stages:

  1. JPEG Decoding - Convert compressed JPEG from camera to RGB888 format for pixel-level access
  2. RGB to HSV Conversion - Transform pixels to the HSV color space, where color information is separated from brightness (see the conversion sketch after this list)
  3. HSV Filtering - Count pixels matching predefined HSV ranges for each target color (red, green, blue)
  4. Threshold Classification - If the pixel count for a color exceeds the threshold (50 pixels), classify that LED as detected
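
Stage 2 is where most of the per-pixel arithmetic lives. The sketch below is a standard floating-point RGB-to-HSV conversion rather than the project's exact implementation (a fixed-point or lookup-table version would be a natural optimization on the ESP32), but it shows why the representation helps: brightness ends up isolated in V, while the color identity lives almost entirely in H.

```cpp
#include <algorithm>
#include <cstdint>

struct Hsv {
    float h;  // hue in degrees [0, 360)
    float s;  // saturation [0, 1]
    float v;  // value (brightness) [0, 1]
};

// Textbook RGB888 -> HSV conversion (illustrative, not the project's exact code).
static Hsv rgbToHsv(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8) {
    const float r = r8 / 255.0f;
    const float g = g8 / 255.0f;
    const float b = b8 / 255.0f;

    const float maxC  = std::max(r, std::max(g, b));
    const float minC  = std::min(r, std::min(g, b));
    const float delta = maxC - minC;

    Hsv out;
    out.v = maxC;                                    // brightness is the max channel
    out.s = (maxC > 0.0f) ? (delta / maxC) : 0.0f;   // how far from grey the pixel is

    if (delta == 0.0f) {
        out.h = 0.0f;                                // grey: hue is undefined, use 0
    } else if (maxC == r) {
        out.h = 60.0f * ((g - b) / delta);           // red is the dominant channel
    } else if (maxC == g) {
        out.h = 60.0f * ((b - r) / delta) + 120.0f;  // green is the dominant channel
    } else {
        out.h = 60.0f * ((r - g) / delta) + 240.0f;  // blue is the dominant channel
    }
    if (out.h < 0.0f) {
        out.h += 360.0f;                             // wrap negative hues back into range
    }
    return out;
}
```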

Red LEDs required special handling due to hue wraparound (0° and 360° both represent red). The solution: check two separate HSV ranges and sum the matching pixels.
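
As a concrete illustration of that fix, the hue-only sketch below treats a pixel as red if its hue lands in either a window near 0 degrees or a window near 360 degrees; since the two windows are disjoint, one pass over the frame yields the summed count. The specific bounds here are placeholders, not the project's tuned values.

```cpp
#include <cstddef>

// Red wraps around the hue circle, so it needs two windows: one near 0 degrees
// and one near 360 degrees. The bounds below are placeholders for illustration.
static std::size_t countRedPixels(const float* hues, std::size_t numPixels) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < numPixels; ++i) {
        const float h = hues[i];
        const bool lowWindow  = (h >= 0.0f   && h <= 10.0f);
        const bool highWindow = (h >= 350.0f && h <  360.0f);
        if (lowWindow || highWindow) {
            ++count;             // the windows are disjoint, so this is the sum of both
        }
    }
    return count;                // compared against the same 50-pixel threshold
}
```

Green and blue sit comfortably in the middle of the hue range, so a single window is enough for each of them.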


Experimentation and Tuning

Before deploying to the ESP32, I built a PC-based experimentation environment for rapid iteration:

The Development Loop

  1. Capture test images from the ESP32 camera
  2. Process offline using the experimentation harness (a minimal sketch of this idea follows the list)
  3. Visualize results in Jupyter notebooks
  4. Tune HSV ranges until detection is reliable
  5. Deploy optimized parameters to ESP32
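
The project's actual harness and notebooks aren't reproduced here, but purely to illustrate the loop, a minimal host-side tool could look like the sketch below. It assumes frames have been dumped from the ESP32 as raw RGB888 files (a format, tool name, and workflow chosen here for illustration), takes a candidate hue window on the command line, and prints how many pixels match, so new bounds can be tried in seconds without reflashing the board.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <vector>

// Compact hue-from-RGB helper (same math as the on-device conversion sketch),
// repeated here so this host-side tool compiles on its own.
static float hueDegrees(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8) {
    const float r = r8 / 255.0f, g = g8 / 255.0f, b = b8 / 255.0f;
    const float maxC  = std::max(r, std::max(g, b));
    const float minC  = std::min(r, std::min(g, b));
    const float delta = maxC - minC;
    if (delta == 0.0f) return 0.0f;
    float h;
    if (maxC == r)      h = 60.0f * ((g - b) / delta);
    else if (maxC == g) h = 60.0f * ((b - r) / delta) + 120.0f;
    else                h = 60.0f * ((r - g) / delta) + 240.0f;
    return (h < 0.0f) ? h + 360.0f : h;
}

// Usage: ./hsv_tune frame.rgb888 <hueMin> <hueMax>
// Counts pixels in the dumped frame whose hue falls inside the candidate window.
int main(int argc, char** argv) {
    if (argc != 4) {
        std::fprintf(stderr, "usage: %s frame.rgb888 hueMin hueMax\n", argv[0]);
        return 1;
    }
    std::ifstream in(argv[1], std::ios::binary);
    if (!in) {
        std::fprintf(stderr, "could not open %s\n", argv[1]);
        return 1;
    }
    // Raw RGB888 dump: three bytes per pixel, no header.
    std::vector<std::uint8_t> rgb((std::istreambuf_iterator<char>(in)),
                                  std::istreambuf_iterator<char>());
    const float hueMin = std::atof(argv[2]);
    const float hueMax = std::atof(argv[3]);

    std::size_t count = 0;
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        const float h = hueDegrees(rgb[i], rgb[i + 1], rgb[i + 2]);
        if (h >= hueMin && h <= hueMax) {
            ++count;
        }
    }
    std::printf("pixels with hue in [%g, %g] degrees: %zu\n", hueMin, hueMax, count);
    return 0;
}
```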

This approach dramatically accelerated development—no need to upload firmware for every parameter tweak.

Key Tuning Insights


Real-Time Performance

The system achieves impressive real-time performance on resource-constrained hardware, accurately detecting different LED combinations. The physical output LEDs mirror the detected colors, providing immediate visual confirmation of the classification results.

Figure: Detection results with all three LEDs, two LEDs, one LED, and no LEDs detected

Conclusion

This project demonstrates that sophisticated computer vision doesn't require expensive hardware or cloud connectivity. With careful algorithm selection and efficient implementation, a $10 microcontroller can perform real-time image classification entirely on-device.

Embedded computer vision is becoming increasingly accessible. Projects like this prove that powerful AI and vision capabilities can run on the smallest of devices—no cloud required.

Repository: View on GitHub