Product Recognition in Retail Stores
Overview
As tech lead of the product recognition team at Scandit, I drove the ML development of ShelfView, a cloud solution that helps retailers understand what products are on their shelves through computer vision.
ShelfView provides real-time shelf monitoring, planogram compliance checking, and inventory management built on computer vision. The system processes millions of images daily, helping major retailers optimize their shelf space and reduce out-of-stock situations.
Key Achievements
- Built large-scale object detection and classification pipelines validated during pilots in the stores of European and US supermarket retailers
- Led visual search product development for 3 years, from a simple prototype to a system processing >1M images/day
- Increased end-to-end accuracy from 40% to 95% for product search across >10K unique products
- Built an evaluation and annotation system from the ground up that grew to >1M images with >100M annotations for training and testing
- Grew my project squad from 2 to 8 engineers while providing mentoring and technical leadership
Technical Stack
- Python - Primary development language for ML pipelines and backend
- PyTorch - ML framework for computer vision models
- Comet - ML experiment tracking framework
- Django and PostgreSQL - Backend development, admin panels and production database
- OpenAPI and Swagger - API specification and development
- GCP - Google Cloud infrastructure and ML services
- ClickHouse and Metabase - Evaluation database and metric dashboards
- Dataloop - Image annotation tooling and labelling job management
- GitLab - Git server plus build, evaluation, and deployment workflows
Machine Learning
I trained custom ML models from scratch in PyTorch for object detection and image embedding extraction. We ran continuous training and experiments driven by failure-mode analysis, using what we learned to incorporate new and improved data and architectures.
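To give a flavor of the embedding side, here is a minimal sketch of an embedding extractor for visual search, assuming a torchvision ResNet backbone; the class name `ProductEmbedder`, the embedding dimension, and the cosine-similarity ranking are illustrative, not the production architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class ProductEmbedder(nn.Module):
    """Backbone + projection head producing L2-normalized product embeddings."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)  # trained from scratch, no ImageNet weights
        backbone.fc = nn.Identity()        # drop the classifier head, keep pooled features
        self.backbone = backbone
        self.proj = nn.Linear(2048, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)           # (B, 2048) pooled features
        emb = self.proj(feats)             # (B, embed_dim)
        return F.normalize(emb, dim=-1)    # unit vectors: cosine similarity = dot product

# Visual search: rank catalog products by similarity to a query crop.
model = ProductEmbedder().eval()
with torch.no_grad():
    query = model(torch.randn(1, 3, 224, 224))     # crop of a detected product
    catalog = model(torch.randn(10, 3, 224, 224))  # pre-computed reference embeddings
    scores = catalog @ query.T                     # (10, 1) cosine similarities
    best_match = scores.squeeze(1).argmax().item()
```

In a real deployment the catalog embeddings would be computed offline and served from an index rather than recomputed per query.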
Training and evaluation were based on in-house data: ~1M images with >100M individual annotations for detection and classification, collected through an external pool of contracted human labelers who let us scale the annotation work. Their labelling followed project-specific standards that we defined and validated through tooling.
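The validation tooling itself isn't described here, but a minimal sketch of the kind of automated check involved might look like the following; the record fields and the taxonomy lookup are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class BoxAnnotation:
    label: str  # product identifier assigned by the labeler
    x: float    # top-left corner, pixels
    y: float
    w: float    # box size, pixels
    h: float

def validate(ann: BoxAnnotation, img_w: int, img_h: int, taxonomy: set[str]) -> list[str]:
    """Return a list of labelling-standard violations for one annotation (empty = valid)."""
    errors = []
    if ann.label not in taxonomy:
        errors.append(f"unknown label {ann.label!r}")
    if ann.w <= 0 or ann.h <= 0:
        errors.append("degenerate box")
    if ann.x < 0 or ann.y < 0 or ann.x + ann.w > img_w or ann.y + ann.h > img_h:
        errors.append("box outside image bounds")
    return errors

# Batches failing such checks would be returned to the labeling pool for rework.
```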
I implemented GUI tools that sped up debugging and enabled quicker root-cause analysis of regressions. This included a metrics dashboard that tracked progress on custom metrics over time and flagged regressions.
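As an illustration of the regression flagging, a check over a chronological metric series could look like this sketch; the tolerance, run identifiers, and values are hypothetical, and in practice the metrics lived in ClickHouse with dashboards on top:

```python
def find_regressions(history: list[tuple[str, float]], tolerance: float = 0.01) -> list[str]:
    """Flag runs whose metric dropped more than `tolerance` below the best value so far.

    `history` is a chronological list of (run_id, metric) pairs, e.g. mAP per nightly build.
    """
    flagged, best = [], float("-inf")
    for run_id, value in history:
        if value < best - tolerance:
            flagged.append(f"{run_id}: {value:.3f} (best so far {best:.3f})")
        best = max(best, value)
    return flagged

# Example: the third nightly run drops ~3 points below the best run and gets flagged.
runs = [("2023-05-01", 0.912), ("2023-05-02", 0.918), ("2023-05-03", 0.887)]
print(find_regressions(runs))  # ['2023-05-03: 0.887 (best so far 0.918)']
```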