A hands-on comparison of ResNet50, SigLIP, and Gemini for building smarter recommendation and search systems in Elasticsearch.
Building a recommendation system that truly understands aesthetics is challenging. Convolutional Neural Networks (CNNs) capture low-level visual features such as edges and colors, but they often miss higher-level semantics such as style and overall look.
As illustrated in a previous story, ResNet50 struggled with semantic relevance. To mitigate this, I had to add an extra filtering layer using Filtered k-NN in Elasticsearch.
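For readers unfamiliar with Filtered k-NN, here is a minimal sketch of what such a search body can look like in Elasticsearch's `knn` search option, which accepts a `filter` clause applied during the approximate nearest-neighbor search. The field names `image_embedding` and `category`, and the example values, are hypothetical placeholders, not the schema used in the earlier story.

```python
import json


def build_filtered_knn_query(query_vector, category, k=10, num_candidates=100):
    """Build a kNN search body restricted to documents in one category.

    Field names ("image_embedding", "category", "title") are hypothetical.
    """
    if num_candidates < k:
        # Elasticsearch requires num_candidates to be at least k.
        raise ValueError("num_candidates must be >= k")
    return {
        "knn": {
            "field": "image_embedding",  # hypothetical dense_vector field
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
            # Only documents matching this filter are considered as neighbors,
            # so results stay within the requested category.
            "filter": {"term": {"category": category}},
        },
        "_source": ["title", "category"],  # hypothetical fields to return
    }


body = build_filtered_knn_query([0.12, -0.03, 0.88], category="sofa", k=5)
print(json.dumps(body, indent=2))
```

With the official Python client (8.x), the `knn` part could be sent as `es.search(index="products", knn=body["knn"])`, where `products` is again a hypothetical index name. The filter is what compensates for embeddings that are visually similar but semantically off.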
In another story, I experimented with SigLIP for embedding extraction, which produced better recommendations without the need for Filtered k-NN.
The goal of this story is to experiment with an API-based embedding model and compare its results against the ResNet50 and SigLIP models used previously. Concretely, this story uses the Gemini Embedding API through Google AI Studio and the GenAI Python SDK.
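As a rough orientation, the sketch below shows how an embedding request might look with the `google-genai` SDK, plus a cosine-similarity helper for comparing the returned vectors. It assumes `pip install google-genai`, an API key from Google AI Studio exported as `GEMINI_API_KEY` (or `GOOGLE_API_KEY`), the model name `gemini-embedding-001`, and that each item is represented as a short text description; none of these details come from this section, so treat them as assumptions rather than the setup used in this story.

```python
import math


def embed_texts(texts, model="gemini-embedding-001"):
    """Embed a list of strings with the Gemini API (assumed model name).

    Requires the google-genai package and an API key in the environment.
    """
    # Imported lazily so the helper below works even without the SDK installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    result = client.models.embed_content(model=model, contents=texts)
    # One ContentEmbedding per input string; .values holds the float vector.
    return [embedding.values for embedding in result.embeddings]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

A typical call would be `vectors = embed_texts(["mid-century walnut sofa", "velvet armchair"])` followed by `cosine_similarity(vectors[0], vectors[1])`; in practice the vectors would instead be indexed in an Elasticsearch `dense_vector` field and queried with k-NN.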
Below are the other stories related to building a visual recommendation system.
