Visual Product Recommendation

Goal: Build an image-based search app that recommends visually similar fashion products from an uploaded query image.

Tools/Tech: Python, FastAPI, PyTorch, Hugging Face Transformers, ViT, nmslib, HNSW, Docker, Railway

This project uses a pre-trained Vision Transformer (ViT), google/vit-base-patch16-224, as a feature extractor for clothing images. The final classification head is dropped and the pooled image representation is used as an embedding, so an uploaded image can be compared against every catalogue image in vector space.
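A minimal sketch of the extraction step, assuming the Hugging Face `transformers` classes `ViTModel` (the backbone without a classification head) and `ViTImageProcessor`. The write-up does not say exactly which pooled vector is used; this sketch takes the [CLS] token from `last_hidden_state`, which is the vector the classifier in `ViTForImageClassification` reads. `load_encoder` and `embed` are hypothetical helper names.

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

MODEL_NAME = "google/vit-base-patch16-224"


def load_encoder(model_name: str = MODEL_NAME):
    """Load the image processor and the headless ViT backbone (hypothetical helper)."""
    processor = ViTImageProcessor.from_pretrained(model_name)
    model = ViTModel.from_pretrained(model_name)  # base model only: no classification head
    model.eval()  # disable dropout so embeddings are deterministic
    return processor, model


@torch.no_grad()
def embed(images: list, processor, model) -> torch.Tensor:
    """Return one embedding per PIL image, shape (len(images), hidden_size)."""
    # Resize, rescale and normalise to the model's expected input size.
    inputs = processor(images=[img.convert("RGB") for img in images], return_tensors="pt")
    outputs = model(**inputs)
    # Assumption: use the [CLS] token, the vector the classification head would consume.
    return outputs.last_hidden_state[:, 0, :]
```

With the real checkpoint, `processor, model = load_encoder()` downloads the weights, and `embed([Image.open(path)], processor, model)` returns a tensor of shape (1, 768), since 768 is the hidden size of ViT-Base.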

The raw dataset contained 50,000 clothing images. Before generating embeddings, the preprocessing pipeline removes unreadable files, 1x1 images, and files with duplicate filenames, leaving around 30,000 usable images for search.
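The filtering rules above could be implemented roughly as follows with Pillow. This is a sketch under stated assumptions: `clean_catalogue` is a hypothetical name, "unreadable" is taken to mean Pillow cannot fully decode the file, and when several files share a filename only the first readable one is kept.

```python
from pathlib import Path

from PIL import Image


def clean_catalogue(paths):
    """Return the paths that pass all filters, preserving input order."""
    seen_names = set()
    kept = []
    for path in map(Path, paths):
        if path.name in seen_names:
            continue  # duplicate filename: a readable file with this name is already kept
        try:
            with Image.open(path) as img:
                img.load()  # force a full decode so truncated files fail here
                width, height = img.size
        except OSError:  # also covers PIL.UnidentifiedImageError (non-image files)
            continue
        if (width, height) == (1, 1):
            continue  # drop 1x1 images
        seen_names.add(path.name)
        kept.append(path)
    return kept
```

Embeddings are then generated only for the returned paths.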

Search Implementation

I first implemented cosine similarity by loading every embedding and comparing the query against them one at a time, which made a single search take around 20 seconds. Stacking the embeddings into one tensor and computing all cosine similarities in a single vectorised pass reduced this to around 2 seconds, and also simplified storage to a single artifact.
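A sketch of the vectorised approach in PyTorch. Once every row is L2-normalised, cosine similarity against the whole catalogue reduces to one matrix-vector product, which replaces the per-image Python loop. `build_index` and `cosine_top_k` are hypothetical names; the source does not say whether normalisation happens at build time or at query time.

```python
import torch
import torch.nn.functional as F


def build_index(embeddings):
    """Stack a list of (D,) embeddings into one L2-normalised (N, D) matrix."""
    return F.normalize(torch.stack(embeddings), dim=1)


def cosine_top_k(query: torch.Tensor, index: torch.Tensor, k: int = 5):
    """Return (row ids, cosine scores) of the k most similar catalogue rows."""
    q = F.normalize(query, dim=0)
    scores = index @ q  # (N,) cosine similarities in one pass, no Python loop
    k = min(k, index.shape[0])
    values, indices = torch.topk(scores, k)  # sorted highest first
    return indices.tolist(), values.tolist()
```

The stacked matrix can be written once with `torch.save(index, "embeddings.pt")` and read back with `torch.load`, giving the single-artifact storage mentioned above; the filename is illustrative.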

The app also includes an approximate nearest-neighbour (ANN) mode that uses Hierarchical Navigable Small World (HNSW) graphs through nmslib. The frontend lets users switch between vectorised cosine search and HNSW, adjust the number of neighbours returned, and compare query times directly in the UI.

Future Improvements

Useful next steps would be increasing the dataset size, comparing more ANN libraries or vector databases, testing domain-specific image encoders, and moving large generated artifacts to object storage or Git LFS.