FuseRank

A two-stage hybrid anime recommender built with TensorFlow embeddings, Flask, and Kubernetes on GKE

I built and deployed a two-stage hybrid recommender that blends user-based collaborative filtering with embedding-similarity expansion in a single weighted ranking pass. In evaluation, the hybrid ranking improved relevance by 17% over KNN, and the deployed serving setup reduced p95 latency by 30%. FuseRank ingested 50M interaction rows from a 70M+ source dataset, filtered to users with at least 400 ratings, and trained 128-dimensional user and anime embeddings on the result. The project also covered artifact versioning, experiment tracking, automated testing, and deployment automation across Docker, GitHub Actions, Kubernetes, and GKE.

Context: Personal Project

Role: Machine Learning Engineer

Team: Solo

Date: Jun 2025

Built the complete pipeline solo, including data ingestion, both recommendation stages, the hybrid scorer, the Flask interface, the automated tests, and the deployment workflow.

50M interaction rows (70M+ source)
128-dim user/anime embeddings
0.1863 MAE / 0.0727 MSE

Hybrid Recommender, Search and Ranking, Cosine Similarity, TensorFlow Embeddings, Collaborative Filtering, Matrix Factorization

Overview

FuseRank ingested 50M interaction rows from a 70M+ MyAnimeList source dataset and filtered to users with at least 400 ratings before training. A TensorFlow matrix-factorization model learns 128-dimensional user and anime embeddings from interaction data, and cosine similarity over those embeddings powers two recommendation stages: a user-based pass that surfaces what similar users rate highly, and an item-based pass that expands those results with anime similar to them. A weighted hybrid scorer merges both stages into one ranked list before Flask serves the recommendations.

Architecture

The training pipeline loads interaction and anime metadata from Google Cloud Storage, filters to active users with at least 400 ratings, scales ratings to a 0-1 range, and tracks experiments with CometML. A TensorFlow matrix-factorization model learns 128-dimensional user and anime embeddings through a normalized dot-product interaction, trained with a custom warm-up and decay learning-rate schedule, early stopping, and checkpointing on the best validation loss. After training, the embeddings are L2-normalized so cosine similarity can drive two recommendation stages: finding users similar to a given user and surfacing their top-rated anime, then finding anime similar to those results to expand the candidate set. A weighted hybrid scorer combines both stages into a single ranked list, while Docker and GitHub Actions package and deploy the Flask service to GKE.
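The normalization step is what lets a plain dot product stand in for cosine similarity. A minimal sketch of that lookup, assuming the trained weights are available as a NumPy array; the function names and the random stand-in data are hypothetical and not taken from the project code:

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def most_similar(embeddings: np.ndarray, index: int, k: int = 5):
    """Return (row, cosine score) pairs for the k rows closest to `index`."""
    unit = l2_normalize(embeddings)
    scores = unit @ unit[index]   # cosine similarity against every row
    scores[index] = -np.inf       # never return the query row itself
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


# Toy stand-in for trained weights: 6 users, 128 dimensions (random, not real data).
rng = np.random.default_rng(0)
user_embeddings = rng.normal(size=(6, 128))
neighbours = most_similar(user_embeddings, index=0, k=3)
```

The same function works for anime embeddings, which is why one set of learned vectors can serve both the user-based and the item-based stage.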

What I Built

I personally built the data preparation pipeline, the TensorFlow embedding training, the user-based and item-based similarity stages, the weighted hybrid ranking, artifact versioning, the Flask serving layer, automated testing, and the deployment workflow.

Trained a TensorFlow matrix-factorization model with 128-dimensional user and anime embeddings, using a normalized dot-product interaction, a custom warm-up and decay learning-rate schedule, and early stopping, reaching convergence by epoch 16 with a 0.4963 test loss, 0.1863 test MAE, and 0.0727 test MSE.
Built a two-stage hybrid recommender: a user-based stage that finds similar users through embedding cosine similarity and surfaces their top-rated anime, and an item-based stage that expands those results with anime similar to them, combined through a weighted scorer.
Filtered the interaction data to active users with at least 400 ratings, scaled ratings to a 0-1 range, and label-encoded user and anime IDs before training.
Versioned datasets and model artifacts with DVC backed by Google Cloud Storage instead of committing large files to Git.
Configured GitHub Actions to pull versioned artifacts with DVC, build and push the Docker image, and deploy it to a 2-pod GKE cluster behind a load-balanced service.
Implemented automated tests during the project to validate the training and ranking workflow.
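The data preparation step in the list above (filter to active users, scale ratings, label-encode IDs) can be sketched in a few lines. This is an illustrative version in plain Python to stay dependency-free; the actual pipeline may well use a dataframe library, and the `prepare` function and its tuple layout are assumptions:

```python
from collections import Counter

MIN_RATINGS = 400  # threshold stated in the write-up


def prepare(rows, min_ratings=MIN_RATINGS):
    """rows: iterable of (user_id, anime_id, rating) tuples.

    Returns (encoded_rows, user_index, anime_index), where ratings are
    min-max scaled to [0, 1] and IDs are mapped to contiguous integers.
    """
    rows = list(rows)
    counts = Counter(user for user, _, _ in rows)
    kept = [r for r in rows if counts[r[0]] >= min_ratings]
    if not kept:
        return [], {}, {}

    lo = min(r[2] for r in kept)
    hi = max(r[2] for r in kept)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all ratings match

    # Label encoding: sorted unique IDs -> 0..n-1, the row index of each embedding.
    user_index = {u: i for i, u in enumerate(sorted({r[0] for r in kept}))}
    anime_index = {a: i for i, a in enumerate(sorted({r[1] for r in kept}))}

    encoded = [(user_index[u], anime_index[a], (rating - lo) / span)
               for u, a, rating in kept]
    return encoded, user_index, anime_index


# Tiny demonstration with a threshold of 2 instead of 400 (made-up rows).
sample = [(10, 1, 8), (10, 2, 4), (10, 3, 10), (20, 1, 6), (30, 2, 7), (30, 3, 9)]
encoded, user_index, anime_index = prepare(sample, min_ratings=2)
```

In the demonstration, user 20 has a single rating and is dropped, while users 10 and 30 are kept and re-indexed as 0 and 1.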

Engineering Decisions

Two-stage hybrid: user-based and item-based embedding similarity

Why — A user-based pass captured what similar users actually preferred, while an item-based pass over the same embeddings expanded the candidate set with anime similar to those picks. Combining them produced a broader ranking signal than either stage alone.

Trade-off — Both stages depend on the same learned embeddings, so the hybrid adds ranking diversity through two related similarity lookups rather than two fully independent signal sources.
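To make the two stages and the weighted merge concrete, here is a rough sketch under stated assumptions: the weights, neighbour counts, scoring formulas, and the `hybrid_recommend` name are all hypothetical, and the project's actual scorer may weight or normalize the stages differently.

```python
import numpy as np


def top_similar(unit_vectors, index, k):
    """(row, cosine score) pairs for the k nearest rows; rows are unit length."""
    scores = unit_vectors @ unit_vectors[index]
    scores[index] = -np.inf
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


def hybrid_recommend(user, user_vecs, anime_vecs, ratings, k_users=2, k_items=2,
                     user_weight=0.5, item_weight=0.5, top_n=5):
    """ratings: dict user -> {anime: scaled rating}. Weights are illustrative."""
    seen = set(ratings.get(user, {}))

    # Stage 1: anime rated by similar users, weighted by user similarity.
    user_scores = {}
    for other, sim in top_similar(user_vecs, user, k_users):
        for anime, rating in ratings.get(other, {}).items():
            if anime not in seen:
                user_scores[anime] = user_scores.get(anime, 0.0) + sim * rating

    # Stage 2: expand the stage-1 picks with anime similar to them.
    item_scores = {}
    for anime in user_scores:
        for similar, sim in top_similar(anime_vecs, anime, k_items):
            if similar not in seen:
                item_scores[similar] = item_scores.get(similar, 0.0) + sim

    # Weighted merge of both stages into one ranked list.
    combined = {}
    for anime in set(user_scores) | set(item_scores):
        combined[anime] = (user_weight * user_scores.get(anime, 0.0)
                           + item_weight * item_scores.get(anime, 0.0))
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def unit(matrix):
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


# Random stand-ins for trained embeddings and a made-up ratings table.
rng = np.random.default_rng(1)
user_vecs = unit(rng.normal(size=(4, 128)))
anime_vecs = unit(rng.normal(size=(6, 128)))
ratings = {0: {0: 0.9}, 1: {1: 0.8, 2: 0.4}, 2: {3: 1.0}, 3: {0: 0.7, 4: 0.6}}
recommendations = hybrid_recommend(0, user_vecs, anime_vecs, ratings)
```

Because both stages read from the same embedding space, changing `user_weight` and `item_weight` shifts emphasis between neighbour preferences and item expansion rather than mixing independent signals, which is the trade-off noted above.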

Filter to active users with at least 400 ratings before training

Why — The pipeline ingested 50M interaction rows from a 70M+ MyAnimeList source dataset and filtered to users with at least 400 ratings before training, keeping training practical while retaining enough user-item history to train the embedding model.

Trade-off — Filtering out lower-activity users may have omitted long-tail interaction patterns present in the full dataset.

Custom learning-rate schedule with early stopping

Why — A warm-up-then-decay schedule combined with early stopping and checkpointing on validation loss let training run for up to 20 epochs while automatically restoring the best-performing weights, rather than fixing the epoch count in advance.
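A warm-up-then-decay schedule and patience-based early stopping can be expressed as two small functions. The sketch below is illustrative only: every rate, the ramp length, the decay factor, and the patience value are assumed, since the write-up does not state them. In a Keras training loop, logic like this would typically be wired in through the `LearningRateScheduler`, `EarlyStopping`, and `ModelCheckpoint` callbacks.

```python
def warmup_decay_lr(epoch, start_lr=1e-5, max_lr=5e-5, min_lr=1e-5,
                    ramp_epochs=5, sustain_epochs=0, decay=0.8):
    """Linear warm-up to max_lr, optional hold, then exponential decay toward min_lr."""
    if epoch < ramp_epochs:
        return start_lr + (max_lr - start_lr) * epoch / ramp_epochs
    if epoch < ramp_epochs + sustain_epochs:
        return max_lr
    steps = epoch - ramp_epochs - sustain_epochs
    return min_lr + (max_lr - min_lr) * decay ** steps


def best_epoch_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, stopped_epoch) as patience-based stopping would pick them."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # weights from best_epoch would be restored
    return best_epoch, len(val_losses) - 1


# Learning rates for a run capped at 20 epochs, as in the write-up.
schedule = [warmup_decay_lr(epoch) for epoch in range(20)]
```

With these assumed defaults the rate climbs for five epochs, peaks, and then decays, while the stopping helper reports which epoch's weights a checkpoint on validation loss would keep.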

DVC + GCS for data and model versioning

Why — DVC and Google Cloud Storage kept large datasets and model artifacts outside the Git repository while preserving versioned references to them.

Results & Validation

Training converged by epoch 16, reaching a test loss of 0.4963 (vs. 0.4268 on train), 0.1863 test MAE, and 0.0727 test MSE. In a separate evaluation, the hybrid ranking improved relevance by 17% over KNN, and the deployed serving setup reduced p95 latency by 30%.
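For readers unfamiliar with the two error metrics, the short sketch below shows how MAE and MSE are computed on ratings scaled to the 0-1 range. The values are made up for illustration and are not the project's data:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: penalizes large misses more heavily than MAE."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


held_out = [0.8, 0.6, 1.0, 0.4]    # made-up scaled ratings
predicted = [0.7, 0.7, 0.8, 0.5]   # made-up model outputs

example_mae = mae(held_out, predicted)
example_mse = mse(held_out, predicted)
```

Because both metrics are measured on the scaled range, they read as fractions of the original rating span. If ratings spanned 0 to 10 before scaling (an assumption, since the write-up does not state the raw range), an MAE of 0.1863 would correspond to roughly 1.9 rating points.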

Automated tests were implemented during the project, and validation covered model training and recommendation behavior. The reported loss, MAE, and MSE were measured against a held-out set of ratings and tracked epoch by epoch through CometML.

This version focused on model training, ranking, packaging, and deployment automation. The GKE deployment ran two replica pods behind a load-balanced service with a readiness probe, and the workflow was engineered for reproducibility rather than permanent public availability.

Deployment

Docker packaged the Flask service. GitHub Actions pulled versioned model artifacts with DVC, built and pushed the image, and deployed it to a GKE cluster on Google Cloud during the project period. The deployment is no longer active.

Evidence / Technologies

View embedding model architecture
View hybrid ranking logic
View training pipeline
View CI/CD workflow
View Kubernetes deployment

Hybrid Recommender, Search and Ranking, Cosine Similarity, TensorFlow Embeddings, Collaborative Filtering, Matrix Factorization, Weighted Scoring, Flask, Docker, GitHub Actions, Kubernetes, GKE, DVC, Google Cloud Storage, CometML, Python