An experiment in Steam game discovery.
Select from a number of different ways of looking at the same list of games on the Steam Store. Different datasets use different 'perspectives', so points which are close in one dataset may be distant in another.
This is an embedding-based map of games available in the Steam store (as of 2025-12-14), plotted in 2D so you can visually browse the landscape. Each point is a game: its size scales with review volume, its color reflects overall review sentiment, and its position comes from a learned embedding of the game's description and reviews. The dataset's description shows you which model generated the positions.
The full dataset contains more than 160,000 games and encompasses roughly 5 million individual reviews.
However, some datasets can't show all of these games at once, because some games aren't comparable to the rest for a number of reasons (for example, they have no user reviews or no user-suggested tags).
Take a look around and see if there are any games out there you haven't heard of but are similar to games you already like.
Use the search bar at the top of the screen to find a game you enjoy playing, or used to enjoy in the past. Then see which games sit nearby: these are games that are talked about in a similar way to the one you searched for.
I've discovered a few interesting new games this way, and I hope this map can shed some light on other undiscovered but well-reviewed games on Steam.
A very brief and technical overview of how this was cobbled together.
First, roughly a week was spent fetching app information from the Steam Store. In practice it was more than two weeks, because I had to restart a few times to collect data I'd missed.
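The post doesn't spell out the crawling code, but Steam's public store endpoints can be polled roughly like this. This is a minimal sketch under assumptions: the appdetails/appreviews endpoints, pagination, and sleep interval shown here are illustrative, not the exact pipeline used.

```python
import time
import requests

def fetch_app_details(appid: int) -> dict | None:
    """Fetch store-page metadata (title, genres, publisher, etc.) for one app."""
    resp = requests.get(
        "https://store.steampowered.com/api/appdetails",
        params={"appids": appid},
        timeout=30,
    )
    entry = resp.json().get(str(appid), {})
    return entry.get("data") if entry.get("success") else None

def fetch_reviews(appid: int, pages: int = 1) -> list[dict]:
    """Fetch review pages for one app, following Steam's cursor pagination."""
    reviews, cursor = [], "*"
    for _ in range(pages):
        resp = requests.get(
            f"https://store.steampowered.com/appreviews/{appid}",
            params={"json": 1, "filter": "all", "num_per_page": 100, "cursor": cursor},
            timeout=30,
        )
        payload = resp.json()
        reviews.extend(payload.get("reviews", []))
        cursor = payload.get("cursor", cursor)
        time.sleep(1.5)  # be polite; aggressive crawling gets rate-limited
    return reviews
```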
Then I used my RTX 3090 to embed the text I fetched from the Steam Store. This also took a few weeks, since there are so many reviews.
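The post doesn't name the embedding model (beyond the 1024- or 2560-dimensional vectors mentioned later), but the GPU batching step looks roughly like this with sentence-transformers; the model name below is a placeholder assumption.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder model name; the actual model used isn't specified in the post.
model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")

def embed_texts(texts: list[str]) -> np.ndarray:
    """Batched GPU encoding with L2-normalized outputs, so cosine similarity
    reduces to a dot product downstream."""
    return model.encode(
        texts,
        batch_size=64,
        normalize_embeddings=True,
        show_progress_bar=True,
    )
```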
From that data, two embeddings were created per game: one from the store page (title, tags, genres, publisher, developer) and one from the most helpful reviews (given similar context, but shorter description). Reviews are pooled (currently mean pooling with normalization; other experiments pending) to build a representative review vector. The final review embedding and description embedding are then pooled with a 70/30 weight favoring reviews.
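In numpy terms, that pooling looks something like the sketch below: mean-pool the review vectors, renormalize, then blend with the description vector at a 70/30 weight. The exact numbers and normalization order are as described above, but the function itself is illustrative.

```python
import numpy as np

def pool_game_embedding(review_vecs: np.ndarray, desc_vec: np.ndarray) -> np.ndarray:
    """Combine per-review embeddings with the store-page embedding for one game.

    review_vecs: (n_reviews, dim) array of review embeddings
    desc_vec:    (dim,) store-page embedding
    """
    # Mean-pool the reviews, then renormalize to unit length.
    review_vec = review_vecs.mean(axis=0)
    review_vec /= np.linalg.norm(review_vec)

    # Weighted blend favoring reviews 70/30, normalized again.
    combined = 0.7 * review_vec + 0.3 * desc_vec
    return combined / np.linalg.norm(combined)
```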
The combined embedding vector is reduced from its native dimension (1024, 2560, etc.) to 50 dimensions with PCA and then flattened to 2D with UMAP. Points are rendered with Plotly in your browser.
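With scikit-learn and umap-learn, that reduction step is roughly the following; the UMAP hyperparameters here are illustrative rather than the ones actually used.

```python
import numpy as np
import umap
from sklearn.decomposition import PCA

def project_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Reduce (n_games, native_dim) embeddings to 2D coordinates for plotting."""
    # PCA first: cheap, strips most of the noise, and makes UMAP much faster.
    reduced = PCA(n_components=50).fit_transform(embeddings)

    # UMAP does the final non-linear flattening to 2D.
    return umap.UMAP(n_components=2, metric="cosine").fit_transform(reduced)
```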
A lot of effort went into making sure the large amount of data doesn't force you to download hundreds of MB just to view it. This project takes ~1 GB of disk space on my server, but the main view only needs about 30 MB of data to get started, and each additional dataset takes ~7 MB to load.
This map is basically a continuation of an earlier project: a simple HNSW-based search engine powered by an older embedding model (instructor). You can read more about that in the blog post below.