Automated Product Positioning Tools: Steam Game

Foreword: This ReadMe.md serves as a blog post for submission to the 2025 Google Cloud's Hackathon "BigQuery AI - Building the Future of Data". This submission follows "Approach 2: The Semantic Detective 🕵️‍♀️".

Official page: https://www.kaggle.com/competitions/bigquery-ai-hackathon/overview

Background

Business Question

Performing product positioning in real-time is often required in the current fast-paced environment in particular products with short life cycles. A vivid example is the video game industry. According to SteamDB, there are 18,641 games released on Steam in 2024. The business question relevant in this setting, to both developers and analysts, is how to understand the position of your game in the global market, whether you are planning to produce a computer game or to analyze an existing game.

Pain Point

Let say if you are planning to make a horror game, you would like to know what kind of horror games are available in the market, and how their buyers perceive the quality of horror games. Currently, product or market analysts are conducting their analysis through manual reviews of unstructured data such as game description and user comments. Furthermore, it is hard for analysts to understand foreign markets because of language barriers. The manual process can be broken down into two steps:

Identify computer games with similar characteristics based on their tags on Steam.
Review user comments based on how many people think it is "helpful" or the time of posting the comment.

As such, an automated solution using LLM-based text embedding models is proposed.

Public Dataset

Our solution is based on two public datasets on Kaggle:

steam_game_list A complete list of Steam games (40.21 MB, 55.7k records): All 55,000 Games on Steam (November 2022)
review_data A list of game reviews (8.17 GB, 21.7m records): Steam Reviews Dataset 2021

The data files are stored in Cloud Storage and then loaded to BigQuery according to the schema.

Solution

Automated Workflow

Our solution is based on Vector Search in SQL because it can be integrated easily as a backend process in deployment. The embedding model is text-embedding-004. The automated workflow is sequential in design which accepts user queries in two steps:

The first user query is used to retrieve games satisfying the semantic description of game characteristics or elements e.g.

I would like to find a multi-person strategic game on farming in an open-world setting.
I would like to find a multi-person action RPG game with dragon

Vector search is applied to textual fields in the first dataset consisting of a complete list of Steam games. To ensure accurate matching, two columns from the first dataset is used to create embeddings: short description and tags. A game will be returned as a result only if the semantic meaning of both game descriptions and tags are highly relevant to the user query. Technically, it is done by an inner join of two tables created by vector-searching two embeddings.

The second user query is used to explore user comments based on the previous extracted list of games. It automates the analysis of customer perceptions of product features across multiple countries e.g.

Is this game easy to play for elderly?
Tell me the music and visual aspects of this game.
Is the price expensive or cheap or just affordable given the quality?

Vector search is applied to review column in the second dataset consisting of user comments of some Steam games. This automated workflow based on vector search successfully addresses the abovementioned pain points.

First, the accuracy of identifying games with specified genre and characteristics improves. Second, user comments are retrieved based on semantic meaning in multiple languages but not the number of likes.

Data Model

The connection between steam_game_list and review_data is the Steam ID of each game App ID. It is a unique primary key for steam_game_list and review_data is partitioned by App ID to reduce cost in joining tables.

Architecture Diagram

Demo

The exact implementation can be found in the notebook

User Query on Game Characteristics: I would like to find a multi-person strategic game on farming in an open-world setting.

Result: After extracting games with the game characteristics, we calculate a new feature odd by dividing the number of positive reviews by the number of negative reviews. This new feature measures whether the extent of good rating of a game.

User Query on Product Features: Is this game easy to play for elderly? Farm Together and Farming Simulator 19 with high good-to-bad reviews ratio is commented as friendly game to kids and elderly. So, the ease to play for all ages with is an important factor for farming game.

User Query on Product Features: Tell me the music and visual aspects of this game.

Result: Both games with high or low good-to-bad reviews ratio may have good music and visual aspects. The result shows that music and visual aspects is not the most important factor.

BigQuery Runtime

The SQL queries in our solution can be executed efficiently in a few seconds because of vector indexing and partioning.

User Query on Game Characteristics

User Query on Product Features

Metrics

A Medium blog in 2021 (How Much Time is Spent for Product Managers to do Their Tasks?) suggests that user research and competitor/feature research take up product managers 80 minutes and 68 minutes, respectively, per day. With Google AI BigQuery, our proposed solution reduces the time of information gathering into only at most 5 minutes, which saves more than 90% time.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
results		results
schema		schema
ReadMe.md		ReadMe.md
architecture_diagram.png		architecture_diagram.png
google_hackathon_survey.txt		google_hackathon_survey.txt
prototype.ipynb		prototype.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated Product Positioning Tools: Steam Game

Background

Business Question

Pain Point

Public Dataset

Solution

Automated Workflow

Data Model

Architecture Diagram

Demo

BigQuery Runtime

User Query on Game Characteristics

User Query on Product Features

Metrics

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Automated Product Positioning Tools: Steam Game

Background

Business Question

Pain Point

Public Dataset

Solution

Automated Workflow

Data Model

Architecture Diagram

Demo

BigQuery Runtime

User Query on Game Characteristics

User Query on Product Features

Metrics

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages