Set up your virtual environment using pyenv (pyenv installer):

```bash
pyenv install 3.11.5
pyenv local 3.11.5
python -m venv venv
source venv/bin/activate
```

Then install the requirements and the package locally:

```bash
make install
```
The files `train_samples.parquet` and `test_samples.parquet` must be in the folder `data/raw`.
We want to predict a user's LTV for an app at 120 days after installation, using only the user's features and their activity during the first 24 hours.
In the context of gaming apps, revenue can come from ads watched or in-app purchases.
- Business Metric / Online Metric
The business metric would be the total revenue generated by actions taken based on the predictions. In this use case, it cannot be computed.
- ML Metric / Offline Metric
The metric to evaluate the performance of the algorithm is the difference between the actual d120 revenue and the prediction.
In the training set, 20% of users generate $0, and the remaining revenues range from $0.0002 to thousands of dollars. That is why we use the log of the revenue to compute the loss of the algorithm (a simple MSE).
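As an illustration, a minimal sketch of that log-transformed loss (a plain NumPy version; the actual training loss lives in the model code):

```python
import numpy as np

def log_mse(y_true, y_pred):
    """MSE on log1p-transformed revenues.

    log1p maps $0 to 0 and compresses the heavy right tail,
    so a few very high spenders do not dominate the loss.
    """
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
```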
The data consist of 7 months of d0 metrics, user features, and the d120 total revenue (also split between ad and IAP revenue; see the Raw Features section).
In the data, there are no users present in more than one app. The cohort value constructed from a hash of the tuple (install_date, app_id, campaign_id, advertiser_id) will not be used.
In this approach we will only use d120_rev. As this revenue can be split into d120_iap_rev and d120_ad_rev, these two values could be used as separate targets.
- User features
| Feature | Type | Description |
|---|---|---|
| app_id | STRING | Unique identifier for each application |
| install_date | STRING | Date when the user installed the app |
| platform | STRING | Operating system platform of the user's device (iOS or Android) |
| is_optin | STRING | Indicates if the user opted in for personalized ads or other services |
| game_type | STRING | |
| country | STRING | Country where the user downloaded the app |
| manufacturer | STRING | Manufacturer of the user's device |
| mobile_classification | STRING | Classification of the mobile device |
| city | STRING | City where the user downloaded the app |
- Extra features
| Feature | Type | Description |
|---|---|---|
| install_month | INTEGER | Month of installation |
| install_day_of_month | INTEGER | Day of month of installation |
| install_day_of_week | INTEGER | Day of week of installation |
- Engagement features
| Feature | Type | Description |
|---|---|---|
| iap_count | INTEGER | Time series features from: Number of items bought by the user |
| iap_coins_rev | INTEGER | Time series features from: Revenue from in-app purchases of coins |
| iap_coins_count | INTEGER | Time series features from: Number of coin items bought by the user |
| iap_ads_rev | INTEGER | Time series features from: Revenue from in-app purchases related to ads (for instance paying to remove ads) |
| iap_ads_count | INTEGER | Time series features from: Number of ad items bought by the user |
| ad_count | INTEGER | Time series features from: Number of ads viewed by the user |
| session_count | INTEGER | Time series features from: Number of sessions by the user; a session starts when the user opens the app |
| game_count | INTEGER | Time series features from: Number of games played; you can have multiple games played within a session |
| current_level | INTEGER | Time series features from: User's current game level |
| session_length | INTEGER | Time series features from: Sum of all session lengths |
| coins_spend_sum | INTEGER | Time series features from: Total amount of coins spent by the user |
| booster_used_count | INTEGER | Time series features from: Number of boosters used by the user |
| rv_shown_count | INTEGER | Time series features from: Number of rewarded videos shown to the user |
- Targets
| Feature | Type | Description |
|---|---|---|
| d120_rev | FLOAT | Total revenue generated by day 120 |
| d120_iap_rev | FLOAT | Cumulative IAP revenue generated by day 120 |
| d120_ad_rev | FLOAT | Cumulative ad revenue generated by day 120 |
```bash
python -m voodoo_homework
```

will display all possible commands with their descriptions. You can display each command's documentation with:

```bash
python -m voodoo_homework <command> --help
```

Using the raw data, we want to make a train/validation/test split based on the column `install_date` (a sketch follows below).
For the default values you can run:

```bash
make dataset
```

The train and validation sets contain all the rows in the time ranges specified in the configuration file.
No features are computed yet.
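Conceptually, the split looks like this (the date boundaries below are made-up examples, not the project's actual configuration):

```python
import pandas as pd

df = pd.read_parquet("data/raw/train_samples.parquet")
dates = pd.to_datetime(df["install_date"])

# Earlier installs go to train, later ones to validation
train = df[dates < "2023-05-01"]
validation = df[(dates >= "2023-05-01") & (dates < "2023-06-01")]
```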
We want to be able to compute our features separately by "category". This makes it possible to compute them in parallel and join them only at the end (the merge is sketched below).
To build every group of features, run:

```bash
make build-features
```

and to merge them into a unified parquet file, launch:

```bash
make merge-features
```
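Conceptually, the merge is just a join of the per-category parquet files on the user key; a hypothetical sketch (the file paths and key column are assumptions):

```python
import pandas as pd
from functools import reduce

# One parquet file per feature category (hypothetical paths)
paths = [
    "data/features/user_features.parquet",
    "data/features/extra_features.parquet",
    "data/features/engagement_features.parquet",
]
frames = [pd.read_parquet(p) for p in paths]

# Join every category on the user identifier (assumed column name)
merged = reduce(lambda a, b: a.merge(b, on="user_id", how="left"), frames)
merged.to_parquet("data/features/features.parquet")
```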
Here we just cast the user features so they can be handled as categorical values later in the training part.
These features are here to show how features can be computed separately and merged afterwards. We just extract the month, day of month, and day of week from the install_date feature, as sketched below.
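For instance, with pandas (the sample data is a made-up illustration):

```python
import pandas as pd

# Hypothetical sample of the raw data
df = pd.DataFrame({"install_date": ["2023-01-15", "2023-02-03"]})
dates = pd.to_datetime(df["install_date"])

df["install_month"] = dates.dt.month            # 1-12
df["install_day_of_month"] = dates.dt.day       # 1-31
df["install_day_of_week"] = dates.dt.dayofweek  # 0 = Monday
```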
Engagement features are computed using the time series package tsfresh and the configuration value historical_data_points, which is the list of data points to use (a sketch with tsfresh follows below):

- `"0,3,10"`: to use the `d0`, `d3` and `d10` points
- `"0"`: to use only the `d0` point

This will be useful when we train models on more data than the first 24 hours.
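A minimal sketch of how tsfresh can turn the per-day engagement metrics into features (the column names and long-format layout are assumptions):

```python
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import MinimalFCParameters

# Hypothetical long format: one row per (user, data point) pair,
# e.g. historical_data_points = "0,3,10" -> day in {0, 3, 10}
long_df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "day": [0, 3, 10, 0, 3, 10],
    "session_count": [4, 9, 15, 1, 1, 2],
})

# One row per user, one column per (metric, aggregate) combination
features = extract_features(
    long_df,
    column_id="user_id",
    column_sort="day",
    default_fc_parameters=MinimalFCParameters(),
)
```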
The neural network is implemented using TensorFlow, to easily visualize the training process with TensorBoard, to save and reuse the model quickly, and to make it possible to add complexity without changing too much code. It also eases the normalization of numerical features and the handling of categorical features, as these steps are embedded in the graph (a sketch follows below).
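A minimal sketch of what "embedded in the graph" means here, using Keras preprocessing layers (the feature names and sizes are made up):

```python
import tensorflow as tf

# Numerical feature: normalization baked into the model
norm = tf.keras.layers.Normalization()
norm.adapt(tf.constant([[0.0], [3.0], [10.0]]))  # fit on training data

# Categorical feature: vocabulary lookup + embedding
lookup = tf.keras.layers.StringLookup(vocabulary=["ios", "android"])
embed = tf.keras.layers.Embedding(input_dim=lookup.vocabulary_size(),
                                  output_dim=4)

num_in = tf.keras.Input(shape=(1,), dtype=tf.float32)
cat_in = tf.keras.Input(shape=(1,), dtype=tf.string)
x = tf.keras.layers.Concatenate()(
    [norm(num_in), tf.keras.layers.Flatten()(embed(lookup(cat_in)))]
)
out = tf.keras.layers.Dense(1)(x)  # predicts the log revenue
model = tf.keras.Model([num_in, cat_in], out)
```

Because the preprocessing lives inside the model, the saved artifact can be served on raw feature values without a separate transformation step.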
```bash
make train
```

To check the training live, launch TensorBoard:

```bash
tensorboard --logdir logs
```
Save the predictions and print the performance:

```bash
make predictions
```

Run the tests with:

```bash
make tests
```

or

```bash
pytest tests
```

- Predict ad revenue and IAP revenue separately, then use an ML layer on top to make the final prediction
- Change the loss to a more meaningful metric like WMAPE (sketched after this list)
- Fine-tune parameters: learning rate, network architecture, etc.
- Complete the unit tests
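For reference, a minimal sketch of the WMAPE metric mentioned above:

```python
import numpy as np

def wmape(y_true, y_pred):
    """Weighted MAPE: total absolute error divided by total actual revenue.

    Unlike plain MAPE, it stays well defined when some y_true values are 0,
    which matters here since ~20% of users generate $0.
    """
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))
```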
Let's say we already have a data warehouse where the events are stored and updated live if needed. The different steps for a complete (Kubeflow) pipeline and a deployment would be:
- Feature Engineering: from the data warehouse, construct all the features needed. It could be done with SQL queries (for instance using `BigQuery`'s power) or with `Beam` or `Spark` to have a more sustainable codebase. The features would be stored in databases like SQL, or in files like `Parquet` or `TFRecords`, for the training part. For the serving part, the features are computed on the fly using a fast-read database (like an in-memory database).
- Model Training: once we have our features, if we use a neural network, we could deploy our code in a Kubernetes pod using our Dockerfile and launch multiple trainings (for a grid search, for instance). The evolution of the trainings could be followed using `Tensorboard`. At the end, the models would be saved in `GCS` or `S3` to be called later.
- Offline evaluation: once our model is trained, we want to observe its performance on a test set. We could use the `ML.PREDICT` function of `BigQuery` to apply our model to our test set stored in `BigQuery`, then plug `Google Data Studio` into the results to get a proper dashboard. With this, we could easily compare multiple algorithms on the same dashboard and choose the best one to put in production.
- Serving: the serving could be done in two different ways: either make the predictions in batch every night and store them in a big key-value database that is called each time we need a prediction, or compute the predictions online.
  - Offline approach: with few users, this could be a good solution with very good response time. Generating the features would be very easy, as it would be exactly the same code as the training-set generation. The only problem would be the lack of contextual features: time of day, device, live popularity. (See the sketch after this list.)
  - Online approach: slower (compute the features, then query the model) and more complicated to maintain (multiple services), but this approach would usually be more accurate and would use every live and contextual signal. One problem would be computing the features live, which can be tricky.
- Online evaluation
  - Monitor the services: using a `Grafana` dashboard or `Datadog`, we could observe the number of queries to our service, the response time per step (feature construction, prediction), the total response time, and the resource usage.
  - AB Test: to assess the performance of our approach, we have to run an AB test comparing our metric against the previous version of the algorithm. It has to be well calibrated to draw valuable conclusions (randomization unit, sample size, etc.), and the conclusions should be made at the right moment: be careful with the first days' results and wait for statistical significance.
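For the offline serving approach, the request path reduces to a key-value lookup; a hypothetical sketch with Redis (the choice of store and the key scheme are assumptions):

```python
import json
import redis  # any key-value database would do; Redis is illustrative

r = redis.Redis(host="localhost", port=6379)

# Nightly batch job: store each user's predicted d120 revenue
r.set("ltv:user_42", json.dumps({"d120_rev": 3.17}))

# Serving path: a single key lookup per prediction request
prediction = json.loads(r.get("ltv:user_42"))
```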
- Monitor the services: using