Upstream changed database format and everything is broken #7

@maxpoulin64

Lots of things are broken, and I'll probably have to start over to match their new format.

So, props where due, they finally moved away from a giant JSON of everything and now ship an individual file per month. However it's still compressed, and it still extracts to the same file name.
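Since every monthly archive extracts to the same file name, unpacking several of them into one directory overwrites earlier months. A minimal sketch of one way around that, assuming the archives are tar files (optionally gzip-compressed) containing exactly one file. The archive format, the file names, and the helper `extract_monthly_dump` are assumptions for illustration, not something upstream documents:

```python
import shutil
import tarfile
from pathlib import Path


def extract_monthly_dump(archive: Path, dest_dir: Path) -> Path:
    """Extract the single file inside `archive` under a name derived from the archive."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: "reports_dec_2019.tar.gz" -> "reports_dec_2019.json"
    stem = archive.name.split(".", 1)[0]
    target = dest_dir / f"{stem}.json"
    # The default mode "r" lets tarfile detect the compression transparently.
    with tarfile.open(archive) as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        if len(members) != 1:
            raise ValueError(
                f"expected exactly one file in {archive}, found {len(members)}"
            )
        source = tar.extractfile(members[0])
        # Stream the member to the per-archive target instead of its stored name.
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target
```

Streaming the member to an explicit target avoids relying on the name stored inside the archive at all.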

The new report format is still very inconsistent, and some of the important keys are gone. Most notably, the ratings are gone.

For example, the first report of the December 2019 set:

  {
    "app": {
      "steam": {
        "appId": "352620"
      },
      "title": "Porcunipine"
    },
    "responses": {
      "answerToWhatGame": "352620",
      "appSelectionMethod": "libraryLookup",
      "installs": "no",
      "notes": {
        "extra": "oh noes",
        "verdict": "is le borked"
      },
      "protonVersion": "Default",
      "type": "steamPlay",
      "verdict": "no"
    },
    "timestamp": 1572299227,
    "systemInfo": {
      "cpu": "Intel Core i5-6600K @ 3.50GHz",
      "gpu": "NVIDIA GeForce GTX 980 Ti",
      "gpuDriver": "NVIDIA 396.54",
      "kernel": "4.15.0-33-generic",
      "os": "Ubuntu 18.04.1 LTS",
      "ram": "16 GB"
    }
  }

A more complete report looks like this, but it still lacks a rating:

  {
    "app": {
      "steam": {
        "appId": 637650
      },
      "title": "FINAL FANTASY XV WINDOWS EDITION"
    },
    "responses": {
      "answerToWhatGame": "637650",
      "audioFaults": "yes",
      "customizationsUsed": {
        "customProton": true
      },
      "duration": "aboutAnHour",
      "extra": "no",
      "followUp": {
        "audioFaults": {
          "crackling": true
        },
        "performanceFaults": "slightSlowdown"
      },
      "graphicalFaults": "no",
      "inputFaults": "no",
      "installs": "yes",
      "isImpactedByAntiCheat": "no",
      "isMultiplayerImportant": "no",
      "launcher": "steam",
      "localMultiplayerAttempted": "no",
      "notes": {
        "customizationsUsed": "https://github.com/GloriousEggroll/proton-ge-custom/releases/download/4.21-GE-1/Proton-4.21-GE-1.tar.gz",
        "protonVersion": "Proton-4.21-GE-1",
        "verdict": "Other then crackling audio and having to run the game on lowish settings 60fps gameplay with occasional stuttering the demo was playable"
      },
      "onlineMultiplayerAttempted": "no",
      "opens": "yes",
      "performanceFaults": "yes",
      "protonVersion": "Proton-4.21-GE-1",
      "saveGameFaults": "no",
      "significantBugs": "no",
      "stabilityFaults": "no",
      "startsPlay": "yes",
      "type": "tinker",
      "verdict": "yes",
      "windowingFaults": "no"
    },
    "timestamp": 1575290512,
    "systemInfo": {
      "cpu": "Intel Core i5-9600K @ 3.70GHz",
      "gpu": "NVIDIA GeForce GTX 980 Ti",
      "gpuDriver": "NVIDIA 430.64",
      "kernel": "5.3.12-1-MANJARO",
      "os": "Manjaro Linux",
      "ram": "32 GB"
    }
  }

Overall, I'm not pleased with the new format. It's annoying as hell to work with, it's inconsistent, and it severely lacks documentation of what keys are available and what they mean. It uses "yes" and "no" strings instead of booleans. This screams that there wasn't any design work whatsoever on the database format or export format. I haven't seen such a bad database dump in years, and that's having worked with weird government-supplied "CSV"s. They just keep adding random fields at random places. For example, in the report above, there are two customizationsUsed keys in different places with different meanings: one directly under responses and one under responses.notes.
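To make the two complaints above concrete, here is a rough sketch that flattens a report into dotted key paths and maps "yes"/"no" strings to booleans. The helper `flatten` and the rule that anything under a `notes` key is free text (and so is left alone) are my own assumptions, not documented upstream behavior:

```python
from typing import Any, Dict

YES_NO = {"yes": True, "no": False}


def flatten(node: Any, prefix: str = "", convert: bool = True) -> Dict[str, Any]:
    """Flatten nested dicts into {"dotted.path": leaf}, mapping "yes"/"no" to bools."""
    if isinstance(node, dict):
        flat: Dict[str, Any] = {}
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            # Assumption: everything under a "notes" key is user-written text,
            # so a note that happens to read "no" must not become False.
            flat.update(flatten(value, path, convert and key != "notes"))
        return flat
    if convert and isinstance(node, str) and node in YES_NO:
        return {prefix: YES_NO[node]}
    return {prefix: node}
```

With paths as keys, the two customizationsUsed entries no longer collide: one ends up under `responses.customizationsUsed.customProton`, the other under `responses.notes.customizationsUsed`.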

I'm going to need some time to think about this and might give up on normalizing that completely insane data set and just use MongoDB instead. Garbage in, garbage out. I can't make a table schema capable of fitting all of these fields without some proper documentation of what fields can exist.
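Without documentation, one way to find out which fields can exist is to scan a whole dump and count every key path along with the value types seen there. A minimal sketch, assuming each dump parses into a list of report dicts like the two above; `leaf_paths` and `key_inventory` are hypothetical helper names:

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple


def leaf_paths(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, value) for every non-dict value in a report."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, node


def key_inventory(reports: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Count, per key path, how often each Python type shows up across reports."""
    inventory: Dict[str, Counter] = defaultdict(Counter)
    for report in reports:
        for path, value in leaf_paths(report):
            inventory[path][type(value).__name__] += 1
    return {path: dict(counts) for path, counts in inventory.items()}
```

Run over the two sample reports, this would already flag `app.steam.appId`, which is a string in the first report and an integer in the second.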

At this point I don't know if upstream is just very bad, or specifically making the dumps as unusable as possible so people can't build competing websites. There's clearly minimal effort being put into this, which in my opinion is extremely disrespectful of the open source community and the people that take their time to fill in that data.

cc @DanMan
