Load data with Marshal instead of YAML#17
Load data with Marshal instead of YAML#17markprzepiora wants to merge 2 commits intomonterail:masterfrom
Conversation
|
This looks awesome. Thank you for writing it so that it can be sped up. I just wish the gem author merged it now. |
|
@markprzepiora, I'd like to explore options for storing the data. What I ended up doing in a fork was converting the YAML to CSV and using the FastCSV gem to process the data more quickly and to prevent loading it into memory all at once; however I don't feel that it was necessarily the best method. Another idea is to keep the YAML around for development purposes but bundle a SQLite database generated by a rake task. Since you've opened this PR I wanted to get your feedback. |
| require 'yaml' | ||
|
|
||
| module ZipCodes | ||
| VERSION = '0.2.1' |
There was a problem hiding this comment.
This is a breaking change (semver-1)
| VERSION = '0.2.1' | |
| VERSION = '1.0' |
If it's going to be included one day, I think the PR needs to include a build script, that convert a YAML (easy for humans to deal with) into the marshaled format.
There was a problem hiding this comment.
Feel free to also open a PR against https://github.com/suvie-eng/zip-codes/
I'm not wanting to create maintainer controversy - but until the maintainers here integrate all of these, I'd like to have a gem that is the most up to date. Also, I like the performance of this change 👌
|
Library data like this really shouldn't be distributed via Marshal - the format of Marshal is not guaranteed to be stable between ruby versions (and indeed, is not in practice). I'd suggest either pursuing fast CSV options or just querying a sqlite file. |
Loading 4MB of YAML data takes about 1 second on my development machine. Replacing the
YAML-dumped data withMarshal-dumped data instead brings this down to about 0.08 seconds. This makes a big difference in feedback speed especially when running individual tests that rely onZipCode.identify.