add support for libpostal over http service, adapter pattern et al.#249
missinglink wants to merge 4 commits into master
|
Okay, I noticed that the demo didn't work with this PR, so I tried to figure out why. It was pretty difficult and actually exposed an issue in the error handling for the server. I was using Portland Metro data, and was using the demo to explore, which generated the following query: This URL returns a 400 error, but also an empty JSON response, as shown below: Diving deeper, I took a look at the relevant express code: Lines 99 to 104 in 3fa614e This code doesn't properly print the error to the response body, as Printing the error to stdout reveals the following: So there are two issues here:
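One common way an express handler ends up sending an empty JSON body is by serializing an `Error` object directly. The sketch below illustrates that mechanism only; it is not the code from lines 99 to 104, and the `serializeError` and `errorHandler` names and the error message are hypothetical.

```javascript
// Error instances store `message` and `stack` as non-enumerable properties,
// so JSON.stringify() skips them and produces an empty object.
const err = new Error('example failure'); // hypothetical message
const naive = JSON.stringify(err);        // '{}'

// Hypothetical helper: copy the useful field into a plain object first.
function serializeError(e) {
  return { error: e && e.message ? e.message : String(e) };
}

const body = JSON.stringify(serializeError(err));

// Hypothetical express-style error handler using the helper.
// `res.status()` and `res.json()` are standard express response methods.
function errorHandler(e, req, res, next) {
  res.status(400).json(serializeError(e));
}

console.log(naive); // {}
console.log(body);  // {"error":"example failure"}
```

With a handler along these lines the 400 response would carry the message in its body rather than only on stdout.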
|
|
Demo issues fixed in 6a5e084 and rebased to all 3 branches.
|
I just did a quick rebase of this now that #244 is merged. We can continue to test this out and hopefully merge it soon!
|
Well, it's been a while for this one, but after some of the cleanup and maintenance work we've been doing in the interpolation repository, I took another look at this. With all the better-sqlite3 stuff long since merged, this PR is a lot simpler now. I've rebased it on top of the latest changes from the The development workflow for updating the mock libpostal responses went smoothly (set up libpostal and configure the Maybe we take a look at getting this merged, since it's been quite helpful for avoiding the extra memory usage of having libpostal data loaded twice. My current questions are:
|
|
I agree it would be great to merge this functionality, since it provides a lot of flexibility going forward. Which adapter should be the default (HTTP or NPM module) is yet to be decided. One of the complexities here is that this PR couples the adapter pattern code with a change of the default adapter from NPM module to HTTP. While I'm not totally against that, the HTTP and NPM module adapters both have pros and cons, and neither is clearly superior to the other. I just wanted to note that it's two changes in one, which is making this harder to merge. Given the unknowns around performance (and consequently build times), it might be better to merge the code changes without changing the default adapter and then measure the performance of the different methods. Alternatively, we can use the existing code to measure the performance of a full planet build using HTTP and compare it to the NPM module. That should give us more confidence that changing the default adapter is the right choice, despite the additional complexity of network service discovery and I/O. It might be worth running that build twice, since changes to how the
this test is only applicable when libpostal is installed locally
|
I rebased this again today. Agreed, it would be nice to finally get it merged in; I'll try to get to it this week. @orangejulius any additional thoughts/concerns I should be aware of before I get it ready for merging?
|
Yeah, as I recall there are at least a couple reasons we held off merging it. They aren't insurmountable but they were enough that it was too much of a pain to actually merge all the way through.

Impact on Pelias Docker

Either we allow the interpolation service to continue using local libpostal in memory, or we need to set up some new steps similar to

Impact on interpolation builds

Interpolation builds are already slow enough, and as I recall they make a call to libpostal for every line in the address files read during the build. The overhead of HTTP calls probably would make it even slower, which we may not want.

Interpolation Docker image considerations

Finally, as it stands now this PR changes the interpolation Docker image so that it does not depend on the That's technically ideal from an efficiency standpoint if we know we don't need to load libpostal in memory. However, it reduces flexibility, and due to the nature of how the layers of Docker images work, it probably doesn't affect much in practice: as long as the same physical machine downloads both the

Path Forward

Thinking about it now, the easiest way to merge this PR might be to adjust it so that by default, the interpolation service uses libpostal in memory as it has in the master branch for all time. I think you suggested that in some earlier comments here. We would need some config value to tell the interpolation service to prefer using the remote libpostal service in some situations. Right now, this PR just looks for the Sidenote: this PR started the convention of using the root level We should think about options for what that config value is. It could be a simple boolean like
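To make the "in memory by default, remote on request" idea concrete, here is a minimal sketch of how adapter selection could look. Everything named here is an assumption for illustration: the `libpostal.useRemote` and `libpostal.url` config keys, the `/parse?address=` path, and the injected `loadParser` and `fetchJson` helpers are hypothetical and are not taken from this PR.

```javascript
// In-memory adapter: loads the parser lazily on first use, so a process that
// ends up on the HTTP path never loads libpostal data into memory.
function createInMemoryAdapter(loadParser) {
  let parse = null;
  return {
    name: 'in-memory',
    async parseAddress(address) {
      if (!parse) { parse = loadParser(); } // hypothetical loader, returns a function
      return parse(address);
    }
  };
}

// HTTP adapter: delegates parsing to a remote service.
// The URL layout is an assumption, not a documented endpoint.
function createHttpAdapter(baseUrl, fetchJson) {
  return {
    name: 'http',
    async parseAddress(address) {
      const url = `${baseUrl}/parse?address=${encodeURIComponent(address)}`;
      return fetchJson(url); // hypothetical helper, resolves to parsed JSON
    }
  };
}

// Pick an adapter from config. In memory stays the default; the remote
// service is used only when explicitly requested (hypothetical keys).
function selectAdapter(config, deps) {
  const opts = (config && config.libpostal) || {};
  if (opts.useRemote === true) {
    if (!opts.url) {
      throw new Error('libpostal.url is required when libpostal.useRemote is true');
    }
    return createHttpAdapter(opts.url, deps.fetchJson);
  }
  return createInMemoryAdapter(deps.loadParser);
}
```

Because both adapters expose the same `parseAddress` method, callers in the build and in the API would not need to know which one is active. Injecting `loadParser` and `fetchJson` also keeps the selection logic testable without libpostal installed locally.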

this code is branched off #244, please merge that PR first!
▶️ see only the diff ◀️
this draft PR is all the best bits of #146 & #240 rebased on top of #244
See #146 for more info on the functionality and how to configure the service.
closes #146
closes #240