Skip to content

add support for libpostal over http service, adapter pattern et al.#249

Draft
missinglink wants to merge 4 commits intomasterfrom
better-sqlite3-v2-plus-libpostal-service
Draft

add support for libpostal over http service, adapter pattern et al.#249
missinglink wants to merge 4 commits intomasterfrom
better-sqlite3-v2-plus-libpostal-service

Conversation

@missinglink
Copy link
Member

@missinglink missinglink commented Mar 17, 2020

this code is branched off #244, please merge that PR first!
▶️ see only the diff ◀️

this draft PR is all the best bits of #146 & #240 rebased on top of #244

See #146 for more info on the functionality and how to configure the service.

closes #146
closes #240

@orangejulius
Copy link
Member

orangejulius commented Mar 19, 2020

Okay, I noticed that the demo didn't work with this PR, so I tried to figure out why. It was pretty difficult and actually exposed an issue in the error handling for the server.

I was using Portland Metro data, and was using the demo to explore, which generated the following query:

http://localhost:4300/extract/geojson?lat=45.537858113762816&lon=-122.62214183807374&names=ne%20tillamook%20saint

This URL returns a 400 error, but also an empty JSON response, as shown below:

julian@rockaway ~/repos/pelias/interpolation $ curl -I "http://localhost:4300/extract/geojson?lat=45.537858113762816&lon=-122.62214183807374&names=ne%20tillamook%20saint"
HTTP/1.1 400 Bad Request
X-Powered-By: Express
Content-Type: application/json; charset=utf-8
Content-Length: 2
ETag: W/"2-vyGp6PvFo4RvsFtPoIWeCReyIC8"
Date: Thu, 19 Mar 2020 19:49:27 GMT
Connection: keep-alive

julian@rockaway ~/repos/pelias/interpolation $ curl "http://localhost:4300/extract/geojson?lat=45.537858113762816&lon=-122.62214183807374&names=ne%20tillamook%20saint"
{}

Diving deper, I took a look at the relevant express code:

conn.extract.query( point, names, function( err, data ){
if( err ){ return res.status(400).json( err ); }
if( !data ){ return res.status(200).json({}); }
res.json( pretty.geojson( data ) );
});

This code doesn't properly print the error to the response body, as JSON.stringify(err) ends up with an empty object.

Printing the error to stdout reveals the following:

interpolation_1  | SqliteError: near "/": syntax error
interpolation_1  |     at DynamicQueryCache.getStatement (/code/pelias/interpolation/query/DynamicQueryCache.js:70:28)
interpolation_1  |     at Object.module.exports [as extract] (/code/pelias/interpolation/query/extract.js:36:22)            
interpolation_1  |     at Object.q [as query] (/code/pelias/interpolation/api/extract.js:35:26)
interpolation_1  |     at processTicksAndRejections (internal/process/task_queues.js:94:5)

So there are two issues here:

  1. Higher priority: fix the error messaging for all endpoints.
  2. Improve this PR so it doesn't cause errors.

@missinglink missinglink force-pushed the better-sqlite3-v2-plus-libpostal-service branch from 705290c to 9db959d Compare March 20, 2020 14:58
@missinglink
Copy link
Member Author

missinglink commented Mar 20, 2020

Demo issues fixed in 6a5e084 and rebased to all 3 branches.

@missinglink missinglink force-pushed the better-sqlite3-v2-plus-libpostal-service branch 2 times, most recently from 449d1b4 to 538bc3c Compare March 20, 2020 15:18
@missinglink
Copy link
Member Author

Screenshot 2020-03-20 at 15 53 55

@orangejulius orangejulius force-pushed the better-sqlite3-v2-plus-libpostal-service branch from 538bc3c to de9206b Compare March 30, 2020 15:25
@orangejulius
Copy link
Member

I just did a quick rebase of this now that #244 is merged. We can continue to test this out and hopefully merge it soon!

@orangejulius
Copy link
Member

Well its been a while for this one but after some of the cleanup and maintenance work we've been doing in the interpolation repository, I took another look at this.

With all the better-sqlite3 stuff long since merged, this PR is a lot simpler now. I've rebased it on top of the latest changes from the master branch and it was all straightforward.

The development workflow for updating the mock libpostal responses went smoothly (set up libpostal and configure the service entry in pelias.json, run SEED_MOCK_LIBPOSTAL=true npm run ci, tests shoudl pass after that), and everything else seems to still work.

Maybe we take a look at getting this merged, since it's been quite helpful for avoiding the extra memory usage of having libpostal data loaded twice.

My current questions are:

  • Will an interpolation build in the docker projects still work? Will it run a lot slower using a remote libpostal service?
  • What should we do around messaging that a new configuration section is now recommended? Maybe add some deprecation notices to pelias/config? We'd want the Pelias API to start preferring the new services section too.
  • Should we add a pelias libpostal wait command or something similar to the docker CLI, and update the documentation on when to use it? Without something like that, running pelias compose up libpostal; pelias prepare interpolation might fail as the libpostal service takes a few seconds to start (though nothing like elasticsearch)

@missinglink
Copy link
Member Author

missinglink commented Nov 8, 2021

I agree it would be great to merge this functionality, since it provides a lot of flexibility going-forward.

Regarding which adapter is appropriate for default => (HTTP | NPM) is yet to be decided.

One of the complexities here is that this PR couples the adapter pattern code with a change to the default adapter to change it from NPM Module => HTTP.

So.. while I'm not totally against that, the HTTP and NPM Module adapters both have pros and cons, neither is a clearly superior method to the other, but I just wanted to note that its two changes in one, which is making this harder to merge.

Coupled with the unknown topic of performance (and consequently) build times it might be better to merge the code changes without changing the default adapter and then measuring the performance of different methods?

Alternatively we can use the existing code to measure the performance of a full planet build using HTTP and compare it to NPM Module which should give us some more confidence that changing the default adapter is the right choice, despite the additional complexity of network service discovery and I/O.

It might be worth running that build twice, since changes to how the NPM method is used in this PR vs. prior may also affect the performance of build times using the NPM method.

@missinglink missinglink force-pushed the better-sqlite3-v2-plus-libpostal-service branch from d46a824 to 77f8209 Compare March 11, 2025 14:12
@missinglink
Copy link
Member Author

I rebased this again today, agreed it would be nice to finally get it merged in, I'll try to get to it this week.

@orangejulius any additional thoughts/concerns I should be aware of before I get it ready for merging?

@orangejulius
Copy link
Member

Yeah, as I recall there are at least a couple reasons we held off merging it. They aren't insurmountable but they were enough that it was too much of a pain to actually merge all the way through.

Impact on Pelias Docker

Either we allow the interpolation service to continue using local libpostal in memory, or we need to set up some new steps similar to pelias elastic wait so that the libpostal service is guaranteed to be up and running before the interpolation service starts

Impact on interpolation builds

Interpolation builds are already slow enough, and as I recall they make a call to libpostal for every line in the address files read during the build. The overhead of HTTP calls probably would make it even slower, which we may not want

Interpolation Docker image considerations

Finally, as it stands now this PR changes the interpolation Docker image so that it does not depend on the libpostal-baseimage. That means that the docker images generated by this PR are incapable of loading libpostal in memory.

That's technically ideal from an efficiency standpoint if we know we don't need to load libpostal in memory. However, it reduces flexibility, and due to the nature of how the layers of Docker images work, it probably doesn't affect much in practice: as long as the same physical machine downloads both the libpostal-service and interpolation docker images, all the libpostal data is downloaded and using the same amount of disk space anyway. I think that situation will be the case in general, and certainly for us at Geocode Earth.

Path Forward

Thinking about it now, the easiest way to merge this PR might be to adjust it so that by default, the interpolation service uses libpostal in memory as it has in the master branch for all time. I think you suggested that in some earlier comments here. We would need some config value to tell the interpolation service to prefer using the remote libpostal service in some situations.

Right now, this PR just looks for the api.services.libpostal config option and requires it to be present and point to a valid running instance of the libpostal service. However, that config option will almost always be present since the API always needs the libpostal service for full functionality.

Sidenote: this PR started the convention of using the root level services property for configuration for the libpostal service and others in the future. This kinda makes sense in a world where more services have network dependencies than just the API, but it doesn't really make much difference at the moment.

We should think about options for what that config value is, it could be simple boolean like interpolation.use_libpostal_service: true, or maybe a string with multiple options: interpolation.libpostal_source: "memory/service/something else we have yet to invent"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants