Scraper: OSM requests sometimes fail with Gateway Timeout #35

@aardjon

Sometimes an Overpass request fails with the following output (from version 0.2.0):

[2026-03-13 19:38:27,107][INFO][trad.kernel.usecases] Executing filter 'OpenStreetMap' (stage 0)
[2026-03-13 19:38:27,108][DEBUG][trad.application.filters.source.osm] 'OpenStreetMap' filter started
[2026-03-13 19:38:27,108][DEBUG][trad.application.filters.source.osm] Retrieving ID for area 'Sächsische Schweiz' from Nominatim
[2026-03-13 19:38:27,351][DEBUG][trad.application.filters.source.osm] Querying elements within area ID 202443898 from Overpass, using filter {<_OsmObjectTypes.node: 'node'>: {'natural': 'peak', 'sport': 'climbing', 'climbing:trad': 'yes'}, <_OsmObjectTypes.relation: 'relation'>: {'type': 'site', 'climbing': 'crag', 'sport': 'climbing', 'climbing:trad': 'yes'}}
[2026-03-13 19:38:35,455][DEBUG][trad.application.filters.source.osm] Retrieved 1139 OSM elements
[2026-03-13 19:38:35,457][DEBUG][trad.application.filters.source.osm] Querying 135 IDs from Overpass, using filter {'natural': 'peak'}
Traceback (most recent call last):
  File "/home/aardjon/src/trad/scraper/src/trad/application/filters/source/osm.py", line 478, in retrieve_nodes_by_ids
    json_data = self._http_boundary.retrieve_json_resource(
        url=self._OVERPASS_API_ENDPOINT,
        url_params={},
        query_content=f"data={query}",
    )
  File "/home/aardjon/src/trad/scraper/src/trad/infrastructure/http_recorder.py", line 58, in retrieve_json_resource
    content = self._delegate.retrieve_json_resource(url, url_params, query_content)
  File "/home/aardjon/src/trad/scraper/src/trad/infrastructure/requests.py", line 59, in retrieve_json_resource
    response = self._retrieve_resource(
        url=url,
    ...<2 lines>...
        query_content=query_content,
    )
  File "/home/aardjon/src/trad/scraper/src/trad/infrastructure/requests.py", line 90, in _retrieve_resource
    raise HttpRequestError(f"HTTP error {response.status_code}: {response.reason}")
trad.application.boundaries.http.HttpRequestError: HTTP error 504: Gateway Timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aardjon/src/trad/scraper/src/scraper.py", line 6, in <module>
    main()
    ~~~~^^
  File "/home/aardjon/src/trad/scraper/src/trad/main.py", line 45, in main
    bootstrap.run_application()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/aardjon/src/trad/scraper/src/trad/main.py", line 126, in run_application
    usecase.produce_routedb()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/aardjon/src/trad/scraper/src/trad/kernel/usecases.py", line 38, in produce_routedb
    self.__run_filters_of_stage(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        stage_idx,
        ^^^^^^^^^^
    ...<2 lines>...
        output_pipe=pipe,
        ^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/aardjon/src/trad/scraper/src/trad/kernel/usecases.py", line 60, in __run_filters_of_stage
    current_filter.execute_filter(input_pipe, output_pipe)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aardjon/src/trad/scraper/src/trad/application/filters/_base.py", line 20, in execute_filter
    self._execute_source_filter(output_pipe)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aardjon/src/trad/scraper/src/trad/application/filters/source/osm.py", line 209, in _execute_source_filter
    self.__retrieve_missing_nodes(osm_nodes, osm_relations)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aardjon/src/trad/scraper/src/trad/application/filters/source/osm.py", line 279, in __retrieve_missing_nodes
    for node in self._osm_api_receiver.retrieve_nodes_by_ids(
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        osm_ids=missing_nodes,
        ^^^^^^^^^^^^^^^^^^^^^^
        node_filter=self._OVERPASS_PEAK_NODE_TAGS,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/aardjon/src/trad/scraper/src/trad/application/filters/source/osm.py", line 484, in retrieve_nodes_by_ids
    raise DataRetrievalError("Overpass request failed") from e
trad.kernel.errors.DataRetrievalError: Overpass request failed

I don't think the scraper exceeds the Overpass API request limits. The request often succeeds after a few tries, so the cause may be heavy load on the OSM servers. It may also be caused by too short a delay between consecutive requests, for example.
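Since a retry often succeeds, one possible mitigation is to retry the failing request with a growing pause in between. The following is only a minimal sketch of that idea, not the scraper's actual code: `TransientRequestError` and `retry_with_backoff` are hypothetical names, and the attempt count and delays are placeholder values that would need tuning.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientRequestError(Exception):
    """Hypothetical stand-in for a retryable failure such as HTTP 504."""


def retry_with_backoff(
    request_fn: Callable[[], T],
    max_attempts: int = 4,        # placeholder value, not measured
    base_delay_s: float = 2.0,    # placeholder value, not measured
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request_fn, retrying on TransientRequestError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientRequestError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller see the last failure.
                raise
            # Wait 2 s, 4 s, 8 s, ... before the next attempt.
            sleep(base_delay_s * 2**attempt)
    raise ValueError("max_attempts must be at least 1")
```

In the scraper, the place where `HttpRequestError` is raised for a 504 would have to be mapped to the retryable error. The `sleep` parameter is injectable only so that the delay can be replaced in tests.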

There are two timeout parameters: the HTTP timeout and the Overpass timeout. Please experiment with both to find a combination that works reliably.
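To make the two parameters concrete: the Overpass timeout is set inside the query through the `[timeout:...]` setting of Overpass QL, while the HTTP timeout is configured on the client side. The sketch below assumes that the HTTP timeout should be somewhat larger than the Overpass one, so the client does not give up before the server can answer; that rule and the helper names `build_overpass_query` and `http_timeout_for` are assumptions for illustration, not the scraper's existing API.

```python
from typing import Iterable, Mapping


def build_overpass_query(
    node_ids: Iterable[int],
    tags: Mapping[str, str],
    overpass_timeout_s: int,
) -> str:
    """Build an Overpass QL query for nodes by ID with a server-side timeout."""
    id_list = ",".join(str(node_id) for node_id in node_ids)
    tag_filter = "".join(f'["{key}"="{value}"]' for key, value in tags.items())
    # [timeout:N] is the Overpass-side limit, in seconds, for this query.
    return (
        f"[out:json][timeout:{overpass_timeout_s}];"
        f"node(id:{id_list}){tag_filter};"
        "out body;"
    )


def http_timeout_for(overpass_timeout_s: int, margin_s: int = 10) -> int:
    """Derive the client-side HTTP timeout (assumed rule: server limit plus a margin)."""
    return overpass_timeout_s + margin_s
```

For example, `build_overpass_query([1, 2], {"natural": "peak"}, 60)` yields a query limited to 60 seconds on the Overpass side, and `http_timeout_for(60)` would then be passed as the timeout of the HTTP client. Note that a 504 is typically produced by a gateway in front of the server, so it may occur regardless of either value.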

Labels: bug (Something isn't working)
