Skip to content

Added fall-back to address for missing numbers#128

Open
RobertoRoos wants to merge 5 commits intowtfloris:masterfrom
RobertoRoos:127-numbers
Open

Added fall-back to address for missing numbers#128
RobertoRoos wants to merge 5 commits intowtfloris:masterfrom
RobertoRoos:127-numbers

Conversation

@RobertoRoos
Copy link

Resolves #127

This solution isn't terribly neat, patching the address like this. Better would be some kind of identifier column in the database, but to keep the changes to a minimum I think this is acceptable.

@wtfloris what do you think?

@RobertoRoos RobertoRoos marked this pull request as ready for review March 11, 2026 08:48
@wtfloris
Copy link
Owner

Thanks for the PR! Some comments:

Why do we need this change for other parsers than Pararius? I'd argue we only need it for websites that don't have house numbers.

I also strongly disagree with generalizing some parsing functions (into _marshal_address in this case): the entire point of parsers being completely independent units is that you can safely work on them without having to worry about some dependencies or unexpected effects. Especially since websites will change from time to time and parsers need modifications, you're shooting yourself in the foot complexity-wise if you introduce these cross-parser functions. To visualize:

A -> Fa(A) -> Ra
B -> Fb(B) -> Rb
C -> Fc(C) -> Rc

Now, if C changes, you only modify Fc and check if Rc is correct again.

If you do:

  A
   \
B - Fx(X) -> Rx
   /
  C

Sure, you cut down on some code duplication but exponentially increase the amount of complexity in Fx, and the amount of testing and risk if just one input changes.

As a concrete example from the PR, this is Pararius-specific:

        # Many titles are prefixed with a property type; strip when present
        parts = address_raw.split()
        ignore = {"appartement", "huis", "studio", "kamer", "woning", "woonhuis"}
        if parts and parts[0].lower() in ignore:
            address = " ".join(parts[1:]).strip()
        else:
            address = address_raw.strip()

It makes no sense to have that code anywhere but the Pararius parser: it adds nothing of value to other parsers and only has the potential to break them.

That argumentation was a bit longer than planned but I hope to provide some insight into why (I think) sometimes duplicated code is not a bad thing, especially if it reduces complexity by a substantial amount.

@RobertoRoos
Copy link
Author

Why do we need this change for other parsers than Pararius? I'd argue we only need it for websites that don't have house numbers.

Do we know for a fact that all other agencies always supply numbers? It seems to me that in all cases when the number is missing we want to improvise something.

Like you said, mentioning anything should always be better than mentioning nothing.

More centralized data handling is it's own discussion, my bad for mixing up the two. See: #129 (comment)

@RobertoRoos
Copy link
Author

RobertoRoos commented Mar 17, 2026

@wtfloris I reverted all divisive changes. It's really just Pararius that now uses a fallback number. I'd love to have this merged soon, as it's actively limiting my search personally now.

I think this is worth expanding on together with #129 .

The version as it was before can still be found at https://github.com/RobertoRoos/hestia/tree/127-numbers-universal

@wtfloris
Copy link
Owner

I'll have some time to test it this weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Listings without a full address (without the number) are ignored

2 participants