read functions should return insightful errors #342

@tlukkezen

Description

The read_cpt and read_bore functions have some "automagical" logic that infers the content of the file argument. The user can provide an object of type io.BytesIO, Path, or str, and with engine="auto" the content type is inferred automatically. This can result in confusing errors when erroneous input is provided.

Some examples:

Providing a non-existing path results in XMLSyntaxError

Input:

from pygef import read_cpt
read_cpt(file="non/existing/file.gef")

Response:

lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

The expectation is to get a FileNotFoundError

Providing a non-existing path and engine="gef" results in ValueError

Input:

from pygef import read_cpt
read_cpt(file="non/existing/file.gef", engine="gef")

Response:

ValueError: The selected gef file is not a cpt. Check the REPORTCODE or the PROCEDURECODE.

The expectation is to get a FileNotFoundError
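
Both cases above could be caught before any parser runs. A minimal sketch of such a check follows. The helper name load_bytes is hypothetical and not part of pygef, and the sketch assumes that a str argument is always meant as a path, never as inline file content:

```python
import io
from pathlib import Path


def load_bytes(file):
    """Hypothetical helper (not part of pygef): return the raw bytes of `file`.

    Accepts the same argument types as described in the issue:
    io.BytesIO, Path, or str.
    """
    if isinstance(file, io.BytesIO):
        # In-memory input: nothing to look up on disk.
        return file.getvalue()

    # Assumption: a str is always a path, never inline file content.
    path = Path(file)
    if not path.is_file():
        # Fail early with the error the user expects, before any parsing.
        raise FileNotFoundError(f"No such file: {path}")
    return path.read_bytes()
```

With a check like this in front of the parsers, the missing file would be reported the same way regardless of the value of engine.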

Providing an erroneous gef file results in XMLSyntaxError when engine is not specified, while the same file is parsed when engine="gef" is forced

Input:

from pygef import read_cpt
read_cpt(file="path/to/erroneous.GEF")

Response:

lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

Input:

from pygef import read_cpt
read_cpt(file="path/to/erroneous.GEF", engine="gef")

Response:

CPTData(bro_id=None, research_report_date=None, ...

The expectation is to get an error stating that the gef file is invalid, and this response should be the same regardless of the value of engine.
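
One way to get the same response for every engine value is to inspect the content once and validate the requested engine against it. The sketch below is only an illustration: detect_format and resolve_engine are hypothetical names, "xml" as an engine value is an assumption, and so is the rule that a GEF file starts with a #GEFID header while an XML document starts with '<'. The issue does not say what makes the example file invalid, so this shows the consistency aspect only, not a full GEF validation:

```python
UTF8_BOM = b"\xef\xbb\xbf"


def detect_format(data: bytes) -> str:
    """Hypothetical helper: guess "xml" or "gef" from the first bytes."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    head = data.lstrip()
    if head.startswith(b"<"):
        return "xml"
    # Assumption: a well-formed GEF file begins with a #GEFID header line.
    if head.startswith(b"#GEFID"):
        return "gef"
    raise ValueError(
        "Content is neither XML nor GEF: expected it to start with '<' or '#GEFID'."
    )


def resolve_engine(data: bytes, engine: str = "auto") -> str:
    """Hypothetical helper: return the engine to use, or raise a clear error.

    Detection always runs first, so invalid content raises the same
    ValueError whether engine is "auto" or set explicitly.
    """
    detected = detect_format(data)
    if engine not in ("auto", detected):
        raise ValueError(
            f"engine={engine!r} was requested, but the content looks like {detected!r}."
        )
    return detected
```

Because detection happens before the engine argument is considered, unrecognisable content fails with the same message whichever engine was requested.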

Labels: bug
