Description
UPDATE 2025-10-10:
The fix for libxml2 issue 715 was released in libxml2 2.15.0.
The latest lxml release, 6.0.2, uses libxml2 2.14.6, so the fix has not yet reached lxml users.
import sys
from lxml import etree

# Collect the interpreter and library versions relevant to this report.
consts = [
    ('Python', '.'.join(str(part) for part in sys.version_info[:3])),
    ('lxml.etree', etree.LXML_VERSION),
    ('libxml used', etree.LIBXML_VERSION),
    ('libxml compiled', etree.LIBXML_COMPILED_VERSION),
    ('libxslt used', etree.LIBXSLT_VERSION),
    ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION),
]
for name, value in consts:
    print(f"{name:<20s}: {value}")
Python : 3.12.11
lxml.etree : (6, 0, 2, 0)
libxml used : (2, 14, 6)
libxml compiled : (2, 14, 6)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
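Because the fix shipped in libxml2 2.15.0, a runtime check along these lines could tell whether the libxml2 in use already contains it. This is only a sketch: the helper name `has_getpath_fix` is made up for illustration, and it assumes the fix was not backported to any earlier release.

```python
from lxml import etree

# First libxml2 release containing the fix for issue 715 (per the update above).
FIXED_IN = (2, 15, 0)


def has_getpath_fix(libxml_version=etree.LIBXML_VERSION):
    """Return True if the given libxml2 version tuple is at least FIXED_IN."""
    return tuple(libxml_version) >= FIXED_IN


print("getpath() fix present:", has_getpath_fix())
```

With the versions listed above (libxml2 2.14.6), this check reports that the fix is not present.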
UPDATE 2025-07-19:
libxml2 issue 715 is fixed upstream, but it may take some time until the fix is released.
UPDATE:
Reported as a libxml2 bug, since xmllint shows the same behavior.
https://gitlab.gnome.org/GNOME/libxml2/-/issues/715
Long element names may cause some duplicate and count errors while parsing
ERROR: duplicated path: /xbrli:xbrl/in-bse-cg:AggregateValueOfSecurityProvidedDuringSixMonthsOfSecurityInConnectionWithLoanOrAnyOtherD
...
ERROR: 0 elements found with /xbrli:xbrl/in-bse-cg:AggregateAmountAdvancedDuringSixMonthsOfAnyLoanOrAnyOtherFormOfDebtToPromoterOrAnyOtherE xpath expression.
Original xpath: /xbrli:xbrl/in-bse-cg:AggregateAmountAdvancedDuringSixMonthsOfAnyLoanOrAnyOtherFormOfDebtToPromoterOrAnyOtherE
Cause:
tree.getpath(ele) truncates long element names, so the returned path is cut short (at 110 characters in the case above).
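Possible workaround:
tree.getelementpath(ele), which returns the complete name in Clark notation ({namespace}tag). The result is an ElementPath expression for find(), not an XPath expression.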
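A minimal sketch of that workaround, assuming the lookup can go through find() instead of xpath(). The document, the 151-character element name, and the helper `lookup_via_elementpath` are invented for illustration and are not taken from the affected XBRL file.

```python
from lxml import etree

NS = "http://example.com/ns"
# Hypothetical element name, 151 characters long, standing in for a long XBRL tag.
LONG_NAME = "a" + "0123456789" * 15


def lookup_via_elementpath(tree, element):
    """Round-trip an element through getelementpath() and find()."""
    element_path = tree.getelementpath(element)
    return element_path, tree.find(element_path)


xml = f'<root xmlns:ns="{NS}"><ns:{LONG_NAME}>1</ns:{LONG_NAME}></root>'
tree = etree.ElementTree(etree.fromstring(xml))
child = tree.getroot()[0]

element_path, found = lookup_via_elementpath(tree, child)
print(LONG_NAME in element_path)  # the full name survives in the ElementPath
print(found.text)                 # text of the element that find() located
```

The ElementPath string is relative to the root element and only works with find(), findall() and similar methods, so code that feeds paths to xpath() would need to be adapted.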
Reproduce
Given an XML file (tmp.xml) containing a namespaced element with a very long name:
<root xmlns:ns="http://example.com/ns">
<ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x>1</ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x>
</root>
>>> from lxml import etree
>>> xtree = etree.parse('tmp.xml')
>>> root = xtree.getroot()
Truncated path from tree.getpath():
>>> xtree.getpath(root.getchildren()[0])
'/root/ns:a0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123'
Complete path from tree.getelementpath():
>>> xtree.getelementpath(root.getchildren()[0])
'{http://example.com/ns}a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x'
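Where a real XPath expression is needed rather than an ElementPath one, the path can be assembled by walking the ancestors instead of calling getpath(). The following is a sketch under stated assumptions, not what lxml or libxml2 do internally: `full_xpath` is a hypothetical helper, it assumes every node on the ancestor chain is a regular element (no comments or processing instructions), and it assumes namespace URIs contain no apostrophes.

```python
from lxml import etree


def full_xpath(element):
    """Build an absolute XPath for `element` without calling getpath()."""
    steps = []
    node = element
    while node is not None:
        parent = node.getparent()
        qname = etree.QName(node)
        if parent is None:
            position = 1
        else:
            # 1-based position among siblings that share the same qualified tag.
            same_tag = list(parent.iterchildren(node.tag))
            position = same_tag.index(node) + 1
        # Match on local name and namespace URI so no prefix mapping is required.
        steps.append(
            f"*[local-name()='{qname.localname}' "
            f"and namespace-uri()='{qname.namespace or ''}'][{position}]"
        )
        node = parent
    return "/" + "/".join(reversed(steps))


# Hypothetical document with a 151-character element name.
long_name = "a" + "0123456789" * 15
xml = (
    '<root xmlns:ns="http://example.com/ns">'
    f"<ns:{long_name}>1</ns:{long_name}>"
    "</root>"
)
tree = etree.ElementTree(etree.fromstring(xml))
child = tree.getroot()[0]

path = full_xpath(child)
matches = tree.xpath(path)
print(len(matches), matches[0].text)
```

The generated paths are more verbose than getpath() output, but they need no namespace prefix map when passed to xpath().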
While trying to report this as an lxml bug, I found that it is actually a libxml2 bug, since xmllint shows the same behavior:
xmllint --shell tmp.xml
/ > du
/
root
ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x
/ > setrootns
/ > setns default=
/ > setns ns=http://example.com/ns
/ > whereis //ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x
/root/ns:a0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123