Closed
90 commits
a5c52d2
Fix freeing of the content model by making use of XML_FreeContentModel
hartwork Sep 10, 2025
93f1436
Merge pull request #106 from hartwork/fix-freeing-of-the-content-model
toddr Mar 14, 2026
2b25a4c
Support standard LIBS and INC options in Makefile.PL
toddr-bot Mar 16, 2026
5361c2b
Fix buffer overflow in parse_stream when filehandle has :utf8 layer
toddr-bot Mar 16, 2026
2ef086b
docs: add ERROR HANDLING section and improve parse error documentation
toddr-bot Mar 16, 2026
9ec4a9a
Add hint about unescaped characters for invalid token errors
toddr-bot Mar 16, 2026
11509e9
fix: parameter entity references in internal DTD subset no longer bre…
toddr-bot Mar 16, 2026
d50dcf0
Skip external DTD tests when expat lacks parameter entity support
toddr-bot Mar 16, 2026
0bcecfb
Merge pull request #114 from toddr-bot/koan.toddr.bot/fix-issue-51
toddr Mar 16, 2026
f096d4e
Merge pull request #112 from toddr-bot/koan.toddr.bot/fix-issue-54
toddr Mar 16, 2026
de1d6d4
Merge pull request #111 from toddr-bot/koan.toddr.bot/fix-issue-55
toddr Mar 16, 2026
6b291f4
Merge pull request #109 from toddr-bot/koan.toddr.bot/fix-issue-64
toddr Mar 16, 2026
89a7d8f
Merge pull request #108 from toddr-bot/koan.toddr.bot/fix-issue-65
toddr Mar 16, 2026
c1fde00
Add current_length method to XML::Parser::Expat
toddr-bot Mar 16, 2026
b2dc9ba
Merge pull request #113 from toddr-bot/koan.toddr.bot/fix-issue-53
toddr Mar 16, 2026
1f75267
fix: prevent current_byte overflow for large XML files on 32-bit perl
toddr-bot Mar 16, 2026
ba8a01c
Merge pull request #116 from toddr-bot/koan.toddr.bot/fix-issue-49
toddr Mar 16, 2026
51afb71
Merge pull request #117 from toddr-bot/koan.toddr.bot/fix-issue-48
toddr Mar 16, 2026
70ad8fe
fix: route character data after root element to Char handler
toddr-bot Mar 16, 2026
8453c86
Merge pull request #118 from toddr-bot/koan.toddr.bot/fix-issue-47
toddr Mar 16, 2026
23c9895
feat: add UseForeignDTD option for documents without DOCTYPE
toddr-bot Mar 16, 2026
af9daf0
Merge pull request #119 from toddr-bot/koan.toddr.bot/fix-issue-46
toddr Mar 16, 2026
4c9f902
fix: handle lexical filehandles in ExternEnt handler return values
toddr-bot Mar 16, 2026
f0d44c3
fix: escape all occurrences of quote characters in xml_escape
toddr-bot Mar 16, 2026
08dd37c
fix: off-by-one heap buffer overflow in st_serial_stack growth check
toddr-bot Mar 16, 2026
3eb9cc9
Merge pull request #122 from toddr-bot/koan.toddr.bot/fix-issue-39
toddr Mar 16, 2026
5b7285e
Merge pull request #121 from toddr-bot/koan.toddr.bot/fix-issue-41
toddr Mar 16, 2026
fd8e385
Merge pull request #120 from toddr-bot/koan.toddr.bot/fix-issue-44
toddr Mar 16, 2026
3fd68e7
fix: prevent position overflow for large files in line/column/error p…
toddr-bot Mar 16, 2026
c610474
Merge pull request #124 from toddr-bot/koan.toddr.bot/fix-issue-36
toddr Mar 16, 2026
4718cde
test: add encoding tests for windows-1251, koi8-r, windows-1255, and …
toddr-bot Mar 16, 2026
a4b6a3d
Expose expat security APIs: BillionLaughs and ReparseDeferral
toddr-bot Mar 14, 2026
dd74063
rebase: apply review feedback on #107
toddr-bot Mar 16, 2026
4501604
fix: propagate xpcroak errors in Subs style instead of swallowing them
toddr-bot Mar 16, 2026
c8eacff
rebase: apply review feedback on #115
toddr-bot Mar 16, 2026
f921291
Merge pull request #126 from toddr-bot/koan.toddr.bot/fix-issue-31
toddr Mar 16, 2026
86d10a1
feat: add G_VOID flag to all void-context perl_call_sv/method/pv calls
toddr-bot Mar 16, 2026
bb69f79
rebase: apply review feedback on #123
toddr-bot Mar 16, 2026
e0067c7
Merge pull request #115 from toddr-bot/koan.toddr.bot/fix-issue-50
toddr Mar 16, 2026
dbe7ccb
Merge pull request #123 from toddr-bot/koan.toddr.bot/fix-issue-38
toddr Mar 16, 2026
ed0529b
fix: set UTF-8 flag on sysid in ExternEnt handler and fix Debug style…
toddr-bot Mar 16, 2026
dee4511
test: add globref and lexical filehandle tests to astress.t
toddr-bot Mar 16, 2026
fa166b0
Merge pull request #128 from toddr-bot/koan.toddr.bot/fix-issue-28
toddr Mar 16, 2026
e391742
Merge pull request #127 from toddr-bot/koan.toddr.bot/fix-issue-30
toddr Mar 16, 2026
6b0acb8
test: add memory leak symtab test and fix astress.t auto-vivification…
toddr-bot Mar 16, 2026
e2684ba
fix: prevent symbol table auto-vivification in Expat::parse (GH#27)
toddr-bot Mar 16, 2026
e0ae221
test: add Debug style multibyte character regression test (GH#25)
toddr-bot Mar 16, 2026
bfa6d50
Merge pull request #130 from toddr-bot/koan.toddr.bot/fix-issue-25
toddr Mar 16, 2026
4eeeb1c
fix: skip -rpath on Mac OS X 10.4 and earlier (GH#103)
toddr-bot Mar 16, 2026
3630dac
fix: clean up MSVC assertlib .obj files on Windows (GH#100)
toddr-bot Mar 17, 2026
a3c198a
fix: improve "Couldn't find your C compiler" error message (GH#90)
toddr-bot Mar 17, 2026
5538a78
fix: use system tmpdir for temp files in Devel::CheckLib (GH#76)
toddr-bot Mar 17, 2026
cbc9b71
docs: document predefined entity expansion in Tree style (GH#74)
toddr-bot Mar 17, 2026
8517e60
test: add XMLDecl standalone value regression tests (GH#73)
toddr-bot Mar 17, 2026
9c28019
fix: XMLDecl handler now returns "yes"/"no" for standalone (GH#73)
toddr-bot Mar 17, 2026
7393470
fix: localize $_ in Style::Stream to avoid read-only modification (GH…
toddr-bot Mar 17, 2026
3b60e3a
test: add parse error context tests for ErrorContext enhancement (GH#70)
toddr-bot Mar 17, 2026
c76e92a
fix: enhance parse exceptions with XML context when ErrorContext is s…
toddr-bot Mar 17, 2026
ff8a2ff
Merge pull request #141 from toddr-bot/koan.toddr.bot/fix-issue-72
toddr Mar 17, 2026
1eef021
fix: move encoding maps from PERL5LIB to File::ShareDir (GH#71)
toddr-bot Mar 17, 2026
501f6cd
rebase: apply review feedback on #142
toddr-bot Mar 17, 2026
3a58d76
Merge pull request #143 from toddr-bot/koan.toddr.bot/fix-issue-70
toddr Mar 17, 2026
2506f4e
Merge pull request #142 from toddr-bot/koan.toddr.bot/fix-issue-71
toddr Mar 17, 2026
aaabe63
fix: add NoLWP to expat capability probes for consistent skip logic (…
toddr-bot Mar 17, 2026
6c409a3
fix: auto-detect multiarch library paths for expat (GH#69)
toddr-bot Mar 17, 2026
dfa6a62
rebase: apply review feedback on #144
toddr-bot Mar 17, 2026
00dddb4
Merge pull request #144 from toddr-bot/koan.toddr.bot/fix-issue-69
toddr Mar 17, 2026
6861a6f
Merge pull request #145 from toddr-bot/koan.toddr.bot/fix-issue-67
toddr Mar 17, 2026
bc9b676
fix: propagate LIBS/INC expat paths to Expat/Makefile.PL (GH#65)
toddr-bot Mar 17, 2026
073e87f
Merge pull request #140 from toddr-bot/koan.toddr.bot/fix-issue-73
toddr Mar 17, 2026
069f349
Merge pull request #146 from toddr-bot/koan.toddr.bot/fix-issue-65
toddr Mar 17, 2026
be7ace1
Merge pull request #139 from toddr-bot/koan.toddr.bot/fix-issue-74
toddr Mar 17, 2026
1ccf239
Merge pull request #138 from toddr-bot/koan.toddr.bot/fix-issue-76
toddr Mar 17, 2026
9301d11
docs: clarify Char handler splitting with example and test (GH#56)
toddr-bot Mar 17, 2026
79ddb9b
Merge pull request #136 from toddr-bot/koan.toddr.bot/fix-issue-90
toddr Mar 17, 2026
4fb35e3
Merge pull request #135 from toddr-bot/koan.toddr.bot/fix-issue-100
toddr Mar 17, 2026
ff846ec
Merge pull request #107 from toddr-bot/koan.toddr.bot/expose-expat-se…
toddr Mar 17, 2026
51fa40a
Merge pull request #132 from toddr-bot/koan.toddr.bot/fix-issue-103
toddr Mar 17, 2026
54a1ff4
test: add NoLWP and LWP-fallback regression tests (GH#101)
toddr-bot Mar 17, 2026
de0b560
fix: make LWP::UserAgent a recommended dependency, not required (GH#101)
toddr-bot Mar 17, 2026
186e5e5
rebase: apply review feedback on #134
toddr-bot Mar 17, 2026
7f248cb
Merge pull request #129 from toddr-bot/koan.toddr.bot/fix-issue-27
toddr Mar 17, 2026
9bafe06
Merge pull request #110 from toddr-bot/koan.toddr.bot/fix-issue-56
toddr Mar 17, 2026
eb1f823
fix: clean up compiler warnings in Expat.xs
toddr-bot Mar 17, 2026
a4cb170
Merge pull request #147 from toddr-bot/koan.toddr.bot/fix-issue-45
toddr Mar 17, 2026
82bf2f1
fix: use pkg-config to auto-detect expat in non-standard locations (G…
toddr-bot Mar 17, 2026
8da8a29
rebase: apply review feedback on #137
toddr-bot Mar 17, 2026
ec60c06
Merge pull request #137 from toddr-bot/koan.toddr.bot/fix-issue-83
toddr Mar 17, 2026
8fb8008
Merge pull request #134 from toddr-bot/koan.toddr.bot/fix-issue-101
toddr Mar 17, 2026
1d6a939
fix: use double quotes for variable interpolation in xpcarp() and set…
toddr-bot Mar 17, 2026
2 changes: 1 addition & 1 deletion .github/workflows/testsuite.yml
@@ -131,7 +131,7 @@ jobs:
- name: perl -V
run: perl -V
- name: Makefile.PL
run: perl Makefile.PL EXPATLIBPATH="C:\strawberry\c\lib" EXPATINCPATH="C:\strawberry\c\include"
run: perl Makefile.PL INC="-IC:\strawberry\c\include" LIBS="-LC:\strawberry\c\lib -lexpat"
- name: make
run: gmake
- name: make test
206 changes: 180 additions & 26 deletions Expat/Expat.pm
@@ -12,19 +12,19 @@ our $VERSION = '2.47';
our ( %Encoding_Table, @Encoding_Path, $have_File_Spec );

use File::Spec ();
use File::ShareDir ();

%Encoding_Table = ();
if ($have_File_Spec) {
@Encoding_Path = (
grep( -d $_,
map( File::Spec->catdir( $_, qw(XML Parser Encodings) ),
@INC ) ),
File::Spec->curdir
);
}
else {
@Encoding_Path = ( grep( -d $_, map( $_ . '/XML/Parser/Encodings', @INC ) ), '.' );
}

my $_share_dir;
eval { $_share_dir = File::ShareDir::dist_dir('XML-Parser') };

@Encoding_Path = (
( defined $_share_dir && -d $_share_dir ? ($_share_dir) : () ),
grep( -d $_,
map( File::Spec->catdir( $_, qw(XML Parser Encodings) ), @INC ) ),
File::Spec->curdir
);

XSLoader::load( 'XML::Parser::Expat', $VERSION );
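
The search-path construction in this hunk can be tried in isolation. The following is a minimal sketch, not the module's code: `build_encoding_path` is a hypothetical helper name, and File::ShareDir is loaded lazily here (unlike the hunk, which loads it unconditionally) so the sketch also runs where that module is not installed.

```perl
use strict;
use warnings;
use File::Spec ();

# Hypothetical helper: prefer the distribution's share directory when
# File::ShareDir is installed and knows the dist, then fall back to
# XML/Parser/Encodings directories under @INC, then the current directory.
sub build_encoding_path {
    my ($dist) = @_;

    my $share_dir;
    eval {
        require File::ShareDir;
        $share_dir = File::ShareDir::dist_dir($dist);
        1;
    };    # any failure simply leaves $share_dir undefined

    return (
        ( defined $share_dir && -d $share_dir ? ($share_dir) : () ),
        ( grep { -d $_ }
              map { File::Spec->catdir( $_, qw(XML Parser Encodings) ) } @INC ),
        File::Spec->curdir,
    );
}

my @path = build_encoding_path('XML-Parser');
print "$_\n" for @path;
```

The current directory is always the last entry, so the list is never empty even when neither the share directory nor any `@INC` subdirectory exists.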

@@ -67,6 +67,21 @@ sub new {
$self, $args{ProtocolEncoding},
$args{Namespaces}
);

if ( defined $args{BillionLaughsAttackProtectionMaximumAmplification} ) {
$self->billion_laughs_attack_protection_maximum_amplification(
$args{BillionLaughsAttackProtectionMaximumAmplification}
);
}
if ( defined $args{BillionLaughsAttackProtectionActivationThreshold} ) {
$self->billion_laughs_attack_protection_activation_threshold(
$args{BillionLaughsAttackProtectionActivationThreshold}
);
}
if ( defined $args{ReparseDeferralEnabled} ) {
$self->reparse_deferral_enabled( $args{ReparseDeferralEnabled} );
}

$self;
}

@@ -77,11 +92,7 @@ sub load_encoding {
$file .= '.enc' unless $file =~ /\.enc$/;
unless ( $file =~ m!^/! ) {
foreach (@Encoding_Path) {
my $tmp = (
$have_File_Spec
? File::Spec->catfile( $_, $file )
: "$_/$file"
);
my $tmp = File::Spec->catfile( $_, $file );
if ( -e $tmp ) {
$file = $tmp;
last;
@@ -115,7 +126,7 @@ sub setHandlers {
while (@handler_pairs) {
my $type = shift @handler_pairs;
my $handler = shift @handler_pairs;
croak 'Handler for $type not a Code ref'
croak "Handler for $type not a Code ref"
unless ( !defined($handler) or !$handler or ref($handler) eq 'CODE' );

my $hndl = $self->{_Setters}->{$type};
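
The quoting change in this hunk matters because Perl interpolates variables only inside double-quoted strings. A short illustration, using a made-up handler type name:

```perl
use strict;
use warnings;

my $type = 'Start';    # illustrative handler type

# Single quotes keep the text literal, so the message shows "$type" verbatim.
my $single = 'Handler for $type not a Code ref';

# Double quotes interpolate, so the message names the offending handler.
my $double = "Handler for $type not a Code ref";

print "$single\n";
print "$double\n";
```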
@@ -148,7 +159,7 @@ sub xpcarp {

my $eclines = $self->{ErrorContext};
my $line = GetCurrentLineNumber( $_[0]->{Parser} );
$message .= ' at line $line';
$message .= " at line $line";
$message .= ":\n" . $self->position_in_context($eclines)
if defined($eclines);
carp $message;
@@ -196,6 +207,13 @@ sub current_byte {
}
}

sub current_length {
my $self = shift;
if ( $self->{_State_} == 1 ) {
return GetCurrentByteCount( $self->{Parser} );
}
}

sub base {
my ( $self, $newbase ) = @_;
my $p = $self->{Parser};
@@ -400,10 +418,10 @@ sub xml_escape {
$text =~ s/>/\&gt;/g;
}
elsif ( $_ eq '"' ) {
$text =~ s/\"/\&quot;/;
$text =~ s/\"/\&quot;/g;
}
elsif ( $_ eq "'" ) {
$text =~ s/\'/\&apos;/;
$text =~ s/\'/\&apos;/g;
}
else {
my $rep = '&#' . sprintf( 'x%X', ord($_) ) . ';';
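
The added `/g` modifier is the whole fix for the quote branches: without it, `s///` stops after the first match. A small standalone demonstration (the sample text is invented for illustration):

```perl
use strict;
use warnings;

my $text = q{say "a" and "b"};

# Without /g only the first quote character is replaced.
( my $first_only = $text ) =~ s/"/&quot;/;

# With /g every occurrence is replaced.
( my $all = $text ) =~ s/"/&quot;/g;

print "$first_only\n";
print "$all\n";
```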
Expand All @@ -426,6 +444,44 @@ sub skip_until {
}
}

################
# Security API methods (require sufficiently recent libexpat)

sub billion_laughs_attack_protection_maximum_amplification {
my ( $self, $factor ) = @_;
croak "Usage: \$parser->billion_laughs_attack_protection_maximum_amplification(\$factor)"
unless defined $factor;
unless ( defined &SetBillionLaughsAttackProtectionMaximumAmplification ) {
croak "SetBillionLaughsAttackProtectionMaximumAmplification not available"
. " (requires libexpat >= 2.4.0 built with XML_DTD)";
}
SetBillionLaughsAttackProtectionMaximumAmplification( $self->{Parser}, $factor );
}

sub billion_laughs_attack_protection_activation_threshold {
my ( $self, $threshold ) = @_;
croak "Usage: \$parser->billion_laughs_attack_protection_activation_threshold(\$threshold)"
unless defined $threshold;
unless ( defined &SetBillionLaughsAttackProtectionActivationThreshold ) {
croak "SetBillionLaughsAttackProtectionActivationThreshold not available"
. " (requires libexpat >= 2.4.0 built with XML_DTD)";
}
SetBillionLaughsAttackProtectionActivationThreshold( $self->{Parser}, $threshold );
}

sub reparse_deferral_enabled {
my ( $self, $enabled ) = @_;
croak "Usage: \$parser->reparse_deferral_enabled(\$enabled)"
unless defined $enabled;
unless ( defined &SetReparseDeferralEnabled ) {
croak "SetReparseDeferralEnabled not available"
. " (requires libexpat >= 2.6.0)";
}
SetReparseDeferralEnabled( $self->{Parser}, $enabled ? 1 : 0 );
}

################

sub release {
my $self = shift;
ParserRelease( $self->{Parser} );
@@ -458,7 +514,19 @@ sub parse {
require IO::Handle;
eval {
no strict 'refs';
$ioref = *{$arg}{IO} if defined *{$arg};
if ( ref $arg eq 'GLOB' ) {

# Glob reference not recognized as IO::Handle
$ioref = *{$arg}{IO};
}
elsif ( $arg =~ /\A[^\W\d]\w*(?:::\w+)*\z/
&& defined *{$arg} )
{
# Bareword filehandle name — only look up if it could be
# a valid Perl identifier, to prevent auto-vivification
# of symbol table entries for XML strings. (GH#27)
$ioref = *{$arg}{IO};
}
};
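
The identifier check above decides whether a string argument is even worth looking up as a bareword filehandle name. The sketch below exercises the same regular expression on its own; `looks_like_handle_name` is a hypothetical wrapper, and the candidate strings are invented.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern used in the hunk: a leading
# letter or underscore, then word characters, with optional ::-separated
# package parts.
sub looks_like_handle_name {
    my ($arg) = @_;
    return $arg =~ /\A[^\W\d]\w*(?:::\w+)*\z/ ? 1 : 0;
}

for my $candidate ( 'STDIN', 'My::Pkg::FH', '<doc/>', '1abc', "STDIN\n" ) {
    ( my $shown = $candidate ) =~ s/\n/\\n/g;
    printf "%-14s %s\n", $shown,
      looks_like_handle_name($candidate) ? 'look up glob' : 'treat as XML text';
}
```

Anything containing markup characters, whitespace, or a leading digit fails the test, so an XML document passed as a string is never used as a symbol name.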
if ( ref($ioref) eq 'FileHandle' ) {

@@ -477,13 +545,21 @@ sub parse {
$prev_rs = $ioclass->input_record_separator("\n$delim\n")
if defined($delim);

$result = ParseStream( $parser, $ioref, $delim );
eval { $result = ParseStream( $parser, $ioref, $delim ) };

$ioclass->input_record_separator($prev_rs)
if defined($delim);
}
else {
$result = ParseString( $parser, $arg );
eval { $result = ParseString( $parser, $arg ) };
}

if ($@) {
# Preserve reference exceptions (e.g. objects thrown by handlers)
die $@ if ref $@;
# For string exceptions, add XML context when ErrorContext is set
$self->xpcroak($@) if defined $self->{ErrorContext};
die $@;
}

$self->{_State_} = 2;
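
The rethrow logic in this hunk separates exception objects from plain strings. The following sketch shows that pattern outside the parser; `run_with_context` and the bracketed context format are assumptions made for illustration and are not how `xpcroak` formats its output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "call the parser, then rethrow":
# reference exceptions pass through untouched, string exceptions
# get extra context appended when a context is available.
sub run_with_context {
    my ( $code, $context ) = @_;

    my $result = eval { $code->() };
    if ( my $err = $@ ) {
        die $err if ref $err;    # objects thrown by handlers survive as-is
        if ( defined $context ) {
            chomp $err;
            die "$err [context: $context]\n";
        }
        die $err;
    }
    return $result;
}

eval { run_with_context( sub { die +{ code => 42 } }, 'line 3' ) };
print "object exception preserved\n" if ref($@) eq 'HASH';

eval { run_with_context( sub { die "bad token\n" }, 'line 3' ) };
print "string exception: $@";
```

Checking `ref` first is what keeps exception classes usable by callers: decorating them would stringify the object and lose its type.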
@@ -773,11 +849,52 @@ Unless standalone is set to "yes" in the XML declaration, setting this to
a true value allows the external DTD to be read, and parameter entities
to be parsed and expanded.

=item * UseForeignDTD

When set to a true value, this option tells expat to call the ExternEnt
handler even for documents that do not have a DOCTYPE declaration. This
allows the application to provide a DTD for validation and entity
definitions. In this case, the ExternEnt handler will be called with
both the system ID and public ID set to undef. This option should be
used together with ParseParamEnt.

=item * Base

The base to use for relative pathnames or URLs. This can also be done by
using the base method.

=item * BillionLaughsAttackProtectionMaximumAmplification

Sets the maximum amplification factor for the Billion Laughs attack
protection. This limits how many times larger the output of entity
expansion can be relative to the input. For example, a value of 100.0
means the parser will abort if entity expansion would produce output more
than 100 times the size of the input.

Requires libexpat E<gt>= 2.4.0 built with C<XML_DTD>. Will C<croak> at
runtime if the underlying C function is not available.

=item * BillionLaughsAttackProtectionActivationThreshold

Sets the activation threshold (in bytes) for the Billion Laughs attack
protection. The amplification limit only kicks in after the parser has
processed this many bytes of output from entity expansion. This prevents
false positives on small documents that happen to have a high
amplification ratio.

Requires libexpat E<gt>= 2.4.0 built with C<XML_DTD>. Will C<croak> at
runtime if the underlying C function is not available.

=item * ReparseDeferralEnabled

When set to a true value, enables reparse deferral. When set to a false
value (e.g. C<0>), disables it. Reparse deferral is a security mechanism
in expat that defers reparsing of unfinished tokens until more input
arrives, preventing certain XML-based attacks.

Requires libexpat E<gt>= 2.6.0. Will C<croak> at runtime if the
underlying C function is not available.

=back

=item setHandlers(TYPE, HANDLER [, TYPE, HANDLER [...]])
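
To see how the two Billion Laughs options documented above interact, a simplified model can help. This is a sketch of the documented semantics only, under the assumption that amplification is total output divided by direct input and that the limit is ignored below the activation threshold. It is not libexpat's implementation, and `expansion_tolerated` and its argument names are hypothetical.

```perl
use strict;
use warnings;

# Simplified model (assumption, not libexpat's code):
#   direct_bytes   - bytes read from the document itself
#   indirect_bytes - bytes produced by entity expansion
sub expansion_tolerated {
    my (%arg) = @_;

    my $total = $arg{direct_bytes} + $arg{indirect_bytes};

    # Below the activation threshold the amplification limit is not enforced.
    return 1 if $total < $arg{activation_threshold};

    my $amplification =
      $arg{direct_bytes} ? $total / $arg{direct_bytes} : 1;
    return $amplification <= $arg{maximum_amplification} ? 1 : 0;
}

my %limits = ( maximum_amplification => 100.0, activation_threshold => 1_000_000 );

my $small = expansion_tolerated( %limits, direct_bytes => 1_000,   indirect_bytes => 50_000 );
my $bomb  = expansion_tolerated( %limits, direct_bytes => 1_000,   indirect_bytes => 2_000_000 );
my $large = expansion_tolerated( %limits, direct_bytes => 100_000, indirect_bytes => 1_900_000 );

print "small=$small bomb=$bomb large=$large\n";
```

The first case has a high ratio but stays under the threshold, which is the false-positive scenario the threshold exists to avoid.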
@@ -933,9 +1050,9 @@ including any internal or external DTD declarations.

This handler is called for XML declarations. Version is a string containing
the version. Encoding is either undefined or contains an encoding string.
Standalone is either undefined, or true or false. Undefined indicates
that no standalone parameter was given in the XML declaration. True or
false indicates "yes" or "no" respectively.
Standalone is either undefined, or the string C<"yes"> or C<"no">.
Undefined indicates that no standalone parameter was given in the XML
declaration.

=back
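
The documented values correspond to the three states expat's C declaration handler reports for standalone: -1 when the attribute is absent, 0 for "no", and 1 for "yes". The mapping can be sketched in Perl as below. The actual change lives in the XS code, which this diff excerpt does not show, so `standalone_label` is purely a hypothetical illustration.

```perl
use strict;
use warnings;

# Hypothetical mapping from expat's integer flag to the documented values.
sub standalone_label {
    my ($flag) = @_;
    return undef if !defined $flag || $flag < 0;    # no standalone attribute
    return $flag ? 'yes' : 'no';
}

for my $flag ( -1, 0, 1 ) {
    my $label = standalone_label($flag);
    printf "%2d => %s\n", $flag, defined $label ? qq{"$label"} : 'undef';
}
```

Returning strings avoids the ambiguity of a false value, where "no" and "not given" would otherwise both look false in a boolean test.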

@@ -1026,6 +1143,12 @@ Returns the column number of the current position of the parse.

Returns the current position of the parse.

=item current_length

Returns the byte length of the current event. This is useful in conjunction
with current_byte to determine the exact byte range of an event in the
original XML document.
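
As a rough illustration of the byte-range idea, the sketch below slices a document with an offset and a length. The two numbers are hard-coded, hypothetical values standing in for what current_byte and current_length might report for the `<b/>` element; no parser is involved.

```perl
use strict;
use warnings;

my $xml = '<a><b/></a>';

# Hypothetical values, as a handler might obtain them for the <b/> event.
my ( $start, $length ) = ( 3, 4 );

# The event covers bytes $start .. $start + $length - 1 of the source.
my $raw = substr( $xml, $start, $length );

printf "bytes %d..%d: %s\n", $start, $start + $length - 1, $raw;
```

Both values are byte counts, so for documents with non-ASCII content the slice should be taken from the undecoded byte string rather than from a decoded character string.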

=item base([NEWBASE]);

Returns the current value of the base for resolving relative URIs. If
@@ -1077,6 +1200,37 @@ been set, then this is the first tag that the start handler will see
after skip_until has been called.


=item billion_laughs_attack_protection_maximum_amplification(FACTOR)

Sets the maximum amplification factor for the Billion Laughs attack
protection. FACTOR is a floating-point number (e.g. C<100.0>).

$parser->billion_laughs_attack_protection_maximum_amplification(100.0);

Requires libexpat E<gt>= 2.4.0 built with C<XML_DTD>. Will C<croak> if
the underlying C API is not available.

=item billion_laughs_attack_protection_activation_threshold(THRESHOLD)

Sets the activation threshold (in bytes) for the Billion Laughs attack
protection. THRESHOLD is an unsigned integer.

$parser->billion_laughs_attack_protection_activation_threshold(1_000_000);

Requires libexpat E<gt>= 2.4.0 built with C<XML_DTD>. Will C<croak> if
the underlying C API is not available.

=item reparse_deferral_enabled(ENABLED)

Enables or disables reparse deferral. ENABLED is a boolean (true to
enable, false to disable).

$parser->reparse_deferral_enabled(0); # disable
$parser->reparse_deferral_enabled(1); # enable

Requires libexpat E<gt>= 2.6.0. Will C<croak> if the underlying C API
is not available.
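
Because these three methods croak when the underlying libexpat lacks the feature, callers that must run against older libraries may want to probe before relying on them. A defensive sketch, assuming the method names from this diff: it checks `can` first, since released versions of XML::Parser::Expat may not define the methods at all, and it does nothing if the module is not installed.

```perl
use strict;
use warnings;

my @applied;

if ( eval { require XML::Parser::Expat; 1 } ) {
    my $parser = XML::Parser::Expat->new;

    # Settings to try; the values are illustrative only.
    my %wanted = (
        billion_laughs_attack_protection_maximum_amplification => 100.0,
        billion_laughs_attack_protection_activation_threshold  => 1_000_000,
        reparse_deferral_enabled                               => 1,
    );

    for my $method ( sort keys %wanted ) {
        next unless $parser->can($method);    # method absent in this version
        # A croak here means libexpat is too old or lacks XML_DTD support.
        push @applied, $method
          if eval { $parser->$method( $wanted{$method} ); 1 };
    }

    $parser->release;
}

print 'applied: ', ( @applied ? "@applied" : '(none)' ), "\n";
```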

=item position_in_context(LINES)

Returns a string that shows the current parse position. LINES should be