Skip to content

UTF-8 changes#1154

Merged
drgrice1 merged 1 commit into
openwebwork:developfrom
taniwallach:UTF8-replace-encode_utf8-and-decode_utf8
Mar 17, 2021
Merged

UTF-8 changes#1154
drgrice1 merged 1 commit into
openwebwork:developfrom
taniwallach:UTF8-replace-encode_utf8-and-decode_utf8

Conversation

@taniwallach
Copy link
Copy Markdown
Member

@taniwallach taniwallach commented Nov 2, 2020

So far I have not thoroughly tested the changes.

Motivation:

syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 86.
Caught a SIGsyswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 87.
PIPEsyswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 88.
 at syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 90.
-esyswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 91.
 line syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 92.
0syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 93.
syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at /usr/share/perl/5.26/sigtrap.pm line 119.

and replace encode_utf8($x) by Encode::decode("UTF-8",$x)
as the prior is not a proper conversion, and the documentation at
   https://perldoc.perl.org/Encode#encode_utf8
state that it should not be used for data exchange.

In several places use Encode::encode() explicitly, instead of just encode()
after a use Encode qw(encode).

Replace   binmode(STDOUT, ":utf8");
with      binmode(STDOUT, ":encoding(UTF-8)");
as Perl 5.30 makes sysread and syswrite on ":utf8" filehandles fatal
(it was deprecated in 5.24).
See: https://perldoc.perl.org/perl5300delta#Previously-deprecated-sysread()/syswrite()-on-:utf8-handles-is-now-fatal
@taniwallach taniwallach added Do Not Merge Yet PR to allow others to inspect -- not ready for prime time NeedsTesting Tentatively fixed bug or implemented feature Multilingual Part of the Multilingual (localization) project labels Nov 2, 2020
@taniwallach taniwallach added this to the WW 2.16 milestone Nov 2, 2020
Copy link
Copy Markdown
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I am not able to do any extensive testing on this as my usage rarely deals with anything that will really stress test the utf8 encoding and decoding.

I noticed that there is one instance of encode_utf8 that you did not convert to the Encode::encode("UTF-8", ...) method, and this is in lib/WeBWorK/EquationCache.pm. It may be it is not needed there. Just pointing this out.

Also, both Encode::decode_utf8 and Encode::encode_utf8 are used in PGcore.pm in the PG code. binmode(STDOUT, ":utf8") is also used there. Do those need to be changed also?

@taniwallach
Copy link
Copy Markdown
Member Author

I noticed that there is one instance of encode_utf8 that you did not convert to the Encode::encode("UTF-8", ...) method, and this is in lib/WeBWorK/EquationCache.pm. It may be it is not needed there. Just pointing this out.

I'll look into that one soon.


Also, both Encode::decode_utf8 and Encode::encode_utf8 are used in PGcore.pm in the PG code. binmode(STDOUT, ":utf8") is also used there. Do those need to be changed also?

Ideally yes, practically - probably not now.

WWSafe interferes with Encode::encode("UTF-8", ...).

@drgrice1
Copy link
Copy Markdown
Member

I didn't see that issue.

@taniwallach
Copy link
Copy Markdown
Member Author

I noticed that there is one instance of encode_utf8 that you did not convert to the Encode::encode("UTF-8", ...) method, and this is in lib/WeBWorK/EquationCache.pm. It may be it is not needed there. Just pointing this out.

I'll look into that one soon.

That is pg/lib/WeBWorK/EquationCache.pm and our experience is that WWSafe does not let Encode::encode("UTF-8", ...) work properly in PG.

The use is in my $md5 = md5_hex(encode_utf8($tex)); so it is only using Perl's lax UTF-8 to encode a string which is them passed into md5_hex. I do not think that any of the potential security risks related to Perl's lax UTF-8 code can cause any significant problems in such a setting.


From some research 2 years ago, the issues were related to the autoloading mechanisms used by Encode and that more modern versions of the regular Perl Safe.pm had changes to allow Encode to behave properly inside Safe.

See:

@taniwallach taniwallach requested review from pstaabp and removed request for heiderich and mgage March 11, 2021 23:36
@drgrice1
Copy link
Copy Markdown
Member

Yeah, that probably is fine. Furthermore, does anyone even use images for equations anymore? MathJax is so much better.

@mgage
Copy link
Copy Markdown
Member

mgage commented Mar 12, 2021 via email

@drgrice1
Copy link
Copy Markdown
Member

Yeah, I am not saying to remove the code for that. I am just saying it is probably not used much, and as such, it is probably not a serious concern if utf8 encoding is proper (if that is even needed for equations in any case).

I am betting that the instances in which students have trouble with the javascript working are becoming increasingly infrequent (at least in regards to MathJax). I have never had that happen for my students as far as I know.

@taniwallach
Copy link
Copy Markdown
Member Author

I periodically run into students who for one reason or another can’t get the javascript to work on their computer/browser.

I have run into a few such cases. Often asking the student to use a different browser than their default browser will overcome the problem. A few cooperative students who disabled browser extensions/plugins and bothered to respond after trying that reported success with extensions/plugins disabled. (There are several types of plugins which block content or modify content in ways the the user's don't really understand, but have some feature they like.) I have never got a student to test enough and write back telling me which extension/plugin caused their problem. The students typically just want to do the homework and move on.

@taniwallach
Copy link
Copy Markdown
Member Author

if utf8 encoding is proper (if that is even needed for equations in any case).

Submitted answers can contain UTF-8 character (even if they are not suitable), and almost everything gets rendered inside a math element in the "submitted answers" column in the results table. When using images equation display mode without the UTF-8 encoding an error page would be displayed (see the details in openwebwork/pg#443 (comment)), as the MD5 hash used to create file names cannot process Unicode characters from the internal Perl encoding of strings. The patch was made to encode into UTF-8, and the the image fails to be generated as LaTeX can't actually make the image so also fails appear in the rendered image, but the tool-top text does work. Having something fail to appear on screen, is far better than there being a WeBWorK error page.

@taniwallach taniwallach added essentially ready to merge and removed Do Not Merge Yet PR to allow others to inspect -- not ready for prime time NeedsTesting Tentatively fixed bug or implemented feature labels Mar 17, 2021
@drgrice1 drgrice1 merged commit 8d1fd45 into openwebwork:develop Mar 17, 2021
@taniwallach taniwallach deleted the UTF8-replace-encode_utf8-and-decode_utf8 branch April 21, 2021 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

essentially ready to merge Multilingual Part of the Multilingual (localization) project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants