New develop candidate multilingual #390

Closed
mgage wants to merge 18 commits into openwebwork:develop from mgage:new_develop_candidate_multilingual

@mgage
Member

@mgage mgage commented Jan 10, 2019

Merge this PR which contains only the utf8 changes before merging PR #386

This is cleaner. It still adds some t/..... unit test files, but they can be ignored for now since they won't interfere with anything else. We can sanity check the unit test files after we merge #386.

goehle and others added 16 commits June 20, 2016 20:57
create records in PG->{flags} with settings which will influence the
HTML lang and dir attributes set for the HTML element containing the
problem. This allows proper detection of the language of the problem in
the browser when it is not the primary course language, and allows
overriding the direction when an LTR problem is being viewed/assigned in
an RTL course, or vice versa. The flag values set in the problem
are processed in the subroutine output_problem_lang_and_dir() in
	webwork2/lib/WeBWorK/ContentGenerator/Problem.pm
if there is no override set in the course configuration.
set flags for the problem language and textdirection which can be accessed
inside the webwork2 code to allow using this data to set the HTML lang and
dir tags as needed on DIV elements which envelop the problem text.
…ction_to_PG_flags

Add lang and direction to pg flags
@mgage
Member Author

mgage commented Jan 28, 2019

I have removed the unit tests in t/ so that they don't confuse checking utf8. I have also moved updates to tableau.pl and other features to pull request #394. All of the remaining files have some reference to utf8 or to multilingual capabilities.

@taniwallach
Member

I ran some tests with this version of PG and a modified version of openwebwork/webwork2#927 which makes changes to use a database using utf8mb4 encoding. Basic testing works fine. See the comments to that PR.

@taniwallach
Member

Getting the Perl Encode module working in WWSafe seems to be an issue, which led to some of the changes in how PG files are read in.

This comment is mainly for the history, unless someone is ready to hack at WWSafe or related code to try to solve the underlying issue.

This issue seems to be the reason why some places (in particular the includePGproblem() subroutine from lib/WeBWorK/PG/IO.pm) at present need to use

        open(INPUT, "<:utf8", $filePath)

which is considered "less secure" than the alternative, which depends on the Encode module:

        open(INPUT, "<:encoding(utf8)", $filePath)

This is a matter for further discussion in the future.

The material below is based on an email discussion with @mgage:


  1. I ran into a problem which may relate to the WWSafe evaluation environment and the UTF8 code. I'm out of my depth about this, as I don't really have any idea of how this part of the code works.

includePGproblem() was updated in this branch (mgage:new_develop_candidate_multilingual) to use

        open(INPUT, "<:utf8", $filePath)

instead of

        open(INPUT, "<:encoding(utf8)", $filePath)

as was the case in the older mgage:develop_candidate branch.

This change lets problems which use includePGproblem() work, as they failed under the option which used the <:encoding(utf8) style code. The error messages were:

Warning messages
ERRORS in rendering problem: 8 
| Library/FortLewis/DiffEq/1-First-order/01-Integrals-as-solutions/Lebl-1-1-06.pg| 
ERRORS from evaluating PG file:
Undefined subroutine &Safe::Root19::Encode::find_encoding called at [PG]/lib/WeBWorK/PG/IO.pm line 142
Died within WeBWorK::PG::IO::read_whole_file called at line 131 of [PG]/lib/WeBWorK/PG/IO.pm
from within WeBWorK::PG::IO::read_whole_problem_file called at line 917 of [PG]/macros/PG.pl
from within main::includePGproblem called at line 10 of (eval 2972)

Based on the error messages (included above), maybe the problem was just that the Perl Encode module is blocked in the "safe" environment?

If so, maybe a better (safer) fix would be to allow use of the Encode module inside the "safe" environment (if that is feasible).

I tried to see if I could figure out some method to allow the

        open(INPUT, "<:encoding(utf8)", $filePath)

version to work.

I tried several things (discussed below) and was not successful. I kept either getting error messages in the warnings or reaching a state where the page would not load at all.

The only other idea I have for getting Encode working would be to let read_whole_file() somehow run outside of the WWSafe environment when a PG file is being parsed, so that it could use the Encode module without Encode having to work inside WWSafe.

  1. Why do I think we may prefer to keep the use of <:encoding(utf8)?

Having looked at: https://stackoverflow.com/questions/14566460/how-differs-the-open-pragma-with-different-utf8 it seems that <:encoding(utf8) is probably better than just "<:utf8" but that the safest option is probably <:encoding(UTF-8):

        "using ":utf8" for input can sometimes result in security
         breaches, so please use ":encoding(UTF-8)" instead."

Two more links on :utf8 vs. :encoding(UTF-8):

Note: The downside seems to be that the strict UTF-8 is likely to make trouble with some special characters (e.g. the copyright symbol in 8-bit latin1) which work for Perl's liberal utf8 and not the strict UTF-8.

However, even <:encoding(utf8), which would allow changing later to <:encoding(UTF-8), is probably better than the current fix of <:utf8 if we can get it to work.
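To make the utf8 vs. UTF-8 distinction concrete, here is a minimal sketch using the Encode module directly, outside of any Safe compartment. The helper `decodes_ok` is a hypothetical name introduced only for this illustration; it is not part of PG. The sketch shows that the two spellings resolve to different encoding objects and that the strict one refuses an encoded surrogate, while leaving the lax result to be observed rather than assumed.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# The two spellings resolve to different encoding objects.
my $lax_name    = find_encoding('utf8')->name;     # Perl's liberal variant
my $strict_name = find_encoding('UTF-8')->name;    # strict variant

# Bytes that encode the surrogate U+D800, which strict UTF-8 forbids.
my $surrogate_bytes = "\xED\xA0\x80";

# Hypothetical helper: returns 1 if decoding succeeds, 0 if Encode croaks.
sub decodes_ok {
    my ($encoding, $bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    return eval { decode($encoding, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $strict_accepts = decodes_ok('UTF-8', $surrogate_bytes);
my $lax_accepts    = decodes_ok('utf8',  $surrogate_bytes);

print "utf8  resolves to '$lax_name', accepts surrogate bytes: $lax_accepts\n";
print "UTF-8 resolves to '$strict_name', accepts surrogate bytes: $strict_accepts\n";
```

The same difference applies to the `:encoding(utf8)` and `:encoding(UTF-8)` I/O layers, since they look up their encoding through Encode.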

  1. There is another file which reads files in the first manner:
  • lib/WeBWorK/PG/Translator.pm
    see lines 393 and 529:
        lib/WeBWorK/PG/Translator.pm:393:
        open(MACROFILE, "<:encoding(utf8)", $filePath)
                || die "Cannot open file: $filePath";

part of pre_load_macro_files(), which as far as I can tell from webwork2/lib/WeBWorK/PG/Local.pm was used only until 2010 and is no longer used.

        lib/WeBWorK/PG/Translator.pm:529:
        if ( open(SOURCEFILE, "<:encoding(utf8)", $filePath) ) {

part of a source_file() subroutine in PG which does not seem to be used anywhere I could find.

In any case, if we don't want this to break something later, it seems these should also use <:utf8, at least for now, unless we can get Encode working.

  1. From what I understood from the Wiki pages on GitHub, and webwork2/lib/WeBWorK/PG/Local.pm it seems to me that the main text of a PG problem is read in from the file in webwork2/lib/WeBWorK/PG/Local.pm and then pushed into PG via a call to
        $translator->source_string( $source )

This seems to explain why, until I looked at a problem using includePGproblem(), no issue about the Encode module being missing from the "safe" environment was triggered.

  1. I tried putting back
        open(INPUT, "<:encoding(utf8)", $filePath)

or

         open(INPUT, "<:encoding(utf-8-strict)", $filePath) 

in lib/WeBWorK/PG/IO.pm and slowly adding some or all of the following to the $default_share in webwork2/lib/WWSafe.pm:

    &Encode::find_encoding
    &Encode::getEncoding
    &Encode::decode
    &Encode::decode_utf8
    &Encode::bytes2str
    &Encode::Encoding::needs_lines

and also tried adding modules to ${pg}{modules} in webwork2/conf/defaults.config:

        [qw(Encode)],
        [qw(Encode::Alias Encode::utf8 Encode::UTF_EBCDIC Encode::Internal
        Encode::XS Encode::Byte Encode::Config Encode::MyEncoding Encode::Guess
        )],

At one time it seemed to be getting close, but it was complaining about utf8 not supporting needs_lines, and several times I saw messages complaining about not finding a "decode" method in Encode::utf8.

In another attempt I tried allowing &Encode::Alias::find_alias (or something like that) and it complained about "require" not being found.

  1. While looking into this, I noticed that on http://perldoc.perl.org/Safe.html it mentions that
        Since it is only at the compilation stage that the operator mask
        applies, controlled access to potentially unsafe operations can
        be achieved by having a handle to a wrapper subroutine (written
        outside the compartment) placed into the compartment. For example,
        ...

Maybe that is an approach to allowing a "read_whole_file()" subroutine access to the Encode module.
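A minimal sketch of that wrapper idea, using the stock Safe module rather than WWSafe, might look like the following. The subroutine name `read_whole_file_utf8`, the compartment variable `$path`, and the temporary demo file are all assumptions made for this illustration. The sketch deliberately avoids the `:encoding(...)` I/O layer: it resolves the encoding object once, outside the compartment, reads raw bytes, and decodes them through that object. Whether this carries over to WWSafe and read_whole_file() is untested here.

```perl
use strict;
use warnings;
use Safe;
use Encode ();
use File::Temp qw(tempfile);

# Resolved once, outside any compartment, so that no by-name lookup of
# Encode::find_encoding is needed while compartment code is running.
my $UTF8_STRICT = Encode::find_encoding('UTF-8');
my $CHECK       = Encode::FB_CROAK();

# Hypothetical wrapper, compiled outside the compartment:
# read the raw bytes, then decode them strictly.
sub read_whole_file_utf8 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    close($fh);
    return $UTF8_STRICT->decode($bytes, $CHECK);
}

# Demo input: a small UTF-8 file containing one non-ASCII character.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "h\xC3\xA9llo\n";
close($tmp_fh);

# Share only the wrapper, and hand the path in through a compartment variable.
my $cpt = Safe->new;
$cpt->share('&read_whole_file_utf8');
${ $cpt->varglob('path') } = $tmp_path;

my $text = $cpt->reval('read_whole_file_utf8($path)');
die "compartment error: $@" if $@;
printf "read %d characters inside the compartment\n", length($text);
```

The design choice here is that code inside the compartment only ever calls the shared wrapper, so nothing from Encode itself has to be added to the compartment's shared symbols.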

  1. Another thing I noticed is that on my machine, the plain Perl Safe module is newer (version 2.39) and somewhat different from what is in the WWSafe.pm module (based on Safe version 2.16). In the newer file there is some discussion of utf8::SWASHNEW and why it is important inside Safe, but I don't feel that comfortable fiddling with WWSafe.pm.

The page https://perldoc.perl.org/PerlIO/encoding.html about the PerlIO::encoding module may also be of some help.

@mgage
Member Author

mgage commented Mar 12, 2019

this pr has been replaced by pr #402 which addresses the utf8 issues to some degree

@mgage
Member Author

mgage commented Mar 12, 2019

closing #390 in favor of #402
