Develop uft8 ver2 by mgage · Pull Request #798 · openwebwork/webwork2

mgage · 2017-07-31T18:11:38Z

This is experimental -- do not merge it yet. This is a summary of the utf8-related changes I have made to get international characters to work. It has been successful to some extent and no longer fails on PGML material. The companion request for pg is #319.

It may well fail other test cases; if so, please report them in this pull request.

Here is a PG problem that can be used for initial testing. All of the Bulgarian characters should now appear properly.

##DESCRIPTION
##  Test problem 
##ENDDESCRIPTION


## DBsubject(WeBWorK)
## Date(7/30/2017)
## TitleText1('')
## AuthorText1('')
## EditionText1('')
## Section1('')
## Problem1('')
## KEYWORDS('test')

########################################################################

DOCUMENT();      

loadMacros(
   "PGstandard.pl",     # Standard macros for PG language
   "MathObjects.pl",
   "PGML.pl",
   #"source.pl",        # allows code to be displayed on certain sites.
   #"PGcourse.pl",      # Customization file for the course
);

# Print problem number and point value (weight) for the problem



TEXT($PAR, "bulgarian printed from within a TEXT() function:   ьеижъанч",$PAR);

DEBUG_MESSAGE("this is bulgarian in the debug messages  еияащнвЧ -- it works");

BEGIN_TEXT
This is bulgarian printed from within BEGIN_TEXT/END_TEXT block
$PAR $HR
ьеижъан   it works
$HR $PAR
END_TEXT

DEBUG_MESSAGE("this is bulgarian in the debug messages  еияащнвЧ -- it works");

BEGIN_PGML_SOLUTION
---
This is Bulgarian printed from within a pgml solution:  

ьеижъанч

It fails.
---
END_PGML_SOLUTION


BEGIN_PGML
---
This is bulgarian printed from within BEGIN_PGML/END_PGML block

pgml text ьеижъан

It also fails. 

---
And after PGML mode the TEXT doesn't work anymore either. 

END_PGML


TEXT($PAR, "from within a TEXT() function.  bulgarian:   ьеижъанч",$PAR);

DEBUG_MESSAGE("this is bulgarian in the debug messages  еияащнвЧ and it no longer works.");

BEGIN_TEXT
This is bulgarian printed from within BEGIN_TEXT/END_TEXT block
$PAR $HR
ьеижъан   it still works
$HR $PAR
END_TEXT
ENDDOCUMENT();

mgage · 2017-07-31T18:14:49Z

Here are the changes I've made and the observations on how the question above works for me.

changes made to develop to add localization

creating develop_utf8_ver2

merge PR UTF8 Support #712 Homework Manager: Library Browser #278 (goehle::locbugs)
Additional utf8 entries for webwork2
- Apache/WeBWorK has use utf8 in addition to binmode(STDOUT, "utf8")
- Problem.pm has use utf8; this was essential to allowing Bulgarian to be printed from BEGIN_TEXT/END_TEXT segments.
  - use open ':encoding(utf8)'; and
  - binmode(STDOUT,":utf8") were not needed here and were not sufficient on their own without use utf8;
- PGproblemEditor3.pm needed open OUTPUTFILE ">:encoding(UTF-8)"
Cherry-picked the last commit on heiderich:utf-8, which updates the OPL and Tags.
testing code added
- math4/systems.template This is a bulgarian character test....
- Problem.pl line 1229:
  - output problem body: more Bulgarian
  - again ... (same Bulgarian characters inside div())
  - PGML.pl: Bulgarian letters inside pushText() (PGML.pl line 788)
- Additional utf8 entries for PG
  - change read_whole_file to specify UTF-8 encoding on input
  - use utf8 and binmode added to PGcore.pm
  - add "use utf8", "use v5.12", and binmode(STDOUT, ":utf8") to Translator.pm. At least the first of these was essential. Also add "<:encoding(utf8)" to the "source_file" macro in Translator.pm
  - add ":utf8" to input for PGloadfiles.pl
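Why the use utf8 pragma matters in files such as Problem.pm and Translator.pm can be seen in a small standalone sketch, independent of WeBWorK. Without the pragma, Perl reads a Bulgarian literal in a UTF-8 source file as raw octets (two per Cyrillic letter); with it, or after an explicit Encode::decode_utf8, the string holds one character per letter. The octet escapes below stand in for what an editor writes to disk.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# The two Cyrillic letters "ье" as a UTF-8 editor saves them: four octets.
# Without "use utf8;" a literal typed in the source file looks like this to Perl.
my $octets = "\xD1\x8C\xD0\xB5";

# Decoding turns the octets into Perl characters, which is what
# "use utf8;" does for string literals at compile time.
my $chars = decode_utf8($octets);

printf "octets: %d, characters: %d\n", length($octets), length($chars);

# Output must encode characters back into octets. Printing $chars with no
# layer triggers a "Wide character" warning; printing $octets through a
# :encoding(UTF-8) layer would double-encode them.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";

# Round trip: encoding the characters gives back the original octets.
my $round_trip = encode_utf8($chars);
```

The same distinction explains why adding only binmode(STDOUT, ":utf8") was not sufficient: an output layer can only encode correctly if the strings it receives are already characters.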

behavior

Some Bulgarian characters:

- print correctly inside TEXT()
- print correctly inside BEGIN_TEXT/END_TEXT
- print incorrectly inside BEGIN_PGML_SOLUTION/END_PGML_SOLUTION
- print incorrectly inside BEGIN_PGML/END_PGML
- after the PGML blocks, no longer print correctly inside TEXT()
- but still print correctly inside BEGIN_TEXT/END_TEXT
- DEBUG_MESSAGE() prints correctly until a PGML segment is processed; afterwards it prints incorrectly.

One other thing I noticed is that running the string through PG_restricted_eval(...) guarantees that it will print correctly; e.g., TEXT(PG_restricted_eval(...)); works. This is why BEGIN_TEXT/END_TEXT still prints correctly after a PGML block while plain TEXT() does not. PGML doesn't use PG_restricted_eval() at all.

If anyone has insights into the PGML failure please respond.

mgage · 2017-08-01T17:11:49Z

Bug squished!!!!! Adding use utf8; to WWSafe.pm fixed the behavior.

Now to figure out exactly what it is I squished. It almost certainly involves $safe->reval().
It's not clear why PG_restricted_eval code was protected. Unprotected TEXT() strings
were ok until after the PGML blocks were evaluated.

Does anyone else who has been using characters beyond the ASCII character set have any insights or experiences to add with getting WeBWorK to work with an extended character set?

dpvc · 2017-08-01T18:47:48Z

Does this mean it now works with PGML properly, or do I have to look into that yet?

mgage · 2017-08-01T19:13:09Z

It now works with PGML properly so you can take this off your agenda. I'm still puzzled as to why only PGML failed -- and why TEXT() succeeds before PGML is run but not afterward. It's partially related to the fact that PGML doesn't use PG_restricted_eval(). I'm going to test the configuration I have here more extensively and then I may try to figure out which extra "use utf8" pragmas were not needed.

dpvc · 2017-08-01T19:23:29Z

OK, sounds good. Thanks for figuring it out!

jutrembBDEB · 2017-08-03T20:46:01Z

I have merged this pull request into my develop branch and some characters are still not showing correctly. I suspect a problem with the MySQL tables. The collation was latin1_swedish_ci on all the tables; I changed some of them to utf8_general_ci for the course I was testing and restarted everything, but it's still not showing correctly.

This is the userList page (the name is supposed to be Stéphane):

This is the course title (it is supposed to be "Portail d'aide étudiant"):

heiderich · 2017-08-04T06:39:01Z

I still experience an encoding problem when the translation of the string "Your authentication failed. Please try again. Please speak with your instructor if you need help." contains non-ASCII characters. It is used in line 237 of lib/WeBWorK/Authen.pm.

heiderich · 2017-08-04T06:43:03Z

@jutrembBDEB When I use utf8 for the MySQL tables, titles of newly created courses and names of newly added students display correctly. I suspect a conversion issue from latin1_swedish_ci to utf8.
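The garbled names reported above look like the classic symptom of UTF-8 octets being read back as Latin-1 (an assumption, since the screenshots are not reproduced here). The sketch below reproduces that symptom for "Stéphane" and shows that, as long as nothing else has altered the octets, reversing the mistaken decode recovers the original string. It only illustrates the encoding mix-up; it does not touch MySQL.

```perl
use strict;
use warnings;
use utf8;                         # the literal below is a character string
use Encode qw(encode decode);

my $name = 'Stéphane';

# Stored as UTF-8 octets, then mistakenly interpreted as Latin-1:
# "é" (U+00E9) becomes the two octets C3 A9, displayed as "Ã" and "©".
my $octets  = encode('UTF-8', $name);
my $garbled = decode('ISO-8859-1', $octets);

# Undoing the wrong step: re-encode as Latin-1 to get the octets back,
# then decode them correctly as UTF-8.
my $repaired = decode('UTF-8', encode('ISO-8859-1', $garbled));

binmode(STDOUT, ':encoding(UTF-8)');
print "$garbled\n";     # StÃ©phane
print "$repaired\n";    # Stéphane
```

If a table conversion re-encodes such data a second time, the octets themselves change and this simple reversal no longer applies, which is one reason converting existing courses needs care.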

mgage · 2017-08-04T12:03:31Z

I'm up against my deadline for packing for vacation so I'll be out of the loop for the next few weeks.

As Julie (@jutrembBDEB) points out, we will have to figure out how to configure the MySQL database. We will also have to figure out the procedure for converting existing courses to the new format.
There are still places where the extended characters aren't being recognized. @heiderich -- does putting "use utf8;" at the top of the Authen.pm file solve the problem you report?

Keep reporting anomalies like that and we'll tackle them one by one. Jason Aubrey should be back from vacation soon. He can help pull suggested merges into develop. He can also give you access to the Transifex account if that is helpful.

I'll be back around August 22.

-- Mike

jutrembBDEB · 2017-08-04T16:09:19Z

Everything seems to work fine now when I create a new course. As Mike said, it's the existing courses that are the problem.

I found another anomaly, though. I added an extra library button in the localOverrides file, called "Banque de problèmes libres". Its accented characters don't show correctly in the SetMaker page.

heiderich · 2017-08-04T16:43:49Z

@mgage: Adding "use utf8;" at the top of the Authen.pm does not solve the problem I reported.

mgage · 2017-08-04T17:26:27Z

@jutrembBDEB I'll work on it when I get back if you guys haven't fixed it already. Keep a list of things you find. There may be some things in your previous PR (which you have since withdrawn) that didn't get fixed in this current pull request; please add them back in.

@heiderich or @jutrembBDEB, feel free to clone this branch on my site (mgage:develop_utf8_ver2) and create a ver3 with more fixes. We can review it together and merge it with develop once I get back. See you around August 22.

jutrembBDEB · 2017-08-04T17:39:09Z

@heiderich : I found a solution to your problem :

In lib/WeBWorK/ContentGenerator/Login.pm, add use Encode; at the beginning of the file.

Then, around line 188, right after the line
my $authen_error = MP2 ? $r->notes->get("authen_error") : $r->notes("authen_error");

add this new line: $authen_error = Encode::decode_utf8($authen_error);

It worked for me!
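A hedged sketch of the same idea in isolation: a translated message that travels through a channel which only keeps octets (the Apache notes table is assumed to behave that way here) comes back as UTF-8 octets and has to be decoded once before it is placed in the page. The French message is an invented example string, not an actual WeBWorK translation.

```perl
use strict;
use warnings;
use Encode ();

# Invented example of a translated authentication error ("Échec ...").
my $message = "\x{C9}chec de l'authentification.";

# Assumption: the value passes through a store that only keeps octets,
# so what comes back out is the UTF-8 encoding of the message.
my $authen_error = Encode::encode_utf8($message);

# Used as-is, each octet counts as one character, so "É" shows up as two.
my $garbled_length = length $authen_error;

# The Login.pm fix: decode the octets back into characters, exactly once.
$authen_error = Encode::decode_utf8($authen_error);

binmode(STDOUT, ':encoding(UTF-8)');
print "$authen_error\n";
```

Decoding exactly once matters: running decode_utf8 a second time over the already-decoded string would misread "É" as a broken UTF-8 sequence.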

heiderich · 2017-08-04T17:45:41Z

Thank you @jutrembBDEB. This indeed fixes the problem.

heiderich · 2017-08-04T18:01:44Z

As proposed by @mgage I created a new pull request #800. I already added the fix proposed by @jutrembBDEB.

jutrembBDEB · 2017-08-04T19:48:15Z

There seems to be an extra UTF-8 conversion happening with the addmessage and maketext commands. In the file ContentGenerator/Instructor/PGProblemEditor2.pm, some messages passed to addgoodmessage or addmessage show a "?" instead of an accented character:

I tested the action triggered by line 1833: $self->addgoodmessage($r->maketext("The set header for set [_1] has been renamed to '[_2]'.", $setName, $self->shortPath($outputFilePath)));

Here's what happened :

jutrembBDEB · 2017-08-05T20:15:42Z

I found a solution for my previous problem with the question mark character. It was a problem with the escape handler call in the file PGProblemEditor2.pm.

We have to replace every call to uri_escape with uri_escape_utf8.

I know that PGProblemEditor3.pm also calls uri_escape, but I don't know whether other files do and whether we need to change all of them to uri_escape_utf8.

But for now, it solved my problem.
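The "?" is consistent with how URI::Escape treats characters: uri_escape percent-encodes each character below U+0100 as a single Latin-1 octet, so "ô" becomes %F4, which is not valid UTF-8 when it is decoded later; uri_escape_utf8 encodes to UTF-8 first and yields %C3%B4. The sketch mimics both behaviours with a small core-Perl stand-in (percent_escape is a hypothetical name) so it runs without URI::Escape installed; in WeBWorK itself the real uri_escape_utf8 is what gets called.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for URI::Escape::uri_escape with its default
# unreserved set (A-Z a-z 0-9 - . _ ~). It expects characters below U+0100
# and escapes each one as a single %XX octet.
sub percent_escape {
    my ($text) = @_;
    die "character above U+00FF; encode to UTF-8 first\n" if $text =~ /[^\x00-\xFF]/;
    (my $escaped = $text) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $escaped;
}

my $role = "aucun r\x{F4}le";    # "aucun rôle"

# Like uri_escape: "ô" becomes the Latin-1 octet %F4.
my $latin1_escaped = percent_escape($role);

# Like uri_escape_utf8: encode to UTF-8 first, so "ô" becomes %C3%B4.
my $utf8_escaped = percent_escape(encode('UTF-8', $role));

print "$latin1_escaped\n";   # aucun%20r%F4le
print "$utf8_escaped\n";     # aucun%20r%C3%B4le
```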

heiderich · 2017-08-06T11:18:12Z

uri_escape is used in the following files:

clients/uribase64_encode.pl
lib/WeBWorK/ContentGenerator/CourseAdmin.pm
lib/WeBWorK/ContentGenerator/Instructor/AchievementEditor.pm
lib/WeBWorK/ContentGenerator/Instructor/PGProblemEditor.pm
lib/WeBWorK/ContentGenerator/Instructor/PGProblemEditor3.pm
lib/WeBWorK/ContentGenerator/Instructor/PGProblemEditor2.pm
lib/WeBWorK/ContentGenerator.pm

I replaced uri_escape with uri_escape_utf8 in all of these files except the first one, because I am not sure how it is used; if I understand it correctly, uri_escape is only applied to a base64 string there, which should not contain any non-ASCII characters. Or am I mistaken?

I added a commit to my pull request #800 with these changes.

jutrembBDEB · 2017-08-09T13:47:23Z

I had a UTF-8 problem with the simple.conf file that is saved when we modify the Config of a course. I have translated all the permission roles into French, and "nobody" is translated as "aucun rôle". So when I modified some options in the Config that use "nobody", "aucun rôle" was written to the simple.conf file. The problem happened when I changed the language from French to English in the Config: the simple.conf that was saved didn't recognize the "ô" character.

So, in the file lib/WeBWorK/ContentGenerator/Instructor/Config.pm, I changed line 490 to:
if (open OUTPUTFILE, ">:utf8", $outputFilePath) {
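A write-then-read round trip shows what an explicit UTF-8 layer buys when a file like simple.conf holds translated values. The sketch writes a setting containing "aucun rôle" through :encoding(UTF-8), checks that the file holds the two UTF-8 octets for "ô", and reads it back through the same layer. The variable name and the temporary file are illustrative assumptions; Config.pm's actual output format is not reproduced.

```perl
use strict;
use warnings;
use utf8;                                  # literals are character strings
use File::Temp qw(tempfile);

# Illustrative simple.conf-style line; the variable name is hypothetical.
my $line = q{$exampleSetting = 'aucun rôle';} . "\n";

my (undef, $path) = tempfile(UNLINK => 1);

# Write through an explicit UTF-8 layer so "ô" is stored as C3 B4.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} $line;
close($out) or die "Cannot close $path: $!";

# Inspect the raw octets on disk.
open(my $raw, '<:raw', $path) or die "Cannot read $path: $!";
my $octets = do { local $/; <$raw> };
close($raw);

# Read back through the same layer to get characters again.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
my $read_back = do { local $/; <$in> };
close($in);

my $status = $read_back eq $line ? 'round trip ok' : 'mismatch';
print "$status\n";
```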

Something else also happened when I changed the language from French to English: all the permissions that had been set to "aucun rôle" were changed to "guest" instead of "nobody". I found the line in Config.pm responsible for that; it was missing a maketext call.

Here's the section of the code I modified (lines 248 to 251):

    my $r = $self->{Module}->r;
    return '' if ($displayoldval eq $newval);
    my $str = '$' . $varname . " = '$newval';\n";
    $str = '$' . $varname . " = undef;\n" if $newval eq $r->maketext('nobody');
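Why the comparison needs maketext: the form sends back the translated label, so comparing it against the English literal 'nobody' never matches under a French interface, and the translated text is written to simple.conf instead of undef. A simplified sketch, with a hypothetical two-entry lexicon standing in for $r->maketext (neither maketext_fr nor config_line is WeBWorK code):

```perl
use strict;
use warnings;
use utf8;

# Hypothetical French lexicon standing in for $r->maketext.
my %lexicon_fr = (nobody => 'aucun rôle');
sub maketext_fr { my ($key) = @_; return exists $lexicon_fr{$key} ? $lexicon_fr{$key} : $key }

# Simplified version of the Config.pm logic: "nobody" is saved as undef.
sub config_line {
    my ($varname, $newval, $maketext) = @_;
    return '$' . $varname . " = undef;\n" if $newval eq $maketext->('nobody');
    return '$' . $varname . " = '$newval';\n";
}

my $submitted = 'aucun rôle';    # what a French-language form sends back

# Comparing with the untranslated literal misses the match ...
my $buggy = config_line('exampleRole', $submitted, sub { $_[0] });
# ... comparing with the translated label recognises it.
my $fixed = config_line('exampleRole', $submitted, \&maketext_fr);

binmode(STDOUT, ':encoding(UTF-8)');
print "$buggy$fixed";
```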

I created a pull request against heiderich's develop_uft8_ver3 branch with these modifications:
#heiderich#1

mgage · 2018-07-23T03:59:01Z

This pull request has been replaced by PR #800.

It still contains useful discussion of the issues.

mgage · 2018-07-23T03:59:28Z

Closing this in favor of PR #800

goehle and others added 29 commits June 20, 2016 14:13

- c7830d1 Tweaking how UTF8 encoding is done.
- 7f6a18a Adding more UTF8 support.
- 5c20361 Cleanup
- a54c760 Added support for utf8 on xml
- 5c44802 Encoded results from freeze so they don't become long utf8 characters and mess up the database. Note: We do not need decode from thaw because as sequences of bytes nothing changes. (I think.)
- fa1f954 Fix some broken css
- ad37eb1 small change.
- 0073969 Added hardcopy support (assumin gyou have the fonts installed and enabled.)
- 45f8bc1 Tracking down more open commands.
- 7998565 corrected file encodings
- eeb63a3 Merge branch 'convert_to_utf8_encoding' of https://github.com/heiderich/webwork2 into locbug. Conflicts: courses.dist/modelCourse/course.conf
- e8b9947 added [qw(Encode::Encoding)] to ${pg}{modules}) in defaults.config as suggested by goehle
- ceb43be Merge pull request #14 from heiderich/Encode-error: added [qw(Encode::Encoding)] to ${pg}{modules}) in defaults.config as…
- e1b5553 Merge branch 'add_maketext_calls_to_achievements' of https://github.com/heiderich/webwork2 into locbug
- 70ae45a Merge branch 'locbug' of https://github.com/goehle/webwork2 into locbug
- 80b663e Merge branch 'develop' of https://github.com/openwebwork/webwork2 into locbug
- f2b1564 Merge branch 'develop' of https://github.com/openwebwork/webwork2 into locbug
- b0bfe73 Freeze/thaw to base64 because mysql fields are varchar
- 200c5b8 update localization files.
- ea7147f Support for transition to freeze_base64
- 3d3795c Polishing error handling
- eaf0f2f Misspelled method.
- 239790f Merge branch 'locbug' of https://github.com/goehle/webwork2 into locbug
- 1b083a0 Used utf8::valid when I should have used utf8::is_utf8
- 6e2fbfd Whoops.
- e85eb00 Wrong maketext
- 054666f Merge branch 'locbug' of https://github.com/goehle/webwork2 into develop_uft8_ver2. Conflicts: lib/WeBWorK/ContentGenerator/Instructor/SendMail.pm, lib/WeBWorK/Utils.pm
- 7541f5d local experimental changes
- 20c3237 add utf-8 support to OPL-update (needs testing)
- 4bee27e add use utf8; to WWSafe.pm
- bbec446 Remove test messages from system.template and Problem.pm

heiderich mentioned this pull request Aug 4, 2017: Develop uft8 ver3 #800 (Merged)

mgage closed this Jul 23, 2018

mgage deleted the develop_uft8_ver2 branch July 24, 2018 16:49

taniwallach mentioned this pull request Feb 21, 2020: UTF8 issue with permissions saved to simple.conf #1077 (Closed)

5 participants
