
Develop uft8 ver2 #798

Closed
mgage wants to merge 31 commits into openwebwork:develop from mgage:develop_uft8_ver2


@mgage
Member

@mgage mgage commented Jul 31, 2017

This is experimental -- do not merge it yet. This is a summary of the utf8-related changes I have made to get international characters to work. It has been successful to some extent and no longer fails on PGML material. The companion request for pg is #319.

It may well fail other test cases in which case I encourage you to report them to this pull request.

Here is a PG problem that can be used for initial testing. All of the Bulgarian characters should
now appear properly.

##DESCRIPTION
##  Test problem 
##ENDDESCRIPTION


## DBsubject(WeBWorK)
## Date(7/30/2017)
## TitleText1('')
## AuthorText1('')
## EditionText1('')
## Section1('')
## Problem1('')
## KEYWORDS('test')

########################################################################

DOCUMENT();      

loadMacros(
   "PGstandard.pl",     # Standard macros for PG language
   "MathObjects.pl",
   "PGML.pl",
   #"source.pl",        # allows code to be displayed on certain sites.
   #"PGcourse.pl",      # Customization file for the course
);

# Print problem number and point value (weight) for the problem



TEXT($PAR, "bulgarian printed from within a TEXT() function:   ьеижъанч",$PAR);

DEBUG_MESSAGE("this is bulgarian in the debug messages  еияащнвЧ -- it works");

BEGIN_TEXT
This is bulgarian printed from within BEGIN_TEXT/END_TEXT block
$PAR $HR
ьеижъан   it works
$HR $PAR
END_TEXT

DEBUG_MESSAGE("this is bulgarian in the debug messages  еияащнвЧ -- it works");

BEGIN_PGML_SOLUTION
---
This is Bulgarian printed from within a pgml solution:  

ьеижъанч

It fails.
---
END_PGML_SOLUTION


BEGIN_PGML
---
This is bulgarian printed from within BEGIN_PGML/END_PGML block

pgml text ьеижъан

It also fails. 

---
And after PGML mode the TEXT doesn't work anymore either. 

END_PGML


TEXT($PAR, "from within a TEXT() function.  bulgarian:   ьеижъанч",$PAR);

DEBUG_MESSAGE("this is bulgarian in the debug messages  еияащнвЧ and it no longer works.");

BEGIN_TEXT
This is bulgarian printed from within BEGIN_TEXT/END_TEXT block
$PAR $HR
ьеижъан   it still works
$HR $PAR
END_TEXT
ENDDOCUMENT();        

goehle and others added 29 commits June 20, 2016 14:13
… and mess up the database. Note: We do not need decode from thaw because as sequences of bytes nothing changes. (I think.)
…ch/webwork2 into locbug

Conflicts:
	courses.dist/modelCourse/course.conf
added [qw(Encode::Encoding)] to ${pg}{modules}) in defaults.config as…
…lop_uft8_ver2

# Conflicts:
#	lib/WeBWorK/ContentGenerator/Instructor/SendMail.pm
#	lib/WeBWorK/Utils.pm
@mgage
Member Author

mgage commented Jul 31, 2017

Here are the changes I've made and the observations on how the question above works for me.

changes made to develop to add localization

creating develop_utf8_ver2

  • merge PR UTF8 Support #712 Homework Manager: Library Browser #278 (goehle::locbugs)

  • Additional utf8 entries for webwork2

    • Apache/WeBWorK has use utf8 in addition to binmode(STDOUT, "utf8")
    • Problem.pm has use utf8; this was essential to allowing bulgarian to be printed from begin_text/end_text segments.
      • use open ':encoding(utf8)'; and
      • binmode(STDOUT,":utf8") were not needed here and were not sufficient on their own without use utf8;
    • PGproblemEditor3.pm needed open OUTPUTFILE ">:encoding(UTF-8)"
  • Cherrypick last commit on (heiderich:utf-8) which updates OPL and Tags.

  • testing code added

    • math4/systems.template This is a bulgarian character test....
    • Problem.pl line 1229,
      • output problem body -- more bulgarian
      • again ... (same bulgarian characters inside div())
      • pgml.pl ..bulgarian letters.. inside PGML.pl 788, pushText()
    • Additional utf8 entries for PG
      • change read_whole_file to specify utf8 coding on input
      • use utf8 and binmode added to PGcore.pm
      • add "use utf8", "use v5.12", and binmode(STDOUT, ":utf8") to Translator.pm. At least the first of these was essential. Also add "<:encoding(utf8)" to the "source_file" macro in Translator.pm
      • add ":utf8" to input for PGloadfiles.pl

behavior

Some Bulgarian characters

  1. print correctly inside TEXT()
  2. print correctly inside BEGIN_TEXT/END_TEXT
  3. print incorrectly inside BEGIN_PGML_SOLUTION/END_PGML_SOLUTION
  4. print incorrectly inside BEGIN_PGML/END_PGML
  5. no longer print correctly inside TEXT() once a PGML block has been processed
  6. still print correctly inside BEGIN_TEXT/END_TEXT after the PGML block
  7. print correctly in DEBUG_MESSAGE() until a PGML segment is processed, and incorrectly afterwards

One other thing I noticed is that running the string through PG_restricted_eval(...) guarantees that it will print correctly, e.g. TEXT(PG_restricted_eval(...));
will work. This is why BEGIN_TEXT/END_TEXT prints correctly but TEXT() does not. PGML doesn't use PG_restricted_eval() at all.

If anyone has insights into the PGML failure please respond.
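
The byte/character distinction that `use utf8` controls can be illustrated in isolation. The sketch below is not WeBWorK code: it assumes only the core `Encode` module and uses `\x{...}` escapes so that it does not depend on how the file itself is saved. The same three Bulgarian letters are 3 characters as a decoded string and 6 bytes as raw UTF-8, which is how Perl reads a non-ASCII literal from a UTF-8 source file when `use utf8` is absent. Encoding the byte form a second time doubles it again, which is one common source of garbled output. Whether this is the mechanism behind the PGML failure is not established here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Three Bulgarian letters (ь е и) as a character string.
my $chars = "\x{44c}\x{435}\x{438}";

# The same text as raw UTF-8 bytes: what a literal in a UTF-8
# source file looks like to Perl when "use utf8" is not in effect.
my $bytes = encode('UTF-8', $chars);

# Encoding the bytes again treats every byte as a Latin-1 character,
# so each byte becomes two bytes (double encoding).
my $double = encode('UTF-8', $bytes);

my $char_count   = length($chars);    # 3
my $byte_count   = length($bytes);    # 6
my $double_count = length($double);   # 12

print "characters: $char_count, bytes: $byte_count, double-encoded: $double_count\n";
```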

@mgage
Member Author

mgage commented Aug 1, 2017

Bug squished!!!!! Adding use utf8; to WWSafe.pm fixed the behavior.

Now to figure out exactly what it is I squished. It almost certainly involves $safe->reval().
It's not clear why PG_restricted_eval code was protected. Unprotected TEXT() strings
were ok until after the PGML blocks were evaluated.

Does anyone else who has been using characters beyond the ASCII character set have any insights or experiences to add with getting WeBWorK to work with an extended character set?

@dpvc
Member

dpvc commented Aug 1, 2017

Does this mean it now works with PGML properly, or do I have to look into that yet?

@mgage
Member Author

mgage commented Aug 1, 2017

It now works with PGML properly so you can take this off your agenda. I'm still puzzled as to why only PGML failed -- and why TEXT() succeeds before PGML is run but not afterward. It's partially related to the fact that PGML doesn't use PG_restricted_eval(). I'm going to test the configuration I have here more extensively and then I may try to figure out which extra "use utf8" pragmas were not needed.

@dpvc
Member

dpvc commented Aug 1, 2017

OK, sounds good. Thanks for figuring it out!

@jutrembBDEB
Contributor

I have merged this pull request into my develop branch and there are still some characters that are not showing correctly. I suspect a problem with the MySQL tables. The collation was latin1_swedish_ci on all the tables; I changed some to utf8_general_ci for the course I was testing and restarted everything, but the characters still do not show correctly.

This is the userList page (the name is supposed to be Stéphane):
[screenshot: userlist]

This is the CourseTitle (it should read "Portail d'aide étudiant"):
[screenshot: coursetitle]

@heiderich
Member

I still experience an encoding problem when the translation of the string "Your authentication failed. Please try again. Please speak with your instructor if you need help." contains non-ASCII characters. It is used in line 237 of lib/WeBWorK/Authen.pm.

@heiderich
Member

heiderich commented Aug 4, 2017

@jutrembBDEB When I use utf8 for the MySQL tables, titles of newly created courses and names of newly added students display correctly. I would suspect a conversion issue from latin1_swedish_ci to utf8.

@mgage
Member Author

mgage commented Aug 4, 2017

I'm up against my deadline for packing for vacation so I'll be out of the loop for the next few weeks.

  1. As Julie @jutrembBDEB points out we will have to figure out how to configure the mysql database. We will also have to figure out the procedure for converting existing courses to the new format.

  2. There are still places where the extended characters aren't being recognized. @heiderich -- does putting "use utf8;" at the top of the Authen.pm file solve the problem you report?

Keep reporting anomalies like that and we'll tackle them one by one. Jason Aubrey should be back from vacation soon. He can help pull suggested merges into develop. He can also give you access to the Transifex account if that is helpful.

I'll be back around August 22.

-- Mike

@jutrembBDEB
Contributor

jutrembBDEB commented Aug 4, 2017

Everything seems to work fine now when I create a new course. As Mike said, it's the existing courses that are the problem.

I found another anomaly though. I added an extra library button in the localOverrides file, called "Banque de problèmes libres". Its characters don't show correctly on the SetMaker page.

@heiderich
Member

@mgage: Adding "use utf8;" at the top of the Authen.pm does not solve the problem I reported.

@mgage
Member Author

mgage commented Aug 4, 2017

@jutrembBDEB -- I'll work on it when I get back if you guys haven't fixed it already. Keep a list of things you find. There may be some things in your previous PR (which you have since withdrawn) that didn't get fixed in this current pull request -- please add them back in.

@heiderich or @jutrembBDEB, feel free to clone this branch on my site (mgage:develop_utf8_ver2) and create a ver3 with more fixes. We can review it together
and merge it with develop once I get back. See you around August 22.

@jutrembBDEB
Contributor

@heiderich: I found a solution to your problem:

In lib/WeBWorK/ContentGenerator/Login.pm, add use Encode; at the beginning of the file,

and around line 188, under the line
my $authen_error = MP2 ? $r->notes->get("authen_error") : $r->notes("authen_error");

add this new line: $authen_error = Encode::decode_utf8($authen_error);

It worked for me!
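
Why the extra decode helps can be sketched outside of WeBWorK. The example below assumes the page is written through a UTF-8 output layer (the change list mentions `binmode(STDOUT, ":utf8")`) and that the error message arrives as UTF-8 bytes; both are assumptions, and `bytes_written` is a hypothetical helper that captures output in memory instead of using STDOUT. Bytes sent through the layer get encoded a second time, while a decoded string comes out intact.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: print a string through a UTF-8 output layer
# into an in-memory buffer and return the bytes that were written.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open(my $fh, '>:encoding(UTF-8)', \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh);
    return $buffer;
}

my $message  = "\x{e9}chec";            # "échec" as a character string
my $raw      = encode_utf8($message);   # the same text as UTF-8 bytes
my $expected = encode_utf8($message);   # the bytes a browser should receive

my $without_decode = bytes_written($raw);               # encoded twice
my $with_decode    = bytes_written(decode_utf8($raw));  # encoded once

print "without decode: ", length($without_decode), " bytes\n";   # 8
print "with decode:    ", length($with_decode), " bytes\n";      # 6
```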

@heiderich
Member

Thank you @jutrembBDEB. This indeed fixes the problem.

@heiderich heiderich mentioned this pull request Aug 4, 2017
@heiderich
Member

As proposed by @mgage I created a new pull request #800. I already added the fix proposed by @jutrembBDEB.

@jutrembBDEB
Contributor

There seems to be an extra utf8 conversion happening with the addmessage and maketext commands. In the file ContentGenerator/Instructor/PGProblemEditor2.pm, some messages displayed via addgoodmessage or addmessage show a "?" instead of an accented character:

I tested this action called by the line 1833 : $self->addgoodmessage($r->maketext("The set header for set [_1] has been renamed to '[_2]'.", $setName, $self->shortPath($outputFilePath))) ;

Here's what happened:

[screenshot: addgoodmessage]

@jutrembBDEB
Contributor

I found a solution for my previous problem with the question mark character. It was a problem with the escape handler called in the file PGProblemEditor2.pm.

We have to replace every uri_escape with uri_escape_utf8.

I know that PGProblemEditor3.pm also calls uri_escape, but I don't know whether other files do too, or whether we need to change every call to uri_escape_utf8.

But for now, it solved my problem.

@heiderich
Member

uri_escape is used in the following files:

clients/uribase64_encode.pl
lib/WeBWorK/ContentGenerator/CourseAdmin.pm
lib/WeBWorK/ContentGenerator/Instructor/AchievementEditor.pm
lib/WeBWorK/ContentGenerator/Instructor/PGProblemEditor.pm
lib/WeBWorK/ContentGenerator/Instructor/PGProblemEditor3.pm
lib/WeBWorK/ContentGenerator/Instructor/PGProblemEditor2.pm
lib/WeBWorK/ContentGenerator.pm

I replaced uri_escape with uri_escape_utf8 in all of these files except the first one, because I am not sure how it is used there. If I understand it correctly, uri_escape is only applied to a base64 string there, which should not contain any non-ASCII characters. Or am I mistaken?

I added a commit to my pull request #800 with these changes.
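
For reference, the difference between the two functions can be seen in a small standalone sketch. It assumes the `URI::Escape` module (part of the URI distribution) is installed, and the sample string is only an illustration, not taken from the WeBWorK code.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);

# "problèmes" as a character string; \x{e8} is the letter è.
my $label = "probl\x{e8}mes";

# uri_escape escapes the character's code point as one byte (Latin-1 style).
my $latin1_style = uri_escape($label);        # probl%E8mes

# uri_escape_utf8 encodes the string as UTF-8 first, then escapes each byte.
my $utf8_style = uri_escape_utf8($label);     # probl%C3%A8mes

print "$latin1_style\n$utf8_style\n";
```

A receiver that decodes escaped values as UTF-8 cannot interpret a lone `%E8`, which is consistent with the "?" reported above, although the thread does not confirm the exact mechanism. For pure ASCII input, such as a base64 string, the two functions return the same result.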

@jutrembBDEB
Contributor

jutrembBDEB commented Aug 9, 2017

I had a UTF-8 problem with the simple.conf file that is saved when we modify the Config of a course. I have translated all the permission roles into French, and "nobody" is translated as "aucun rôle". So when I modify options in the config that use "nobody", that string is written to the simple.conf file. The problem appeared when I changed the language from French to English in the Config: the simple.conf that was saved didn't recognize the "ô" character.

So, in the file lib/WeBWorK/ContentGenerator/Instructor/Config.pm, I changed line 490 to:
if( open OUTPUTFILE, ">:utf8", $outputFilePath) {
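
A minimal round trip shows the effect of opening the file with an explicit layer. This is only a sketch: a temporary file stands in for simple.conf, the three-argument `open` with lexical handles is a stylistic choice rather than what Config.pm uses, and `:encoding(UTF-8)` is used as the stricter relative of `:utf8`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $role = "aucun r\x{f4}le";    # "aucun rôle" as a character string

# A temporary file stands in for simple.conf (illustrative only).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

# Write through a UTF-8 layer so characters are encoded on the way out.
open(my $out, '>:encoding(UTF-8)', $path) or die "cannot write $path: $!";
print {$out} $role;
close($out);

# Read back through the same layer so the bytes are decoded again.
open(my $in, '<:encoding(UTF-8)', $path) or die "cannot read $path: $!";
my $read_back = do { local $/; <$in> };
close($in);

my $size_on_disk = -s $path;     # 11 bytes: 10 characters, ô takes two bytes

print "round trip ok\n" if $read_back eq $role;
```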

Something else also happened when I changed the language from French to English. All the permissions that were set to "aucun rôle" were changed to "guest" instead of "nobody". I found the line responsible in the Config file: it was missing a maketext call.

Here's the section of the code I modified (line 248 to 251)

my $r = $self->{Module}->r;
return '' if($displayoldval eq $newval);
my $str = '$'. $varname . " = '$newval';\n";
$str = '$'. $varname . " = undef;\n" if $newval eq $r->maketext('nobody');

I created a pull request on Heiderich's develop_uft8_ver3 branch with these modifications.
#heiderich#1

@mgage
Member Author

mgage commented Jul 23, 2018

This pull request has been replaced by PR #800.

This still has good discussion of the issues.

@mgage
Member Author

mgage commented Jul 23, 2018

Closing this in favor of PR #800

@mgage mgage closed this Jul 23, 2018
@mgage mgage deleted the develop_uft8_ver2 branch July 24, 2018 16:49