
Develop uft8 ver3 #800

Merged
mgage merged 45 commits into openwebwork:develop_international from heiderich:develop_uft8_ver3
Jul 23, 2018

@heiderich heiderich commented Aug 4, 2017

This pull request is a follow up to pull request #798 by @mgage.

This is experimental -- do not merge it yet.

Remaining problems (this list is not exhaustive):

As mentioned by @jutrembBDEB:

I added an extra library button in the localOverrides file; it's called "Banque de problèmes libres". The characters don't display correctly on the SetMaker page.

Entries from existing MySQL tables do not display correctly. They probably need to be re-encoded.
@jutrembBDEB: How did you convert your MySQL database?

As observed by @heiderich:

  • Non-ASCII characters in the metadata of problems (and the Taxonomy2 file) cause problems in the library browser. Entries are shown correctly, but once selected no problems are found.

goehle and others added 30 commits June 20, 2016 14:13
… and mess up the database. Note: We do not need decode from thaw because as sequences of bytes nothing changes. (I think.)
…ch/webwork2 into locbug

Conflicts:
	courses.dist/modelCourse/course.conf
added [qw(Encode::Encoding)] to ${pg}{modules}) in defaults.config as…
…lop_uft8_ver2

# Conflicts:
#	lib/WeBWorK/ContentGenerator/Instructor/SendMail.pm
#	lib/WeBWorK/Utils.pm
@jutrembBDEB
Contributor

I also found a solution to my problem with the extra library button. It is exactly the same fix I applied to Login.pm.

In the file lib/WeBWorK/ContentGenerator/Instructor/SetMaker.pm, add the line use Encode;

then replace line 406: my $name = ($lib eq '')? $r->maketext('Local') : $problib{$lib}; by
my $name = ($lib eq '')? $r->maketext('Local') : Encode::decode_utf8($problib{$lib});

and line 837: $libs .= ' '. CGI::submit(-name=>"browse_$lib", -value=>$problib{$lib}, by

$libs .= ' '. CGI::submit(-name=>"browse_$lib", -value=>Encode::decode_utf8($problib{$lib}),

@heiderich, can you make this change in your branch? I don't want to create another pull request just for that. Thank you!
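
The fix above amounts to decoding the raw UTF-8 bytes of the library label into a Perl character string before it is rendered. Here is a minimal sketch of that step, assuming %problib holds undecoded bytes. The hash contents, the key bpl, and the helper library_label are made up for illustration and are not taken from SetMaker.pm.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for %problib: the value is the raw UTF-8 byte
# sequence of "Banque de problèmes libres" (è is the two bytes C3 A8).
my %problib = ( bpl => "Banque de probl\xC3\xA8mes libres" );

# Decode a library label from UTF-8 bytes into a character string.
sub library_label {
    my ($lib) = @_;
    return Encode::decode_utf8( $problib{$lib} );
}

my $raw   = $problib{bpl};
my $label = library_label('bpl');

# The raw value is counted in bytes, the decoded value in characters.
printf "raw: %d bytes, decoded: %d characters\n", length($raw), length($label);
```

If the label is printed without decoding, the two bytes of è are likely to be treated as two separate characters, which matches the garbled button text reported above.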

@jutrembBDEB
Contributor

I haven't converted my SQL database yet. I just changed the charset to utf8 for newly created courses, but the existing tables have not been converted. I may have found a solution, but I don't want to test it until I have shown it to someone.

@heiderich
Member Author

@jutrembBDEB Let me just mention that @goehle posted an SQL command in his original pull request #712 to convert the database to utf8. I remember that I used a variant of it. The main difference was that I used utf8mb4 instead of utf8 for full utf-8 support (I think the names are a little misleading). I would also propose utf8mb4 as a default for WeBWorK. For more detail see
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql
and
https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

I remember that I needed some more SQL commands to change encodings on different levels, but I do not recall them right now.

Use the mentioned SQL commands at your own risk and better create a backup beforehand. Please let us know about your experiences.
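
To illustrate why the charset names are misleading: MySQL's utf8 charset stores at most three bytes per character, while UTF-8 needs four bytes for code points above U+FFFF. The sketch below only measures encoded lengths in Perl and does not touch a database. The helper names and the sample code points are chosen for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Number of bytes that the UTF-8 encoding of one code point occupies.
sub utf8_byte_length {
    my ($code_point) = @_;
    return length( Encode::encode( 'UTF-8', chr($code_point) ) );
}

# MySQL's "utf8" charset holds at most 3 bytes per character;
# "utf8mb4" holds up to 4.
sub fits_mysql_utf8 {
    my ($code_point) = @_;
    return utf8_byte_length($code_point) <= 3 ? 1 : 0;
}

# U+00FC (ü), U+20AC (euro sign), U+1F600 (an emoji outside the BMP)
for my $code_point ( 0xFC, 0x20AC, 0x1F600 ) {
    printf "U+%04X: %d byte(s), fits MySQL utf8: %s\n",
        $code_point,
        utf8_byte_length($code_point),
        fits_mysql_utf8($code_point) ? 'yes' : 'no';
}
```

Only the last code point needs four bytes, which is the case utf8mb4 covers and utf8 does not.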

@heiderich
Member Author

heiderich commented Aug 5, 2017

I made the following observation concerning the problem with non-ASCII characters in the metadata of problems / the Taxonomy2 file:

  • With the current version of this PR the following happens:

    When selecting a section containing a non-ASCII character the correct number of problems found is shown. Once you click on "View problems" the page reloads and the string "There are no matching WeBWorK problems" is shown.

  • When adding the line

    mysql_enable_utf8 => 1,

    to the options of the DBI->connect call (around line 112) in the file lib/WeBWorK/Utils/ListingDB.pm, then the opposite happens:

    When selecting a section as above, it is shown that there are no problems, but if you click on "View problems" the problems are actually shown (together with the right number of problems).
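
For reference, the option from the second bullet belongs in the attribute hash that DBI->connect takes as its fourth argument. The sketch below only builds that argument list and does not open a connection. The DSN, the credentials, the other attributes, and the helper connect_args are placeholders and are not the values used in ListingDB.pm.

```perl
use strict;
use warnings;

# Build the argument list for DBI->connect. Everything here is a
# placeholder except mysql_enable_utf8, the DBD::mysql option that the
# comment above adds.
sub connect_args {
    my (%opt) = @_;
    my %attr = ( PrintError => 0, RaiseError => 1 );
    $attr{mysql_enable_utf8} = 1 if $opt{utf8};
    return (
        'DBI:mysql:database=webwork;host=localhost',    # assumed DSN
        'webworkRead',                                  # assumed user
        'password',                                     # assumed password
        \%attr,
    );
}

my @args = connect_args( utf8 => 1 );

# With DBI and DBD::mysql installed and a reachable server, one would call:
#   my $dbh = DBI->connect(@args);
print join( ', ', sort keys %{ $args[3] } ), "\n";
```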

Note that the two strings "There are XXX/no matching WeBWorK problems" come from different places. The first one (shown before clicking on "View problems") is defined in htdocs/js/apps/SetMaker/setmaker.js. The second one (shown after clicking on "View problems") is defined in lib/WeBWorK/ContentGenerator/Instructor/SetMaker.pm.

Another issue here is the translation of this string in the .js file. I opened issue #781 for this.

@heiderich
Member Author

heiderich commented Aug 28, 2017

I still experience a problem when I try to create a homework set from the library browser with a name that contains non-ASCII characters. The error is:

Problem creating set ü
argument 1 contains invalid characters in 'set_id field: 'ü' (valid characters are [0-9]) at /opt/webwork/webwork2/lib/WeBWorK/ContentGenerator/Instructor/SetMaker.pm line 1483. at /opt/webwork/webwork2/lib/WeBWorK/ContentGenerator/Instructor/SetMaker.pm line 1483. 

What are the restrictions on set names? Can we allow non-ASCII characters? If not, maybe we should consider providing a better error message.

@mgage
Member

mgage commented Aug 28, 2017

Florian:

Thanks for the report. This message comes from WeBWorK itself: it is raised in
the validateKeyfieldValue() subroutine of lib/WeBWorK/DB.pm, around line 2316. We'll
need to modify the allowed characters (or perhaps make this a configuration choice) once
we are using MySQL with extended characters.
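
One way the configuration choice mentioned above could look is sketched below. This is hypothetical and is not the code in validateKeyfieldValue(): the function name, the flag, and both character sets are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical validator: ASCII-only by default, optionally accepting
# any Unicode letter or digit when $allow_unicode is true.
sub is_valid_set_id {
    my ( $value, $allow_unicode ) = @_;
    my $pattern = $allow_unicode
        ? qr/\A[\p{L}\p{N}_.\-]+\z/u     # assumed extended character set
        : qr/\A[A-Za-z0-9_.\-]+\z/;      # assumed strict character set
    return $value =~ $pattern ? 1 : 0;
}

my $set_id = "\x{FC}";    # the character ü from the error report

printf "strict: %d, unicode allowed: %d\n",
    is_valid_set_id( $set_id, 0 ),
    is_valid_set_id( $set_id, 1 );
```

Keeping the strict pattern as the default would leave existing courses unaffected, while a site that has converted its database could opt in to the wider set.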

@heiderich
Member Author

I think that commits d24b860 and ac90685 should fix the problem in the library browser where no problems were shown when the subject, chapter, or section contained non-ASCII characters. The cause seems to be that some strings were not stored internally as UTF-8 in Perl. This probably caused problems when they were passed to MySQL (which I think expected UTF-8, at least after commit d24b860). Applying utf8::upgrade to these strings seems to solve the problem.
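
The effect of utf8::upgrade can be seen in isolation with the sketch below. It is a standalone illustration, not the code from the commits: the helper upgraded_copy and the sample string are assumptions.

```perl
use strict;
use warnings;

# Return a copy of the string whose internal storage is UTF-8.
# utf8::upgrade changes only the internal representation; the
# characters of the string stay the same.
sub upgraded_copy {
    my ($string) = @_;
    utf8::upgrade($string);
    return $string;
}

# One character (ü, U+00FC), stored by Perl as a single native byte.
my $native   = "\xFC";
my $upgraded = upgraded_copy($native);

printf "internal UTF-8 flag before: %d, after: %d\n",
    utf8::is_utf8($native)   ? 1 : 0,
    utf8::is_utf8($upgraded) ? 1 : 0;
printf "still equal: %s, length: %d\n",
    ( $native eq $upgraded ? 'yes' : 'no' ),
    length($upgraded);
```

Both strings compare equal in Perl, but code that reads the internal buffer directly, such as a database driver, may see different bytes for each, which would be consistent with the behaviour described above.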

@heiderich
Copy link
Copy Markdown
Member Author

@mgage: I think the problem with non-ASCII characters in names of homework sets I mentioned above is not necessarily related to this pull request and I think it could be addressed later on.

@mgage mgage changed the base branch from develop to develop_international July 23, 2018 02:57
@mgage
Member

mgage commented Jul 23, 2018

Here we go!!!!

@mgage mgage merged commit efafbba into openwebwork:develop_international Jul 23, 2018
@mgage
Member

mgage commented Jul 23, 2018

See also PR #798 for comments


4 participants