
change encoding to utf-8 in several files #327

Merged
jwj61 merged 2 commits into openwebwork:master from heiderich:utf8-encoding on Jul 12, 2019

@heiderich
Member

@heiderich heiderich commented Aug 4, 2017

I converted these .pg files to UTF-8 in order to prevent problems with the forthcoming UTF-8-compatible version of WeBWorK. With the ISO-8859 encoding, the OPL-update script would complain (and these problems would probably not render properly). This should probably be pulled when WeBWorK switches to UTF-8. On installations that still run the old WeBWorK, these converted problems may cause trouble. I fear this is hard to avoid. Fortunately, it affects only a few problems.
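The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in this PR: it assumes the legacy files are ISO-8859-1 (Latin-1), since the comment only says "iso-8859" and other parts of that family map some high-bit bytes to different characters. The helper names `latin1_to_utf8` and `convert_file` are hypothetical.

```python
from pathlib import Path


def latin1_to_utf8(raw: bytes) -> bytes:
    """Return `raw` re-encoded as UTF-8.

    Bytes that already decode as UTF-8 (including pure ASCII) are returned
    unchanged; anything else is ASSUMED to be ISO-8859-1 (Latin-1).
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this never fails.
        return raw.decode("latin-1").encode("utf-8")


def convert_file(path: Path) -> bool:
    """Rewrite `path` in place; return True if its bytes changed."""
    raw = path.read_bytes()
    new = latin1_to_utf8(raw)
    if new != raw:
        path.write_bytes(new)
        return True
    return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        if convert_file(Path(name)):
            print(f"converted {name}")
```

Trying UTF-8 first keeps the script safe to run twice. The trade-off is that a Latin-1 file whose high-bit bytes happen to form valid UTF-8 sequences would be left alone; that is rare but possible.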

@jwj61
Member

jwj61 commented Nov 13, 2017

This has been on hold until webwork switches to utf-8. Has that happened yet?

@heiderich
Member Author

No, not yet. I think the most recent pull request is the following: openwebwork/webwork2#800

Just today I added two more commits that fix a problem in the library browser (there were problems when subject, chapter or section contained non-ASCII characters).

@taniwallach
Member

@jwj61 - WW 2.15 is expected to have the UTF-8 support. There is a branch for release 2.15, but it is still a beta version. See: openwebwork/webwork2#948

@heiderich - Maybe the change should just avoid using either UTF-8 or Latin-1 characters with the high bit set (those which are invalid or could be misinterpreted when loaded as UTF-8).

  • About half of the changes are to the copyright symbol. Using &copy; as a stand-in for the special character will avoid any real use of UTF-8 in those problems. That change was adopted in many files in the webwork2 codebase.
  • The other half are the "display" value for "No_answer" in a True/False ra_pop_up_list, and it seems to me that "???" could be a value which avoids both UTF-8 and byte sequences that are invalid in UTF-8.
  • If fixing the copyright line, maybe replace http://openwebwork.sf.net/ with a more current URL and fix the dates:
# Copyright &copy; 2000-2019 The WeBWorK Project, https://github.com/openwebwork

or

# Copyright &copy; 2000-2019 The WeBWorK Project, http://webwork.maa.org/wiki

@drjt @mgage - Any suggestion on what should be the URL in copyright notices?
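Tani's criterion of "no bytes with the high bit set" can be checked mechanically. The following is a minimal Python sketch, not an existing OPL script; the helper name `find_high_bit_lines` is hypothetical. It lists every line containing a byte of 0x80 or above, meaning anything outside 7-bit ASCII. Because it works on raw bytes, it flags Latin-1 characters and UTF-8 multi-byte sequences alike, so it would catch both the copyright signs and the odd No_answer strings.

```python
from pathlib import Path


def find_high_bit_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (line_number, line) for every line containing a byte >= 0x80.

    Operates on raw bytes so it flags Latin-1 characters and UTF-8
    multi-byte sequences alike; line numbers start at 1.
    """
    hits = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        if any(b >= 0x80 for b in line):  # iterating bytes yields ints
            hits.append((lineno, line))
    return hits


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        for lineno, line in find_high_bit_lines(Path(name).read_bytes()):
            # backslashreplace shows undecodable bytes as \xNN escapes
            print(f"{name}:{lineno}: {line.decode('ascii', 'backslashreplace')}")
```

An empty report for a file means it is pure ASCII and therefore reads identically under Latin-1 and UTF-8.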

@mgage
Member

mgage commented Jul 10, 2019

I recommend setting the URL to https://github.com/openwebwork. That's the most stable address at the moment, and it has the license files. Eventually we can use https://openwebwork.org, but that is not set up properly right now. Fortunately, it's not too hard to make a global change.

@heiderich
Member Author

heiderich commented Jul 10, 2019

@taniwallach Thanks for your suggestions.

I agree that avoiding "high-bit set characters" would be preferable. This would probably make the OPL compatible with versions of WeBWorK that do not support UTF-8 as well as with those that do.

&copy; is the HTML encoding for the copyright sign, right? I would be surprised to find it at this place in .pg files. Maybe we can just drop the copyright sign altogether. The word "copyright" should be clear enough.

Concerning the string for No_answer, the wiki page http://webwork.maa.org/wiki/PopUpListsLong simply uses "?". How about that?

As for the text in the copyright line, I wonder whether "The WeBWorK Project, https://github.com/openwebwork" would suggest that https://github.com/openwebwork is the website of "The WeBWorK Project". As far as I understand it, the WeBWorK software (webwork2 and pg) along with the OPL is not the same as "The WeBWorK Project". In this form, the line may lead to confusion.

@heiderich heiderich closed this Jul 10, 2019
@heiderich heiderich reopened this Jul 10, 2019
@mgage
Member

mgage commented Jul 10, 2019

@jwj61 I think this is ready to be merged. I have already changed the copyright symbol globally in all of the instances of course.conf that I could find.

@taniwallach
Member

@jwj61 @mgage - Please wait. Let Florian change it to avoid the use of "high-bit set characters" so the files will be compatible with versions of WW both with and without UTF-8 support.

@heiderich - The &copy; HTML encoding for © can now act as a placeholder, so if one day a change to the UTF-8 character is wanted, it can easily be found by search and replace. But removing © without putting in the HTML encoding should also be fine. About the string for No_answer: yes, a single ? is fine. I'm not sure what the original characters in these files were; I see "��?" and "ÊÊ?", neither of which is particularly meaningful. Maybe some other 8-bit encoding had a different symbol there.

@heiderich
Member Author

I also do not know what "��?" and "ÊÊ?" were.

If there is agreement, I can

  • replace the copyright sign with its HTML encoding (I get your point, Tani)
  • replace the strings for No_answer with "?"
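The copyright part of this plan is a simple per-line rewrite. A minimal Python sketch, ASSUMING the OPL headers have the form `# Copyright <sign> START-END ...` (the exact header format is not shown in this thread). The names `normalize_copyright` and `normalize_text` are hypothetical. It swaps the sign for the ASCII entity &copy; and sets the end year to 2019, per the follow-up commit that updated the years from 2003 to 2019. The No_answer strings are not handled here, since their exact byte content in each file is unknown.

```python
import re

# Matches the start of a copyright comment line, optionally followed by the
# copyright sign (written as the escape \u00a9 so this script stays pure ASCII)
# or the entity &copy;, then a START-END year range.
COPYRIGHT_RE = re.compile(r"^(#\s*Copyright\s+)(?:\u00a9|&copy;)?\s*(\d{4})-(\d{4})")


def normalize_copyright(line: str, end_year: int = 2019) -> str:
    """Use &copy; instead of the sign and set the range's end year.

    Lines that do not look like a copyright comment are returned unchanged.
    """
    m = COPYRIGHT_RE.match(line)
    if not m:
        return line
    prefix, start, _old_end = m.groups()
    return f"{prefix}&copy; {start}-{end_year}" + line[m.end():]


def normalize_text(text: str, end_year: int = 2019) -> str:
    """Apply normalize_copyright to every line, keeping line endings."""
    return "".join(
        normalize_copyright(ln, end_year) for ln in text.splitlines(keepends=True)
    )
```

The rewrite is idempotent: a line that already reads `# Copyright &copy; 2000-2019 ...` comes back unchanged, so the script can safely be rerun over the whole library.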

@mgage
Copy link
Copy Markdown
Member

mgage commented Jul 10, 2019

This sounds good to me. Thanks.

and update copyright years from 2003 to 2019
Member

@mgage mgage left a comment


These changes seem fine to me. I think JJ is on vacation this week, and since this doesn't seem urgent, I'll let him merge the changes if he approves them.

@jwj61 jwj61 merged commit faa9777 into openwebwork:master Jul 12, 2019
@heiderich heiderich deleted the utf8-encoding branch August 22, 2019 15:04

4 participants