Skip to content

ibus-engines.mozc-ut: init at 2.30.5544.102#314248

Merged
h7x4 merged 18 commits into
NixOS:masterfrom
pineapplehunter:mozc-updates
Sep 21, 2024
Merged

ibus-engines.mozc-ut: init at 2.30.5544.102#314248
h7x4 merged 18 commits into
NixOS:masterfrom
pineapplehunter:mozc-updates

Conversation

@pineapplehunter
Copy link
Copy Markdown
Contributor

@pineapplehunter pineapplehunter commented May 24, 2024

Description of changes

This is an attempt to package ibus-engines.mozc-ut which enables a new UT dictionaries option.

It copies a lot of the work in #296035 but changes every dictionary as an derivation. By making every dictionary a derivation each update to the dictionary will keep the IME up to date.

I am still unsure if this is the right way to implement it, so I will put this up as a draft for now.
Edit: changed the title so it only shows the mozc-ut

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@pineapplehunter pineapplehunter changed the title jp-zip-code: init at 0-unstable-2024-05-01,jawiki-all-titles-in-ns0: init at 20240520,mozcdic-ut-alt-cannadic: init at 0-unstable-2023-11-18,mozcdic-ut-edict2: init at 0-unstable-2024-04-23,mozcdic-ut-jawiki: init at 0-unstable-2024-05-02,mozcdic-ut-neologd: init at 0-unstable-2024-04-23,mozcdic-ut-personal-names: init at 0-unstable-2024-05-15,mozcdic-ut-place-names: init at 0-unstable-2024-05-15,mozcdic-ut-skk-jisyo: init at 0-unstable-2024-04-23,mozcdic-ut-sudachidict: init at 0-unstable-2024-04-23,merge-ut-dictionaries: init at 0-unstable-2024-05-18,ibus-engines.mozc-ut: init at 2.29.5374.102 ibus-engines.mozc-ut: init at 2.29.5374.102 May 24, 2024
Comment thread pkgs/by-name/ja/jawiki-all-titles-in-ns0/package.nix Outdated
Comment thread pkgs/by-name/ja/jawiki-all-titles-in-ns0/package.nix Outdated
Comment thread pkgs/by-name/ja/jawiki-all-titles-in-ns0/package.nix Outdated
Comment thread pkgs/by-name/ja/jawiki-all-titles-in-ns0/package.nix Outdated
Comment thread pkgs/by-name/jp/jp-zip-codes/package.nix Outdated
Comment thread pkgs/by-name/me/merge-ut-dictionaries/package.nix Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These packages all look extremely similar. Ever considered maybe writing a wrapper for them? (You can then expose them in merge-ut-dictionaries or somewhere else)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking these out so quickly!
I think it is better to create a wrapper too, but It is very convenient having the auto-update script running. Is there a way to have the auto-update happen without creating multiple derivations?

Comment thread pkgs/tools/inputmethods/ibus-engines/ibus-mozc/default.nix Outdated
Comment thread pkgs/top-level/all-packages.nix Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make these the default value of the dictionaries variable in the deriv, like you've done in merge-ut-dictionaries?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if it is a good design, but I wanted to show that dictionaries should be explicitly added and keep them out of the mozc derivation.
Further, I think it may be better if merge-ut-dictionaries were a function that takes some dictionaries and outputs a file. I wasn't quite sure where to put something like that in the by-name directory. Is there a place to put a file like that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is compelling but the current default value of dictionaries matches upstream's default.

  • mozcdic-ut-jawiki -> jawiki
  • mozcdic-ut-personal-names -> personal_names
  • mozcdic-ut-place-names -> place_names
  • mozcdic-ut-sudachidict -> sudachidict

@pluiedev
Copy link
Copy Markdown
Member

Also the old PR title is just way too long haha, caught my attention immediately

@ofborg ofborg Bot added the 8.has: package (new) This PR adds a new package label May 24, 2024
@ofborg ofborg Bot requested a review from gebner May 24, 2024 11:00
@ofborg ofborg Bot added 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels May 24, 2024
@ofborg ofborg Bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. and removed 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels May 24, 2024
Comment thread pkgs/by-name/ja/jawiki-all-titles-in-ns0/package.nix Outdated
Copy link
Copy Markdown
Member

@stephen-huan stephen-huan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested against upstream dictionary (as of 2024-05-24). See instructions below.

git clone https://github.com/utuhiro78/merge-ut-dictionaries
cd merge-ut-dictionaries/src
# save below diff as enable_all.patch
git apply enable_all.patch
# ruby and wget should be on path, utf-8 locale
bash make.sh
sed -i "s|1843|1847|g" mozcdic-ut.txt
md5sum mozcdic-ut.txt
diff --git a/src/make.sh b/src/make.sh
index a5689a1..ef881ed 100644
--- a/src/make.sh
+++ b/src/make.sh
@@ -3,13 +3,13 @@
 # Author: UTUMI Hirosi (utuhiro78 at yahoo dot co dot jp)
 # License: Apache License, Version 2.0
 
-#alt_cannadic="true"
-#edict="true"
+alt_cannadic="true"
+edict="true"
 jawiki="true"
-#neologd="true"
+neologd="true"
 personal_names="true"
 place_names="true"
-#skk_jisyo="true"
+skk_jisyo="true"
 sudachidict="true"
 
 rm -f mozcdic-ut.txt
5a2d917e7185d8f196d4ea5fab9a0e65  mozcdic-ut.txt
5a2d917e7185d8f196d4ea5fab9a0e65  /nix/store/fpcb9rkppagaav45ry8ab5zqsfmlmch7-merge-ut-dictionaries-0-unstable-2024-05-18/mozcdic-ut.txt

Result of nixpkgs-review pr 314248 run on x86_64-linux 1

2 packages blacklisted:
  • nixos-install-tools
  • tests.nixos-functions.nixos-test
8 packages built:
  • emacsPackages.ac-mozc
  • emacsPackages.mozc
  • emacsPackages.mozc-cand-posframe
  • emacsPackages.mozc-im
  • emacsPackages.mozc-popup
  • emacsPackages.mozc-temp
  • ibus-engines.mozc
  • ibus-engines.mozc-ut

Did not test that ibus-engines.mozc-ut actually picked up the dictionary but in my experience if the dictionary is correct the IME should be fine.

@stephen-huan
Copy link
Copy Markdown
Member

@pineapplehunter thanks for the PR! Was going to take a crack at it myself but didn't get around to it in time...

All of the dictionaries have build scripts, e.g. mozcdic-ut-place-name (except for mozcdic-ut-personal-names). Is there any merit in packaging the dictionaries "from source" rather than relying on the manually updated .txt.tar.bz2? I looked at some of the dictionary URLs and they seem not stable (updated in place) so there would need to be mirroring. If so, it'd probably be more trouble than it's worth since these dictionary repositories are already essentially mirrors.

Secondly, what concrete steps need to be taken to go from draft status to a mergeable state? Given that this PR is already functionally correct (in that it matches upstream + works with IME) and dictionaries are easily updated by nixpkgs-update the PR looks good to me. Given the 3-4 PRs I've seen, missing the ut dictionary has been a sore point for some time now and personally speaking, I'd like to stop compiling ibus every time I update my system ;).

@pluiedev
Copy link
Copy Markdown
Member

All of the dictionaries have build scripts, e.g. mozcdic-ut-place-name (except for mozcdic-ut-personal-names). Is there any merit in packaging the dictionaries "from source" rather than relying on the manually updated .txt.tar.bz2?

It's more a principle thing, I think — aside from some core bootstrapping libraries, Nixpkgs tries to compile things from source as much as possible to ensure that the compilation process is reproducible. Unwrapping and repackaging prebuilt files should only be done if there's no way to build the package correctly and/or reproducibly

@stephen-huan
Copy link
Copy Markdown
Member

I see. I'll try to take a shot at it at some point. Either way, I don't think it's a blocker for this PR since a "from source" derivation can transparently replace any of the "repacked" derivations (which are their own derivations already).

@ofborg ofborg Bot added 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. and removed 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Aug 23, 2024
@pineapplehunter pineapplehunter changed the title ibus-engines.mozc-ut: init at 2.29.5374.102 ibus-engines.mozc-ut: init at 2.30.5544.102 Aug 30, 2024
Comment thread pkgs/by-name/ja/jawiki-all-titles-in-ns0/package.nix Outdated
Comment thread pkgs/by-name/me/merge-ut-dictionaries/package.nix Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be turned into a new separate package instead (e.g. mozc/default.nix)? It looks like that this package could also be reused for fcitx5-mozc, so I don't think it should be an ibus-specific package.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure about fcitx5-mozc package. I may checkout later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #251706 which might try to coordinate with this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I don't think you need to make any fcitx5-specific changes. Simply moving the package to its own separate namespace would probably be enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@musjj I'll try moving this package to pkgs/by-name/mo/mozc/package.nix.
Let me know if this causes any issues with the fcitx5 implementation.

@h7x4
Copy link
Copy Markdown
Member

h7x4 commented Sep 19, 2024

@pineapplehunter Is there anything keeping this PR from being undrafted? @musjj would you be okay with merging this PR first, and rebasing your work onto it in #251706? I'd be happy if we could get the new mozc stuff in before 24.11

@musjj
Copy link
Copy Markdown
Contributor

musjj commented Sep 19, 2024

would you be okay with merging this PR first, and rebasing your work onto it

Yeah, I'd be okay with that.

@pineapplehunter
Copy link
Copy Markdown
Contributor Author

@h7x4 @musjj Thanks for the comments, and sorry for the late reply.
I originally started this PR as a PoC to see if making all the dictionaries as packages would work. I wasn't quite sure if this was the right way to go.
But I'm happy to undrafted and merge this.
Feel free to let me know if there's anything I should fix.

@pineapplehunter pineapplehunter marked this pull request as ready for review September 19, 2024 11:27
--replace-fail "urllib.request.urlopen" "open" \
--replace-fail "read().decode()" "read()"

for dir in ${builtins.toString dictionaries}; do
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but I think the use of toString for lists like this is a bit of an antipattern, maybe apart from debugging purposes. It's better to be explicit.

Suggested change
for dir in ${builtins.toString dictionaries}; do
for dir in ${lib.concatStringsSep " " dictionaries}; do

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how I did it in my PR:

''
  for name in ${lib.escapeShellArgs dictionaries}; do
    ...
  done
''

This ensures that everything is escaped correctly.

Comment thread pkgs/by-name/jp/jp-zip-codes/package.nix
rev = "119c888a38032a92e139c52cd26f45bb495c4d54";
hash = "sha256-uyAL2TcFJsYZACFDAxIQ4LE40Hi4PVrQRnJl5O5+RmU=";
};
ut-dictionary = merge-ut-dictionaries.override { inherit dictionaries; };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this make the overriden merge-ut-dictionaries throw an error by default, considering the dictionaries ? [ ] and the non-empty assertion in merge-ut-dictionaries? Maybe it would be better to do something along the lines of this instead?

Suggested change
ut-dictionary = merge-ut-dictionaries.override { inherit dictionaries; };
ut-dictionary = merge-ut-dictionaries.override (lib.optionalAttrs (dictionaries != [ ]) { inherit dictionaries; });

Copy link
Copy Markdown
Contributor Author

@pineapplehunter pineapplehunter Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this make the overriden merge-ut-dictionaries throw an error by default,

No, It is only evaluated when any dictionary is in the list, on line 78.
I made it so you could build mozc with or without the dictionaries.
But I can see this being kind of confusing. I'm happy to change it.

Comment thread pkgs/by-name/me/merge-ut-dictionaries/package.nix Outdated
@pineapplehunter
Copy link
Copy Markdown
Contributor Author

I just remembered I have to double-check the licenses on the dictionaries. I remember each of them having slightly different licenses.

@pineapplehunter
Copy link
Copy Markdown
Contributor Author

All licenses have been checked and should now be good to go.

Comment thread pkgs/by-name/me/merge-ut-dictionaries/package.nix Outdated
@pineapplehunter
Copy link
Copy Markdown
Contributor Author

Missed the license for jp-zip-codes

@h7x4 h7x4 requested a review from pluiedev September 19, 2024 14:08
Copy link
Copy Markdown
Member

@h7x4 h7x4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Merging in a few days unless someone says otherwise :)

@h7x4 h7x4 merged commit 9a0f009 into NixOS:master Sep 21, 2024
@h7x4
Copy link
Copy Markdown
Member

h7x4 commented Sep 21, 2024

Thank you all for the great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

8.has: package (new) This PR adds a new package 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants