This repository was archived by the owner on Apr 23, 2024. It is now read-only.

[WIP] fast wordpiece tokenization#105

Open
gleb-kov wants to merge 11 commits into VKCOM:master from gleb-kov:fast-wordpiece

Conversation

@gleb-kov

Still to do:

  • CLI
  • README
  • tests
  • benchmark table
  • clean up VectorSegment: revert it to its previous state and use polynomial hashing in the new code
  • code formatting
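
For reference, the polynomial hashing mentioned in the VectorSegment item could look roughly like the sketch below. This is a generic illustration, not the code from this PR: the class name `SegmentHasher`, the constants `BASE` and `MOD`, and the method `segment_hash` are all hypothetical, and the actual `VectorSegment` interface is not shown in this conversation. The idea is to precompute prefix hashes over a sequence of token ids so that the hash of any contiguous segment is available in O(1).

```python
# Hypothetical constants: a large Mersenne prime modulus and an arbitrary base.
MOD = (1 << 61) - 1
BASE = 911382323


class SegmentHasher:
    """Prefix polynomial hashes over a list of non-negative integer ids."""

    def __init__(self, ids):
        n = len(ids)
        self.prefix = [0] * (n + 1)  # prefix[i] is the hash of ids[:i]
        self.power = [1] * (n + 1)   # power[i] is BASE**i mod MOD
        for i, x in enumerate(ids):
            # The +1 keeps an id of 0 from hashing like an empty position.
            self.prefix[i + 1] = (self.prefix[i] * BASE + x + 1) % MOD
            self.power[i + 1] = (self.power[i] * BASE) % MOD

    def segment_hash(self, begin, end):
        """Hash of ids[begin:end], computed in O(1) from the prefix table."""
        length = end - begin
        return (self.prefix[end] - self.prefix[begin] * self.power[length]) % MOD


hasher = SegmentHasher([5, 7, 5, 7, 9])
# Equal segments hash equally regardless of where they start.
same = hasher.segment_hash(0, 2) == hasher.segment_hash(2, 4)
```

Equal hashes do not prove that two segments are equal, so a hash table keyed on these values would still need to compare the underlying ids on a match.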
