returning char form if needed. by tahirrafiqueasad · Pull Request #78 · mpcabd/python-arabic-reshaper

tahirrafiqueasad · 2022-03-02T19:44:30Z

Your work is very impressive and works best for the Urdu language.

I am working on a project in which I need to know the form of each character. Your code already determines the character form, but there is no specific function to get it. I added one more argument to the reshape function. This argument allows the caller to get the character form if needed.

mpcabd · 2022-03-08T10:45:44Z

Thanks Tahir.

I don't really see the use case, nor do I agree with the implementation, i.e. the extra argument that would change the return type. Please explain the use case in more detail, and let's think about a better interface implementation to solve the case.

tahirrafiqueasad · 2022-03-09T17:35:09Z

It has a very important use in the training of the *Word Detector* (a machine learning model that detects the words in an image), which is the first module of an OCR <https://towardsdatascience.com/a-gentle-introduction-to-ocr-ee1469a201aa> (Optical Character Recognition) pipeline. Most detector models are trained on English, because the characters in English words are separate. In Urdu and Arabic words, the characters are not separate. In order to train a model (especially CRAFT <https://github.com/clovaai/CRAFT-pytorch>) for Urdu and Arabic we need character-level annotations, which are not possible. Instead of character-level annotations we will get word-part-level annotations. An example is given below for more understanding:

*Character Annotation:* [image: Screenshot from 2022-03-09 22-16-53.png]

*Part Annotation:* [image: Screenshot from 2022-03-09 22-18-48.png]

*Word Annotation:* [image: Screenshot from 2022-03-09 22-19-48.png]

The work on word-level annotation has already been done by Adavoudi <https://github.com/adavoudi/SynthText>, but word-level annotation is not suitable for training the model. Character-level annotation is good, but character annotations sometimes overlap other characters because the letters are connected. So the best strategy is to use part-level annotation. We can make this type of annotation if we know the start and end character. This is where your library comes into play: if it provides this extra information, we will be able to produce part-level annotations.
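To make the idea of a "word part" concrete, here is a minimal sketch of how a word could be split into connected parts and how each character could be labelled with a form. It does not use python-arabic-reshaper and is not the change proposed in this PR. The names `RIGHT_JOINING`, `split_parts` and `char_forms` are hypothetical, and the letter set is an illustrative subset rather than a complete table. It assumes the input is a single word made only of Arabic-script letters, with no diacritics, tatweel, zero-width joiners or ligature handling.

```python
from typing import List, Tuple

# Letters that connect only to the preceding letter (Unicode Joining_Type=R),
# so a connected part always ends right after one of them.
# Illustrative subset only, not an exhaustive table.
RIGHT_JOINING = set(
    "\u0622\u0623\u0624\u0625\u0627\u0629"  # alef variants, waw with hamza, teh marbuta
    "\u062f\u0630\u0631\u0632\u0648"        # dal, thal, reh, zain, waw
    "\u0688\u0691\u0698\u06d2"              # Urdu ddal, rreh, jeh, yeh barree
)


def split_parts(word: str) -> List[str]:
    """Split a word into runs of letters that are drawn connected."""
    parts: List[str] = []
    current = ""
    for ch in word:
        current += ch
        # A right-joining letter never connects to the next letter,
        # so the current part is closed after it.
        if ch in RIGHT_JOINING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


def char_forms(word: str) -> List[Tuple[str, str]]:
    """Label every character with its position-based form inside its part."""
    labelled: List[Tuple[str, str]] = []
    for part in split_parts(word):
        last = len(part) - 1
        for i, ch in enumerate(part):
            if last == 0:
                form = "isolated"
            elif i == 0:
                form = "initial"
            elif i == last:
                form = "final"
            else:
                form = "medial"
            labelled.append((ch, form))
    return labelled


# Kaf, teh, alef, beh: alef closes the first part, beh stands alone.
word = "\u0643\u062a\u0627\u0628"
parts = split_parts(word)
forms = char_forms(word)
```

In this sketch each part starts at an "initial" or "isolated" character and ends at a "final" or "isolated" one, which is the start and end information a part-level bounding box would need. Keeping the labels in a separate function, rather than behind a flag on an existing function, also leaves every return type fixed.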


-- Regards, Muhammad Tahir Rafique

returning char form if needed.

94c3de9

tahirrafiqueasad wants to merge 1 commit into mpcabd:master from tahirrafiqueasad:char_form
