diff --git a/docs/CLI.rst b/docs/CLI.rst
index de4b338..7b948b6 100644
--- a/docs/CLI.rst
+++ b/docs/CLI.rst
@@ -21,7 +21,7 @@ command ``polyglot``
   -h, --help            show this help message and exit
   --lang LANG           Language to be processed
   --delimiter DELIMITER
-                        Delimiter that seperates documents, records or even sentences.
+                        Delimiter that separates documents, records or even sentences.
   --workers WORKERS     Number of parallel processes.
   -l LOG, --log LOG     log verbosity level
   --debug               drop a debugger if an exception is raised.
@@ -43,7 +43,7 @@ command ``polyglot``
 
 Notice that most of the operations are language specific. For example,
 tokenization rules and part of speech taggers differ between languages.
-Therefore, it is important that the lanaguage of the input is detected
+Therefore, it is important that the language of the input is detected
 or given. The ``--lang`` option allows you to tell polyglot which
 language the input is written in.
@@ -186,7 +186,7 @@ option ``workers``.
 Building Pipelines
 ------------------
 
-The previous subcommand ``count`` assumed that the words are separted by
+The previous subcommand ``count`` assumed that the words are separated by
 spaces. Given that we never tokenized the text file, that may result in
 suboptimal word counting. Let us take a closer look at the tail of the
 word counts
diff --git a/docs/Detection.rst b/docs/Detection.rst
index 32baf73..ba3bab3 100644
--- a/docs/Detection.rst
+++ b/docs/Detection.rst
@@ -46,7 +46,7 @@ Mixed Text
 """
 
 If the text contains snippets from different languages, the detector is
-able to find the most probable langauges used in the text. For each
+able to find the most probable languages used in the text. For each
 language, we can query the model confidence level:
 
 .. code:: python
diff --git a/docs/Download.rst b/docs/Download.rst
index a3bea7a..6fc0097 100644
--- a/docs/Download.rst
+++ b/docs/Download.rst
@@ -85,7 +85,7 @@ its name and the target language.
 
 Package name format is ``task_name.language_code``
 
-Langauge Collections
+Language Collections
 ^^^^^^^^^^^^^^^^^^^^
 
 Packages are grouped by language. For example, if we want to download
@@ -152,7 +152,7 @@ Therefore, we can just run:
 
 
 
-Langauge & Task Support
+Language & Task Support
 -----------------------
 
 We can query our download manager for which tasks are supported by
diff --git a/docs/Embeddings.rst b/docs/Embeddings.rst
index 07adad2..ddbdb78 100644
--- a/docs/Embeddings.rst
+++ b/docs/Embeddings.rst
@@ -29,7 +29,7 @@ Nearest Neighbors
 -----------------
 
-A common way to investigate the space capture by the embeddings is to
-query for the nearest neightbors of any word.
+A common way to investigate the space captured by the embeddings is to
+query for the nearest neighbors of any word.
 
 .. code:: python
diff --git a/docs/NamedEntityRecognition.rst b/docs/NamedEntityRecognition.rst
index 90b0db1..58994ad 100644
--- a/docs/NamedEntityRecognition.rst
+++ b/docs/NamedEntityRecognition.rst
@@ -96,7 +96,7 @@ We can query all entities mentioned in a text.
 
 
 
-Or, we can query entites per sentence
+Or, we can query entities per sentence
 
 .. code:: python
diff --git a/docs/Sentiment.rst b/docs/Sentiment.rst
index 4e69bb4..9ffaf2b 100644
--- a/docs/Sentiment.rst
+++ b/docs/Sentiment.rst
@@ -111,7 +111,7 @@ is mentioned in text as the following:
 
    text = Text(blob)
 
-First, we need split the text into sentneces, this will limit the words
-tha affect the sentiment of an entity to the words mentioned in the
-sentnece.
+First, we need to split the text into sentences; this will limit the
+words that affect the sentiment of an entity to the words mentioned in
+the sentence.
 
 .. code:: python
diff --git a/notebooks/CLI.ipynb b/notebooks/CLI.ipynb
index 289730d..c7a5ee1 100644
--- a/notebooks/CLI.ipynb
+++ b/notebooks/CLI.ipynb
@@ -34,7 +34,7 @@
     "  -h, --help            show this help message and exit\r\n",
     "  --lang LANG           Language to be processed\r\n",
     "  --delimiter DELIMITER\r\n",
-    "                        Delimiter that seperates documents, records or even sentences.\r\n",
+    "                        Delimiter that separates documents, records or even sentences.\r\n",
     "  --workers WORKERS     Number of parallel processes.\r\n",
     "  -l LOG, --log LOG     log verbosity level\r\n",
     "  --debug               drop a debugger if an exception is raised.\r\n",
@@ -65,7 +65,7 @@
    "source": [
     "Notice that most of the operations are language specific.\n",
     "For example, tokenization rules and part of speech taggers differ between languages.\n",
-    "Therefore, it is important that the lanaguage of the input is detected or given.\n",
+    "Therefore, it is important that the language of the input is detected or given.\n",
     "The `--lang` option allows you to tell polyglot which language the input is written in."
    ]
   },
@@ -284,7 +284,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The previous subcommand `count` assumed that the words are separted by spaces.\n",
+    "The previous subcommand `count` assumed that the words are separated by spaces.\n",
     "Given that we never tokenized the text file, that may result in suboptimal word counting.\n",
     "Let us take a closer look at the tail of the word counts"
    ]
diff --git a/notebooks/Detection.ipynb b/notebooks/Detection.ipynb
index 8e98c42..1b29af4 100644
--- a/notebooks/Detection.ipynb
+++ b/notebooks/Detection.ipynb
@@ -93,7 +93,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If the text contains snippets from different languages, the detector is able to find the most probable langauges used in the text.\n",
+    "If the text contains snippets from different languages, the detector is able to find the most probable languages used in the text.\n",
     "For each language, we can query the model confidence level:"
    ]
   },
diff --git a/notebooks/Download.ipynb b/notebooks/Download.ipynb
index 7fbb76f..5c5816b 100644
--- a/notebooks/Download.ipynb
+++ b/notebooks/Download.ipynb
@@ -150,7 +150,7 @@
     "\n",
     "Package name format is `task_name.language_code`\n",
     "\n",
-    "#### Langauge Collections\n",
+    "#### Language Collections\n",
     "\n",
     "Packages are grouped by language. For example, if we want to download all the models that are specific to Arabic, the arabic collection of models name is **LANG:** followed by the language code of Arabic which is `ar`.\n",
     "\n",
@@ -238,7 +238,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Langauge & Task Support"
+    "## Language & Task Support"
    ]
   },
  {
diff --git a/notebooks/Embeddings.ipynb b/notebooks/Embeddings.ipynb
index 8014bd6..e448eef 100644
--- a/notebooks/Embeddings.ipynb
+++ b/notebooks/Embeddings.ipynb
@@ -65,7 +65,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "A common way to investigate the space capture by the embeddings is to query for the nearest neightbors of any word."
+    "A common way to investigate the space captured by the embeddings is to query for the nearest neighbors of any word."
    ]
   },
  {
diff --git a/notebooks/NamedEntityRecognition.ipynb b/notebooks/NamedEntityRecognition.ipynb
index fe3528f..1c4a886 100644
--- a/notebooks/NamedEntityRecognition.ipynb
+++ b/notebooks/NamedEntityRecognition.ipynb
@@ -173,7 +173,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Or, we can query entites per sentence"
+    "Or, we can query entities per sentence"
    ]
   },
  {
diff --git a/notebooks/Sentiment.ipynb b/notebooks/Sentiment.ipynb
index 9cfbead..33487a1 100644
--- a/notebooks/Sentiment.ipynb
+++ b/notebooks/Sentiment.ipynb
@@ -168,7 +168,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First, we need split the text into sentneces, this will limit the words tha affect the sentiment of an entity to the words mentioned in the sentnece."
+    "First, we need to split the text into sentences; this will limit the words that affect the sentiment of an entity to the words mentioned in the sentence."
    ]
   },
  {
diff --git a/polyglot/downloader.py b/polyglot/downloader.py
index bcc7a50..496948d 100644
--- a/polyglot/downloader.py
+++ b/polyglot/downloader.py
@@ -193,7 +193,7 @@ def __init__(self, id, url, name=None, subdir='',
     """The task this package is serving."""
 
     self.language = language
-    """The langauge code this package belongs to."""
+    """The language code this package belongs to."""
 
     self.attrs = attrs
     """Extra attributes generated by Google Cloud Storage."""
@@ -396,7 +396,7 @@ class Downloader(object):
       task."""
 
   #/////////////////////////////////////////////////////////////////
-  # Cosntructor
+  # Constructor
   #/////////////////////////////////////////////////////////////////
 
   def __init__(self, server_index_url=None, source=None, download_dir=None):
@@ -808,7 +808,7 @@ def _update_index(self, url=None):
     """A helper function that ensures that self._index is up-to-date.
     If the index is older than self.INDEX_TIMEOUT, then download it again."""
 
-    # Check if the index is aleady up-to-date. If so, do nothing.
+    # Check if the index is already up-to-date. If so, do nothing.
     if not (self._index is None or url is not None or
             time.time()-self._index_timestamp > self.INDEX_TIMEOUT):
       return
diff --git a/polyglot/mapping/base.py b/polyglot/mapping/base.py
index 39a4a0c..5d7dbea 100644
--- a/polyglot/mapping/base.py
+++ b/polyglot/mapping/base.py
@@ -18,7 +18,7 @@ from ..utils import _open
 
 
 def count(lines):
-  """ Counts the word frequences in a list of sentences.
+  """ Counts the word frequencies in a list of sentences.
 
   Note:
     This is a helper function for parallel execution of
    `Vocabulary.from_text`
diff --git a/polyglot/mapping/embeddings.py b/polyglot/mapping/embeddings.py
index 66ed912..21a05fb 100644
--- a/polyglot/mapping/embeddings.py
+++ b/polyglot/mapping/embeddings.py
@@ -201,7 +201,7 @@ def _from_word2vec_text(fname):
       except TypeError as e:
         parts = line.strip().split()
       except Exception as e:
-        logger.warning("We ignored line number {} because of erros in parsing"
+        logger.warning("We ignored line number {} because of errors in parsing"
                        "\n{}".format(line_no, e))
         continue
       # We differ from Gensim implementation.
@@ -263,7 +263,7 @@ def _from_glove(fname):
       except TypeError as e:
         parts = line.strip().split()
       except Exception as e:
-        logger.warning("We ignored line number {} because of erros in parsing"
+        logger.warning("We ignored line number {} because of errors in parsing"
                        "\n{}".format(line_no, e))
         continue
       # We deduce layer1_size because GloVe files have no header.
diff --git a/polyglot/mixins.py b/polyglot/mixins.py
index ee1d82c..e188c01 100644
--- a/polyglot/mixins.py
+++ b/polyglot/mixins.py
@@ -119,7 +119,7 @@ def find(self, sub, start=0, end=sys.maxsize):
 
   def rfind(self, sub, start=0, end=sys.maxsize):
     '''Behaves like the built-in str.rfind() method. Returns an integer,
-    the index of he last (right-most) occurence of the substring argument
+    the index of the last (right-most) occurrence of the substring argument
     sub in the sub-sequence given by [start:end].
     '''
     return self._strkey().rfind(sub, start, end)
@@ -189,7 +189,7 @@ def join(self, iterable):
     return self.__class__(self._strkey().join(iterable))
 
   def replace(self, old, new, count=sys.maxsize):
-    """Return a new blob object with all the occurence of `old` replaced
+    """Return a new blob object with all the occurrences of `old` replaced
     by `new`.
""" return self.__class__(self._strkey().replace(old, new, count)) diff --git a/polyglot/tag/base.py b/polyglot/tag/base.py index 3209184..ec0b809 100644 --- a/polyglot/tag/base.py +++ b/polyglot/tag/base.py @@ -60,7 +60,7 @@ def _load_network(self): raise NotImplementedError() def annotate(self, sent): - """Annotate a squence of words with entity tags. + """Annotate a sequence of words with entity tags. Args: sent: sequence of strings/words. diff --git a/polyglot/transliteration/base.py b/polyglot/transliteration/base.py index 5cab2c3..908af1e 100644 --- a/polyglot/transliteration/base.py +++ b/polyglot/transliteration/base.py @@ -19,8 +19,8 @@ class Transliterator(object): def __init__(self, source_lang="en", target_lang="en"): """ Args: - source_lang (string): language code of the input langauge. - target_lang (string): language code of the generated output langauge. + source_lang (string): language code of the input language. + target_lang (string): language code of the generated output language. """ self.source_lang = source_lang self.target_lang = target_lang