Dataset glove.840B.300d.txt character issue #49

At line **52343**, the dataset contains what appears to be the token **". . ."**, but it isn't.
At this line the example script `sif_embedding.py` breaks, because the `split()` call at line 15 of `auxiliary_data/data_io.py` does not separate the word from its embedding correctly.
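
A minimal reproduction of the failure, assuming the loader splits each line with the no-argument `str.split()` (the exact code in `data_io.py` is not reproduced here) and that the characters between the dots are U+00A0 NO-BREAK SPACE (code point 160). The sample line and its 3-value vector are made up for brevity, and `parse_line` is a hypothetical helper, not part of the repository, that splits only on ASCII spaces from the right:

```python
from typing import List, Tuple

# Assumed line format: "<token> <v1> ... <vN>", fields separated by single
# ASCII spaces. The vector is cut to 3 values to keep the sample short.
NBSP = "\u00a0"  # code point 160
DIM = 3

# Hypothetical sample line whose token ". . ." uses NBSPs between the dots.
line = f".{NBSP}.{NBSP}. 0.1 -0.2 0.3\n"

# The no-argument str.split() separates on every Unicode whitespace
# character, U+00A0 included, so the token is cut into three fields.
fields = line.split()
print(fields)  # ['.', '.', '.', '0.1', '-0.2', '0.3']

try:
    # What a naive loader does next: treat everything after fields[0] as floats.
    [float(x) for x in fields[1:]]
except ValueError as err:
    print(err)  # could not convert string to float: '.'


def parse_line(text: str, dim: int) -> Tuple[str, List[float]]:
    """Hypothetical helper: split only on ASCII spaces, and only the last
    `dim` of them, so the token keeps any characters it contains."""
    parts = text.rstrip("\r\n ").rsplit(" ", dim)
    token, values = parts[0], parts[1:]
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    return token, [float(v) for v in values]


token, vector = parse_line(line, DIM)
print(repr(token), vector)  # '.\xa0.\xa0.' [0.1, -0.2, 0.3]
```

Splitting from the right with `maxsplit` set to the vector dimension only works if every line has exactly that many values, which is why the helper checks the count instead of trusting it.
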
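Debugging that line showed that the dots in **". . ."** are ordinary dots, but the spaces between them are character code **160** (U+00A0, NO-BREAK SPACE) rather than the regular space (code 32). In Python 3 the no-argument `split()` treats this character as whitespace, so the token is cut into separate dots and the remaining fields no longer line up with the embedding values.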
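
To check whether other lines are affected, a sketch like the following could scan the file, assuming it is read as UTF-8 text (`unusual_whitespace` is a hypothetical helper, and the file path in the commented usage is assumed). It reports every character that `str.isspace()` accepts other than the ASCII space and line endings, i.e. characters the no-argument `split()` would also treat as separators:

```python
import unicodedata
from typing import Iterable, Iterator, List, Tuple


def unusual_whitespace(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (1-based line number, character names) for every line that
    contains whitespace other than the ASCII space, CR or LF."""
    for lineno, line in enumerate(lines, start=1):
        names = [
            # Control characters such as TAB have no Unicode name, so fall
            # back to the U+XXXX notation.
            unicodedata.name(ch, f"U+{ord(ch):04X}")
            for ch in line
            if ch.isspace() and ch not in " \r\n"
        ]
        if names:
            yield lineno, names


sample = ["the 0.1 0.2\n", ".\u00a0.\u00a0. 0.1 -0.2\n"]
print(list(unusual_whitespace(sample)))
# [(2, ['NO-BREAK SPACE', 'NO-BREAK SPACE'])]

# On the real file (path assumed), streaming it line by line:
# with open("glove.840B.300d.txt", encoding="utf-8") as f:
#     for lineno, names in unusual_whitespace(f):
#         print(lineno, names)
```
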
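The file is probably not ASCII but a Unicode encoding (most likely UTF-8). For practical reasons the check was done with `ord()`, which in Python 3 returns the Unicode code point rather than a byte value; since code points 0 to 255 coincide with Latin-1 ("extended ASCII"), 160 names the same character either way, so the problem doesn't change.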
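
A short illustration of the difference between the code point and its on-disk bytes. Whether glove.840B.300d.txt is actually UTF-8 is an assumption here, not something this snippet checks:

```python
import unicodedata

ch = "\u00a0"

# ord() returns the Unicode code point, independent of any file encoding.
print(ord(ch), hex(ord(ch)), unicodedata.name(ch))  # 160 0xa0 NO-BREAK SPACE

# The same character is stored differently depending on the encoding.
print(ch.encode("utf-8"))    # b'\xc2\xa0' (two bytes in UTF-8)
print(ch.encode("latin-1"))  # b'\xa0' (one byte in Latin-1)

# It has no ASCII representation at all: ASCII stops at 127.
try:
    ch.encode("ascii")
except UnicodeEncodeError as err:
    print(err.reason)  # ordinal not in range(128)

# Python 3 counts it as whitespace, which is why str.split() breaks on it.
print(ch.isspace())  # True
```
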


Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Dataset glove.840B.300d.txt character issue #49

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions