Support Chinese Characters (non-ASCII) by UTF-8#40

Open
ElevenyChen wants to merge 2 commits into radiolarian:master from ElevenyChen:master


@ElevenyChen ElevenyChen commented Dec 16, 2023

1. Chinese Character Support: The script now correctly handles Chinese characters in CSV and text file outputs.
2. Progress Tracking: It prints the scraping progress, showing the current work ID and count, enhancing user experience during long scraping sessions.
3. Richer Text File Output: Output text files are now reformatted to include detailed metadata such as title, author, rating, relationship, language, status, and chapters, offering more context for each work.

1. Chinese characters are now supported in the CSV and .txt output files.
2. Progress is printed while scraping.
3. The .txt files are reformatted to include more information about each work.
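
For readers who have not looked at the diff, the first two points can be illustrated with a minimal sketch. This is not the code from this PR: the function names, the field list, and the sample work are hypothetical. It only shows the general technique of opening the output file with an explicit UTF-8 encoding so that non-ASCII text survives regardless of the platform's default encoding, and printing a progress line per work.

```python
import csv
import os
import tempfile

# Hypothetical metadata fields, loosely modelled on the description above.
FIELDS = ["work_id", "title", "author", "language"]


def write_works_csv(path, works):
    """Write work metadata to a CSV file encoded as UTF-8."""
    # An explicit encoding avoids falling back to the OS default code page;
    # newline="" is what the csv module documentation recommends for files.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for count, work in enumerate(works, start=1):
            writer.writerow(work)
            # Progress line: current work ID and running count.
            print(f"Scraped work {work['work_id']} ({count}/{len(works)})")


def read_works_csv(path):
    """Read the CSV back with the same encoding it was written in."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def demo():
    # Made-up sample data, used only to check the round trip.
    works = [
        {"work_id": "12345", "title": "月光", "author": "某作者", "language": "中文"}
    ]
    path = os.path.join(tempfile.mkdtemp(), "works.csv")
    write_works_csv(path, works)
    return read_works_csv(path)
```

If the CSV is meant to be opened in a spreadsheet application that expects a byte order mark, `encoding="utf-8-sig"` may be worth trying instead; whether that is needed depends on the tool.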
@caizhuoyue77

When I try to scrape Chinese articles, I can't get the Chinese characters, only the Pinyin. Is that expected?

@ElevenyChen ElevenyChen closed this Jun 5, 2024
@ElevenyChen ElevenyChen reopened this Jun 5, 2024
@ElevenyChen

Author

> When I try to scrape Chinese articles, I can't get the Chinese characters, only the Pinyin. Is that expected?

Hi, I've successfully scraped Chinese characters with my version of the code. See an example. Since the owner of the main repository hasn't responded to my request, you can use the version from my fork directly.
