Fake-News-Detector

Learning the basics of Deep Learning with TF-IDF and Fake News Detection. Based on the original project at https://data-flair.training/blogs/advanced-python-project-detecting-fake-news/, with some fun twists added.

Goals

Overall: The goal is to build several types of models that predict whether a post, article, or other text is fake news. I will try to assign each article to a topic (currently political news only, hopefully expanding to non-political news as well). The target is >95% accuracy while reducing the number of false negatives (i.e., text classified as real when it is actually fake).
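One common way to trade accuracy for fewer false negatives is to lower the decision threshold on the model's "fake" probability. The sketch below is only an illustration of that idea, not the tuning used in this repo: the scores, labels, and the `confusion_counts` helper are all made up for the example, with "fake" treated as the positive class (label 1).

```python
def confusion_counts(fake_probs, labels, threshold):
    """Count outcomes when 'fake' (label 1) is the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for prob, label in zip(fake_probs, labels):
        predicted_fake = prob >= threshold
        if predicted_fake and label == 1:
            counts["tp"] += 1
        elif predicted_fake and label == 0:
            counts["fp"] += 1   # real text flagged as fake
        elif label == 1:
            counts["fn"] += 1   # fake text that slipped through as real
        else:
            counts["tn"] += 1
    return counts


# Hypothetical model scores: probability that each text is fake.
fake_probs = [0.92, 0.40, 0.35, 0.10, 0.65, 0.20]
labels = [1, 1, 0, 0, 1, 0]

default = confusion_counts(fake_probs, labels, threshold=0.5)
cautious = confusion_counts(fake_probs, labels, threshold=0.3)

print("threshold 0.5:", default)
print("threshold 0.3:", cautious)
```

On this toy data, lowering the threshold from 0.5 to 0.3 removes the one false negative at the cost of one extra false positive, which is the kind of trade-off the goal above is asking for.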

Create a Fake News classifier by applying basic NLP preprocessing techniques (such as vectorization) and training a linear model with SGD. It should be applicable to any news article. Some slight tuning to prioritize FPs or FNs may be needed. (In progress in model.ipynb.)
Create a web scraping pipeline: when someone submits an article link, it extracts the article text and prints the model's inference.
Create a new model that focuses on short-form videos and/or short statements.
Create models that focus on specific topics (war, economy, disasters, etc.).
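The web scraping goal above could look roughly like the sketch below. It is a standard-library-only illustration, not the pipeline in this repo: `ParagraphExtractor`, `extract_article_text`, `fetch_html`, and `classify_url` are hypothetical names, `predict` stands in for whatever trained model is plugged in, and keeping only `<p>` text is a simplifying assumption that will not suit every news site.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect text inside <p> tags; text outside <p> (nav, scripts) is skipped."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._chunks = []       # text fragments of the current paragraph
        self.paragraphs = []

    def _flush(self):
        # Join fragments and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()       # an unclosed <p> ends when the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def extract_article_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()             # keep a trailing paragraph that was never closed
    return "\n".join(parser.paragraphs)


def fetch_html(url, timeout=10):
    """Download a page and decode it (falls back to UTF-8 if no charset is sent)."""
    request = Request(url, headers={"User-Agent": "fake-news-detector-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def classify_url(url, predict):
    """Fetch an article and pass its text to a caller-supplied predict function."""
    return predict(extract_article_text(fetch_html(url)))
```

A call such as `classify_url(link, predict=my_model_fn)` would then print or return the inference, where `my_model_fn` is any function that maps article text to a label.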

Progress

Got two models running and deployed on a simple website (see above).

The first is for long articles, using a Logistic Regression classifier trained with SGD to determine whether text is fake or real. This model works better on longer bodies of text than on short paragraphs, and reaches around 96% accuracy.

The second is for Twitter posts, specifically the shorter-form posts and comments on the site. This model uses a Histogram-based Gradient Boosting Classifier, but it performs worse than the long-text model (~80% accuracy) because the content is shorter. Some features may need to be removed due to web scraping parsing issues, so it is still subject to change.

Next: I want to add transcripts of videos from various social media platforms, analysis of pictures, and support for Instagram, TikTok, Facebook, etc.

Requirements: requirements.txt

Data Citation: in data_sources.md

Also check out another project of mine, curly-waffle-spark

