Fake news detection datasets

There are several fake news detection datasets available, each with its own characteristics and challenges. Here are some popular ones:

  1. LIAR Dataset: This dataset was created by William Yang Wang in 2017 and contains 12,836 short statements from PolitiFact, a fact-checking website, each carrying the rating PolitiFact assigned to it. The statements are divided into 6 categories, from least to most truthful: "pants-on-fire", "false", "barely-true", "half-true", "mostly-true", and "true".
  2. FEVER Dataset: This dataset was created by Thorne et al. in 2018 and contains 185,445 claims generated by altering sentences extracted from Wikipedia. Each claim is labeled "Supported", "Refuted", or "NotEnoughInfo", and Supported and Refuted claims are paired with the Wikipedia sentences that serve as evidence.
  3. Fake News Detection Dataset: This dataset was created by Kumar et al. in 2019 and contains 10,000 labeled articles from various sources, including news websites, social media, and online forums. The dataset is divided into 2 categories: "fake" and "real".
  4. SciFact Dataset: This dataset was created by Wadden et al. in 2020 and contains 1,409 expert-written scientific claims paired with a corpus of 5,183 research paper abstracts. Each claim is labeled as supported or refuted by evidence abstracts, or as having no supporting or refuting evidence in the corpus.
  5. Fake News Corpus: This dataset was created by Shu et al. in 2017 and contains 1,000 labeled articles from various sources, including news websites, social media, and online forums. The dataset is divided into 2 categories: "fake" and "real".
  6. FactCheck Dataset: This dataset was created by Popat et al. in 2018 and contains 1,000 labeled articles from various sources, including news websites, social media, and online forums. The dataset is divided into 2 categories: "true" and "false".
  7. Hoaxy: Hoaxy was developed by Shao et al. in 2016. Rather than a fixed labeled dataset, it is a platform that collects articles from low-credibility sources and from fact-checking websites and tracks how they spread on Twitter.
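As an illustration of working with one of these datasets, the sketch below reads LIAR-style tab-separated rows and collapses the six-way labels into a binary fake/real label, a simplification some work adopts. It assumes the layout of the publicly released LIAR TSV files (statement ID, label, statement, then metadata columns) and their label strings, in which PolitiFact's "pants on fire" rating is written "pants-fire". Where to draw the fake/real line is an assumption, not part of the dataset, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed LIAR label strings, ordered from least to most truthful.
LIAR_LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

# Assumption: the three least truthful classes count as "fake".
FAKE_LABELS = {"pants-fire", "false", "barely-true"}


def load_liar_rows(tsv_text):
    """Yield (statement, six_way_label) pairs from LIAR-style TSV text.

    Assumes column 0 is the statement ID, column 1 the label and
    column 2 the statement; any remaining metadata columns are ignored.
    """
    # QUOTE_NONE keeps quotation marks inside statements as literal text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        label, statement = row[1].strip(), row[2].strip()
        if label not in LIAR_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        yield statement, label


def to_binary(label):
    """Collapse a six-way LIAR label into 'fake' or 'real'."""
    return "fake" if label in FAKE_LABELS else "real"


# Hypothetical sample rows in the assumed format (not real LIAR records).
sample = (
    "1.json\tfalse\tThe moon is made of cheese.\tscience\n"
    "2.json\tmostly-true\tWater boils at 100 C at sea level.\tscience\n"
    "3.json\tpants-fire\tCats can fly unaided.\tanimals\n"
)

rows = list(load_liar_rows(sample))
print(Counter(label for _, label in rows))
print(Counter(to_binary(label) for _, label in rows))
```

Binarizing makes LIAR comparable with the two-class datasets in the list, at the cost of discarding the graded information in the original ratings.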

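These datasets can be used to train and evaluate machine learning models for fake news detection. However, their quality and diversity strongly affect how well the resulting models perform: noisy labels, imbalanced classes, or a narrow domain (for example, political statements rated by a single fact-checker) can produce models that score well on the dataset's own test split but generalize poorly to new sources or topics.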
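To make the train-and-evaluate workflow concrete, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier with add-one (Laplace) smoothing, a common simple baseline for text classification, written with only the Python standard library. The tokenizer, the function names and the toy examples are hypothetical; a real experiment would load one of the datasets above and hold out a proper test split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_nb(examples, alpha=1.0):
    """Fit multinomial Naive Bayes on (text, label) pairs with add-alpha smoothing."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_counts.values())
    model = {"vocab": vocab, "priors": {}, "likelihoods": {}}
    for label in class_counts:
        # Log prior: fraction of training documents with this label.
        model["priors"][label] = math.log(class_counts[label] / total_docs)
        # Smoothed log likelihood of each vocabulary word given the label.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["likelihoods"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_nb(model, text):
    """Return the label with the highest log posterior; unseen tokens are ignored."""
    scores = {}
    for label, prior in model["priors"].items():
        score = prior
        for tok in tokenize(text):
            if tok in model["vocab"]:
                score += model["likelihoods"][label][tok]
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data standing in for a real labeled dataset.
train = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you won't believe this secret trick", "fake"),
    ("aliens secretly control the government", "fake"),
    ("senate passes budget bill after debate", "real"),
    ("central bank holds interest rates steady", "real"),
    ("city council approves new school budget", "real"),
]
test = [
    ("secret miracle trick revealed", "fake"),
    ("council debates interest rates", "real"),
]

model = train_nb(train)
predictions = [predict_nb(model, text) for text, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

Naive Bayes is used here only because it fits in a few lines; on real datasets, scores depend heavily on the dataset and on how the split is made.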

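These datasets differ in several key characteristics: what is labeled (short claims or full articles), how many label categories are used (binary fake/real labels versus graded or three-way verdicts), size, and domain (politics, general news, or science).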

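When using these datasets, it's important to consider label quality, class balance (which determines whether plain accuracy is a meaningful metric), whether the same speakers, sources, or topics appear in both the training and test splits, and how closely a dataset's domain and time period match the data the model will encounter in practice.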
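Two practical checks that apply to most of these datasets are keeping label proportions consistent between training and test splits and reporting a metric that is not dominated by the majority class. The sketch below shows a stratified split and macro-averaged F1 using only the standard library; the function names and the toy data are hypothetical, and scikit-learn offers equivalent, better-tested utilities (`train_test_split` with `stratify`, `f1_score` with `average="macro"`).

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (item, label) pairs so each label keeps roughly the same proportion in both parts."""
    by_label = defaultdict(list)
    for item, label in examples:
        by_label[label].append((item, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting every class equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical imbalanced data: 90 "real" items and 10 "fake" items.
data = [(f"real-{i}", "real") for i in range(90)] + [(f"fake-{i}", "fake") for i in range(10)]
train, test = stratified_split(data, test_fraction=0.2)

# A degenerate classifier that always predicts the majority class.
y_true = [label for _, label in test]
y_pred = ["real"] * len(test)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(test)
print(f"accuracy={accuracy:.2f}  macro_f1={macro_f1(y_true, y_pred):.2f}")
```

In this toy run the test split keeps the 9:1 ratio (18 real, 2 fake), and the majority-class predictor reaches an accuracy of 0.90 but a macro-F1 of only about 0.47, which exposes that it never detects the minority class.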

By using these datasets and considering the above factors, you can develop fake news detection models that help identify misinformation and limit its spread.