Fake News Classification Using Random Forest and Decision Tree (J48)
Keywords:
Machine learning, Text classification, Natural language processing, Decision Tree, Random Forest
Abstract
Fake news is a widespread phenomenon with considerable effects on social life, especially in the political domain. Creating fake news has become very easy because of the widespread use of the internet and social media. Detecting deceptive news is therefore a crucial problem that deserves attention, mainly because of challenges such as the limited number of benchmark datasets and the volume of news published every second. This research proposes using two machine learning algorithms, random forest and decision tree (J48), to detect fake news. The full dataset contains 20,761 samples, and the testing set contains 4,345 samples. Preprocessing begins with cleaning the data by removing unnecessary special characters, numbers, English letters, and white spaces, and ends with stop-word removal. The widely used TF-IDF feature extraction method is then applied before the two classification algorithms are run. The results show that the decision tree model achieves the best accuracy, 89.11%, while the random forest achieves 84.97%.
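The preprocessing and feature extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the function names `preprocess` and `tfidf`, and the example headlines are hypothetical, and the weighting shown (relative term frequency multiplied by the logarithm of the inverse document frequency) is only one common TF-IDF variant, since the abstract does not state which one was used. The sketch keeps alphabetic tokens and does not attempt to reproduce the letter-removal step, whose details the abstract does not give.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only;
# the abstract does not name the list that was used.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase the text, keep only runs of letters (this drops digits,
    special characters, and extra white space), then remove stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dictionary per document.

    weight = (count of term in doc / tokens in doc) * ln(N / df(term)),
    where N is the number of documents and df(term) is the number of
    documents containing the term. This is an assumed variant.
    """
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical headlines, not taken from the paper's dataset.
docs = [
    "Breaking: the president signed 3 new laws!",
    "The president denied the claims.",
    "Scientists discover a new planet",
]
vectors = tfidf(docs)
```

In this example the term "breaking" occurs in only one headline and so receives a higher weight than "president", which occurs in two. Vectors of this kind would then be passed to the random forest and decision tree classifiers.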