Costin BUSIOC, Stefan RUSETI, Mihai DASCALU
Universitatea Politehnica din București; Politehnica University of Bucharest
A Literature Review of NLP Approaches to Fake News Detection and Their Applicability to Romanian-Language News Analysis
Fighting fake news is a challenging task. With an increasing impact on the social and political environment, fake news exerts an unprecedented influence on people's lives. In response to this phenomenon, initiatives for automated fake news detection have gained popularity and generated widespread research interest.
However, most approaches, whether they target English or low-resource languages, still face considerable difficulties in devising such solutions. This study surveys the progress of these investigations, highlighting existing solutions, open challenges, and observations shared by various research groups. In addition, given the limited number of automated analyses performed on Romanian fake news, we examine the applicability of the available approaches to the Romanian context and identify future research paths.
Keywords: fake news identification, natural language processing techniques, Romanian language