Robotics and Intelligent Machines
Das, Arnav (School: Hiranandani Foundation School - Powai)
With the exponential growth of data on the internet, it has become increasingly difficult to determine whether a given piece of information is legitimate. This has contributed to a sharp rise in the spread of unverified and fabricated information online. Manual fact-checking is laborious and expensive, and most existing computational methods rely on either stance detection or isolated content analysis. This project uses purely computational, independent techniques to assess the credibility of information at two levels, content and source (specifically, websites that engage in disinformation), using ensemble learning models combined with neural networks. The program scrapes a given webpage and applies standard preprocessing techniques to the retrieved data to engineer features for the models. For the source-level analysis, a nested ensemble of 6 to 10 different supervised learners, which vote among themselves, analyzes the webpage's architecture, metadata and media content. For the content-level analysis, a deep neural network classifies the page by performing sentiment-aware stylistic analysis of the linguistic attributes of the published text. Transfer learning is applied by vectorizing the text with pretrained BERT and GloVe embeddings. A final meta-classifier combines the two approaches and correctly classifies websites in the test datasets with an accuracy of over 90%, taking only the URL as input, which keeps friction for the user to a minimum. Together, these models reliably identify fake news websites in real time.
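The source-level voting and the final meta-classifier can be illustrated with a minimal Python sketch. It is not the project's implementation. The decision-stump learners, the feature values, the split into two inner ensembles, and the meta-classifier weights are all hypothetical placeholders, and the content-level score is a stand-in for the output of the deep neural network. In a real system the base learners would be trained models, and the meta-classifier weights would be fitted on held-out base-model predictions (stacking).

```python
import math
from typing import Callable, List, Sequence

# Hypothetical base learner: maps a page's feature vector to a 0/1 "fake" vote.
Learner = Callable[[Sequence[float]], float]


def stump(index: int, cutoff: float) -> Learner:
    """Toy decision stump: votes 1.0 ('fake') if feature[index] exceeds cutoff."""
    return lambda x: 1.0 if x[index] > cutoff else 0.0


def hard_vote(learners: List[Learner], features: Sequence[float]) -> float:
    """Fraction of base learners in one inner ensemble that vote 'fake'."""
    return sum(clf(features) for clf in learners) / len(learners)


def nested_ensemble(groups: List[List[Learner]], features: Sequence[float]) -> float:
    """Outer vote: fraction of inner ensembles whose majority says 'fake'."""
    decisions = [1.0 if hard_vote(g, features) >= 0.5 else 0.0 for g in groups]
    return sum(decisions) / len(decisions)


def meta_classifier(source_score: float, content_score: float,
                    weights=(4.0, 4.0), bias: float = -4.0) -> float:
    """Logistic combination of the source and content scores into P(fake).

    The weights and bias are placeholders; a stacked model would learn them.
    """
    z = weights[0] * source_score + weights[1] * content_score + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical normalized source-level features for one scraped page.
features = [0.9, 0.2, 0.8]
group_a = [stump(0, 0.5), stump(1, 0.5), stump(2, 0.5)]
group_b = [stump(0, 0.3), stump(2, 0.7)]

source_score = nested_ensemble([group_a, group_b], features)
content_score = 0.7  # stand-in for the deep neural network's output
print(round(meta_classifier(source_score, content_score), 3))  # prints 0.943
```

Here "nested ensemble" is read as inner majority votes whose outcomes are voted on again; soft voting over averaged probabilities would be an equally plausible reading of the abstract.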