Inverse Document Frequency
Inverse Document Frequency (IDF) is a weighting scheme used in information retrieval and text mining to assess how informative a word is across a collection of documents: words that appear in few documents receive high weights, while words that appear in nearly every document receive weights close to zero. Combined with Term Frequency (TF) as TF-IDF, it emphasizes words that are rare across the corpus but frequent within a specific document. Current research focuses on integrating IDF-based features with machine learning models, such as Naive Bayes, Support Vector Machines, and sentence transformers, to improve performance in tasks like dialect identification, stance detection, and requirement traceability. IDF and TF-IDF continue to be explored and refined, particularly to address vocabulary gaps, domain adaptation, and synonym handling, with the aim of making text analysis applications more accurate and efficient.
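To make the weighting concrete, the following is a minimal sketch in Python that assumes the common unsmoothed definition idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The function names `idf` and `tf_idf` and the three-sentence toy corpus are illustrative assumptions, not taken from any of the papers listed here; practical libraries often use smoothed or otherwise adjusted variants of this formula.

```python
import math
from collections import Counter


def idf(corpus):
    """Return {term: log(N / df)} for a list of tokenized documents.

    Assumes the unsmoothed IDF variant; other variants add smoothing terms.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(doc, idf_weights):
    """Return {term: tf * idf} for one tokenized document.

    Term frequency is the raw count divided by the document length.
    """
    counts = Counter(doc)
    total = len(doc)
    return {
        term: (count / total) * idf_weights.get(term, 0.0)
        for term, count in counts.items()
    }


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

weights = idf(corpus)
scores = tf_idf(corpus[0], weights)

# Print the terms of the first document, highest TF-IDF score first.
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this sketch, "the" occurs in every document, so its IDF is log(3/3) = 0 and it contributes nothing to the TF-IDF score even though it is the most frequent token. A word such as "mat", which occurs in only one document, receives the largest weight.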
Papers
dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features
Mohamed Lichouri, Khaled Lounnas, Boualem Nadjib Zahaf, Mehdi Ayoub Rabiai
dzStance at StanceEval2024: Arabic Stance Detection based on Sentence Transformers
Mohamed Lichouri, Khaled Lounnas, Khelil Rafik Ouaras, Mohamed Abi, Anis Guechtouli
Utilization of Multinomial Naive Bayes Algorithm and Term Frequency Inverse Document Frequency (TF-IDF Vectorizer) in Checking the Credibility of News Tweet in the Philippines
Neil Christian R. Riego, Danny Bell Villarba
Utilizing Social Media Attributes for Enhanced Keyword Detection: An IDF-LDA Model Applied to Sina Weibo
Yifei Yue