Inverse Document Frequency

Inverse Document Frequency (IDF) is a term-weighting scheme used in information retrieval and text mining. It scores a word by how rare it is across a collection of documents, so words that appear in nearly every document receive little weight. IDF is most often combined with Term Frequency (TF) as TF-IDF, which emphasizes words that are frequent within a specific document but rare across the corpus. Current research integrates IDF and TF-IDF with machine learning models such as Naive Bayes, Support Vector Machines, and sentence transformers to improve performance in tasks like dialect identification, stance detection, and requirement traceability. Ongoing work also refines the weighting itself, particularly to address vocabulary gaps, domain adaptation, and synonym handling.
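As a rough illustration of the weighting described above, the following Python sketch computes IDF and TF-IDF over a toy corpus. It assumes the unsmoothed form idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t; this is only one of several variants in use, and the corpus, function names, and whitespace tokenization are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def idf(docs):
    """Return {term: ln(N / df)} for a list of tokenized documents.

    Assumption: unsmoothed IDF with the natural logarithm. Many libraries
    use smoothed or differently scaled variants.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(doc, idf_weights):
    """Return {term: tf * idf} for one tokenized document.

    TF is the relative frequency of the term within the document.
    Terms missing from idf_weights get a weight of 0.0.
    """
    counts = Counter(doc)
    total = len(doc)
    return {
        term: (count / total) * idf_weights.get(term, 0.0)
        for term, count in counts.items()
    }


# Hypothetical toy corpus, tokenized on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

weights = idf(corpus)
scores = tf_idf(corpus[0], weights)

# "the" occurs in every document, so its IDF (and TF-IDF) is zero,
# while "mat" occurs in only one document and gets the largest IDF.
for term in ("the", "cat", "mat"):
    print(term, round(weights[term], 3), round(scores[term], 3))
```

In this sketch a word found in every document contributes nothing to the score, which is the behavior the rarity weighting is meant to produce. Production systems usually add smoothing so that unseen or ubiquitous terms do not produce zero or undefined weights.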

Papers

November 18, 2024

Accelerating spherical K-means clustering for large-scale sparse document data
Kazuo Aoyama, Kazumi Saito
High Dimensional K Mean Sparse Model Full Length Document Inverse Document Frequency

November 15, 2024

Leveraging large language models for efficient representation learning for entity resolution
Xiaowei Xu, Bi T. Foua, Xingqiao Wang, Vivek Gunasekaran, John R. Talburt
Large Language Model Representation Learning Entity Matching Entity Resolution Inverse Document Frequency

July 18, 2024

June 20, 2024

Cross-level Requirement Traceability: A Novel Approach Integrating Bag-of-Words and Word Embedding for Enhanced Similarity Functionality
Baher Mohammad, Riad Sonbol, Ghaida Rebdawi
Real Text Word Tf Idf Bag of Word Cosine Similarity Similarity Search Software Requirement Copyright Traceability Inverse Document Frequency

July 12, 2023

Testing different Log Bases For Vector Model Weighting Technique
Kamel Assaf
Information Retrieval Tf Idf Dynamic Weight Inverse Document Frequency

May 30, 2023

November 28, 2022

Is it Required? Ranking the Skills Required for a Job-Title
Sarthak Anand, Jens-Joris Decorte, Niels Lowie
Weak Supervision Robust Skill Job Description Inverse Document Frequency

November 22, 2022

Method for Determining the Similarity of Text Documents for the Kazakh language, Taking Into Account Synonyms: Extension to TF-IDF
Bakhyt Bakiyev
Information Retrieval High Similarity Tf Idf Text Document Inverse Document Frequency Mean End Account

November 8, 2022

Unsupervised Domain Adaptation for Sparse Retrieval by Filling Vocabulary and Word Frequency Gaps
Hiroki Iida, Naoaki Okazaki
Language Model Domain Adaptation Unsupervised Domain Adaptation Sparse Retrieval Inverse Document Frequency Vocabulary Expansion Lexical Gap

October 14, 2022

Shadfa 0.1: The Iranian Movie Knowledge Graph and Graph-Embedding-Based Recommender System
Rayhane Pouyan, Hadi Kalamati, Hannane Ebrahimian, Mohammad Karrabi, Mohammad-R. Akbarzadeh-T
Recommender System Tf Idf Movie Dataset Inverse Document Frequency

May 27, 2022

A Sea of Words: An In-Depth Analysis of Anchors for Text Data
Gianluigi Lopardo, Frederic Precioso, Damien Garreau
Text Classification Word List Linear Classifier Tf Idf Text Data Depth Analysis Upper Ocean Interpretable Rule Visual Information Anchor Inverse Document Frequency

Papers listed by title only

dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

dzStance at StanceEval2024: Arabic Stance Detection based on Sentence Transformers

Utilization of Multinomial Naive Bayes Algorithm and Term Frequency Inverse Document Frequency (TF-IDF Vectorizer) in Checking the Credibility of News Tweet in the Philippines

Utilizing Social Media Attributes for Enhanced Keyword Detection: An IDF-LDA Model Applied to Sina Weibo