File Classification

File classification aims to automatically identify the type of a file, a task that is crucial in applications such as digital forensics, e-commerce, and malware detection. Current research focuses on improving classification speed and accuracy through several approaches: lightweight supervised learning models that process file names quickly, convolutional neural networks that treat byte sequences as images in order to capture intra-byte information, and transformer-based models that use correlated multiple instance learning to handle large files efficiently. These advances make file analysis faster and more reliable across many domains, which strengthens security and enables more efficient data management.
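To make the "byte sequences as images" idea concrete, here is a minimal sketch in Python. It assumes one simple encoding: each byte becomes one grayscale pixel, or, to expose intra-byte structure, each byte is unpacked into its eight bits so that every bit becomes a pixel. The function names `bytes_to_byte_image` and `bytes_to_bit_image`, the fixed row width, and the zero padding are illustrative assumptions, not the encoding used by any particular paper.

```python
from typing import List


def _to_rows(values: List[int], width: int) -> List[List[int]]:
    """Zero-pad `values` to a multiple of `width` and split it into rows."""
    if width <= 0:
        raise ValueError("width must be positive")
    remainder = len(values) % width
    if remainder:
        values = values + [0] * (width - remainder)
    return [values[r:r + width] for r in range(0, len(values), width)]


def bytes_to_byte_image(data: bytes, width: int = 64) -> List[List[int]]:
    """Hypothetical helper: one pixel per byte, intensity 0..255."""
    return _to_rows(list(data), width)


def bytes_to_bit_image(data: bytes, width: int = 64) -> List[List[int]]:
    """Hypothetical helper: one pixel per bit (most significant bit first),
    so patterns *inside* each byte become visible to a 2D convolution."""
    bits = [(b >> (7 - i)) & 1 for b in data for i in range(8)]
    return _to_rows(bits, width)


if __name__ == "__main__":
    header = b"\x89PNG"  # first four bytes of a PNG file
    for row in bytes_to_bit_image(header, width=16):
        print(row)
```

With `width=16`, the four PNG header bytes yield 32 bits, that is, two rows of 16 bit-pixels. In the byte-level image, 0x89 is a single pixel of intensity 137, whereas the bit-level image spreads it over eight pixels, which is one way a CNN could see intra-byte structure. A real pipeline would also crop, pad, or resize the grid to the network's fixed input shape.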

Papers