Paper ID: 2409.12289

MetaPix: A Data-Centric AI Development Platform for Efficient Management and Utilization of Unstructured Computer Vision Data

Sai Vishwanath Venkatesh, Atra Akandeh, Madhu Lokanath

In today's world of advanced AI technologies, data management is a critical component of any AI/ML solution. Effective data management is vital for the creation and maintenance of high-quality, diverse datasets, which significantly enhance predictive capabilities and lead to smarter business solutions. In this work, we introduce MetaPix, a Data-centric AI platform offering comprehensive data management solutions specifically designed for unstructured data. MetaPix offers robust tools for data ingestion, processing, storage, versioning, governance, and discovery. The platform operates on four key concepts: DataSources, Datasets, Extensions and Extractors. A DataSource serves as MetaPix top level asset, representing a narrow-scoped source of data for a specific use. Datasets are MetaPix second level object, structured collections of data. Extractors are internal tools integrated into MetaPix's backend processing, facilitate data processing and enhancement. Additionally, MetaPix supports extensions, enabling integration with external third-party tools to enhance platform functionality. This paper delves into each MetaPix concept in detail, illustrating how they collectively contribute to the platform's objectives. By providing a comprehensive solution for managing and utilizing unstructured computer vision data, MetaPix equips organizations with a powerful toolset to develop AI applications effectively.

Submitted: Sep 18, 2024