Paper ID: 2201.07201

Studying Popular Open Source Machine Learning Libraries and Their Cross-Ecosystem Bindings

Hao Li, Cor-Paul Bezemer

Open source machine learning (ML) libraries allow developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library which is not available in their programming language or ecosystem of choice, may need to resort to using a so-called binding library. Binding libraries provide support across programming languages and package ecosystems for a source library. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we conduct an in-depth study of 155 cross-ecosystem bindings and their development for 36 popular open source ML libraries. Our study shows that for most popular ML libraries, only one package ecosystem is officially supported (usually PyPI). Cross-ecosystem support, which is available for 25% of the studied ML libraries, is usually provided through community-maintained bindings, e.g., 73% of the bindings in the npm ecosystem are community-maintained. Our study shows that the vast majority of the studied bindings cover only a small portion of the source library releases, and the delay for receiving support for a source library release is large.

Submitted: Jan 18, 2022