Paper ID: 2111.13962

Leveraging Unsupervised Learning to Summarize APIs Discussed in Stack Overflow

AmirHossein Naghshzan, Latifa Guerrouj, Olga Baysal

Automated source code summarization is a task that generates summarized information about the purpose, usage, and--or implementation of methods and classes to support understanding of these code entities. Multiple approaches and techniques have been proposed for supervised and unsupervised learning in code summarization, however, they were mostly focused on generating a summary for a piece of code. In addition, very few works have leveraged unofficial documentation. This paper proposes an automatic and novel approach for summarizing Android API methods discussed in Stack Overflow that we consider as unofficial documentation in this research. Our approach takes the API method's name as an input and generates a natural language summary based on Stack Overflow discussions of that API method. We have conducted a survey that involves 16 Android developers to evaluate the quality of our automatically generated summaries and compare them with the official Android documentation. Our results demonstrate that while developers find the official documentation more useful in general, the generated summaries are also competitive, in particular for offering implementation details, and can be used as a complementary source for guiding developers in software development and maintenance tasks.

Submitted: Nov 27, 2021