Publication
Aspects of software naturalness through the generation of identifier names
Abstract
Modern-day programming can be viewed as a form of communication between the person who writes code and the person who reads it. Nevertheless, developers very often neglect the readability of software, and even well-written code becomes less comprehensible over the course of software evolution. In this work, we study how the naturalness of source code written in Pharo allows us to train machine learning models that extract semantic information from a method’s body and map it to a short descriptive name. We collect a dataset of methods from the 10 biggest projects written in Pharo and build an attention-based sequence-to-sequence network that generates method names by translating source code into a few English words. We evaluate our model on an independent test set and report a precision of over 50%. To our knowledge, this is the first application of machine learning and natural language processing to the source code of Pharo.
Keywords
software naturalness, identifier naming, Pharo, machine learning
BibTeX
@mastersthesis{zaitsev2019aspects,
title = {Aspects of software naturalness through the generation of identifier names},
author = {Zaitsev, Oleksandr},
year = {2019},
school = {Ukrainian Catholic University},
url = {https://er.ucu.edu.ua/items/f313b1b6-7a95-49f8-81c4-1eed951a7096}
}