Open Access Article

Automatic Acquisition of Annotated Training Corpora for Test-Code Generation

Innovation Exchange, IBM Ireland, Dublin 4, Ireland
ADAPT Centre & ICE Research Institute, Technological University Dublin, Dublin 2, D08 X622, Ireland
Author to whom correspondence should be addressed.
Information 2019, 10(2), 66;
Received: 21 January 2019 / Revised: 9 February 2019 / Accepted: 13 February 2019 / Published: 17 February 2019
(This article belongs to the Special Issue Artificial Intelligence—Methodology, Systems, and Applications)
Open software repositories make large amounts of source code publicly available. Potentially, this source code could be used as training data to develop new, machine learning-based programming tools. For many applications, however, raw code scraped from online repositories does not constitute an adequate training dataset. Given the recent and rapid improvements in machine translation (MT), one promising application is code generation from natural language descriptions. A key bottleneck in developing such MT-inspired systems is acquiring the parallel text-code corpora needed to train code-generative models. This paper addresses the problem of automatically synthesizing parallel text-code corpora in the software testing domain. Our approach is based on the observation that self-documentation through descriptive method names is widely adopted in test automation, in particular for unit testing. Therefore, we propose synthesizing parallel corpora composed of parsed test function names, which serve as code descriptions, aligned with the corresponding function bodies. We present the results of applying a state-of-the-art MT method to such a generated dataset. Our experiments show that a neural MT model trained on our dataset can generate syntactically correct and semantically relevant short Java functions from quasi-natural language descriptions of functionality.
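The corpus-construction idea described above (parsed test method names aligned with method bodies) can be illustrated with a small sketch. This is not the authors' pipeline: it assumes bare JUnit `@Test` annotations, uses regular expressions and naive brace counting instead of a real Java parser, and the helper names (`name_to_description`, `extract_test_methods`, `build_parallel_corpus`) and the sample class are hypothetical.

```python
import re
from typing import List, Tuple

# Matches a method carrying a bare JUnit @Test annotation and captures its name.
# Assumption: no annotation arguments such as @Test(expected = ...); a real
# pipeline would use a proper Java parser instead of this regex.
TEST_METHOD = re.compile(
    r"@Test\s+(?:public\s+)?void\s+(\w+)\s*\([^)]*\)\s*"
    r"(?:throws\s+[\w.,\s]+)?\{"
)


def name_to_description(method_name: str) -> str:
    """Turn a test method name into a quasi-natural-language description."""
    # Drop a conventional "test" prefix only when it is a separate word,
    # so "testAdds" -> "Adds" but "testingFoo" is left alone.
    name = re.sub(r"^test(?=[A-Z0-9_])_?", "", method_name)
    # Split camelCase ("addsTwo" -> "adds Two") and acronyms ("XMLDoc" -> "XML Doc").
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", name)
    # Split on underscores/whitespace and lowercase every token.
    tokens = re.split(r"[_\s]+", name)
    return " ".join(token.lower() for token in tokens if token)


def extract_test_methods(java_source: str) -> List[Tuple[str, str]]:
    """Return (method name, body) pairs for @Test methods in Java source."""
    pairs = []
    for match in TEST_METHOD.finditer(java_source):
        # Naive brace matching from just after the opening "{"; braces inside
        # string literals or comments would confuse it.
        depth, i = 1, match.end()
        while i < len(java_source) and depth > 0:
            if java_source[i] == "{":
                depth += 1
            elif java_source[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:  # skip methods whose body never closes
            body = java_source[match.end():i - 1]
            # Collapse whitespace so each body fits on one line of the corpus.
            pairs.append((match.group(1), " ".join(body.split())))
    return pairs


def build_parallel_corpus(java_source: str) -> List[Tuple[str, str]]:
    """Align parsed test names (source side) with method bodies (target side)."""
    return [(name_to_description(name), body)
            for name, body in extract_test_methods(java_source)]


# Hypothetical sample input used only for illustration.
SAMPLE = """
public class CalculatorTest {
    @Test
    public void testAddsTwoNumbers() {
        assertEquals(4, new Calculator().add(2, 2));
    }

    @Test
    public void returnsEmptyListWhenInputIsNull() {
        List<String> result = Util.wrap(null);
        if (result != null) { assertTrue(result.isEmpty()); }
    }
}
"""

if __name__ == "__main__":
    for description, code in build_parallel_corpus(SAMPLE):
        print(f"{description}\t{code}")
```

Each pair places the parsed name on the source side and the one-line method body on the target side, which roughly mirrors the line-aligned parallel format commonly used to train MT models. Real test code would also need handling of annotation arguments, comments, and string literals containing braces.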
Keywords: test automation; code generation; neural machine translation; naturalness of software; statistical semantics

Figure 1

MDPI and ACS Style

Kacmajor, M.; Kelleher, J.D. Automatic Acquisition of Annotated Training Corpora for Test-Code Generation. Information 2019, 10, 66.

