Computer Science and Engineering, Department of

 

Date of this Version

6-2012

Abstract

Searching for code is a common task among programmers, with the ultimate goal of finding and reusing code or getting ideas for implementation. While the process of searching for code (issuing a query and selecting a relevant match) is straightforward, several costs must be balanced, including the costs of specifying the query, examining the results to find desired code, and not finding a relevant result. For the popular syntactic searches, the query cost is quite low, but the results are often vague or irrelevant, so the examination cost is high and matches may not be found. Semantic searches may return more relevant results, but current techniques that involve writing complex specifications or executing code against test cases are costly to the developer, and close matches cannot be easily identified. In this work, we address these limitations and propose an approach for semantic search in which developers specify lightweight, incomplete specifications and an SMT solver automatically identifies programs from a repository that match the specifications. The program repository is automatically encoded as constraints offline so the search for programs is efficient. The program encodings cover various levels of abstraction to enable partial matches when few or no exact matches exist. We present empirical evidence showing the lightweight specifications can be accurately defined by developers, instantiate this approach on a subset of the Yahoo! Pipes mashup language, and outline extensions to other programming languages.
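The idea of matching a lightweight specification against programs encoded at more than one level of abstraction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work itself: the toy repository, the arithmetic programs (standing in for Yahoo! Pipes programs), the names `REPOSITORY`, `DOMAIN`, `satisfies`, and `search`, and above all the exhaustive search over a small finite domain, which stands in for the SMT solver. A specification here is a list of input/output pairs. A program matches exactly if its original constants satisfy every pair, and partially if some other choice of constants does.

```python
from itertools import product

# Hypothetical toy repository. Each program is a template with named
# "holes" plus the concrete values the stored program uses for them.
REPOSITORY = {
    "add_const": {
        "run": lambda x, h: x + h["c"],
        "concrete": {"c": 3},
    },
    "scale": {
        "run": lambda x, h: x * h["k"],
        "concrete": {"k": 2},
    },
    "scale_then_add": {
        "run": lambda x, h: x * h["k"] + h["c"],
        "concrete": {"k": 2, "c": 1},
    },
}

# Small finite domain for hole values. Enumerating it is an assumed
# stand-in for the constraint solving an SMT solver would perform.
DOMAIN = range(0, 6)


def satisfies(program, holes, examples):
    """True if the program maps every example input to its output."""
    return all(program["run"](x, holes) == y for x, y in examples)


def search(examples):
    """Return (name, kind, holes) tuples, exact matches first."""
    exact, partial = [], []
    for name, program in REPOSITORY.items():
        # Concrete level: the stored constants must satisfy the examples.
        if satisfies(program, program["concrete"], examples):
            exact.append((name, "exact", dict(program["concrete"])))
            continue
        # Abstract level: constants become free variables, so the program
        # matches if any assignment in DOMAIN satisfies the examples.
        names = sorted(program["concrete"])
        for values in product(DOMAIN, repeat=len(names)):
            holes = dict(zip(names, values))
            if satisfies(program, holes, examples):
                partial.append((name, "partial", holes))
                break
    return exact + partial


if __name__ == "__main__":
    # Incomplete specification: two input/output pairs.
    for match in search([(1, 2), (2, 4)]):
        print(match)
```

With the two example pairs above, `scale` is reported as an exact match, `scale_then_add` as a partial match that would need its constants changed, and `add_const` is not reported because no constant in the domain satisfies both pairs.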