Applications of information retrieval to software development

Abstract

Information retrieval (IR) extracts and organizes natural-language information found in unstructured text. Many of the challenges software engineers face can be addressed by applying IR techniques to the unstructured text in source code and its associated documents. This paper presents a survey of IR-based techniques applied to software engineering (SE) challenges during the initial development process.
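
To illustrate what applying IR to source-code text can look like, here is a minimal sketch that ranks source files against a natural-language query with a plain tf-idf vector-space model and cosine similarity. This is a generic textbook technique, not a method taken from the survey. The identifier-splitting rule, the function names (`split_identifier`, `build_index`, `rank`) and the three-file corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Matches camelCase/PascalCase words, all-caps acronyms and digit runs.
# Underscores are not matched, so snake_case names split at them too.
_WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def split_identifier(identifier):
    """'validatePassword' -> ['validate', 'password']; 'HTTPServer' -> ['http', 'server']."""
    return [part.lower() for part in _WORD.findall(identifier)]


def tokenize(text):
    """Extract identifier-like tokens from code or prose and split them into words."""
    words = []
    for token in re.findall(r"[A-Za-z0-9_]+", text):
        words.extend(split_identifier(token))
    return words


def weigh(counts, idf):
    """tf-idf weights; terms unseen in the corpus or present in every file get idf 0 and are dropped."""
    return {t: tf * idf[t] for t, tf in counts.items() if idf.get(t, 0.0) > 0}


def build_index(files):
    """files: {name: source text}. Returns (idf table, {name: tf-idf vector})."""
    counts = {name: Counter(tokenize(text)) for name, text in files.items()}
    n = len(counts)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    idf = {term: math.log(n / d) for term, d in df.items()}
    return idf, {name: weigh(c, idf) for name, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, idf, vectors):
    """Rank files by cosine similarity to a natural-language query, best first."""
    q = weigh(Counter(tokenize(query)), idf)
    scores = [(name, cosine(q, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-file corpus, for illustration only.
files = {
    "LoginController.java": "class LoginController { void validatePassword(String userPassword) {} }",
    "ReportWriter.java": "class ReportWriter { void writeReport(File outputFile) {} }",
    "CartService.java": "class CartService { void addItem(Item cartItem) {} }",
}
idf, vectors = build_index(files)
for name, score in rank("validate the user password at login", idf, vectors):
    print(f"{score:.3f}  {name}")
```

Splitting identifiers matters here because code rarely contains the query's words as standalone tokens. Without splitting, the lowercased tokens of `LoginController.java` (`validatepassword`, `userpassword`, and so on) would share no terms with the query. Terms that occur in every file, such as `class` and `void`, receive zero idf weight and drop out of the comparison.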

Mining textual artifacts is important for a wide range of software engineering tasks, including software reuse, software maintenance, and software quality assurance. Much of the work on mining software repositories has tended to "exclude" such unstructured artifacts. Yet these artifacts are rich in semantic information, and we believe mining techniques should treat text as software and address how to mine it efficiently. We investigate the application of information retrieval (IR) techniques to tracing textual elements in a software repository. Some textual mining activities are critical (e.g., tracing artifacts to assure that safety requirements are satisfied) and require analyst participation. We describe our approach to eliciting and processing analyst feedback when tracing the textual elements of a repository. We then present a study showing that standard IR methods combined with analyst feedback outperform IR methods alone in terms of both coverage (recall: did we find all the relevant links?) and signal-to-noise ratio (precision: were the links we found relevant?). With the analyst "in the loop," the tracing software must also have quality from the analyst's perspective. We examined the standard measures for evaluating IR methods and found that they do not always suffice for assessing a tool from that perspective. To address this, we developed a set of secondary measures for evaluating tracing software. Using counterexamples from two projects, we show that standard measures alone do not provide the detail needed to evaluate mining tools adequately from the analyst's perspective.
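
The abstract does not specify how analyst feedback is folded back into retrieval. Rocchio relevance feedback is one common option in IR, and the following sketch assumes it. Candidate links are artifacts whose cosine similarity to a requirement reaches a threshold; the analyst accepts or rejects candidates; and the requirement's query vector is moved toward accepted artifacts and away from rejected ones. Precision and recall follow the definitions given in the abstract. The raw term-frequency weighting, the Rocchio weights (α = 1, β = 0.75, γ = 0.15), the 0.5 threshold, the function names and the requirement and design texts are all illustrative assumptions, not the authors' method, settings or data.

```python
import math
from collections import Counter


def vectorize(text):
    """Raw term-frequency vector {term: count}; no stemming or stop-word removal."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(query, docs, threshold):
    """IDs of artifacts whose similarity to the query reaches the threshold."""
    return {doc_id for doc_id, vec in docs.items() if cosine(query, vec) >= threshold}


def rocchio(query, accepted, rejected, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update (assumed feedback model): move the query toward the centroid
    of accepted artifacts and away from the centroid of rejected ones.
    Terms whose weight ends up non-positive are dropped."""
    terms = set(query).union(*accepted, *rejected)
    updated = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if accepted:
            w += beta * sum(d.get(t, 0.0) for d in accepted) / len(accepted)
        if rejected:
            w -= gamma * sum(d.get(t, 0.0) for d in rejected) / len(rejected)
        if w > 0:
            updated[t] = w
    return updated


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved links that are relevant.
    Recall: share of relevant links that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical requirement and design elements, for illustration only.
requirement = vectorize("encrypt stored passwords")
design = {
    "D1": vectorize("hash user passwords with salt"),
    "D2": vectorize("encrypt passwords with salt"),
    "D3": vectorize("encrypt stored cookies"),
}
true_links = {"D1", "D2"}  # hypothetical answer set an analyst would confirm
THRESHOLD = 0.5

before = precision_recall(candidate_links(requirement, design, THRESHOLD), true_links)

# The analyst vets the first candidate list: accepts D2, rejects D3.
refined = rocchio(requirement, accepted=[design["D2"]], rejected=[design["D3"]])
after = precision_recall(candidate_links(refined, design, THRESHOLD), true_links)

print("before feedback: precision=%.2f recall=%.2f" % before)
print("after feedback:  precision=%.2f recall=%.2f" % after)
```

On this toy data, one round of feedback raises precision from 0.50 to 0.67 and recall from 0.50 to 1.00. It also moves the rejected D3 from first to last in the similarity ranking, although it still clears the threshold. Precision and recall at a fixed threshold do not register that change in ordering, which hints at why standard measures alone can miss detail that matters to an analyst.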