The Effects of Pre-processing and Parameter Choices on Searches Through Large Gene Expression Data Collections

Gene expression microarray data collections contain information that can shed light on a variety of systems-level biological problems, including the functional roles of proteins and the regulatory networks governing their transcription and translation. However, the analysis of these data is complicated by unusual noise characteristics and variation between experimental protocols and technologies. Many of the efforts to confront these difficulties utilize additional pre-processing strategies to adjust the input data and/or alter parameter choices of their algorithmic approach. Here, we examine the effect of some of these techniques in the context of the SPELL similarity search algorithm. Our results demonstrate that pre-processing and parameter choices can greatly affect the performance of this approach. As such, these choices should be carefully considered and evaluated when performing a broad range of analyses of gene expression data.

