Research Areas

Computationally Uncovering Protein Roles

While we now know the sequence of many organisms (including mice and humans) we still know relatively little about the roles played by the genes and proteins encoded within these sequences. Researchers have generated a great deal of data that can potentially shed light on to the functions and processes performed by proteins, however, these datasets are generally noisy, heterogeneous, and very large. Our group develops and applies machine learning and data mining techniques to these data that overcome these challenges in order to form highly confident predictions of protein roles. We then take these predictions back to the lab bench with our collaborators to confirm their validity.


Related publications:

Search Algorithms and Data Organization

Among our efforts in data mining, our group develops similarity search algorithms in order to investigate biological hypotheses and create community resources of data that are easily searchable. With the huge expansion of data generation that has occurred over the past years, it has become impossible for researchers to understand, or even examine, all of the data publically available. Our group develops algorithms and systems designed to organize all of this available data and provide intuitive and useful interfaces to this data in order for researchers to find the information they need.


Related publications:

Gene Expression Analysis

Gene expression microarray technology has been responsible for much of the functional genomics data generated in recent years. These data promise to help researchers investigate the regulation and transcription of genes, which is vital for understanding the ultimate roles that proteins play within cells as well as developing diagnostic tests and finding new drug targets. Microarray data can be particularly difficult to analyze and comprehend due to unusual noise characteristics and variation between protocols and technologies. We are developing methods to harness large collections of microarray data that make it more accessible to researchers. Also, our approaches are readily adaptable to new technologies that can measure transcription, such as deep sequencing approaches.


Related publications:

Large-Scale Data Visualization

One of the best ways for researchers to understand their data is to visually look for patterns within that data. However, the scale of genome-wide datasets prevents traditional methods and devices from fully displaying these data. We are developing techniques that utilize large-scale display devices as well as traditional displays in order to show researchers the information that they need to extract from their data.


Related publications: