Big Data in Biology

Our Research

We explore methods for finding biologically meaningful patterns in large-scale data, typically generated by Next Generation Sequencing (NGS) technologies, using bioinformatics algorithms, statistical tools, and data visualizations. Currently, these technologies can produce ~2 terabases of data in one sequencing run, making efficient data storage, parsing, and analysis even more vital.

Big Data in Biology Research Projects

Brain Transcriptomics. Parsing through datasets to identify genes and pathways linked to addiction.
Plant Genomics. Assembling and annotating plant genomes to identify pathways of potential medicinal interest.

Our Strategy

We explore methods for analyzing large-scale Next Generation Sequencing (NGS) datasets using computational algorithms, statistical tools, and supercomputers. The skills required for analysis of large-scale sequence data can be applied to answer many different biological questions.

Big Data in Biology Research Techniques

programming in Python and R, and writing bash scripts
running computational analyses and command-line tools on Unix servers
data clustering and making visualizations using large-scale datasets (PCA, heat maps, volcano plots, etc.)
hypothesis testing (t-tests, ANOVA)

We also focus on:

data analysis
scientific communication (written and verbal)
reading scientific articles and leading journal club discussions
collaboration and teamwork

Our Team

Dhivya Arasappan

Clinical Assistant Professor

Freshman Research Initiative
College of Natural Sciences

Research Educator | Big Data in Biology Stream

Vishwanath Iyer

Professor

Molecular Biosciences
Interdisciplinary Life Sciences Graduate Programs

vishy@utexas.edu

MBB 3.212A

Johann (Hans) Hofmann

Professor

Integrative Biology
Texas Field Station Network
Biodiversity Center
Interdisciplinary Life Sciences Graduate Programs

Recruiting Students 26-27 Academic Year

hans@utexas.edu

PAT 141

Our Impact

Advances in Next Generation Sequencing (NGS) technologies allow us to generate data at unprecedented speed and throughput. As a consequence, we can now study biological systems at the level of whole genomes and whole transcriptomes instead of at the single gene level. Importantly, this technology not only impacts research, but also how medical care is provided; hospitals will soon be generating sequence data for every patient who walks in the door in an effort to customize diagnosis and treatment to that patient. However, the biggest challenge for utilizing the power of such data is our limited ability to quickly and reliably obtain insights from this data.

Resources

Research Outcomes

Arasappan, D., Eickhoff, S.B., Nemeroff, C.B. et al. Transcription Factor Motifs Associated with Anterior Insula Gene Expression Underlying Mood Disorder Phenotypes. Mol. Neurobiol. 2021, 58, 1978–1989.
Pugalenthi, L., Nanduri, R., et al. Structural variant detection tools struggle with whole exome sequencing (WES) data.
Richardson, J., Pritha, J., Jiang, W., et al. Finding expressed mutations in multiple myeloma cell lines. TACCster conference proceedings, 2020.