Our Research
We explore methods for finding biologically meaningful patterns in large-scale data, typically generated by Next Generation Sequencing (NGS) technologies, using bioinformatics algorithms, statistical tools, and data visualization. These technologies can currently produce roughly 2 terabases of data in a single sequencing run, which makes efficient data storage, parsing, and analysis all the more vital.
Big Data in Biology Research Projects
- Brain Transcriptomics. Parsing through datasets to identify genes and pathways linked to addiction.
- Plant Genomics. Assembling and annotating plant genomes to identify pathways of potential medicinal interest.
Our Strategy
We explore methods for analyzing large-scale NGS datasets using computational algorithms, statistical tools, and supercomputers. The skills required to analyze large-scale sequence data can be applied to a wide range of biological questions.
Big Data in Biology Research Techniques
- programming in Python and R, and writing bash scripts
- running computational analyses and command-line tools on Unix servers
- clustering and visualizing large-scale datasets (PCA, heat maps, volcano plots, etc.)
- hypothesis testing (t-tests, ANOVA)
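As a rough illustration of how two of these techniques fit together: a volcano plot places each gene by its log2 fold change between two conditions (x-axis) and by the significance of a t-test comparing them (y-axis, usually the negative log10 of the p-value). The minimal sketch below uses only the Python standard library to compute both quantities for a single gene. The expression values, the pseudocount of 1, and the function names are hypothetical choices for illustration, not this group's pipeline; it uses Welch's t-test, which does not assume equal variances in the two groups.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of mean expression; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def welch_t(a, b):
    """Welch's t statistic for two samples that may have unequal variances."""
    # statistics.variance is the sample variance (n - 1 denominator)
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical normalized expression of one gene in three treated and three control samples
treated = [120.0, 135.0, 128.0]
control = [60.0, 58.0, 65.0]

print(f"log2 fold change: {log2_fold_change(treated, control):.3f}")
print(f"Welch t statistic: {welch_t(treated, control):.2f}")
```

Turning the t statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. Dedicated RNA-seq tools such as DESeq2 and edgeR fit count-based models instead of a plain t-test, so this sketch only shows the idea behind the plot's two axes.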
We also focus on:
- data analysis
- scientific communication (written and verbal)
- reading scientific articles and leading journal club discussions
- collaboration and teamwork
Our Team
Dhivya Arasappan
- Clinical Assistant Professor
- Freshman Research Initiative
- College of Natural Sciences
- Research Educator | Big Data in Biology Stream
Vishwanath Iyer
- Professor, Molecular Biosciences
- Professor of Oncology, Dell Medical School
Our Impact
Advances in Next Generation Sequencing (NGS) technologies allow us to generate data at unprecedented speed and throughput. As a result, we can now study biological systems at the level of whole genomes and whole transcriptomes rather than one gene at a time. Importantly, this technology affects not only research but also how medical care is provided: hospitals will soon be generating sequence data for every patient who walks in the door, with the aim of tailoring diagnosis and treatment to each patient. However, the biggest obstacle to harnessing such data is our limited ability to extract insights from it quickly and reliably.
Research Outcomes
- Arasappan, D., Eickhoff, S.B., Nemeroff, C.B., et al. Transcription Factor Motifs Associated with Anterior Insula Gene Expression Underlying Mood Disorder Phenotypes. Mol. Neurobiol. 2021, 58, 1978–1989.
- Pugalenthi, L., Nanduri, R., et al. Structural variant detection tools struggle with whole exome (WES) data.
- Richardson, J., Pritha, J., Jiang, W., et al. Finding expressed mutations in multiple myeloma cell lines. TACCster conference proceedings, 2020.