FRI Computational Biology Stream

Big Data in Biology

bioinformatics, computational biology

Our Research

We use bioinformatics algorithms, statistical tools, and data visualizations to find biologically meaningful patterns in large-scale data, typically generated by Next Generation Sequencing (NGS) technologies. Currently, these technologies can produce ~2 terabases of data in a single sequencing run, making efficient data storage, parsing, and analysis even more vital.

Big Data in Biology Research Projects

  • Brain Transcriptomics. Parsing through datasets to identify genes and pathways linked to addiction.
  • Plant Genomics. Assembling and annotating plant genomes to identify pathways of potential medicinal interest. 



Our Strategy

We explore methods for analyzing large-scale Next Generation Sequencing (NGS) datasets using computational algorithms, statistical tools, and supercomputers. The skills required for analysis of large-scale sequence data can be applied to answer many different biological questions.

Big Data in Biology Research Techniques

  • writing programs and scripts in Python, R, and bash
  • running computational analyses and command-line tools on Unix servers
  • clustering data and visualizing large-scale datasets (PCA, heat maps, volcano plots, etc.)
  • hypothesis testing (t-tests, ANOVA)
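
As a rough illustration of the hypothesis-testing step, the sketch below compares expression of two hypothetical genes between a control and a treated group. It is a minimal example, not the stream's actual pipeline: the gene names and values are made up, the function name `welch_t` is our own, and it uses Welch's unequal-variance variant of the t-test with only the Python standard library. It stops at the t statistic and degrees of freedom; converting these to a p-value needs a t-distribution function, which in practice would come from a statistics package.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variance divided by group size gives each group's squared standard error.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Made-up expression values for two hypothetical genes (illustration only, not real data).
# Each entry is (control samples, treated samples).
expression = {
    "geneA": ([5.1, 4.9, 5.3, 5.0], [7.9, 8.2, 8.0, 8.1]),
    "geneB": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.9, 6.0, 6.2]),
}

results = {}
for gene, (control, treated) in expression.items():
    t, df = welch_t(treated, control)
    # log2 fold change of mean expression, treated relative to control.
    log2_fc = math.log2(statistics.fmean(treated) / statistics.fmean(control))
    results[gene] = (t, df, log2_fc)
    print(f"{gene}: t={t:.2f}, df={df:.1f}, log2FC={log2_fc:.2f}")
```

A large absolute t value together with a large absolute log2 fold change is what places a gene toward the upper corners of a volcano plot, once the t statistic has been converted to a p-value.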

We also focus on:

  • data analysis
  • scientific communication (written and verbal)
  • reading scientific articles and leading journal club discussions
  • collaboration and teamwork

Our Team

Dhivya Arasappan

  • Clinical Assistant Professor
  • Freshman Research Initiative
  • College of Natural Sciences

Research Educator | Big Data in Biology Stream

Vishwanath Iyer

  • Professor
  • Professor of Oncology, Dell Medical School
  • Molecular Biosciences
Building: MBB
Room Number: 3.212A



Our Impact

Advances in Next Generation Sequencing (NGS) technologies allow us to generate data at unprecedented speed and throughput. As a consequence, we can now study biological systems at the level of whole genomes and whole transcriptomes instead of one gene at a time. Importantly, this technology affects not only research but also how medical care is provided; hospitals will soon be generating sequence data for every patient who walks in the door in an effort to customize diagnosis and treatment to that patient. However, the biggest challenge in harnessing such data is our limited ability to extract insights from it quickly and reliably.
