About This Curriculum

The Computational Biology Fellowship is a 12-week collaborative learning experience designed to build practical skills in bioinformatics and computational biology. Our curriculum focuses on hands-on projects, peer learning, and real-world applications.

Collaborative Development: As a founding cohort, participants will actively shape this curriculum based on collective interests and expertise. This framework serves as a starting point that we'll refine together.

The program is structured into three phases, each building on the previous:

  • Foundations (Weeks 1-4): Building core computational skills and biological context
  • Applications (Weeks 5-8): Applying techniques to biological problems and datasets
  • Projects (Weeks 9-12): Developing comprehensive projects in specialized areas

Phase 1: Foundations (Weeks 1-4)

Week 1: Programming Fundamentals for Biology

Learning Focus

Establishing or refreshing programming foundations with a biological perspective.

Core Topics

  • Python programming fundamentals for biological data
  • Working with biological data structures
  • Introduction to Jupyter notebooks for reproducible research
  • Version control with Git for collaborative projects
  • Navigating command line interfaces for bioinformatics tools

Collaborative Projects

Build a sequence analysis tool to perform basic DNA/RNA/protein sequence manipulation and statistics.
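As a starting point for the Week 1 tool, here is a minimal plain-Python sketch of common sequence operations. The function names (`gc_content`, `reverse_complement`, `transcribe`, `translate`) are hypothetical. It assumes the standard genetic code and does not handle ambiguity codes specially: an unrecognized codon translates to `X`.

```python
# Minimal sequence utilities (illustrative sketch; Bio.Seq in Biopython covers this in practice).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, built from codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse; non-ACGT characters pass through unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """DNA coding strand to mRNA: uppercase and replace T with U."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate from the first base; an incomplete trailing codon is ignored, '*' marks stops."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))
```

For example, `translate("ATGGCCTAA")` returns `"MA*"` and `reverse_complement("ATGC")` returns `"GCAT"`.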

Week 2: Biological Data Structures & Databases

Learning Focus

Understanding how biological data is organized, stored, and accessed.

Core Topics

  • Common biological file formats (FASTA, FASTQ, GenBank, PDB)
  • Parsing and working with biological file formats
  • Introduction to major biological databases (NCBI, Ensembl, UniProt)
  • API access to biological data resources
  • Data quality assessment and preprocessing
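To make the parsing topic concrete, the sketch below reads FASTA records with only the standard library. The `read_fasta` name is hypothetical; it assumes well-formed input where each record starts with a `>` header line, and it skips blank lines and anything before the first header. Biopython's `SeqIO.parse(handle, "fasta")` is the more robust option for real projects.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Usage on an in-memory example with a wrapped sequence
example = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nTTGA\n")
records = list(read_fasta(example))
```

Here `records` evaluates to `[("seq1 demo", "ACGTACGT"), ("seq2", "TTGA")]`. Using a generator keeps memory use flat even for large files.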

Collaborative Projects

Develop a data fetching and processing pipeline that retrieves information from biological databases and prepares it for analysis.
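One possible first step for this pipeline is retrieving sequences through NCBI's E-utilities `efetch` endpoint. The sketch below only builds the request URL and performs a plain HTTP GET; the helper names are hypothetical, the accession IDs you pass are your own, and a real pipeline should also respect NCBI's usage guidelines (identifying the tool and email, and limiting request rate, which is 3 requests per second without an API key).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions: list[str], db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an efetch URL requesting plain-text records for the given accessions."""
    params = {"db": db, "id": ",".join(accessions), "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_records(accessions: list[str]) -> str:
    """Download the records (requires network access); returns the raw text response."""
    with urlopen(efetch_url(accessions), timeout=30) as response:
        return response.read().decode("utf-8")
```

Separating URL construction from the network call makes the request logic testable offline; the returned FASTA text can then be fed to a parser for preprocessing.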

Week 3: Sequence Alignment & Analysis

Learning Focus

Understanding the computational methods behind sequence comparison and analysis.

Core Topics

  • Principles of sequence alignment (local vs. global)
  • Alignment algorithms: dynamic programming (Needleman-Wunsch, Smith-Waterman) and heuristic search (BLAST)
  • Multiple sequence alignment techniques
  • Sequence homology and evolutionary relationships
  • Practical tools for sequence analysis (Biopython, BLAST)

Collaborative Projects

Implement a basic alignment algorithm and apply it to compare sequences from related species to identify conserved regions.
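A compact version of the Needleman-Wunsch global alignment could look like the sketch below. The scoring scheme (match +1, mismatch -1, linear gap -2) is an assumption chosen for illustration; real analyses typically use substitution matrices and affine gap penalties. Smith-Waterman (local alignment) differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell, preferring diagonal moves on ties
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

With these defaults, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`. For production work, Biopython's `Bio.Align.PairwiseAligner` implements both global and local modes.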

Week 4: Statistical Methods for Biological Data

Learning Focus

Applying appropriate statistical methods to derive insights from biological data.

Core Topics

  • Descriptive statistics for biological datasets
  • Statistical hypothesis testing in biological contexts
  • Multiple testing correction
  • Bootstrapping and permutation tests
  • Statistical packages in Python (NumPy, SciPy, statsmodels)
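Multiple testing correction is easy to get subtly wrong, so a reference sketch of the Benjamini-Hochberg false discovery rate adjustment may help. The function name is hypothetical; `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` provides the same adjustment in practice.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For `[0.01, 0.04, 0.03, 0.005]` the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`: the raw value 0.03 would scale to 0.04 at rank 3, and the running minimum keeps the sequence monotone.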

Collaborative Projects

Analyze a biological dataset to identify statistically significant patterns and present findings with appropriate visualizations.
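A permutation test is a useful assumption-light check for this project. The sketch below tests whether two groups differ in mean by repeatedly shuffling group labels; the function name, the default of 10,000 permutations, and the fixed seed are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(a: list[float], b: list[float],
                     n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # seeded for reproducibility
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts includes the observed labelling and avoids reporting p = 0
    return (extreme + 1) / (n_permutations + 1)
```

Because the p-value is estimated by sampling, it has Monte Carlo error; with many tests, combine it with the multiple testing correction above.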

Phase 2: Applications (Weeks 5-8)

Week 5: Genomics & Next-Generation Sequencing

Learning Focus

Working with modern genomic data and sequencing technologies.

Core Topics

  • Next-generation sequencing technologies and data characteristics
  • Genomic data processing pipelines
  • Variant calling and annotation
  • Genome assembly principles
  • Working with genomic intervals and features

Collaborative Projects

Implement a simple variant calling pipeline from raw sequencing data and annotate the identified variants.
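Before using a full toolchain, it can help to see the core idea of variant calling in a toy form: pile up bases at each reference position and flag positions where a non-reference base is common enough. This sketch assumes reads are already aligned without indels and given as hypothetical `(start, sequence)` pairs; the depth and allele-fraction thresholds are arbitrary. Real callers also model base and mapping quality, indels, and genotype likelihoods.

```python
from collections import Counter


def call_snvs(reference: str, reads: list[tuple[int, str]], min_depth: int = 3,
              min_alt_fraction: float = 0.3) -> list[tuple[int, str, str, float]]:
    """Toy SNV caller over ungapped, pre-aligned reads.

    reads: (0-based start position on the reference, read sequence) pairs.
    Returns (position, ref_base, alt_base, alt_fraction) for each call.
    """
    # Build a pileup: base counts at every reference position
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to call
        ref_base = reference[pos]
        alt_counts = {base: n for base, n in counts.items() if base != ref_base}
        if not alt_counts:
            continue
        alt_base, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, fraction))
    return calls
```

For reference `"ACGTACGT"` and reads `[(0, "ACGTA"), (0, "ACCTA"), (2, "CTACG"), (1, "CCTAC")]`, position 2 has one G and three C observations, so the result is `[(2, "G", "C", 0.75)]`.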

Week 6: Transcriptomics & RNA-Seq Analysis

Learning Focus

Analyzing gene expression data from RNA sequencing experiments.

Core Topics

  • RNA-Seq experimental design and data characteristics
  • Alignment and quantification of RNA-Seq reads
  • Differential expression analysis
  • Functional enrichment analysis
  • Visualizing transcriptomic data

Collaborative Projects

Perform differential expression analysis on an RNA-Seq dataset and interpret the biological significance of the results.
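The sketch below shows two early steps of a differential expression workflow: library-size normalization to counts per million (CPM) and per-gene log2 fold changes between two groups of replicates. The function names and the pseudocount of 1 are assumptions for illustration. It performs no significance testing; dedicated tools such as DESeq2 and edgeR use more robust normalization and negative binomial models for that.

```python
import math


def cpm(counts: list[int]) -> list[float]:
    """Scale raw counts so each sample sums to one million."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def log2_fold_changes(group_a: list[list[int]], group_b: list[list[int]],
                      pseudocount: float = 1.0) -> list[float]:
    """Per-gene log2(mean CPM in B / mean CPM in A).

    Each group is a list of samples; each sample lists raw counts per gene in the same gene order.
    The pseudocount avoids dividing by or taking the log of zero.
    """
    def mean_cpm(group: list[list[int]]) -> list[float]:
        normalized = [cpm(sample) for sample in group]
        return [sum(values) / len(values) for values in zip(*normalized)]

    mean_a, mean_b = mean_cpm(group_a), mean_cpm(group_b)
    return [math.log2((b + pseudocount) / (a + pseudocount)) for a, b in zip(mean_a, mean_b)]
```

A positive value means higher expression in group B; a gene going from 7 to 31 CPM gives log2(32 / 8) = 2.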

Week 7: Protein Structure & Molecular Modeling

Learning Focus

Understanding computational approaches to protein structure analysis and prediction.

Core Topics

  • Principles of protein structure (primary to quaternary)
  • Protein structure visualization and analysis
  • Introduction to molecular dynamics
  • Protein-protein and protein-ligand interactions
  • Structure prediction methods and tools

Collaborative Projects

Analyze protein structures to identify binding sites and compare structural features across protein families.
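A common first approximation of a binding site is the set of residues with any atom close to a bound ligand. This sketch assumes coordinates have already been extracted (for example from a PDB file) into hypothetical `(residue_id, (x, y, z))` tuples in ångströms, and uses a 4 Å cutoff as an assumed heuristic. For large structures, Biopython's `Bio.PDB.NeighborSearch` avoids the all-pairs comparison.

```python
import math

Coord = tuple[float, float, float]


def binding_site_residues(protein_atoms: list[tuple[str, Coord]],
                          ligand_atoms: list[Coord],
                          cutoff: float = 4.0) -> list[str]:
    """Return sorted residue IDs with at least one atom within `cutoff` Å of any ligand atom."""
    site = set()
    for residue_id, coord in protein_atoms:
        # Brute-force distance check against every ligand atom
        if any(math.dist(coord, ligand) <= cutoff for ligand in ligand_atoms):
            site.add(residue_id)
    return sorted(site)
```

Comparing these residue sets across structures from the same family is one simple way to look for conserved binding features.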

Week 8: Machine Learning for Biological Data

Learning Focus

Applying machine learning techniques to extract patterns from complex biological datasets.

Core Topics

  • Machine learning fundamentals for biological applications
  • Supervised learning for classification and regression problems
  • Unsupervised learning for clustering and dimensionality reduction
  • Feature selection and engineering for biological data
  • Model evaluation and interpretation in biological contexts

Collaborative Projects

Develop a machine learning model to predict a biological property or outcome from molecular or genomic data.
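Whatever model is chosen, it should be evaluated on data it was not trained on. The sketch below pairs a deliberately simple nearest-centroid classifier with k-fold cross-validation; the classifier is a stand-in chosen for brevity, and all function names are hypothetical. scikit-learn offers `NearestCentroid` and `cross_val_score` for real use, including stratified folds, which this sketch does not implement.

```python
import math
import random


def nearest_centroid_fit(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Compute one centroid (per-feature mean) for each class label."""
    groups: dict[str, list[list[float]]] = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return {label: [sum(col) / len(col) for col in zip(*rows)] for label, rows in groups.items()}


def nearest_centroid_predict(centroids: dict[str, list[float]], features: list[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def cross_val_accuracy(X: list[list[float]], y: list[str], k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validated accuracy using a seeded shuffle of sample indices."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        centroids = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i] for i in fold)
    return correct / len(X)
```

Reporting cross-validated accuracy rather than training accuracy guards against the optimistic estimates that are common with small, high-dimensional biological datasets.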

Phase 3: Projects (Weeks 9-12)

Capstone Projects

In the final phase, participants will work on comprehensive projects that integrate multiple techniques and address complex biological questions. Projects will be developed based on participant interests and may include:

  • Genomic Analysis Pipeline: Build an end-to-end pipeline for analyzing genomic data from raw sequences to biological interpretation.
  • Protein Engineering Tool: Develop a tool to predict the effects of mutations on protein structure and function.
  • Gene Expression Classifier: Create a machine learning system to classify disease states based on gene expression profiles.
  • Biological Network Analysis: Implement methods to analyze and visualize biological networks such as protein-protein interactions or gene regulatory networks.
  • Drug Discovery Application: Build tools for virtual screening or target identification for drug discovery.

Project Development: Weeks 9-12 will follow a structured approach including project planning, implementation, iteration based on peer feedback, and final presentation. Fellows will both lead their own projects and contribute to others.
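For the network analysis project, two basic measurements are node degree (to spot highly connected hubs) and connected components (to find separate modules). The sketch below assumes an undirected edge list with hypothetical node names and no self-loops; the NetworkX library provides these and many more measures, such as `networkx.degree_centrality`.

```python
from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from an edge list."""
    graph: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Degree divided by (n - 1), the maximum possible degree in a simple graph."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def connected_components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Breadth-first search from each unvisited node."""
    seen: set[str] = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```

For a toy edge list `[("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P5", "P6")]`, the graph splits into two components of sizes 4 and 2, and `P1` has degree centrality 2/5 = 0.4.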

Learning Resources

The following resources will support our learning journey. As a collaborative community, we'll continuously expand this collection.

Programming & Data Science

Python programming fundamentals, data manipulation, and scientific computing libraries.

  • Python for Data Science
  • NumPy, pandas, Matplotlib
  • Jupyter Notebooks

Bioinformatics Tools

Specialized libraries and frameworks for biological data analysis.

  • Biopython
  • scikit-bio
  • Bioconductor (R-based)

Open Datasets

Publicly available biological datasets for learning and project development.

  • The Cancer Genome Atlas
  • 1000 Genomes Project
  • Gene Expression Omnibus