Computational Biology Fellowship Curriculum

About This Curriculum

The Computational Biology Fellowship is a 12-week collaborative learning experience designed to build practical skills in bioinformatics and computational biology. Our curriculum focuses on hands-on projects, peer learning, and real-world applications.

Collaborative Development: As a founding cohort, participants will actively shape this curriculum based on collective interests and expertise. This framework serves as a starting point that we'll refine together.

The program is structured into three phases, each building on the previous:

Foundations (Weeks 1-4): Building core computational skills and biological context
Applications (Weeks 5-8): Applying techniques to biological problems and datasets
Projects (Weeks 9-12): Developing comprehensive projects in specialized areas

Phase 1: Foundations (Weeks 1-4)

Week 1: Programming Fundamentals for Biology

Learning Focus

Establishing or refreshing programming foundations with a biological perspective.

Core Topics

Python programming fundamentals for biological data
Working with biological data structures
Introduction to Jupyter notebooks for reproducible research
Version control with Git for collaborative projects
Navigating command line interfaces for bioinformatics tools

Collaborative Projects

Build a sequence analysis tool to perform basic DNA/RNA/protein sequence manipulation and statistics.
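
As a starting point for this project, the sketch below shows a few manipulations such a tool might offer: reverse complement, transcription, and GC content. The function names are illustrative assumptions, not a prescribed design, and only the unambiguous bases A, C, G, and T are handled.

```python
from collections import Counter

# Complement table for unambiguous DNA bases only.
# IUPAC ambiguity codes (N, R, Y, ...) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Return the RNA version of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def base_counts(seq: str) -> dict:
    """Return a count of every symbol in the sequence, case-insensitively."""
    return dict(Counter(seq.upper()))
```

A natural extension for the cohort is adding protein translation and support for ambiguity codes.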

Week 2: Biological Data Structures & Databases

Learning Focus

Understanding how biological data is organized, stored, and accessed.

Core Topics

Common biological file formats (FASTA, FASTQ, GenBank, PDB)
Parsing and working with biological file formats
Introduction to major biological databases (NCBI, Ensembl, UniProt)
API access to biological data resources
Data quality assessment and preprocessing

Collaborative Projects

Develop a data fetching and processing pipeline that retrieves information from biological databases and prepares it for analysis.
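
One building block such a pipeline is likely to need is a parser for FASTA records. The sketch below is a minimal standard-library version written for illustration; the function name `parse_fasta` is an assumption, and in practice a maintained parser such as Biopython's `Bio.SeqIO` may be the better choice.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is the text after '>' and sequence lines are concatenated.
    Blank lines are skipped; text before the first header is ignored.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function accepts any iterable of lines, it works equally on an open file handle or on text returned by a database request.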

Week 3: Sequence Alignment & Analysis

Learning Focus

Understanding the computational methods behind sequence comparison and analysis.

Core Topics

Principles of sequence alignment (local vs. global)
Alignment algorithms (Needleman-Wunsch, Smith-Waterman, BLAST)
Multiple sequence alignment techniques
Sequence homology and evolutionary relationships
Practical tools for sequence analysis (Biopython, BLAST)

Collaborative Projects

Implement a basic alignment algorithm and apply it to compare sequences from related species to identify conserved regions.
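
For orientation, here is a minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; a project implementation would more likely use a substitution matrix and possibly affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Columns where both aligned strings carry the same residue are candidate conserved positions. The traceback returns only one of possibly several optimal alignments.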

Week 4: Statistical Methods for Biological Data

Learning Focus

Applying appropriate statistical methods to derive insights from biological data.

Core Topics

Descriptive statistics for biological datasets
Statistical hypothesis testing in biological contexts
Multiple testing correction
Bootstrapping and permutation tests
Statistical packages in Python (NumPy, SciPy, statsmodels)

Collaborative Projects

Analyze a biological dataset to identify statistically significant patterns and present findings with appropriate visualizations.
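
Since this project will usually involve many simultaneous tests, a sketch of the Benjamini-Hochberg procedure for multiple testing correction may be a useful reference. This is a plain-Python illustration with an assumed function name; statsmodels offers an equivalent through `multipletests` with `method="fdr_bh"`.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A test is then called significant at a chosen false discovery rate (for example 0.05) when its adjusted p-value falls at or below that threshold.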

Phase 2: Applications (Weeks 5-8)

Week 5: Genomics & Next-Generation Sequencing

Learning Focus

Working with modern genomic data and sequencing technologies.

Core Topics

Next-generation sequencing technologies and data characteristics
Genomic data processing pipelines
Variant calling and annotation
Genome assembly principles
Working with genomic intervals and features

Collaborative Projects

Implement a simple variant calling pipeline from raw sequencing data and annotate the identified variants.
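
The annotation step can be illustrated with a small interval lookup. The sketch below assumes variants given as (chromosome, 1-based position) pairs, as in VCF, and features given as 0-based half-open intervals, as in BED. The `Feature` type and function name are hypothetical, and the linear scan is meant for clarity, not for genome-scale data.

```python
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    name: str


def annotate_variants(
    variants: List[Tuple[str, int]], features: List[Feature]
) -> Dict[Tuple[str, int], List[str]]:
    """Map each (chrom, 1-based pos) variant to names of overlapping features."""
    annotations = {}
    for chrom, pos in variants:
        zero_based = pos - 1  # convert the VCF-style position to 0-based
        annotations[(chrom, pos)] = [
            f.name
            for f in features
            if f.chrom == chrom and f.start <= zero_based < f.end
        ]
    return annotations
```

For larger inputs, sorting the features and using binary search or an interval tree would avoid scanning every feature for every variant.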

Week 6: Transcriptomics & RNA-Seq Analysis

Learning Focus

Analyzing gene expression data from RNA sequencing experiments.

Core Topics

RNA-Seq experimental design and data characteristics
Alignment and quantification of RNA-Seq reads
Differential expression analysis
Functional enrichment analysis
Visualizing transcriptomic data

Collaborative Projects

Perform differential expression analysis on an RNA-Seq dataset and interpret the biological significance of the results.
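
As a toy illustration of the quantity at the center of this analysis, the sketch below computes a log2 fold change between two groups of already-normalized expression values. The function name and the pseudocount of 1 are assumptions. It performs no statistical testing, so it is not a substitute for dedicated tools such as DESeq2 or edgeR.

```python
import math
from statistics import mean


def log2_fold_change(treated, control, pseudocount=1.0):
    """Return log2((mean(treated) + c) / (mean(control) + c)).

    The pseudocount c avoids division by zero for unexpressed genes.
    Inputs are assumed to be normalized, non-negative expression values.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))
```

A positive value indicates higher expression in the treated group, a negative value lower expression.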

Week 7: Protein Structure & Molecular Modeling

Learning Focus

Understanding computational approaches to protein structure analysis and prediction.

Core Topics

Principles of protein structure (primary to quaternary)
Protein structure visualization and analysis
Introduction to molecular dynamics
Protein-protein and protein-ligand interactions
Structure prediction methods and tools

Collaborative Projects

Analyze protein structures to identify binding sites and compare structural features across protein families.
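
One simple way to approximate a binding site is to collect the residues that have any atom within a distance cutoff of a bound ligand. The sketch below assumes coordinates have already been extracted from a structure file; the data layout, function name, and the 4.0 Å default cutoff are illustrative assumptions.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=4.0):
    """Return the set of residue ids with an atom within `cutoff` of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs, where
                   residue_id is any hashable label, e.g. ("A", 10, "HIS").
    ligand_coords: list of (x, y, z) tuples for the ligand atoms.
    cutoff:        distance threshold in the same units as the coordinates.
    """
    nearby = set()
    for residue_id, coord in protein_atoms:
        # math.dist computes the Euclidean distance (Python 3.8+).
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_coords):
            nearby.add(residue_id)
    return nearby
```

Comparing the resulting residue sets across members of a protein family is one way to look for conserved binding-site positions.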

Week 8: Machine Learning for Biological Data

Learning Focus

Applying machine learning techniques to extract patterns from complex biological datasets.

Core Topics

Machine learning fundamentals for biological applications
Supervised learning for classification and regression problems
Unsupervised learning for clustering and dimensionality reduction
Feature selection and engineering for biological data
Model evaluation and interpretation in biological contexts

Collaborative Projects

Develop a machine learning model to predict a biological property or outcome from molecular or genomic data.
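
Before any model can be trained, sequences need a numeric representation. One common option is a k-mer count vector, sketched below as a possible feature engineering step. The function name and the default k of 2 are assumptions, and windows containing symbols outside the alphabet (such as N) are skipped.

```python
from itertools import product


def kmer_features(seq, k=2, alphabet="ACGT"):
    """Return k-mer counts for seq as a fixed-order list.

    The list has len(alphabet) ** k entries, ordered as produced by
    itertools.product over the alphabet, so vectors from different
    sequences are directly comparable.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # ignore windows with unknown symbols
            counts[window] += 1
    return [counts[kmer] for kmer in kmers]
```

The resulting vectors can be stacked into a feature matrix and passed to a classifier or regressor of the cohort's choosing.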

Phase 3: Projects (Weeks 9-12)

Capstone Projects

In the final phase, participants will work on comprehensive projects that integrate multiple techniques and address complex biological questions. Projects will be developed based on participant interests and may include:

Genomic Analysis Pipeline: Build an end-to-end pipeline for analyzing genomic data from raw sequences to biological interpretation.
Protein Engineering Tool: Develop a tool to predict the effects of mutations on protein structure and function.
Gene Expression Classifier: Create a machine learning system to classify disease states based on gene expression profiles.
Biological Network Analysis: Implement methods to analyze and visualize biological networks such as protein-protein interactions or gene regulatory networks.
Drug Discovery Application: Build tools for virtual screening or target identification for drug discovery.

Project Development: Weeks 9-12 will follow a structured approach that includes project planning, implementation, iteration based on peer feedback, and a final presentation. Fellows will lead their own projects and contribute to those of others.

Learning Resources

The following resources will support our learning journey. As a collaborative community, we'll continuously expand this collection.

Programming & Data Science

Python programming fundamentals, data manipulation, and scientific computing libraries.

Python for Data Science
NumPy, Pandas, Matplotlib
Jupyter Notebooks

Bioinformatics Tools

Specialized libraries and frameworks for biological data analysis.

Biopython
Scikit-bio
Bioconductor

Open Datasets

Publicly available biological datasets for learning and project development.

The Cancer Genome Atlas
1000 Genomes Project
Gene Expression Omnibus