Computational Biology Fellowship Curriculum
A collaborative learning journey exploring the intersection of biology and computational methods
About This Curriculum
The Computational Biology Fellowship is a 12-week collaborative learning experience designed to build practical skills in bioinformatics and computational biology. Our curriculum focuses on hands-on projects, peer learning, and real-world applications.
The program is structured into three phases, each building on the previous:
- Foundations (Weeks 1-4): Building core computational skills and biological context
- Applications (Weeks 5-8): Applying techniques to biological problems and datasets
- Projects (Weeks 9-12): Developing comprehensive projects in specialized areas
Week 1: Programming Fundamentals for Biology
Learning Focus
Establishing or refreshing programming foundations with a biological perspective.
Core Topics
- Python programming fundamentals for biological data
- Working with biological data structures
- Introduction to Jupyter notebooks for reproducible research
- Version control with Git for collaborative projects
- Navigating command line interfaces for bioinformatics tools
Collaborative Projects
Build a sequence analysis tool to perform basic DNA/RNA/protein sequence manipulation and statistics.
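As a possible starting point for this project, the sketch below shows a few basic sequence operations using only the Python standard library. The function names are illustrative choices rather than part of any required design, and the code assumes unambiguous DNA input (A, C, G, T only).

```python
from collections import Counter

# Complement table for unambiguous DNA bases.
# Assumption: input contains only A, C, G, T (no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases, or 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def transcribe(seq: str) -> str:
    """Convert a coding-strand DNA sequence to RNA by replacing T with U."""
    return seq.upper().replace("T", "U")


def base_counts(seq: str) -> dict[str, int]:
    """Count how often each character occurs in the sequence."""
    return dict(Counter(seq.upper()))
```

For example, `reverse_complement("ATGC")` returns `"GCAT"` and `gc_content("ATGC")` returns `0.5`. A fuller tool would likely need to handle ambiguity codes and protein alphabets as well.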
Week 2: Biological Data Structures & Databases
Learning Focus
Understanding how biological data is organized, stored, and accessed.
Core Topics
- Common biological file formats (FASTA, FASTQ, GenBank, PDB)
- Parsing and working with biological file formats
- Introduction to major biological databases (NCBI, Ensembl, UniProt)
- API access to biological data resources
- Data quality assessment and preprocessing
Collaborative Projects
Develop a data fetching and processing pipeline that retrieves information from biological databases and prepares it for analysis.
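To make the parsing step concrete, here is a minimal FASTA reader written with the standard library only. It is a teaching sketch, not a substitute for an established parser such as Biopython's `SeqIO.parse`; the function name `parse_fasta` is an illustrative choice, and the code assumes well-formed input in which every sequence follows a `>` header line.

```python
from typing import Iterable, Iterator


def parse_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file.

    The header is returned without the leading '>'. Sequence lines
    belonging to one record are joined into a single string.
    """
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function accepts any iterable of strings, it works on an open file handle as well as on a list of lines, which keeps it easy to test without touching the file system.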
Week 3: Sequence Alignment & Analysis
Learning Focus
Understanding the computational methods behind sequence comparison and analysis.
Core Topics
- Principles of sequence alignment (local vs. global)
- Alignment algorithms (Needleman-Wunsch, Smith-Waterman) and heuristic search (BLAST)
- Multiple sequence alignment techniques
- Sequence homology and evolutionary relationships
- Practical tools for sequence analysis (Biopython, BLAST)
Collaborative Projects
Implement a basic alignment algorithm and apply it to compare sequences from related species to identify conserved regions.
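One way to approach this project is to start from global alignment. The sketch below implements Needleman-Wunsch with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration, and real analyses would normally use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences and return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def substitution(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table row by row.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

With the default scores, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`. Columns where the two aligned strings agree are candidates for conserved positions. Turning this into Smith-Waterman local alignment mainly requires clamping cell scores at zero and starting the traceback from the highest-scoring cell.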
Week 4: Statistical Methods for Biological Data
Learning Focus
Applying appropriate statistical methods to derive insights from biological data.
Core Topics
- Descriptive statistics for biological datasets
- Statistical hypothesis testing in biological contexts
- Multiple testing correction
- Bootstrapping and permutation tests
- Statistical packages in Python (NumPy, SciPy, statsmodels)
Collaborative Projects
Analyze a biological dataset to identify statistically significant patterns and present findings with appropriate visualizations.
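Multiple testing correction is easiest to understand by working through it once by hand. The sketch below computes Benjamini-Hochberg adjusted p-values in plain Python; it is meant for learning, and in practice the result can be checked against `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`. The function name is an illustrative choice.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is
    # the minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For the input `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`. A hypothesis is then called significant at false discovery rate `alpha` when its adjusted p-value is at most `alpha`.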
Week 5: Genomics & Next-Generation Sequencing
Learning Focus
Working with modern genomic data and sequencing technologies.
Core Topics
- Next-generation sequencing technologies and data characteristics
- Genomic data processing pipelines
- Variant calling and annotation
- Genome assembly principles
- Working with genomic intervals and features
Collaborative Projects
Implement a simple variant calling pipeline from raw sequencing data and annotate the identified variants.
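Working with genomic intervals often comes down to a handful of small operations. As one example, the sketch below merges overlapping intervals per chromosome. It assumes 0-based, half-open coordinates (the BED convention) and treats directly adjacent intervals as mergeable; both are choices made for this illustration.

```python
def merge_intervals(intervals: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Merge overlapping or adjacent (chrom, start, end) intervals.

    Assumes 0-based, half-open coordinates, so (chr1, 10, 20) and
    (chr1, 20, 30) touch and are merged into (chr1, 10, 30).
    """
    merged: list[tuple[str, int, int]] = []
    # Sorting by chromosome, then start, puts mergeable intervals next to each other.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Note that chromosomes are sorted as strings here, so `chr10` sorts before `chr2`; a pipeline that needs karyotypic order would have to supply its own sort key.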
Week 6: Transcriptomics & RNA-Seq Analysis
Learning Focus
Analyzing gene expression data from RNA sequencing experiments.
Core Topics
- RNA-Seq experimental design and data characteristics
- Alignment and quantification of RNA-Seq reads
- Differential expression analysis
- Functional enrichment analysis
- Visualizing transcriptomic data
Collaborative Projects
Perform differential expression analysis on an RNA-Seq dataset and interpret the biological significance of the results.
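Before reaching for a full statistical framework, it can help to compute the simplest expression summaries by hand. The sketch below normalizes raw counts to counts per million (CPM) and computes a log2 fold change with a pseudocount. This is a deliberate simplification: it ignores replicates and dispersion, so it is not a differential expression test, and the function names and the pseudocount of 1 are assumptions made for illustration.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts for one sample so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(
    control: dict[str, float],
    treated: dict[str, float],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """Return log2(treated / control) for genes present in both samples.

    The pseudocount avoids division by zero and damps ratios for
    lowly expressed genes.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }
```

A gene with normalized expression 3.0 in control and 7.0 in the treated sample gets a log2 fold change of 1.0, since (7 + 1) / (3 + 1) = 2.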
Week 7: Protein Structure & Molecular Modeling
Learning Focus
Understanding computational approaches to protein structure analysis and prediction.
Core Topics
- Principles of protein structure (primary to quaternary)
- Protein structure visualization and analysis
- Introduction to molecular dynamics
- Protein-protein and protein-ligand interactions
- Structure prediction methods and tools
Collaborative Projects
Analyze protein structures to identify binding sites and compare structural features across protein families.
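A common first pass at locating a binding site is to list the residues that lie within some distance of a bound ligand. The sketch below does this on plain coordinate tuples, so it runs without a structure parser. The input layout, the residue identifiers, and the 4.0 angstrom cutoff are all assumptions for illustration; in a real workflow the coordinates would come from a parsed PDB or mmCIF file.

```python
import math

Coord = tuple[float, float, float]


def residues_near_ligand(
    protein_atoms: list[tuple[str, Coord]],
    ligand_coords: list[Coord],
    cutoff: float = 4.0,
) -> list[str]:
    """Return sorted residue IDs with at least one atom within `cutoff` of the ligand.

    `protein_atoms` is a list of (residue_id, (x, y, z)) pairs, one per atom.
    Distances are Euclidean, in the same units as the coordinates.
    """
    near: set[str] = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_coords):
            near.add(residue_id)
    return sorted(near)
```

This brute-force version compares every protein atom with every ligand atom, which is fine for a single structure; comparing many structures would benefit from a spatial index.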
Week 8: Machine Learning for Biological Data
Learning Focus
Applying machine learning techniques to extract patterns from complex biological datasets.
Core Topics
- Machine learning fundamentals for biological applications
- Supervised learning for classification and regression problems
- Unsupervised learning for clustering and dimensionality reduction
- Feature selection and engineering for biological data
- Model evaluation and interpretation in biological contexts
Collaborative Projects
Develop a machine learning model to predict a biological property or outcome from molecular or genomic data.
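To illustrate feature engineering and a supervised baseline together, the sketch below turns DNA sequences into k-mer frequency vectors and classifies a query with a one-nearest-neighbor rule. It uses only the standard library, and the labels and sequences in the usage example are made up for illustration. A real project would more likely use a library such as scikit-learn and evaluate on held-out data.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq: str, k: int = 2) -> list[float]:
    """Return k-mer frequencies over the ACGT alphabet, in a fixed order.

    K-mers containing other characters (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def nearest_neighbor_label(query: str, training: list[tuple[str, str]], k: int = 2) -> str:
    """Return the label of the training sequence closest to `query`.

    `training` is a list of (sequence, label) pairs. Closeness is the
    Euclidean distance between k-mer frequency vectors.
    """
    query_vec = kmer_vector(query, k)
    _, label = min(
        training,
        key=lambda item: math.dist(query_vec, kmer_vector(item[0], k)),
    )
    return label


# Usage with made-up training data:
training = [("GCGCGCGC", "GC-rich"), ("ATATATAT", "AT-rich")]
prediction = nearest_neighbor_label("GCGCGGCC", training)
```

Here `prediction` is `"GC-rich"`, because the query shares its dominant dinucleotides with the first training sequence. The same feature vectors could be passed to any other classifier.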
Capstone Projects
In the final phase (Weeks 9-12), participants work on comprehensive projects that integrate multiple techniques and address complex biological questions. Projects are chosen based on participant interests and may include:
- Genomic Analysis Pipeline: Build an end-to-end pipeline for analyzing genomic data from raw sequences to biological interpretation.
- Protein Engineering Tool: Develop a tool to predict the effects of mutations on protein structure and function.
- Gene Expression Classifier: Create a machine learning system to classify disease states based on gene expression profiles.
- Biological Network Analysis: Implement methods to analyze and visualize biological networks such as protein-protein interactions or gene regulatory networks.
- Drug Discovery Application: Build tools for virtual screening or target identification for drug discovery.
Learning Resources
The following resources will support our learning journey. As a collaborative community, we'll continuously expand this collection.
Programming & Data Science
Python programming fundamentals, data manipulation, and scientific computing libraries.
- Python for Data Science
- NumPy, pandas, Matplotlib
- Jupyter Notebooks
Bioinformatics Tools
Specialized libraries and frameworks for biological data analysis.
- Biopython
- scikit-bio
- Bioconductor (R)
Open Datasets
Publicly available biological datasets for learning and project development.
- The Cancer Genome Atlas
- 1000 Genomes Project
- Gene Expression Omnibus