CortexLab
Multimodal fMRI brain encoding toolkit with GPU voxelwise ridge, causal modality lesion analysis, 3D brain visualization, and live inference
CortexLab extends Meta's TRIBE v2 foundation model for in-silico neuroscience. TRIBE v2 predicts fMRI brain activation from video, audio, and text inputs using a LLaMA 3.2-3B backbone. CortexLab adds the tooling researchers need to turn predictions into scientific conclusions: GPU-accelerated voxelwise ridge regression, causal modality lesion analysis, brain-alignment benchmarking with statistical testing, cognitive load scoring, temporal dynamics, ROI connectivity, streaming inference, and cross-subject adaptation.
The toolkit includes a brain-alignment benchmark (RSA, CKA, Procrustes with permutation tests, bootstrap CIs, FDR correction, and noise ceiling estimation) and a causal analysis pipeline that ablates individual input modalities to identify which modality each cortical region depends on. A GPU ridge encoder with torch + Triton backends enables population-scale voxelwise regression (200K voxels × alpha grid × CV folds). Foundation-model feature extractors (CLIP, SigLIP2, DINOv2, V-JEPA2, PaLiGemma2) provide baselines for representational alignment comparisons.
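To make the ridge encoder's design concrete, here is a minimal torch sketch of the multi-alpha trick it builds on: factor the design matrix once, then every additional alpha costs only a per-alpha scaling plus a final matmul, which is the step the Triton backend fuses. This is an illustrative sketch under those assumptions, not CortexLab's actual API; all names are hypothetical.

```python
import torch

def ridge_weights_multi_alpha(X: torch.Tensor, Y: torch.Tensor,
                              alphas: torch.Tensor) -> torch.Tensor:
    """Closed-form ridge weights for every voxel and every alpha at once.

    X: (n_samples, n_features) stimulus features
    Y: (n_samples, n_voxels)   BOLD responses
    alphas: (n_alphas,)        regularization grid
    returns W: (n_alphas, n_features, n_voxels)
    """
    # Factor the design matrix once and reuse it for the whole alpha grid.
    U, S, Vt = torch.linalg.svd(X, full_matrices=False)  # U:(n,k) S:(k,) Vt:(k,f)
    UtY = U.T @ Y                                        # (k, n_voxels), computed once
    # Per-alpha shrinkage factors s / (s^2 + alpha), shape (n_alphas, k).
    shrink = S / (S ** 2 + alphas[:, None])
    # Per-alpha scaling followed by one matmul: the fusable, tileable step.
    scaled = shrink[:, :, None] * UtY                    # (n_alphas, k, n_voxels)
    return Vt.transpose(0, 1) @ scaled                   # (n_alphas, n_features, n_voxels)
```

A real run would tile the voxel and alpha dimensions to bound memory (200K voxels of float32 weights is already ~0.4 GB per alpha at 512 features) and pick each voxel's best alpha from cross-validated scores.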
A futuristic Streamlit dashboard with a glassmorphism UI features an interactive 3D brain viewer (rotatable fsaverage mesh with activation overlays), live brain prediction from webcam, screen capture, or video, publication-quality 4-panel brain views, and 6 analysis pages. Biologically realistic synthetic data (HRF convolution, modality-specific activation) lets everything run without a GPU. The project ships with 143 passing tests, has 4 community contributors, and is published on PyPI (cortexlab-toolkit) and HuggingFace.
Key Highlights
- 3 Input Modalities (Video, Audio, Text)
- 143 Tests Passing
- 5 Foundation Models (CLIP, DINOv2, SigLIP2, V-JEPA2, PaLiGemma2)
- GPU Voxelwise Ridge (Triton + torch backends)
Architecture Details
- Streaming Inference: Sliding-window predictor processes live feature streams with configurable window and step sizes for real-time brain activation prediction (the windowing logic is sketched after this list).
- ROI Attention Maps: Extracts and visualizes attention patterns from the transformer backbone, showing which brain regions attend to which temporal moments.
- Modality Attribution: Computes per-vertex importance scores for each input modality using ablation-based attribution, revealing what drives each brain region.
- Cross-Subject Adaptation: Ridge regression and nearest-neighbor methods adapt the pretrained model to new subjects with minimal calibration data.
- Brain-Alignment Benchmark: Quantitative framework with permutation tests and bootstrap confidence intervals to score how closely any AI model's representations match brain activation patterns.
- Cognitive Load Scorer: Predicts visual complexity, auditory demand, language processing, executive load, and overall cognitive demand from brain activations.
- Temporal Dynamics: Analyzes peak response latency per ROI, lag-shifted correlation between model features and brain responses (sketched after this list), and sustained vs. transient response decomposition.
- ROI Connectivity: Computes functional connectivity matrices, clusters brain regions into networks via agglomerative clustering, and derives graph metrics (degree centrality, modularity).
- Performance: Gradient checkpointing, half-precision inference (FP16/BF16), ONNX export, and CUDA memory profiling.
- GPU Voxelwise Ridge: Cross-validated voxelwise ridge encoder with torch and Triton backends. A fused Triton kernel batches per-alpha scaling and the final matmul along voxel and alpha tiles (the underlying math is sketched in torch above), enabling population-scale runs (200K voxels × alpha grid × CV folds) on a single H200. Scikit-learn-compatible API; numerics match RidgeCV to within 1e-5.
- Causal Modality Lesion: Interventional analysis pipeline that fits a predictive encoder once, then ablates individual input modalities (zero-mask and learned-mask) to measure per-voxel delta R² (see the zero-mask sketch after this list). Identifies which cortical regions causally depend on each modality.
- Noise Ceiling Estimation: Inter-subject (leave-one-subject-out) and split-half (Spearman-Brown corrected) ceiling estimators, with fraction-of-explainable-variance normalization (the split-half version is sketched after this list).
- Foundation Model Features: Five pretrained presets (CLIP ViT-L/14, SigLIP2 ViT-L, DINOv2 ViT-L, V-JEPA2 ViT-L, PaLiGemma2-3B) with lazy HuggingFace loading, configurable pooling, and caching helpers for representational-alignment baselines.
- 3D Brain Viewer: Interactive rotatable fsaverage brain with activation overlays, publication-quality 4-panel views (lateral, medial, dorsal), ROI highlighting, sulcal depth blending.
- Live Inference: Real-time brain prediction from webcam, screen capture, or video file with live-updating 3D brain, cognitive load timeline, and FPS/latency metrics. Simulation mode works without GPU.
- Futuristic Dashboard: Glassmorphism Streamlit app with 6 analysis pages, 3D brain hero, neon accents, cross-page state, file upload/export, and methodology documentation with references.
- Production Infrastructure: Portable SLURM submission templates (env setup, smoke tests, ridge benchmark, feature extraction array job, lesion pipeline) and cross-backend benchmark harness.
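The sliding-window logic behind the streaming predictor fits in a few lines; a hedged sketch follows, with class name, signatures, and defaults assumed rather than taken from CortexLab's API.

```python
from collections import deque
import numpy as np

class SlidingWindowPredictor:
    """Buffers per-timestep features and predicts every `step` timesteps."""

    def __init__(self, model, window: int = 16, step: int = 4):
        self.model = model                  # callable: (window, feat_dim) -> (n_voxels,)
        self.window, self.step = window, step
        self.buffer = deque(maxlen=window)  # oldest timestep falls off automatically
        self._since_last = step             # predict as soon as the window first fills

    def push(self, features: np.ndarray):
        """Feed one timestep of features; returns a prediction when due, else None."""
        self.buffer.append(features)
        self._since_last += 1
        if len(self.buffer) == self.window and self._since_last >= self.step:
            self._since_last = 0
            return self.model(np.stack(self.buffer))
        return None
```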
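The lag-shifted correlation in the temporal-dynamics analysis can likewise be sketched directly; this minimal version (names assumed) correlates one model-feature time course against every voxel at increasing TR shifts, so the argmax over lags gives a peak-latency estimate per voxel.

```python
import numpy as np

def lagged_correlation(feat: np.ndarray, bold: np.ndarray, max_lag: int = 8) -> np.ndarray:
    """feat: (n_trs,) model feature; bold: (n_trs, n_voxels) responses.
    Returns Pearson r at lags 0..max_lag, shape (max_lag + 1, n_voxels)."""
    rs = []
    for lag in range(max_lag + 1):
        f = feat[:len(feat) - lag]                    # feature leads the BOLD signal
        b = bold[lag:]
        fz = (f - f.mean()) / f.std()
        bz = (b - b.mean(axis=0)) / b.std(axis=0)
        rs.append((fz[:, None] * bz).mean(axis=0))    # per-voxel correlation
    return np.stack(rs)                               # .argmax(axis=0) ~ peak latency
```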
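The zero-mask arm of the causal lesion analysis amounts to re-scoring a fitted encoder with one modality's feature columns zeroed out. A sketch under the assumption that modality features are concatenated along the feature axis; `predict` stands in for the fitted encoder and all names are hypothetical.

```python
import numpy as np

def r2_per_voxel(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Coefficient of determination for each voxel (column) separately."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def modality_lesion_delta_r2(predict, X, Y, modality_slices):
    """predict: fitted encoder, (n_samples, n_features) -> (n_samples, n_voxels).
    modality_slices: dict mapping modality name -> column slice of X.
    Returns per-voxel delta R^2 from zero-masking each modality."""
    full = r2_per_voxel(Y, predict(X))
    deltas = {}
    for name, cols in modality_slices.items():
        X_lesioned = X.copy()
        X_lesioned[:, cols] = 0.0        # the zero-mask intervention
        deltas[name] = full - r2_per_voxel(Y, predict(X_lesioned))
    return deltas
```

A large positive delta at a voxel means prediction there degrades once the modality is removed, i.e. that region depends on it under the intervention; the learned-mask variant optimizes the mask instead of fixing it at zero.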
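And the split-half ceiling with Spearman-Brown correction, assuming repeated presentations of the same stimuli are stacked on a leading axis (again a hypothetical minimal version, not the toolkit's exact code):

```python
import numpy as np

def split_half_ceiling(Y_reps: np.ndarray, seed=None) -> np.ndarray:
    """Y_reps: (n_reps, n_samples, n_voxels) repeated measurements.
    Returns a per-voxel noise-ceiling correlation estimate."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(Y_reps.shape[0])
    half = Y_reps.shape[0] // 2
    a = Y_reps[order[:half]].mean(axis=0)    # (n_samples, n_voxels)
    b = Y_reps[order[half:]].mean(axis=0)
    a -= a.mean(axis=0)
    b -= b.mean(axis=0)
    r = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    # Spearman-Brown: correct the half-data correlation up to full-data length.
    return 2.0 * r / (1.0 + r)
```

Squaring this ceiling and dividing an encoder's per-voxel R² by it is one standard way to obtain the fraction-of-explainable-variance normalization mentioned in the bullet above.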
Tech Stack
PyTorch · Triton · LLaMA 3.2 · TRIBE v2
CLIP · DINOv2 · SigLIP2 · V-JEPA2
fMRI · nilearn · PyVista · NumPy
SciPy · scikit-learn · PyTorch Lightning
HuggingFace · Streamlit · Plotly
ONNX · OpenCV · SLURM