CortexLab
Multimodal fMRI brain encoding toolkit with GPU voxelwise ridge, causal modality lesion analysis, 3D brain visualization, and live inference
CortexLab extends Meta's TRIBE v2 foundation model for in-silico neuroscience. TRIBE v2 predicts fMRI brain activation from video, audio, and text inputs using a LLaMA 3.2-3B backbone. CortexLab adds the tooling researchers need to turn predictions into scientific conclusions: GPU-accelerated voxelwise ridge regression, causal modality lesion analysis, brain-alignment benchmarking with statistical testing, cognitive load scoring, temporal dynamics, ROI connectivity, streaming inference, and cross-subject adaptation.
The toolkit includes a brain-alignment benchmark (RSA, CKA, Procrustes with permutation tests, bootstrap CIs, FDR correction, and noise ceiling estimation) and a causal analysis pipeline that ablates individual input modalities to identify which modality each cortical region depends on. A GPU ridge encoder with torch + Triton backends enables population-scale voxelwise regression (200K voxels × alpha grid × CV folds). Foundation-model feature extractors (CLIP, SigLIP2, DINOv2, V-JEPA2, PaLiGemma2) provide baselines for representational alignment comparisons.
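To make the ridge encoder's design concrete, here is a minimal torch sketch of the multi-alpha trick it builds on: factor the design matrix once, then every additional alpha costs only a per-alpha scaling plus a final matmul, which is the step the Triton backend fuses. This is an illustrative sketch under those assumptions, not CortexLab's actual API; all names are hypothetical.

```python
import torch

def ridge_weights_multi_alpha(X: torch.Tensor, Y: torch.Tensor,
                              alphas: torch.Tensor) -> torch.Tensor:
    """Closed-form ridge weights for every voxel and every alpha at once.

    X: (n_samples, n_features) stimulus features
    Y: (n_samples, n_voxels)   BOLD responses
    alphas: (n_alphas,)        regularization grid
    returns W: (n_alphas, n_features, n_voxels)
    """
    # Factor the design matrix once and reuse it for the whole alpha grid.
    U, S, Vt = torch.linalg.svd(X, full_matrices=False)  # U:(n,k) S:(k,) Vt:(k,f)
    UtY = U.T @ Y                                        # (k, n_voxels), computed once
    # Per-alpha shrinkage factors s / (s^2 + alpha), shape (n_alphas, k).
    shrink = S / (S ** 2 + alphas[:, None])
    # Per-alpha scaling followed by one matmul: the fusable, tileable step.
    scaled = shrink[:, :, None] * UtY                    # (n_alphas, k, n_voxels)
    return Vt.transpose(0, 1) @ scaled                   # (n_alphas, n_features, n_voxels)
```

A real run would tile the voxel and alpha dimensions to bound memory (200K voxels of float32 weights is already ~0.4 GB per alpha at 512 features) and pick each voxel's best alpha from cross-validated scores.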
A futuristic Streamlit dashboard with a glassmorphism UI features an interactive 3D brain viewer (rotatable fsaverage mesh with activation overlays), live brain prediction from webcam, screen capture, or video, publication-quality 4-panel brain views, and 6 analysis pages. Biologically realistic synthetic data (HRF convolution, modality-specific activation) lets everything run without a GPU. The project ships with 143 passing tests, has 4 community contributors, and is published on PyPI (cortexlab-toolkit) and HuggingFace.
Key Highlights
- 3 Input Modalities (Video, Audio, Text)
- 143 Tests Passing
- 5 Foundation Models (CLIP, DINOv2, SigLIP2, V-JEPA2, PaLiGemma2)
- GPU Voxelwise Ridge (Triton + torch backends)
Architecture Details
- Streaming Inference: Sliding-window predictor processes live feature streams with configurable window and step sizes for real-time brain activation prediction (the windowing logic is sketched after this list).
- ROI Attention Maps: Extracts and visualizes attention patterns from the transformer backbone, showing which brain regions attend to which temporal moments.
- Modality Attribution: Computes per-vertex importance scores for each input modality using ablation-based attribution, revealing what drives each brain region.
- Cross-Subject Adaptation: Ridge regression and nearest-neighbor methods adapt the pretrained model to new subjects with minimal calibration data.
- Brain-Alignment Benchmark: Quantitative framework with permutation tests and bootstrap confidence intervals to score how closely any AI model's representations match brain activation patterns.
- Cognitive Load Scorer: Predicts visual complexity, auditory demand, language processing, executive load, and overall cognitive demand from brain activations.
- Temporal Dynamics: Analyzes peak response latency per ROI, lag-shifted correlation between model features and brain responses (sketched after this list), and sustained vs. transient response decomposition.
- ROI Connectivity: Computes functional connectivity matrices, clusters brain regions into networks via agglomerative clustering, and derives graph metrics (degree centrality, modularity).
- Performance: Gradient checkpointing, half-precision inference (FP16/BF16), ONNX export, and CUDA memory profiling.
- GPU Voxelwise Ridge: Cross-validated voxelwise ridge encoder with torch and Triton backends. A fused Triton kernel batches per-alpha scaling and the final matmul along voxel and alpha tiles (the underlying math is sketched in torch above), enabling population-scale runs (200K voxels × alpha grid × CV folds) on a single H200. Scikit-learn-compatible API; numerics match RidgeCV to within 1e-5.
- Causal Modality Lesion: Interventional analysis pipeline that fits a predictive encoder once, then ablates individual input modalities (zero-mask and learned-mask) to measure per-voxel delta R² (see the zero-mask sketch after this list). Identifies which cortical regions causally depend on each modality.
- Noise Ceiling Estimation: Inter-subject (leave-one-subject-out) and split-half (Spearman-Brown corrected) ceiling estimators, with fraction-of-explainable-variance normalization (the split-half version is sketched after this list).
- Foundation Model Features: Five pretrained presets (CLIP ViT-L/14, SigLIP2 ViT-L, DINOv2 ViT-L, V-JEPA2 ViT-L, PaLiGemma2-3B) with lazy HuggingFace loading, configurable pooling, and caching helpers for representational-alignment baselines.
- 3D Brain Viewer: Interactive rotatable fsaverage brain with activation overlays, publication-quality 4-panel views (lateral, medial, dorsal), ROI highlighting, sulcal depth blending.
- Live Inference: Real-time brain prediction from webcam, screen capture, or video file with live-updating 3D brain, cognitive load timeline, and FPS/latency metrics. Simulation mode works without GPU.
- Futuristic Dashboard: Glassmorphism Streamlit app with 6 analysis pages, 3D brain hero, neon accents, cross-page state, file upload/export, and methodology documentation with references.
- Production Infrastructure: Portable SLURM submission templates (env setup, smoke tests, ridge benchmark, feature extraction array job, lesion pipeline) and cross-backend benchmark harness.
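The sliding-window logic behind the streaming predictor fits in a few lines; a hedged sketch follows, with class name, signatures, and defaults assumed rather than taken from CortexLab's API.

```python
from collections import deque
import numpy as np

class SlidingWindowPredictor:
    """Buffers per-timestep features and predicts every `step` timesteps."""

    def __init__(self, model, window: int = 16, step: int = 4):
        self.model = model                  # callable: (window, feat_dim) -> (n_voxels,)
        self.window, self.step = window, step
        self.buffer = deque(maxlen=window)  # oldest timestep falls off automatically
        self._since_last = step             # predict as soon as the window first fills

    def push(self, features: np.ndarray):
        """Feed one timestep of features; returns a prediction when due, else None."""
        self.buffer.append(features)
        self._since_last += 1
        if len(self.buffer) == self.window and self._since_last >= self.step:
            self._since_last = 0
            return self.model(np.stack(self.buffer))
        return None
```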
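The lag-shifted correlation in the temporal-dynamics analysis can likewise be sketched directly; this minimal version (names assumed) correlates one model-feature time course against every voxel at increasing TR shifts, so the argmax over lags gives a peak-latency estimate per voxel.

```python
import numpy as np

def lagged_correlation(feat: np.ndarray, bold: np.ndarray, max_lag: int = 8) -> np.ndarray:
    """feat: (n_trs,) model feature; bold: (n_trs, n_voxels) responses.
    Returns Pearson r at lags 0..max_lag, shape (max_lag + 1, n_voxels)."""
    rs = []
    for lag in range(max_lag + 1):
        f = feat[:len(feat) - lag]                    # feature leads the BOLD signal
        b = bold[lag:]
        fz = (f - f.mean()) / f.std()
        bz = (b - b.mean(axis=0)) / b.std(axis=0)
        rs.append((fz[:, None] * bz).mean(axis=0))    # per-voxel correlation
    return np.stack(rs)                               # .argmax(axis=0) ~ peak latency
```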
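The zero-mask arm of the causal lesion analysis amounts to re-scoring a fitted encoder with one modality's feature columns zeroed out. A sketch under the assumption that modality features are concatenated along the feature axis; `predict` stands in for the fitted encoder and all names are hypothetical.

```python
import numpy as np

def r2_per_voxel(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Coefficient of determination for each voxel (column) separately."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def modality_lesion_delta_r2(predict, X, Y, modality_slices):
    """predict: fitted encoder, (n_samples, n_features) -> (n_samples, n_voxels).
    modality_slices: dict mapping modality name -> column slice of X.
    Returns per-voxel delta R^2 from zero-masking each modality."""
    full = r2_per_voxel(Y, predict(X))
    deltas = {}
    for name, cols in modality_slices.items():
        X_lesioned = X.copy()
        X_lesioned[:, cols] = 0.0        # the zero-mask intervention
        deltas[name] = full - r2_per_voxel(Y, predict(X_lesioned))
    return deltas
```

A large positive delta at a voxel means prediction there degrades once the modality is removed, i.e. that region depends on it under the intervention; the learned-mask variant optimizes the mask instead of fixing it at zero.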
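And the split-half ceiling with Spearman-Brown correction, assuming repeated presentations of the same stimuli are stacked on a leading axis (again a hypothetical minimal version, not the toolkit's exact code):

```python
import numpy as np

def split_half_ceiling(Y_reps: np.ndarray, seed=None) -> np.ndarray:
    """Y_reps: (n_reps, n_samples, n_voxels) repeated measurements.
    Returns a per-voxel noise-ceiling correlation estimate."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(Y_reps.shape[0])
    half = Y_reps.shape[0] // 2
    a = Y_reps[order[:half]].mean(axis=0)    # (n_samples, n_voxels)
    b = Y_reps[order[half:]].mean(axis=0)
    a -= a.mean(axis=0)
    b -= b.mean(axis=0)
    r = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    # Spearman-Brown: correct the half-data correlation up to full-data length.
    return 2.0 * r / (1.0 + r)
```

Squaring this ceiling and dividing an encoder's per-voxel R² by it is one standard way to obtain the fraction-of-explainable-variance normalization mentioned in the bullet above.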
Tech Stack
PyTorch · Triton · LLaMA 3.2 · TRIBE v2
CLIP · DINOv2 · SigLIP2 · V-JEPA2
fMRI · nilearn · PyVista · NumPy
SciPy · scikit-learn · PyTorch Lightning
HuggingFace · Streamlit · Plotly
ONNX · OpenCV · SLURM