BLOGS & MORE

16 / JOURNAL CLUB / MACHINE LEARNING / ARXIV / SPEECH SYNTHESIS / SELF-SUPERVISED LEARNING
DINO-VITS: A Self-Supervised Sidecar for Noise-Robust Zero-Shot Voice Cloning

Pankov et al. attach a DINO self-supervised loss to the speaker encoder of a VITS-based zero-shot TTS system. Noise robustness improves; here is what survives an honest read.

15 / JOURNAL CLUB / MACHINE LEARNING / ARXIV / LANGUAGE MODELS / REINFORCEMENT LEARNING / AGENTS
Training-Free GRPO: Doing RL on the Prompt, Not the Weights

Tencent's Youtu-Agent team adapts GRPO to a frozen LLM by replacing the gradient with a natural-language experience buffer, beating fine-tuned 32B models for around eighteen dollars.

14 / JOURNAL CLUB / MACHINE LEARNING / ARXIV / LANGUAGE MODELS / INTERPRETABILITY / QUANTIZATION
The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Sun, Canziani, LeCun, and Zhu dissect why pre-norm LLMs grow giant outlier activations and attention sinks together, then show the two phenomena are decoupled architectural artifacts you can suppress independently.

13 / JOURNAL CLUB / GENOMICS / BIORXIV / LONG-READ SEQUENCING / RARE DISEASE
Portello: Making Global Assembly More Effective for Rare-Disease Whole Genome Sequencing

Saunders et al. introduce portello, which transfers HiFi read alignments from the sample's own de novo contigs onto GRCh38; with these alignments, DeepVariant removes 47% of small-variant basecall errors compared with conventional read mapping.

12 / JOURNAL CLUB / MACHINE LEARNING / ARXIV / LANGUAGE MODELS
Breaking Free from Context Limits: Recursive Language Models Explained

This entry is a summary of the paper "Recursive Language Models", in which Zhang, Kraska & Khattab introduce an approach to scaling large language models (LLMs) by adding conditional memory.

11 / JOURNAL CLUB / MACHINE LEARNING / ARXIV / LANGUAGE MODELS
Foundation Models Improve Perturbation Response Prediction

This entry is a summary of the paper "Foundation Models Improve Perturbation Response Prediction", in which Cole et al. tackle a central question in computational biology: can foundation models (large pretrained neural networks) actually help predict how cells respond to genetic or chemical perturbations?

10 / JOURNAL CLUB / MACHINE LEARNING / ARXIV / LANGUAGE MODELS
Breaking Free from Context Limits: Recursive Language Models Explained

This entry is a summary of the paper "Recursive Language Models" by Zhang, Kraska & Khattab.

09 / JOURNAL CLUB / MACHINE LEARNING / NEURIPS / LANGUAGE MODELS
Small Batch Training for Language Models: Why Simple SGD Works

This journal club post reviews the paper "Small Batch Training for Language Models: Why Simple SGD Works" by Marek et al.

08 / DOCKER / PROTEIN FOLDING / BIOINFORMATICS
Boltz-1x: A Comprehensive Guide to Next-Generation Protein Structure Prediction Using Boltzmann-Inspired Deep Learning

This technical blog provides a complete tutorial for implementing Boltz-1x, a novel protein structure prediction model that combines Boltzmann-inspired architecture with modern deep learning, including Docker setup instructions and a practical demonstration predicting the GSK3A-FRAT1 protein complex with high accuracy (0.71 Å RMSD).

07 / MACHINE LEARNING / DEEP-LEARNING / STATE-SPACE-MODELS
Forecasting Bitcoin with Mamba State Space Models

An intuitive guide to forecasting minute-by-minute Bitcoin OHLCV data using the fast, memory-efficient Mamba State Space Model: from Docker setup and minimal preprocessing to PyTorch Lightning training and comparison against Transformer baselines.

06 / CLOUD / GCP / MACHINE LEARNING
Scaling Your ML Training with Vertex AI Custom Jobs on GCP

This blog is a step-by-step guide to scaling machine learning training with Vertex AI Custom Jobs on Google Cloud, covering Docker image creation, data upload, job submission, and GPU optimization for efficient cloud-based workflows.

05 / MACHINE LEARNING / DEEP-LEARNING / FLASHATTENTION
FlashAttention: Accelerating Deep Learning with Docker

A concise, step-by-step demo of how to containerize FlashAttention and train a simple autoregressive Transformer on minimally preprocessed Bitcoin minute-by-minute data.

04 / DOCKER / MACHINE LEARNING
Reproducibility in ML with Docker

Learn how to use Docker to ensure reproducibility in machine learning projects, from local development to production deployment.

03 / DOCKER / BIOINFORMATICS / GENOMICS
FastQC-RS: Quality Control for Omics Data

FastQC-RS is a modern, Rust-based tool for fast and efficient quality control of FASTQ files. It is lightweight and produces detailed HTML reports, making it well suited to ensuring high-quality omics data in genomics and transcriptomics workflows. This guide walks you through Docker-based setup, usage, and key features.

02 / DOCKER / BIOINFORMATICS / GENOMICS
Dragen-GATK: High-Performance Variant Calling

Dragen-GATK combines Illumina’s hardware acceleration with GATK’s best-practice workflows to deliver ultra-fast, clinically robust germline variant calling. This guide walks you through Docker-based setup, sample analysis, and key parameters to optimize high-performance variant discovery in genomics projects.

01 / BIOINFORMATICS / DOCKER / GENOMICS
Speeding Up FASTQ Preprocessing with FastP

FastP is an ultra-fast, all-in-one tool for trimming, filtering, and quality-checking FASTQ files, helping you quickly generate clean, high-quality datasets for genomics and transcriptomics projects. This guide walks you through installation, usage, and key features of FastP, making it an essential part of your NGS workflow.