Writeups

Technical deep dives into projects, algorithms, and things I'm building. These are longer-form explorations that go beyond the project card summaries.

Building Tensorbit Core: A C++/CUDA Transformer Pruning Library

How I built a C++/CUDA pipeline for pruning 7B-parameter LLMs with structured N:M sparsity, Hessian-aware importance estimation, BlockOBS greedy pruning, and a custom binary container format (.tbm) for zero-copy GPU loading.

Cosine Similarity Distillation: Teacher-Free Knowledge Transfer via Random Projection Fingerprints

How I designed a novel distillation method that replaces live teacher forward passes with compact precomputed fingerprints, achieving a 67x storage reduction with competitive accuracy on CIFAR-100.