Writeups
Technical deep dives into projects, algorithms, and things I'm building. These are longer-form explorations that go beyond the project card summaries.
Building Tensorbit Core: A C++/CUDA Transformer Pruning Library
How I built a C++/CUDA pipeline that prunes 7B-parameter LLMs using structured N:M sparsity, Hessian-aware importance estimation, and BlockOBS greedy pruning, plus a custom binary container format (.tbm) for zero-copy GPU loading.
Cosine Similarity Distillation: Teacher-Free Knowledge Transfer via Random Projection Fingerprints
How I designed a novel distillation method that replaces live teacher forward passes with compact precomputed fingerprints, achieving a 67x storage reduction with competitive accuracy on CIFAR-100.