Welcome to my research blog!

Here I share personal projects and research notes in deep learning, with a focus on optimization methods and model compression.

Posts

  • Fisher-based Optimizers in Deep Learning

    This post summarizes the natural-gradient view of deep learning optimization and reviews practical Fisher-based approximations for linear layers in chronological order. The central question is how methods such as KFAC, EKFAC, Shampoo, and SOAP make structured preconditioning cheap enough to use in neural networks.

  • Properties of the Fisher Information Matrix

    This post introduces the Fisher Information Matrix and develops its main statistical and geometric properties. It concludes with a short discussion of what Fisher singularity means and how it arises in overparameterized models.

  • Introducing My Pruning Library

    LLM pruning research is often hindered by the engineering complexity of reproducing activation-aware methods, which usually require custom hooks and intricate layer-wise management. To lower the barrier for experimentation, I developed nn-pruning: a modular PyTorch toolkit that standardizes activation collection and benchmarking. By decoupling pruning logic from the underlying model infrastructure, the project allows researchers to implement and compare new algorithms like Wanda or SparseGPT with minimal boilerplate.

  • FROG: My attempt to create an efficient second-order optimizer

    FROG (Fisher ROw-wise PreconditioninG) is a second-order optimizer based on row-wise Fisher preconditioning. It uses joint Conjugate Gradient (CG) solves to approximate natural-gradient updates with low computational overhead, and Fisher trace-based normalization keeps the updates scale-free. The method applies to linear and convolutional layers and in practice needs only a small number of CG iterations. The implementation is available on GitHub.

  • Unstructured Pruning Methods

    This note is a personal mathematical deep dive into unstructured pruning methods. I first cover one-shot methods, including Optimal Brain Surgeon, SparseGPT, and Wanda, and then training-based approaches such as Movement Pruning and oBERT. To my knowledge, it is the only write-up that combines rigorous mathematical derivations with explicit connections between these otherwise separate frameworks.
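
As a small taste of the one-shot methods covered in that note, here is a minimal sketch of the Wanda criterion in plain Python. Wanda scores each weight by its magnitude times the L2 norm of the matching input feature over a calibration set, then removes the lowest-scoring weights within each output row. The function name `wanda_prune`, its arguments, and the toy data are hypothetical and chosen only for illustration; this is not the nn-pruning API, and a real implementation would work on tensors gathered with forward hooks.

```python
import math

def wanda_prune(weights, activations, sparsity):
    """Zero the lowest-scoring weights in each output row (illustrative sketch).

    weights:     list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_tokens, in_features)
    sparsity:    fraction of weights to remove per row, in [0, 1]
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across all calibration tokens.
    feat_norm = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    n_drop = int(n_in * sparsity)

    pruned = []
    for row in weights:
        # Wanda score: |W_ij| * ||X_j||_2, compared within one output row.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy layer: 2 outputs, 4 inputs, 2 calibration tokens (made-up numbers).
W = [[0.5, -2.0, 0.1, 1.0],
     [1.5, 0.2, -0.3, 0.05]]
X = [[1.0, 0.1, 4.0, 0.5],
     [2.0, 0.1, 3.0, 0.5]]
W_pruned = wanda_prune(W, X, sparsity=0.5)
print(W_pruned)
```

In the first row the largest weight, -2.0, is removed because its input feature is almost always near zero, while plain magnitude pruning would have kept it. That dependence on activations is what makes the method activation-aware.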