Ali Hadizadeh
Efficient AI | LLM Acceleration
About
I am a co-founder and Lead Scientist at ByteShape, building systems that make AI models faster, cheaper, and more efficient to deploy. I earned my Ph.D. in Computer Engineering from the University of Toronto.
My background is in model compression, quantization, and large-scale optimization techniques that enable high-quality training and inference at minimal computational cost. In particular, I focus on making generative models such as LLMs and diffusion-based architectures run efficiently on commodity hardware, lowering the barrier to using AI, whether on everyday devices or at scale in datacenters.
Through my work at ByteShape, I develop automatic, data-driven quantization methods that adapt to both model architecture and target hardware. Instead of relying on fixed, hand-tuned recipes, these methods learn how to tune each model to perform optimally on a given hardware platform, producing different quantization strategies for different deployment targets.
A representative example of my research is GOBO, one of the first post-training, outlier-aware quantization approaches for large language models. At a time when INT8 was still considered "aggressive," this work showed that with careful handling of outlier values, models could be pushed to very low bit widths, around 3 bits, while preserving output quality. Such low bit-width representations enable significantly faster and more energy-efficient inference. My research has appeared at top-tier systems and machine learning conferences, including ISCA, MICRO, ASPLOS, and MLSys.
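The core idea behind outlier-aware quantization can be sketched in a few lines. The snippet below is a toy illustration only, not the actual GOBO algorithm: it keeps weights far from the mean in full precision and snaps the remaining "inlier" weights onto a small uniform grid of 2^bits levels, whereas GOBO derives its representative values from the weight distribution itself. All names here are hypothetical.

```python
import numpy as np

def outlier_aware_quantize(w, bits=3, outlier_sigma=3.0):
    """Toy outlier-aware quantization: weights far from the mean stay
    in full precision; the rest snap to one of 2**bits uniform levels.
    (Illustrative sketch only -- not the actual GOBO algorithm.)"""
    mu, sigma = w.mean(), w.std()
    outliers = np.abs(w - mu) > outlier_sigma * sigma
    inliers = w[~outliers]
    # A uniform grid spanning the inlier range; GOBO instead picks
    # non-uniform representative values from the weight distribution.
    levels = np.linspace(inliers.min(), inliers.max(), 2 ** bits)
    # Snap each inlier to its nearest level.
    idx = np.abs(inliers[:, None] - levels[None, :]).argmin(axis=1)
    wq = w.copy()
    wq[~outliers] = levels[idx]
    return wq, outliers  # outliers keep their original values

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096)
w[:4] = [0.5, -0.4, 0.3, -0.6]  # inject a few large outliers
wq, mask = outlier_aware_quantize(w)
```

The payoff is in storage: the few outliers are kept at full precision, while the bulk of the weights can be stored as 3-bit indices into an 8-entry table.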