Yaniv Romano

Yaniv Romano (Alum)
Speeding Up Large Language Models Without Changing Their Answers
Technion - Israel Institute of Technology

Congratulations to Yaniv Romano, Zuckerman Israeli Postdoctoral Scholar (Alum) in Technion’s Computer Science Department, on publishing "Accelerating Speculative Decoding with Block Diffusion Draft Trees" on arXiv. The study focuses on large language models (LLMs), the AI systems behind tools like chatbots, which generate text one word at a time. A major challenge is speed: producing each next word can be slow and computationally expensive. Dr. Romano and his team developed DDTree, which accelerates the process by using a smaller helper model to propose several likely continuations in a branching tree, then using the full model to confirm many of them at once, so that a response takes fewer steps. Crucially, the method preserves the full model’s exact outputs rather than introducing approximations. This can lower the computational cost of deploying LLMs at scale while improving latency and user experience in real-world applications.

Abstract:
Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. The study introduces DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter.
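
To make the idea in the abstract concrete, here is a minimal toy sketch in Python of greedy speculative decoding with a draft tree built from per-position distributions. It is not the DDTree algorithm from the paper. The names `target_next`, `toy_drafter`, `build_levels`, and `verify`, the fixed example sentence, and the hand-made probabilities are all assumptions chosen for illustration, and the "parallel" verification is simulated with a simple loop. What it shows is why the output can match the full model exactly: every emitted token is the target's own choice, and the draft only determines how many tokens are confirmed per round.

```python
from typing import Dict, List, Tuple

Token = str
# Illustrative assumption: the "target model" always continues this sentence.
SENTENCE: List[Token] = ["the", "cat", "sat", "on", "the", "mat", "<eos>"]


def target_next(context: List[Token]) -> Token:
    """Toy target model: its greedy next token is the next word of SENTENCE."""
    return SENTENCE[min(len(context), len(SENTENCE) - 1)]


def toy_drafter(context: List[Token], block: int) -> List[Dict[Token, float]]:
    """Hypothetical drafter: one token distribution per future position.

    It is deliberately imperfect: sometimes the right token is ranked
    second, and sometimes it is missing altogether.
    """
    dists = []
    for i in range(block):
        pos = min(len(context) + i, len(SENTENCE) - 1)
        right = SENTENCE[pos]
        if pos % 3 == 2:
            dists.append({"dog": 0.6, "ran": 0.4})   # right token missing
        elif pos % 2 == 1:
            dists.append({"dog": 0.6, right: 0.4})   # right token ranked second
        else:
            dists.append({right: 0.9, "dog": 0.1})   # right token ranked first
    return dists


def build_levels(dists: List[Dict[Token, float]], width: int) -> List[List[Token]]:
    """Keep the `width` most probable tokens per position.

    Each list is one level of a draft tree in which every node at depth i
    has the same candidate children.
    """
    return [sorted(d, key=d.get, reverse=True)[:width] for d in dists]


def verify(context: List[Token], levels: List[List[Token]]) -> List[Token]:
    """Walk the tree along the target's own greedy choices.

    The target's token is always the one emitted, so the output cannot
    differ from plain decoding; a miss in the draft only ends the round.
    """
    accepted: List[Token] = []
    for candidates in levels:
        want = target_next(context + accepted)
        accepted.append(want)
        if want not in candidates or want == "<eos>":
            break
    return accepted


def speculative_decode(width: int = 2, block: int = 3) -> Tuple[List[Token], int]:
    """Decode until <eos>, counting draft-and-verify rounds."""
    out: List[Token] = []
    rounds = 0
    while not out or out[-1] != "<eos>":
        out += verify(out, build_levels(toy_drafter(out, block), width))
        rounds += 1
    return out, rounds


def plain_decode() -> List[Token]:
    """Baseline: one target call per generated token."""
    out: List[Token] = []
    while not out or out[-1] != "<eos>":
        out.append(target_next(out))
    return out
```

Calling `speculative_decode()` and `plain_decode()` returns the same token sequence, with the speculative version needing fewer rounds than there are tokens. In a real system each round would correspond to a single parallel pass of the target model over the whole tree, which is where the speedup comes from.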