White Paper

Unlocking on-device generative AI with an NPU and heterogeneous computing



Qualcomm’s approach to on-device generative AI combines its custom-built Hexagon NPU with a heterogeneous computing architecture, including CPU, GPU, Sensing Hub, and memory subsystems. This enables efficient, low-power AI performance across diverse applications like voice assistants and real-time translation. The NPU delivers high-speed inference using fused scalar, vector, and tensor processing, while the Qualcomm AI Engine distributes workloads intelligently across processors. With a robust software stack and tools like INT4 quantization and AIMET, Qualcomm ensures developers can scale AI applications efficiently across billions of devices worldwide.
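To make the INT4 quantization mentioned above concrete, here is a minimal sketch of symmetric 4-bit weight quantization. This is a generic illustration of the technique, not Qualcomm's AIMET API; the function names and per-tensor scaling scheme are assumptions for the example.

```python
import numpy as np

def quantize_int4_symmetric(weights):
    """Symmetric per-tensor INT4 quantization.

    Maps float weights onto the signed 4-bit integer range [-8, 7]
    using a single scale factor, shrinking weight storage to 4 bits
    per value (illustrative sketch, not AIMET's actual implementation).
    """
    scale = np.max(np.abs(weights)) / 7.0  # largest magnitude maps to +/-7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 codes."""
    return q.astype(np.float32) * scale

# Example: quantize a tiny weight vector and inspect the round-trip error.
w = np.array([0.12, -0.5, 0.33, 0.07], dtype=np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s)
```

The round-trip error is bounded by half the scale step, which is why tools such as AIMET pair low-bit quantization with calibration to keep accuracy loss small.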
