White Paper

Unlocking on-device generative AI with an NPU and heterogeneous computing



Qualcomm’s approach to on-device generative AI combines its custom-built Hexagon NPU with a heterogeneous computing architecture, including CPU, GPU, Sensing Hub, and memory subsystems. This enables efficient, low-power AI performance across diverse applications like voice assistants and real-time translation. The NPU delivers high-speed inference using fused scalar, vector, and tensor processing, while the Qualcomm AI Engine distributes workloads intelligently across processors. With a robust software stack and tools like INT4 quantization and AIMET, Qualcomm ensures developers can scale AI applications efficiently across billions of devices worldwide.
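To make the INT4 quantization mentioned above concrete, here is a minimal sketch of symmetric 4-bit weight quantization. This is a generic illustration of the technique, not Qualcomm's AIMET API; the function names and per-tensor scaling scheme are assumptions for the example.

```python
import numpy as np

def quantize_int4_symmetric(weights):
    """Symmetric per-tensor INT4 quantization.

    Maps float weights onto the signed 4-bit integer range [-8, 7]
    using a single scale factor, shrinking weight storage to 4 bits
    per value (illustrative sketch, not AIMET's actual implementation).
    """
    scale = np.max(np.abs(weights)) / 7.0  # largest magnitude maps to +/-7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 codes."""
    return q.astype(np.float32) * scale

# Example: quantize a tiny weight vector and inspect the round-trip error.
w = np.array([0.12, -0.5, 0.33, 0.07], dtype=np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s)
```

The round-trip error is bounded by half the scale step, which is why tools such as AIMET pair low-bit quantization with calibration to keep accuracy loss small.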
