Case Study

Matching GPU Price Performance Using Amazon Instances With Intel® Xeon® Processors

Storm Reply, an IT consulting firm, needed a cost-effective and reliable hosting environment to deploy large language model (LLM) solutions for a major energy-sector client. After evaluating its options, the firm chose Amazon EC2 C7i instances powered by 4th Gen Intel® Xeon® Scalable processors, combined with Intel optimization libraries and the open GenAI framework. With these optimizations, LLM inference on the Intel-based instances matched GPU price performance: Storm Reply cut the Llama 2-13B model's response time from 485 seconds to 92 seconds, demonstrating significant efficiency gains and cost savings for generative AI workloads.