Generative AI on Kubernetes

396 pages

This early release O'Reilly book focuses on the real-world challenge of running large language models in production on Kubernetes. Rather than diving deep into model theory, it tackles deployment, GPU scheduling, scaling, observability, and tuning. The authors break down inference phases, memory demands, KV caching, and production-readiness concerns such as latency and cost control, and also explore agentic workflows and AI-driven applications. The core message is clear: Kubernetes can handle LLM workloads, but doing it well requires new patterns, smarter resource management, and strong operational discipline.
