Chris Matteson: Cost and Performance Optimization of LLM Inferencing

Chris Matteson is the Head of Solutions Sales at Fermyon. His background includes stints as a sysadmin, sales engineer, and consultant. He was an early employee of HashiCorp, where he built Partner Solutions Engineering, and subsequently joined Prisma to build Sales and Solutions Engineering for the rapidly growing ORM. Now at Fermyon, he advocates for the next wave of cloud compute by helping users and partners find success with WebAssembly.

Cost and Performance Optimization of LLM Inferencing

This session provides an overview of the architectures and associated considerations for running AI models and inferencing (together, “AI applications”) across various use cases. It then delves more deeply into the use case of most interest to enterprises: event-driven, user-interactive AI applications.

Finally, we will dive into the common pricing models that arise from these architectures and compare how well each architecture allows costs to be controlled.
