Cut costs, not corners: Deploying cheap and efficient ML models on AWS

13:45 SAST

17:15 IST

12:45 BST

Johannesburg

45 Min

In this talk, I will tackle the challenges of deploying large language models (LLMs), with a focus on cost efficiency and security. AI offers transformative potential for businesses, but concerns about expenses and data security can be significant barriers. This session provides practical strategies to address these concerns.

We will begin with an introduction to LLMs and their applications, such as voice and chat services. Next, we will explore why AWS is an ideal platform for hosting inference tasks, highlighting key services that facilitate powerful and economical deployments.

You will learn how to set up custom containers on SageMaker, manage costs with inference components, and leverage autoscaling to handle variable workloads efficiently. We will also cover how easy and effective config-only deployments can be, using the Hugging Face Text Generation Inference (TGI) container together with the SageMaker API.
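As a taste of the config-only approach, here is a minimal sketch of deploying a TGI-served model with the SageMaker Python SDK. It is not taken from the talk: the model ID, instance type, endpoint name, and environment values are illustrative assumptions, and inference components and autoscaling are configured separately (not shown).

```python
# Minimal sketch: config-only TGI deployment via the SageMaker Python SDK.
# All names and values below are illustrative, not from the talk.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# IAM role with SageMaker permissions (works inside SageMaker; pass a role ARN otherwise).
role = sagemaker.get_execution_role()

# Resolve the Hugging Face TGI container image (pinning a version is advisable in practice).
image_uri = get_huggingface_llm_image_uri("huggingface")

# No custom inference code: the model is configured entirely through environment variables.
model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",  # example model ID
        "SM_NUM_GPUS": "1",              # GPUs available to TGI on the instance
        "MAX_INPUT_LENGTH": "2048",
        "MAX_TOTAL_TOKENS": "4096",
    },
)

# Deploy to a real-time endpoint; instance type and endpoint name are example choices.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="tgi-demo-endpoint",
)

print(predictor.predict({"inputs": "Explain SageMaker inference components briefly."}))
```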

By the end of this session, you will have a comprehensive understanding of deploying and managing large language models on AWS SageMaker, achieving high performance while keeping costs under control. This talk is ideal for professionals looking to boost their AI capabilities without compromising on budget or security.