Large Language Models in Production Round-table Conversation

MLOps.community

The Cost of Latency in a Condensed Version Event

The latency from our first BERT deployment a couple of years ago to what we have now is, like, magnitudes quicker. We can even run those models on the CPU, in a very condensed version even. And the costs play a role as well, or, like, you could just use gd [unclear], right, if your business case pays for this option. But I think making it very domain-specific is the biggest latency saver you can go after.

Running GPUs is about 4x what you can do on CPUs. And so I'll go back to, and I'm definitely stealing this, James, I love the triangle of quality, cost, and latency.

Play episode from 43:30
