

MLOps.community
Demetrios
Relaxed Conversations around getting AI into production, whatever shape that may come in (agentic, traditional ML, LLMs, Vibes, etc.)

Apr 7, 2022 • 40min
The Shipyard: Lessons Learned While Building an ML Platform / Automating Adherence // Joseph Haaga // Coffee Sessions #91
MLOps Coffee Sessions #91 with Joseph Haaga, The Shipyard: Lessons Learned While Building an ML Platform / Automating Adherence.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Joseph Haaga and the Interos team walk us through their design decisions in building an internal data platform. Joseph talks about why their use case wasn't a fit for off-the-shelf solutions, what their internal tool Snitch does, and how they use git as a model registry. Shipyard blog post series: https://medium.com/interos-engineering

// Bio
Joseph leads the ML Platform team at Interos, the operational resilience company. He was introduced to MLOps while working as a Senior Data Engineer and has spent the past year building a platform for experimentation and serving. He lives in Washington, DC, with his dog Cheese.

// MLOps Jobs board
jobs.mlops.community

// Related Links
Website: https://joehaaga.xyz
Medium: https://medium.com/interos-engineering
Shipyard blog post series: https://medium.com/interos-engineering

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Joseph on LinkedIn: https://www.linkedin.com/in/joseph-haaga/

Timestamps:
[00:00] Introduction to Joseph Haaga
[02:07] Please subscribe, follow, like, rate, and review our Spotify and YouTube channels
[02:31] New! Best of Slack Weekly Newsletter
[03:03] Interos
[04:33] Global supply chain
[05:45] Machine Learning use cases of Interos
[06:17] Forecasting and optimization of routes
[07:14] Build, buy, open-source decision making
[10:06] Experiences with Kubeflow
[11:05] Creating standards and rules when building the platform
[13:29] Snitches
[14:10] Inter-team discussions when processes fall apart
[16:56] Examples of the development process based on the feedback of ML engineers and data scientists
[20:35] Preserving flexibility when introducing new models and formats
[21:37] Organizational structure of Interos
[23:40] Surface area for product
[24:46] Use of GitOps to manage boarding pass
[28:04] Cultural emphasis
[30:02] Naming conventions
[32:28] Benefit of a clean slate
[33:16] One-size-fits-all choice
[37:34] Wrap up

Apr 4, 2022 • 51min
Bringing Audio ML Models into Production // Valerio Velardo // MLOps Coffee Sessions #90
MLOps Coffee Sessions #90 with Valerio Velardo, Bringing Audio ML Models into Production.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
The majority of audio/music tech companies that employ ML still don’t use MLOps regularly. In these companies, you rarely find audio ML pipelines that take care of the whole ML lifecycle in a reliable and scalable manner. Audio ML probably pays the price of being a small sub-discipline of ML, dwarfed by ML applications in image processing and NLP.
In audio ML, novelties tend to travel slowly. However, things are starting to change: a few audio and music tech companies are investing in MLOps. Building MLOps solutions for music presents unique challenges because audio data differs significantly from all other data types.

// Bio
Valerio is MLOps Lead at Utopia Music. He’s also an AI audio consultant who helps companies implement their AI music vision by providing technical, strategy, and talent sourcing services.
Valerio is interested in both the R&D and productization (MLOps) aspects of AI applied to the audio and music domains. He's the host of The Sound of AI, the largest YouTube channel and online community on AI audio, with more than 22K subscribers.
Previously, Valerio founded and led Melodrive, a tech startup that developed an AI-powered music engine capable of generating emotion-driven video game music in real time. Valerio earned a Ph.D. in music AI from the University of Huddersfield (UK).

// MLOps Jobs board
jobs.mlops.community

// Related Links
Valerio's website: https://valeriovelardo.com/
The Sound of AI YouTube channel: https://www.youtube.com/channel/UCZPFjMe1uRSirmSpznqvJfQ

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Valerio on LinkedIn: https://www.linkedin.com/in/valeriovelardo/

Timestamps:
[00:00] Introduction to Valerio Velardo
[01:28] Please subscribe and rate us!
[02:40] History of Valerio's love for music
[04:12] Intervention of computer science, AI, and Machine Learning in music
[08:06] Experimenting with Machine Learning
[09:25] Environmental Sound AI
[11:05] AI Music
[15:22] Traditional ML life cycle within music tech companies
[18:02] Representation of data
[22:22] Audio is being better served in the market
[30:42] Success metrics
[35:17] Challenges when talking to R&D teams
[38:10] Things need to be battle-hardened before production
[39:09] Education process besides Valerio's YouTube channel
[42:38] Rectifying use cases not related to audio
[45:48] Organizing modular blocks, building stacks
[47:59] Open-source tools implementation
[50:28] Wrap up

Mar 31, 2022 • 53min
A Journey in Scaling AI // Gabriel Straub // MLOps Coffee Sessions #89
MLOps Coffee Sessions #89 with Gabriel Straub, A Journey in Scaling AI.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Gabriel talks to us about the difficulties of scaling ML products across an organization. He speaks about the differences between the profiles of data consumers and data producers, and the challenges of educating engineers so they have greater insight into the effects their changes may have on the system.

// Bio
Gabriel joined Ocado Technology in 2020 as Chief Data Officer, bringing over 10 years of experience in leading data science teams and helping organizations realize the value of their data. At Ocado Technology, his role is to help the organization take advantage of data and machine learning so that we can best serve our retail partners and their customers.
Gabriel is a guest lecturer at London Business School and an Honorary Senior Research Associate at UCL. He has also advised start-ups and VCs on data and machine learning strategies. Before joining Ocado, Gabriel was Head of Data Science at the BBC, Data Director at notonthehighstreet.com, and Head of Data Science at Tesco. Gabriel has an MA in Mathematics from Cambridge and an MBA from London Business School.

// MLOps Jobs board
jobs.mlops.community

// Related Links
Website: https://www.ocadogroup.com/about-us/ocado-technology
Podcast: https://www.reinfer.io/podcast/ai-pioneers-gabriel-straub-chief-data-scientist-ocado
Blog: https://www.ocadogroup.com/technology/blog

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Gabriel on LinkedIn: https://www.linkedin.com/in/gabriel-s-65081521/

Timestamps:
[00:00] Introduction to Gabriel Straub
[03:14] Best of Slack Newsletter
[04:06] Gabriel's best purchase since the pandemic
[05:37] Ocado's background and Gabriel's role
[07:55] Sliding scale of Ocado
[10:05] Different use cases of Ocado
[12:02] Realizing value with Machine Learning
[13:18] How things need to be computed on the edge
[14:51] Ocado's main day-to-day
[16:17] Being generalizable and when to stop
[19:11] The Golden Path
[21:30] Foundational level of maturity
[24:41] Metrics of success
[27:10] Lifespan of data
[28:49] Hard lessons learned from producers and consumers
[30:19] Internal assessment
[32:50] Evolution of Ocado
[36:58] Rule-based system
[38:58] Putting data science and/or machine learning value in front of the consumers
[41:55] Going past the constraints
[44:24] What holds people back?
[46:30] Instilling the cultural value of doing right and well into the company
[49:42] Being defensive, talking about AI
[51:44] Ocado is hiring!

Mar 28, 2022 • 54min
ML Platform Tradeoffs and Wondering Why to Use Them // Javier Mansilla // MLOps Coffee Sessions #88
MLOps Coffee Sessions #88 with Javier Andres Mansilla, ML Platform Tradeoffs and Wondering Why to Use Them.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Javier runs the ML platform at Mercado Libre. We’re here with Javier because he’s going to tell us what the ML platform at Mercado Libre looks like in granular detail, talk about its purpose, lessons, wins, and future improvements, and share some of the most challenging use cases they’ve had to engineer around.

// Bio
For the last 3 years, Javier has been building the internal ML platform for Mercado Libre (NASDAQ: MELI), the biggest company in Latam and the omnipresent e-commerce and fintech solution for the continent.
A seasoned entrepreneur and leader, Javier was co-founder and CTO of Machinalis, a high-end company that had been building Machine Learning solutions since 2010 (yes, before the breakthrough of neural nets). When Machinalis was acquired by Mercado Libre, that small team evolved to enable Machine Learning as a capability for a tech giant with more than 10k devs, impacting the lives of almost 100 million direct users.
On a daily basis, Javier leads not only the tech and product roadmap of their Machine Learning Platform but also their user tracking system, the A/B testing framework, and the open-source office.
Javier loves hanging out with family and friends, Python, biking, football, carpentry, and slow-paced holidays in nature!

// MLOps Jobs board
jobs.mlops.community

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletter, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Javier on LinkedIn: https://www.linkedin.com/in/javimansilla/

Timestamps:
[00:00] Introduction to Javier Andres Mansilla
[02:18] Refresher on what Mercado Libre is
[06:16] Centralization of the Machine Learning platform at Mercado Libre
[11:58] Mercado Libre's working size
[16:15] Hitting the scale
[21:07] Driving ML platform vision and the team's business metrics
[28:23] Education process on how to use machine learning on the platform
[36:49] Composition of the team members and finding the right people
[43:05] Stakeholders
[45:32] Decision making
[48:51] Wrap up
[49:52] Bonus from Javier

Mar 17, 2022 • 52min
Don't Listen Unless You Are Going to Do ML in Production // Kyle Morris // MLOps Coffee Sessions #87
MLOps Coffee Sessions #87 with Kyle Morris, Don't Listen Unless You Are Going to Do ML in Production.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Companies wanting to leverage ML specialize in model quality (architecture, training method, dataset), but they all face the same undifferentiated work to productionize the model. They must find machines to deploy their model on, set it up behind an API, and make inference fast, cheap, and reliable by optimizing hardware, load-balancing, autoscaling, clustering launches per region, queueing long-running tasks... plus standardizing docs, billing, logging, CI/CD that integrates testing, and more.
Banana.dev's aim is to simplify this process for all. This talk outlines our learnings and the trials and tribulations of ML hosting.

// Bio
Hey all! Kyle did self-driving AI @ Cruise and robotics @ CMU, and is currently in business @ Harvard. Now he's building banana.dev to accelerate ML! Kyle cares about safely building superhuman AI. Our generation has the chance to build tools that advance society 100x more in our lifetime than in all of history, but it needs to benefit all living things! This requires a lot of technical + social work. Let's go!

// MLOps Jobs board
jobs.mlops.community

// Related Links
kyle.af

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletter, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Adam on LinkedIn: https://www.linkedin.com/in/aesroka/
Connect with Kyle on LinkedIn: https://www.linkedin.com/in/kylejohnmorris/

Timestamps:
[00:00] Introduction to Kyle Morris
[02:42] banana.dev
[04:43] banana.dev's vision
[06:22] banana.dev's goal beyond the competition
[07:28] Computer vision optimization
[08:46] Common pitfalls
[11:47] Machine Learning Engineering vs Software Engineering
[13:47] Who do you hire?
[15:12] Disconnect in operationalizing
[18:53] Meeting SLOs if stuff is breaking upstream
[19:48] Is breaking upstream a part of quality?
[21:16] Scenario of what to focus on
[24:02] Advice to people dealing with unrealistic expectations
[28:11] Hard truth
[30:42] Don't Listen Unless You Are Going to Do ML in Production
[33:15] Hurdle in productionizing ML systems
[37:56] Chaos engineering
[42:40] War stories
[45:54] Catalyst on changing the original post on Kyle's blog
[50:11] Wrap up
[51:02] Message banana.dev or Kyle if you have questions regarding production. It's free of charge!

Mar 12, 2022 • 48min
Building ML/Data Platform on Top of Kubernetes // Julien Bisconti // MLOps Coffee Sessions #86
MLOps Coffee Sessions #86 with Julien Bisconti, Building ML/Data Platform on Top of Kubernetes.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
When building a platform, a good start is to define the goals and features of that platform, knowing it will evolve. Kubernetes is established as the de facto standard for scalable platforms, but it is not a fully-fledged data platform. Do ML engineers have to learn and use Kubernetes directly? They probably shouldn't. So it is up to the data engineering team to provide the tools and abstractions necessary to allow ML engineers to do their work. The time, effort, and knowledge it takes to build a data platform is already quite an achievement. Once it is built, someone has to maintain it, monitor it, train people for on-call rotation, implement escalation policies and disaster recovery, optimize for usage and costs, secure it, and build a whole ecosystem of tools around it (front-end, CLI, dashboards). For some companies, that cost in time and effort may be too high to justify building their own ML platform instead of adopting a cloud offering. Note that cloud offerings still require some of that work, but most of it is already done.

// Bio
Julien is a software engineer turned Site Reliability Engineer. He is a Google Developer Expert, a certified Google Cloud Data Engineer and Kubernetes Administrator, and a mentor for the Women Developer Academy and Google for Startups programs. He is working on building and maintaining a data/ML platform.

// Related Links
https://portal.superwise.ai/
Crossing the River by Feeling the Stones • Simon Wardley • GOTO 2018: https://www.youtube.com/watch?v=2IW9L1uNMCs

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletter, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Julien on LinkedIn: https://www.linkedin.com/in/julienbisconti/

Timestamps:
[00:00] French intro by Julien
[00:32] Introduction to Julien Bisconti
[03:35] Arriving at the non-technical side of the MLOps process
[06:06] Envious of people with technological problems
[07:27] People problem bandwidth conversation
[11:04] Atomic decision making
[14:20] Advice to developers: either buy or build in their career potential
[18:23] Jobs board - https://mlops.pallet.xyz/jobs
[21:28] Chaos engineering
[26:33] Role of chaos engineering in building production machine learning systems
[32:59] Core challenge of MLOps
[37:04] Standardization on an industry level
[40:30] Reconciliation of trade-offs using Vertex and SageMaker
[45:21] Crossing the River by Feeling the Stones talk by Simon Wardley
[47:22] Wrap up

Mar 10, 2022 • 45min
Continuous Deployment of Critical ML Applications // Emmanuel Ameisen // MLOps Coffee Sessions #85
MLOps Coffee Sessions #85 with Emmanuel Ameisen, Continuous Deployment of Critical ML Applications.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Finding an ML model that solves a business problem can feel like winning the lottery, but it can also be a curse. Once a model is embedded at the core of an application and used by real users, the real work begins. That's when you need to make sure that it works for everyone, that it keeps working every day, and that it can improve as time goes on. Just like building a model is all about data work, keeping a model alive and healthy is all about developing operational excellence.
First, you need to monitor your model and its predictions and detect when it is not performing as expected for some types of users. Then, you'll have to devise ways to detect drift and measure how quickly your models go stale. Once you know how your model is doing and can detect when it isn't performing, you have to find ways to fix the specific issues you identify. Last but definitely not least, you will be faced with the task of deploying a new model to replace the old one without disrupting the day of all the users who depend on it.
A lot of the topics covered are active areas of work around the industry and haven't been formalized yet, but they are crucial to making sure your ML work actually delivers value. While there aren't any textbook answers, there is no shortage of lessons to learn.

// Bio
Emmanuel Ameisen has worked for years as a Data Scientist and ML Engineer. He is currently an ML Engineer at Stripe, where he works on helping improve model iteration velocity. Previously, he led Insight Data Science's AI program, where he oversaw more than a hundred machine learning projects. Before that, he implemented and deployed predictive analytics and machine learning solutions for Local Motion and Zipcar. Emmanuel holds graduate degrees in artificial intelligence, computer engineering, and management from three of France’s top schools.

// Related Links
https://www.amazon.com/Building-Machine-Learning-Powered-Applications/dp/149204511X
https://www.oreilly.com/library/view/building-machine-learning/9781492045106/

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletter, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Adam on LinkedIn: https://www.linkedin.com/in/aesroka/
Connect with Emmanuel on LinkedIn: https://www.linkedin.com/in/ameisen/

Timestamps:
[00:00] Introduction to Emmanuel Ameisen
[03:38] Building Machine Learning Powered Applications book inspiration
[05:19] The writing process
[07:04] Over-engineering NLP
[09:13] CV-driven development: intentional or natural
[11:09] Attribute to the machine learning team
[14:44] Shortening the iteration cycle
[16:41] Advice on how to tackle iteration
[20:00] Failure modes
[21:02] Infrastructure iteration at Stripe
[27:06] Deployment steps and testing challenges
[29:34] "You develop operational excellence by exercising it." - Emmanuel Ameisen
[33:22] Death of a thousand cuts: balancing the work vs. the productionization piece
[36:15] Reproducibility headaches
[40:04] Pipelines as software product
[41:25] Get the book by Emmanuel Ameisen!
[42:04] Takeaways and wrap up

Mar 2, 2022 • 46min
Lessons from Studying FAANG ML Systems // Ernest Chan // MLOps Coffee Sessions #84
MLOps Coffee Sessions #84 with Ernest Chan, Lessons from Studying FAANG ML Systems.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Large tech companies invest in ML platforms to accelerate their ML efforts. Become better prepared to solve your own MLOps problems by learning from their technology and design decisions.
Tune in to learn about ML platform components, capabilities, and design considerations.

// Bio
Ernest is a Data Scientist at Duo Security. As part of the core team that built Duo's first ML-powered product, Duo Trust Monitor, he faced many (frustrating) MLOps problems first-hand. That led him to advocate for an ML infrastructure team to make it easier to deliver ML products at Duo. Prior to Duo, Ernest worked at an EdTech company, building data science products for higher-ed. Ernest is passionate about MLOps and using ML for social good.

// Related Links
Lessons on ML Platforms — from Netflix, DoorDash, Spotify, and more: https://ernestklchan.medium.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7
Paper Highlights-Challenges in Deploying Machine Learning: a Survey of Case Studies: https://towardsdatascience.com/paper-highlights-challenges-in-deploying-machine-learning-a-survey-of-case-studies-cafe61cfd04c
Choose Boring Technology slides by Dan McKinley: https://www.slideshare.net/danmckinley/choose-boring-technology

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletter, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Ernest on LinkedIn: https://www.linkedin.com/in/ernest-chan-68245773/

Timestamps:
[00:00] Introduction to Ernest Chan
[01:07] Takeaways
[02:58] Ernest's Lessons on ML Platforms — from Netflix, DoorDash, Spotify, and more blog post
[05:55] Five components of an ML Platform
[10:09] Limitations highlighted in the blog post
[14:41] Level of maturity or completion observed in company efforts
[16:17] Platform/Architecture admired the most
[17:46] Advice to big tech companies
[22:03] Process of needing an infrastructure and aiming towards having a platform
[24:23] Paper Highlights-Challenges in Deploying Machine Learning: a Survey of Case Studies blog post
[26:24] Takeaways from Paper Highlights-Challenges in Deploying Machine Learning
[30:33] Prioritization
[33:04] Delta Lake
[35:27] Model rollouts and shadow mode
[39:23] Are you an ML Engineer or a Data Scientist?
[40:15] Simple route platform vs flexible platform trade-offs
[41:08] Opinionated and simple vs less opinionated and flexible
[43:22] Choose Boring Technology slides by Dan McKinley
[44:36] Wrap up

Feb 28, 2022 • 48min
Better Use Cases for Text Embeddings // Vincent Warmerdam // MLOps Coffee Sessions #83
MLOps Coffee Sessions #83 with Vincent Warmerdam, Better Use Cases for Text Embeddings.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Text embeddings are very popular, but there are plenty of reasons to be concerned about their applications: algorithmic fairness, compute requirements, and issues with the datasets they're typically trained on.
In this session, Vincent gives an overview of some of these properties while also talking about an underappreciated use case for embeddings: labeling!

// Bio
Vincent D. Warmerdam is a senior data professional who has worked as an engineer, researcher, team lead, and educator. He's especially interested in understanding algorithmic systems so that one can prevent failure. As such, he prefers simpler solutions that scale over the latest and greatest from the hype cycle. He currently works as a Research Advocate at Rasa, where he collaborates with the research team to explain and understand conversational systems better.
Outside of Rasa, Vincent is also well known for his open-source projects (scikit-lego, human-learn, doubtlab, and more), collaborations with open-source projects like spaCy, his blog over at koaning.io, and his calmcode educational project.

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletter, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Skylar on LinkedIn: https://www.linkedin.com/in/skylar-payne-766a1988/
Connect with Vincent on LinkedIn: https://www.linkedin.com/in/vincentwarmerdam/

Timestamps:
[00:00] Takeaways
[04:10] Favorite purchases this pandemic
[05:05] What drives Vincent to understand how ML can fail?
[08:33] How and why to make systems simpler?
[11:37] Techniques shared by Vincent in his talks
[15:51] ML as a UI problem
[17:02] Figuring out rules in your data
[20:01] Detecting bad labels
[23:53] Labeling isn't necessarily easy
[25:48] Fraud use case
[27:42] How does Vincent stay sane looking for fraud?
[29:12] How does Vincent produce so many packages?
[31:23] Vincent's favorite package
[33:24] Explosion AI
[36:14] Python all the way
[37:44] Shift from model-centric to data-centric AI
[39:35] Talking about the problem is necessary
[40:40] Vincent's war stories
[44:04] Adding constraints to the system
[47:49] Wrap up

Feb 23, 2022 • 50min
Feature Stores at Shopify and Skyscanner // Matt Delacour and Mike Moran // Reading Group #4
MLOps Reading Group meeting on February 11, 2022
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
Reading Group session about Feature Stores with Matt Delacour and Mike Moran

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Connect with us on LinkedIn: https://www.linkedin.com/company/mlopscommunity/
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring, and Blogs: https://mlops.community/

Timestamps:
[00:05] Matt's intro
[00:26] Mike's intro
[01:09] Matt’s talk: Feature store system at Shopify
[01:45] What is Shopify?
[02:05] Shopify use case
[02:38] Choosing a solution
[03:19] Managed service vs. in-house vs. open-source (Feast)
[06:01] Why did we choose Feast?
[11:25] Implementation strategy (multi-repo vs. mono-repo approaches)
[13:01] Mono-repo approach breakdown
[14:30] Internal SDK
[17:01] Q&A: Does Feast scale to meet Shopify's latency requirements for online inference?
[19:05] Q&A: Do you rely on Feast to serialize data to the online store?
[20:13] Q&A: Is your mono-repo library a subset of Feast?
[21:18] Q&A: Did you consider using git submodules for a multi-repo?
[23:02] Q&A: Are you storing embeddings with Feast?
[24:30] Q&A: Regarding the mono-repo, which modules are responsible for feature engineering? How do you guarantee that different feature engineering can be used across many DS?
[27:58] Mike’s talk (Feature store at Skyscanner)
[28:08] Kaleidoscope system
[28:25] Background and context of the feature store
[29:30] Initial state of the feature store
[30:13] How the marketing team also leverages the feature store
[31:04] Current state of the feature store (marketing & machine learning)
[31:44] SDK approach of creating schemas with dataframes (easy access)
[32:16] Reusability across the marketing and DS teams
[33:06] GDPR constraints
[33:34] Data updates at the feature store
[36:09] Q&A: When a DS updates a feature, how are you communicating that across teams?
[38:25] Q&A: Are you applying different levels of feature engineering to increase the likelihood of a DS going back to a previous checkpoint of processing?
[40:55] Q&A: In what languages are you implementing the feature store?
[44:28] Q&A: Performance-wise, how do you decide what code remains in Apache Spark vs. SQL?
[49:00] Wrap-up


