

MLOps.community
Demetrios
Relaxed Conversations around getting AI into production, whatever shape that may come in (agentic, traditional ML, LLMs, Vibes, etc)
Episodes

Nov 23, 2020 • 59min
The Current MLOps Landscape // Nathan Benaich & Timothy Chen // MLOps Meetup #43
MLOps community meetup #43! Last Wednesday, we talked to Nathan Benaich, General Partner at Air Street Capital, and Timothy Chen, Managing Partner at Essence VC, about The MLOps Landscape.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract:
In this session, we explored the MLOps landscape through the eyes of two accomplished investors. Tim and Nathan shared with us their experience in looking at hundreds of ML and MLOps companies each year to highlight major insights they have gained. What does the ML infrastructure and tooling landscape look like at the moment? Where have they been seeing patterns emerge? What do they expect to see happen within the market in the next couple of years? What current tools out there are the most interesting to them? And last but not least, how do they go about selecting which companies to invest in?
// Bio:
Nathan Benaich is the Founder and General Partner of Air Street Capital, a venture capital firm investing in early-stage AI-first technology and life science companies. The team’s investments include Mapillary (Acq. Facebook), Graphcore, Thought Machine, Tractable, and LabGenius. Nathan is Managing Trustee of The RAAIS Foundation, a non-profit with a mission to advance education and open-source research in the common good of AI. This includes running the annual RAAIS summit and funding fellowships at OpenMined. Nathan is also co-author of the annual State of AI Report. He holds a PhD in cancer biology from the University of Cambridge and a BA from Williams College.
Timothy Chen is the Managing Partner at Essence VC, with a decade of experience leading engineering in enterprise infra and open source communities/companies. Prior to Essence, Tim was the SVP of Engineering at Cosmos, a popular open-source blockchain SDK. Prior to Cosmos, Tim cofounded Hyperpilot with Stanford Professor Christos Kozyrakis, which later exited to Cloudera. Prior to Hyperpilot, Tim was an early employee at Mesosphere and CloudFoundry. Tim is also active in the open-source space as an Apache member.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Nathan on LinkedIn: https://www.linkedin.com/in/nathanbenaich/
Connect with Tim on LinkedIn: https://www.linkedin.com/in/timchen
Timestamps:
0:00 - Nathan Benaich & Timothy Chen
1:36 - Tim's background
4:07 - Nathan's background
8:08 - To Nathan: What's your take on the lay of the land in the MLOps sphere or space?
10:20 - To Tim: Can you give us your rundown on what you've been seeing? The greater landscape that you look at.
14:35 - To Tim: What companies right now really excite you? What are some that are doing something that has a future?
19:36 - To Nathan: What kind of companies are you looking at right now that are doing interesting things?
22:37 - The MLOps tools mature as the companies mature.
23:45 - No tool looks exactly the same from an MLOps perspective
25:44 - Sometimes MLOps tools are not a choice by data scientists at all.
28:10 - What MLOps needs are not being addressed by the market right now?
35:00 - What is the annotation stack?
37:28 - How do you think about it in the context of federated learning?
41:24 - Will MLOps tools eventually become idiomatic? Would that be desirable?
47:55 - How do you switch from this open-source model to the money-making model?
52:30 - Should we focus only on open source at first and think about monetization later? If so, are investors prepared to invest in no-revenue companies?

Nov 19, 2020 • 52min
When Machine Learning meets privacy - Episode 3 with Charles Radclyffe
**AI and ethical dilemmas**
Artificial Intelligence is seen by many as a vehicle for great transformation, but for others it remains a mystery, and many questions remain unanswered: Will AI systems rule us one day? Can we trust AI to rule our criminal justice systems? Maybe create political campaigns and dominate political advertisements? Or maybe something less harmful, like doing our laundry? Some of these questions may sound absurd, but they are certainly pushing people to move beyond thinking purely about functional AI capabilities and to look further, at the ethics behind creating such powerful solutions.
For this episode, our guest is Charles Radclyffe, the data philosopher, who covers some of these dilemmas. You can reach out to Charles through LinkedIn or at ethicsgrade.io
Useful links:
- MLOps.Community slack
- TEDx talk - Surviving the Robot Revolution
- Digital Ethics whitepaper

Nov 16, 2020 • 59min
UN Global Platform // Mark Craddock // Co-Founder & CTO, Global Certification and Training Ltd // MLOps Meetup #42
MLOps community meetup #42! Last Wednesday, we talked to Mark Craddock, Co-Founder & CTO, Global Certification and Training Ltd (GCATI), about the UN Global Platform.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract:
Building a global big data platform for the UN. Streaming 600,000,000+ records/day into the platform. The strategy was developed using Wardley Maps and the Platform Design Toolkit.
// Bio:
Mark contributed to the Cloud First policy for the UK Public sector and was one of the founding architects for the UK Government's G-Cloud program. Mark developed the initial CloudStore, which enabled the UK Public Sector to procure cloud services from over 2,500 suppliers. The UK Public Sector has now purchased over £6.3Bn of cloud services, with £3.6Bn from Small to Medium Enterprises in the UK.
Mark led the development of the United Nations Global Platform, a multi-cloud platform for capacity building within the national statistics offices in the use of Big Data and its integration with administrative sources, geospatial information, traditional survey, and census data.
Mark is now building a non-profit training and certification organization.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Mark on LinkedIn: https://www.linkedin.com/in/markcraddock/
Timestamps:
[0:00] - Intro to Mark Craddock
[03:35] - Mark's background
[05:05] - UN Global Platform
[05:18] - Vision: A global collaboration to harness the power of data for better lives
[05:37] - UN GWG (Big) Data Membership
[05:49] - Sustainable Development Goals
[06:21] - Using the platform
[06:30] - Approach
[06:44] - Principles
[07:29] - How big was the team that put this together?
[08:09] - Leave no one behind. Endeavor to reach the furthest behind first.
[08:24] - Platform Business Model
[10:06] - Six distinct aspects of a platform and its ecosystem
[10:46] - The platform is the only business model able to orchestrate the wide range of products and services in an ecosystem
[11:09] - Through the means of a platform organization, ecosystems are capable of providing an improbable combination of attributes
[11:55] - Platforms and business models are also one of the best organizational structures for enabling rapid evolution
[13:22] - Technology Strategy
[13:23] - Wardley Maps
[14:50] - Is this where Machine Learning tools would fit in?
[20:35] - Are you looking at how fast these are moving across to the right? How can you gauge that?
[26:57] - Is the value fluid?
[28:43] - How did you factor in the different personas?
[30:34] - How do you enable loosely coupled teams?
[35:44] - Data also moves from left to right
[42:00] - Technology Strategy Handbook
[42:20] - Achievements - July '19
[42:31] - Global Billing Intelligence
[43:15] - Privacy-Preserving Techniques Handbook
[43:26] - Cryptographic Techniques
[44:12] - Global Big Datasets
[44:55] - Big Data
[47:41] - Automatic Identification System (AIS)
[48:14] - Automatic Dependent Surveillance (ADS-B)
[48:41] - Satellite Imagery
[49:11] - Services in the platform
[49:16] - Location Analytics Service
[50:06] - Stack Sample
[50:37] - Data Sources
[51:50] - NiFi Dataflow
[52:20] - Is this how you enabled reproducibility?
[53:47] - Location Analytics Service
[55:31] - Shanghai - Flights
[55:45] - Shanghai - Cargo Ships
[56:00] - UN Global Platform

Nov 12, 2020 • 36min
When Machine Learning meets Data Privacy - Episode 2 with Cat Coode
What are regulations saying about data privacy?
We are already aware of the importance of using Machine Learning to improve businesses. Nevertheless, Machine Learning must be fed with data, and in many cases this data might even be considered sensitive information. So, does this mean that with new privacy regulations, access to data will become more and more difficult? Are the days of ML and Data Science numbered? Or will the machines beat privacy?
To answer all these questions I’ve invited Cat Coode, an expert on Data Privacy regulations, to join me in this episode, and help us sort out these questions!
Don’t forget to subscribe to the Mlops.community slack and if you’re looking for privacy-preserving solutions, show us some love and give a star to the Synthetic data open-source repo (https://github.com/ydataai/ydata-synthetic)
Useful links:
For more on Cat's work, you can have a look at catcoode.com or connect through LinkedIn.
Original Privacy by design definition: https://www.ipc.on.ca/wp-content/uploads/resources/7foundationalprinciples.pdf

Nov 10, 2020 • 1h 1min
When You Say Data Scientist Do You Mean Data Engineer? Lessons Learned From Start Up Life // Elizabeth Chabot
In this episode, we talked to Elizabeth Chabot, Consultant at Deloitte, about When You Say Data Scientist, Do You Mean Data Engineer? Lessons Learned From StartUp Life.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Key takeaways:
- If you have a data product that you want to function in production, you need MLOps
- Education needs to happen about the data product life cycle, noting that ML is just part of the equation
- Titles need to be defined to help outside users understand the differences in roles
// Abstract:
ML and AI may sound sexy to investors, but if you work in the field, you've probably spent late nights reviewing outputs manually, pored over logs, and run root cause analyses until your eyes hurt. If you've created data products at a company where analytics and data science held no meaning before your arrival, you've probably spent many a late night explaining the basics of data collection, why ETL cannot be half-baked, and that when you create a supervised model, it needs to be supervised. Companies hoping to create a data product can have a data scientist show them how ML/AI can further their product, help them scale, or create better recommendations than their competitors. What companies are not always aware of is that once the algorithm is created, the data scientist is usually handicapped until more data hires are made to build the necessary pipelines and frontend to put the algorithm in production. With the number of unique data titles growing each year, how should the first data-evangelist-wrangler-wizard navigate title assignment?
// Bio:
Elizabeth is a researcher turned data nerd. With a background in social and clinical sciences, Elizabeth is focused on developing data solutions that create added value while allowing the user to make more intelligent decisions.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/

Nov 10, 2020 • 1h
Metaflow: Supercharging Our Data Scientist Productivity // Ravi Kiran Chirravuri // MLOps Meetup #41
MLOps community meetup #41! Last Wednesday's episode was so exciting that some attendees couldn't help but ask about the next season of their favorite series! The conversation was around Metaflow: Supercharging Data Scientist Productivity with none other than Netflix’s very own Ravi Kiran Chirravuri.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract:
Netflix's unique culture affords its data scientists an extraordinary amount of freedom. They are expected to build, deploy, and operate large machine learning workflows autonomously without the need to be significantly experienced with systems or data engineering. Metaflow, our ML framework (now open-source at metaflow.org), provides them with delightful abstractions to manage their project's lifecycle end-to-end, leveraging the strengths of the cloud: elastic compute and high-throughput storage. In this talk, we preface with our experience working alongside data scientists, present our human-centric design principles when building Machine Learning Infrastructure, and showcase how you can adopt these yourself with ease with open-source Metaflow.
// Bio:
Ravi is an individual contributor to the Machine Learning Infrastructure (MLI) team at Netflix. With almost a decade of industry experience, he has been building large-scale systems focusing on performance, simplified user journeys, and intuitive APIs in MLI and previously Search Indexing and TensorFlow at Google.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Ravi on LinkedIn: https://www.linkedin.com/in/seeravikiran/
Timestamps:
[00:00] - Introduction to Ravi Kiran Chirravuri
[02:21] - Ravi's background
[05:19] - Metaflow: Supercharging Data Scientist Productivity
[05:31] - Why do we have to build Metaflow?
[06:14] - Infographic of a very simplified view of a machine learning workflow
[07:01] - "An idea is typically meaningless without execution."
[07:38] - Scheduling
[08:14] - Life is great!
[08:24] - Life happens, and things are crashing and burning!
[09:04] - What is Metaflow?
[12:01] - How much data scientists care
[12:25] - How much infrastructure is needed
[13:03] - What Metaflow does
[13:44] - How can you go about using Metaflow for your data science needs?
[14:20] - People love DAGs
[16:00] - Baseline
[16:16] - Architecture
[17:28] - Syntax
[19:00] - Vertical Scalability
[21:10] - Horizontal Scalability
[22:59] - Failures are a feature
[23:57] - State Transfer and Persistence
[27:05] - Dependencies
[30:57] - Model Ops: Versioning
[33:19] - Monitoring in Notebooks
[35:16] - Decouple Orchestration
[36:48] - AWS Step Functions
[37:16] - Export to AWS Step Functions
[38:10] - From Prototype to Production and Back
[42:07] - What are the prerequisites to use Metaflow?
[43:32] - Where does Metaflow store everything?
[45:10] - Are there any tutorials available?
[45:22] - Have the tutorials been updated?
[47:27] - How do you deploy Metaflow?
[49:02] - Do you see Metaflow becoming a tool to develop and support auto ML?
[50:34] - What were some of the biggest learnings that you saw people doing that they're not doing at Netflix?
[52:19] - Does Metaflow exist to help data scientists orchestrate everything?
[54:30] - What is your version?

Nov 9, 2020 • 47min
Luigi in Production // MLOps Coffee Sessions #18 // Luigi Patruno ML in Production
Coffee Sessions #18 with Luigi Patruno of ML in Production, a Centralized Repository of Best Practices
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
Summary
- Luigi Patruno and ML in production
- MLOps workflow: Knowledge sharing and best practices
- Objective: learn!
Links:
- ML in production: https://mlinproduction.com/
- Why I started MLinProduction: https://mlinproduction.com/why-i-started-mlinproduction/
Luigi Patruno: a man whose goal is to help data scientists, ML engineers, and AI product managers build and operate machine learning systems in production.
Luigi shares with us why he started ML in Production: there is a lot of relevant content out there, but also a lot of clickbait with low standards of quality.
He had an entrepreneurial itch, and the solution was to start a weekly newsletter. From there, he started creating blog posts and has now teamed up with Sam Charrington of TWIML to create courses on SageMaker ML.
Applied ML
- Best practices
- Reading Google and Microsoft papers
- Analyzing the tools that are out there, e.g., SageMaker, and how to see the world?
- Aimed at making you more effective and efficient at your job
Community questions
Taking some time to answer some community questions!
Who do you learn from? Favorite resources?
- Self-taught, papers, talks
- Construct the systems
- Uber Michelangelo
----------------- 📝 Rough notes 📝 ----------------
Any companies that stand out to you in terms of MLOps excellence?
- Google, Amazon, Stitch Fix: they've had to solve hard problems
- Serving ads
- Personalization at scale
- Vertical problems: within their verticals
- Motivated by real challenges
- Dropbox
- Great articles
- A great machine learning company
Tools
- SageMaker
- Has a course on SageMaker
- Nice lessons baked into the system
Dos and don’ts of MLOps
- DO LOG!
- Monitor
- Automate - manual analysis leads to problems
- Do it manually first until you feel confident that you can automate it
- Tag, version
- Store your training, val, and test sets!
What is his process of identifying use cases that are suitable for machine learning as a solution? How do they proceed methodically?
- Start with the business goal
- The potential number of users that the solution can benefit
- The ability to build a predictive model
- Performance x impact = score
- Rank problems by this
- How developed are the datasets?
What part of the ML in Production process do people underestimate the most? What are the low-hanging fruits that many people don’t take advantage of?
- Generate actual value without needing to build the most complex model possible
- In the industry, performance is only one part of the equation
How has he seen ML in production evolve over the last few years, and where does he think it's headed next?
- More and more tools!
- Industry-specific tools taking advantage of ML
- The problem is that you must have industry knowledge
--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
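The use-case ranking rule from the rough notes above (performance x impact = score, then rank problems by this) could be sketched roughly as follows. This is only an illustration, not Luigi's actual process: the `UseCase` class, the 0-to-1 performance scale, the user counts, and the example use cases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    performance: float  # assumed: expected model quality on a 0..1 scale
    impact: float       # assumed: potential number of users who benefit

    @property
    def score(self) -> float:
        # "Performance x impact = score" from the notes
        return self.performance * self.impact


def rank_use_cases(cases):
    """Return use cases ordered from highest to lowest score."""
    return sorted(cases, key=lambda c: c.score, reverse=True)


# Hypothetical candidates, for illustration only
candidates = [
    UseCase("churn prediction", performance=0.8, impact=1000),
    UseCase("search ranking", performance=0.6, impact=5000),
    UseCase("fraud alerts", performance=0.9, impact=500),
]

ranked = rank_use_cases(candidates)
for case in ranked:
    print(f"{case.name}: {case.score:.0f}")
```

Note how a use case with a weaker expected model can still rank first when it reaches many more users, which matches the point that performance is only one part of the equation.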

Nov 5, 2020 • 19min
When Machine Learning meets Data Privacy
This is the first episode of a podcast series on Machine Learning and Data privacy. Machine Learning is the key to the new revolution in many industries. Nevertheless, ML does not exist without data, and a lot of it, which in many cases results in the use of sensitive information. With new privacy regulations, access to data is harder today, but does that mean that the days of ML and Data Science are numbered? Will the machines beat privacy?
Don’t forget to subscribe to the mlops.community slack (https://go.mlops.community/slack) and to give a star to the Synthetic data open-source repo (https://github.com/ydataai/ydata-synt...)
Useful links:
Medium post with the podcast transcription - https://medium.com/@fabiana_clemente/...
In case you’re curious about GDPR fines - enforcementtracker.com
The Netflix Prize - https://www.nytimes.com/2010/03/13/technology/13netflix.html
Tensorflow privacy - https://github.com/tensorflow/privacy

Nov 3, 2020 • 1h 1min
Analyzing the Google Paper on Continuous Delivery in ML // Part 4 // MLOps Coffee Sessions #17
MLOps level 2: CI/CD pipeline automation
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
For a rapid and reliable update of the pipelines in production, you need a robust automated CI/CD system. This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters. They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.
Figure 4. CI/CD and automated ML pipeline.
This MLOps setup includes the following components:
- Source control
- Test and build services
- Deployment services
- Model registry
- Feature store
- ML metadata store
- ML pipeline orchestrator
Characteristics of the stages discussion.
Figure 5. Stages of the CI/CD automated ML pipeline.
The pipeline consists of the following stages:
- Development and experimentation: You iteratively try out new ML algorithms and new modeling, where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps that are then pushed to a source repository.
- Pipeline continuous integration: You build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.
- Pipeline continuous delivery: You deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.
- Automated triggering: The pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry.
- Model continuous delivery: You serve the trained model as a prediction service for the predictions. The output of this stage is a deployed model prediction service.
- Monitoring: You collect statistics on the model performance based on live data. The output of this stage is a trigger to execute the pipeline or to execute a new experiment cycle.
The data analysis step is still a manual process for data scientists before the pipeline starts a new iteration of the experiment. The model analysis step is also a manual process.
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
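To make the stage list above a bit more concrete, here is a minimal sketch of the six stages and of the monitoring stage's output: a trigger to execute the pipeline or to start a new experiment cycle. The stage names and outputs come from the notes; the `monitoring_trigger` function, the accuracy metric, and both thresholds are hypothetical choices for illustration and are not taken from the Google paper.

```python
# Stage names and outputs follow the notes above; the list structure is illustrative.
PIPELINE_STAGES = [
    ("Development and experimentation", "source code of the ML pipeline steps"),
    ("Pipeline continuous integration", "pipeline components (packages, executables, artifacts)"),
    ("Pipeline continuous delivery", "deployed pipeline with the new model implementation"),
    ("Automated triggering", "trained model pushed to the model registry"),
    ("Model continuous delivery", "deployed model prediction service"),
    ("Monitoring", "trigger to execute the pipeline or a new experiment cycle"),
]


def monitoring_trigger(live_accuracy, retrain_below=0.90, redesign_below=0.75):
    """Decide which trigger, if any, the monitoring stage emits.

    Both thresholds are assumed values for illustration only.
    """
    if live_accuracy < redesign_below:
        # Large degradation: retraining alone is assumed not to be enough
        return "new experiment cycle"
    if live_accuracy < retrain_below:
        # Moderate degradation: re-run the automated training pipeline
        return "execute pipeline"
    return "no action"


for name, output in PIPELINE_STAGES:
    print(f"{name} -> {output}")

for accuracy in (0.95, 0.85, 0.60):
    print(accuracy, monitoring_trigger(accuracy))
```

In a real setup the trigger would come from statistics collected on live data rather than a single number, and the data and model analysis steps would still involve a human, as the notes point out.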

Oct 30, 2020 • 58min
Hands-on serving models using KFServing // Theofilos Papapanagiotou // Data Science Architect at Prosus // MLOps Meetup #40
MLOps community meetup #40! Last Wednesday, we talked to Theofilos Papapanagiotou, Data Science Architect at Prosus, about Hands-on Serving Models Using KFServing.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
// Abstract:
We looked at some popular model formats like the SavedModel of TensorFlow, the Model Archiver of PyTorch, pickle, and ONNX, to understand how the weights of the NN are saved there, the graph, and the signature concepts.
We discussed the relevant resources of the deployment stack of Istio (the Ingress gateway, the sidecar, and the virtual service) and Knative (the service and revisions), as well as Kubeflow and KFServing. Then we got into the design details of KFServing, its custom resources, the controller and webhooks, the logging, and configuration.
We spent a large part of the session on the monitoring stack, the metrics of the servable (memory footprint, latency, number of requests), as well as the model metrics like the graph, init/restore latencies, the optimizations, and the runtime metrics, which end up in Prometheus. We looked at the inference payload and prediction logging to observe drifts and trigger the retraining of the pipeline.
Finally, a few words about the awesome community and the roadmap of the project on multi-model serving and inference routing graph.
// Bio:
Theo is a recovering Unix Engineer with 20 years of work experience in Telcos, on internet services, video delivery, and cybersecurity. He is also a university student for life; BSc in CS 1999, MSc in Data Coms 2008, and MSc in AI 2017.
Nowadays, he calls himself an ML Engineer, as he expresses his passion for System Engineering and Machine Learning.
His analytical thinking is driven by curiosity and a hacker spirit. He has skills that span a variety of different areas: Statistics, Programming, Databases, Distributed Systems, and Visualization.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Theofilos on LinkedIn: https://linkedin.com/in/theofpa
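The abstract above mentions using logged inference payloads to observe drift and trigger retraining. As a rough idea of what such a check might look like, here is a minimal sketch that uses a simple mean-shift z-score on one logged feature. This is a stand-in technique chosen for brevity, not the method shown in the talk and not a KFServing API; the function name, threshold, and sample values are all hypothetical.

```python
from statistics import mean, stdev


def mean_shift_drift(reference, live, z_threshold=3.0):
    """Flag drift when the live mean is far from the reference mean.

    Computes a z-score of the live mean under the reference distribution,
    using standard error = reference stdev / sqrt(number of live samples).
    The threshold of 3.0 is an assumed value for illustration.
    """
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # Constant reference feature: any change in the mean counts as drift
        return mean(live) != ref_mean
    standard_error = ref_std / (len(live) ** 0.5)
    z = abs(mean(live) - ref_mean) / standard_error
    return z > z_threshold


# Hypothetical values of one feature: training data vs. logged inference payloads
training_values = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98]
logged_ok = [1.0, 0.97, 1.03, 1.01]
logged_shifted = [1.6, 1.7, 1.65, 1.8]

print(mean_shift_drift(training_values, logged_ok))
print(mean_shift_drift(training_values, logged_shifted))
```

A positive result from a check like this is the kind of signal that could be used to kick off retraining; production setups typically compare full distributions across many features rather than a single mean.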


