The InfoQ Podcast
InfoQ
Software engineers, architects and team leads have found inspiration to drive change and innovation in their teams by listening to the weekly InfoQ Podcast. They have received essential information that has helped them validate their software development roadmap. We achieve this by interviewing some of the top CTOs, engineers and technology directors from companies like Uber, Netflix and more. Over 1,200,000 downloads in the last 3 years.
Episodes
Mentioned books

Nov 8, 2019 • 28min
Victor Dibia on TensorFlow.js and Building Machine Learning Models with JavaScript
Victor Dibia is a Research Engineer with Cloudera’s Fast Forward Labs. On today’s podcast, Wes and Victor talk about the realities of building machine learning models in the browser. The two discuss the capabilities, limitations, process, and realities of using TensorFlow.js, and wrap up by discussing techniques like model distillation that may enable machine learning models to be deployed in smaller footprints, such as serverless.
- While there are limitations in running machine learning processes in a resource constrained environment like the browser, there are tools like TensorFlow.js that make it worthwhile. One powerful use case is the ability to protect the privacy of a user base while still making recommendations.
- TensorFlow.js takes advantage of the WebGL library for its more computationally intense operations.
- TensorFlow.js enables three workflows: training and scoring models (doing inference) purely in the browser; importing a model built offline with more traditional Python tools; and a hybrid approach that trains offline and fine-tunes online.
- To build a model offline, you can use TensorFlow with Python (perhaps on a GPU cluster). The model can be exported into the TensorFlow SavedModel format (or the Keras model format) and then converted with the TensorFlow.js converter into the TensorFlow.js web model format. At that point, the model can be imported directly into your JavaScript.
- TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models and was made available by the Google AI team. It can give developers a quick jumpstart into using trained models.
- Model compression promises to make models small enough to run in places we couldn’t run them before. Model distillation is a process where a smaller model is trained to replicate the behavior of a larger one. In one case, BERT (a model almost 500MB in size) was distilled to about 7MB (almost 60x compression).
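The distillation idea above can be sketched in a few lines: a student model is trained against the teacher's *softened* output distribution, produced by a temperature-scaled softmax. This is a library-free, illustrative sketch (the logits and temperature are made up), not the actual BERT distillation pipeline:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures yield softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between the teacher's softened outputs and the student's.

    The soft targets carry more information than a one-hot label (e.g. which
    wrong classes the teacher considers plausible), which is what lets a much
    smaller student approximate the larger model's behaviour."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))

teacher = [8.0, 2.0, 1.0]   # confident large model (made-up logits)
student = [4.0, 1.5, 1.0]   # smaller model being trained
loss = distillation_loss(teacher, student)
```

In a real training loop, this loss (often combined with the ordinary hard-label loss) is what the student minimises.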
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/32rWnab
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
Subscribe: www.youtube.com/infoq
Like InfoQ on Facebook: bit.ly/2jmlyG8
Follow on Twitter: twitter.com/InfoQ
Follow on LinkedIn: www.linkedin.com/company/infoq
Check the landing page on InfoQ: https://bit.ly/32rWnab

Nov 1, 2019 • 34min
Michelle Krejci on Moving to Microservices: Visualising Technical Debt, Kubernetes, and GraphQL
In this podcast, Daniel Bryant spoke to Michelle Krejci, service engineer lead at Pantheon, about the Drupal and WordPress webops company’s move to a microservices architecture. Michelle is a well-known conference speaker on technical leadership and continuous integration, and she shared her lessons learned over the past four years of the migration.
Why listen to this podcast:
- The backend for the Pantheon webops platform began as a Python-based monolith with a Cassandra data store. This architecture choice initially enabled rapid feature development as the company searched for product/market fit. However, as the company found success and began scaling their engineering teams, the ability to add new functionality rapidly to the monolith became challenging.
- Conceptual debt and technical debt greatly impact the ability to add new features to an application. Moving to microservices does not eliminate either of these forms of debt, but use of this architectural pattern can make it easier to identify and manage the debt, for example by creating well-defined APIs and boundaries between modules.
- Technical debt -- and the associated engineering toil -- is real debt, with a dollar value, and should be tracked and made visible to everyone.
- Establishing “quick wins” during the early stages of the migration towards microservices was essential. Building new business-focused services using asynchronous “fire and forget” event-driven integrations with the monolith helped greatly with this goal.
- Using containers and Kubernetes provided the foundations for rapidly deploying, releasing, and rolling back new versions of a service. Running multiple Kubernetes namespaces also allowed engineers to clone the production namespace and environment (without data) and perform development and testing within an individually owned sandboxed namespace.
- Using the Apollo GraphQL platform allowed schema-first development. Frontend and backend teams collaborated on creating a GraphQL schema, and then individually built their respective services using this as a contract. Using GraphQL also allowed easy mocking during development. Creating backward compatible schema allowed the deployment and release of functionality to be decoupled.
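The “fire and forget” integration style mentioned above can be sketched roughly as follows: the monolith emits an event to a broker and returns immediately, and a new microservice consumes the event at its own pace. All names, event shapes, and the in-process queue standing in for a broker are purely illustrative, not Pantheon's actual stack:

```python
import queue

# Stand-in for a message broker (e.g. a topic on a real event bus).
event_bus = queue.Queue()

def monolith_create_site(name):
    """The monolith does its core work, emits an event, and returns immediately.

    It does not wait for, or even know about, downstream consumers --
    that is the 'fire and forget' part."""
    site = {"name": name, "status": "created"}
    event_bus.put({"type": "site.created", "payload": site})
    return site

def billing_service_drain():
    """A new, separately deployed microservice consumes events asynchronously."""
    processed = []
    while not event_bus.empty():
        event = event_bus.get()
        if event["type"] == "site.created":
            processed.append(f"invoice for {event['payload']['name']}")
    return processed

site = monolith_create_site("example.com")
invoices = billing_service_drain()
```

Because the monolith never blocks on the consumer, new services can be added (or fail, or lag) without slowing down the existing code path.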

Oct 4, 2019 • 29min
Ryan Kitchens on Learning from Incidents at Netflix, the Role of SRE, and Sociotechnical Systems
In today’s podcast we sit down with Ryan Kitchens, a senior site reliability engineer and member of the CORE team at Netflix. This team is responsible for the entire lifecycle of incident management at Netflix, from incident response to memorialising an issue.
Why listen to this podcast:
- Top-level metrics can be used as a proxy for user experience, and can be used to determine which issues should be alerted on and investigated. For example, at Netflix, if the customer playback initiation “streams per second” metric declines rapidly, this may be an indication that something has broken.
- Focusing on how things go right can provide valuable insight into the resilience within your system, e.g. what are people doing every day that helps us overcome incidents? Finding sources of resilience is somewhat “the story of the incident you didn’t have”.
- When conducting an incident postmortem, simply reconstructing the incident is often not sufficient to determine what needs to be fixed; there is no single root cause in complex socio-technical systems such as those found at Netflix and most modern web-based organisations. Instead, teams must dig a little deeper and look for what went well, what contributed to the problem, and where the recurring patterns are.
- Resilience engineering is a multidisciplinary field that was established in the early 2000s, and the associated community that has emerged is both academic and deeply practical. Although much resilience engineering focuses on domains such as aviation, surgery and military agencies, there is much overlap with the domain of software engineering.
- Make sure that support staff within an organisation have a feedback loop into the product team, as these people providing support often know where all of the hidden problems are, the nuances of the systems, and the workarounds.
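A proxy-metric decline check like the “streams per second” example might look something like the sketch below. The window, threshold, and numbers are invented for illustration and are not Netflix's actual alerting logic:

```python
def sps_alert(history, current, drop_threshold=0.3):
    """Alert if the current streams-per-second value falls more than
    `drop_threshold` (30% here, an illustrative value) below the recent
    baseline average."""
    if not history:
        return False  # no baseline yet, nothing to compare against
    baseline = sum(history) / len(history)
    return current < baseline * (1 - drop_threshold)

# Recent per-interval streams-per-second readings (made-up numbers).
recent = [1000, 1020, 990, 1010]

alert = sps_alert(recent, 600)  # a rapid decline: investigate
```

The appeal of alerting on a metric like this is that it tracks what users actually experience, rather than any single service's internal health.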
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2LLwk8T

Sep 20, 2019 • 25min
Oliver Gould on the Three Pillars of Service Mesh, SMI, and Making Technology Bets
In this podcast we sit down with Oliver Gould, co-founder and CTO of Buoyant. Oliver has a strong background in networking, architecture and observability, and worked on solving associated technical challenges at both Yahoo! and Twitter. Oliver is a regular presenter at cloud and infrastructure conferences, and you can often find him and his co-founder William Morgan in the hallway track, waxing lyrical about service mesh -- a term they practically coined -- and trying to bring others along on the journey.
Service mesh technology is still young, and the ecosystem is still very much a work in progress, but there have been several interesting recent developments in this space. One of these was the announcement of the Service Mesh Interface (SMI) at the recent KubeCon EU in Barcelona. The SMI spec seeks to help service mesh integrators and implementers by providing an abstraction that removes the need to bet on any single service mesh implementation. This can be good for both tool makers and enterprise early adopters. Organisations like Microsoft and HashiCorp are working alongside the community, including Buoyant, to help define the SMI.
In this podcast we summarise the evolution of the service mesh concept, with a focus on the three pillars: visibility, security, and reliability. We explore the new traffic “tap” feature within Linkerd that allows near real-time in-situ querying of metrics, and discuss how to implement network security by leveraging primitives like the Service Account provided by Kubernetes. We also discuss how reliability features, such as retries, timeouts, and circuit breakers, are becoming table stakes for infrastructure platforms.
We also cover the evolution of the service mesh interface, explore how service meshes may impact development and platforms in the future, and briefly discuss some of the benefits offered by the Rust language in relation to building a data plane for Linkerd. We conclude the podcast with a discussion of the importance of community building.
Why listen to this podcast:
- A well-implemented service mesh can make a distributed software system more observable. Linkerd 2.0 supports both the emitting of mesh telemetry for offline analysis, and also the ability to “tap” communications and make queries dynamically against the data. The Linkerd UI currently makes use of the tap functionality.
- Linkerd aims to make the implementation of secure service-to-service communication easy, and it does this by leveraging existing Kubernetes primitives. For example, Service Accounts are used to bootstrap the notion of identity, which in turn is used as a basis for Linkerd’s mTLS implementation.
- Offering reliability is “table stakes” for any service mesh. A service mesh should make it easy for platform owners to offer fundamental service-to-service communication reliability to application owners.
- The future of software development platforms may move (back) to more PaaS-like offerings. Kubernetes-based function as a service (FaaS) frameworks like OpenFaaS and Knative are providing interesting features in this space. A service mesh may provide some of the glue for this type of platform.
- Working on the service mesh interface (SMI) specification allowed the Buoyant team to sit down with other community members like HashiCorp and Microsoft, and share ideas and identify commonality between existing service mesh implementations.
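One of the “table stakes” reliability features mentioned above, the circuit breaker, can be sketched minimally as follows. A service mesh implements this in the proxy, transparently to the application; this toy only illustrates the core state machine, and omits the time-based “half-open” recovery a real implementation would have:

```python
class CircuitBreaker:
    """Minimal circuit-breaker sketch: after `max_failures` consecutive
    failures the circuit opens and further calls fail fast, protecting
    callers from piling requests onto an unhealthy upstream service."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def call(self, fn):
        if self.is_open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1  # count consecutive failures
            raise
        self.failures = 0  # any success closes the circuit again
        return result

breaker = CircuitBreaker(max_failures=2)

def flaky():
    raise IOError("upstream unavailable")

# Two consecutive failures open the circuit.
for _ in range(2):
    try:
        breaker.call(flaky)
    except IOError:
        pass
```

Pushing this logic into the mesh's proxies means application owners get retries, timeouts, and breaking behaviour without writing any of it themselves.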
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2m5DSJ6

Sep 13, 2019 • 25min
Event Sourcing: Bernd Rücker on Architecting for Scale
Bernd Rücker, co-founder and chief technologist at Camunda, shares his expertise in workflow automation and event-driven architectures. He delves into the advantages of event sourcing for scalability and contrasts it with traditional RDBMS limitations. The discussion highlights the confusion between commands and events, the importance of sagas in maintaining consistency, and the need for effective orchestration in microservices. Bernd also touches on Zeebe's architecture, emphasizing its strong state guarantees and the role of event streaming for enhanced performance.

Sep 6, 2019 • 27min
Pat Kua on Technical Leadership, Cultivating Culture, and Career Growth
In this podcast we discuss a holistic approach to technical leadership, and Pat provides guidance on everything from defining target operating models, cultivating culture, and supporting people in developing the career they would like. There are a bunch of great stories, several book recommendations, and additional resources to follow up on.
* Cultivating organisational culture is much like gardening: you can’t force things, but you can set the right conditions for growth. The most effective strategy is to communicate the vision and goals, lead the people, and manage the systems and organisational structure.
* N26, a challenger bank based in Berlin, has experienced hypergrowth over the past two years: both the number of customers and the number of employees have increased more than threefold. This provides many opportunities for ownership of products and projects, and it creates unique leadership challenges.
* A target operating model (TOM) is a blueprint of a firm's business vision that aligns operating capacities with strategic objectives, and provides an overview of the core business capabilities, internal factors, external drivers, and strategic and operational levers. This should be shared widely within an organisation.
* Pat has curated a “trident” operating model for employee growth. In addition to the classic individual contributor (IC) and management tracks, he believes that a third “technical leadership” track provides many benefits.
* People can switch between these tracks as their personal goals change. However, this switch can be challenging, and an organisation must support any transition with effective training.
* Pat recommends the following books for engineers looking to make the transition to leadership: The Manager’s Path, by Camille Fournier; Resilient Management, by Lara Hogan; An Elegant Puzzle, by Will Larson; and Leading Snowflakes, by Oren Ellenbogen. Pat has also written his own book, Talking with Tech Leads.
* It is valuable to define organisation values upfront. However, these can differ from the actual culture, which is more about what behaviours you allow, encourage, and stop.
* Much like the values provided by Netflix’s Freedom and Responsibility model, Pat argues that balancing autonomy and alignment within an organisation is vital for success. Managers can help their team by clearly defining boundaries for autonomy and responsibility.
* Developing the skills to influence people is very valuable for leaders. Influence is based on trust, and this must be constantly cultivated. Trust is much like a bank account: if you don’t make regular deposits that build trust, you may find yourself overdrawn when you need to make a withdrawal. This can lead to bad will and defensive strategies being employed.

Sep 2, 2019 • 27min
Thomas Graf on Cilium, the 1.6 Release, eBPF Security, & the Road Ahead
Cilium is open source software for transparently securing the network connectivity between application services deployed using Linux container management platforms like Docker and Kubernetes. It is a CNI plugin that offers layer 7 features typically seen with a service mesh. On this week’s podcast, Thomas Graf (one of the maintainers of Cilium and co-founder of Isovalent) discusses the recent 1.6 release, some of the security questions/concerns around eBPF, and the future roadmap for the project.
Why listen to this podcast:
* Cilium brings eBPF to the cloud-native world. It works across both layer 4 and layer 7. While it started as a pure eBPF CNI plugin, the team discovered that caring only about ports was not enough from a security perspective.
* Cilium went 1.0 about a year and a half ago. 1.6 is the most feature-packed release of Cilium yet. Today, it has around 100 contributors.
* While Cilium can make it much easier to manage IPTables, Cilium overlaps with a service mesh in that it can do things like understand application protocols, HTTP routes, or even restrict access to specific tables in data stores.
* Cilium supports both in-kernel and sidecar deployments. For sidecar deployments, it can work with Envoy to switch between kernel-space and user-space code. The focus is on flexibility, performance, and low overhead.
* BPF (Berkeley Packet Filter) was initially designed to do filtering on data links. eBPF has the same roots, but it’s now used for system call filtering, tracing, sandboxing, etc. It has grown into a general-purpose way to safely extend the Linux kernel.
* Cilium has a multi-cluster feature built in. The 1.6 release can run in a kube-proxy-free configuration, and allows fine-grained network policies to run across multiple clusters without the use of IPTables.
* Cilium offers on-the-wire encryption using in-kernel encryption technology that enables mTLS across all traffic in your service fleet. The encryption is completely transparent to the application.
* eBPF has been used in all production environments at Facebook since May 2017, and is used at places like Netflix, Google, and Reddit. A lot of companies have an interest in eBPF being secure and production-ready, so there’s a lot of attention focused on finding and resolving any security issues that arise.
* 1.6 also released KVstore-free operation, socket-based load balancing, CNI chaining, Native AWS ENI mode, enhancements to transparent encryption, and more.
* The plan for 1.7 is to keep moving up the stack to the socket level (to offer things like load balancing and transparent encryption at scale), and likely to offer deeper security features such as process-aware security policies for internal pod traffic.
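To illustrate why the layer 7 awareness mentioned above matters, here is a toy policy function: the verdict depends on HTTP fields as well as the port, which a ports-only (layer 4) filter cannot express. This is plain Python standing in for the concept only; real eBPF is verified bytecode running inside the kernel, and the field names here are invented:

```python
def l7_policy(packet):
    """Toy stand-in for a Cilium-style layer 7 policy.

    A pure layer 4 rule could only say 'allow port 80'; this one also
    restricts *which* HTTP requests on port 80 are allowed."""
    if packet["dst_port"] != 80:
        return "drop"   # layer 4 check: only HTTP traffic is permitted
    if packet["method"] == "GET" and packet["path"].startswith("/public/"):
        return "allow"  # layer 7 check: read-only access to /public/ only
    return "drop"

packets = [
    {"dst_port": 80, "method": "GET",    "path": "/public/index.html"},
    {"dst_port": 80, "method": "DELETE", "path": "/public/index.html"},
    {"dst_port": 22, "method": "GET",    "path": "/public/index.html"},
]
verdicts = [l7_policy(p) for p in packets]
```

The second packet would be allowed by a ports-only rule; it is only the protocol-aware check that drops the destructive DELETE.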
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2HCGnLa

Aug 28, 2019 • 32min
Yuri Shkuro on Tracing Distributed Systems Using Jaeger
The three pillars of observability are logs, metrics, and tracing. Most teams are able to handle logs and metrics, while proper tracing can still be a challenge. On this podcast, we talk with Yuri Shkuro, the creator of Jaeger, author of the book Mastering Distributed Tracing, and a software engineer at Uber, about how the Jaeger tracing backend implements the OpenTracing API to handle distributed tracing.
Why listen to the podcast:
- Jaeger is an open-source tracing backend, developed at Uber. It also has a collection of libraries that implement the OpenTracing API.
- At a high level, Jaeger is very similar to Zipkin, but Jaeger has features not available in Zipkin, including adaptive sampling and advanced visualization tools in the UI.
- Tracing is less expensive than logging because data is sampled. It also gives you a complete view of the system. You can see a macro view of the transaction, and how it interacted with dozens of microservices, while still being able to drill down into the details of one service.
- If you have only a handful of services, you can probably get away with logging and metrics, but once the complexity increases to dozens, hundreds, or thousands of microservices, you must have tracing.
- Tracing does not work with a black-box approach to the application. You can't simply use a service mesh and then add a tracing framework. You need correlation between a single request and all the subsequent requests that it generates, and a service mesh still relies on the underlying components handling that correlation.
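The correlation requirement in the last point can be sketched as follows: a trace ID is minted at the edge, every downstream hop must carry it and record its parent span, and that is what lets a backend like Jaeger stitch the spans into one trace. This is a toy model of the idea, not the OpenTracing API:

```python
import uuid

def start_trace():
    """The edge service mints a trace ID; every downstream hop must carry it
    (in practice, propagated via request headers)."""
    return {"trace_id": uuid.uuid4().hex, "spans": []}

def traced_call(ctx, service, parent_span=None):
    """Each service records a span referencing the shared trace ID and its
    parent span -- the correlation a black-box approach cannot provide."""
    span = {
        "span_id": uuid.uuid4().hex[:8],
        "service": service,
        "trace_id": ctx["trace_id"],
        "parent": parent_span,
    }
    ctx["spans"].append(span)
    return span

# A request fanning out across three (illustrative) services.
ctx = start_trace()
edge = traced_call(ctx, "api-gateway")
auth = traced_call(ctx, "auth-service", parent_span=edge["span_id"])
db = traced_call(ctx, "user-db", parent_span=auth["span_id"])
```

Because every span shares the trace ID and names its parent, the backend can reconstruct the full macro view of the transaction and still let you drill into any single service.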
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2ZlvMyR

Aug 19, 2019 • 29min
Louise Poubel on the Robot Operating System
ROS is the Robot Operating System. It’s been used by thousands of developers to prototype and create robotic applications. ROS can be found on robots in warehouses, in self-driving car companies, and on the International Space Station. Louise Poubel is an engineer working with Open Robotics. Today on the podcast, she talks about what it takes to develop software that moves in physical space, including the sense-think-act cycle, the developer experience, and the architecture of ROS.
Why listen to this podcast:
- When writing code for robots, you use the sense-think-act cycle.
- ROS is an SDK for robotics. It provides a communication layer that enables data to flow between nodes that handle sensors, logic, and actuation.
- ROS has been around for twelve years and has two major versions. ROS 1 was implemented entirely in C++. ROS 2 offers a common C layer with implementations in many different languages, including Java, JavaScript, and Rust.
- ROS 2 is released on a six-month cadence; Dashing is the latest release (May 2019). Previous releases were supported for one year; Dashing is the first LTS release and will be supported for two years.
- ROS 2 builds on top of the Data Distribution Service (DDS) standard found in mission-critical systems like nuclear power plants and aircraft.
- Simulation is an important step in robotics. It allows you to prototype a system before deploying to a physical system.
- Rviz is a three-dimensional visualizer used to visualize robots, the environments they work in, and sensor data. It is a highly configurable tool, with many different types of visualizations and plugins. It allows you to put together all your data in one place and see it.
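The sense-think-act cycle mentioned above can be illustrated with a deliberately tiny simulation: a one-dimensional robot driving toward an obstacle. None of this is real ROS code; the world model, distances, and stopping rule are all invented for illustration:

```python
def sense(world, position):
    """Read a (simulated) range sensor: distance to the nearest obstacle ahead."""
    obstacles = [x for x in world if x > position]
    return min(obstacles) - position if obstacles else float("inf")

def think(distance, safe_distance=2):
    """Decide: keep moving while the path ahead is clear, otherwise stop."""
    return "forward" if distance > safe_distance else "stop"

def act(position, command):
    """Apply the command to the (simulated) motors: move one unit or hold."""
    return position + 1 if command == "forward" else position

world = [10]   # a single obstacle at x=10 (illustrative)
position = 0

for _ in range(20):                  # the sense-think-act loop
    distance = sense(world, position)
    command = think(distance)
    position = act(position, command)
```

In a real ROS system, each stage is typically a separate node, and the communication layer moves the sensor data and commands between them.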
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2ZbOvrO

Aug 9, 2019 • 41min
Matt Klein on Envoy Mobile, Platform Complexity, and a Universal Data Plane API for Proxies
In this podcast we sit down with Matt Klein, software plumber at Lyft and creator of Envoy, and discuss topics including the continued evolution of the popular proxy, the strength of the open source Envoy community, and the value of creating and implementing standards throughout the technology stack. We also explore the larger topic of cloud-native platforms, and discuss the tradeoffs between a simple, opinionated platform and something bespoke and more configurable, but also more complex. Related to this, Matt shares his thoughts on when and how to make the decision within an organisation to embrace technology like container orchestration and service meshes.
Finally, we explore the creation of the new Envoy Mobile project. The goal of this project is to expand the capabilities provided by Envoy all the way out to mobile devices powered by Android and iOS. For example, most current user-focused traffic shifting that is conducted at the edge is implemented with coarse-grained approaches via BGP and DNS, and using something like Envoy within mobile app networking stacks should allow finer-grained control.
Why listen to this podcast:
- The Envoy Proxy community has grown from strength-to-strength over the last year, from the inaugural EnvoyCon that ran alongside KubeCon NA 2018, to the increasing number of code contributions from engineers working across the industry
- Attempting to create a community-driven “universal proxy data plane” with clearly defined APIs, like Envoy’s xDS API, has let vendors collaborate on a shared abstraction while still leaving room for “differentiated success” to be built on top of this standard
- Google’s gRPC framework is adopting the Envoy xDS APIs, as this will allow both Envoy and gRPC instances to be operated via a single control plane, for example, Google Cloud Platform’s Traffic Director service.
- There is a tendency within the software development industry to fetishise architectures that are designed and implemented by the unicorn tech companies, but not every organisation operates at this scale.
- However, there has also been industry pushback against the complexity that modern platform components like container orchestration and service meshes can introduce to a technology stack.
- A platform built from these components provides the best return on investment when an organisation’s software architecture and development teams have reached a certain size.
- Function-as-a-Service (FaaS)-type platforms will most likely be how engineers interact with software platforms in the future. Business-focused developers often do not want to interact with the platform plumbing.
- Envoy Mobile is building on prior art, and aims to expand the capabilities provided by Envoy all the way out to mobile devices running Android and iOS. Most current end-user traffic shifting is implemented with coarse-grained approaches via BGP and DNS, and using something like Envoy instead will allow finer-grained control.
- Using Envoy Mobile in combination with Protocol Buffers 3, which supports annotations on APIs, can facilitate working with APIs offline, configuring caching, and handling poor networking conditions. One of the motivations for this work is that small increases in application response times can lead to better business outcomes.
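The finer-grained control discussed above can be illustrated with a deterministic client-side routing sketch: hash each user ID into a bucket and send a configurable percentage of users to a canary backend. That is a per-request, per-user decision that DNS or BGP shifting (which moves whole resolvers or prefixes at once) cannot make. The names and percentages are illustrative, not Envoy Mobile's API:

```python
import zlib

def route(user_id, canary_percent=10):
    """Deterministically send `canary_percent`% of users to a canary backend.

    CRC32 gives a stable hash, so a given user always lands in the same
    bucket and gets a consistent experience across requests."""
    bucket = zlib.crc32(user_id.encode()) % 100
    return "canary" if bucket < canary_percent else "primary"

assignments = {uid: route(uid) for uid in ("alice", "bob", "carol")}
```

Dialing `canary_percent` up or down shifts traffic gradually, and because the decision runs in the client networking stack, it can also react to conditions (latency, failures) that the edge never sees.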
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/33nlGMu


