The InfoQ Podcast
InfoQ
Software engineers, architects and team leads have found inspiration to drive change and innovation in their team by listening to the weekly InfoQ Podcast. They have received essential information that helped them validate their software development map. We have achieved that by interviewing some of the top CTOs, engineers and technology directors from companies like Uber, Netflix and more. Over 1,200,000 downloads in the last 3 years.
Episodes
Mentioned books

Aug 21, 2020 • 23min
Liran Haimovitch on Understandability, Complexity, and Live Debugging
In this podcast, Liran Haimovitch, CTO at Rookout, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: the concept of “understandability” and how this relates to building modern software systems, how complexity impacts a system’s understandability, and the benefits of live debugging tooling.
Listen to the podcast for more.
Curated transcript and more information on the podcast: https://bit.ly/3ggFE0r
Follow us on Facebook, Twitter, LinkedIn, YouTube: @InfoQ
Follow us on Instagram: @infoqdotcom
Stay informed on emerging trends, peer-validated early adoption of technologies, and architectural best practices. Subscribe to The Software Architects’ Newsletter: https://www.infoq.com/software-architects-newsletter/

Aug 10, 2020 • 30min
Ana Medina on Chaos Engineering, Game Days, and Learning
In this podcast, Ana Medina, senior chaos engineer at Gremlin, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: how enterprise organisations are adopting chaos engineering with the requirements for guardrails and the need for “status checks” to ensure pre-experiment system health; how to run game days or IT fire drills when everyone is working remotely; and why teams should continually invest in learning from past incidents and preparing for inevitable failures within systems.
Listen to the podcast for more.
Curated transcript and more information on the podcast: https://bit.ly/2Pmsq7F

Jul 28, 2020 • 38min
Stefan Prodan on Progressive Delivery, Flagger, and GitOps
In this podcast, Stefan Prodan, developer experience engineer at Weaveworks and creator of the Flagger project, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: how progressive delivery extends the core ideas of continuous delivery; how the open source Flagger Kubernetes operator can be used to implement a progressive delivery strategy via canary releasing with an API gateway or service mesh; and the new “GitOps toolkit” that has evolved from the Flux continuous delivery operator.
Listen to the podcast for more.
Curated transcript and more information on the podcast: https://bit.ly/3f9CZ8p

Jul 10, 2020 • 32min
Rancher on Hybrid Cloud, Kubernetes at the Edge, and Open Standards
In this podcast, Shannon Williams, co-founder and president at Rancher Labs, and Darren Shepherd, co-founder and CTO at Rancher Labs, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: the adoption of hybrid cloud across organisations, the evolution of Kubernetes as a key abstraction for portability and cross-cloud security, running thousands of Kubernetes clusters at the edge, and the value of open standards.
Why listen to this podcast:
- Organisations are adopting hybrid cloud strategies. The use of containers to package and run applications across clouds has seen large adoption over the past five years. Containers and Kubernetes are everywhere: the datacenter, the edge, embedded systems, and other locations.
- Two enterprise use cases for Kubernetes stand out: providing standardised abstractions and APIs to increase portability across vendors and cloud platforms; and providing a framework and homogenised foundation on which to build and implement (cross cloud) security solutions.
- Open standards support interoperability and drive innovation. The CNCF is becoming the natural home for open cloud technologies. The Rancher team have donated Longhorn, their cloud-native distributed storage platform for Kubernetes that was recently announced as generally available, to the CNCF.
- With the success of lightweight Kubernetes distributions, such as Rancher’s K3s, engineers are starting to deploy standalone Kubernetes clusters “by the thousands” to edge locations. Rancher has recently released Fleet, a new open source project that is focused on managing large collections (“fleets”) of Kubernetes clusters.
- Many developers and end users of Kubernetes simply want a platform-as-a-service (PaaS)-like experience. The next 12 months will see the community focus on the simplification of the Kubernetes ecosystem.
More on this: Quickly scan our curated show notes on InfoQ: https://bit.ly/31ZjYT1
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
Subscribe: www.youtube.com/infoq
Like InfoQ on Facebook: bit.ly/2jmlyG8
Follow on Twitter: twitter.com/InfoQ
Follow on LinkedIn: www.linkedin.com/company/infoq
Check the landing page on InfoQ: https://bit.ly/31ZjYT1

Jul 3, 2020 • 36min
Nora Jones on Resilience Engineering, Mental Models, and Learning from Incidents
In this podcast, Nora Jones, Co-Founder and CEO at Jeli and co-author of O’Reilly’s “Chaos Engineering: System Resiliency in Practice”, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: chaos engineering and resilience engineering, planning and running effective chaos experiments, and learning from incidents.
Why listen to this podcast:
- The chaos engineering and resilience engineering fields, although inextricably linked, are often incorrectly conflated. Resilience engineering is focused on “identifying and then enhancing the positive capabilities of people in organizations that allow them to adapt effectively and safely under varying circumstances.”
- The UX of internal or engineering-focused tooling, such as chaos experimentation tooling, is extremely important. However, engineers who create these tools often overlook the value of UX, or don’t have the relevant skills in user design research to undertake this.
- We all work in socio-technical systems. It is important to take the time to understand both aspects. Developing empathy and working alongside teams that you are trying to influence is essential. It is extremely important to continually work to build correct “mental models” of a system.
- The before and after of running a chaos experiment is as important as running the experiment itself. However, the aspects of planning, creating effective hypotheses, and analysing and disseminating the results are often under-resourced.
- Incident analysis can be a catalyst to help you understand more about your system. The Learning from Incidents website, alongside books such as Sidney Dekker’s The Field Guide to Understanding Human Error and Scott Snook’s Friendly Fire, can provide excellent background information to these topics.
More on this: Quickly scan our curated show notes on InfoQ: https://bit.ly/2C4R6xL

Jun 26, 2020 • 33min
Rob Skillington on Metrics Collection, Uber’s M3, and OpenMetrics
In this podcast, Rob Skillington, co-founder and CTO at Chronosphere, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: metrics collection at scale, multi-dimensional metrics and high-cardinality, developer experience with platform tooling, and open standards related to observability.
Why listen to this podcast:
- Over the past ten years the requirements related to monitoring and alerting, and the approach taken to implement them, have changed considerably. Compute is now ephemeral and dynamic, services are more numerous, and engineers want to instrument more things. Scalability of a monitoring solution is vitally important.
- One of the challenges with metric data is the limited information for providing context for collected values. This can be solved by using multi-dimensional metrics. Dimensions of a metric are name-value pairs that carry additional data to describe the metric value. High dimensionality can lead to high cardinality.
- Uber’s M3 metrics collection system initially used open source components such as Cassandra and Elasticsearch for storage and indexing. As the scale of usage of M3 increased, these OSS components were gradually replaced by custom components, such as M3DB.
- Building an effective user experience for operational tooling, especially observability-focused tooling, is vitally important. Engineers will be interacting with these tools on a daily basis. They will also be relying on these tools both for alerting and for locating and understanding what is occurring during production issues.
- Open standards are vitally important for interoperability. The OpenMetrics project is an effort to create an open standard for transmitting metrics at scale, with support for both text representation and protocol buffers.
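The link between dimensionality and cardinality mentioned above can be illustrated with a minimal Python sketch. It is not taken from the podcast or from M3: the dimension names and values are hypothetical, and the calculation gives only an upper bound, assuming every combination of values actually occurs.

```python
from itertools import product

# Hypothetical dimensions (name -> possible values) for a single metric,
# e.g. a request counter. These names are illustrative assumptions only.
dimensions = {
    "service": ["checkout", "search"],
    "region": ["us-east", "eu-west", "ap-south"],
    "status_code": ["200", "404", "500"],
}


def series_cardinality(dims):
    """Upper bound on distinct time series: the product of the value counts."""
    total = 1
    for values in dims.values():
        total *= len(values)
    return total


def label_sets(dims):
    """Enumerate every name-value combination as a dict (one per series)."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]


base = series_cardinality(dimensions)

# Adding one high-cardinality dimension multiplies the series count.
with_user = series_cardinality(
    {**dimensions, "user_id": [str(i) for i in range(1000)]}
)

print(base, with_user)
```

Each extra dimension multiplies the number of possible series, which is why a single unbounded dimension (such as the hypothetical `user_id` here) can dominate storage and indexing cost.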
More on this: Quickly scan our curated show notes on InfoQ: https://bit.ly/3i3dMPO

Jun 19, 2020 • 41min
Johnny Boursiquot on Serverless Go and Site Reliability Engineering at Heroku
In this podcast, Johnny Boursiquot, Site Reliability Engineer at Heroku, sat down with InfoQ podcast co-host Daniel Bryant and discussed topics that included: why Go is a useful language for building Function-as-a-Service (FaaS) style applications; how Heroku implement the role of Site Reliability Engineer (SRE); and why the ability to teach is such a valuable skill.
Why listen to this podcast:
- Go is a useful language for building Function-as-a-Service (FaaS) style applications. The ability to build Go applications into a static binary reduces the need for dependency management, and the quick runtime and application start times are good for initiation and scaling.
- The FaaS development toolchain has improved over the years. Many cloud providers now provide local runtimes, e.g. AWS SAM Local, and service simulators, e.g. LocalStack. Testing in production is facilitated by the ability to do dark launches and canary releasing at the ingress/API gateway.
- Developing “serverless” applications typically does not remove the need for operational expertise on a development team. Designing systems appropriately and getting the most out of the runtime (with minimal cost) requires knowledge of the underlying infrastructure components.
- The role of Site Reliability Engineering (SRE) looks different across practically every organisation. The Heroku SRE team have adapted well-established patterns and practices into their roles. They act as “diplomats”, working closely with product teams to share knowledge around operational best practices.
- The ability to teach is a valuable skill, regardless of your job. Teaching people to code or to embrace important operational principles is extremely rewarding. Engineers who teach must seek to escape the pull of their ego; by focusing on the needs of the people you are teaching, much more progress can be made.
More on this: Quickly scan our curated show notes on InfoQ: https://bit.ly/2UV0tqK

Jun 13, 2020 • 34min
Matt Debergalis on GraphQL and Data Modelling in the Enterprise
In this podcast, Matt Debergalis, Founder and CTO at Apollo, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: the motivations for GraphQL, the Apollo Data Graph platform, data modelling in an enterprise context, and how incrementally adopting GraphQL can help with decoupling the evolution of frontend and backend systems.
Why listen to this podcast:
- The challenges of defining client-side-friendly data models, building maintainable and composable backend APIs, and moving data from the cloud to a client application contribute to making modern software development difficult and time-consuming.
- GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. GraphQL lets application developers describe the data they need and bring that data into the screens that they are building for their users.
- The Apollo Data Graph platform is a middleware layer that provides a way of decoupling the core business APIs from the client-side consumption patterns. Apollo can implement cross-cutting concerns, such as transaction management, which mitigates the need to implement this in (potentially multiple) client-side applications.
- Apollo makes it possible to build a “data graph”: a series of graphs that are composed from an organisation’s data for use within client-side applications. A data graph is especially valuable in larger enterprises because it is here that many (money making) systems with existing APIs need to be combined to meet new business requirements.
- GraphQL can be adopted in an incremental fashion. To begin adoption just build the simplest possible graph that matches the needs of the first application, the first screen, or the first component that is required to transition over to the graph. Then let that graph evolve.
More on this: Quickly scan our curated show notes on InfoQ: https://bit.ly/3fhaeqP

May 31, 2020 • 37min
Lin Sun and Neeraj Poddar on Istio, Wasm, and the Future of Service Mesh
In this podcast, Lin Sun, senior technical staff member and master inventor at IBM, and Neeraj Poddar, engineering lead and architect at Aspen Mesh, sat down with InfoQ co-host Daniel Bryant. Topics discussed included: the evolution of service mesh data planes and control planes, the new Istio 1.5 architectures, Istio WebAssembly extension support, and the future of service mesh technology.
Why listen to this podcast:
- A service mesh is one implementation approach for providing service discovery, traffic management, and the cross-cutting communication concerns that engineers encounter when they adopt (micro)service-based architectures.
- The data plane of most modern service mesh implementations runs out-of-process as a proxy sidecar. This has evolved from library-based implementations, such as Airbnb’s SmartStack or Netflix’s OSS libraries.
- The recent release of Istio 1.5 saw the deployment packaging of the control plane move from a microservice-based approach to that of a monolithic implementation, named “istiod”.
- Istio now also supports data plane extensions written in WebAssembly (Wasm). These extensions can modify requests and responses and perform out-of-band actions, such as authentication and authorization.
- Standardisations like the Service Mesh Interface (SMI) can add a lot of value, but the user requirements, common use cases, and the core abstractions of the underlying technology must be well understood.
- Multi-cluster and mesh expansion (out-of-cluster) support is continually improving in Istio and many other service mesh implementations.

May 25, 2020 • 44min
Sam Newman: Monolith to Microservices
Today on the InfoQ Podcast, Wes Reisz talks with one of the thought leaders in Microservices, CI/CD, and Cloud: Sam Newman. The podcast covers many of the topics, techniques, and patterns that Sam writes about in his latest book, Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith. Topics covered in the podcast include understanding the problem you’re trying to solve, organizational/people changes when it comes to microservice architectures, database strategies for decomposing monolithic datastores, and why we’re seeing projects reverting from microservices to monoliths.
Why listen to this podcast:
- Fundamentally, microservices are distributed systems. Distributed systems have baggage (complexity) that comes along with them. The best way to deal with this complexity is to avoid taking it on in the first place. Try to solve the problem in other ways before choosing to take an organization to microservices.
- A common issue that large enterprises run into that might be a strong indicator for implementing microservices occurs when lots of developers are working on a given problem and they’re getting in each other’s way.
- A useful structure to follow with microservices is to make sure each service is owned by exactly one team. One team can own more than one service, but clear ownership of each service helps with some of the operational challenges of microservices.
- A release train should be a stop in the journey towards continuous delivery. It’s not the destination. If you find that you can only release in a release train, you are likely building a distributed monolith.
- There are challenges in operating microservices when the end customer has to operate and manage them. These challenges are part of why we’re seeing projects move from microservices to process monoliths.
More on this: Quickly scan our curated show notes on InfoQ: https://bit.ly/2XvGzmF


