

Last Week In AWS Podcast
Corey Quinn
The latest in AWS news, sprinkled with snark. Posts about AWS come out over sixty times a day. We filter through it all to find the hidden gems, the community contributions--the stuff worth hearing about! Then we summarize it with snark and share it with you--minus the nonsense.
Episodes

Jan 13, 2020 • 13min
Your Database Will Explode in Sixty Seconds
AWS Morning Brief for the week of January 13th, 2020.

Jan 9, 2020 • 12min
Networking in the Cloud Fundamentals: The Cloud in China
About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Transcript
Corey: Welcome back to Networking In The Cloud, a special 12-week mini feature of the AWS Morning Brief sponsored by ThousandEyes. This week's topic: the cloud in China. But first, let's talk a little bit about ThousandEyes. You can think of ThousandEyes as the Google Maps of the internet. Just like you wouldn't leave San Jose to drive to San Francisco without checking which freeway to take (because local references are always going to resonate best when telling these stories), businesses rely on ThousandEyes to see the end-to-end paths that their applications and services are taking from their servers to their end users, to identify where the slowdowns are, where the pileups are, and what's causing these issues. They can use ThousandEyes to figure out what's breaking and, ideally, notify providers before their customers notice. To learn more, visit thousandeyes.com. And my thanks to them for sponsoring this miniseries.

Now, when we're talking about China, I want to start by saying that I'm not here to pass judgment. Here in the United States, we're sort of the Oracle cloud of foreign policy, so Lord knows that my hands aren't clean either. Instead, I want to have a factual discussion about what networking in China looks like in the world of cloud in 2020. To start, China is a huge market. The market for cloud services in China this year is expected to reach just over a hundred billion dollars. So there's a lot of money on the table, and there's a lot riding on companies making significant inroads into an extremely lucrative market that is extremely technologically savvy.

Historically, according to multiple Chinese cloud executives who were interviewed for a variety of articles, China's enterprise IT market is probably somewhere between five to seven years behind most Western markets. That means there's a huge amount of opportunity for companies to make inroads and make an impact on that market before it winds up being dominated, like a lot of the Western markets have been, by certain large Seattle-based cloud providers, ahem, ahem.

Now, due to Chinese regulations, a cloud provider in China has to be operated by a Chinese company. That's why Microsoft works with a company called 21Vianet, whereas AWS has two partners, Beijing Sinnet and NWCD. Those local partners in fact own and operate the physical infrastructure that the cloud providers are building in China and become known as the seller of record, although the US cloud companies of course do, or at least ostensibly, retain all the rights to their intellectual property: their trademarks, their copyrights, etc.

That said, if you take a look at any of the large cloud providers' service and region availability tables, there's very clearly a significant lag between when services get released in most regions and when they land in the mainland China regions. Some of that lag, at least according to people speaking off the record, comes down to concern over intellectual property theft.
And in the current political climate, where we have basically picked an unprovoked trade war with China, it winds up complicating this somewhat heavily, if for no other reason than that companies are extremely skittish about subjecting what they rightly perceive to be their incredibly valuable intellectual property to the risks of operating inside of mainland China. So on the one hand, they don't want to deal with that. On the other, there are over half a billion people in China with smartphones, and just shy of 900 million people on the internet in one form or another. So there's an awful lot of money at stake, and companies find themselves rather willing to overlook some things that they otherwise would not want to bother with. Now again, I'm not here to moralize; I just find the idea to be somewhat fascinating.

Most of that stuff you can find out just from reading news articles and various press releases. So let's go a little bit further into how companies are servicing the Chinese market. Not for nothing, but I'm picking on AWS because they are the incumbent in this space, and this is the AWS Morning Brief. Looking at the map on my wall, they have regions in Tokyo, in Seoul, in Hong Kong, in Singapore, and in Mumbai. If you squint enough, that sort of forms a periphery around the outside of mainland China. Here in the real world, companies tend to use those regions scattered around China rather than regions within China if it's at all feasible, and then provide services to their customers inside of China through those geographically local regions without having to deal with a physical presence inside of China. You can learn a lot about this by looking at ThousandEyes' 2019 Public Cloud Performance Benchmark Report, where they wound up figuring out what's going on with IBM, AWS, Azure, and Google Cloud, and of course Alibaba this year, which is interesting and we'll get there in a minute, because the report is restricted to real clouds. Oracle cloud is not a real cloud and thus was not invited. You can figure out what the architectural connectivity differences are between these cloud providers, take a look at AWS Global Accelerator and how it pans out, see what you can actually expect from real-world networks talking to other real-world networks, and see what makes sense for various use cases. My thanks again to ThousandEyes for sponsoring this podcast. You can get your own copy of the report at snark.cloud/real clouds, that's snark.cloud/realclouds.

One of those real clouds, as mentioned, is Alibaba. The reason that I bring them up is that they currently dominate China's cloud market. Alibaba has something on the order of a 43% market share inside of mainland China. Second behind them, with 17.4%, is Tencent. Tencent is also growing rapidly. AWS is up there as well, given their significant posture in other places. But then there's a whole smattering of small-scale cloud operators that are still vying for a piece of a very large, very lucrative pie.

Now, if you're talking to any of those providers inside of China, then the networking works pretty much like you'd expect it to anywhere else on the planet. The challenge, and why this is worth an entire episode, is what happens when you try to network outside of China to the rest of the internet. Let's talk a little bit about China's Great Firewall. This was started roughly in 1998 in order to enforce Chinese law.
News sites, shopping sites, search engines, and pornography are all blocked through a wide variety of methods, in accordance with Chinese law that tends to change and ebb and flow. No...
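One practical consequence of the partnered-operation model described above: AWS's China regions (cn-north-1, operated by Sinnet, and cn-northwest-1, operated by NWCD) live in a separate partition with endpoints under amazonaws.com.cn and credentials from a separately registered account. Here's a minimal sketch of what that looks like from boto3; the profile name is a placeholder, not a real configuration.

```python
# A minimal sketch: AWS's China regions live in a separate partition
# with endpoints under amazonaws.com.cn, and they require credentials
# from an account registered through the local Chinese partner rather
# than your regular AWS account. The profile name below is a
# placeholder; the region names are the two real China regions.
import boto3

# Credentials for a China-partition account, configured separately
# from your global AWS credentials (hypothetical profile name).
session = boto3.Session(profile_name="china-partner-account")

# cn-north-1 is the Beijing region operated by Sinnet;
# cn-northwest-1 is the Ningxia region operated by NWCD.
ec2 = session.client("ec2", region_name="cn-north-1")

# The endpoint resolves under amazonaws.com.cn, not amazonaws.com.
print(ec2.meta.endpoint_url)  # https://ec2.cn-north-1.amazonaws.com.cn

# Service availability lags the other partitions, so check what a
# China region actually offers before depending on it.
regions = ec2.describe_regions()["Regions"]
print([r["RegionName"] for r in regions])
```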

Jan 6, 2020 • 9min
Burning Amazon Lex to CD-ROM
AWS Morning Brief for the week of January 6th, 2020.

Dec 30, 2019 • 17min
Listener Mailbag
AWS Morning Brief for the week of December 30th, 2019.

Dec 23, 2019 • 15min
It's a Horrible Lyfebin
AWS Morning Brief for the week of December 23rd, 2019.

Dec 19, 2019 • 17min
Networking in the Cloud Fundamentals: Regions and Availability Zones in AWS
About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Transcript
Corey: Hello, and welcome back to our Networking in the Cloud miniseries, sponsored by ThousandEyes. That's right: ThousandEyes' State of the Cloud Performance Benchmark Report is now available for your perusal. It's providing a lot of the baseline that we're taking all of the miniseries information from. It pointed us in a bunch of interesting directions and helps us tell stories that are actually, for a change, backed by data rather than pure sarcasm. To get your copy, visit snark.cloud/realclouds, because it only covers real cloud providers. Thanks again to ThousandEyes for their ridiculous support of this shockingly informative podcast miniseries.

It's a basic fact of cloud that things break all the time. I've been joking for a while that a big competitive advantage that Microsoft brings to this space is that they have 40 years of experience apologizing for software failures, except that's not really a joke. It's true. There's something to be said for the idea of apologizing to both technical and business people about real or perceived failures being its own skillset, and they have a lot more experience than anyone else in this space.

There are two schools of thought around how to avoid having to apologize for service or component failures to your customers. The first is to build super expensive but super durable things, and you can kind of get away with this in typical data center environments right up until you can't, and then it turns out that your SAN just exploded. You're really not diversifying with most SANs; you're just putting all of your eggs in a really expensive basket, and of course, if you're hit with a power or networking outage, nothing can talk to the SAN, and you're back to square one.

The other approach is to come at it with a perspective of building in redundancy to everything and eliminating single points of failure. That's usually the better path in the cloud. You don't ever want to have a single point of failure if you can reasonably avoid it, so going with multiple everythings starts to make sense, to a point. Going with a full-on multi-cloud story is a whole separate kettle of nonsense we'll get to another time. But you realize at some point you will have single points of failure, and you're not going to be able to solve for that. We still only have one planet going around one sun, for example. If either of those things explodes, well, computers aren't really anyone's concern anymore. However, betting the entire farm on one EC2 instance is generally something you'll want to avoid if at all possible.

In the world of AWS, there aren't data centers in the way that you or I would contextualize them. Instead, they have constructs known as availability zones, and those compose to form a different construct called regions.
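To put a number on the "multiple everythings" argument above, here's a back-of-the-envelope sketch. It assumes failures are fully independent, which shared power, networking, or a whole-region event rarely guarantees in practice, and the availability figures are made up for illustration, not AWS numbers.

```python
# Why redundancy beats one expensive box: with N independent replicas,
# you're only down when all N are down at once. Figures below are
# illustrative; real failures are rarely fully independent.
def composite_availability(single: float, replicas: int) -> float:
    """Availability of N independent replicas of a component."""
    return 1 - (1 - single) ** replicas

single = 0.99  # one instance that's up 99% of the time
for n in (1, 2, 3):
    print(f"{n} replica(s): {composite_availability(single, n):.6%}")
# 1 replica(s): 99.000000%
# 2 replica(s): 99.990000%
# 3 replica(s): 99.999900%
```

Each added replica buys two more nines under the independence assumption, which is exactly why the correlated failure modes (one AZ, one region, one planet) end up being what you actually plan around.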
Presumably, other cloud providers have similar constructs over in non-AWS land, but we're focusing on AWS's implementation in this series, again, because they have a giant multi-year head start over every other cloud provider, and even that manifests in those other cloud providers comparing what they've built and how they operate to AWS. If that upsets you and you work at one of those other cloud providers, well, you should have tried harder. Let's dive in to a discussion of data centers, availability zones, and regions today.

Take an empty warehouse and shove it full of server racks. Congratulations: you have built the bare minimum requirement for a data center at its most basic layer. Your primary constraint, and why it's a lot harder than it sounds, is power, and to a lesser extent, cooling. Computers aren't just crunching numbers; they're also throwing off waste heat, and you've got to think an awful lot about how to keep that heat out of the data center.

At some point, you can't shove more capacity into that warehouse-style building simply because you can't cool it if it's all running at the same time. If your data center's particularly robust, meaning you didn't cheap out on it, you're going to have different power distribution substations that feed the building from different lines that enter the building at different corners. You're going to see similar things with cooling as well: multiply redundant cooling systems.

One of the big challenges, of course, when dealing with this physical infrastructure is validating that what it says on the diagram is what's actually there in the physical environment. That can be a trickier thing to explore than you would hope. Also, if you have a whole bunch of systems sitting in that warehouse and you take a power outage, you have to plan for this thing known as inrush current.

Normally, things are steady state: computers generally draw a known quantity of power. But when you first turn them on, if you've ever dealt with data center servers, the first thing they do is power up everything to self-test. They sound like a jet fighter taking off as all the fans spin up. If you're not careful and all these things turn on at once, you'll see a giant power spike that winds up causing issues, blowing breakers, and maxing out consumption, so having a staggered start becomes a concern as well. Having spent too much time in data centers, I am painfully familiar with this problem of how you safely and sanely recover from site-wide events, but that's a bit out of scope, thankfully, because in the cloud, this is less of a problem.

Let's talk about the internet and getting connectivity to these things. This is the Networking in the Cloud podcast, after all. You're ideally going to have multiple providers running fiber lines to that data center, hoping to avoid fiber's natural predator, the noble backhoe. Now, ideally, all those fiber lines are going over different paths, but again, that's a hard thing to prove, so doing your homework is important. But here's something folks don't always consider: if you have hundred-gigabit Ethernet links to each computer, which is not cheap, but doable, and then you have 20 servers in a rack, each rack theoretically needs to be able to speak at least two terabits per second at all times to the other racks, and most of them can't do that. They wind up having bottlenecking issues (the arithmetic is sketched just after this transcript).

As a result, when you have high-traffic applications speaking between systems, you need to make sure that they're aware of something known as rack affinity.
In other words, are there bottlenecks between these systems, and how do you minimize those to make sure the crosstalk works responsibly? There are a lot of dragons in here, but let's hand-wave past all of it because we're talking about cloud here. The point of this is that there's an awful lot of nuance to running data centers, and AWS and other large cloud providers do a better job of it than you do. That's not me insulting your data center staff; that's just a fact. They have the scal...
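Here's the rack bandwidth arithmetic from the transcript above as a quick sketch. The server count and link speed come from the episode; the oversubscription ratio is a common illustrative figure for such designs, not an AWS number.

```python
# 20 servers with 100 GbE links can, in the worst case, all try to
# talk off-rack at line rate, so a fully non-blocking top-of-rack
# uplink needs 2 Tb/s. Real designs usually accept some
# oversubscription instead; the 3:1 ratio below is illustrative.
servers_per_rack = 20
link_gbps = 100

worst_case_gbps = servers_per_rack * link_gbps
print(f"non-blocking uplink needed: {worst_case_gbps} Gb/s "
      f"({worst_case_gbps / 1000:.0f} Tb/s)")

oversubscription = 3  # e.g. a 3:1 design
print(f"uplink at {oversubscription}:1 oversubscription: "
      f"{worst_case_gbps / oversubscription:.0f} Gb/s")
```

That gap between worst-case demand and actual uplink capacity is precisely why rack affinity matters: traffic that stays within a rack never touches the oversubscribed uplink at all.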

Dec 16, 2019 • 11min
AWS Dep-Ric-ates Treasured Offering
AWS Morning Brief for the week of December 16th, 2019.

Dec 13, 2019 • 16min
re:Invent Wrap-up, Part 4
AWS Morning Brief for Friday, December 13th, 2019.

Dec 12, 2019 • 16min
Networking in the Cloud Fundamentals, Part 6
About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Transcript
Corey: Knock knock. Who's there? A DDoS attack. A DDoS a... Knock. Knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock, knock.

Welcome to what we're calling Networking in the Cloud, episode six: How Things Break in the Cloud, sponsored by ThousandEyes. ThousandEyes recently launched their State of the Cloud Performance Benchmark Report, which effectively lets you compare and contrast performance and other aspects of the five large cloud providers: AWS, Azure, GCP, Alibaba, and IBM Cloud. Oracle cloud was not invited because we are talking about real clouds here. You can get your copy of this report at snark.cloud/realclouds, and they compare and contrast an awful lot of interesting things. One thing that we're not going to compare and contrast, though, because of my own personal beliefs, is the outages of different cloud providers.

Making people in companies (and by the way, companies are composed of people) feel crappy about their downtime is mean, first off. Secondly, if companies are shamed for outages, it in turn makes it far likelier that they won't disclose having suffered an outage. And when companies talk about their outages in constructive, blameless ways, there are incredibly valuable lessons that we all can learn from them. So let's dive into this a bit.

If there's one thing that computers do well, better than almost anything else, it's break. And this is, and I'm not being sarcastic when I say this, a significant edge that Microsoft has when it comes to cloud. They have 40-some-odd years of experience in apologizing for software failures. That's not trying to be insulting to Microsoft; it's what computers do, they break. And being able to explain that intelligently to business stakeholders is incredibly important. They're masters at that. They also have a 20-year head start on everyone else in the space. What makes this interesting and useful is that in the cloud, computers break differently than people would expect them to in a non-cloud environment.

Once upon a time, when you were running servers in data centers, if you saw everything suddenly go offline, you had some options. You could call the data center directly to see if someone cut the fiber; in case you were unaware, fiber optic cables' sole natural predator in the food chain is the mighty backhoe. So maybe something backhoed out some fiber lines, maybe the power is dead to the data center, maybe the entire thing exploded, burst into flames, and burned to the ground, but you can call people. In the cloud, it doesn't work that way. Here in the cloud, instead you check Twitter, because it's 3:00 AM and Nagios, the original call of duty, or PagerDuty calls you (because you didn't need that sleep anyway) telling you there is something amiss with your site.
So when a large cloud provider takes an outage and you're hanging out on Twitter at two in the morning, you can see DevOps Twitter come to life in the middle of the night as they chatter back and forth.

And incidentally, if that's you, understand a nuance of AWS availability zone naming. When people say things like "us-east-1a is having a problem" and someone else says, "No, I just see us-east-1c is having a problem," you're probably talking about the same availability zone. Those letters change, non-deterministically, between accounts. You can pull zone IDs, and those are consistent. By and large, that was originally to avoid problems like everyone picking A, as humans tend to do, or C getting the reputation as the crappy one. (There's a short sketch of pulling zone IDs at the end of this transcript.)

So why would you check Twitter to figure out if your cloud provider's having a massive outage? Well, because honestly, the AWS status page is completely full of lies and gaslights you. It is as green as the healthiest Christmas tree you can imagine, even when things are exploding, for a disturbingly long period of time. If you visit the website stop.lying.cloud, you'll find a Lambda@Edge function that I've put there that cuts out some of the cruft, but it's not perfect. The reason behind this, after I gave them a bit too much crap one day and got a phone call that started with, "Now you listen here," is that there are humans in the loop: they need to validate that there is in fact a systemic issue at AWS and what that issue might be, then come up with a way to report it that ideally doesn't get people sued, and manually update the status page. Meanwhile, your site's on fire. So the status page is a trailing function, not a leading function.

Alternately, you could always check ThousandEyes. That's right, this episode is sponsored by ThousandEyes. In addition to the report we mentioned earlier, you can think of them as the Google Maps of the internet, without the creepy privacy overreach issues. Just like you wouldn't necessarily want to commute during rush hour without checking where traffic is going to be and which route is faster, businesses rely on ThousandEyes to see the end-to-end paths their applications and services are taking in real time, to identify where the slowdowns are, where the outages are, and what's causing problems. They use ThousandEyes to see what's breaking where, and then, importantly, ThousandEyes shares that data directly with the offending service providers, not just to hold them accountable, but also to get them to fix the issue fast, ideally before it impacts users. But on this episode, it already has.

So let's say that you don't have the good sense to pay for ThousandEyes, or you're not on Twitter for whatever reason, watching people flail around helplessly trying to figure out what's going on. Instead, you're now trying desperately to figure out whether this issue is the last deploy your team did or a global problem. The first thing people try to do in the event of an issue is say, "Oh crap, what did we just change? Undo it." And often that's a knee-jerk response that can make things worse if it's not actually your code that caused the problem. Worse, it can eat up precious time at the beginning of an outage.
If you knew that it was a single availability zone or an entire AWS region that was having a problem, you could instead be working to fail over to a different location, instead of wasting valuable incident response time checking Twitter or looking over your last 200 commits.

Part of the problem, and the reason this is the way that it is, is that unlike the rusting computers in your data center currently being savaged by raccoons, things in the cloud break differently. You don't have the same diagnostic tools, you don't have the same level of visibility into what the hardware is doing, and the behaviors themselves are radically different. I have a half dozen tips and tricks on how to monitor whether or not your data center's experiencing a problem r...
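To make the zone-ID point from the transcript concrete, here's a minimal sketch using boto3. EC2's DescribeAvailabilityZones call returns both the account-specific ZoneName and the globally consistent ZoneId; the choice of us-east-1 and the example output mapping are just for illustration.

```python
# The letter suffixes (us-east-1a, us-east-1b, ...) are shuffled per
# account, but zone IDs (use1-az1, use1-az2, ...) are consistent
# everywhere. Comparing zone IDs is how two accounts can tell whether
# they're actually looking at the same AZ during an incident.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
zones = ec2.describe_availability_zones()["AvailabilityZones"]

# Map each account-specific name to its globally consistent ID.
for z in zones:
    print(f"{z['ZoneName']} -> {z['ZoneId']}")
# Example output (the name-to-ID mapping differs per account):
# us-east-1a -> use1-az6
# us-east-1b -> use1-az1
```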

Dec 11, 2019 • 11min
re:Invent Wrap-up, Part 3
AWS Morning Brief for Wednesday, December 11th, 2019.


