
Statistically Speaking Supermarket Sweep: Measuring inflation with over a billion scanner beeps
Miles and guests unpack how the ONS is collecting the prices from more than a billion supermarket checkout and online sales to measure UK inflation.
Transcript
Scanner data podcast transcript
Miles:
Hello and welcome to Statistically Speaking, the official podcast of the UK's Office for National Statistics. I'm Miles Fletcher, and in this episode, we're taking an in-depth look at a very big change in how the ONS produces its estimates of inflation, no longer the sole preserve of clipboard-wielding price collectors roaming the supermarket aisles. The digital revolution has now fully arrived. From this month, the UK's inflation indices are partly based on millions of prices gathered directly from the tills, or scanners, to be precise. How is it all done? What is the role of Taylor Swift in all this? Yes, there is one. And what are the benefits for economists, decision makers and all of us ordinary folk who worry about the cost of living? Here to unpack it all for us is Mike Hardy, who has led the project here at the ONS, and top economist and former member of the Bank of England Monetary Policy Committee, Jonathan Haskel, professor of economics at Imperial College London.
Professor, to start with you: to understand what's changed, it'd probably be helpful to remind ourselves how consumer prices inflation has until now been calculated. Essentially, it was the ONS and its agents checking the prices of thousands of items on a monthly basis to see how they changed.
Jonathan:
Yeah, that's right, and the ONS has gone to an enormous amount of effort in order to make that collection representative and make it consistent. But of course, in the modern era of scanner data, computers, e-commerce, things like that, there are other ways of doing it. I guess the important point, which Mike can talk about some more, is that one of the things that we know from statistics is that having a big sample isn't necessarily going to be better if you have a representative sample to start with. So I think one of the interesting points about all of this is, whilst the scanner data is collecting many more data points, it's a fascinating check on the representativeness or otherwise of the ONS survey and the procedures thus far, as to whether the actual average of all of that will turn out to be very different or similar to what's done before. It's a great advantage to have all of this extra data, but one shouldn't overstate either the advantage or use it as a way of rubbishing what the ONS has been doing in the past.
Miles:
So to put it this way, perhaps then, what the ONS has traditionally been doing in collecting prices, is to take this big monthly snapshot of retailing and prices, and what people have been paying for items. What it's got now, what it's moving to in the digital age is moving from a still picture, perhaps to a rolling 4k video, and from that, it can find out exactly what has been missed out of the inflation calculations previously.
Jonathan:
Well, I'll just put a little bit of a spin on that. One of the things that the price collectors do, and they're very, very careful to do, is to make sure that that snapshot is consistent across the snapshots, if you sort of see what I mean. So there is a bit of a rolling element to those snapshots already, because, for example, if you're going to collect the price of, let us say, ladies' jeans, which is something that I was doing with the price collector recently, you want to be sure you collect the price for the same good over time. And the point about the price collectors is they're extremely conscientious about making sure that, in the case of ladies' jeans, they are coloured blue. They've got either a flared leg or not a flared leg. They've got the same number of pockets, they've got the same amount of stitching, they've got different decorations on them. To make sure that those goods remain the same is actually very important, and that's something actually which the hand collection can do. And as I say, I think that means that the snapshot element is maybe not quite the right metaphor, if I may say, Miles. It is the relation. It's a consistent element over time, a consistent snapshot, if that's using a metaphor.
Miles:
That said, it's more than just blindly following the same list of products every month. But nonetheless, the traditional way of doing things has had significant limitations. What do you think those are?
Jonathan:
I guess the limitations are that when one is collecting a sample, any kind of sample at all, one is always doing one's best to try to hope that that's a representative sample, and having more data then is going to help if it turns out that the sample is unrepresentative. So I think that's one part of it. I think the other part of it is, of course, it's becoming increasingly costly to hand collect these numbers, and you know, like any public agency, one wants to be as careful as one possibly can with taxpayers' money. That's the sort of second thing. And the third thing is, especially in the era of the Internet and dynamic pricing and so forth, these prices change, you know, at sort of dizzying rates as firms change their prices throughout the product cycle of the good. And therefore the sort of consistent snapshots may miss some of that variation.
Miles:
And all that data, of course, is out there to be learned from, isn't it? Essentially, Mike, is that what the scanner data project has been, it's been all about harvesting that data and using it to produce what's being described as a step change in how inflation is calculated.
Mike:
So we've been transforming our consumer price statistics for some time, and we've been acquiring a wide range of data sources with the aim of improving the quality and granularity of our consumer price statistics. So in recent years, we've used administrative data for rail fares and second hand cars, and we will be incorporating grocery scanner data for 50% of the grocery market, where we will be moving from using around 25,000 prices for those retailers to 300 million derived from the sale of over a billion products, so much more granular and rich information on the prices within those retailers. Importantly, as well, we not only have the price of everything within a store, so we move away from the sample that Jonathan described. So taking the price of a small number of products within each store to collecting all of the prices within store, from supermarket checkouts to also getting a better understanding of how much of each product people are purchasing. So that gives us a much clearer picture of inflation by using these large administrative data sets.
Miles:
Because the purpose here is to get a sense of how the cost of living is changing for people as well, isn't it? And presumably, if you're just checking the same prices of the same goods month after month, you're not understanding about how price changes are influencing people's purchasing decisions. Does it help with that?
Mike:
Yeah, so the scanner data, as I said, gives us a complete kind of picture of all the prices within a store, and it gives us the underlying quantities, so how much of each particular product is being purchased, it also gives us the price at the till, rather than the price on the shelf. So that captures a number of different things that we were unable to capture with the sample data. So the first being if consumers switch from a premium to a value brand, for example, in response to cost of living rising. We pick that up in the scanner data and also discounting, we can better reflect that particularly store cards, because we now understand for a particular product, total spending on that product and the quantity of that product sold, which allows us to get an average price for that product. So we better capture store discount cards, which are available in many of the supermarkets.
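The average-price calculation Mike describes here, total spending on a product divided by the quantity sold, can be sketched in a few lines of Python. This is only an illustration: the function name `unit_value` and the figures (a 2.00 shelf price, 40 of 100 units sold at a 1.50 store-card price) are invented for the example and are not ONS data or ONS code.

```python
def unit_value(total_sales_value: float, quantity_sold: float) -> float:
    """Average price actually paid per unit: total spend divided by units sold."""
    if quantity_sold <= 0:
        raise ValueError("quantity_sold must be positive")
    return total_sales_value / quantity_sold


# Hypothetical month for one product in one store (made-up numbers):
# shelf price is 2.00, but 40 of the 100 units were bought with a
# store-card discount at 1.50.
shelf_price = 2.00
total_sales = 60 * 2.00 + 40 * 1.50   # total value of sales at the till
units = 60 + 40                        # number of units sold

average_price_paid = unit_value(total_sales, units)
print(f"shelf price: {shelf_price:.2f}, average price paid: {average_price_paid:.2f}")
```

In this made-up case the price at the till averages below the price on the shelf, which is the kind of discounting a shelf-price sample would not pick up.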
Miles:
And so by getting a sense of the changing availability of products and what people are actually spending on them, it becomes much more useful then as a cost of living index as well as simply a measure of price change, yes?
Mike:
So the way we currently produce our inflation statistics is that we have a large virtual shopping basket of goods and services. There are 760 items in that basket. We set the weights at the start of the year, and we set the basket, and then we track the prices of those 760 representative items throughout the year. What the scanner data allows us to do, at a very detailed level, for certain of what we describe as consumption segments (a consumption segment would be rice, for example), is to reflect change in consumer spending patterns within that consumption segment. So for example, if somebody changes the type of rice that they're buying, they decide to buy microwave rice instead of basmati rice, or they decide to switch from a premium rice product to a value rice product, then we'd capture that in the scanner data on a monthly basis, whereas in the previous approach, we'd just monitor the price of a small number of products. So maybe we would monitor the price of microwave rice and basmati rice just over the year. But now with the scanner data, we have the price of all rice sold within a store, and we can reflect people's changing consumer spending patterns when purchasing a particular consumption segment, which I've described here as rice.
Miles:
And that in turn, I guess, can also influence the way you weight the index as well in future, because that's a fundamental part of calculating inflation that perhaps a lot of people don't fully appreciate.
Mike:
Yeah, so it'll still be a fixed basket, but the lowest level of aggregation, so that the most detailed data that we have - that data that's coming in from retailers, where we have the kind of total sales plus the quantity, which allows us to derive a price or a unit cost within that calculation - we would reflect kind of change in weights at that very detailed level. But it will still remain a fixed basket that most of our stakeholders are familiar with, because at a higher level in the aggregation, we constrain the weights at the start of the year.
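A minimal sketch of the two-level idea Mike outlines: spending-based weights that move with purchases inside a consumption segment, and weights fixed at the start of the year higher up. The conversation does not say which index formula the ONS uses, so the Törnqvist formula below is swapped in as one standard formula that uses expenditure shares from both periods. The product names, prices, quantities, the bread index of 1.02 and the 0.4/0.6 segment weights are all invented for illustration.

```python
import math


def tornqvist(p0, p1, q0, q1):
    """Törnqvist price index between period 0 and period 1.

    Each argument maps product name -> price (p) or quantity sold (q).
    Only products present in both periods are assumed here.
    """
    spend0 = {k: p0[k] * q0[k] for k in p0}
    spend1 = {k: p1[k] * q1[k] for k in p0}
    total0, total1 = sum(spend0.values()), sum(spend1.values())
    log_index = sum(
        0.5 * (spend0[k] / total0 + spend1[k] / total1) * math.log(p1[k] / p0[k])
        for k in p0
    )
    return math.exp(log_index)


# Hypothetical "rice" consumption segment: basmati gets dearer and
# shoppers shift towards microwave rice, whose price is unchanged.
p0 = {"basmati": 2.00, "microwave": 1.00}
p1 = {"basmati": 2.20, "microwave": 1.00}
q0 = {"basmati": 50, "microwave": 100}
q1 = {"basmati": 40, "microwave": 120}
rice_index = tornqvist(p0, p1, q0, q1)

# Higher up, the weights are fixed at the start of the year (assumed values).
fixed_weights = {"rice": 0.4, "bread": 0.6}
segment_indices = {"rice": rice_index, "bread": 1.02}  # bread index is made up
aggregate = sum(fixed_weights[s] * segment_indices[s] for s in fixed_weights)
print(f"rice segment index: {rice_index:.4f}, aggregate: {aggregate:.4f}")
```

Within the segment, the weight on each product follows what people actually spent in the two months; the 0.4 and 0.6 at the top stay put all year, which is the sense in which it remains a fixed basket.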
Miles:
What was involved in getting these changes so far? It sounds like a huge project, and presumably, first you had to get the retailers on board.
Mike:
Yes, that in itself was a huge undertaking. We've been engaging with the grocery sector for a number of years. We have the Digital Economy Act in the UK, which is a legal gateway for us to access the data. But instead of using the legislation, we wanted to work collaboratively with the retailers. So we started by engaging with them and requesting the data. Naturally, they had a range of questions about what we needed the data for, how we were going to publish it, how it was going to be stored. We needed to ensure that we were kind of meeting all of their requirements in terms of them transferring across the data, and they needed to be confident that we are using it for the public good, and there wouldn't be data leaks from their perspective, so that the data are tightly controlled.
So that was a process in itself, which took a number of years. Then the existing systems that we have at ONS had to be moved to the cloud because they just simply weren't capable of processing the millions and millions of data points that we have for scanner data. And the scanner data is very different in nature to sample data, so we had to develop a wide range of methods to use the data. So we have two advisory panels. Jonathan's actually Chair of our stakeholder panel, but we also have a technical panel as well. So over the number of years that we've been developing this project, they've been advising us on the methods that we should be using. We've also been engaging with international experts and other national statistics institutes as well. So we had to, you know, gain access to the data. And that's a fairly new area for ONS, commercial kind of data partnerships. You know, we had to move all of the existing IT infrastructure to the cloud, because you need to aggregate the scanner data with the kind of locally collected data, so the data collected in stores.
And we also had to develop a wide range of methods as well. The challenge here as well is that our risk appetite is close to zero with consumer price statistics. They are used to inform pensions, benefits, taxes, student loans. So we need to get the numbers right. So our risk appetite has been quite low, and that's quite a difficult tension to manage when you're kind of trailblazing in a number of different areas, whether that's data acquisition systems and methods, but at the same time, you need to ensure that you get the numbers right. So we've ensured that we've taken some time to make sure that, you know, we're comfortable with the methods that are in place and the processes to be able to produce our inflation statistics on a monthly basis.
Miles:
And one of the reasons, I guess, it's taken a few years to get all this in train and deliver the results is you had to check whether or not your new estimates of inflation were going to be radically different from the ones that have been published already, because that would itself have had some pretty profound consequences, wouldn't it? How did you provide that assurance?
Mike:
We make changes to our consumer price statistics every March. That's when the basket is updated. That's when the weights are updated. So going back to the beginning of last year, we felt in a reasonably good position to implement scanner data at that point in time, we were nearly ready, but working with the stakeholder advisory panel and broader group of stakeholders. So over the last year, we've been parallel running the data in the background. So every month, we obviously publish the numbers, and then alongside that, in the background, we've been producing price indices and other measures that include grocery scanner data, and cross-checking them against the published estimates.
Miles:
How closely do they align now?
Mike:
Very well. So the impact at a headline level, for the duration of the impact analysis, which is from 2019, up to pretty much the current period, is kind of negative 0. percentage points for CPI. So that is a small impact at headline. We should note, though, that in 39 of the 66 months, the headline rate would have been different, albeit slightly different, in most of those months. A number of takeaways from this. I think we can have confidence, as Jonathan said, in the kind of sample approach that we currently take at a headline level, you know that's robust. We have a good sample design. What the scanner data allows us to do is get deeper insights into what's driving inflation. So over that period, we tended to find that inflation was slightly higher at the beginning of that period, and then lower from 2022 onwards. And that's why the average over the period is small, because there's an element of offsetting. But I'll give you one example of where we've been able to provide deeper insights. So from 2022 onwards, where inflation, utilizing the scanner data, was slightly lower, in some of the categories, actually, inflation was higher, such as bread and cereals and oils and fats. And that can be attributed within those categories to breakfast cereals and margarine, which are products affected by the Russia-Ukraine war. So we're seeing the impact at a more granular level using the scanner data and being able to better capture changes in price.
Miles:
Did it have a slightly predictive effect, then? Could you spot early examples of inflationary pressures coming through better than has been possible before?
Mike:
Well, I wouldn't say early. You know, our role is to produce inflation for periods in the past, forecasting inflation is more of a job for the Bank of England, but at a more detailed level, yes, we could definitely see insights that we were not able to see with the sample data that we were using prior to the implementation of scanner data.
Miles:
And Jonathan, from the economist point of view, looking at the new scanner data driven inflation estimates and the path of price changes that that reveals, does it materially change the macroeconomic story?
Jonathan:
I'm not sure it does actually, Miles, which may come as a big disappointment to listeners to this podcast who are thinking: well, why have Mike and his team put in all this effort? But in a sense, it's actually a very good result because it goes back to what I was trying to say earlier on, which is it suggests that the sampling frame that the ONS were using to sample a subset of all these prices was actually a pretty well chosen frame. So on the sort of headline kind of effect, it doesn't change things much. Where it does change things is in the detail, as Mike has just been saying, and especially since this is groceries data around food inflation and food prices.
And the reason, Miles, I find that important, and I think the community of economists will find that important, is I had the privilege of being on the Bank of England's Monetary Policy Committee up until a year and a half ago, and especially Mike mentioned it during the war in Ukraine, we were very attentive on the committee to changes in food prices, because changes in food prices turn out, the evidence suggests, to be extremely salient to consumers when they're thinking about, you know, how their cost of living is really affected. So since there's going to be much more colour on how it is those food prices have changed, I think that's going to help policy makers, for example, at the Bank of England, get under the hood a little bit of the types of price changes which are very salient to consumers.
Miles:
So would it be fair to say then, looking back as a member of the Monetary Policy Committee, thinking about potential changes in interest rates, it might not necessarily have led you to make a different decision, but it would have made you better informed or more confident in that decision.
Jonathan:
I think that's right. I don't think we would have changed our decision. And in any case, one never makes policy decisions in hindsight, you know. What we know now about the Covid vaccine, we didn't know then, and so of course, we would have made a different decision, but we didn't know that. Now, I don't think it would have changed the decision, but as I say, I think since especially these food prices are so salient to consumers, it's going to allow the current Committee, which you know, again, to be clear, I'm not on, so this is just me speculating. It's going to allow the current Committee to have a much better view as to what these various price changes are and what it is consumers are doing. And that's going to turn out, I think, to be extremely, or potentially extremely, important. Because if the current rise in energy prices is maintained for a long time, that might well feed through to food prices in various ways, and then we're going to need all the detail that Mike and his team are providing in order to make better policy.
Miles:
Lots of information about price changes here, but also we're getting an insight into sales volumes as well. Are we not, Mike, even though we're not at this stage actually using these data to compile the retail sales indices?
Mike:
So we're not at this stage, the focus is consumer price statistics and using the scanner data for 50% of the market in March. And then there is a plan to expand that market coverage moving forwards and onboarding more retailers. There's potential to use these data sources in other parts of the ONS, you know, for example, in national accounts, for household expenditure and for retail sales. And we'll work collaboratively with the retailers if we want to use their data for other statistics moving forwards. But there are certainly benefits to using these data sources beyond prices.
Miles:
You mentioned earlier the work that had to go on with retailers to gain their confidence in all of this, and presumably there needs to be a clear message to shoppers that it's not about spying on people's shopping habits?
Mike:
No, it's a really good point, actually. So we do not have access to what individuals are purchasing. We just get aggregated data. So for a product that's sold within a store, we know the total value of sales, and we know the number of that product that have been sold. We do not know what individuals are purchasing in store, so we don't have access to the loyalty card data which would give us that information.
Miles:
And that brings us to a very important point that's always worth stating about official statistics generally: the ONS will never publish anything that discloses the identity of any individual or indeed any retail outlet, so we can't even say which specific retailers are taking part, although you do have good coverage of the sector.
Mike:
Yes, so at this stage, we can just say we have coverage of 50% of the market. Co-op are the only retailer that are happy to be named, and we've previously done a press release with them. The other retailers that are included within that 50% have explicitly asked not to be named as a data provider, which we will obviously respect.
Miles:
And as you say, it's all put into a big aggregated pot of data anyway, although it does provide some local and regional insights as well, of course, that perhaps weren't there in such quantity before.
Mike:
It does. So the scanner data that we receive for the retailers that are providing data, we have that broken down by store. So what we do by region is aggregate each of the retailers' data together with our local collection data. So I should highlight that that's not disclosive. So it's not possible to identify any retailer within those statistics, but we will be publishing some micro data by region for our stakeholders. I should also note, as well, Miles, we've talked a lot about the impact at the headline level being quite small, but there are other benefits to this project. So one is that it de-risks the ongoing production of consumer price statistics. So some of the systems that we previously had in place had been in place since the 90s and they needed to be moved to a modern alternative, so as part of that work we've done that, and also it sets a really good foundation for the future. So now we have the IT infrastructure and the methods in place to be able to use alternative data sources for other parts of the basket. So it gives us a very good foundation to transform our consumer price statistics moving forwards.
Miles:
Jonathan, coming back to you and the economist's point of view, what is the potential? We talked already, obviously, about the corroborative value of producing inflation statistics with much more certainty than before, but what do you see as the broader economic value and insights that we're getting from this now?
Jonathan:
I think there are two. One is, as I was mentioning before, a more sort of forensic vision about what it is that consumers are doing. The second, though, is a little bit more indirect, but let me put it on the table anyway, Miles, which is, as Mike has been saying, in order to implement this, the statistics agency, ONS, or whichever statistics agency in the world is going to do this, is going to need to invest in IT and new processes and new equipment and so forth. And that, of course, spreading that good practice over all areas of this, and I'm not just talking about the ONS, I'm talking about any statistics agency anywhere in the world, will be very, very helpful, because, of course, that would improve all of the data collection processes in the statistics agency, and for economists, that would be an enormous boon.
Miles:
Mike, can we turn to some of the other benefits and some of the other aspects of the general improvement of inflation statistics that coincide with the introduction of scanner data this month. Intrigued to hear about something called the Taylor Swift effect. Can you unpack what that is and why it's relevant to all this?
Mike:
So the famous Taylor Swift effect. This is in relation to hotel prices. So we previously collected hotel prices on a particular date in the month, and when there was a Taylor Swift concert close to one of the cities that we were collecting hotel prices in, that had an impact on hotel prices on that particular day, so that then, in turn, had an impact on the hotels index, which then kind of fed into the headline measure. So what we've done in response to that is to collect prices on two days during the month, so that those kind of one-off events, such as a Taylor Swift concert or, you know, a sporting event, do not have such a disproportionate impact on our headline measures of inflation. We still want to capture that price increase, but by collecting on two dates over the month rather than one, we're softening its impact. And I think that's only right because it's not necessarily representative of hotel prices across the UK.
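The arithmetic behind softening a one-off spike by collecting on two dates can be sketched as follows. Everything here is assumed for illustration: the room prices (100.0 on an ordinary night, 180.0 on a concert night) are invented, and combining the two collection dates with a simple arithmetic mean is a guess at the mechanics, not a description of how the ONS actually combines them.

```python
from statistics import mean


def monthly_hotel_price(collected_prices):
    """Price used for the month: the mean of the prices collected that month."""
    return mean(collected_prices)


# Hypothetical room prices in one city (made-up numbers).
typical_night = 100.0
concert_night = 180.0

# One collection date that happens to land on the concert night.
one_date = monthly_hotel_price([concert_night])

# Two collection dates: the concert night plus an ordinary night.
two_dates = monthly_hotel_price([concert_night, typical_night])

print(f"one date: {one_date:.2f}, two dates: {two_dates:.2f}")
```

The concert-night price still feeds into the monthly figure, but with two dates its pull on that figure is halved in this example.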
Miles:
So you can't blame Taylor Swift for higher inflation, is the message, as it was simply a quirk of how prices were collected by taking those single points, which sometimes happened to fall on days when there were big events going on.
Mike:
Yes, that's because we were collecting the hotel price on, you know, one particular day during the month. We've already announced that we'll be changing that for the forthcoming year to two dates during the month. But you know, something we may consider over the medium to longer term is whether we use administrative data for hotels, and that would certainly soften the impact of one off events in a particular city, because you'd be collecting data over the entire month. We already collect data across a wide range of locations, but having many more data points would soften the impact of those kind of one off events.
Jonathan:
You're right to ask, Miles, about the Taylor Swift effect on inflation. And I had the privilege of being on the Monetary Policy Committee at the time, and I remember we discussed this. There's bad news and good news. The bad news is that I was only vaguely aware of Taylor Swift's music. Thanks to my children, I know something about it, but I was no great expert. The good news was that, because the ONS were very open about the collection protocols, which Mike has just talked about, we were actually able, as a committee, to sort of reverse engineer what inflation would have been had Taylor Swift not been there. And that enabled us to essentially look through what was just a volatile bit of the index. So I'm pleased to say that this was an example where, you know, communication between bureaucratic agencies, which can maybe be improved, and often it's maybe not as good as it might be, was a case where, actually, it worked quite well. And I think it's for others to judge, but I think the Bank did not make a bad policy mistake.
Miles:
Yeah, I guess it's a limitation, isn't it, of calculating inflation: you've got to pick a day on which to take the sample prices. If you happen to pick the day when Taylor's in town, it's going to have a distorting effect.
Jonathan:
Oh, and sporting, as Mike was saying, sporting effects, the World Cup and all that kind of thing. But as I say, I think this is just an example of where, in fact, the lines of communication between the Bank and the ONS actually work very well. And as I say, we were very aware on the committee I was on at the time. We're very aware of what the biases were, and I think that's sort of quite a nice, sort of mini lesson for how the bureaucracy worked. Even if our musical taste didn't work quite so well, at least the bureaucracy did function on this occasion.
Miles:
It wasn't you there pushing up the hotel prices in Cardiff then! I think we can be fairly, fairly confident of that. But we have another example, though, don't we? A more regular example recently, and that's the collection of airfares. It's a similar thing, isn't it, over holiday periods gone by?
Jonathan:
Well, again, that's exactly right, but again, I sort of hate to be, you know, boring the listeners with a tale of bureaucratic interrelations. But as I say, I think this is an example where the communication between the policy making authorities and the stats agency worked well on this occasion. We're aware of the ONS protocols. They were open with us, but you know that was not to be then pushed on further about how these things are collected. And if you're aware of that, you can then go to the Monetary Policy Committee, or whatever it might be, and make them aware about what all these biases might be. And so what I think is an important effect in the headline has less of an effect on the chances, as I say, in this case of the Bank, of making a policy mistake.
Miles:
You can make sure these things are priced in, as they say, but there's a point about public confidence as well, though, isn't there? People are rightfully sceptical, and a lot of people claim there's nothing as misleading as an average, particularly when it comes to the collection of prices.
Jonathan:
Well, it's both the average and the volatility, which I think is difficult. And so when ONS staff are on the radio explaining what the inflation numbers are, they often get rather held up in some ways of explaining particularly volatile components, such as hotels and such as airfares, and I don't know what that does for the confidence of people in the overall index. I mean, in some sense, that means that the ONS are doing their job about collecting what the index is and sticking to international protocol on all of this, which in some sense is a confidence booster. But I can quite understand that people listening to a description about how volatile these things are, they might just say, Well, you know, I'm really not sure about what this index is telling me. So I think it's a communications problem ultimately as well.
Miles:
Well, certainly. Well, we work hard at the ONS to mitigate, as you say, but I guess the fundamental point for people to understand is that the more data that's going in, the more reliable your estimates, at least more comprehensive your estimates, are going to be coming out.
Jonathan:
But also the protocols that Mike's been talking about, if I may say, about not concentrating data collection on a particular day which might coincide with a Taylor Swift concert. Or in the case of airfares, it might be half term on that particular day, and the airfares are particularly high and so forth. So flexing those methodological issues, I think, is going to help smooth out some of this volatility.
Miles:
Okay, in terms of the international picture, how advanced would you say the UK is now compared to other, you know, similar economies and the way it calculates inflation?
Mike:
So there are some countries that adopted scanner data, you know, a number of years ago, such as the Netherlands: early adopters. I think they've had scanner data in their consumer price statistics for at least 10 years, perhaps more. Then there are a range of other countries that are in a similar position to us who have recently stood up projects to utilize scanner data. So, you know, during our journey, we've relied heavily on international best practice and working with other NSIs to learn from them and their experiences of utilizing scanner data. And now we're in a position where we're about to implement grocery scanner data. You know, we've ensured that we are also sharing best practice with other NSIs as well, as they start to embark on this journey.
Miles:
So not quite the first but among the sort of first wave, then?
Mike:
Among the chasing pack, I'd say, yes.
Miles:
Jonathan, could I finish off with you then with a question all about the bigger distant future. When you take a big step forward like this and introduce a huge increase in the amount of data, it presents a sort of tantalizing vision of the future, Jonathan, does it not? Where we're able to measure the economy almost in real time, and the insights that that might be able to produce. Is that pie in the sky, or do you think we will get there eventually, that you could almost have a daily estimate of GDP, if that was worthwhile, or, perhaps more usefully, real time estimates of household incomes, for example, to see how people are getting on?
Jonathan:
I think, Miles, it's a fascinating conjecture, and my immediate reaction is, we don't want to overstate this again, for the reason I've been saying: it's not necessarily the case that more data is better. We want to be sampling representatively, which the ONS seems to have been doing, even though it's only collecting 25,000 prices a month. Those 25,000 appear to be fairly representative. On the other hand, if there's a lot of dynamics to those prices, you know, discounts, changes in quality, you know, sort of digital changes to all these various prices, we want us as a statistics agency and as a measurement community to be picking all that stuff up as well. So that seems to me to be the vision about going on the digital side and collecting all of this: that we can get much finer grain information about these prices. But as I say, I don't want to overstate it. Mike won't take any of the credit, but I'm going to give him some credit. It is down to Mike and his team. As I say, that 25,000 prices turns out to be an amazingly representative sample of the approximately now 300 million prices which are being collected instead.
Miles:
Well, it's a lovely fitting note on which to wrap it up, and that is a fitting moment to leave this topic. Our sincere thanks to guests Mike Hardy and Professor Jonathan Haskel. Also to our producer Julia Short. It's time for me to say goodbye as well, as this is my last podcast before I stand down as head of media for the ONS after 13 fascinating years. But you can expect these podcasts to continue, as, of course, will the ONS itself. So don't forget to like and subscribe wherever you get your podcasts. Goodbye.
