Arbitron Questions Continued
PPM (Portable People Meter) Validity
Doc: In one of your answers, you mentioned that Arbitron is going to have an independent check of the validity of the PPM (Portable People Meter) after the reported problems with the PPM in New York (where some radio stations that ranked at the top in the diaries dropped significantly when the PPM was used). You have also mentioned other things about the PPM.
I don't have a research background, so I rely solely on you for information about research, and I have learned a tremendous amount from you. Would you please give me your opinions and thoughts about the PPM, the New York PPM problems, and the validity test? Thanks in advance. - Anonymous
Anon: You rely solely on me for research information? Uh-oh, now I'm nervous. However, I'm glad you have learned a lot from the column. Thanks, and on to your question . . .
This isn't going to be a short answer, so you may want to get a 6-pack of your favorite beverage while you're reading all this. I'm sorry if I go astray a bit, but your question opens up a variety of things. Here are some of my opinions, thoughts, and comments, for what they are worth.
Arbitron started using diaries to gather radio listening information in 1949, and there have been complaints about the company's procedures since that time. However, complaints about Arbitron primarily come from radio station owners, managers, and PDs (and others) whose radio stations don't perform well in the ratings. In more than 30 years of radio research, I can't recall ever hearing a complaint about Arbitron from someone whose radio station is doing well in the ratings.
The diary procedure used by Arbitron has been found to be valid over the years. When it comes to research, validity means that the measurement instrument actually measures what it is supposed to measure. In other words, after decades of analysis, although there is some error involved, the Arbitron diary procedure has been found to be a valid way to gather information about people's radio listening habits. (Note: All behavioral research involves error, so the fact that Arbitron diaries contain some degree of error is expected.)
Another important point in research is reliability, which refers to the consistency or stability of the results provided by a measurement instrument. A reliable measurement repeatedly provides similar results under a variety of conditions.
However, reliability has been a problem for Arbitron over the years because the company uses different samples for its ratings periods. In one book, a radio station may be in the Top 10 (for example) in the market, but drop out of the Top 10 in the following book. Why? Well, the radio station may have lost listeners, but the drop was more likely caused by the differences between the two samples used for the two books. That's why it's not correct to compare one Arbitron book to another unless the data have been converted to Z-Scores. A comparison of "raw" shares from one book to another is similar to comparing apples to oranges. It's not a legitimate comparison.
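To make the Z-Score idea concrete, here is a minimal Python sketch. The call letters and share numbers are invented; the point is only that a raw share can drop from one book to the next while the station's position relative to the rest of the market stays essentially the same.

```python
# Compare a station across two books with Z-Scores (how many standard
# deviations a share sits above or below the book's market average).
# All call letters and share values here are invented.

def z_scores(shares):
    """Convert one book's raw shares to Z-Scores."""
    values = list(shares.values())
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {station: (v - mean) / sd for station, v in shares.items()}

spring = {"WAAA": 9.0, "WBBB": 7.5, "WCCC": 6.0, "WDDD": 4.5, "WEEE": 3.0}
fall   = {"WAAA": 8.0, "WBBB": 7.0, "WCCC": 5.5, "WDDD": 4.0, "WEEE": 2.5}

spring_z = z_scores(spring)
fall_z = z_scores(fall)

# WAAA's raw share fell from 9.0 to 8.0, but relative to the rest of
# the market its standing barely moved (about 1.41 vs. 1.31 standard
# deviations above the mean). That is the comparison Z-Scores allow.
print(f"WAAA Spring Z: {spring_z['WAAA']:.2f}")
print(f"WAAA Fall Z:   {fall_z['WAAA']:.2f}")
```

In the invented numbers above, WAAA "lost" a full share point, yet its Z-Score is nearly unchanged, which is exactly why raw shares from two different samples shouldn't be compared directly.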
What do we have so far? Arbitron diaries are valid, but not necessarily reliable. Hmm. One way to correct that problem is to use the same respondents for a long time, similar to what Nielsen does for its metered ratings where respondents are in the sample for five years. That's the main reason why Nielsen TV data are so consistent.
Another research topic important in the discussion of radio listener data is Method-Specific Results. What this means, in brief, is that the results of a specific behavioral research project may be based on the research method used—the results are specific, or unique, to the methodology used to gather and analyze the data.
Here's an example I just thought of that doesn't relate to radio research. Let's say that we are interested in measuring the height of a group of randomly selected people, and the way we decide to measure height is to sit at a local 7-Eleven store and gauge customers' height as they go through the front door, where there is a measurement scale taped to the door frame. (The tape is there to help the workers identify the height of someone who robs the store.)
We collect hundreds of measurements and conclude that the average height of people visiting the 7-Eleven is 5 feet, 7 inches. Is that accurate? Is that a "real" number? Is the number valid and reliable? The answer is "probably not" for all of those questions, but the number is "real" from the perspective of how it was calculated. The number is specific to the measurement approach of watching people pass by the height tape on the door frame.
With Method-Specific Results in mind, now it's time to address Arbitron's methodologies—the diary and the PPM. From what I have discussed already, we should be able to conclude at least one thing: Diaries and the PPM are different research methods and, therefore, should be expected to provide different results. With everything researchers know about the differences in research methods, we can't expect the results from the two procedures to be the same. If they are, that's great, but if they are different, we can't be surprised.
OK, so no problem if the results are the same. However, that's not what Arbitron is finding; the company has reported different results for the PPM. Hmm. Which method is correct? Has the diary always been wrong? Is the diary really an invalid procedure to measure radio listening? Or, is the diary the correct approach and the PPM approach is wrong? Or, are they both correct and the PPM needs a few adjustments? I don't know, and that's why the PPM needs a validity check. If the results are dramatically different, as reported in New York and other markets, then it's necessary to determine what's going on.
It's interesting that there are so many complaints about the New York PPM results. People are saying things like, "How can we drop from #1 to #10?" (Or whatever.) The first thing I thought of was—I wonder if the people complaining about the PPM results have also complained about Arbitron's diary results? If so, in their complaints, they are implying that the diary results are/were correct; the PPM results are incorrect. That's rather interesting.
The validity check of the PPM approach is a good idea. Arbitron needs to find out if the PPM actually (correctly) collects information about radio listening. If the validity and reliability checks are positive, then I think the PPM is a good idea because it takes the respondent out of the process of recording which radio stations they listen to and for how long.
Another thought . . . As with all behavioral research, there are errors in Arbitron ratings. Arbitron knows this and provides sampling error figures to help users interpret the audience estimates. The problem is that no one ever interprets Arbitron numbers with sampling error in mind; the numbers in the "book" are treated as real. That's wrong. Every number in an Arbitron report includes sampling error (along with measurement error and random error), but no one uses the data the way they are intended—the data are estimates that must be interpreted with sampling error.
Ad agencies and radio station people are to blame for the problem of misusing Arbitron data. For example, an ad agency will dictate that it wants to buy the "Top 3" or "Top 5" radio stations for its advertising purchase. The problem is that the Top 3 radio stations shown in a report may not actually be the Top 3 radio stations. If used correctly, an advertiser who wants to buy the Top 3 radio stations should buy at least the Top 5 and maybe the Top 6. Why? Because considering sampling error, the number 4, 5, or 6 radio stations may actually be #1, #2, and/or #3. Buying only the Top 3 radio stations doesn't make sense. It's an incorrect use of Arbitron's information, and it's a blatant misuse of the data.
Radio station people never look at Arbitron data with sampling error considered. The numbers are real. A radio station listed at the top of the list is considered the #1 radio station, but it may not be the #1 radio station. It actually may be farther down the list. A radio station listed as #3 may actually be #1. But no one cares. The numbers are what they are and that is how they are interpreted. Too bad.
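Here is a rough Python sketch of the sampling-error point, using the textbook confidence interval for a proportion. The call letters, shares, and the 1,000-diary sample size are all hypothetical, and Arbitron computes its published error figures with its own procedures (weighting, design effects, and so on), so treat this only as an illustration of why nearby ranks overlap.

```python
import math

def share_interval(share_pct, n, z=1.96):
    """Approximate 95% confidence interval for a share estimate.

    share_pct: reported share in percent; n: in-tab sample size.
    Uses the simple textbook formula for a proportion, not
    Arbitron's actual error calculation.
    """
    p = share_pct / 100.0
    se = math.sqrt(p * (1 - p) / n)
    return 100 * (p - z * se), 100 * (p + z * se)

# Hypothetical shares from a book with 1,000 in-tab diaries.
book = {"WAAA": 6.8, "WBBB": 6.3, "WCCC": 5.9, "WDDD": 5.6, "WEEE": 5.1}

for station, share in book.items():
    low, high = share_interval(share, 1000)
    print(f"{station}: {share:.1f} share, 95% CI roughly {low:.1f}-{high:.1f}")

# The intervals overlap heavily: the printed #4 station's interval
# reaches above the printed #1 station's share, so "buy the Top 3"
# ignores the error built into the estimates.
```

With these made-up numbers, the "#4" station's interval (roughly 4.2 to 7.0) contains the "#1" station's printed 6.8 share, which is precisely why a printed rank order is not the real rank order.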
Many radio people perceive Arbitron as an enemy. I don't work for Arbitron, nor do I receive anything from them, but I know many people there and I can say without hesitation that their goal is to provide valid and reliable information. The PPM was initiated because the company wanted to try to find a better approach (if one exists). PPM may or may not be the answer, but it's important for the users of the information to allow Arbitron to determine the validity and reliability of the PPM approach and expect that there will be a few glitches in the system during its startup phase.
Does that answer your questions?
P1s, P2s, P3s
Doctor: Could you please give me an explanation of what P1s, P2s, and P3s mean when talking about a particular station's audience? Thanks. - The Great One
TGO: The “P” is an Arbitron term and stands for “Preference.” For example, P1 indicates the radio station a person listens to most often (First Preference), P2 indicates the radio station a person listens to second most often (Second Preference), and so on.
Theoretically, you could have any number following the P, such as P10, but in reality, most people usually listen to three or four radio stations, and P1s and P2s are the most important.
Keep in mind that the “P” designation doesn’t tell you anything about how much time is involved. A person could be designated a P1 if he/she listens to a radio station for only 10 minutes during a given week. If that’s the station the person listened to most often, then it is the person’s P1 radio station.
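The "P" logic is simply a rank ordering by listening time, which a tiny Python sketch can show (the call letters and minutes below are invented):

```python
# Rank one listener's stations into P1, P2, P3 by total weekly
# listening time. The call letters and minutes are invented.

def preference_ranks(minutes_by_station):
    """Return {"P1": station, "P2": station, ...} ordered by minutes."""
    ranked = sorted(minutes_by_station.items(),
                    key=lambda item: item[1], reverse=True)
    return {f"P{i}": station for i, (station, _) in enumerate(ranked, 1)}

# Only 20 minutes of listening all week, yet WAAA is still this
# person's P1 because it received more time than any other station.
listener = {"WAAA": 10, "WBBB": 7, "WCCC": 3}
print(preference_ranks(listener))
# → {'P1': 'WAAA', 'P2': 'WBBB', 'P3': 'WCCC'}
```

Note that the ranking says nothing about the absolute amount of listening, only the order, which is exactly the caveat above.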
P1 History: To P or Not to P—That is the Question
Doc: It seems that sometime in the past decade, I woke up to find that stations were classifying their audiences as P1s, P2s etc. All of a sudden everyone was P-ing. When did this start? As ever. - Jerry Gordon
Jerry: I couldn't remember when the "P" designation began or who first used it, so I sent your note to Pierre Bouvard, President, Sales and Marketing at Arbitron. I have known Pierre for a few decades and he always helps me out. I'd like to thank him once again for the information. He said:
In the late 80s, a gentleman named Gary Donahue created a service called "Fingerprint." It was a very intense analysis of Arbitron's mechanical raw data that used "preference level" analysis that packaged goods folks have used for years. First preference referred to consumers' most used toothpaste, etc. This was followed by Second Preference, and so on. Gary marketed the report, which was very detailed, and it showed P1 diaries by Zip, age, and more.
He began selling this to stations and it was a huge hit. People loved the P1 concept and were amazed at the 30/70 ratio—30% of a radio station's cume are P1s and they drive 70% of the AQH. People quickly adopted Gary's P1 and P2 lingo. Soon after, he entered into a sales and marketing arrangement with Arbitron, where we sold his reports and he provided the follow-up service. Sometime after that, we bought the whole concept from him and began incorporating the Preference Level analysis into software applications like our Programmer's Package and PD Programmer's Advantage Service.
Today the term "P1" is nearly universally used, and it is Gary Donahue who created the concept and brought P1 to radio.
There ya go, Jerry. Now you know.
Hello Doc! I remember hearing that Arbitron defines a successful radio station as one that has 36% of its diaries coming from P1s. In your esteemed and experienced opinion, is that true, or is it crap? Since a listener that records just three quarter-hours of listening could be considered a P1 to my radio station (2 to my station and 1 to my competitor), I'm suspicious of the 36% figure. Your take, as always, is appreciated! - Anonymous
Anon: Here is my "take" . . .
Although many researchers, consultants, and others have discussed this P1 percentage, there is some misunderstanding of the information. While many of these people say that a successful radio station has 36% P1s who account for 70% of the radio station's listening, the Arbitron report says nothing like that.
If you go to the source of this information (click here) and go to Page 6, you'll see that the 36% number does NOT relate to successful radio stations. The number is merely an average of all radio stations in the 2003 Arbitron report.
So . . . If someone tells you that your goal should be to have 36% P1s because that's what successful radio stations have, the person needs to read the report. If you achieve 36% P1s, you have matched the average of radio stations in the 2003 report. That's it. No Arbitron "crap" as you suggest.
Predicting Arbitron Numbers
A person I know says that he can predict our radio station’s Arbitron numbers. He charges $5,000 for the information. Is there any validity to this? - Anonymous
Anon: You’ll notice that I substantially changed your question by eliminating names and radio station call letters.
Well, let me think about this. OK. Done. I’ll do it for free. The person you know is yanking your leg.
Programming Strategy from Ratings Data
Hi Doc: Could you throw some light (neon, please) on how best a programmer for a commercial FM music station (as opposed to pure talk) can use ratings data most productively?
I feel envious when the sales team uses the data to the fullest. I feel I can do more justice to all those numbers, but I am a weak cruncher! - Anonymous
Anon: I don't know any programmers who use Arbitron numbers to program their radio station. Why? Because Arbitron information provides estimates of radio listening: who, how many, and when; Arbitron doesn't tell you anything about why people listen to specific radio stations. The differences are important in answering your question. Allow me to attempt to explain.
As a PD, you are responsible for all content on the radio station. It is your responsibility to find out what the listeners want so you can give it to them.
The sales team is responsible for selling time on a radio station to local and/or national advertisers. The clients who advertise on a radio station are really only interested in how many listeners they reach with their ads. They really don't care too much about the specific content on a radio station. (I'm sure you know this.)
So, what does this mean? It means that Arbitron ratings information is actually most useful for sales people, not programming people. Oh sure, Arbitron provides an indication of which radio formats are successful and which ones aren't and which dayparts are successful and which ones aren't, but that's about all a PD will get from Arbitron data. Arbitron doesn't give you the why behind the numbers, and that is what you need.
Let me repeat . . . Because Arbitron provides only who, how many, and when information, and that is what radio sales people need, Arbitron is most useful for the sales department and that's why they develop so many "neat" looking sales sheets and sales presentations. However, if you look closely at all that stuff, you'll see that it's only who, how many, and when data — it has to be because that is what sales people sell to clients.
Programmers like you, on the other hand, can't do a whole lot with Arbitron. Sure, you can say that your radio station (or daypart) is #1, #3, or #12 in the market, or within a specific age/sex cell, but that's about it. However, regardless of your rank, Arbitron doesn't tell you why your radio station (or daypart) is in that position. The only way to know why your radio station or daypart is or is not successful is from your radio station's perceptual research.
Arbitron ratings information is a litmus test or a success "thermometer" for PDs. The data tell PDs if their programming is successful or not. The information doesn't tell PDs what to do to become successful. That comes from other places.
Don't feel as though you are deficient in your use of Arbitron data. Arbitron is primarily for sales and won't help you much with your programming decisions.
PUR (Persons Using Radio) and Arbitron's Thursday Start Day
Here are two “widely held beliefs” concerning Arbitron and listening levels:
1. Listening levels, or PUR (Persons Using Radio), are highest on Thursday and Friday.
2. PUR is highest on Thursday and Friday because of the methodology—the diary begins on Thursday, and therefore a respondent is most likely to fill it out on the first or second day of the diary week.
Is there any empirical evidence to support either of these widely held beliefs? Any Arbitron studies, your own research, or any scholarly research? Obviously, many programmers/consultants/managers make some big dollar decisions based on these widely held beliefs.
And here is your follow-up:
Hah! I stumped you! No, really, I have a feeling that absolutely no evidence exists that listening levels actually go up on Thursday, Friday, etc. It's a "chicken and egg" argument. The assumption becomes that if the Arbitron diary began on, say, Tuesday, listening "levels" would be higher then. But, I would wager that no precise and comprehensive study has been done (across numerous markets and over time to check validity and reliability) to actually prove this hypothesis. And, I would also wager that no broadcaster has really cared to put the question in a perceptual study, for lack of time and because it's really not actionable. So, I believe we will continue to drift in this orbit of assumptions. - Anonymous
Anon: Before I get to answering your questions, I have a few upfront comments:
1. I don’t know who submitted this question because the person did not sign his/her name, but the questions are excellent.
2. The original question was sent to me on April 18th and it has taken me one month to provide a response. Here’s why: Since I started this column in January 2000, I have sent Arbitron-related questions to Bob Michaels, who was the VP Radio Programming Services for Arbitron. He has always been very helpful and I want to thank him for all his help. However, on the same day that you submitted your first question, Bob sent an email to me saying that he was promoted to a new position with Arbitron’s Portable People Meter endeavor (Vice President, PPM Programming Services). Uh-oh, no Bob to answer questions.
But, no problem, since Gary Marince took over Bob’s position as VP Radio Programming Services, and I sent your question to him. The delay in getting to your question was due to Gary’s new activities and I didn’t want to press him for the information. However, he came through and I want to thank him for his help.
Gary sent your question to John Snyder, Arbitron VP National Group Services, who did an excellent analysis using Maximi$er to answer your question. First, John analyzed the Spring 2004 Arbitron information for the entire country—455,073 in-tab diaries. A good sample. He looked at Average Quarter Hour Persons (AQH) for the 12+ population.
However, once I saw that data, I asked John if he would run the same analysis for another book. He graciously agreed and ran the analysis for the Fall ’04 book. I wanted to see if Spring was unique or if the data would remain consistent in another ratings period. The Fall analysis includes a sample of 467,458 in-tab diaries. Once again, a good sample.
3. In my 5+ years writing this column, this is the longest answer I have ever written. In addition, it is probably one of the best since it addresses several “wives’ tales” about recorded listening in Arbitron diaries. Now, because the answer is so long, you may want to get a 6-pack of your favorite beverage. I gar-OWN-tee that you will be looking at the data for a long time.
4. Before you continue with this answer, I think it would be best for you to download the Excel table that contains the Arbitron data. This will make it a lot easier for you to follow along. To make it easy for you, I already highlighted the print area on the table, so just click on the link and hit “Print” when the table shows on your screen. I didn’t include printing the color graphs. You can do that on your own if you’d like to.
To get to the table, click here.
If you printed the data, you’ll see that the numbers are big. That’s because the data are from the entire country. However, please note that the numbers are truncated, which means that the last two zeros are eliminated (just like in Arbitron books). So, for example, you’ll see that the reported number of quarter hours for Monday (Spring 04) is 365,057. This is actually 36,505,700. That’s a lot of peeps.
The first thing you should notice is how consistent the data are in the two books. I was amazed to see this and it’s very strong evidence that the data are highly reliable.
OK, so let’s get to your questions and find out what the data say:
First, for the week Monday-Friday, 6a–Midnight, the average AQH for all five days is 38,348,900 AQH Persons. That is, during any given 15-minute period Monday-Friday, about 38,348,900 people 12+ are listening to the radio somewhere in the United States.
The highest AQH is Thursday with 42,829,100 AQH Persons; the second highest listening day is Friday with 39,770,200. However, Monday (36,505,700), Tuesday (36,366,900), and Wednesday (36,272,800) aren’t far behind Friday’s listening, so while Thursday does lead the five weekdays, the differences aren’t dramatic.
Therefore, the answer to your first question is “yes,” Thursday is the biggest listening day during the week Monday-Friday, and Friday is the second biggest reported listening day. However, the remaining days aren’t exactly “lame.” Wednesday is the lowest listening day, but it’s only about 15% behind Thursday’s listening—that’s not a huge amount.
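For anyone who wants to check the arithmetic, here is the Spring '04 weekday calculation in a short Python sketch, using the untruncated AQH Persons figures quoted above:

```python
# Spring 2004 AQH Persons (12+), Monday-Friday, 6a-Midnight, with the
# two truncated zeros restored as described above.
aqh = {
    "Monday":    36_505_700,
    "Tuesday":   36_366_900,
    "Wednesday": 36_272_800,
    "Thursday":  42_829_100,
    "Friday":    39_770_200,
}

average = sum(aqh.values()) / len(aqh)
# The raw average is 38,348,940; a published 38,348,900 reflects
# Arbitron-style truncation of the last two digits.
print(f"Average weekday AQH: {average:,.0f}")

# Gap between the biggest day (Thursday) and the smallest (Wednesday):
gap = (aqh["Thursday"] - aqh["Wednesday"]) / aqh["Thursday"]
print(f"Wednesday trails Thursday by {gap:.1%}")
```

Running this puts Wednesday a bit over 15% behind Thursday, so while Thursday clearly leads, the spread across the five weekdays is modest.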
Next, you may ask, “Which hour is the most listened to hour during the week, Monday-Friday?” John answered that too. The most listened to hour is 7:00a – 8:00a, although its lead isn’t substantial—close behind are 8:00a – 9:00a and 12:00N – 1:00p.
One more thing…John took a closer look at the biggest day (Thursday) and the biggest hour (7:00a – 8:00a) to find out which quarter hour in that hour is the biggest. The data show that 7:30a – 7:45a is the most listened to quarter-hour in the Monday-Friday week, although the other quarter hours during that hour aren’t far behind.
So what do we have here?
1. Thursday is the most listened to day of the week. (Friday is second)
2. 7:00a – 8:00a is the most listened to hour. (8:00a – 9:00a is second)
3. Thursday from 7:30a – 7:45a is the most listened to quarter-hour. (7:45a – 8:00a is second)
Now, before you get carried away here, I need to state a few things.
1. These data are from the entire country and include all listeners 12+. These listening characteristics may or may not reflect the listening in your market and for your target demo. However, if I were forced to speculate, I would say that the Arbitron data provide strong indications that the listening trends would be similar in most (not all) markets. But…if you want to be sure, you need to check your own market.
2. These data cannot definitively prove the significance of the Arbitron diary starting day (Thursday), because that wasn’t part of the analysis. (The only way to do this is a controlled study where the start date of the diary period is varied.) However, if we look at the data closely, the numbers seem to suggest that this is true. If you look at the AQH listening numbers Monday through Friday, you’ll see that Thursday has the highest listening, followed by slightly declining levels for each following day—Wednesday has the lowest reported listening. In other words, Thursday’s reported numbers are the highest because Thursday is the day when the Arbitron diary starts. Sounds good, eh?
That seems to make sense. Diary keepers are probably initially excited about being involved in the study and they religiously record their listening. It also seems like fatigue and/or lack of interest may enter into the equation since each day after Thursday shows a drop in recorded listening. It seems like we have finally verified the “widely held belief” (as you say) concerning Arbitron and listening levels. Or, have we?
Wait a minute! Uh-oh. We gots a problem here. A wrench has been thrown into the works. There is a fly in the ointment. There are ants in our pants. OK, that’s enough, but I do know there is a problem.
During my numerous emails with John Snyder, I mentioned the idea about Thursday being the highest recorded listening day since it’s the first day of the diary. He essentially lowered the hammer on that idea when he said that the Portable People Meter data, which come from a panel whose members start on different days of the week, show the same thing—Thursday is the highest listening day. What? Gag me with a beaker! Can’t be! Something’s wrong. It’s a radio “widely held belief.”
Sorry, but it’s not true, and that virtually destroys the “widely held belief” that Thursday is the biggest day since it’s the first of the diary period. The belief has been flushed down the toilet along with countless other “widely held beliefs” in radio.
3. So why is Thursday the highest listening day? What makes it so special? John Snyder offered three possible reasons: (1) The methodology of starting on Thursday may have some effect; (2) Many PDs apparently program for that day including such things as contests or special programming; and (3) Radio listening tends to be higher at the end of the week, especially among certain formats (as was seen with PPM in Philadelphia).
In other words, there may be some effect from the Thursday start date, but there are other, more significant variables influencing the listening levels.
In summary, here are the questions/statements you posed followed by my comments:
1. Listening levels, or PUR (Persons Using Radio), are highest on Thursday and Friday. The Fall 2004 and Spring 2004 data indicate that this is true. However, recorded listening for the remaining days is also high.
2. PUR is highest on Thursday and Friday because of the methodology—the diary begins on Thursday, and therefore a respondent is most likely to fill it out on the first or second day of the diary week. The start date may have some effect, but there are other, more significant variables that contribute to the phenomenon.
3. OK, but is there any empirical evidence to support either of these widely held beliefs? Any ARB studies, your own research, any scholarly research? Obviously, many programmers/consultants/managers make some big dollar decisions based on these widely held beliefs. The information is here, but I can’t imagine that programmers/consultants/managers would make “some big dollar decisions” based on such information. Programming to (or around, or whatever) the days of an Arbitron book makes no sense. Because we are so poor at predicting the behavior of radio listeners, a radio station’s programming should be special every day. Any person promoting (selling) the idea of “programming by the diary” needs to take a few months off and search for dinosaur bones.
Even if the Thursday start date does have some limited influence, it isn’t logical to program only for this day.
4. Hah! I stumped you! Yes you did. Temporarily. I have never been afraid to say, “I don’t know,” and I have never been afraid to pursue something until I find the answer.
5. I have a feeling that absolutely no evidence exists that listening levels actually go up on Thursday, Friday, etc. Well, you now know that your feeling is wrong since the data are now available.
6. But, I would wager that no precise and comprehensive study has been done (across numerous markets and over time to check validity and reliability) to actually prove this hypothesis. Sorry, you lost the wager.
7. I would also wager that no broadcaster has really cared to put the question in a perceptual study, for lack of time and because it's really not actionable. So, I believe we will continue to drift in this orbit of assumptions. Most broadcasters today don’t do any research (because they know all the answers), so including a question about daily listening is a moot point. However, I agree that the information probably isn’t “actionable” since there is no logic to programming a radio station according to the start of an Arbitron ratings period.
What’s my final answer, Regis? It is this…
While the Arbitron diary data show that Thursday is the highest recorded listening day, there is no proof that the higher listening is due to the start of the ratings period. In fact, the contrary is true since Arbitron’s PPM data (where respondents start participation at different times of the week) also show that Thursday is the highest listening day.
There must be something else going on, and the something else is probably three things: (1) PDs program for a Thursday start and, in some way or ways, make Thursdays more appealing to listeners; (2) Radio listening levels tend to be highest at the end of the week and Arbitron just coincidentally starts at the end of the week; and (3) There is probably some fatigue/boredom effect since the diary lasts for seven days, a point that PDs should use to their advantage…Keep the programming good all week long and there won’t be any reason to tune to another radio station or turn the radio off.
Is there a benefit to programming for a Thursday start date? I can’t think of one thing.
Thanks to John Snyder for all his help.
PUR - Response
Doc: The longest answer ever! I'm honored to now be in the "record books" of the Research Doctor. You went to an incredible amount of work, and I can't thank you enough. You answered so many questions, and more! Though I am not in the sales world, this is probably NOT an answer that we would want to show clients, because we have enough trouble with "end of week" inventory now. If they even smelled the possibility that listening levels actually rise end of week (for whatever reasons), we'd really be in a pickle. I wish you and your lovely granddaughter all the best. - Anonymous
Anon: You’re welcome for the response. My pleasure. I will alert my granddaughter about your wishes. Your point is well taken, but I have an addendum to my first response….
While some people may interpret the Arbitron listening information as negative, I think the data provide extremely positive information for PDs. Now, this information probably won’t be relevant when Arbitron’s Portable People Meter is in use, but it’s good for now.
The positive is that since we know that listening levels tend to decline as the week progresses, it would behoove (neat word) PDs to do something special on Mondays, Tuesdays, and especially Wednesdays. I know that I said that every day should be special, but there is an opportunity here for PDs to have an effect on listeners’ diary entries by “messing” with the final days of an Arbitron book.
Oh, I still don’t know who you are, but don’t make me come out there by asking another question that takes one month to answer.
Click Here for Additional Arbitron Questions
All Content © 2012 - Wimmer Research All Rights Reserved