Arbitron Questions Continued
Sue Arbitron?
One set of results per year, two drops in a year, total 49 diaries and we lose tens of thousands of dollars from agencies that insist on using Arbitron to place their buys. You have stated that the margin of error in such a survey is HUGE! Has anyone ever attempted to sue ARB to force a more accurate accounting of listenership? - No Name or Station Please
No Name: Get a six-pack of your favorite beverage because this isn’t a short answer.
First, you asked me not to use your name or radio station. You need to know that unless you include that information, there is no way for me to know who you are. The email address for you, and anyone else who submits a question to me, is stripped by the All Access server before it arrives on my computer. If you don’t include any identifying information, I don’t know who you are.
OK, on to your question.
Can you sue Arbitron? Sure. This is America and you can sue anyone you want for any legitimate or illegitimate reason. For example, my twin brother (a retired trial attorney) told me about a case that involved a woman who saw one of her neighbors frequently stealing mail from neighbors’ mailboxes. In addition to calling the police to report the thefts, she videotaped the neighbor in the act of stealing mail from the neighbors. The police set up a stakeout and caught the woman stealing mail and she was arrested. Did the thief go to jail? Nope, because she filed a lawsuit against the woman who videotaped her claiming that the woman “violated her personal privacy.” The court agreed and the thief was set free. Rock on.
OK, back to Arbitron. I’ll address each point you mention, but I must add here that I’m not affiliated with Arbitron in any way and have no vested interests in the company:
One set of results per year: I assume you mean “one book” per year, which means you’re in a small market.
…two drops in a year: I’m not clear here. If you only get one book each year, how do you drop two times?
...total 49 diaries: Is this for the market, a county, your radio station, or what?
… lose…dollars from agencies: Ad agencies do use Arbitron results to determine radio station buys. Your statement is correct and the process will continue until another ratings approach is provided and accepted by the agencies. Many companies have tried to develop a new ratings procedure, but none has been successful. However, I need to add that my experience has shown that the people who make radio buys at ad agencies have no research experience and they have no idea what “sampling error” means, or any other research term for that matter. Their job is to buy radio stations according to a guideline given to them (Gross Rating Points, only the Top 5 stations, and so on). The buyers (most or all of them) don’t know the sample sizes used in the surveys, and probably don’t care. They have a goal, which is to buy radio stations according to the guidelines given to them, and that’s what they do. Do the buyers question Arbitron’s validity or reliability? No, because they (most or all of them) don’t know what these terms mean.
...margin of error…: Yes, the margin of error, particularly sampling error, with Arbitron ratings can be huge. The amount of error depends on sample size and that’s why Arbitron includes several statements about the accuracy of the ratings in each book.
For example, on the first page of every Arbitron book, the Preface says (I have edited the full Preface. You can read the entire statement in your book):
“This report is designed to provide radio audience estimates representing radio listening during an average week in the market…”
“All audience estimates are approximations subject to statistical variations and other limitations. The reliability of audience estimates cannot be determined to any precise mathematical value or definition.”
“This report is intended to furnish radio station, advertiser and agency clients of Arbitron with an aid (my bold) in evaluating radio audience size and composition.”
Finally, in the back of every Arbitron book is a page titled, “Arbitron Radio and Reliability Tables” that allows users to get an indication about the accuracy of the results. However, I doubt that many people use this table; the results printed in the book are considered to be “real” or “facts,” not estimates.
… Has anyone ever attempted to sue ARB?: I did a quick Internet search and found a few mentions about lawsuits, so it looks as though there might be a few. However, I didn’t take the time to read the articles to find out if any of them relate to Arbitron’s methodology.
OK, now to my comments…
First, I understand your frustrations and I’m not dismissing them in any way. I have been involved in radio research for more than 25 years and have seen and heard thousands of comments about research (usually that the results are wrong). That’s OK, but the problem is that many of the people who criticize research don’t know anything (or very little) about research. In fact, much of what I hear is based on some form of urban legend…things passed on from one person to another.
For example, in the area of recruiting respondents for music tests or perceptual studies, some people demand that the sample is recruited by playing music montages for the respondents (who are accepted for the study if they rate the “correct” montage with a specific rating number, such as a “7” or higher on a 10-point scale). Why do these people insist on recruiting with music montages? Because it seems like the right thing to do, or their consultant suggested it, or some other equally obtuse reason.
Yet, the fact is that valid and reliable research in this area documents that recruiting with music montages is one of the worst methods anyone can use. Why? Because the hooks included in the music montage may be wrong. In studies I have conducted, I have seen as many as 35% of a radio station’s P1s disqualified from a study because they rated their favorite radio station’s montage too low. (A PD, or someone else, puts a montage together that he/she thinks represents his/her radio station, when, in fact, it doesn’t according to what the listeners think.)
I explained that to demonstrate the complexity of research. While many people in the radio industry claim to be researchers because they have research in their company’s name, or they have a website that says, “research,” the fact is that many of these “researchers” have no background in research—none. And I’m talking about some people who are CEOs (or whatever their title is) of some of the biggest research companies.
Off the soapbox…
As I mentioned, I’m not affiliated in any way with Arbitron and have no vested interests in the company. However, I do know many people at the company and I do know that all of them have an honest interest in providing the best research they can produce within the limitations that exist for them. Limitations? Like what?
The primary limitation is cost. You say that you have 49 diaries for your market (that’s what I assume from your comment). I’m sure Arbitron would be happy to have a larger sample size, but their procedures are limited by the amount of money available in your market. I’m sure you know that larger markets have many hundreds or even thousands of in-tab diaries. That’s because Arbitron has more money in those markets—there are more stations to pay the research bill. You have a small in-tab sample because of money, not because of some methodological oversight.
Now, I’m not an attorney, so don’t take this as a legal opinion, but…If you want to sue Arbitron for your radio station’s drop in ratings, or because of a small sample size, you would have to prove some type of negligence, malice aforethought, or malicious intent to harm your radio station. That’s going to be difficult, particularly in reference to all of the statements Arbitron includes about the reliability and validity of its ratings.
So what can you do? Yes, a sample size of 49 is small. If this is the Total sample for the book in your market, the sampling error is about ±14%. That is, if your 12+ share is 5.0, the actual share is between 4.3 and 5.7. However, that’s for the 12+ audience. The sample is much smaller if you consider only one demographic, such as Men 25-49, where your market may have only 5 or 6 diaries (check the sample distribution in your Arbitron book). If you have only five diaries in a specific demographic, say, Men 25-49, the sampling error is about ±44%. A 5.0 share in this demographic would actually fall somewhere between 2.8 and 7.2.
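For anyone who wants to check these numbers, here is a minimal sketch in Python. It assumes the percentages above come from the standard worst-case 95% margin-of-error formula for a simple random sample (1.96 times the square root of 0.25 divided by the sample size), and it applies that percentage to the share the same way the examples above do. The function names are hypothetical, and Arbitron's own reliability tables may be built differently.

```python
import math

def max_margin_of_error(n, z=1.96):
    """Worst-case (p = 0.5) margin of error at 95% confidence for n diaries.

    Assumption: simple random sampling; not Arbitron's published method.
    """
    return z * math.sqrt(0.25 / n)

def share_range(share, n):
    """Apply the margin to a share the way the column's examples do
    (as a percentage of the share itself), rounded to one decimal."""
    m = max_margin_of_error(n)
    return (round(share * (1 - m), 1), round(share * (1 + m), 1))

for diaries in (49, 5):
    low, high = share_range(5.0, diaries)
    pct = round(max_margin_of_error(diaries) * 100)
    print(f"{diaries} diaries: about ±{pct}%, a 5.0 share falls between {low} and {high}")
```

Swap in the in-tab count for any demographic listed in the sample distribution of your book to see how quickly the range widens as the sample shrinks.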
Finally, your comments suggest that you blame Arbitron for the decline in your radio station’s numbers. While you only have a small sample size, it may be that the results are real. It may be that fewer people are listening to your radio station. What I’m saying here is that you can’t automatically assume that your decline is the fault of Arbitron’s methodology; it may instead be based on something at your radio station. I don’t know. But I do know that you can’t automatically blame Arbitron. The small sample may be the problem, but you need more information before you can say that.
Translators
Hi Roger: Does Arbitron add a station's translator listening to overall ratings? We have a translator that simulcasts our main signal. I hope we aren't losing credit for people listening to the translator, even though it is our main signal they hear. Thanks! - Anonymous
Anon: I didn’t know the answer to your question, so I sent it to Claudine Knisley, Manager of Diary Analysis & Communications at Arbitron. I want to thank her for her help. She said:
In general, diarykeepers identify their translator listening by recording the parent station's descriptors (for example, station name, frequency, or call letters). However, in some areas of the U.S., (for example, Arizona), translator frequency entries do come in diaries, and because of this, we have some procedures in place to credit this listening.
While we do not systematically collect this information, we will update our internal files to include a station's translator when the station submits this information to us in writing. Once we receive a request, we will then verify the translator's usage with the FCC, and the station then becomes eligible for credit in its translator-located county only.
On a yearly basis, we review all translator credit. Those that have received "clear" mentions stay on our files. The translator frequencies that have not appeared in the past survey year are removed from our files.
If a station is concerned about listeners—diary keepers—recording their translator frequencies, they can call Diary Analysis & Communications (410.312.8720) and speak to a diary credit specialist.
U.S. Census
Hey doc: I am a little confused with some of the data I am looking at regarding the sizes of markets. There seems to be a huge difference between Arbitron and the U.S. Census about the amount of people in an area. - Anonymous
Anon: Hmm. I'm not sure why there is a difference, because according to Arbitron, its information is based on U.S. Census information.
My guess is that you aren't comparing "apples to apples." Make sure that you are looking at the same area in both data sources. For example, make sure that both areas are "metro areas."
U.S. Census - Part 2
I have a similar question about Louisville, KY. Our city and county governments merged to become, supposedly, the sixteenth largest "metro" in the states. However, according to Arbitron, Louisville is #55 in market size. What gives? - Anonymous
Anon: Assuming your information is correct, my first reaction was that something is "goofy" here. I decided to search the Internet to see if I could find the discrepancy. I found it, and indeed, there is something amiss.
According to this article, Louisville used to be ranked somewhere around 67th among the nation's largest cities, but moved to position 16 when the city and county governments merged. According to a table in that article, the population in the "city" of Louisville was 256,231 in 2000, but leaped to 693,604 in 2002. What? Something don't be right, and the "something" is that some folks in the Louisville government redefined the "city" of Louisville to include other areas (while the legitimate metro area definition stayed the same).
According to the Spring 2006 Arbitron metro rankings, Louisville comes in at the 55th position with a population estimate of about 923,000. That's the legitimate metro. But the newly defined city of Louisville shows a population of 693,604. That is, supposedly, the legitimate city, but it isn't. It is a newly defined area. Nothing in the metro changed, which is verified by Arbitron's consistent metro ranking for Louisville, and you can't compare that to the government's new definition (and rank) for the city.
Arbitron's rank is for the entire metro. The new definition for Louisville includes the city of Louisville and some other areas, but not all the counties in the legitimate metro.
Weighting Comment
Doc: My respect for you knows no bounds. You are a brilliant man. But, my logical mind cannot accept Arbitron weighting. To me, it’s like alchemy or turning water into wine. How can they automatically assume that an audience should be doubled if an area or ethnic group doesn’t submit enough diaries? That don't be right. As ever - Jerry
Jerry: Thanks for the comment, I appreciate that.
You may think that statistical weighting of a sample “don’t be right,” but the fact is that it is done all the time. If weighting is done correctly and doesn’t involve a huge “multiplier,” there is nothing wrong with the procedure. However, this doesn’t mean that every weighting situation is good. Allow me to explain.
Assume that you need 100 25-34 year-old females in a study, but can find only 25. This means that the 25 in-tab respondents will be weighted by a factor of four to achieve the required 100 respondents. The procedure doesn’t simply multiply each of the 25 in-tabs times four. What happens is that one person is randomly selected and that person’s responses are counted again. That person is put back into the sample for the next selection (known as “sampling with replacement”). In other words, one person’s responses may be used more than once.
The potential problem is that the 25 original in-tab respondents may not reflect the population from which they were drawn. They may all be “outliers,” or respondents who are significantly different from the “average” person in the population. In this case, the weighting would be performed on “bad” respondents and the final data could be affected significantly.
For example, let’s say that none of the original 25 women listened to a Hot AC radio station. The results would show no listening to Hot AC radio stations. That don’t be right. However, you can rest a little easy because that example is something that may never happen. But, you could have some weird results if the weighting gets out of hand: a severe shortfall of respondents where the weighting is high (multiplied by 2, 3, 4 or more). Then it don’t be right.
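The selection procedure described above can be sketched in a few lines of Python. This is only an illustration of the idea as explained in this column, assuming 25 in-tab respondents and a target of 100. It is not Arbitron's actual weighting code, and the function name, the respondent records, and the formats are all hypothetical.

```python
import random

def resample_to_target(respondents, target, seed=None):
    """Keep every original respondent, then draw extra ones at random,
    with replacement, until the sample reaches the target size."""
    rng = random.Random(seed)
    weighted = list(respondents)
    while len(weighted) < target:
        # The chosen person stays in the pool, so the same person
        # can be selected more than once.
        weighted.append(rng.choice(respondents))
    return weighted

# Hypothetical in-tab sample: 25 women, none of whom listens to Hot AC.
in_tab = [{"id": i, "format": "Country" if i % 2 else "News"} for i in range(25)]
weighted = resample_to_target(in_tab, 100, seed=1)

print(len(weighted))                                   # 100
print(any(r["format"] == "Hot AC" for r in weighted))  # False
```

The last line shows the problem with a bad starting sample: no amount of weighting can create Hot AC listening that the original 25 respondents never reported.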
Why Diaries?
I have several questions:
1. Arbitron spends a lot of money, time, effort and resources getting folks who are “representative” of a given market to keep diaries for a week. From this information, many decisions are made about people’s jobs and advertising bucks. Why?
2. It seems to me, the only real representation Arbitron gets is from a group of obsessive-compulsive types who are willing to keep a diary of their listening for $2 and then mail it back.
3. Why not just use a phone process whereby a stratified random sample is called and asked which stations they listen to MOST? This seems to be how Gallup does it with presidential elections and they have an error margin of about plus/minus 3 percentage points. I'm thinking this may be more accurate.
4. A related question: If cume is simply how many people listened to a station at least one time (maybe only for a couple of minutes), why is it so important? Again, wouldn't a better question be (i.e., in a phone survey) "Which station(s) do you listen to most?” - Just Wondering
JW: I numbered your paragraphs to make it easier to answer. Your questions fall into the category of “I wish I had a dollar for every time I’m asked this.” But that’s OK, I don’t mind.
Question 1: There is no reason to be surprised by the fact that Arbitron data affect jobs and advertising dollars. The same thing happens in virtually every industry on the planet. All industries have their “Arbitron” to tell them how they are doing—and virtually all industries complain about how the data are collected. You need to get used to the idea that jobs and advertising expenditures (and more) are usually based on performance. (I’m not going to address the problem of the “Peter Principle,” which contends that people rise to their level of incompetence.)
Anyway, love it or hate it, Arbitron is the standard by which radio audiences are measured. Since I started this column in January 2000, I have received dozens of questions about Arbitron. Most of these questions relate to the same underlying theme—Arbitron sucks and there must be a better way to collect radio listening information.
Question 2: Please don’t take this as me being rude, but whenever I hear a “factual” statement prefaced by “It seems to me . . .” I automatically question it. If you say that’s only your opinion and go no further, that’s OK because I can’t question your opinions. However, if you’re saying that “…the only real representation Arbitron gets is from a group of obsessive-compulsive types” as a statement of “fact,” then I have a real problem—I can question your “facts.”
If you think your statement is true, then prove it. While you’re at it, define “obsessive-compulsive types.” All of the information that I have seen indicates that Arbitron samples are pretty good. Are they perfect? Heck no, because Arbitron research deals with human beings. There are going to be errors occasionally. Does an occasional error make the Arbitron process useless? I don’t think so.
Your task, if you feel that the samples are bad, is to prove it. Merely saying “it seems to me” means absolutely nothing to me. Zero. Nada. Your statement about Arbitron is as useful as a person telling me, “It seems like the sun is only about 10 million miles away from the Earth.” It may seem like it, but it isn’t. If you think differently, then prove it. If you can prove it, then I’ll listen to your arguments. That’s one of the neat things about scientific research: it is self-correcting.
Question 3: You’re comparing a radio survey with a political survey, which is apples and oranges in my book. Sure, radio surveys could be conducted that have people “vote” for their favorite radio station (listen to MOST often), but people in the radio business want more information than that. They want to know how long people listened, when they listened, and so on. A simple radio station popularity vote would not provide this information.
In reference to your error margin . . . Arbitron could also have error margins of 3 percentage points if the people who pay for the research would be willing to pay for the additional sample. A “pat on the back” for Gallup’s low margin of error doesn’t say much. Gallup conducts political polls for a few weeks every four years, not every day of the week in over 200 markets. Gallup also has clients who are willing to pay for the sample size.
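To put a rough number on "the additional sample," here is a small Python sketch. It assumes a simple random sample and the standard worst-case 95% formula, which is a textbook approximation and not a description of how Gallup or Arbitron actually designs a sample. The function name is hypothetical.

```python
import math

def required_sample_size(margin, z=1.96):
    """Smallest simple random sample whose worst-case (p = 0.5)
    margin of error at 95% confidence does not exceed `margin`."""
    return math.ceil((z ** 2) * 0.25 / margin ** 2)

print(required_sample_size(0.03))  # respondents needed for ±3 percentage points
print(required_sample_size(0.05))  # respondents needed for ±5 percentage points
```

Under these assumptions, ±3 percentage points takes a little over a thousand respondents, and that is per market and per survey, which is why the question always comes back to who is willing to pay for it.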
Question 4: Cume is only one piece of information available in Arbitron ratings. It merely tells how many people listened to a radio station during a given daypart. Total audience size is important in all mass media. Case closed.
In addition, you mention (once again) that the data should be collected by telephone.
This telephone retrieval method was discussed a few months ago in this column. Someone asked, “Why does AccuTrack list college stations higher than Arbitron? How different are the two in audience measuring?” This is similar to your comment, and this is what I posted:
I assume you’re asking why college radio stations tend to perform better in AccuTrack than in Arbitron. This question has been debated for many years, so I checked with Bob Michaels at Arbitron to make sure that things haven’t changed from my last discussion of the situation. From Bob’s comments, the opinions about the ratings’ differences haven’t changed, and here’s the probable cause . . .
Arbitron uses diaries to collect radio listening information; AccuTrack uses telephone calls. Historically, younger people have demonstrated that they are more apt to volunteer for telephone research than are older people. As Bob Michaels said, “It was the same when Birch provided a similar service until 1991.” Now, since younger people are more likely to listen to college radio stations, it stands to reason that these stations would perform better in a telephone-based data collection methodology.
I’ll also include another point that Bob mentioned in reference to your question . . .
“One thing to note is how the younger targeted stations (Rock, CHR) do better across the country in telephone-based services. In Arbitron, it is possible to find a News station in one market with the lead, a CHR in another, and a rock station somewhere else. All one needs to do is look at the results to determine if they pass the ‘gut check.’ [That is] how likely is it that younger skewing stations do better in many markets in one service, yet another service (Arbitron) has a mix of stations/formats in the lead in those same markets? Could there be a favoritism towards younger stations in telephone based methodology?”
There is nothing wrong with using the telephone to collect data, but over the years, diaries have been shown to be more valid and reliable when it comes to radio ratings.
Finally, many of your concerns will probably disappear when the Portable People Meter replaces diaries. Wait and see. Thanks for writing.
Z-Scores and Arbitron
I was unable to find this answered in your archive. If you can handle another Z-score question without feeling like using a razor other than Occam’s, here goes:
How do I convert my Arbitron share to a Z-score? My guess is the score in the formula is my share. Yes? Mean is the average of all rated stations? Maybe? But how do I find the variance? I’ll trust my calculator to give me the square root once I find it, but I’m lost.
Also, can you average many shares (maybe a four book average) and convert that to Z-scores the same way? Thanks! - Anonymous
Anon: Yes, I can handle another Z-score question and I won’t use a razor other than Occam’s. (I guess that's an insider-type research joke.) I think the answer is in “The Research Doctor” archive, but sometimes what is written doesn’t seem to relate to a specific question. So here goes…
1. How do I convert my Arbitron share to a Z-Score? To compute your radio station’s Z-score, you need to compute (as you suggest) the Z-scores for all the radio stations in your market.
Computing Z-scores is much easier if you use a spreadsheet, but you can still do it by hand if you want to. So, to compute the Z-scores, you must first compute the standard deviation (average difference) for your data set. To do that, you need to …
a. Compute the mean (average) share for all radio stations.
b. Subtract the mean from each radio station’s share (called the deviation score).
c. Square each deviation score and add them together.
d. Divide the sum of the squared deviation scores by N – 1 (N = the number of radio stations in your list). This number is the variance.
e. Compute the standard deviation by taking the square root of the variance.
OK, you now have the standard deviation for the set of Arbitron shares for the radio stations in your market. To compute each radio station’s Z-score, you need to…
a. Subtract the mean from each radio station’s share and divide that number by the standard deviation you just computed. (I prepared a sample Excel spreadsheet for you to see how to do this...click HERE to see it.)
2. Can you average many shares (maybe a four book average) and convert that to Z-scores the same way? Hmm...If you’re asking if you can compute Z-scores for all the radio stations’ 4-book averages, the answer is “yes.” If you’re asking if you can have a huge table with each radio station’s four numbers and then compute Z-scores for all those numbers, the answer is “yes, but I don’t know why you would do this.” The 4-book average is the most important number and there is no reason to compute individual Z-scores for each of a radio station’s four individual book numbers.
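If you would rather not do the arithmetic by hand or in a spreadsheet, the Z-score steps in part 1 can be sketched in Python as shown here. The station names and shares are made up for illustration, and the function names are hypothetical. The variance uses N – 1, as in the steps above.

```python
import math

def z_scores(shares):
    """Convert a list of shares to Z-scores using the N - 1 variance."""
    n = len(shares)
    mean = sum(shares) / n                             # mean share
    deviations = [s - mean for s in shares]            # deviation scores
    variance = sum(d * d for d in deviations) / (n - 1)
    std_dev = math.sqrt(variance)                      # standard deviation
    return [d / std_dev for d in deviations]

def four_book_average(book_shares):
    """Average one station's shares across its books."""
    return sum(book_shares) / len(book_shares)

# Hypothetical market: each station's shares from four books.
market = {
    "Station A": [2.1, 1.9, 2.0, 2.0],
    "Station B": [4.2, 3.8, 4.1, 3.9],
    "Station C": [6.0, 6.3, 5.8, 5.9],
    "Station D": [8.1, 7.9, 8.2, 7.8],
}

averages = [four_book_average(books) for books in market.values()]
for station, z in zip(market, z_scores(averages)):
    print(f"{station}: Z = {z:.2f}")
```

To get Z-scores for a single book instead of the 4-book average, pass that book's list of shares straight to z_scores.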
All Content © 2012 - Wimmer Research All Rights Reserved