Mass Media Research: An Introduction - 9th Edition
Roger D. Wimmer & Joseph R. Dominick

z-scores

While there are dozens of statistical methods available to analyze data, the simple z-score is probably the most versatile.  The following questions and answers are from Roger Wimmer's The Research Doctor Archive (with some editing).  The questions are from media professionals.

 

z-scores - 1

You've referred to z-scores in previous posts on All Access. I've asked our research companies for them and they are forthcoming. But I have a dumb question: what are z-scores? - Chris


Chris: This is NOT a dumb question. I'll do the best I can here without including a bunch of formulas.


Whenever we conduct a test of any kind (e.g., tests in school, music tests, personality ratings, or even Arbitron ratings), we collect some type of scores. The next logical step is to give meaning to these numbers by comparing them to one another. For example, how does a vocabulary test score of 95 compare to a score of 84? In a music test, how does a song score of 82 compare to a score of 72? And so on. Without these comparisons, the scores have no meaning.


Although there are different ways to compare scores to one another (e.g., percentile ranks), the best way is to determine how many standard deviations (the standard deviation is a kind of "average difference" from the mean) each score falls above or below the mean of the total group of scores. This placement in standard deviation units above or below the mean is called a z-score, or standard score.


But wait. What is standard deviation? To understand that, you need to understand another term—variance. In simple terms, variance indicates the amount of difference that exists in a set of scores or data—how much the data vary.  If the variance, which differs from one test to another, is large, it means that the respondents did not agree very much on whatever it was they were rating or scoring.  Obviously, if the variance is small, the respondents agreed (were similar) in their ratings or scores.


The standard deviation is the square root of the variance. The advantage of the standard deviation is that it is expressed in the same units as the original scores. For example, if you test something using a 10-point scale, the standard deviation is expressed in points on that 10-point scale; on a 65-point test, the standard deviation is expressed in points on that 65-point test. And so on.
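To make variance and standard deviation a little more concrete, here is a minimal Python sketch using made-up ratings for two hypothetical songs on a 1-10 scale: one song the respondents largely agree on, and one they split on. It uses the standard library's `statistics.variance` and `statistics.stdev`, which divide by N - 1 (the sample formulas).

```python
import statistics

# Hypothetical ratings of two songs on a 1-10 scale (made-up data).
song_a = [7, 8, 7, 8, 7, 8]    # respondents largely agree
song_b = [1, 10, 2, 9, 3, 10]  # respondents disagree sharply

for name, ratings in (("Song A", song_a), ("Song B", song_b)):
    variance = statistics.variance(ratings)  # sample variance, divides by N - 1
    sd = statistics.stdev(ratings)           # square root of the variance
    print(f"{name}: mean={statistics.mean(ratings):.2f} "
          f"variance={variance:.2f} SD={sd:.2f} (scale points)")
```

Song A's small variance reflects agreement; Song B's much larger variance reflects disagreement, and both standard deviations are in points on the same 1-10 scale as the ratings.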


The standard deviation (SD) is important because it is used in calculating z-scores. What we need are symbols for Score (X), Mean (M), and standard deviation (SD). With those symbols, the z-score formula is (X - M) / SD: subtract the Mean of the group of scores from each individual score, and divide the result by the standard deviation. The typical way the z-score formula is shown is:

 

$$z = \frac{X - M}{SD}$$
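For readers who want to see the formula in action, here is a minimal Python sketch, assuming a short list of made-up music test scores. The function name `z_scores` is just an illustrative choice, and the sketch uses the sample standard deviation (N - 1 in the denominator).

```python
import statistics

def z_scores(scores):
    """Convert raw scores to z-scores using z = (X - M) / SD."""
    m = statistics.mean(scores)    # M: mean of the group of scores
    sd = statistics.stdev(scores)  # SD: sample standard deviation
    return [(x - m) / sd for x in scores]

# Hypothetical song scores from a music test (illustrative only).
raw = [82, 72, 90, 65, 77]
for x, z in zip(raw, z_scores(raw)):
    print(f"score {x}: z = {z:+.2f}")
```

Whatever the raw scores are, the resulting z-scores have a mean of zero and a standard deviation of 1.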

 

A set of z-scores always has a mean of zero and a standard deviation of 1, and the scores usually range (roughly) between –3.00 and +3.00. A z-score of "0" is average; a positive z-score is above average; a negative z-score is below average. Here is a picture of a normal curve, showing that about 68% of a sample falls between -1 and +1 standard deviations, and about 95% falls between -2 and +2 standard deviations.

[Figure: normal curve]

z-scores relate to the normal curve (the bell curve you may remember from when your teacher said that he/she was going to "curve" the test scores). Because of this, we know where things "stand" when using z-scores. For example, if your test scores are roughly normally distributed, about 68% of them will fall between –1.00 and +1.00 z-scores (standard deviations) from the mean.
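The normal-curve percentages can be checked with Python's standard library. This is a minimal sketch that assumes the scores follow a normal curve; `NormalDist` (Python 3.8+) represents the standard normal curve that z-scores are expressed on.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (the z-score metric).
std_normal = NormalDist(mu=0.0, sigma=1.0)

coverage = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations.
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within -{k} and +{k} SD: {coverage[k]:.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%, which is why z-scores beyond roughly ±3.00 are rare.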


z-scores allow you to compare "apples to oranges." For example, when you conduct a music test, you can't compare the raw scores of the males to the raw scores of the females, or one age cell to another. But you can with z-scores. In addition, if you compute z-scores for a music test, you can compare the scores from one market to scores in another market. You can't do this with the raw scores (regardless of the rating scale you use).


By the way, z-scores are known as standard scores because they are computed by transforming your scores into another metric by using the standard deviation. Get it? Standard-ized scores. This procedure is a linear (and therefore monotonic) transformation . . . that is, all of your scores are transformed to another form using the same z-score formula, so their rank order does not change.
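Here is a minimal Python sketch of that idea, using made-up song means. It shows two things: the ranking of the songs is the same before and after conversion to z-scores, and rescaling the raw data with any positive linear change (here, 10 times the score plus 5, an arbitrary choice) produces exactly the same z-scores. The helper names are illustrative.

```python
import math
import statistics

def z_scores(scores):
    # Apply the same formula, (X - M) / SD, to every score.
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - m) / sd for x in scores]

def rank_order(values):
    # Positions of the values, from highest to lowest.
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

raw = [6.2, 5.1, 6.8, 4.4, 5.9]       # hypothetical song means, 7-point scale
rescaled = [10 * x + 5 for x in raw]  # same data on an arbitrary new metric

z_raw = z_scores(raw)
z_rescaled = z_scores(rescaled)

print(rank_order(raw) == rank_order(z_raw))  # True: ranking unchanged
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z_raw, z_rescaled)))  # True
```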


That's just an introduction to z-scores. There is a lot more to the whole process in reference to interpreting these scores. I suggest that you refer to any statistics or research book (including mine) to find out more about the procedure. In addition, you can find out a great deal more on the Internet. For example, use a search engine and search for "z-scores." You'll find many good references, one of which will compute z-scores from the data you input.

 

By the way, you can get a very interesting look at your market by converting all of the radio stations' shares to z-scores. This allows you to see where each station stands on the market's normal curve.
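A minimal Python sketch of converting a market's shares to z-scores, assuming five hypothetical stations with made-up shares (not real ratings data):

```python
import statistics

# Hypothetical shares for five stations in one market (made-up numbers).
shares = {"Station A": 8.4, "Station B": 6.1, "Station C": 5.2,
          "Station D": 4.0, "Station E": 2.3}

values = list(shares.values())
m = statistics.mean(values)   # average share in the market
sd = statistics.stdev(values)  # spread of shares in the market

z_by_station = {station: (share - m) / sd for station, share in shares.items()}

for station, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{station}: share {share:4.1f}  z = {z_by_station[station]:+.2f}")
```

Stations with positive z-scores are above the market average; those with negative z-scores are below it.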

 

z-scores - 2

In a recent post about z-scores you said you can't compare raw data of males against females but you could with z-scores. Why? And is this also true when comparing core artists and non-core artists? - Anonymous


Anon: The reason that you can't compare the raw scores of one group to another, whether it's males, females, age groups, county of residence, or anything else, is that each group tends to have its own way of rating things.


For example, assume that you use a 10-point scale for your music test. You look at a song and see that the men rate the song as a 6.2, while the women rate it a 7.1. Your first reaction is that the women liked the song more than the men since their average score is higher. But this may not be true. The women may have rated all the songs higher than the men, and therefore the women's 7.1 may actually be a weaker score, relative to their other ratings, than the men's 6.2 is relative to theirs. z-scores will show you this.
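Here is a minimal Python sketch of that situation, with made-up ratings for five hypothetical songs on a 10-point scale. The key point is that each group's z-scores are computed from that group's own mean and standard deviation.

```python
import statistics

def z_scores(ratings):
    # z-scores within one group, using that group's own mean and SD.
    values = list(ratings.values())
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return {song: (r - m) / sd for song, r in ratings.items()}

# Hypothetical average ratings on a 10-point scale (made-up data).
men = {"Song X": 6.2, "Song 2": 4.0, "Song 3": 4.5, "Song 4": 5.0, "Song 5": 5.5}
women = {"Song X": 7.1, "Song 2": 7.8, "Song 3": 8.2, "Song 4": 7.5, "Song 5": 8.4}

men_z = z_scores(men)
women_z = z_scores(women)
print(f"Song X, men:   raw {men['Song X']}, z = {men_z['Song X']:+.2f}")
print(f"Song X, women: raw {women['Song X']}, z = {women_z['Song X']:+.2f}")
```

Here the women's raw 7.1 is higher than the men's 6.2, but it is the women's lowest rating, so its z-score is negative, while the men's 6.2 is their highest rating and gets a positive z-score.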


This is true regardless of which cell you are looking at or whether it's core artists, non-core artists, train wrecks, or anything else. You should never compare raw scores of one cell to another cell. Case closed. You need to compute z-scores in order to make head-to-head comparisons. This is true because z-scores transform the data to the same metric—the data have a mean of zero and standard deviation of 1.0. If you don't compute z-scores and compare the raw scores, you're just wasting your time. (I have been preaching this since about 1982.)

 

z-scores - 3

Doc, I enjoy this column. I learn a lot about a lot. Thanks. Here's my question: I'm in journalism school and have a professor who insists that we memorize the formula for 'z-scores.' (The z-score formula: score minus the mean, divided by the standard deviation.) Can you tell me briefly where z-scores are used in the real world and why anybody would ever want to memorize the formula? Thanks. - Anonymous

 

Anon: Thanks for the comments. I'm happy you're learning . . . so am I.

 

In my opinion, the varieties of teaching approaches fall into two categories: theoretical and practical. These categories relate to all types of teaching, whether teaching students in a classroom, teaching a son or daughter, or teaching a friend, colleague, or other relative.

 

The theoretical approach is probably best demonstrated in martial arts movies where the old and wise (seemingly near dead) "Master" passes on wisdom to his students with phrases such as, "As the soldier ant steadfastly protects its winter food storage from unwanted alien procurement, so should you protect your earthly possessions." What the heck is that? The Master could use the practical approach and say, "Kick the rear end of anyone who attempts to take your stuff."

 

Both teaching approaches are fine; it depends on the goal of the teacher. The theoretical approach is appropriate when the teacher wants a person to "work" for the answer. I believe that this "work" is essential to help people learn how to synthesize information and transfer knowledge from one area to another. However, it's sometimes more efficient to eliminate the work and use the direct practical approach (such as "Listen to me now and believe me later").

 

My guess is that your professor is simply using the practical approach with the z-score formula to prepare you for the "real world." Since research is so pervasive, I'm sure your professor knows that you will encounter z-scores in whatever media area you choose.

 

Why else is this stuff important? Well . . . research is a complicated process that includes strange looking words, terms, and formulas. A good researcher takes this complicated stuff and presents it in "English" to those who may not understand it but want to use the information in decision-making. With that in mind, then, I believe that z-scores are the best way to simplify a complicated data set. In the "real world" they are used in all areas of media research.

 

There are two reasons for you to memorize the z-score formula (and others).

 

1. You need to memorize some basic concepts to eliminate the need to refer to a textbook.

 

2. Your professor said so. Listen to your professor now and believe him/her later.

 

z-scores - 4

You seem to stress z-scores a lot in interpreting research results. Why is that? And why doesn't everybody do that? - Kurt


Kurt: Your question allows me to thank publicly for the first time my two research mentors. The first person is Dr. Charles Larson, my Master's Degree advisor and thesis director. The second is Dr. Ray Tucker, my Ph.D. advisor and dissertation director. These Professors are two of the smartest people I know, and they taught me a countless number of things about science and research.


One thing that both men forced me to understand was "Ockham's Razor," which states that the "simplest way is usually the best." (That's a paraphrase of William of Ockham's statement.)


When it comes to simplicity of interpreting data, nothing can be easier than z-scores. The statistic makes even the most impossible looking data set easy to "see," understand, and interpret. The statistic is also easy to compute . . . it can be computed on an Excel spreadsheet.


I hammer away on z-scores all the time because of Ockham's Razor. I can't understand why people want to make data interpretation so difficult. Even GMs understand z-scores!


As to "why everybody doesn't use z-scores," I'm not sure who you mean by everybody. Everybody as in researchers? I can't answer that. But I can say that I don't understand why, for example, a radio station PD would conduct a music test with 600+ songs and then try to figure out what the hell the numbers mean: is the song in? Out? A? B? Lunar? Venus? Pluto? I don't get it. Let the z-scores do it. PDs (and others) who interpret, sort, deliberate, and bang their heads against the wall over music test scores have enough other stuff on their plate. Let the z-scores do it.


Each of us will leave some type of legacy in the radio industry (and personal lives too). On my urn will be engraved three things. One will say, "He stressed simplicity." And that, Kurt, is why since 1976 I have been banging away on z-scores. It's as simple as that.

 

z-scores - 5

I understand what you're saying about having to convert to z-scores given the different samples because one group's median score would be different than the other. However, I don't fully understand your question about computational procedures used in forming the clusters. – Andy

 

Andy: Let me correct you about something—z-scores have nothing to do with the median—they are computed from the mean (average). The median is the midpoint of the distribution, where 50% of the scores are above and 50% are below.

 

Now, you mentioned in an earlier question that your scores fell when you put them in their respective clusters. I don't understand how that can happen—and maybe it's due to the way the clusters are put together. That's what I'm asking: "How do you form the clusters you refer to in your question?" I gots ta know!

 

z-scores - 6

Andy again. No, the scores are from different samples using the same screening procedure. The 3.5 and 2.3 scores I referred to were total sample scores. When I referred to clusters, I was talking about A) 80's and B) currents.

 

Andy: You can't compare them since the scores are from different samples. You must convert to z-scores. Case closed. The same goes for your clusters, but I still don't understand the computational procedures used to produce the clusters.

 

z-scores - 7

In your formula X (Score) – M (Mean Score) / SD (Standard Deviance) does a larger standard deviance mean that the raw score is less reliable? Secondly, when figuring the SD using the square root of the variance, is the variance calculated using only the scores and mean score of the song in question? - Anonymous


Anon: First, let me correct one thing that you wrote—SD stands for Standard Deviation, not deviance. Your term refers to what radio people do at conventions.


Question 1: You are very perceptive, whoever you are. You are correct in saying that a larger standard deviation means that the raw score is less reliable. In reference to what we're talking about here, though, we are using the SD from all the songs combined, but your statement is still correct. If you conduct a music test and find that your overall Standard Deviation for all songs is very large (you'll need to compare a few tests to find out what is normal for your own tests), then something, as they say in Atlanta, "Don't be right." (There isn't much agreement among the respondents.)


Question 2: While you can compute the variance and standard deviation for each song in your test (that's the primary way to determine fatigue, by the way), we're talking about an overall variance and SD for all the songs in your test.


What you need to do is compute the variance and standard deviation for all songs. Then you compute z-scores for each song from those two pieces of information. (Variance is computed by subtracting the overall mean from each score. This is called the "deviation score." You do this for every song. You then square each of those deviation scores and add them all together. Finally, you divide that number by N-1, where N is the number of scores you are working with. The result is the variance, and the square root of the variance is your Standard Deviation.) Simple . . . just like going to Wisconsin.
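The hand calculation above can be sketched in a few lines of Python. This is a minimal illustration with made-up overall song scores; the function name `standard_deviation` is hypothetical, and the result is cross-checked against the standard library's `statistics.stdev`, which uses the same N - 1 formula.

```python
import math
import statistics

def standard_deviation(scores):
    """SD by hand: deviation scores, square, sum, divide by N - 1, square root."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]          # deviation scores
    sum_of_squares = sum(d * d for d in deviations)  # square each and add them up
    variance = sum_of_squares / (n - 1)              # divide by N - 1
    return math.sqrt(variance)                       # square root of the variance

song_scores = [3.5, 2.3, 4.1, 3.0, 2.8]  # hypothetical overall song scores
sd = standard_deviation(song_scores)
print(f"SD = {sd:.4f}")
print(math.isclose(sd, statistics.stdev(song_scores)))  # True: matches the stdlib
```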

 

 

z-scores - 9

Hi Doc, I lost your answers to previous questions similar to mine so I hope you will forgive me if I ask again. You frequently tout z-scores as the only way to compare "apples and oranges" in research numbers. Would you recommend using z-scores to compare the results of traditional callout to online music research?

 

If so, would you again please share your formula for computing z-scores?

 

Finally, what are your thoughts on using weighted positive scores in music research? Thanks. - Anonymous

 

Anon: For your information, there are several z-score questions and answers in the "Research Dr." book—including the formula. However, I'm happy to answer your questions again.

 

First, yes I recommend using z-scores to compare the results of traditional callout to online music research. There is no other way to compare these scores.

 

Second, the z-score formula is: z = (Score - Mean) / Standard Deviation, where "score" refers to the individual song score, "mean" is the mean or average of all the songs in your list, and "standard deviation" is the standard deviation of all the songs in your list. The "/" means "divided by."

 

Third, my thoughts on using weighted positive scores? If you use some type of metric for your test…a scale of 1-5, 1-7, 1-10, etc., you shouldn't have to weight anything. These scales are "self-weighting" because they are ratio data. In other words, a "5" is five times better than a "1" and so on. It's already weighted.

 

There isn't anything wrong with weighting the positive scores. It's just a waste of time because regardless of how you weight the scores (2 times the highest score or any other procedure), you will not change the ranking of the songs. If you weight all the scores the same way, you are performing what's called a "monotonic transformation" of the data. That is, all the data are changed the same way…therefore, the ranking will not change.

 

The songs that are liked a lot or rated highly will show regardless of whether you use the raw data or some type of weighting formula. It's a pure waste of time and means nothing. You might as well spend the time on the New York Times crossword puzzle.

 

I have seen some very strange weighting formulas. Our computer guru came up with this one after he saw a convoluted formula a person wanted to use for weighting his music test:

 

score={Sqrt(6s)*Arcsine(7s)}/{pi}*log(6s+7s)*Cosine(1s+2s)}*{7s^2+6s^3}/{sqrt(4s+5s)}

 

That's the best weighting formula I have ever seen.

 

z-score Question

In the past you have said that a song with a negative z-score would merit consideration for being dropped from the playlist. Give or take some wild fluctuations it seems to me that approximately half of the songs will have positive z-scores and approximately half will have negative scores.  If that’s true, does it not stand to reason that for every two songs you test that one will have a positive score on average? Tell me what I’m missing because, with this line of reasoning, it appears you could test an infinite number of titles and get positive scores (albeit probably not high positives) on titles no one has ever heard of. - Andy

 

Andy: Yes, I have said that a song with a negative z-score would merit consideration for being dropped from the playlist. But that means consideration only, not automatic removal, because you need to look at the raw scores first.

 

You are essentially correct in saying that when using z-scores, about 50% of the songs will be positive and about 50% will be negative.  That’s because z-scores are computed using the mean for the group of songs.  And that’s why it’s important to look at the raw scores.  Let’s say that you use a 1-10 scale for respondents to rate your songs.  Theoretically, it’s possible for NO song to receive a score of less than, say, 9.0.  Even in this case, roughly half of the songs will have negative z-scores.  And in this case, you probably wouldn’t eliminate any songs from your playlist.
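A minimal Python sketch of that scenario, using five hypothetical songs whose made-up mean scores on a 1-10 scale are all 9.0 or higher:

```python
import statistics

# Hypothetical mean scores on a 1-10 scale: every song scored 9.0 or higher.
song_means = {"Song 1": 9.1, "Song 2": 9.3, "Song 3": 9.5,
              "Song 4": 9.7, "Song 5": 9.9}

values = list(song_means.values())
m = statistics.mean(values)
sd = statistics.stdev(values)
z = {song: (score - m) / sd for song, score in song_means.items()}

for song, score in song_means.items():
    print(f"{song}: raw {score:.1f}  z = {z[song]:+.2f}")
```

Songs 1 and 2 come out with negative z-scores even though every raw score is excellent, which is why the raw scores have to be checked before anything is dropped.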

 

z-scores show you how each song performs against all of the other songs in your test and will usually range between –3.00 and +3.00.  The z-scores show you the performance of the songs, but you need to look at the raw scores to understand how the songs were rated.  In other words, you don’t automatically eliminate songs because of a negative z-score.  You must also look at the raw scores.

 

I don’t understand what you mean in your last sentence, that you will get positive scores on titles no one has heard of.  In all music tests that I know about, respondents do not rate unfamiliar songs.

 

z-scores Use

I read your answer to the question about up and down ratings and it got me curious about z-scores.  I went through your Research Doctor Archive and feel like I now have a general idea about them, and I am going to convert our last couple of ratings books and see how they look.

 

But here's my question.  I have been in radio 20 years now and you are the ONLY person I have ever heard talk about z-scores.  I have worked for some of the big companies and been under some pretty well known Sr. VPs of programming, consultants, OMs, etc., and none has ever spoken of z-scores (and a couple are regarded as research experts!).

 

In your experience, why are companies and executives reluctant to use z-scores as a tool?  We start doing extrapolations the second we get a trend, which of course are unweighted, and generally unreliable, but we won't use this formula to more accurately look at our ratings.  What gives? - Anonymous

 

Anon: Interesting question.  I'll try not to get carried away here.

 

First, I'm glad you mentioned the Research Doctor Archive because there are several questions about z-scores here.  If there is anyone reading this question and answer who doesn't know about z-scores, you may want to read the material on the Archive first and then come back to this column.  OK, on to your questions . . .

 

I can't remember when I first learned about z-scores, but I think it was probably in my first statistics class in undergraduate school.  Since that time, I have used z-scores for over 30 years in virtually all of my media research, including just about every study I have ever conducted for radio.  (By the way, z-scores are also called Standard Scores or Standardized Scores because each score's distance from the mean is divided by the standard deviation of the data set.)

 

Why?  Because the z-score is probably the most useful and versatile statistic there is, and it's very easy to learn how to interpret z-scores.  In fact, I think an average person can learn how to interpret z-scores in less than 20 seconds.  OK, I know that some readers won't believe that, so here is an experiment to prove my point.

 

I'll explain z-scores in the next paragraph.  If you don't know what z-scores are, look at a watch or clock with a second hand, or use a stopwatch if you have one, and time how long it takes you to read that paragraph.  Ready?  Go . . .

 

z-scores have an effective range of -3.00 to +3.00.  An average z-score is ZERO.  A negative z-score indicates that the item/element is below average and a positive z-score means that the item/element is above average. When teachers say they are going to "curve" the test, they often do this by computing z-scores for the students' test scores.

 

Done.  That's all there is to it, and computing z-scores is just as easy.  The formula is in the Research Doctor Archive, but it's: z = (Score - Mean)/Standard Deviation (score minus the mean, divided by the standard deviation).  What's even better is z-scores can be computed on Excel and other spreadsheets in a matter of seconds.  A while ago, a reader asked me to set up a simple z-score Excel file so he could see the calculations, and I did that.

 

During my career, I have found that every PD, GM, OM, and all the other people involved in radio research enjoy having data presented with z-scores because they are easy to interpret and there is no need to "haggle" about which numbers (ratings or scores) are good or bad (good scores are positive and bad scores are negative).  In the thousands of studies I have conducted, I have never heard one person say that z-scores didn't help in interpreting the research information.  Not one.  As I said earlier, the z-score is probably the most useful and versatile statistic of all statistical calculations available.

 

Why are z-scores so useful?  Because they allow us to compare one data set to another, and as anyone in radio knows, we are always comparing data sets—one Arbitron book to another, one music test to another, men vs. women, young vs. old, one station to another, and so on.  The thing is, you can't make any of these comparisons without converting the data to z-scores.  Never.  Not under any circumstances.  Not even for the "heck of it."  Not even, well, that should be enough.

 

But knowing that persuasion is most successful when the message is repeated, I'll repeat what I just said in a different form.  The following list includes examples of data sets that can never under any circumstances be compared unless the data are converted to z-scores:

  1. One Arbitron book to another.

  2. One music test to another.

  3. One telephone perceptual study to another.

  4. Men vs. Women

  5. One age cell to another.

  6. One market to another.

  7. One radio station to another radio station.

  8. On-air personalities from one study to another.

  9. Callout results from one period to another.

  10. Any other comparison of ratings, scores, or evaluations.

OK, so now that I have tried to summarize the importance of z-scores, I need to try to answer your question about why other researchers, consultants, VPs, and others don't know anything about them and never mention them.

 

My answer is, "I don't know, but I'll take a few guesses."  First, I think there are two reasons that can't explain why z-scores aren't used by other companies, researchers, and consultants:

  1. It can't be because z-scores are a secret.  z-scores are mentioned in virtually every introductory statistics book, and there are a few hundred thousand mentions of z-scores on the Internet.

  2. It can't be because z-scores are difficult to compute and interpret.  As I explained earlier, z-scores can be computed either by hand or by spreadsheet in a matter of seconds and it takes less than 20 seconds to understand how to interpret them.

OK, so what other options are there?  I think the reasons why other companies and people don't use z-scores include, but are not limited to:

  1. Many/most of the people who call themselves "researchers" or are regarded as "research experts" (as you say) don't have research backgrounds and don't know much, if anything, about research, statistics, sampling, data analysis, or interpretation.  z-scores are just another statistical procedure these people know nothing about, don't intend to learn, or don't know how to learn about them.

  2. Many/most of the people who are VPs of Programming don't have research backgrounds and don't know much, if anything, about research, statistics, sampling, data analysis, or interpretation.  z-scores are just another statistical procedure these people know nothing about, don't intend to learn, or don't know how to learn about them.

  3. Many/most of the people who are consultants don't have research backgrounds and don't know much, if anything, about research, statistics, sampling, data analysis, or interpretation.  z-scores are just another statistical procedure these people know nothing about, don't intend to learn, or don't know how to learn about them.

  4. Many/most of the people who run radio companies (Owners, GMs, OMs, PDs, etc.) don't have research backgrounds and don't know much, if anything, about research, statistics, sampling, data analysis, or interpretation.  z-scores are just another statistical procedure these people know nothing about, don't intend to learn, or don't know how to learn about them.

OK, I know I repeated the same thing four times, but I wanted to make sure that my point was clear.

 

In summary, I'll say this:  Running a radio station (or any business) involves a great deal of decision making, and I think it's important for decision-makers to have the best information available to make decision making easier.  With that in mind, I steadfastly believe that my job as a researcher is to provide decision makers with easy-to-understand information to make their job easier.  z-scores are one of the ways I accomplish my job, and I am amazed that all decision makers don't demand z-scores from those who provide them with information.

 

z-scores Use Comment

Doc: I read with interest the recent discussion of z-scores.  Our company uses them to compare data sets of all types.  I have an observation, having been in either the radio or research business for about 30 years—no empirical evidence, mind you, just my observation.

 

For some reason, many radio professionals don't want to believe research.  Because many folks cling to their "intuitive" side in radio, and especially when the research doesn't agree with their personal beliefs, they find a way to discount the data.  I think that maybe another reason some might not use a practical tool like z-scores is that they are just too logical in an often-illogical business, and they would have to admit they were "wrong."  Perhaps you've seen the same thing.  Sorry for the long-winded rant. - Anonymous

 

Anon: The only company I know that frequently uses z-scores for data analysis is Cox, the company I worked for years ago.  If you're not with Cox, then now I know there are at least two companies that frequently use the approach.

 

I do think there is a lot of merit to your observation, even though you don't have any empirical evidence to back it up.  I have also found that many decision-makers tend to lean on their personal perceptions, whether they are right or wrong.  I don't understand why so many people are deathly afraid to be wrong.

 

Using z-scores in data analysis makes interpretation extremely easy.  In fact, I usually color-code z-scores to make interpretation almost foolproof—Items or elements with z-scores below zero are RED and z-scores above zero are GREEN.  Even someone with no research experience can understand that red is below average and green is above average, unless the person is colorblind.  (I also sometimes use black for a range of average scores—usually between -1.01 and +1.01.)
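A minimal Python sketch of the red/green color-coding rule, with an optional black band for average scores. The function name `color_for` and the `average_band` parameter are hypothetical; the 1.01 value in the usage line is simply the range mentioned in the paragraph above, and the z-scores are made up.

```python
def color_for(z, average_band=None):
    """Map a z-score to a display color.

    Below zero -> "RED", above zero -> "GREEN". If average_band is given,
    any z-score with abs(z) <= average_band is shown as average ("BLACK").
    """
    if average_band is not None and abs(z) <= average_band:
        return "BLACK"
    if z < 0:
        return "RED"
    if z > 0:
        return "GREEN"
    return "BLACK"  # exactly zero: exactly average

# Hypothetical z-scores for a few songs.
for song, z in {"Song 1": 1.45, "Song 2": -0.30, "Song 3": -1.80}.items():
    print(song, z, color_for(z), color_for(z, average_band=1.01))
```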

 

©2009 Roger D. Wimmer
