Validity

Roger Wimmer, Ph.D.

Wimmer Research

My “Research Doctor” column on All Access has given me an opportunity to understand how much some radio people know about research.  Although there are variations, the people who submit questions fall into one of three basic groups:

  1. Those who know nothing about research, admit it, and are willing to learn.

  2. Those who think they know a lot about research, but whose information is mostly urban legend or myth passed down from other people.

  3. Those who know a fair amount about research.

After receiving and answering thousands of questions, my guess is that there are probably equal numbers of people in each category.  However, I find that the people in Group 2 are the most argumentative and the least likely to accept the realities of research.  But that’s OK.  I’m not criticizing anyone here, just pointing out a fact.

However, the people in Group 2 are the most responsible for creating problems with research, both in research design and in uses of research.  When asked an opinion about research, these people usually begin their comment with, “Well, it seems like . . .”  The “it seems like” comment is the problem because it is based on opinion, not fact—and opinions mean nothing when it comes to research.  (See the “It Seems Like” article on the Readings page.)

The people in Group 2 have all the answers about every element of research—uses of research, advantages and disadvantages of research methodologies, sampling procedures, screener and questionnaire design, data analysis, univariate and multivariate statistics, and interpretation of results.  As I mentioned, however, I have found that most of the comments (and misleading questions) from the people in Group 2 are wrong.

From the comments I receive, I know that many All Access subscribers read the column every day.  Because my space in the column is somewhat limited, I wanted to take this opportunity to expand on one area that has been discussed briefly in the column—validity, the umbrella under which research operates.

Validity

When most people talk about research, they usually use two terms—reliability and validity.  In most cases, these terms are thrown around loosely and most people don’t really know what they mean.  So let’s start with that.

Reliability.  Reliability in research refers to whether a research study or methodology produces consistent results (not the same, but consistent).  For example, if you conduct music tests with your listeners using a 1-7 rating scale and the tests consistently tell you which songs the respondents like and which they do not like, then your method is reliable.  If you get results that bounce all over the place from one study to the next, then your method may be unreliable (although there may be other causes for the differences in song scores).
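To make “consistent results” a bit more concrete, here is a small Python sketch of one common check on test-retest reliability: correlating the scores from two waves of the same music test.  The song scores are hypothetical, and the 1-7 scale is the one from the example above.

    # Test-retest reliability sketch: correlate mean 1-7 song scores
    # from two waves of the same music test.  All numbers are made up.
    from statistics import correlation  # requires Python 3.10+

    wave_1 = [6.2, 5.8, 3.1, 4.4, 2.7]  # mean scores, first test
    wave_2 = [6.0, 5.9, 3.4, 4.1, 2.9]  # same five songs, second test

    r = correlation(wave_1, wave_2)  # Pearson's r between the waves
    print(f"Test-retest correlation: r = {r:.2f}")

An r near 1.0 means the method is producing consistent song rankings; scores that bounce all over the place from one test to the next push r toward zero.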

Validity.  There are two types of validity—internal and external.  Internal validity refers to whether you are measuring what you think you are measuring.  For example, if you conduct a music test to gather respondents’ ratings of the songs you play for them, but further investigation shows that the test actually collects respondents’ ratings of music tempo, then your method is invalid.

External validity refers to whether your research results can be generalized to respondents outside your sample.  If you conduct a research study and find that your results relate only to your sample and no one else, then you have a problem with external validity.  The goal of most research is to select a sample of people from a population, conduct a research study, and then generalize the results to the population.  If you can’t do that, then your research will be limited in use.

The remainder of this article concentrates on Internal and External Validity.  Because of copyright laws, I need to state that this information draws heavily from Mass Media Research: An Introduction, 10th Edition (Wadsworth Cengage Learning, 2014), the college textbook I wrote with Joe Dominick.

I realize there are many strange-sounding words and phrases in this discussion, but you need to learn these things to get a better understanding of research.  Learning the language of research is a significant step in the process of understanding what research can and cannot do.  If anything confuses you, please let me know.

Internal Validity

Conducting research involves control over the situation.  If researchers don’t control the entire process, there is no way to know whether the results are “real” or the product of some unknown influence.  This is referred to as “ruling out plausible but incorrect explanations of results.”  The example I used earlier about music tests applies here: you must be sure that your music test actually collects respondents’ perceptions of the songs they hear, and nothing else.

The variables that create possible (plausible) but incorrect explanations of results are called artifacts (or extraneous variables or confounding variables).  The presence of one or more artifacts in a research study indicates a lack of internal validity: the study failed to investigate what it was supposed to investigate.

Artifacts in research may arise in many ways.  Some of the artifacts that can affect a study include:

1.   History.  Events that happen during a study may affect the respondents’ attitudes, opinions, and behavior.  For example, let’s assume that you conduct callout research for your currents and it takes two weeks to collect the data.  Many things can happen between the first day of your callout and the last day that may affect your scores.  Perhaps an artist is featured on TV, or arrested for drug possession, or almost anything else.  The time when a respondent listens to and rates your hooks may therefore affect that person’s ratings.

History may also affect a telephone perceptual study in the same way.  That’s why it’s important to collect the responses as quickly as possible.  If the data collection process takes a long time (probably more than two weeks), then the respondents should be coded in reference to when they participated in the survey.  Banner points (column headings in tables) can then be used to separate the respondents according to when they participated.  The point to keep in mind is that the potential to confound a study grows as the time between testing (or questioning) the first respondent and the last one increases.
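As an illustration of that coding, here is a minimal Python sketch that splits respondents into two banner columns by field week and compares their means.  The field dates and 1-7 hook scores are hypothetical.

    # Code each respondent by interview date, then compare field weeks.
    # Dates and 1-7 hook scores below are hypothetical.
    from datetime import date
    from statistics import mean

    responses = [            # (interview date, score for one hook)
        (date(2014, 3, 3), 5.8), (date(2014, 3, 4), 6.1),
        (date(2014, 3, 12), 4.9), (date(2014, 3, 13), 4.6),
    ]

    cutoff = date(2014, 3, 8)            # end of the first field week
    week_1 = [s for d, s in responses if d <= cutoff]
    week_2 = [s for d, s in responses if d > cutoff]

    print(f"Week 1 mean: {mean(week_1):.2f}")  # 5.95
    print(f"Week 2 mean: {mean(week_2):.2f}")  # 4.75

A large gap between the two columns is a hint that something happened mid-field (a history artifact) and the two waves should be analyzed separately.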

2.  Maturation.  A respondent’s biological and psychological characteristics change during the course of a study.  Even getting tired or hungry may influence how a respondent responds in a research study.  A good example of this situation is music tests where some research companies test 600+ songs in one session.  It’s often easy to spot respondents who are bored with the testing process, and this boredom may affect their scores.

Another example of maturation is in focus groups.  If the moderator does not conduct the group properly, respondents will often display signs of boredom or anxiety.  In these cases, their responses may not be legitimate.

3.  Testing.  Testing itself may be an artifact.  Although not used frequently in radio, research using pretests and posttests can cause problems—a pretest may sensitize subjects to the material and improve their posttest scores regardless of the type of experimental treatment given to them.

For example, assume that you select a sample of your listeners and give them a test that asks them questions about your radio station.  You then show the respondents a few TV spots to find out if the spots are effective in communicating information about your radio station.  After viewing the TV spots, you give the respondents the same test they took before seeing the spots.

Let’s say that the test results show that the TV spots do increase your listeners’ knowledge of your radio station.  However, this may not be correct.  It may be that the respondents learned how to answer the questions when they first took the test and the TV spots had nothing to do with the increase in understanding of your radio station.

4.  Instrumentation.  This is also known as instrument decay and refers to the deterioration of research instruments or methods during a study—equipment may wear out, hooks may be prepared differently from the beginning to the end, and respondents may become more casual in recording their responses.

Another example of instrument decay relates to perceptual studies, whether they are conducted on the phone, on the Internet, or some other way.  To be most useful, a questionnaire must be uniform in its approach.  You can encounter instrument decay in perceptual research if your questionnaire uses a variety of rating scales or includes ambiguous, misleading, or double-barreled questions, among other problems.  The design of a questionnaire is very important, and it’s not as easy as most people think it is.

5.  Statistical regression.  This artifact may be present in a variety of ways.  It basically refers to the fact that items, concepts, or anything else rated either very high or very low tends to regress toward (move closer to) the mean (average) of the group of items when the test or measurement is conducted another time.  This is evident in music tests, where a high-scoring song in one test may be rated lower (closer to the mean) in another test.  Regression toward the mean has recently been introduced into the analysis of stocks.  It is common now to hear stock market analysts discuss the idea that leading stocks tend to fall (toward the mean) and under-performing stocks tend to rise (unless there are extenuating circumstances that create the rise or fall).
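Regression toward the mean is easy to demonstrate with a short simulation.  In this sketch (every number is invented), each song has a stable “true” appeal, but each test adds random measurement noise, so the songs that score highest in one test tend to score lower on a retest even though nothing about the songs changed:

    # Regression-toward-the-mean simulation with hypothetical songs.
    import random

    random.seed(1)
    true_appeal = [random.uniform(2.0, 6.0) for _ in range(200)]  # 1-7 scale

    def run_test(appeal):
        """One music test: true appeal plus random measurement noise."""
        return appeal + random.gauss(0, 0.8)

    test_1 = [run_test(a) for a in true_appeal]
    test_2 = [run_test(a) for a in true_appeal]

    # Take the 20 songs that scored highest in the first test ...
    top = sorted(range(200), key=lambda i: test_1[i])[-20:]

    mean_1 = sum(test_1[i] for i in top) / 20
    mean_2 = sum(test_2[i] for i in top) / 20
    print(f"Top 20 songs: test 1 mean {mean_1:.2f}, test 2 mean {mean_2:.2f}")
    # The retest mean will almost always be lower, pulled back toward
    # the group average, because part of the extreme score was noise.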

6.  Experimental mortality.  While any research project faces the possibility that subjects will drop out for one reason or another, the problem is compounded in long-term (longitudinal, panel, or tracking) studies.  This artifact will become more important in radio research as more radio stations use tracking studies and panel studies on their web sites.

If you ever plan to follow the same respondents for any length of time, you must consider that some people will drop out of your study.  If you want to track 100 listeners, then you’ll have to recruit 120 or more at the start of the study to account for those who will drop out.
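The arithmetic behind that over-recruiting is simple.  Here is a quick sketch; the 20 percent dropout rate is only an assumption, so substitute whatever rate your own panels have experienced:

    # How many people to recruit so enough survive a tracking study.
    import math

    def recruits_needed(target: int, dropout_rate: float) -> int:
        """Recruits required so `target` respondents finish the study."""
        return math.ceil(target / (1 - dropout_rate))

    print(recruits_needed(100, 0.20))  # 125 recruits for 100 finishers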

7.  Sample selection.  The type of people included in a research project is obviously very important.  In most cases, it is necessary to ensure that the respondents are homogeneous (similar) in many respects.  For example, it wouldn’t be very wise to include people who prefer hard rock music in a music test for a soft AC radio station (unless there was a specific reason).  Screeners for music tests and focus groups, and screener questions for telephone studies, usually are designed to ensure that the sample is somewhat homogeneous.

A continuing sampling problem I see (in both radio and non-radio research) is that clients demand unrealistic samples—the screening requirements make it almost impossible to find qualified respondents.  For example, a PD or consultant (or someone) asks for females 25-29 who are P1s to WAAA, listen to WAAA’s weekday morning show most often, cume WBBB’s morning show, select a specific music montage, participate in contests, and listen to the radio at least 4 hours a day (and so on).  These multi-level screeners define very small populations and the clients get upset when the research company can’t find qualified respondents.

Remember that you limit your potential sample with every screener requirement you include in your screener or questionnaire.  What you don’t want to do is “screen” yourself out of an audience.  If you make changes on your radio station based on the results of unrealistic samples, you will surely fade away in Arbitron.  Radio is a MASS medium, not a medium designed to entertain a handful of people.
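To see how quickly a multi-level screener like the one above shrinks a population, here is a rough Python sketch.  Every incidence rate below is invented for illustration; the multiplication is the point:

    # Each screener requirement multiplies down the qualified pool.
    # All incidence rates here are hypothetical.
    metro_population = 1_000_000
    rates = {
        "female 25-29":                 0.04,
        "P1 to WAAA":                   0.10,
        "listens to mornings most":     0.40,
        "cumes WBBB's morning show":    0.30,
        "contests and 4+ hours a day":  0.20,
    }

    qualified = float(metro_population)
    for requirement, rate in rates.items():
        qualified *= rate
        print(f"after '{requirement}': ~{qualified:,.0f} qualify")

    # Five screens leave roughly 96 qualified people in a metro of a
    # million -- and a research company still has to find them.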

8.  Demand characteristics.  This relates to a respondent’s reactions to a testing or data collection situation and is also referred to as “prestige bias.”  A respondent’s awareness of the testing or data collection procedure may influence how the person responds to questions.  For example, it is known that some respondents who recognize the purpose of a study may produce only “good” data for researchers.  Some respondents don’t want to appear uninformed or dumb, so they will provide answers they think the researcher wants to hear or see—the research situation “demanded” answers and the respondents will provide them.

9.  Experimenter bias.  Researchers can (knowingly or unknowingly) influence the results of a project by mistakes in observation, data recording, math computations, and interpretation.  Focus group moderators are particularly susceptible to influencing the responses of the people in the group.  (One way to identify a good moderator is to see how the person responds to respondents’ comments.  Good moderators are always neutral in their reactions—nothing affects them.)

Bias can also enter into any phase of a research project if the researcher is influenced by a client who wants a research study to produce specific results (this does happen).  The best thing a researcher can do is to ask the client not to discuss the intent of a research project beyond what information is needed to design the study and collect the data.

10.  Evaluation apprehension.  This is similar to demand characteristics, but emphasizes the point that respondents are usually afraid or hesitant about being measured or tested.  It is important for a researcher to do everything possible to ensure that the respondents are comfortable with the situation and not afraid to answer truthfully.  Sometimes this isn’t easy to do.

11.  Causal time order. The organization of a research project may affect respondents’ answers and interpretation of the data.  For example, in a focus group to test various types of direct mail, the respondents’ answers may vary if they are first shown several direct mail pieces and asked to rate them, or if the process is reversed and they discuss the good and bad points about direct mail before they rate sample pieces.

12.  Diffusion or imitation of treatments.  In situations where respondents participate at different times during one day or over several days, or where groups of respondents are studied one after another, respondents may have the opportunity to discuss the project with someone else and contaminate the research project.  This is a special problem with focus groups when one group leaves the focus room at the same time a new group enters.

These are some of the main sources that affect internal validity.  As you can see, designing and conducting a research project isn’t as simple as asking a few people some questions and then trying to figure out what they said.  It takes more than that.

Keep in mind that all scientific research is subject to error.  It is better to know this and attempt to reduce error than to be ignorant about it and conceal the errors.

External Validity

External validity refers to how well the results of a study can be generalized to the population from which the sample was selected.  In other words, a study that lacks external validity cannot be projected to other situations; it is valid only for the sample tested.  Results from a music test with 100 respondents wouldn’t be very useful if the results couldn’t be generalized to other listeners.

The primary way to help ensure external validity is to consider it in every phase of your research project, from initial discussions to the presentation of the results.  Always ask yourself something like, “Can I generalize these results beyond the sample?”  If your answer is “no,” then you need to redesign the project.

Summary

As I mentioned at the beginning of this article, research involves an understanding of many things in order to ensure that a study is valid and reliable.  There are many items to consider in project design, screener and questionnaire development, sampling, data collection, and data analysis.  If you don’t understand something about research, ask; don’t just rely on someone who says, “It seems like . . .”  Ask for facts, not opinions.
