Nolan Chart

Left Coast Views
columnist: William Westmiller

Topic: Presidential Campaign 2008
Making Election Polls and Sausage

You don't want to know how sausage is made. Making public opinion polls, particularly for elections, is even more dreadful. Here's how they're made and what they contain.
by William Westmiller
(Libertarian)
Thursday, January 3, 2008

Sausage is made by chopping up an assortment of body parts from one or more animals and stuffing them into their own intestinal lining. Election polls are made the same way: chopping up an assortment of voter sentiments and stuffing them into the pollster’s gut feelings.

Public opinion polls don’t claim to predict the results of any future election (good crystal balls are in short supply), but they do claim to use scientific methods to generate an accurate sample of current sentiments among people who may vote in the future.

The following is a "Primer on Polling" with highlights to indicate a few of the undocumented errors that creep into the effort, making it more an art than a science. Future articles will discuss the implications.

How to Poll

1.         No polling company conducts any kind of survey for free. They are paid by individuals, corporations, and universities who want some impression of general public opinion. In the case of political campaigns, the clients are candidates, political parties, and the news media. They buy questions. More often than not, they specify exactly how each question will be asked and what answers will be considered acceptable.

Here, we’re only interested in the surveys purchased by the mass media, which end up as fodder for national newspapers and broadcast network news. The sponsors, sometimes in concert, engage Rasmussen, Zogby, Gallup, Cook, ARG, and others (some have their own) to conduct a survey, which they then promote. The object is to stimulate excitement about a running "battle" that will give their advertisers more eyes and ears. That’s how the poll sponsors pay for the surveys. It’s considered a legitimate "news story" which generates its own excitement.

2.         For political contests, the polling company is only interested in talking to people who can vote. There are three ways to find voters and talk to them. Professional pollsters are about evenly split between the second and third sampling methods.

Foot Leather System (FLS?) entails actually going to people’s homes and asking them. This was how polling started, but is so expensive and time-consuming that the only "pollster" that tries to use it is the Census Bureau (long form). Fortunately for the commercial pollsters, the Bureau collects a lot of demographic information, which the pollsters get for free.

Registration Based Sampling (RBS) requires the services of a data aggregating firm, which buys voter registration names and addresses from each state’s Registrar of Voters. Then they buy digital directories from all the phone companies. Then they buy demographic files from driver’s license bureaus, or credit card companies, or use old Census Bureau files. They throw it all together. Technically, they merge, match, consolidate, and purge until they have an aggregation of useful data on lots of registered voters. They sell those data files to public opinion survey companies. The data is expensive and only as accurate as the data source and their own computer match programs.

All of the data is old by the time it gets to the pollsters. Every state has different laws and methods for maintaining registration data. Many don’t track party affiliation, others don’t record voting history, and some don’t even centralize the data at the state government level. The telephone number data doesn’t include unlisted numbers or cell phones, but it may have multiple "white page" listings for professionals (doctors and lawyers) or multiple lines in the same household. Demographic data (gender, age, ethnicity, residence location, and income category) may be missing or simply wrong. Pollsters have no choice but to assume that the aggregated data that they buy, usually quarterly, is entirely correct.
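The "merge, match, and purge" step can be pictured with a rough sketch. Everything here is an assumption made for illustration: the record layout, the function names, and the idea that records are matched on a normalized name and address. Real aggregators use their own, unpublished match programs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterRecord:
    name: str
    address: str
    phone: Optional[str] = None


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical entries match."""
    return " ".join(text.lower().split())


def merge_match_purge(registrations, directory):
    """Attach listed phone numbers to registered voters.

    registrations: iterable of (name, address) tuples from a registrar file.
    directory: iterable of (name, address, phone) tuples from a phone book.
    Returns only the voters for whom a listed phone number was found.
    """
    # Index the directory by normalized (name, address); keep the first listing.
    phones = {}
    for name, address, phone in directory:
        phones.setdefault((normalize(name), normalize(address)), phone)

    merged = {}
    for name, address in registrations:
        key = (normalize(name), normalize(address))
        if key in merged:
            continue  # purge duplicate registrations
        phone = phones.get(key)
        if phone is None:
            continue  # unlisted or cell-only voters silently drop out
        merged[key] = VoterRecord(name, address, phone)
    return list(merged.values())
```

Note what the sketch leaves out of the result: anyone without a listed number never reaches the pollster's calling file, no matter how likely they are to vote.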

Random Digit Dialing (RDD) pollsters don’t buy any pre-screened data. Their computer just starts dialing made-up numbers for a particular area code or prefix. The computer might dial a thousand different numbers before they get an answer, which is then routed to their survey staff. Because they have no idea who is on the phone, the attendant has to go through a series of qualifying questions.

Although RDD pollsters don’t have the up-front costs of data aggregation, they pay dearly for the time their staff spends screening out hundreds of connected calls where there is no qualified voter. In any case, they are totally dependent on what the participant says is true. Academic studies have shown that participants frequently evade embarrassing questions ("Now, tell the truth: did you really vote in the last election?") and give themselves the benefit of the doubt on whether they’re even registered to vote.
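As a rough illustration of the RDD approach, the sketch below makes up numbers inside one area code and prefix and then applies a single qualifying question. The function names and the "registered" answer key are hypothetical, and actual dialing systems are far more elaborate.

```python
import random


def random_digit_numbers(area_code, prefix, count, seed=None):
    """Generate `count` distinct made-up numbers within one area code and prefix."""
    rng = random.Random(seed)
    # Sample line numbers 0000-9999 without replacement.
    suffixes = rng.sample(range(10_000), count)
    return [f"{area_code}-{prefix}-{suffix:04d}" for suffix in suffixes]


def screen(answers):
    """Return True only if the respondent claims to be a registered voter.

    `answers` maps question keys to what the person on the phone said;
    nothing here can verify that the answers are true.
    """
    return bool(answers.get("registered"))
```

The screening function accepts whatever it is told, which is exactly the weakness described above.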

3.         The sponsor of the poll wants to know how many people are likely to support a set of candidates. The buyer may only want to know what "likely voters" think of the "leading candidates," broken down by party, with some "class" breakouts (sex, race, income, etc.), and maybe their issue inclinations.

The pollster will work with the sponsor to "design" the questions and select the "proper" answers (those that will be recorded in the caller files). Academic studies have shown that the wording and sequencing of the questions can have a major impact on the results, so smart sponsors "tailor" a poll to get the results that they believe are acceptable or important. They may exclude legally qualified candidates, set the order of names, rotate only part of the name list, or only offer the names of those they consider potential winners. The only "science" involved is the pollster’s effort to figure out how to satisfy their client-sponsor.

Candidates may have legitimate objectives in private polling, but many of them do "push polling" with an RDD computer system. The questions aren’t designed to get legitimate answers, but to seed suspicion or "push" the person toward the answer desired. ("Do you favor Joe Blow, the homeless child molester, or Mary Smith, the award-winning community leader?")

4.         Now, the pollster is ready to "dial for dollars." A computer system makes a random selection from the data records of known phone numbers and dials the call. Procedures vary, but if you don’t answer in three rings, you’ll probably never be called again. If your answering machine takes the call, the computer hangs up. If someone answers, they’ll receive a recorded message to "hold for an important call." If you hang up, thinking it’s just another marketing call, the computer won’t call back.

Since pollsters know that most voters won’t be home during the day, they tend to select time-zones for calls in the early evening. But, most pollsters only run their phone banks during the local working hours and shut down at local closing hours. Very few work weekends. Although their computers record the time information, none of them report the local times of the people who actually responded. There are dozens of reasons why daytime (or evening) poll queries are slanted, but there’s no way to know (unless you’re the sponsor) what day of the week, or local time of day, the pollster called, so there’s no way to know their bias.

If a person answers, and they wait, they’ll get connected to a member of the survey staff. After assurances that they’re not selling anything, they ask if you are the voter, or spouse, of the name that is in their RBS records (which has now popped up on the computer screen of the survey attendant). If  your child has answered the phone, or it’s the wrong name, or you’re so rude as to ask who is paying for their call, the attendant will say "Sorry to bother you," and hang up.

For the RDD method, the computer has probably gone through a thousand phone numbers and the phone attendant a dozen, just to find someone who is willing to be polled and says they are registered to vote. Now we can get down to business.

5.         If it’s an election poll, the first RDD questions are qualifiers ("Are you registered to vote?"; "Are you affiliated with a political party?"; "Did you vote in the last election?"). Depending on what the sponsor has requested, you may or may not qualify. If not, the attendant will say "Thank you very much" and hang up. Maybe you’re not sure about whether you’re still registered. Maybe you haven’t affiliated with any political party (pollsters might talk to "independents," if the sponsor is interested). Maybe you didn’t vote in the last election or you’ve registered since then. Or, maybe you just lie about all the questions, because you’re embarrassed to admit that you’re not a responsible citizen. One way or the other, you’ll either get a "Thank you" hang up, or the rest of the poll questions.

The RBS attendant assumes that all of the computer information is correct, but may try to verify the critical sponsor criteria with a few questions. If the sponsor is looking for "likely voters," the participant will get an appropriate question. For some states without party registration, they may ask whether the person is "certainly, probably, or likely" to vote in a particular party primary election. If the state being called holds caucuses, rather than primary elections, the attendant may ask for the voter’s prior history of participation and whether they are "certainly, probably, or likely" to participate. With either method, if you don’t qualify, the call is terminated with varying degrees of gentility.

6.         The main point of the election survey call is the candidate choice. There are at least five ways to do this and each pollster has their own preference. The least likely question is to simply ask "Who will you vote for?" Depending on how far away the election is, anywhere from 40 to 80 percent of respondents will not know who is running for an office (even President).

The second option is to read off a list of the candidates, giving the respondent an opportunity to say "Yes" or "No." Well, the attendant may not read off the complete list (particularly in a multi-candidate primary), but a list of the candidates that the sponsor wants to have mentioned. That may be the top three, or five, or those who are "credible" in the eyes of the sponsor. The form of the question is also important. Some pollsters ask "Which of these candidates will you vote for?" (undecided will be very large), others ask "If the election were held today, how would you vote?" (undecided still large), and others ask "As of today, who are you likely to vote for?" There are many variations, each with notable effects.

Academic studies show that people will pick the first candidate name they recognize, unless they already have a negative impression associated with that name. These are called "donkey voters," not to imply any party affiliation, but rather a lack of intelligent choice. Depending on the race, being listed first may produce as much as a 36% advantage. Therefore, the order of the list is important.

Most pollsters rotate the names, or at least instruct their attendants to rotate the names at random. This is a boring job and "quality control" will vary enormously among companies. Maybe the attendant has a preferred candidate. Maybe he or she can’t pronounce a name very well (calling from India, in many cases). Or, maybe they’re just lazy. Those dispositions are assumed by the pollster to "balance out" over a large number of attendants and calls.

Some pollsters use a "selective rotation" for a long list of candidates. They’ll rotate the first three, then read off the remainder in alphabetical order (if the attendant ever gets to them). One "Yes" answer terminates the reading of the list.
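A minimal sketch of "selective rotation," assuming the first three names are shuffled and the remainder read alphabetically, with reading stopped at the first "Yes." The function names and the `will_say_yes` callback are hypothetical stand-ins for the attendant's script and the respondent's reactions.

```python
import random


def selective_rotation(candidates, rotate_first=3, rng=None):
    """Shuffle the first `rotate_first` names; read the rest alphabetically."""
    rng = rng or random.Random()
    head = list(candidates[:rotate_first])
    rng.shuffle(head)
    tail = sorted(candidates[rotate_first:])
    return head + tail


def read_list(order, will_say_yes):
    """Read names until the first "Yes"; later names are never heard."""
    for position, name in enumerate(order, start=1):
        if will_say_yes(name):
            return name, position
    return None, len(order)
```

With this scheme, a candidate outside the rotated group can never be heard first, and may never be heard at all.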

So far, so good. But, pollsters have discovered a fairly large percentage of people will not pick any name. So, the attendants are instructed to ask whether they are "leaning toward," or have any "inclination" to vote for one of the candidates. Each company does it differently. Some record those responses as "leaning," and may even ask how "committed" the voter is to vote for that candidate. Other pollsters don’t even bother to differentiate, or record, whether the "favored" name is just "I kind of like him/her." or "I love the candidate and will absolutely vote for him/her." Frequently, there’s no way to tell which method or prompt the pollster used to get the results they report.

Since the way that a candidate is identified affects the results, pollsters may use (for example) "Rudy Giuliani" as one of the choices. Or, to give more credibility "Mayor Rudy Giuliani," or "New York Mayor Rudy Giuliani" (note that no Mayor of any city has ever been nominated by any party in U.S. history). For less credibility, they might use "former Mayor Rudy Giuliani." Of course, none will use the current occupation, such as "Lawyer Rudy Giuliani" or "Lobbyist Rudy Giuliani." Every variation produces a different result for each of the candidates named.

7.         Now, we’re done, right? Nope. There are usually follow-up questions for responsive voters, dependent on what the sponsor has requested. Almost always (mainly for RDD pollsters), there are "demographic" questions: what’s your gender, age (or age range), education level, ethnicity, income, etc. Is the RBS data an average for a particular zip code? Does it apply to this voter? Do people tell the truth? Is the data used by RBS pollsters correct? Maybe, maybe not. But, that information is critical to the next step.

8.         After two, or three, or four days, the pollster has enough complete survey responses to meet the demands of the sponsor. But, they don’t just tally up the numbers; the data has to be screened.

All pollsters screen, "weigh," balance, or massage the responses, just to maintain some credibility that their call responses are somehow representative of the general population of registered or likely voters. They almost never report their demographic balancing methodology. Some won’t even bother to weigh the results.

Some take demographic data from previous elections (derived from cross-referenced data or their own prior exit polls), which suggest the percent of voters in each category. Then, they "balance" their responses to match that distribution. They randomly "toss out" the respondents that don’t fit the previous voting ratios. If 25% of voters were over 65 at the last election (based on those polls), then it’s assumed that the same percentage will apply to the next election. Any responses over 25% of the sample who are over 65 will be ignored. The same applies for the other demographic criteria. Too many females? Toss them out. Too many Hispanics? Toss them out.
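The "toss out" approach can be sketched as follows. This is only one guess at how such balancing might work, since each pollster's actual filter is unpublished; the function name, the `age_group` field, and the target shares are all assumptions.

```python
import random
from collections import defaultdict


def balance_by_discarding(respondents, target_shares, key, seed=None):
    """Randomly drop respondents until group shares match `target_shares`.

    respondents: list of dicts, each with a `key` field (e.g. "age_group").
    target_shares: dict mapping group -> share of voters in the last election
                   (each share greater than 0, all shares summing to 1).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for person in respondents:
        groups[person[key]].append(person)

    # The scarcest group, relative to its target, limits the final sample size.
    total = min(len(groups[g]) / share for g, share in target_shares.items())

    kept = []
    for group, share in target_shares.items():
        quota = min(int(total * share), len(groups[group]))
        kept.extend(rng.sample(groups[group], quota))
    return kept
```

For example, with 40 respondents over 65 and 60 under 65, and a 25% target for the older group, the sketch keeps 20 of the older respondents and all 60 of the younger ones. Twenty completed interviews are simply thrown away.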

Each pollster has their own filtering process, which is a "corporate secret." In theory, the sponsors find out whether this works or not by looking at the polling company’s previous record. But, there are a hundred reasons (and excuses) why the polls don’t match up with reality. Therefore, it’s purely a matter of corporate reputation. Right or wrong, if the company is credible, sponsors have to decide whether the survey is worth the price they’re paying.

9.         Now, we’re done. Except for the "scientific" credibility. A random selection of telephone numbers is considered scientific, simply because it is presumably unbiased. Of course, the source and quality of the calling list is dependent upon all of the factors discussed above, which might be critically important to any real scientific report of some actual physical phenomenon. But, it’s all ignored. The calls made from any presumably random list that approximates the general population characteristics are considered "scientifically" valid: they are presumed to be representative of the whole population.

The balancing or filtering is considered scientifically valid because there is no other credible assumption, or at least none that’s been measured. There is no alternative method, so the one chosen is, by default, the most scientific alternative, whether or not it’s objectively true.

However, the "balancing" depends upon the very dubious assumption that the way people vote is at least partially dependent upon their gender, age, ethnicity, income, etc. Most pollsters know that isn’t true, but it would be nearly impossible to correlate the proportional results for the last election in any coherent way to the next election, since those future ratios are (necessarily) unknown. Pollsters simply assume that the demographic characteristics of past voters are a good "scientific criteria" for balancing the respondents in their current survey.

10.     Finally, we’re done. Except, we’ve skipped one step. When the sponsor orders the survey, they don’t specify how many people will be surveyed. What they actually "buy" is called a "confidence level," which dictates the number of people who need to be fully quizzed in order to attain a desired "margin of sample error" for the survey. The normal order is for 95% confidence and a three to six percent error margin.

Now, scientifically, you can take a presumably random, small sample of any large population and it will be pretty close to correct for the entire population. The number of samples determines the Margin of Sample Error for a particular confidence level.

For example, you can ask 1,000 people whether they believe the sun will rise tomorrow and determine, within three percentage points, what the entire world believes about the sun rising. The only catch is that the sampling has to be a totally random selection from among all the people in the world. Statistically, you will be 95% confident that the results are true for everyone in the world, within that margin of error. To be precise, it is the margin of sampling error, on the assumption that the sample is totally random. There is no way to determine what errors might have occurred in the sample selection itself.

[Chart: Margin of Error]

Election pollsters usually report their sample sizes, which might range from 400 to 1000 "qualified" people, producing roughly a five to three percent margin of sampling error. If one or the other is missing, you can use this simple chart to find the other number. Some pollsters may offer their clients a "deal," which only produces a 90% confidence level, where the margins are different. Since very few end consumers notice those percentages (and fewer yet know what they mean), the client gets "hard numbers" for a lower price. This smaller confidence level skews the results, but it’s important to the pollster, because he is interested in making a profit. The fewer calls he has to make, the better. The fewer attendants he has to hire, the better. The fewer questions he has to ask (or the shorter the list of candidate names), the better. Since no one audits those choices, the pollster’s decisions on these matters are a "trade secret."
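The relationship between sample size, confidence level, and margin of sampling error can be worked out with the standard textbook formula. The sketch below assumes a simple random sample, the worst-case 50/50 split, and the normal approximation; the function names are only illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(sample_size, confidence=0.95, proportion=0.5):
    """Margin of sampling error for a simple random sample."""
    return z_score(confidence) * sqrt(proportion * (1 - proportion) / sample_size)


def required_sample(margin, confidence=0.95, proportion=0.5):
    """Smallest sample size whose margin of error does not exceed `margin`."""
    z = z_score(confidence)
    return ceil(z * z * proportion * (1 - proportion) / margin ** 2)


# Columns: sample size, margin at 95% confidence, margin at 90% confidence.
for n in (400, 600, 1000):
    print(n, f"{margin_of_error(n):.1%}", f"{margin_of_error(n, 0.90):.1%}")
```

Under these assumptions, 1,000 respondents give about ±3.1% and 400 give about ±4.9% at 95% confidence. A ±3% margin takes 1,068 completed interviews at 95% confidence but only 752 at 90%, which is where the "deal" comes from.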

Finally, the process is done. The sponsor gets their "scientific" survey results, writes up a story about the current status of the "horserace" and tells the world who is winning and who is losing. Almost none of their customers notice the margin of error. Those who do notice might imagine that it represents the total error for all the percentages shown. If the total is off by only 4%, that’s no big deal. However, the error is actually double that amount (shown as a ± character that some might imagine is a symbol of divine intervention) and it applies to each of the numbers shown. So, the reported 21%, which appears to be a "hard number," is actually somewhere between 17 and 25 percent. But, that’s with 95% confidence. There’s a 5% chance that it isn’t within that reported error range at all. Here’s an example:

Poll Example

The only significant results are those above or below the "chance distribution" (1/4 for each candidate). The Reported Results strongly suggest that "John Smith" is leading, but the Actual Ranges of what might be true are plus or minus five percent. Which means that exactly the same poll can be showing that "Peter Smith" has Real Support above all the others and "John Smith" is in third place. The polling itself doesn’t say which one is true; it only stresses the median values of the error range. It is just as likely that "John Smith" would win the election (if it were held on that day) with 37% of the vote.
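To see how overlapping ranges blur a reported ranking, here is a small sketch. The percentages are invented for illustration and are not taken from the example poll; "Candidate C" and "Candidate D" are placeholder names. Treating each range independently is also a simplification, since the shares in a real poll must add up to 100.

```python
def support_range(reported, margin):
    """Return the (low, high) range implied by a reported percentage."""
    return max(0.0, reported - margin), min(100.0, reported + margin)


def could_outpoll(challenger, leader, reported, margin):
    """True if the challenger's high end exceeds the leader's low end."""
    _, challenger_high = support_range(reported[challenger], margin)
    leader_low, _ = support_range(reported[leader], margin)
    return challenger_high > leader_low


# Invented figures for illustration only; they are not from any real poll.
reported = {"John Smith": 32, "Peter Smith": 27, "Candidate C": 24, "Candidate D": 17}
margin = 5

for name in reported:
    low, high = support_range(reported[name], margin)
    print(f"{name}: reported {reported[name]}%, range {low:.0f}% to {high:.0f}%")
```

With these made-up numbers, both "Peter Smith" and "Candidate C" have ranges that reach above the bottom of the leader's range, so the same poll is consistent with "John Smith" finishing first or third.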

Before we finish, one additional note. The margin of error applies to the whole sample. It does not apply to any subset. For example, a survey of 600 voters might produce results for the entire field of national candidates. But, as soon as the sample is broken down into subsets - say only Republican voters or likely voters - the margin of error will increase. Rarely, if ever, do polling stories explain that the 21% reported support within a subset has a totally different (and much larger) error range, somewhere between 13 and 29 percent, or even wider, for various subsets of subsets.
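The growth of the error range for subsets follows directly from the smaller number of people in each subset. A minimal sketch, assuming 95% confidence, a worst-case 50/50 split, and a hypothetical subset that makes up one quarter of a 600-person sample:

```python
from math import sqrt

Z_95 = 1.96  # approximate critical value for 95% confidence


def margin_of_error(sample_size, proportion=0.5):
    """Margin of sampling error for a simple random sample of this size."""
    return Z_95 * sqrt(proportion * (1 - proportion) / sample_size)


def subset_margins(total, subset_shares):
    """Margin of error for the full sample and for each subset of it."""
    margins = {"all respondents": margin_of_error(total)}
    for label, share in subset_shares.items():
        margins[label] = margin_of_error(round(total * share))
    return margins


# The one-quarter share is an assumption chosen for illustration.
for label, moe in subset_margins(600, {"one-quarter subset": 0.25}).items():
    print(f"{label}: plus or minus {moe:.1%}")
```

The full sample of 600 carries a margin of about four points, while the 150-person subset carries about eight, which is how a reported 21% turns into a range of 13 to 29 percent.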

All of these numbers are "scientific," statistical probabilities, well known to the pollsters, but rarely announced to the ultimate consumers (readers and viewers) of the published or broadcast polling reports. The "hard numbers" are the story, translated into "surges", "boomlets", and "fading" positions.

What Polls Measure

Asking "How would you vote, if the election were held today?" entails no commitment on the part of the respondent. The survey response is certainly not as critical as a final, absolute choice that will be legally recorded and counted in an election. It’s just a fleeting opinion, influenced by two factors.

Survey respondents will not pick a name they do not recognize (Huckleberry who?), so the response is primarily a matter of name identification. Given a list (in any order), the respondent is most likely to "select" the first name that they recognize. Which is why incumbents usually get re-elected. The next best name-recognition goes to those who have held or run for the same office before (e.g.: Edwards or McCain in prior presidential primaries). After that come those who have held a similar office (Governor, Senator, Representative, Mayor) and received wide national media coverage (e.g.: Rudy Giuliani is known to 98% of the population as "America’s Mayor"). Finally, television celebrities are likely to be widely recognized (e.g.: Fred Thompson’s appearances in Law and Order). For all practical purposes, the name identification of a national candidate is a result of major media coverage of their prior exploits. If all of the prospective candidates are equally well known, or unknown, the second factor comes into play.

People form impressions of candidates based on what they know. A favorable or unfavorable reaction may be frivolous or substantive. If the information is sparse, people will still have an impression, which may have nothing whatever to do with the candidate or their positions: he has a nice haircut, she has a warm smile, he or she "looks like" a President, etc. Not until the last few weeks of a campaign do any significant number of voters know anything substantive about challengers or even incumbents. But, they will have no reticence about picking a name that they associate with a favorable sentiment.

People will select a name they do not even know, if they have an unfavorable impression of the other candidates listed. In any election with two contenders, there will be as many people voting for a candidate as there will be those voting against the other candidate. Negative campaigning works because most people choose the "lesser of two evils," even if they have substantive disagreements with the candidate they choose. The same criterion applies to a multi-candidate poll. The respondents will pick the first name they hear and recognize, if they have a favorable impression. A pollster will never say "Let me read the rest of the list, just in case." If all of the other names have a negative impression for the respondent, they will pick the first unknown name on the list.

Given these two factors, the only useful polling information is the degree of name recognition and the favorable or unfavorable impressions of potential voters. Changes in the poll position over time will indicate the effects of those two factors. The time trend lines for candidate support may be the only real measure generated by polling: who is likely headed toward an actual election victory. A candidate may be extremely well known at the beginning of a campaign and poll respondents may have a favorable, but superficial, impression. As time goes on, falling poll results for that candidate will indicate rising unfavorable impressions. A totally unknown candidate may get support simply because they have not generated any negative impressions. The ideal that every candidate seeks is to be universally known and liked. None of them ever achieve that objective.

Straw Polls and Internet Polls

For all their vague assumptions and faults, scientific polls do offer a "best guess" on the sentiments of large populations of voters. Straw polls tell us very little about how the general population will vote, but they do tell us about the political activists who participated in that specific poll. For clarity, a straw poll is taken at a particular location with actual people present, who cast a physical ballot. A straw poll is commonly conducted by an organization, usually a political party, as part of some formal, usually non-public, event. To some degree, it’s a promotional "added attraction" or an opportunity for participants to advise the organizers of their opinions. The results may reflect the current sentiments of active organizational members, but that’s all.

A straw poll of a dozen or even a thousand activists doesn’t reflect general voter sentiments, but it does reveal the enthusiasm for candidates among one group of activists who are likely to influence a very large number of average voters. The true measure for a straw poll is not percentages, but the actual numbers of votes relative to the population they claim to represent. Two hundred straw poll activists aren’t going to have much influence on a state with a million voters, but they can certainly impact a county or city with a few thousand potential voters. In other words, take it for what it’s really worth. Don’t try to arbitrarily extrapolate small results to large groups of voters.

Internet polls, like straw polls, engage "self-selected" participants. Because they don’t even pretend to be a random selection of the general population of voters, they don’t actually "represent" anyone except themselves. However, the fact that they have enough enthusiasm for a candidate to participate in open internet polls at least indicates some degree of interest. Unlike actual straw polls, the amount of effort required is only a few minutes of searching and a few mouse clicks. There is no way to tell whether the participants are voters, nor whether they will be activists beyond that minimal effort. An internet poll is primarily a measure of enthusiasm for one candidate, relative to others, among the full set of internet users who visit that site.

It’s likely that scientific internet polling will become a valuable, if not indispensable, tool for public opinion polling firms. Current technology can limit votes to one per computer, which is a start. In the future, voter profiles will be matched with the person’s preferences, which can then be extrapolated to the general set of internet users (which has its own demographic characteristics). At some point, the internet will become universally accessible. It could become the only reliable method of polling and may even become the way that many people cast their absolute, final, legal vote for a candidate. That’s still a long way off.

Who Runs In The Race

Pollsters focus on the "horserace" among the candidates, but the people in the polling "race" are actually the voters themselves. What any candidate thinks about his chances is pure speculation and spin. Anecdotal stories from the "man in the street" as to which candidate is doing best have only a cursory relationship to reality.

There are, of course, fifty-one different "races" in our electoral system, including each state’s primary election and the eventual general election. The nation starts with about 204 million eligible voters (USA 2004) in the stables. Only 142 million even bother to register, which is mandatory if they want to vote (this includes those states that allow election-day registration), so the rest are not even at the starting line. In a national election for President, only four to thirty percent participate in any of the state primary contests, depending on whether it’s a caucus, the number of competing candidates, and where the state falls on the primary election schedule. There is no "national primary," although "Super Tuesday," when 20-28 states hold their primaries, comes close. In the final general election contest for President, about 121 million people will actually cast a ballot for one of the candidates. Up to seven percent will vote for more than one, or spoil their ballot.
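The shrinking field can be put in proportion with a little arithmetic. The sketch below uses only the three round figures quoted above (eligible, registered, and general-election voters); the function name and layout are illustrative.

```python
def funnel(stages):
    """Yield (label, count, share of first stage, share of previous stage)."""
    first = previous = stages[0][1]
    for label, count in stages:
        yield label, count, count / first, count / previous
        previous = count


stages = [
    ("eligible", 204_000_000),
    ("registered", 142_000_000),
    ("voted in general election", 121_000_000),
]
for label, count, of_eligible, of_previous in funnel(stages):
    print(f"{label:28s} {count:>12,d} {of_eligible:6.1%} {of_previous:6.1%}")
```

On these figures, roughly 70% of eligible voters register, about 85% of the registered cast a ballot in the general election, and just under 60% of the eligible population ends up voting.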

The question that pollsters try to answer is actually: "How many people with a particular candidate preference will actually vote for them in a particular election?" That’s the "finish line" in the fifty primary "horseraces." Pollsters have no idea who will vote or how they’ll vote. They only know what people say they will do. Those sentiments and dispositions may change, daily or hourly, right up to the point where each of those individual voters actually passes the finish line and marks a ballot.

Polls are just polls. They are statistically "scientific" only if we ignore the dubious presumptions of the sample. There is no "horserace" that can be reliably measured. At best, it’s an art form; at worst a cruel fantasy. 

Related: Real Clear Politics or Very Fuzzy Fable?

2008 William Westmiller, all rights reserved.
Published: Thursday, January 3, 2008
Last modified: Saturday, February 2, 2008

The views expressed in this article are those of William Westmiller only and do not represent the views of Nolan Chart, LLC or its affiliates. William Westmiller is solely responsible for the contents of this article and is not an employee or otherwise affiliated with Nolan Chart, LLC in his/her role as a columnist.
