Correction for Measurement Errors in Survey Research: Necessary and Possible

Survey research is the most frequently used data collection method in many disciplines. Nearly everybody agrees that such data contain serious measurement errors. However, only a few researchers try to correct for them. If the measurement errors in the variables differ, comparisons of the sizes of the effects of these variables on each other will be wrong. If the sizes of the measurement errors differ across countries, cross-national comparisons of relationships between variables cannot be made. There is ample evidence for these differences in measurement errors across variables, methods and countries (Saris and Gallhofer in Design, evaluation and analysis of questionnaires for survey research. Wiley, Hoboken, 2007; Oberski in Measurement errors in comparative surveys. PhD thesis, University of Tilburg, 2011). Therefore, correction for measurement errors is essential for the social sciences. The correction can be made in a simple way, but it requires that the sizes of the error variances are known for all observed variables. Many experiments have been carried out to determine the quality of questions, and the relationship between the quality and the characteristics of the questions has been studied. Because this relationship is rather strong, one can also predict the quality of new questions. The program SQP has been developed to predict the quality of questions. Using this program, the quality of the questions (the complement of the error variance) can be obtained for nearly all questions measuring subjective concepts. For objective variables, other research needs to be used (e.g., Alwin in Margins of error: a study of reliability in survey measurement. Wiley, Hoboken, 2007). Using these two sources of information, correction for measurement error in survey research is possible. We illustrate here that correction for measurement errors can and should be performed.


Introduction
Most studies that require information from individuals about values, attitudes, opinions, evaluations, feelings, preferences, expectations, status, occupation, education, income and behavior rely on interviews or questionnaires. As a consequence, survey research is the most frequently used method for collecting data in Sociology, Political Science, Communication Science and Marketing Research (Saris and Gallhofer 2007).
The effects that the wording of survey questions can have on their responses have been studied by many researchers; to mention some important contributions: Belson (1981), Schuman and Presser (1981), Sudman and Bradburn (1982), Andrews (1984), Alwin and Krosnick (1991), Molenaar (1986), Költringer (1993), Scherpenzeel (1995), Tourangeau et al. (2000), Dillman (2000), Alwin (2007), Saris and Gallhofer (2007) and Biemer (2011). In all these studies, the researchers indicate that the formulation of the questions has a considerable effect on the results one obtains. That is the same as saying that there is considerable error in survey measurement, even though in many cases we do not know what the true values of the variables we want to measure are.
While these studies are very well known to the research community and it is a very common opinion that survey data contain a lot of measurement errors, only very few researchers try to correct for these errors. To illustrate this point, we have collected information for a number of important journals with respect to the frequency of use of survey research, the attention paid to measurement errors and the correction for these errors. Table 1 summarizes these results.
This table shows not only how important survey research is in the chosen journals but also how little attention has been paid to measurement problems in these journals, let alone to correction of the measurement errors. One may wonder how this lack of attention to measurement errors can go together with the general idea that survey research contains a lot of errors, as shown by the above-mentioned studies. We see three main possible explanations:
1. The effects of the measurement errors and their consequences are relatively small, so they can be ignored.
2. The procedures to correct for measurement errors are so complex or expensive that in most research these corrections cannot be performed.
3. The estimates of the measurement error variances, or their complement, the quality of the questions, are not available, and so correction is not possible.
In this paper, we discuss these three possible explanations. We want to show that the effects of measurement errors are considerable and cannot be ignored, that corrections can be made easily, and that estimates of the error variances, or of the quality of the questions, are nowadays available. As a consequence, we think that all researchers can, and also should, correct for measurement errors in order to provide believable results. We will discuss the three issues in sequence and then come back to the general conclusions.

Can Measurement Errors in Survey Research Be Ignored?
In several studies (Andrews 1984; Alwin and Krosnick 1991; Költringer 1995; Scherpenzeel 1995; Saris and Gallhofer 2007), it has been shown that the measurement errors in survey questions are considerable. Alwin (2007) suggests that 50 % of the variance of the observed variables in survey research is due to errors. So there is a considerable difference between the variable one would like to measure and the one that is really measured with the question. This difference has a big effect on the conclusions of research. It is a fundamental problem of the social sciences, as we will now demonstrate.
Imagine that we would like to know the strength of the relationship between two opinions (the variables of interest), for example job satisfaction (f1) and life satisfaction (f2), represented by the correlation coefficient ρ(f1, f2). This coefficient cannot be obtained directly. One can only estimate the relationship ρ(y1, y2) between the observed variables, i.e., the responses to the questions with respect to job satisfaction (y1) and life satisfaction (y2). The relationships between f1 and y1 and between f2 and y2 will not be perfect because of the measurement errors (e1 and e2). The standardized effect of the variable of interest f_i on y_i is called the quality coefficient (q_i). This simple idea is presented in Fig. 1. A more elaborate measurement model will be presented later.
If the variables of interest (opinions) and the errors are uncorrelated and the variables are standardized, the variance of the observed variable is 1 and it follows that:

var(y_i) = q_i² + var(e_i) = 1    (1)

so the total variance of y_i, which is equal to 1, can be decomposed into the quality of question i plus the proportion of error variance in the observed variable y_i. Because of this result, the quality of the ith question is q_i² = 1 − var(e_i), and the coefficient q_i is called the quality coefficient.
It can also be shown, using path analysis, that the following relationship exists between the correlation of the observed answers ρ(y1, y2) and the correlation of the latent variables of interest ρ(f1, f2):

ρ(y1, y2) = q1 · ρ(f1, f2) · q2    (2)

Normally, q_i² has been called the reliability of a measurement instrument (Lord and Novick 1968), but we prefer the term quality for reasons that will become clear when we discuss the estimation of question quality.
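This attenuation can be checked numerically. The short simulation below is our own sketch, not code from the paper: it draws standardized latent variables with a known correlation, adds measurement error as in the model of Fig. 1, and confirms that the observed correlation lands close to q1 · q2 · ρ(f1, f2), as Eq. 2 states.

```python
import numpy as np

# Sketch: simulate Eq. 2 to show that measurement error attenuates an
# observed correlation toward q1 * q2 * rho(f1, f2). Values are illustrative.
rng = np.random.default_rng(0)
n = 200_000
rho_f = 0.9          # correlation between the latent variables of interest
q1, q2 = 0.8, 0.7    # quality coefficients of the two questions

# Draw standardized latent variables with correlation rho_f.
cov = np.array([[1.0, rho_f], [rho_f, 1.0]])
f = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Observed answers: y_i = q_i * f_i + e_i, with var(e_i) = 1 - q_i^2,
# so that var(y_i) = 1, as in Eq. 1.
y1 = q1 * f[:, 0] + rng.normal(0.0, np.sqrt(1 - q1**2), n)
y2 = q2 * f[:, 1] + rng.normal(0.0, np.sqrt(1 - q2**2), n)

observed = np.corrcoef(y1, y2)[0, 1]
print(observed, q1 * q2 * rho_f)  # the two values should be close
```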
The two correlations are only equal if the quality of both measures is perfect (equal to 1); i.e., there are no measurement errors. Unfortunately, this will never occur. What happens if the qualities of the measures are different from 1.0 is presented in Table 2.
In the example, we assume for illustrative purposes that the correlation between the two variables of interest is .9, so ρ(f1, f2) = .9. Whenever the quality of the two measures goes down, ρ(y1, y2) also goes down, but much faster. If the quality of the measures is equal to .5 (the average quality in survey research, Alwin 2007), then the quality coefficients q_i are .7 and the expected correlation between the observed variables will be only half the size of the correlation between the variables of interest. If the quality coefficients go down to .6, this correlation will be as small as a third of the true value. The correlations between the variables of interest are thus severely underestimated if one does not correct for measurement errors.
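The entries of Table 2 follow mechanically from Eq. 2, so they can be reproduced in a few lines; the grid of equal quality coefficients below is our own choice for illustration.

```python
# For rho(f1, f2) = .9, the expected observed correlation under Eq. 2
# is q1 * q2 * .9 for each pair of quality coefficients.
rho_f = 0.9
for q in (1.0, 0.9, 0.8, 0.7, 0.6):
    observed = q * q * rho_f              # Eq. 2 with q1 = q2 = q
    print(f"q1 = q2 = {q:.1f}  ->  expected rho(y1, y2) = {observed:.3f}")
```

With q = .7 this gives .441 (about half of .9), and with q = .6 it gives .324 (about a third), matching the text.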
However, this is not the only problem. Measurement errors also make comparisons of correlations questionable. Imagine that a researcher is interested in the correlation of age and job satisfaction with life satisfaction. Imagine that the effect of age on life satisfaction is .4 and the effect of job satisfaction is .6. The quality of the measurement of age will be nearly perfect. But we can expect quite some measurement errors for the two other variables. Let us assume that they both have a quality coefficient of .6. In Fig. 2, we have presented this model.
The correlation between the observed variables LS and age will be .238 (= .4 × .99 × .6), and the one between LS and job satisfaction will be .22 (= .6 × .6 × .6). Apart from the fact that, due to errors, these correlations are much lower than the correlations between the variables of interest, researchers may also draw the wrong conclusion that age is a bit more strongly correlated than job satisfaction with life satisfaction. In reality, the correlation between life satisfaction and job satisfaction (.6) is much larger than the correlation between life satisfaction and age (.4).
For the same reasons, comparisons of relationships across countries or cultural groups cannot be made if one does not know whether the measurement errors are comparable. To illustrate this point, imagine that in two countries the correlations between two latent variables are equal to .8, but the quality of the measures is rather different, say for both measures q = .9 in country A and q = .7 in country B. Then, using Eq. 2, one can derive that the correlations between the observed variables will be .65 in country A and .40 in country B.
Overall, the conclusion should be that without correction for measurement errors, one runs the risk of very wrong conclusions with respect to relationships between variables and differences in relationships across countries. The commonly known procedure to correct for measurement errors using structural equation models (SEM) was already introduced in Goldberger and Duncan (1973), together with the LISREL program for the estimation of such models with latent variables (Jöreskog 1973). A ''simple'' example of such a model is presented in Fig. 3. Here the researcher has made a model to explain ''Environmentally friendly behavior'' using two endogenous variables, ''Environmental values'' and ''Influence,'' and two exogenous variables, ''Perception of environmental damage'' and ''Understanding politics.'' All these variables are latent variables, each measured by two indicators. These can be single questions or composite scores based on several indicators. Essential in this approach to correcting for measurement errors is that one needs at least two observed indicators for each latent variable; otherwise, the error variances cannot be estimated and corrected for. This is also the major problem of the approach: the research costs increase, and so do the length of the survey, the burden on the respondents and the complexity of the models. As a consequence, the estimation and testing of these models become more difficult as well. All these reasons may have been enough to prevent the use of this, in principle correct, procedure (see Table 1).
[Table 2: Effect of the measurement quality on the observed correlation, given that the correlation between the variables of interest is .9, i.e., ρ(f1, f2) = .9; columns: quality coefficients q1 and q2, observed correlation ρ(y1, y2).]
But correction for measurement errors does not have to be so difficult. A much simpler procedure to correct the correlation or covariance matrix for measurement errors has been developed earlier. After that, one can analyze the data as if there were no measurement errors. This approach follows directly from Eq. 2:

ρ(f1, f2) = ρ(y1, y2) / (q1 · q2)    (3)

Equation 3 shows that the correlation between the variables of interest is equal to the correlation between the observed variables divided by the product of the quality coefficients of the measures used. Thus, correction for measurement error in the observed correlation is very simple if the quality estimates of the observed variables are known. This result holds for single questions as well as composite scores and has long been known in psychology (Lord and Novick 1968). However, this approach too is hardly used.
Let us illustrate this two-step procedure with an example from a recent study of opinions about democracy in Europe. The data were collected in the pilot study of the 6th round of the European Social Survey (ESS). Using Mokken scaling (Mokken 1971), composite scores on two latent variables were obtained:
• the first, based on opinions about liberal rights, was called ''liberal democracy,'' and
• the second, based on opinions about electoral requirements, was called ''electoral democracy.''
The Mokken scale also provides an estimate of the quality of the measure developed. The qualities of these two scales turned out to be .79 for liberal democracy (q1²) and .77 for electoral democracy (q2²), while the estimated correlation ρ(y1, y2) between the two scales was .638. To correct this correlation for measurement errors, we use Eq. 3 and get:

ρ(f1, f2) = .638 / (√.79 × √.77) = .638 / .78 = .82

We see that in this case the correlation increases from .638 to .82 when correcting for measurement error.
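The two-step correction amounts to a one-line function. The sketch below (the function name is ours) applies Eq. 3 to the figures quoted above: scale qualities q1² = .79 and q2² = .77 and an observed correlation of .638.

```python
import math

def disattenuate(r_observed, quality1_sq, quality2_sq):
    """Correct an observed correlation for measurement error (Eq. 3).

    quality1_sq and quality2_sq are the qualities q1^2 and q2^2,
    i.e., 1 minus the error variance of each (standardized) measure.
    """
    return r_observed / math.sqrt(quality1_sq * quality2_sq)

# ESS democracy-pilot figures quoted in the text.
r_corrected = disattenuate(0.638, 0.79, 0.77)
print(round(r_corrected, 2))  # 0.82
```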
In the study mentioned above, it is also expected that the variable ''liberal democracy'' should correlate with the variables measuring opinions about the importance for the democracy of preventing poverty, holding referenda and sufficiently high incomes for the people. For these observed variables, the quality has also been estimated using SQP 2.0. For the opinion about poverty, called ''Just,'' the quality was .51, for the opinion about referenda, called ''Direct,'' the quality was .62, and for the household income, called ''Income,'' the quality was .92.
As mentioned above, further analysis of the data is rather simple after the correlations or covariances have been corrected for measurement errors. The analysis with correction for measurement errors can be implemented in a simple way with programs that accept a covariance or correlation matrix as data, such as LISREL, Stata or the lavaan package for R. For more information and examples, we refer to the ESS training modules (De Castellarnau and Saris 2014). Appendix 1 presents the input for the analysis with and without correction for measurement errors using the program LISREL. The only difference between the two inputs is the diagonal of the correlation matrix: without correction, the original correlations are used, whereas with correction, the diagonal contains the estimates of the quality of the measures. This is the original variance, which is (in this case) 1, minus the estimated error variance. This change is enough to obtain estimates of the regression coefficients corrected for measurement errors.
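The diagonal substitution described above is easy to reproduce outside LISREL. The sketch below uses invented correlations and quality values (not the paper's data) to show how replacing the diagonal by q_i² changes the regression estimates obtained from the matrix.

```python
import numpy as np

# Variable order: dependent y, then predictors x1, x2. All numbers invented.
R = np.array([[1.00, 0.30, 0.40],
              [0.30, 1.00, 0.25],
              [0.40, 0.25, 1.00]])
quality = np.array([0.70, 0.55, 0.90])   # assumed q_i^2 values

# Replace each diagonal element by the quality q_i^2, i.e., the original
# variance of 1 minus the error variance.
R_adj = R.copy()
np.fill_diagonal(R_adj, quality)

def betas(m):
    """Regress the first variable on the others, from a (co)variance matrix."""
    return np.linalg.solve(m[1:, 1:], m[1:, 0])

b_raw, b_corr = betas(R), betas(R_adj)
print("uncorrected:", b_raw.round(3))    # [0.213 0.347]
print("corrected:  ", b_corr.round(3))   # [0.393 0.335]
```

As in the text, the corrected coefficients can differ substantially from the uncorrected ones, and their relative sizes can even change order.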
The program computes the correlation matrices correcting for measurement errors. This is done using the formula mentioned in Eq. 3. It leads to considerable changes in the correlations between the variables, as can be seen in Table 3. As a consequence, the estimated effects of the different variables also considerably change when correcting for measurement errors.
While without correction for measurement errors all three variables have significant effects on the opinion about liberal democracy, after correction the effect of the variable ''Just'' is nearly twice as large, and the effect of the variable ''Direct'' is a quarter of what it was before and no longer even significant.
To conclude: since correction for measurement errors is simple, it can and must be done, because the results with and without correction can be very different. However, good estimates of the quality of the questions must be available.
Are Estimates of the Quality of Survey Questions Missing?

Estimation of the Quality of Questions
There are many different procedures to estimate the quality of questions and of measures for complex concepts. Maybe the best known is the test-retest design (Lord and Novick 1968) to estimate the reliability of questions. An adjustment of this approach is the quasi-simplex model (Heise 1969; Wiley and Wiley 1970) used by Alwin and Krosnick (1991) and Alwin (2007). The multitrait-multimethod (MTMM) design was suggested by Campbell and Fiske (1959) to take the effects of the method used into account. For concepts with multiple indicators, different procedures have been developed based on latent variable models such as factor analysis (Lawley and Maxwell 1971; Harman 1976) and latent class analysis (Hagenaars 1988; Vermunt 2003). Besides that, scaling methods have been developed, such as the Thurstone and Likert scales (Torgerson 1958), the Guttman scale, the Mokken scale (Mokken 1971), the unfolding scale (Van Schuur 1997), the Rasch scale (Rasch 1960) and item response theory (Hambleton et al. 1991). For the advantages and disadvantages of these different procedures, we refer to this literature.
All these procedures require at least two questions for each concept. That means that the number of questions is at least twice the number of concepts one wants to evaluate with respect to quality. As a consequence, these procedures lead to rather costly and time-consuming research with rather complex procedures. Moreover, all these procedures provide quality estimates that are specific to the formulations of the questions used in the specific questionnaire and context; generalization is not easily possible.
Therefore, before the final data collection, a lot of research has to be carried out to determine the quality of all variables to be used. This is so much work that it is only seldom done, as can be seen in Table 1. Thus, the question is whether there is a less time- and money-consuming procedure to estimate the quality of survey questions.

Prediction Instead of Estimation
The alternative to estimating the quality of questions is predicting it. This can be done if one has a sufficiently large database of questions for which the qualities are known and an algorithm that can predict, with high precision, the quality of these questions on the basis of their characteristics and the context in which they have been asked. If this is the case, one can also apply the algorithm to new questions to predict their quality, without collecting any extra data.
This approach was first worked out in three countries and led to the program SQP 1.0 (Saris and Gallhofer 2007). Later, it was extended to the many European countries participating in the ESS, leading to the program SQP 2.0 (Saris and Gallhofer 2014), which is free of charge and available at sqp.upf.edu. Since 2002, each round of the ESS has contained 4-6 experiments to evaluate the quality of the questions, carried out in most countries. Consequently, after 6 rounds, the quality is now known for more than 5000 questions in more than 20 different languages.
Concerning the way the program SQP 2.0 has been developed, we refer to Saris et al. (2011) and Saris and Gallhofer (2014). Below, we give only a brief description of the program.
The quality of the questions was determined by analyzing the data of MTMM experiments (Andrews 1984; Saris et al. 2004). In this way, we obtained estimates of the reliability (1 − random error variance), the validity (1 − method error variance) and the quality (the product of reliability and validity) of all questions involved in the experiments, in all the different countries.
The idea was to use the characteristics and the context of the questions as predictors of the quality of questions. Therefore, we have made a program to code the questions that were involved in the experiments of the ESS. For details of these characteristics, we refer to Saris et al. (2011). People who were native speakers in the different languages involved in the ESS and were able to understand English coded the questions. Results were controlled by comparing the codes from different languages with the codes of the English source questionnaires, and corrections were made in the codes when necessary.
The next step was to choose a procedure to study the relationship between the question characteristics and the quality estimates of these questions. For this purpose, we did not choose the regression model used in the past (Saris and Gallhofer 2007) but the so-called ''random forest'' approach developed by Breiman (2001), because it was suggested to be the most efficient prediction procedure for this kind of problem.
It turned out that this procedure indeed provided better predictions of the reliability and validity for our data: the R² was .69 for reliability and .72 for validity.
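For readers who want a feel for the prediction step, here is a toy sketch in the spirit of that approach: a random forest regressor trained to map coded question characteristics to a quality estimate. Everything below, the features, the data and the coefficients, is invented; SQP's actual database, coding scheme and tuned model are far richer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500
# Hypothetical coded characteristics: number of answer categories,
# presence of a don't-know option, question length in words.
X = np.column_stack([
    rng.integers(2, 12, n),      # answer categories
    rng.integers(0, 2, n),       # don't-know option present
    rng.integers(5, 40, n),      # question length
])
# Synthetic "reliability" depending on those characteristics plus noise
# (the coefficients are invented for the simulation only).
r2 = 0.5 + 0.03 * X[:, 0] - 0.05 * X[:, 1] - 0.004 * X[:, 2]
r2 = np.clip(r2 + rng.normal(0, 0.05, n), 0, 1)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, r2)
pred = model.predict(np.array([[7, 0, 12]]))[0]
print("predicted reliability for a new question:", round(pred, 2))
```

The point of the design choice is the same as in the text: once such a model is trained, the quality of a new question costs only the time needed to code its characteristics.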
Finally, based on this knowledge, the algorithm was used to develop the computer program SQP 2.0, which generates predictions of the quality of questions.
In order to predict the quality of new questions, the user has to code the characteristics of the question. Then, the program generates the prediction of the quality of the question. This means that researchers can now get, via SQP, a prediction of the quality for most ESS questions but also for other new questions without further costs than the time required to introduce the question in the program and to code it.
So far, we have discussed the estimation and prediction of the quality of single questions. Often, researchers use concepts based on several indicators, so we also need a way to predict the quality of composite scores for complex concepts. Such a solution exists, based on the predictions of the quality of the single questions. For this topic, we refer to Saris and Gallhofer (2014, chapter 14) and the ESS online training module (De Castellarnau and Saris 2014) mentioned above.
We can conclude that, nowadays, a simple procedure is available to obtain the quality of existing and new questions and composite scores for concepts with multiple indicators.

A Realistic Illustration
As an illustration of this approach, we have chosen a research issue that has been studied by many researchers recently on the basis of the ESS data of Round 3. It is the explanation of opinions of people about extra immigration in their country. Some of the variables introduced in the ESS for explanation of this opinion and the model proposed are presented in Fig. 4.
The questions asked to measure these concepts are presented in Appendix 2.
In this study, the questions B37 to B40 have been used for the variables ''Allow,'' ''Economic threat,'' ''Cultural threat'' and ''Better life,'' respectively. The question ''Allow'' is quite different from the other three, which have been specified with the same type of scale. This can lead to a standard reaction of the respondents, which is called the method effect (Campbell and Fiske 1959), for example giving the same answer each time (straightlining) or using the scale in a specific way (response style). Because this effect will occur in all three observed variables, one can expect an extra correlation between them. This correlation is called the common method variance (CMV). It means that our earlier simple measurement model should be adjusted to take the method effect into account, as presented in Fig. 5.
Equations 2 and 3 also have to be changed to take this CMV into account. The more realistic versions are Eqs. 4 and 5:

ρ(y1, y2) = q1 · ρ(f1, f2) · q2 + r1 · m1 · m2 · r2    (4)

where r_i, v_i and m_i are the reliability, validity and method-effect coefficients of question i, and q_i = r_i · v_i. From this it follows, as before, that:

ρ(f1, f2) = (ρ(y1, y2) − CMV) / (q1 · q2), where CMV = r1 · m1 · m2 · r2    (5)

The predicted quality (q²) of the questions, decomposed into reliability (r²), validity (v²) and method effect (m²), is presented in Table 4.
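The CMV adjustment can be sketched as a small function: subtract the common method variance from the observed correlation, then disattenuate by the quality coefficients. The decomposition used here (q_i = r_i · v_i, CMV = r1 · m1 · m2 · r2) follows the MTMM framework referred to in the text; the numerical values are invented.

```python
def correct_with_cmv(rho_obs, r1, v1, m1, r2, v2, m2):
    """Remove common method variance, then correct for measurement error (Eq. 5)."""
    cmv = r1 * m1 * m2 * r2        # spurious correlation from the shared method
    q1, q2 = r1 * v1, r2 * v2      # quality coefficient = reliability x validity
    return (rho_obs - cmv) / (q1 * q2)

# Invented example values; note that v^2 + m^2 = 1 for each question.
rho_latent = correct_with_cmv(0.45, r1=0.85, v1=0.95, m1=0.31,
                              r2=0.80, v2=0.90, m2=0.44)
print(round(rho_latent, 2))  # 0.61
```

Note that, relative to Eq. 3, the shared method here first inflates the observed correlation, so ignoring CMV would overcorrect.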
In the top part of Table 5, the correlations between the observed variables are presented below the diagonal, and the CMV for the different correlations is presented above the diagonal. In the lower part of the table, the CMV has been subtracted from the observed correlations, and on the diagonal the qualities of the different measures are placed, as before. This latter covariance matrix has been used for the estimation of the effects in the model corrected for measurement errors; the correlation matrix below the diagonal in the top part of the table is used for the estimation without correction. The results of these two analyses are presented in Table 6. The table again shows considerable differences in the effects depending on whether or not one corrects for measurement errors.
While economic threat has a significant effect in both regression equations if one does not correct for measurement errors, this effect is reduced to zero after correction, while the effects of Better life and Cultural threat become much larger. Also, the explained variance increases considerably, as expected when the random error is removed from the variance of the dependent variables. This result shows once again the importance of correcting for measurement errors in analyses.

Conclusions
In this paper, we have discussed three possible reasons why researchers in the social sciences hardly correct for measurement errors.
The first reason was that the effect of these errors may be ignorable because they are very small. Based on a theoretical argument, empirical research and several illustrations, we have shown that this is not the case: the effects can be considerable. If one does not correct for measurement errors, the consequences are the following:
1. The estimates of the relationships between variables are in general biased: they can be underestimated because the quality is below 1, or overestimated because of high method effects.
2. Thus, the estimates of the effects of different variables can lead to wrong conclusions.
3. The estimates of relationships across countries cannot be compared directly.
Thus, correction for measurement errors is absolutely necessary.
The second possible reason for not correcting was that the correction procedures are too complicated. However, we showed that a very simple procedure is available: any model can be estimated with correction for measurement errors by adjusting the covariances for CMV and subtracting the error variance of each variable from the variances in the covariance matrix. One then automatically obtains estimates of the model parameters corrected for measurement errors. All steps can be done in a SEM program like LISREL, but also in other programs. The general conclusion is that simple correction procedures are available, so this cannot be a reason to ignore the presence of measurement errors.
(Note to Table 6: * indicates that the coefficient is, in absolute value, more than twice as large as its standard error.)
The third possible reason was that the sizes of the measurement error variances, or the quality of the measures, are not available. It is indeed quite some work to collect information on the quality of all variables of interest. However, we presented in this paper a new approach, based on a meta-analysis of many measurement quality experiments, which makes it possible to predict the quality of nearly all substantive variables just by coding the characteristics of the question of interest. This procedure is available in the program SQP 2.0, which is freely available for use.
Given this situation, correction for measurement errors can routinely be done. There is no reason anymore to ignore measurement errors.
Above, we mentioned that an alternative to the procedure suggested here is the multiple-indicator approach. This approach can be applied in combination with latent variable models (as illustrated before) or with models without latent variables but with composite scores as observed variables. Both approaches are in principle correct, but they seem too complex for researchers, since Table 1 shows that they are only seldom used. The problem with multiple-indicator models combined with latent variables is that the models become rather complex; as a consequence, they can contain many misspecifications, and it is not so simple to cope with this complication.
The use of composite scores based on multiple indicators occurs more often. In that case, Cronbach's α is often computed as an estimate of the reliability of the composite score. These reliability estimates are normally far from 1 and vary in size across the different concepts; therefore, correction for measurement errors is necessary. These coefficients could be used as quality estimates for the composite scores, and one could apply the procedure mentioned above. However, this too is only seldom done, and if it is not done, one cannot rely on the results of such analyses. Saris and Gallhofer (2007, 2014) indicated that the quality of composite scores can also be evaluated on the basis of information about the quality of the single questions, obtained from SQP, and the correlations between these indicators. This procedure will in general lead to different estimates of the quality of the composite scores, because Cronbach's α makes more assumptions than this approach.
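As an illustration of using α this way, the sketch below computes Cronbach's α from simulated item data and treats it as the quality estimate (q²) of the composite when disattenuating a correlation, as the text describes. The item data, the loading of .7 and the observed correlation of .50 are all invented.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=1000)
# Four parallel items loading .7 on one latent trait, plus noise.
items = latent[:, None] * 0.7 + rng.normal(0.0, 0.7, (1000, 4))
alpha = cronbach_alpha(items)

# Use alpha as the quality estimate of the composite in an Eq. 3-style
# correction; here the other measure is assumed perfect (quality 1).
r_observed = 0.50
r_corrected = r_observed / np.sqrt(alpha * 1.0)
print(round(alpha, 2), round(r_corrected, 2))
```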
Thus, we think that the procedure suggested in this paper is simpler and makes fewer assumptions. Therefore, we hope that this procedure will be used more frequently in the future to correct for measurement errors in survey research.

Limitations
The new procedure using predictions of the quality of questions is only as good as those predictions. They depend on the database of questions on which the prediction algorithm was developed, on the precision of the quality estimates and on the precision of the prediction algorithm. Thus, the corrected results may not be perfect, but it is always better to correct for measurement errors than not to, because we know that the quality of questions is never perfect, and ignoring this can lead to serious errors in the estimation. An alternative would be to use SEM, but this should in fact lead to exactly the same estimates (Saris and Gallhofer 2007). Besides, as Table 1 shows, people are not using it: it seems to be too complicated.
An important limitation of this approach is that MTMM experiments are difficult to formulate for background variables and other factual questions asking for dichotomous responses such as yes/no or done/not done. For these questions, however, one can rely on the information about quality provided in the work of Alwin (2007), based on panel data with the same questions.
Furthermore, we have to say that there will always be new questions that are quite deviant from the questions that are now in the database. For example, so far the new Web survey questions have not yet been entered in the system. So occasionally SQP cannot be used. This means that SQP needs an update regularly to accommodate new question types.
It can also happen that, in correcting correlations for measurement errors, one obtains values larger than 1 or smaller than zero. This is of course not an acceptable result. The reason for this problem, and the way to cope with this situation, can be found in De Castellarnau and Saris (2014).
In the analysis of the data using the two-step procedure for correction for measurement errors, one should realize that the standard errors of the estimated regression coefficients and other parameters of causal models will be underestimated. If one instead used latent variable models with one observed variable per latent variable, with the loading fixed at q and the error variance at 1 − q², the estimates of the causal effects would be identical, but the standard errors would be somewhat larger. However, even these standard errors would still be somewhat underestimated. For more details on this issue, we refer to Oberski and Satorra (2013).
On the basis of the presented results and the limitations mentioned, we draw the conclusion that researchers using survey data have in general the possibility to correct for measurement errors in their data and have to do so in order to make their results and conclusions believable.