The EVS is one of the oldest and longest-running comparative survey projects in the world. In this chapter, we argue that projects like this must strive for the highest possible methodological standards. This is true for all aspects of measurement, particularly with respect to the equivalence of questions across social groups, languages and countries. Of equal importance are aspects of representation, underlining the necessity of randomly selected samples. But the quest for quality must go beyond methodological considerations. Projects like the EVS can only achieve excellence when firmly integrated in the scientific community and the wider social context. To ensure the utility of the data for scientific research, a continuous dialogue about survey content between users and those responsible for the surveys is necessary. Additionally, it is important to ascertain the relevance of surveys to a wider audience and, specifically, to those selected to take part in them. The argument is developed through five theses that are to be considered when working in comparative survey programs.
In the landscape of comparative surveys in Europe, the European Values Study (EVS) is one of the important landmarks: ‘duration since the first round of data collection’, ‘number of countries participating’ and ‘involvement of the scientific community’ are just a few important keywords that define the project. But can we derive some general lessons for comparative surveys from comparing the EVS, the project in which Loek Halman was involved for a very long time, with other similar surveys like the European Social Survey (ESS) or the International Social Survey Programme (ISSP)?
We think this is possible and organize our reflections around five theses, covering a number of important dilemmas which everyone involved in implementing international comparative surveys faces. As in many textbooks (among others Groves et al., 2004), we will consider the “measurement” side as well as the “representation” side, keeping in mind that both are central to survey quality in general and the implementation of comparative surveys in particular.
The first thesis that we want to propose goes as follows: (Comparative) surveys are only useful if they are the product of a joint effort of survey specialists and those using these data for their research: The Total Research Quality quest.
The total survey error (TSE) framework is now a widely accepted perspective in the field of survey methodology (Lyberg and Weisberg, 2016). One way to present it is to take into consideration all sources of error potentially endangering the results of a survey and to try to minimize their impact under given constraints, e.g., the available financial budget or other resources. Parallel to TSE, the concept of “Total Survey Quality” was developed, in particular by statistical offices, taking into account three levels of quality, namely product, process and organizational quality. Lyberg and Weisberg (2016) propose to merge TSE with the Total Survey Quality approach into a “Total Research Quality” perspective including, for example, the discussion of the mode of research and the adequacy of survey characteristics in relation to research goals.
Why is this important for the EVS and for comparative surveys in general? There are at least two reasons. The first one is that the TSE paradigm implies that one should document all decisions on implementation, keep close control over the fieldwork, and document possible adaptations and deviations from central standards in each country, i.e., the definition of the sample, the ways to control the “randomness” of the process, as well as measurement characteristics and translation procedures. It must be emphasized that such a transparency process has been put forward by the ESS since its first edition in 2002.1 It was also developed for some ISSP modules (Joye, Sapin & Wolf, 2019a).
As for the second reason, when Lyberg and Weisberg (2016) mentioned the idea of Total Research Quality, they were also thinking about introducing a link between those responsible for the survey and the research community as an element of quality. A discussion about the validity and quality of a survey must, therefore, include those who will use the data in the end.
All comparative projects face the challenge of integrating scientific excellence and research interests with methodological expertise and technical know-how. While yearly or biennial surveys like the ISSP or the ESS involve researchers continuously, this is particularly challenging for a project with a nine-year cycle like the EVS. Here it is almost impossible to sustain a debate between researchers and survey practitioners over such a long period of time for “only” one survey. Thus, the challenge is to maintain a functioning network of survey methodologists and substantive researchers from diverse scientific disciplines able to work together or, even better, of researchers having both substantive and methodological competences. This implies thinking about the organizational structure of a comparative survey project (de Graaf & Halman, 2013).
The second thesis we propose is: Surveys are co-created by researchers and respondents.
The strength of the EVS or the ISSP is the integration of different disciplines of the social sciences as well as statisticians and survey methodologists. In this way, these surveys are good examples of interdisciplinarity, where we expect not only a juxtaposition of different disciplines (multidisciplinarity) but a real integration of the different perspectives. One step further in this direction is to communicate to a broader public, for example, by using graphical tools like cartographic presentation (Halman, Luijkx & van Zundert, 2005, but see also https://www.atlasofeuropeanvalues.eu/). The idea of transdisciplinarity takes this even further by additionally postulating that everyone participating in the process of research contributes to its outcomes, i.e., also respondents (Hirsch Hadorn et al., 2008).
The cooperation with respondents is a crucial point for different reasons. First of all, respondents should feel free and respected. For example, not accepting a “don’t know” but instead forcing an answer in an internet survey is often a cause of break-off. Likewise, coercing a respondent to cooperate in a survey is not acceptable interviewer behaviour. More generally, a respondent is not to be seen as the object of an experiment but as a partner in a knowledge production process. This idea of social exchange was already used by Dillman and colleagues (2014) in order to increase participation.
A second reason why cooperation with respondents is crucial is interest. For example, Groves, Presser and Dipko (2004) have shown a link between interest and participation. If such an effect is strong, this would influence the results because interested people may not have the same attitudes, values or demographic characteristics as others. Thus, the survey must be as interesting as possible for most respondents while the design should make sure that not only the easiest to reach respondents are selected but all sample members have a chance to participate.
Third, another reason is the centrality of the concept. We know that respondents will generally give an answer to every question even if it is not central to their belief system. This phenomenon is covered in survey methodology under the heading of non-attitudes (Converse, 1974; but see also Barton, 2011). If we take this seriously, we should not only ask about the opinion that we are interested in but also about the strength of that opinion (Smith, 1984).
Fourth, reactions to language and content are crucial, too. We know that some words are problematic in some contexts: for example, “race” in a French- or German-speaking context. The use of such words can interfere with the relation between interviewers and interviewees. At the same time, one experiment has shown a relatively small effect of such terms (Joye et al., 2012). The recommendation is to be attentive to the choice of wording in different contexts and to gather more empirical results, either qualitative or quantitative, in order to be able to make the best adjustments. This also underscores the importance of taking into account the conceptual space of an item when crafting it and not leaving this type of question to translation or adaptation, which is addressed at the end.
The fifth and last crucial point we propose is that the discussion about the relation with respondents can go even further. The discourse initiated by a survey may also have a mobilization effect, called a performative function in some sub-disciplines, not least because a survey typically addresses thousands of people. For example, imagine a survey insisting on the differences, largely fictitious, between group A and group B, planting the seed of division inside a country. Thus, we should choose the topics of our surveys carefully and take possible feedback effects into account when deciding about their implementation. In this way, we can defend the idea of a survey ethics, not limited to the scientific questions some researchers may have but considering the public debate that is implied by a sociological intervention on many thousands of individuals. Here again, detailed knowledge of the national circumstances is indispensable, showing once more the importance of national survey collaborators.
In the end, a survey is not only about asking questions to respondents but also a sociological experience for all who are involved. The EVS seems to be well positioned in this regard with a core on “values”. In any case, this should be reflected when thinking about the future of comparative survey programs.
The third thesis we propose is: Comparative surveys must strive for an organizational structure which balances centralized and decentralized decision making.
The sustained involvement of methodologists and substantive researchers with different disciplinary backgrounds is only one ingredient contributing to the successful functioning of an international comparative survey. The consideration of users and respondents is another one. A further important element is the way the survey program is governed and organized. Some advocate a centralized governance that makes it possible to systematically enforce common procedures and standards. Others argue that a more decentralized system is best because only then will knowledge of local realities be fully utilized to find the most efficient implementation of the survey. This is an old debate in the organization of international surveys (Lynn, 2003).
ESS is an example of a relatively centralized survey program with many centralized controls at each step of implementation.
On the other end of the spectrum is the (old) Generations and Gender Survey (GGS). Here countries enjoyed a lot of freedom in sampling, fieldwork, questionnaire design etc. Some information in some countries even came from registers and not from a survey.
The ISSP is an example of a specific mix of central and decentral decision making. Basic principles of sampling and fieldwork are fixed for all members. However, these mandatory rules are not defined by a remote centre but decided jointly by all involved parties, and a committee is tasked with evaluating country-specific survey designs vis-à-vis these principles. Likewise, questionnaires are based on mutual agreements but then adapted within predefined limits to national circumstances. Thus, there is a mix of centrally agreed rules combined with a specified degree of freedom to adapt procedures to national conditions in order to make the survey as valid and relevant as possible.
The EVS, at least in its last two waves, was somewhat in between ESS and ISSP, having a strong involvement of countries in the decision process but also a set of rules and procedures defined by a small central circle.
We tend to defend a mixed approach which emphasizes common standards and central evaluations of survey quality with possibilities to make adaptations on the national level. Furthermore, the capacity for innovation will be greater in a less rigid system because the involvement of national collaborators will be stronger and the interplay between them will be more diverse and mobile. Finally, an organization which does not rely on a strong centre could be more resilient to unforeseen shocks. If we are correct in these assumptions, the ISSP should be the most “agile” comparative survey, at least with respect to adaptation over time and reactions to changes in the environment. The EVS could reconsider what mix of central and decentralized decision making is most suitable in its future organization. Of course, there is a price to pay for a more decentralized organization. The need to document what happens in the different countries and the implications of diverse methodological choices for the comparability of the data have to be acknowledged. Furthermore, documentation alone does not suffice; rather, one has to find suitable ways to present it and make it accessible and meaningful to the users.
As a fourth thesis, we propose: Random sampling has to be mandatory for all participating countries; but the choice of sampling frame has to be based on national circumstances and the chosen mode.
For some years, face-to-face surveys—the assumed gold standard for high quality surveys—have become more and more difficult to organize in many countries. There are many reasons for this trend, from long-term changes in lifestyles to events like the COVID-19 pandemic which began at the end of 2019. In sum, these trends led to steep increases in survey costs and dwindling response rates (Wolf et al., 2021). Therefore, research on alternative modes of data collection is urgently needed keeping in mind that despite the flexibility in the survey mode, the demand for a sample of high quality remains. But what exactly is a sample of high quality?
A quality sample first of all is a random sample, meaning that every element of the predefined target population has a known and non-zero probability of being selected for participation.2 This definition excludes quota samples but also all methods allowing respondents to self-select into a survey. It rules out, for example, internet-based access panels built on volunteer participation. But a sampling frame consisting of email addresses can also be problematic, knowing that in most countries some individuals have no email address while others have many, without information available to researchers to correct these selection biases. Only random samples allow the use of statistical estimates of errors and population values. Therefore, relatively small random samples usually are far better than large non-random samples.
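The logic behind this demand can be illustrated with a small simulation. The sketch below is purely illustrative (the population, group split and inclusion probabilities are invented, not EVS procedure): it draws a sample with known but unequal inclusion probabilities and contrasts the unweighted sample mean with the Horvitz-Thompson estimator, which weights each respondent by the inverse of their inclusion probability. This is exactly the correction that becomes impossible when inclusion probabilities are unknown, as in self-selected samples.

```python
import random

def draw_poisson_sample(population, probs, rng):
    """Each unit enters the sample independently with its own known probability."""
    return [(y, p) for y, p in zip(population, probs) if rng.random() < p]

def horvitz_thompson_mean(sample, population_size):
    """Estimate the population mean, weighting each unit by 1 / inclusion probability."""
    return sum(y / p for y, p in sample) / population_size

rng = random.Random(42)
# Invented population of 100,000 values, roughly normal around 50.
population = [rng.gauss(50, 10) for _ in range(100_000)]
# Known but unequal inclusion probabilities, e.g. a design that
# deliberately oversamples the lower-scoring group.
probs = [0.02 if y < 50 else 0.01 for y in population]

sample = draw_poisson_sample(population, probs, rng)
true_mean = sum(population) / len(population)
naive_mean = sum(y for y, _ in sample) / len(sample)      # ignores the design: biased
ht_mean = horvitz_thompson_mean(sample, len(population))  # corrects via known probabilities
```

With roughly 1,500 sampled units, the naive mean is systematically pulled towards the oversampled group, while the Horvitz-Thompson estimate stays close to the true population mean. For a volunteer panel the entries of `probs` are simply unknown, so no such correction is available, however large the sample.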
In some countries researchers have tried to solve this problem by using what is called a probability-based online panel: typically very carefully designed studies based on a randomly selected set of respondents which comprise the panel and are then interviewed at different occasions and on different topics. Using such a panel as sample for a comparative survey is tempting. However, one must reckon with the possibility that the panel will be more and more selective because some participants will have a higher likelihood to leave the panel than others, i.e., the panel will suffer from differential attrition. This implies that the panel sample will deviate more and more from the population. Without any further elements of design aimed at controlling for such a bias, panels are not acceptable for this type of survey where the aim is to gain knowledge about the population. However, we should emphasize that probability-based panel studies that monitor and correct for differential attrition and regularly add randomly drawn refreshment samples should usually satisfy the demand for a quality sample.
To fulfil the demand of a random sample, a “translation process” in which the general rule is interpreted and adapted to national circumstances is needed. This process and its results should be validated by a specialized committee which will assess the proposal balancing costs and benefits of specific decisions.
Sampling is not the only element of design for which the right balance between standardization and adaptation must be found. Among others, this is also true for the choice between different data collection modes. For the ESS, the face-to-face mode is mandatory. For the ISSP, the questionnaire is drafted with a self-completion format in mind, but each country is free to opt for face-to-face, mail or web. Only telephone interviews are not accepted because this mode does not allow presenting visual material during the interview. For the EVS, the model was face-to-face, but the move to other formats was tested during the last wave, where a very ambitious methodological program was realized in parallel to the main survey (Luijkx et al., 2020).
The fifth and final thesis reads: Reaching measurement equivalence over time is particularly challenging and should be of central concern to comparative survey programs.
Measurement equivalence and, more generally, the quality of questions and scales over time are even more challenging because the challenges vary by type of question and content and multiply in comparative settings (Halman & Moor, 1993; Wolf et al., 2016). More specifically, the following three issues are at play.
First, it is now widely accepted that for most concepts multi-item measurement, i.e., a scale comprising different items that can be combined into a composite measure, is the path to follow because it allows for the estimation of measurement error and of the degree of equivalence between time points and countries. For the ESS this was an explicit part of the reasoning when launching the program. Items may change their meaning over time, thereby changing the covariance structure of the scale. Being able to react to these changes and to include new developments in the survey while keeping comparability over time is among the biggest challenges of survey research. In the ISSP, this challenge is met by a rule that, for repeated modules, two thirds of the items have to be replicated, while up to one third of the items can be new. This institutionalized room for innovation is missing in the EVS.
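As a minimal illustration of why multi-item measurement permits the estimation of measurement error, the sketch below computes Cronbach's alpha, one classical reliability estimate derived from the covariance structure of a scale's items (this is a generic textbook tool, not a specific EVS procedure, and the item scores in the example are invented toy data):

```python
def cronbach_alpha(items):
    """Reliability of a composite scale.

    items: one list of scores per item, aligned by respondent.
    Alpha compares the sum of the item variances with the variance of
    the composite (sum) score: the more the items covary, the higher alpha.
    """
    k = len(items)            # number of items in the scale
    n = len(items[0])         # number of respondents

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(sample_var(item) for item in items)
    composite = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / sample_var(composite))
```

For two strongly related items, e.g. `cronbach_alpha([[1, 2, 3, 4], [2, 2, 4, 4]])`, alpha is about 0.94. Equivalence testing across countries or waves then asks whether this covariance structure is the same everywhere, something a single item simply cannot reveal.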
Second, for single-item measurement the choice of wording is even more sensitive than for multi-item measures, and it can be problematic in at least two different directions. On the one hand, because of transferring meaning into different cultural contexts: all the work on translation shows how challenging it is to arrive at equivalent formulations (Behr & Shishido, 2016). On the other hand, because of the evolution of meaning over time: the EVS and the ISSP cover nearly a forty-year time span. During this time words may well have changed their meaning (Smith, 2005). For example, up to the 1970s the term “printer” most likely was interpreted as denoting a person producing printed matter, while nowadays this term would almost exclusively be seen as referring to a machine producing printed material. But even if a term keeps its general semantic meaning, its social significance may vary greatly. For example, “inflation” was a very important political issue in the 1970s in western Europe but currently is of only very little concern. Another example is gender roles, which have changed a lot during the last decades. Therefore, the scales used to describe them in the 1980s are less accurate today (Walter, 2018).
Third, for socio-demographic variables the underlying nomenclatures – i.e., the administrative or societally accepted categories – typically vary over time (Schneider, Joye & Wolf, 2016). Examples include the ISCO classification for occupations with its 1968, 1988 and 2008 variants or, more recently, the change in the number of genders distinguished by the respective question. The changes can sometimes be rather big as, for example, in the case of education, where very important standardization work based on ISCED-11 has been carried out (Schneider, 2011, but see also Ortmanns & Schneider, 2016). The result is very satisfying for more recent editions of surveys, but older surveys cannot be made fully forward compatible with this classification. The situation is the same for the EVS as well as the ESS or the ISSP but has, nevertheless, to be addressed, perhaps in a way common to these programs.
All these examples emphasize the importance of the translation process and, more generally, the importance of putting measurement in its cultural and temporal context. To meet this challenge, we should not only rely on teams of translators, as is now the standard for all the surveys mentioned here, and we should not only rely on quantitative analyses of invariance. We must also consider a more carefully guided development of items and, for example, more often employ cross-cultural cognitive interviews (Miller et al., 2011; Willis, 2015) or probing approaches (Behr et al., 2014). On the quantitative side, some ideas of “scaling” in different contexts (Mohler, Smith & Harkness, 1998; Joye, Birkelund & Lemel, 2019), possibly using auxiliary information (Clogg, 1984), seem a path to follow. In other words, a mix of qualitative and quantitative approaches to ensure comparability over time and place should be developed, using and advancing innovative methods.
What consequences can we draw from these observations for the future of comparative surveys? From our point of view, three points are particularly important. First of all, continuing to closely monitor the fieldwork in each country is essential, with as much exchange as possible between the national teams doing the fieldwork and the central coordination. In this respect the EVS seems in a good position between the decentralization of the ISSP and the centralization of the ESS. One remaining challenge is the documentation of methodological choices and the way to communicate their consequences to users.
Second, samples must remain random: for this type of survey, this is the only means to ensure quality in a comparative perspective. Furthermore, weighting procedures “redressing” the sample can be problematic: first, because the variables on which weights are based are not necessarily those most strongly related to the bias and, second, because the mechanisms driving bias will most likely be country-specific, implying that weighting should not necessarily rely on the same variables in each country (Joye, Sapin & Wolf, 2019b). Using a strict random sample and aiming at the highest possible response rate by employing a diversity of strategies is likely the best insurance for quality, in particular from a comparative perspective.
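To make concrete what “redressing” a sample involves, here is a minimal poststratification sketch (the cell labels and population shares are invented for illustration): each respondent receives the ratio of the known population share of their cell to that cell's share in the realized sample. The caveat raised above is precisely that choosing the cells – age by education in one country, region in another – is where country-specific knowledge of the bias mechanism enters.

```python
from collections import Counter

def poststratification_weights(sample_cells, population_shares):
    """One weight per respondent so that weighted cell shares
    match the known population shares."""
    n = len(sample_cells)
    sample_shares = {cell: count / n for cell, count in Counter(sample_cells).items()}
    return [population_shares[cell] / sample_shares[cell] for cell in sample_cells]

# Invented example: young respondents are overrepresented 3:1 in the
# realized sample, although the population is split 50/50.
cells = ["young", "young", "young", "old"]
weights = poststratification_weights(cells, {"young": 0.5, "old": 0.5})
```

Here each young respondent gets weight 2/3 and the old respondent weight 2, so the weighted age distribution matches the population. But the correction works only with respect to the chosen cells; bias driven by an unmeasured mechanism is left untouched, which is the core of the argument for random samples in the first place.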
Third, adapting the questions between cultures and, probably more important, over time is, in our mind, the most difficult challenge for the EVS.3 To meet it we will probably have to prepare an innovative research program putting together quantitative and qualitative approaches in a comparative frame. And such a research program could be at the heart of the future of the EVS.
Since the inception of the EVS, the survey landscape in Europe has changed. The EVS now competes with other comparative surveys like the ESS, the ISSP, the GGS, SHARE and many more. The EVS must find its place in this new landscape and develop an infrastructure to support it. Alternatively, it will have to build a network platform and join forces with other comparative surveys. The SERISS project was certainly an example in this direction, but a substantive research program, mixing quantitative and qualitative approaches in order to better understand what is involved in comparative survey methods, has still to be developed.
Barton, A. H. (2011). Nonattitude. In P. J. Lavrakas (Ed.), Encyclopedia of Survey Research Methods (pp. 512-513). Sage.
Behr, D., Braun, M., Kaczmirek, L. & Bandilla, W. (2014). Item comparability in cross-national surveys. Results from asking probing questions in cross-national web surveys about attitudes towards civil disobedience. Quality & Quantity, 48, 127-148.
Behr, D. & Shishido, K. (2016). The Translation of Measurement Instruments for Cross-cultural Surveys. In C. Wolf, D. Joye, T. W. Smith & Y. Fu (Eds.), The Sage Handbook of Survey Methodology (pp. 269-287). Sage.
Clogg, C. C. (1984). Some statistical models for analyzing why surveys disagree. In C. F. Turner & E. Martin (Eds.), Surveying Subjective Phenomena (Vol. 2, pp. 319-367). Russell Sage Foundation.
Converse, P. E. (1974). Nonattitudes and American Public Opinion: Comment: The Status of Nonattitudes. The American Political Science Review, 68, 650-660.
Dillman, D. A., Smyth, J. D. & Christian, L. M. (2014). Internet, Phone, Mail and Mixed-Mode Surveys: The Tailored Design Method (4th edition). John Wiley.
de Graaf, P., & Halman, L. (2013). An early example in the international survey landscape: The European Values Study. In B. Kleiner, I. Renschler, B. Wernli, P. Farago, & D. Joye (Eds.), Understanding Research Infrastructures in the Social Sciences (pp. 114-122). Seismo Press.
Groves, R. M., Presser, S. & Dipko, S. (2004). The Role of Topic Interest in Survey Participation Decisions. Public Opinion Quarterly, 68, 2-31.
Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E. & Tourangeau, R. (2004). Survey Methodology. Wiley.
Halman, L., & Moor, R. d. (1993). Comparative Research on Values. In P. Ester, L. Halman, & R. d. Moor (Eds.), The Individualizing Society. Value Change in Europe and North America (2nd ed., pp. 21-36). Tilburg University Press.
Halman, L., Luijkx, R., & van Zundert, M. (2005). Atlas of European Values. Tilburg University.
Hirsch Hadorn, G., Hoffmann-Riem, H., Biber-Klemm, S., Grossenbacher-Mansuy, W., Joye, D., Pohl, C., Wiesmann, U., Zemp, E. (Eds.) (2008) Handbook of Transdisciplinary Research, Springer.
Joye, D., Ernst Stähli, M., Pollien, A. & Sapin, M. (2012). Test, Retest and Translation [Conference presentation] CDSI 2012 Meeting, Washington D.C., United States.
Joye, D., Sapin, M. & Wolf, C. (2019a). Measuring Social Networks and Social Resources: An Exploratory ISSP Survey around the World (Schriftenreihe 22). GESIS https://nbn-resolving.org/urn:nbn:de:0168-ssoar-62256-9.
Joye, D., Sapin, M. & Wolf, C. (2019b). Weights in Comparative Surveys? A Call for Opening the Black Box. Harmonization Newsletter, Fall 2019. https://www.asc.ohio-state.edu/dataharmonization/harmonization-newsletter-fall-2019, accessed 2021/01/02.
Joye, D., Birkelund, G. E. & Lemel, Y. (2019). Traveling with Albert Gifi: Nominal, Ordinal and Interval Approaches in Comparative Studies of Social and Cultural Spaces. In J. Blasius, F. Lebaron, B. Le Roux & A. Schmitz (Eds.), Empirical Investigations of Social Space (pp. 393-410). Springer.
Luijkx, R., Jónsdóttir, G. A., Gummer, T., Ernst Stähli, M., Frederiksen, M., Ketola, K., Reeskens, T., Brislinger, E., Christmann, P., Gunnarsson, S. Þ., Hjaltason, Á. B., Joye, D., Lomazzi, V., Maineri, A. M., Milbert, P., Ochsner, M., Pollien, A., Sapin, M., Solanes, I., Verhoeven, S. & Wolf, C. (2020). The European Values Study 2017: On the Way to the Future Using Mixed-Modes. European Sociological Review, 37, 330-346.
Lyberg, L. & Weisberg, H. F. (2016). Total survey error: a paradigm for survey methodology. In C. Wolf, D. Joye, T. W. Smith & Y. Fu (Eds.), The Sage Handbook of Survey Methodology (pp. 27-40). Sage.
Lynn, P. (2003). Developing quality standards for cross-national survey research: five approaches. Int. J. Social Research Methodology, 6, 323-336.
Miller, K., Fitzgerald, R., Padilla, J.-L., Willson, S., Widdop, S., Caspar, R., Dimov, M., Gray, M., Nunes, C., Prüfer, P., Schöbi, N. & Schoua-Glusberg, A. (2011). Design and Analysis of Cognitive Interviews for Comparative Multinational Testing. Field Methods, 23, 379-396.
Mohler, P. P., Smith, T. W. & Harkness, J. (1998). Respondents’ ratings of expressions from response scales: A two-country, two-language investigation on equivalence and translation. ZUMA-Nachrichten Spezial, January, 159-184.
Ortmanns, V. & Schneider, S. (2016). Harmonization still failing? Inconsistency of education variables in cross-national public opinion surveys. International Journal of Public Opinion Research, 28, 562-582.
Schneider, S. (2011). Nominal comparability is not enough: (In-)equivalence of construct validity of cross-national measures of educational attainment in the European Social Survey. Research in Social Stratification and Mobility, 28, 343-357.
Schneider, S., Joye, D. & Wolf, C. (2016). When Translation is not Enough: Background Variables in Comparative Survey. In C. Wolf, D. Joye, T. W. Smith & Y. Fu (Eds.), The Sage Handbook of Survey Methodology (pp. 288-307). Sage.
Smith, T. W. (1984). Nonattitudes: A Review and Evaluation. In C. F. Turner & E. Martin (Eds.), Surveying Subjective Phenomena (pp. 215-256). Russell Sage Foundation.