June 14, 2018
I never thought that I’d be programming software in my career. I started using R a little over 2 years ago and it’s been one of the most important decisions of my career. Secluded in a small academic office with no one to discuss my new hobby with, I started searching the web for tutorials and packages. After getting to know how amazing and nurturing the R community is, I wanted to become a data scientist. So I set out to do it. Throughout the journey I repeatedly found myself using the European Social Survey (ESS from now on), a really neat dataset that has collected information on the attitudes, beliefs and behaviour patterns of diverse populations in more than thirty European nations since 2002.
After seeing a niche in the R package community, I created the package
essurvey (originally ess in CRAN) to access this data easily from R.
The package was accepted in CRAN in September 2017 and was well received
among social scientists.
On 4 March 2018 I submitted the package to rOpenSci, intimidated but very excited about the peer review process. To my surprise, the process was enriching, respectful and transparent, unlike my previous experience in academic research.
Using the essurvey package is fairly easy. There are two main
families of functions: import_* and show_*. They complement
each other and allow the user to almost never have to go to the ESS website.
The only time you need to visit it is to register a new account. If you haven’t registered,
create an account at https://www.europeansocialsurvey.org/user/new and
validate through your email inbox.
You can install the development version with
# install.packages("devtools")
devtools::install_github("ropensci/essurvey")
or the stable version from CRAN with
install.packages("essurvey")
Let’s load the CRAN version together with the tidyverse
set of packages to do some data manipulation.
library(essurvey)
library(tidyverse)
Given that some essurvey functions require your email address, this
vignette will use a fake email, but everything should work as expected if
you have registered with the ESS. We can set our email as an environment
variable with set_email.
set_email("your@email.com")
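Environment variables in R are plain session-level key/value pairs, which is why the email only needs to be set once per session. Here is a minimal base-R sketch of the idea, using a hypothetical variable name (`DEMO_ESS_EMAIL` is an assumption for illustration, not necessarily the name the package uses internally):

```r
# Store a value for the current R session only (hypothetical variable name).
Sys.setenv(DEMO_ESS_EMAIL = "your@email.com")

# Any function called later in the session can read it back.
demo_email <- Sys.getenv("DEMO_ESS_EMAIL")
demo_email

# An unset variable comes back as an empty string, which is easy to check for.
email_is_set <- nzchar(Sys.getenv("DEMO_ESS_EMAIL"))
email_is_set
```

The value disappears when the session ends, so it has to be set again in a fresh session.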
Before we continue, let’s briefly explain some terminology of the ESS data. Surveys are carried out every 2 years and each survey is called a round or wave. For example, round one was the first round ever implemented, which dates back to 2002. The second round followed in 2004 and is usually referred to as the second round or second wave. There are currently eight rounds freely available.
The ESS has over a thousand questions that cover interesting topics such as attitudes towards national governments, democracy, immigration, nationalism and public policy, as well as demographic and subjective health data on the participants of the survey. Most of these questions use Likert-type scales, which means that the possible answers to any given question range either from 0 through 5 or 0 through 10. For example, a typical question would be something like: how satisfied are you with democracy in your country? The possible answers range from 0 to 10, where 0 means very unsatisfied and 10 very satisfied.
Let’s suppose you don’t know which countries or rounds (waves) are
available for the ESS. Then the
show_* family of functions is your friend.
To find out which countries have participated you can use
show_countries()
##  [1] "Albania"            "Austria"            "Belgium"
##  [4] "Bulgaria"           "Croatia"            "Cyprus"
##  [7] "Czech Republic"     "Denmark"            "Estonia"
## [10] "Finland"            "France"             "Germany"
## [13] "Greece"             "Hungary"            "Iceland"
## [16] "Ireland"            "Israel"             "Italy"
## [19] "Kosovo"             "Latvia"             "Lithuania"
## [22] "Luxembourg"         "Netherlands"        "Norway"
## [25] "Poland"             "Portugal"           "Romania"
## [28] "Russian Federation" "Slovakia"           "Slovenia"
## [31] "Spain"              "Sweden"             "Switzerland"
## [34] "Turkey"             "Ukraine"            "United Kingdom"
This function actually looks up the countries on the ESS website. If new
countries join, it will automatically pick them up as well.
Let’s check out Spain. How many rounds has Spain participated in? We can
use show_country_rounds to find out:
sp_rnds <- show_country_rounds("Spain")
sp_rnds
## [1] 1 2 3 4 5 6 7
Note that country names are case sensitive. Use the exact name printed
by show_countries().
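Because the match is exact, a small check before downloading can save a failed request. The sketch below uses a hard-coded subset of the names printed above and a hypothetical helper, `match_country()`, which is not part of essurvey. It falls back to a case-insensitive lookup to suggest the correctly capitalised name.

```r
# A few of the names printed by show_countries() above (hard-coded here so the
# example runs offline).
known_countries <- c("France", "Spain", "Turkey", "United Kingdom")

# Hypothetical helper: return the correctly capitalised name, or NA if the
# country is not in the list at all.
match_country <- function(x, choices) {
  idx <- match(tolower(x), tolower(choices))
  choices[idx]
}

"spain" %in% known_countries                 # FALSE: the exact match fails
match_country("spain", known_countries)      # "Spain"
match_country("Atlantis", known_countries)   # NA: not a known country
```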
Using this information, we can download those specific rounds easily
with import_country:
spain <-
  import_country(
    country = "Spain",
    rounds = 1:7
  )
Note: as of essurvey 1.0.0, all ess_* functions have been deprecated
in favour of the import_* and download_* functions.
spain will now be a list of length(rounds) containing a data frame
for each round. The import_* family is concerned with downloading the
data and thus always returns a list containing data frames unless only
one round is specified, in which case it returns a tibble. Conversely, the
show_* family grabs information from the ESS website and always
returns a vector.
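Since the return shape depends on how many rounds were requested, code that loops over rounds has to handle both cases. One way to do that is sketched below, with toy data frames standing in for downloaded rounds (`as_round_list()` is a hypothetical helper, not a function from essurvey):

```r
# Toy stand-ins for downloaded rounds.
round1 <- data.frame(name = "ESS1", stfgov = c(0, 5, 7))
round2 <- data.frame(name = "ESS2", stfgov = c(3, 4))

# Hypothetical helper: wrap a single data frame in a list so that the result
# can always be iterated over round by round.
as_round_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

one_round   <- as_round_list(round1)                # single data frame in
many_rounds <- as_round_list(list(round1, round2))  # list in, list out

length(one_round)
length(many_rounds)

# Either way, the same row-binding code now works.
combined <- do.call(rbind, many_rounds)
nrow(combined)
```

A tibble also passes the `is.data.frame()` check, so the same helper should work on the objects described above.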
To download all rounds for a country automatically you can use import_all_cntrounds:
import_all_cntrounds("Spain")
ESS datasets flag missing values differently between questions.
For example, questions with possible answers ranging up to 5 have missing categories
such as “Don’t know” and “Refusal” coded with single-digit values such as 8 and
9. But for questions with possible answers ranging up to 10, missing values are coded
with two-digit values such as 88 and 99.
recode_missings accepts a tibble as its main argument and automatically returns a new
tibble with all missing values recoded as NA. You should check out
?recode_missings for more details and elaborate examples.
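To make the idea concrete, here is a minimal base-R sketch of this kind of recoding on a toy vector. It is not the package's implementation: `recode_codes()` is a hypothetical helper and the codes passed to it are assumed for illustration, so check the ESS documentation for the codes that apply to each variable.

```r
# Toy answers on a 0-10 scale, with 88 and 99 standing in for
# missing categories (assumed codes, for illustration only).
stfgov_raw <- c(0, 5, 88, 7, 99, 10)

# Hypothetical helper: replace every value listed in `codes` with NA.
recode_codes <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}

stfgov_clean <- recode_codes(stfgov_raw, codes = c(77, 88, 99))
stfgov_clean

# The mean is badly distorted if the codes are left in place.
mean(stfgov_raw)
mean(stfgov_clean, na.rm = TRUE)
```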
Note: I urge the reader not to recode these categories to missing without previously investigating the importance of these categories.
For example, let’s recode missing values in all Spanish waves, bind them
into one single
tibble and visualize how satisfied Spaniards are with
their government. First, let’s extract a cleaner dataset.
semi_cleaned <-
  spain %>%
  map(recode_missings) %>%
  bind_rows() %>%
  mutate(name = str_sub(name, end = 4)) %>%
  select(name, stfgov)

semi_cleaned
## # A tibble: 13,543 x 2
##    name  stfgov
##    <chr>  <dbl>
##  1 ESS1      0.
##  2 ESS1      0.
##  3 ESS1      5.
##  4 ESS1      5.
##  5 ESS1      4.
##  6 ESS1      7.
##  7 ESS1      2.
##  8 ESS1      2.
##  9 ESS1      3.
## 10 ESS1      3.
## # ... with 13,533 more rows
There we go. The scale of stfgov is between 0 and 10, where 0
means very unsatisfied with government and 10 very satisfied. Let’s
collapse that into smaller categories, calculate the percentage of
respondents within each category and visualize the change over time.
semi_cleaned %>%
  mutate(
    stfgov = case_when(
      stfgov <= 3 ~ "Low",
      between(stfgov, 4, 6) ~ "Mid",
      stfgov >= 7 ~ "High"
    ),
    stfgov = factor(stfgov, levels = c("Low", "Mid", "High"))
  ) %>%
  count(name, stfgov) %>%
  group_by(name) %>%
  mutate(perc = n / sum(n)) %>%
  ggplot(aes(name, perc, group = stfgov, colour = stfgov)) +
  geom_line() +
  theme_bw() +
  labs(x = "ESS rounds", y = "Satisfaction with Government (%)") +
  scale_colour_discrete(name = NULL)
Looks like Spaniards are increasingly unhappy with their government!
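The numbers behind the plot come from the `count()`, `group_by()` and `mutate()` steps, which compute the share of respondents in each category within each round. The same calculation is sketched here on a small made-up dataset, written in base R so the arithmetic is easy to follow (the values are invented, not ESS data, and the answers are assumed to be whole numbers):

```r
# Invented answers for two rounds, not real ESS data.
toy <- data.frame(
  name   = c("ESS1", "ESS1", "ESS1", "ESS1", "ESS2", "ESS2", "ESS2", "ESS2"),
  stfgov = c(1, 5, 8, 2, 0, 3, 2, 6)
)

# Collapse the 0-10 scale into three ordered categories:
# (-Inf, 3] is Low, (3, 6] is Mid, (6, Inf) is High.
toy$level <- cut(
  toy$stfgov,
  breaks = c(-Inf, 3, 6, Inf),
  labels = c("Low", "Mid", "High")
)

# Count respondents per round and category, then divide by the round total.
counts <- table(toy$name, toy$level)
perc   <- prop.table(counts, margin = 1)
perc
```

Each row of `perc` sums to 1, which is what `perc = n / sum(n)` produces inside each group.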
Besides import_country, we can use other functions to download
rounds containing all countries. To see which rounds are currently
available, use show_rounds:
show_rounds()
## [1] 1 2 3 4 5 6 7 8
show_rounds interactively looks up rounds
on the ESS website, so any future rounds will automatically be included.
To download selected rounds, you can use import_rounds:
selected_rounds <- import_rounds(1:7)
Alternatively, use import_all_rounds to download all available rounds.
all_rounds <- import_all_rounds()
To build on the previous example, we can compare two different countries
on their satisfaction with their governments. Let’s
map through each round,
select our columns of interest, filter for Spain and France, and bind
those data frames into one single tidy tibble.
semi_cleaned <-
  selected_rounds %>%
  map(~ select(.x, name, cntry, stfgov)) %>%
  map(~ filter(.x, cntry %in% c("ES", "FR"))) %>%
  bind_rows() %>%
  mutate(name = str_sub(name, end = 4)) %>%
  recode_missings()

semi_cleaned
## # A tibble: 26,524 x 3
##    name  cntry stfgov
##    <chr> <chr>  <dbl>
##  1 ESS1  ES        0.
##  2 ESS1  ES        0.
##  3 ESS1  ES        5.
##  4 ESS1  ES        5.
##  5 ESS1  ES        4.
##  6 ESS1  ES        7.
##  7 ESS1  ES        2.
##  8 ESS1  ES        2.
##  9 ESS1  ES        3.
## 10 ESS1  ES        3.
## # ... with 26,514 more rows
Now that we’ve got that down, let’s calculate the percentage of respondents in each category for every round/country combination and visualize the change over time.
semi_cleaned %>%
  mutate(
    stfgov = case_when(
      stfgov <= 3 ~ "Low",
      between(stfgov, 4, 6) ~ "Mid",
      stfgov >= 7 ~ "High"
    ),
    stfgov = factor(stfgov, levels = c("Low", "Mid", "High"))
  ) %>%
  count(name, cntry, stfgov) %>%
  group_by(name, cntry) %>%
  mutate(perc = n / sum(n)) %>%
  ggplot(aes(name, perc, group = stfgov, colour = stfgov)) +
  geom_line() +
  theme_bw() +
  labs(x = "ESS rounds", y = "Satisfaction with Government (%)") +
  scale_colour_discrete(name = NULL) +
  facet_wrap(~ cntry)
Spain and France follow very similar patterns although the changes are much steeper in Spain! A more elaborate analysis would perhaps be interested in finding out if this trend line is similar in other countries.
To finish off, all import_* functions have an equivalent download_*
function that allows the user to save the datasets in a specified folder.
For example, to save round two from Turkey in a folder called
./myfolder, we use:
download_country("Turkey", 2, output_dir = "./myfolder/")
By default it saves the data as
'stata' files. Alternatively you can choose another format, such as 'sas':
download_country("Turkey", 2, output_dir = "./myfolder/", format = 'sas')
This will save the data to ./myfolder/ESS_Turkey, and inside that
folder there will be an ESS2 folder that contains the data.
Be aware that when analyzing data from the ESS you should take into consideration the sampling design and weights of each country/wave. The survey package provides very good support for this. A useful example comes from the work of Anthony Damico and Daniel Oberski here. The examples in this post calculated percentages manually; appropriate statistical inference should take sampling and weights into account.
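As a rough illustration of why weights matter, the base-R sketch below compares unweighted and weighted shares on invented data. The weight column `w` is hypothetical and stands in for whichever weight the ESS documentation recommends for your analysis. For real inference, including standard errors, a dedicated tool such as the survey package is the better choice.

```r
# Invented respondents: a satisfaction category and a hypothetical weight.
toy <- data.frame(
  level = c("Low", "Low", "Mid", "High"),
  w     = c(0.5, 0.5, 2.0, 1.0)
)

# Unweighted share: every respondent counts once.
unweighted <- prop.table(table(toy$level))

# Weighted share: every respondent counts in proportion to their weight.
weighted_totals <- tapply(toy$w, toy$level, sum)
weighted <- weighted_totals / sum(weighted_totals)

round(unweighted, 2)
round(weighted, 2)
```

In this toy case "Low" is the largest group when unweighted, while "Mid" is the largest once weights are applied, so the choice can change the story a plot tells.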
The package was improved greatly thanks to the reviews of Thomas Leeper and Nujcharee Haswell and editorial skills of Maëlle Salmon. I am indebted to their work. I would also like to thank Wiebke Weber at the Research and Expertise Centre for Survey Methodology for giving support and feedback in the development of the package.
The official package repository is on GitHub here.
If you find any bugs or would like to request a feature, please file an issue there. The package
is still very young and will most likely evolve in the near future. If you’re interested in contributing
to the development of
essurvey, don’t hesitate to file a pull request, which I’ll gladly review.
One important feature that is still missing is the ability to download the associated weight
data for each country/wave. These files are called “SDDF” and can be found on the ESS website.
For example, the SDDF files for some countries in round 6 can be found here.