This assignment is designed to test your ability to gather, aggregate, evaluate, and describe
data. It also tests your ability to distinguish between forms of
data visualization. Please save your answers to these questions as one .pdf file (use the
Save As function in most word processors). Be sure to include your name, the assignment
number, and a separate file with your data. Submit all files to Canvas by the due date. If you
are working in a group (3 people max), turn in only one assignment with all names on it.
Part I: Data Scavenger Hunt!
Your assignment is to answer the following questions by finding the appropriate datasets online
and examining them. Note that some datasets do not currently exist and must be compiled using
Outwit Hub (specifically, you will need to use Outwit Hub for the Ohio State Football question).
CIA Factbook. Gather data from the CIA World Factbook on one variable of interest for all
nations (such as GDP, population, etc.), and save it as a .csv file. Describe the data you
have gathered: why did you choose this variable? Where does the U.S. lie in comparison
to other states with regard to this variable? Save your data as cia.csv and submit with your
assignment.
Policy Agendas Project. Visit the Policy Agendas Project website (https://www.comparativeagendas.net)
and find the data associated with policy change in the United States. Download the data
on TV News Policy Agenda. Make sure to also download and open the corresponding
Codebook. How many news stories are included in this dataset? What proportion of the
stories are about Education? What proportion are about Campaigns? Save your data
as tvnews.csv and submit with your assignment.
Cities in California. What is the largest city (defined as any place with a population of 6,000
or more) in California? The smallest? Find data on the population size of all cities in
California (Hint: visit city-data.com). Save this data as a .csv file (calicities.csv) and submit
with your assignment.
Ohio State Football. Go to http://www.sports-reference.com/cfb/schools/ohio-state/,
where you will find links to the details of every Ohio State football season since 1940. Using
Outwit Hub, collect data on three variables (year, average points per game scored by
OSU, and average points per game scored by OSU's opponents) from each year's page.
Save the data you collect as a .csv file (osufootball.csv) and submit with your assignment.
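For questions that ask for a proportion, such as the share of TV news stories about Education, the arithmetic is the number of matching rows divided by the total number of rows. The Python sketch below illustrates that calculation on a tiny made-up sample. The column name topic, the text labels, and the helper names category_share and load_rows are assumptions for illustration only, not the layout of the actual tvnews.csv file; check the Codebook to see how topics are really coded, since the real file may use numeric codes.

```python
import csv
import io


def category_share(rows, column, value):
    """Return (matches, total, proportion) for rows whose `column` equals `value`."""
    total = 0
    matches = 0
    for row in rows:
        total += 1
        if row[column].strip() == value:
            matches += 1
    # Guard against an empty file so we never divide by zero.
    proportion = matches / total if total else 0.0
    return matches, total, proportion


def load_rows(text):
    """Parse CSV text (with a header row) into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample standing in for a downloaded file; this is NOT real data,
# and the column name "topic" is a hypothetical placeholder.
SAMPLE = """story_id,topic
1,Education
2,Campaigns
3,Defense
4,Education
"""

rows = load_rows(SAMPLE)
matches, total, share = category_share(rows, "topic", "Education")
print(f"{matches} of {total} stories ({share:.1%})")
```

To try this on a real download, replace SAMPLE with the contents of your own file and substitute the column name and topic code that the Codebook specifies.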
Part II: Exploratory Data Analysis
This is an exercise in exploratory data analysis. Don't worry too much about finding The Right
Answer, because there isn't one (or rather, there is more than one). Instead, focus on the process
of exploring the data.
1. In his bestselling book The Better Angels of Our Nature, psychologist Steven Pinker argues that
human empathy has been steadily growing over the last two centuries. As a result, Pinker
argues, we have seen decreases in all forms of violent human behavior, from warfare to
crime to interpersonal violence to corporal punishment for children. It occurs to you that,
if Pinker is correct, we might see changes in literature as well: less violent people should use
less violent language and be less inclined to write about violent situations, or perhaps to
write about them more as they become more objectionable. First, draw up a list of words
or phrases that you would expect to see either increasing or decreasing in frequency over
time and note your expectation for each. Then use Google's Ngram Viewer to see whether
those expectations are met. All in all, does our use of language over the past two centuries
support Pinker's thesis? Submit your list of words and phrases, screenshots of your graphs,
and a one-paragraph summary of your conclusions.
2. Use Google's Public Data Explorer to find data on the 2014 Ebola outbreak in Africa
(Ebola data). Which countries were most severely hit? Which had dangerous outbreaks
that they managed to contain? How can you tell? Explain in one paragraph and use the
link icon in the upper-right corner of the screen to generate hyperlinks to any relevant
graphs. Still using Public Data Explorer, look at regional data within one of the worst-hit
countries, both over time and across space (on a map). Can you ascertain where the
outbreak started and how it spread? Describe the spread of the disease in 1-3 paragraphs,
including links to any relevant graphs.
3. Now think about what might account for differing murder rates across countries. Go
to Gapminder World, select the Y (vertical) axis, and select Murder per 100,000 people
under the Society category. What do you think would correlate with murder rate, internationally?
Scan the available data on the X axis and try out a few different variables.
Explore the relationships between those variables and murder rates over time. Which
relationship looks most interesting to you? Why do you think you might be seeing those
patterns in the data? Use the Share Graph button to create a short URL to your graph
and write 2-3 paragraphs describing the patterns you see in the data and speculating about what might have produced them.
Part III: Women and Conflict
You have been contacted by an organization that wants to understand the impact of subnational
and international conflict on the career prospects of women. They want you to compare educational
outcomes for women in high-, medium-, and low-income countries to those in countries
that are experiencing conflict.
Go to the World Bank databank. Find data on the percentage of women who complete
primary education in high-income (OECD), medium-income, and low-income countries as well
as for those in fragile and conflict-affected situations (Hint: the variable on women's education
completion rates is under Gender Statistics). Create a single line chart with survival rate to
the last year of primary education on the Y axis, year on the X axis, and four lines, one for each
category of country. Make the graph as attractive as possible, then save it as a JPEG or PDF.
Part IV: Downloading R and RStudio
Next week, we will begin using R; this is the primary program we will use for the remainder of
the semester. If you plan on using the computers in the lab in Hagerty, then there is no need
to do this part of the assignment. However, if you plan to use your personal computer to do
assignments, you should download and install R and RStudio on your computer.
Both are free and easy to access.
In order to do this, you need to first download R: https://cran.r-project.org/. In the
online lectures, Dr. Braumoeller uses R by itself. You are of course welcome to do the same;
however, I recommend using RStudio. RStudio is an interface that runs R but is slightly more
user-friendly. If you choose to use RStudio, you will never actually open R (though you still need
to install it first); you will simply access it through the RStudio program.
To download RStudio: https://www.rstudio.com/products/rstudio/download/