Types of Data

The data will be spreadsheets of contaminant concentrations and companion descriptor variables (location, location descriptors, timestamps, etc.), saved as .csv files. Data will be sourced from existing online datasets and processed in R, with collaboration through GitHub.

Data and Metadata Standards

Metadata will be saved in a corresponding .txt file (e.g., EUSO2.csv has a companion EUSO2.txt metadata file). This file will describe the data source, the variables and their units, and the date on which the data in the .csv file were downloaded.
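As an illustration of the .csv plus companion .txt convention, here is a minimal base R sketch. The helper name `save_with_metadata()` and the layout of the metadata file are hypothetical, not part of the project code; the project may record the same fields differently.

```r
# Hypothetical helper (not part of the project code): write a data frame to
# <name>.csv and a companion <name>.txt recording the data source, the date
# of download, and one "variable: unit" line per column.
save_with_metadata <- function(df, name, source, units,
                               downloaded = Sys.Date(), dir = ".") {
  csv_path <- file.path(dir, paste0(name, ".csv"))
  txt_path <- file.path(dir, paste0(name, ".txt"))

  # Data: plain CSV without R's row names
  write.csv(df, csv_path, row.names = FALSE)

  # Metadata: `units` is a named character vector keyed by column name;
  # columns missing from it are recorded with unit "NA"
  var_lines <- paste0("  ", names(df), ": ", units[names(df)])
  writeLines(c(
    paste("Data file:", basename(csv_path)),
    paste("Source:", source),
    paste("Date of download:", format(downloaded)),
    "Variables (units):",
    var_lines
  ), txt_path)

  invisible(c(csv = csv_path, txt = txt_path))
}
```

Keeping the units in a named vector means the metadata lines are generated from the actual column names, so the .txt file cannot silently list a variable the .csv does not contain.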

Policies for Access and Sharing

All data will be readily available through the @pollute GitHub repository. Forking the repository gives access to the data and the associated processing code. R is required to view and process the data this way. Foreseeable users include instructors and other students in the Environmental Informatics course.

Policies and Provisions for Re-Use, Re-Distribution

Only existing data will be used, so there will be no delay in data availability. Our processing of the data will be live on GitHub for anyone to fork. No privacy restrictions will be placed on the data.

Plans for Archiving and Preservation of Access

Data, metadata, and processing code will be preserved on the @pollute GitHub repository at least until the end of the Winter 2016 quarter, and possibly for longer.

Data Question

Here we examine the question: how many monitoring stations in each city in Great Britain and Germany have air pollution concentrations of at least 10 micrograms per cubic meter (µg/m³)?

Cities in Great Britain with air pollution >= 10 µg/m³
City              Stations exceeding 10 µg/m³   Mean of exceeding stations (µg/m³)
Barnsley          1                             17.70
London            3                             13.13
Leeds             1                             12.60

Cities in Germany with air pollution >= 10 µg/m³
City              Stations exceeding 10 µg/m³   Mean of exceeding stations (µg/m³)
Dresden           1                             17.6
Görlitz           1                             17.1
Hamburg           3                             16.2
Ingolstadt        1                             15.4
Duisburg          1                             13.8
Bremen            1                             12.1
Essen             1                             11.2
Augsburg          1                             11.1
München           1                             11.0
Frankfurt (Oder)  1                             10.4
Nürnberg          1                             10.3
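Tables of this shape can be produced in one pass for several countries. The base R sketch below is an illustration under stated assumptions: one row per station, hypothetical column names `country_iso_code`, `city_name`, and `conc`, and the country codes supplied by the caller (whether Great Britain is coded "GB" or otherwise in the source data is not confirmed here). It uses `>=` to match the question; NumCities itself uses a strict `>`.

```r
# Sketch: for each country and city, count stations at or above `threshold`
# and average their concentrations. Assumes one row per station with
# hypothetical columns `country_iso_code`, `city_name`, and `conc` (µg/m³).
exceedance_table <- function(df, countries, threshold = 10) {
  keep <- df$country_iso_code %in% countries &
    !is.na(df$conc) & df$conc >= threshold
  hits <- df[keep, , drop = FALSE]
  if (nrow(hits) == 0) return(NULL)  # aggregate() cannot handle zero rows

  by_city <- conc ~ country_iso_code + city_name
  counts <- aggregate(by_city, data = hits, FUN = length)
  means  <- aggregate(by_city, data = hits, FUN = mean)

  # Join on the grouping columns; the two `conc` columns become
  # the station count and the mean concentration
  out <- merge(counts, means, by = c("country_iso_code", "city_name"))
  names(out) <- c("country_iso_code", "city_name", "stations", "mean")

  # Within each country, list the highest mean first
  out <- out[order(out$country_iso_code, -out$mean), ]
  rownames(out) <- NULL
  out
}
```

A call such as `exceedance_table(stations, c("GB", "DE"), 10)` would then return both countries' rows, where `stations` stands for a hypothetical station-level data frame with the columns above.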

NT suggests answering the “what is the difference” question more directly with a complementary bar graph of the ‘urban average’ versus the ‘suburban average’.
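One way NT's suggested graph could be drawn is sketched below in base R. It assumes a hypothetical `area_type` column with values such as "urban" and "suburban" plus a `conc` column; the real dataset may encode station area type under a different name or with different labels.

```r
# Hypothetical sketch of NT's suggestion: mean concentration of urban vs
# suburban stations, drawn as a bar graph. Assumes columns `area_type`
# (values such as "urban", "suburban") and `conc` (µg/m³).
area_means_barplot <- function(df, areas = c("urban", "suburban"), plot = TRUE) {
  keep <- df$area_type %in% areas & !is.na(df$conc)
  # One mean per requested area type, named and ordered as in `areas`
  means <- vapply(areas,
                  function(a) mean(df$conc[keep & df$area_type == a]),
                  numeric(1))
  if (plot) {
    barplot(means, ylab = "Mean concentration (ug/m3)",
            main = "Urban vs suburban station average")
  }
  means
}
```

Returning the means as well as plotting them lets the same numbers be quoted in the text next to the graph.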

R Package Project

We are currently developing two functions for processing EU pollution data.

NumCities: For a given country, pollutant, and threshold pollution level, list the cities with monitoring stations above the threshold, together with the number of such stations and their mean concentration in each city.

#' Air pollution in EU cities (2013)
#'
#' For a given country and pollutant, this function lists the cities that have
#'  monitoring stations above a threshold concentration (micrograms/cubic meter),
#'  with the number of such stations and their mean concentration.
#'
#' @author Mitchell, Steph, Elise
#' @param COUNTRY Country of interest, as the 2-letter code used in the EUSO2 dataset
#' @param LEVEL Threshold concentration of the pollutant (ug/m^3); must be non-negative
#' @param POLLUTION Pollutant type: one of 'SO2', 'PM10', 'PM2.5', 'NO2', 'O3'
#' @return A data frame with one row per city in COUNTRY that has at least one
#'   station above LEVEL: city_name, n (number of stations above LEVEL) and
#'   mean (mean concentration of those stations), sorted by decreasing mean
#' @examples
#' COUNTRY = 'DE'
#' LEVEL = 10
#' POLLUTION = 'SO2'
#' NumCities(COUNTRY, LEVEL, POLLUTION)
#' @export

NumCities=function(COUNTRY='AT',LEVEL=6,POLLUTION='SO2'){

  if(LEVEL<0){stop("Pollution levels cannot be negative!")}

  # select the worksheet for the desired pollutant type
  POLLUTANTS=c('SO2','PM10','PM2.5','NO2','O3')
  SHEET=match(POLLUTION, POLLUTANTS)
  if(is.na(SHEET)){stop("POLLUTION must be one of: ", paste(POLLUTANTS, collapse=', '))}

  library(dplyr)
  library(readxl)

  EUDATA=read_excel('data/EUSO22013.xlsx', sheet=SHEET)

  # keep stations in COUNTRY above LEVEL, then count the stations and
  # average their concentrations for each city
  OUTPUT=EUDATA %>%
    select(country_iso_code, city_name, `µg_m3`) %>%
    filter(country_iso_code == COUNTRY, `µg_m3` > LEVEL) %>%
    group_by(city_name) %>%
    summarize(n = n(), mean = mean(`µg_m3`)) %>%
    arrange(desc(mean))
  return(as.data.frame(OUTPUT))

}

library(testthat)

# Automated Test 1: If all ug_m3 observations are below LEVEL, no cities should be returned.
expect_equal(nrow(NumCities('AT', 50, 'SO2')), 0)

# Automated Test 2: If all ug_m3 observations are above LEVEL, all cities (6 for AT) should be returned.
expect_equal(nrow(NumCities('AT', 0, 'SO2')), 6)

# Automated Test 3: AT should return 6 cities at LEVEL 0, whatever the pollutant type.
for (POLLUTION in c('SO2', 'PM10', 'PM2.5', 'NO2', 'O3')) {
  expect_equal(nrow(NumCities('AT', 0, POLLUTION)), 6)
}
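Tests 2 and 3 depend on the workbook containing exactly 6 Austrian cities. One option is to separate the summarizing step from `read_excel()` so it can be checked against a small hand-built table. The base R sketch below assumes a hypothetical pure helper `summarize_exceedances()` and renames the concentration column to `conc`; it mirrors NumCities' strict `>` comparison but is not the project's implementation.

```r
# Hypothetical pure helper: the summarizing step of NumCities without the
# file read, so it can be tested on a small hand-built table. Assumes the
# concentration column is called `conc` instead of `µg_m3`.
summarize_exceedances <- function(df, country, level) {
  if (level < 0) stop("Pollution levels cannot be negative!")
  # which() drops rows where the comparison is NA
  rows <- which(df$country_iso_code == country & df$conc > level)
  hits <- df[rows, , drop = FALSE]
  cities <- sort(unique(hits$city_name))
  out <- data.frame(
    city_name = cities,
    n = vapply(cities, function(cty) sum(hits$city_name == cty),
               integer(1), USE.NAMES = FALSE),
    mean = vapply(cities, function(cty) mean(hits$conc[hits$city_name == cty]),
                  numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  out <- out[order(-out$mean), , drop = FALSE]  # highest mean first
  rownames(out) <- NULL
  out
}
```

With a toy table whose answer is known by construction, the "no cities", "all cities", and strict-threshold cases can each be asserted without relying on counts in the real data.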

LvlCities: Compute the mean concentration of each of the five pollutants for a given country.

# TODO: LvlCities still needs tests
#' Mean air pollution by pollutant in an EU country (2013)
#'
#' This function computes the mean concentration of each of five pollutants
#'  (SO2, PM10, PM2.5, NO2, O3) across all stations in a given country.
#'
#' @author Mitchell, Steph, Elise
#' @param COUNTRY Country of interest, as the 2-letter code used in the EUSO2 dataset
#' @return A named numeric vector (SO2, PM10, PM2.5, NO2, O3) of mean
#'   concentrations in ug/m^3
#' @examples
#' COUNTRY = 'DE'
#' LvlCities(COUNTRY)
#' @export

LvlCities=function(COUNTRY='AT'){

  library(dplyr)
  library(readxl)

  POLLUTANTS=c('SO2', 'PM10', 'PM2.5', 'NO2', 'O3')
  OUTPUT=numeric(length(POLLUTANTS))

  # worksheet n of the workbook holds the nth pollutant in POLLUTANTS
  for(n in seq_along(POLLUTANTS)) {

    EUDATA=read_excel('data/EUSO22013.xlsx', sheet=n)

    # mean concentration over all stations in COUNTRY, as a single number
    OUTPUT[n]=EUDATA %>%
      filter(country_iso_code == COUNTRY) %>%
      summarize(m = mean(`µg_m3`)) %>%
      pull(m)
  }

  names(OUTPUT) = POLLUTANTS
  return(OUTPUT)

}

# library(testthat)
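LvlCities is still marked as needing tests. Tests are easier to write if the averaging step is separated from reading the workbook. The base R sketch below assumes a hypothetical helper `country_means()` that takes a named list of per-pollutant data frames (for example, the five sheets after reading) with `country_iso_code` and `conc` columns; the column rename is an assumption.

```r
# Hypothetical pure helper mirroring LvlCities: `sheets` is a named list of
# data frames, one per pollutant, each with `country_iso_code` and `conc`.
# Returns one mean per pollutant, named after the list elements.
country_means <- function(sheets, country) {
  vapply(sheets, function(df) {
    # mean of an empty selection is NaN, flagging a country with no stations
    mean(df$conc[df$country_iso_code == country])
  }, numeric(1))
}
```

Tests for LvlCities could then check that the result is named by pollutant, that each value equals a hand-computed mean, and that a country absent from a sheet yields `NaN` rather than an error.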