Data will be spreadsheets of contaminant concentrations and companion descriptor variables (location, location descriptors, timestamps, etc.). Data will be saved in .csv files. Data will be sourced from existing online datasets and processed in R, with collaboration through GitHub.
Metadata will be saved in a corresponding .txt file (e.g., EUSO2.csv has a companion EUSO2.txt metadata file). This file will describe the data source, the variables and their units, and the date of download for the data in the .csv file.
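A minimal sketch of how such a companion file could be written in R at download time. The helper name `write_metadata`, its arguments, and the field layout are assumptions made for illustration; they are not part of the project code.

```r
# Hypothetical helper: write a companion .txt metadata file next to a .csv.
# The function name and field layout are illustrative assumptions.
write_metadata <- function(csv_path, source, variables, downloaded = Sys.Date()) {
  # EUSO2.csv -> EUSO2.txt in the same directory
  txt_path <- paste0(tools::file_path_sans_ext(csv_path), ".txt")
  lines <- c(
    paste("Data file:", basename(csv_path)),
    paste("Source:", source),
    paste("Date of download:", format(downloaded, "%Y-%m-%d")),
    "Variables (name: description, units):",
    paste0("  ", names(variables), ": ", variables)
  )
  writeLines(lines, txt_path)
  invisible(txt_path)
}

# Usage with placeholder values, written to a temporary directory
csv_path <- file.path(tempdir(), "EUSO2.csv")
meta_path <- write_metadata(
  csv_path,
  source = "placeholder: name or URL of the originating dataset",
  variables = c(city_name = "city of the monitoring station",
                ug_m3 = "pollutant concentration, ug/m^3")
)
```

Generating the file from code at download time keeps the recorded download date and variable list in step with the .csv it describes.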
All data will be readily available through the @pollute GitHub repository. Forking the repository will give access to the data and the associated processing code. R will be necessary to view and process the data in this manner. Foreseeable users include instructors and other students in the Environmental Informatics course.
Only existing data will be used, so there will be no delay in data availability. Our processing of the data will be live on GitHub for anyone to fork. No privacy limitations will be placed on the data.
Data, metadata, and processing code will be preserved on the @pollute GitHub repository at least until the end of the Winter 2016 quarter, and possibly for longer.
Here we examine the question: how many monitoring stations in each city in Great Britain and Germany have air pollution of at least 10 micrograms per cubic meter (ug/m3)?
Great Britain:

City | Stations Exceeding 10 ug/m3 | Mean of Exceeding Stations (ug/m3) |
---|---|---|
Barnsley | 1 | 17.70 |
London | 3 | 13.13 |
Leeds | 1 | 12.60 |

Germany:

City | Stations Exceeding 10 ug/m3 | Mean of Exceeding Stations (ug/m3) |
---|---|---|
Dresden | 1 | 17.6 |
Görlitz | 1 | 17.1 |
Hamburg | 3 | 16.2 |
Ingolstadt | 1 | 15.4 |
Duisburg | 1 | 13.8 |
Bremen | 1 | 12.1 |
Essen | 1 | 11.2 |
Augsburg | 1 | 11.1 |
München | 1 | 11.0 |
Frankfurt Oder | 1 | 10.4 |
Nürnberg | 1 | 10.3 |
NT suggests answering the “what is the difference” question more directly with a complementary ‘urban average’ vs. ‘suburban average’ bar graph.
We are currently developing two functions for processing EU pollution data.
NumCities: Find the number of cities that exceed a given pollution level for a given country, pollutant and threshold pollution level.
#' Air pollution in EU cities (2013)
#'
#' This function summarizes, for each city in a country, the number of monitoring
#' stations with air pollution levels in micrograms/cubic meter above a certain level.
#'
#' @author Mitchell, Steph, Elise
#' @param COUNTRY Country of interest, use code (2 letters) chosen from EUSO2 dataset
#' @param LEVEL Pollutant concentration (ug/m^3) used as threshold level
#' @param POLLUTION Pollution type (SO2, PM10, PM2.5, NO2, O3)
#' @return OUTPUT Data frame with one row per city that has stations above the threshold: station count (n) and mean concentration (mean)
#' @examples
#' COUNTRY = 'DE'
#' LEVEL = 10
#' POLLUTION = 'SO2'
#' NumCities(COUNTRY, LEVEL, POLLUTION)
#' @export
NumCities=function(COUNTRY='AT',LEVEL=6,POLLUTION='SO2'){
  library(dplyr)
  library(readxl)
  # fail early on an impossible threshold
  if(LEVEL<0){stop("Pollution Levels Cannot be Negative!")}
  # select the worksheet for the desired pollution type
  SHEET=switch(POLLUTION, 'SO2'=1, 'PM10'=2, 'PM2.5'=3, 'NO2'=4, 'O3'=5)
  if(is.null(SHEET)){stop("POLLUTION must be one of SO2, PM10, PM2.5, NO2, O3")}
  EUSO2=read_excel('data/EUSO22013.xlsx', sheet=SHEET)
  # count the exceeding stations in each city and average their concentrations
  OUTPUT=EUSO2 %>%
    select(country_iso_code, city_name, µg_m3) %>%
    filter(µg_m3 > LEVEL) %>%
    filter(country_iso_code == COUNTRY) %>%
    group_by(city_name) %>%
    summarize(n = n(), mean = mean(µg_m3)) %>%
    arrange(desc(mean))
  return(as.data.frame(OUTPUT))
}
library(testthat)
#Automated Test 1: If all ug_m3 observations < LEVEL, then output should have no rows.
expect_equal(nrow(NumCities('AT',50,'SO2')), 0)
#Automated Test 2: If all ug_m3 observations > LEVEL, then output should be all cities.
expect_equal(nrow(NumCities('AT',0,'SO2')), 6)
#Automated Test 3: Should have 6 values for AT no matter pollution type.
for(POLLUTION in c('SO2','PM10','PM2.5','NO2','O3')){
  expect_equal(nrow(NumCities('AT',0,POLLUTION)), 6)
}
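The counting logic in NumCities can also be tried without the spreadsheet. The sketch below reproduces the filter, group, summarize and sort steps in base R on a small made-up data frame; the helper name `exceed_summary`, the column name `ug_m3` (standing in for `µg_m3`), and all values are assumptions for illustration and are not taken from EUSO22013.xlsx.

```r
# Toy data for illustration only; values are made up, not from the real dataset.
toy <- data.frame(
  country_iso_code = c("XX", "XX", "XX", "XX", "YY"),
  city_name        = c("A", "A", "B", "C", "D"),
  ug_m3            = c(12, 16, 20, 4, 30),
  stringsAsFactors = FALSE
)

# Hypothetical base R equivalent of the filter / group_by / summarize / arrange steps
exceed_summary <- function(data, country, level) {
  # keep stations in the chosen country that are strictly above the threshold
  hits <- data[data$country_iso_code == country & data$ug_m3 > level, ]
  if (nrow(hits) == 0) {
    return(data.frame(city_name = character(0), n = integer(0), mean = numeric(0)))
  }
  out <- data.frame(city_name = sort(unique(hits$city_name)),
                    stringsAsFactors = FALSE)
  # per-city station count and mean concentration
  out$n    <- as.integer(tapply(hits$ug_m3, hits$city_name, length)[out$city_name])
  out$mean <- as.numeric(tapply(hits$ug_m3, hits$city_name, mean)[out$city_name])
  # highest mean first, matching arrange(desc(mean))
  out <- out[order(-out$mean), ]
  rownames(out) <- NULL
  out
}

result <- exceed_summary(toy, "XX", 10)
print(result)
```

With this toy input, city B (one station, mean 20) is listed before city A (two stations, mean 14), and city C is dropped because its only station is below the threshold.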
LvlCities: Summarize the average of each of the 5 pollutants for a given country.
# (Still needs tests)
#' Mean pollutant concentrations in EU cities (2013)
#'
#' This function summarizes the mean of multiple pollutants for a given country.
#'
#' @author Mitchell, Steph, Elise
#' @param COUNTRY Country of interest, use code (2 letters) chosen from EUSO2 dataset
#' @return OUTPUT Named numeric vector of mean concentrations (ug/m^3) for SO2, PM10, PM2.5, NO2 and O3
#' @examples
#' COUNTRY = 'DE'
#' LvlCities(COUNTRY)
#' @export
LvlCities=function(COUNTRY='AT'){
  library(dplyr)
  library(readxl)
  # sheet order follows the pollutant-to-sheet mapping used in NumCities
  POLLUTANTS=c('SO2', 'PM10', 'PM2.5', 'NO2', 'O3')
  OUTPUT=numeric(length(POLLUTANTS))
  for(n in seq_along(POLLUTANTS)) {
    EUSO2=read_excel('data/EUSO22013.xlsx', sheet=n)
    # mean concentration over all rows for the chosen country
    OUTPUT[n]=EUSO2 %>%
      filter(country_iso_code == COUNTRY) %>%
      summarize(mean = mean(µg_m3)) %>%
      pull(mean)
  }
  names(OUTPUT) = POLLUTANTS
  return(OUTPUT)
}
# library(testthat)