A Real-World Linked Open Data Story
We have been talking about the US Environmental Protection Agency’s forays into Linked Open Data for a while. Naturally, it takes time for a government agency to adopt both a new technology and a new way of engaging with the public. We are getting closer to an official launch.
In the meantime, I spoke with several old semantic friends while at the Semantic Technology & Business Conference in New York City last week and showed them some interesting things we can do with EPA Linked Data. Specifically, I would ask them for a ZIP code and then show them facilities of potential interest to the EPA within that location. Roughly 1% of those facilities provide detailed reports of pollution annually. The EPA Linked Data contains information about the facilities, the reports and the chemical substances.
Elisa Kendall had the most interesting ZIP code and a specific request. She lives near a particular cement plant and has long suspected that it released mercury into the air. She wondered what information we might have on it. I searched for ZIP code 95014 on our prototype search form and then sorted the “Pollution Data?” column twice (the first click sorted it in descending order, the second in ascending order) to bring the facilities with pollution reports to the top of the list. The entry for Hanson Permanente Cement appeared near the top, and Elisa confirmed it was the one she was looking for.
Within a few clicks, we could see that Hanson does in fact release mercury into the air stack. The levels seem to be significant to me, but I am neither an ecologist nor a biologist, so I’ll leave that to others to say. We were also able to see that the facility has released several other kinds of chemicals, including lead compounds, chromium compounds, dioxin compounds, nickel compounds, manganese compounds and hydrochloric acid.
The EPA is quick to point out that the companies that report pollution may be the “good guys”. EPA staff suggest, anecdotally, that the agency can only afford to track something like one one-thousandth of the pollution occurring in the United States. The system by which the companies provide information is voluntary and self-reported. However, that shouldn’t stop the public from making use of this information as it sees fit.
Another aspect of this is that the EPA has provided the raw data for the Toxics Release Inventory program for many years. Raw CSV files may be downloaded from the EPA and discovered via data.gov. The Linked Open Data version of the data provides at least two advantages: the data is structured for reuse, and tools like Callimachus make it easy to build applications on top of it.
Elisa asked me to plot the reported pollution levels over time, which I was able to do quickly. The SPARQL query looks like this:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX tri: <http://usepa.3roundstones.net/id/us/fed/agency/epa/tri/schema/>
PREFIX : <#>
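SELECT DISTINCT ?year ?pounds
WHERE {
    # The TRI facility that is owl:sameAs this EPA facility, and its reports
    ?tri_facility owl:sameAs <http://usepa.3roundstones.net/facilities/110000484039>
        ; tri:has_report ?report .
    # Each report names a chemical, a reporting year and a release location
    ?report tri:reports_release_of ?chem
        ; tri:reporting_year ?year
        ; tri:released_to ?location .
    # "$chemical" is filled in by Callimachus when the named query is called
    ?chem skos:prefLabel "$chemical" .
    # Keep only releases to the air stack, with their amounts in pounds
    ?location tri:amount_in_pounds ?pounds
        ; tri:environmental_medium <http://usepa.3roundstones.net/id/us/fed/agency/epa/tri/environmental_medium/AIR_STACK> .
}
ORDER BY ?year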
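The graph pattern in this query is essentially a chain of joins: facility to report, report to chemical, year and release location, and release location to amount and environmental medium. The minimal Python sketch below walks the same chain over an in-memory list of triples, only to make the pattern concrete. It is not how Callimachus or any SPARQL engine evaluates queries, and the shortened names (`tri:has_report`, `medium:AIR_STACK`) and helper functions are hypothetical stand-ins for the full URIs.

```python
# Illustrative only: triples are (subject, predicate, object) string tuples,
# and prefixed names such as "tri:has_report" stand in for full URIs.
Triple = tuple[str, str, str]

AIR_STACK = "medium:AIR_STACK"  # hypothetical short name for the AIR_STACK URI


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def air_releases(triples: list[Triple], facility_uri: str,
                 chemical_label: str) -> list[tuple[str, str]]:
    """Mirror the query's pattern: (year, pounds) pairs for air-stack releases."""
    rows: set[tuple[str, str]] = set()  # a set plays the role of SELECT DISTINCT
    # ?tri_facility owl:sameAs <facility_uri>
    facilities = [s for s, p, o in triples
                  if p == "owl:sameAs" and o == facility_uri]
    for facility in facilities:
        for report in objects(triples, facility, "tri:has_report"):
            # ?report tri:reports_release_of ?chem . ?chem skos:prefLabel "..."
            chems = objects(triples, report, "tri:reports_release_of")
            if not any(chemical_label in objects(triples, c, "skos:prefLabel")
                       for c in chems):
                continue
            for year in objects(triples, report, "tri:reporting_year"):
                for location in objects(triples, report, "tri:released_to"):
                    # Keep only locations whose medium is the air stack
                    if AIR_STACK not in objects(triples, location,
                                                "tri:environmental_medium"):
                        continue
                    for pounds in objects(triples, location, "tri:amount_in_pounds"):
                        rows.add((year, pounds))
    # Sorting the tuples orders by year first, like ORDER BY ?year
    return sorted(rows)
```

Ties on year are broken by amount here, whereas SPARQL leaves their order unspecified; otherwise the sketch returns the same rows the query pattern describes.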
You might note the use of a non-standard construct in the query: “$chemical”. This query was saved as a Callimachus “named query”, and placing a variable within a quoted literal (or a URI) allows the query to be called with parameters. In this case, the name of a chemical is appended to the query’s URL to get the results. The query definition is available here. Calling it this way with lead compounds as the chemical returns their results formatted in the JSON required by Google Charts.
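Callimachus performs both the parameter substitution and the Google Charts formatting itself. The Python sketch below only illustrates the two ideas under stated assumptions and is not its implementation. `fill_parameter` is a hypothetical helper that replaces a quoted `"$name"` placeholder with an escaped SPARQL string literal; escaping matters because the value arrives from a URL. `to_google_datatable` maps results in the standard SPARQL 1.1 JSON results format (`head.vars` and `results.bindings`) onto the DataTable JSON literal that Google Charts accepts (`cols`, plus `rows` whose cells are `{"v": ...}` objects). The column types are assumptions supplied by the caller.

```python
import json


def fill_parameter(template: str, name: str, value: str) -> str:
    """Replace a quoted "$name" placeholder with an escaped SPARQL string literal.

    Hypothetical helper: escaping backslashes, quotes and line breaks keeps a
    value taken from a URL from breaking out of the literal.
    """
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return template.replace(f'"${name}"', f'"{escaped}"')


def to_google_datatable(sparql_json: dict, types: dict[str, str]) -> str:
    """Convert SPARQL 1.1 JSON results into a Google Charts DataTable literal.

    `types` maps each result variable to a DataTable column type such as
    "string" or "number" (an assumption supplied by the caller).
    """
    variables = sparql_json["head"]["vars"]
    cols = [{"id": v, "label": v, "type": types[v]} for v in variables]
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        cells = []
        for v in variables:
            term = binding.get(v)  # unbound variables are simply absent
            if term is None:
                cells.append({"v": None})
            elif types[v] == "number":
                cells.append({"v": float(term["value"])})
            else:
                cells.append({"v": term["value"]})
        rows.append({"c": cells})
    return json.dumps({"cols": cols, "rows": rows})
```

For the query above, a call such as `to_google_datatable(results, {"year": "string", "pounds": "number"})` would produce one row per reporting year. Typing the year as a string keeps the chart's horizontal axis as discrete year labels rather than a continuous number scale.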
This page shows the plots over time of all the reported pollution that particular facility put into the air stack over the reporting years. We currently have data through 2010. The chart for mercury looks like this:
What will Elisa do with this information? I don’t know. That’s the beauty of it. She had a pre-existing concern about a particular facility and a suspicion that they were putting mercury into the air. Now she has access to specific data about what they are really releasing and how that has changed over time. She is closer to the truth. Perhaps she will be appalled at the results and seek changes in her community. Alternatively, she might research acceptable levels of pollution and the science of exposure and decide that her concerns are better directed elsewhere. I can’t judge at this time. Whichever way she proceeds, she will do so based on actual facts rather than rumor and suspicion. That must be a good thing.