Design and Development of Linked Data from The National Map

Editor(s): Krzysztof Janowicz, University of California, Santa Barbara, USA
Solicited review(s): Carsten Keßler, University of Münster, Germany; Rainer Simon, Austrian Institute of Technology, Vienna, Austria;
Claus Stadler, University of Leipzig, Germany
E. Lynn Usery a,* and Dalia Varanka a
a U.S. Geological Survey, 1400 Independence Road, Rolla, MO, USA
Abstract. The development of linked data on the World-Wide Web provides the opportunity for the U.S. Geological Survey
(USGS) to supply its extensive volumes of geospatial data, information, and knowledge in a machine interpretable form and
reach users and applications that heretofore have been unavailable. To pilot a process to take advantage of this opportunity, the
USGS is developing an ontology for The National Map and converting selected data from nine research test areas to a
Semantic Web format to support machine processing and linked data access. In a case study, the USGS has developed initial
methods for legacy vector and raster formatted geometry, attributes, and spatial relationships to be accessed in a linked data
environment maintaining the capability to generate graphic or image output from semantic queries. The description of an initial
USGS approach to developing ontology, linked data, and initial query capability from The National Map databases is
presented.
Keywords: Geospatial semantics, topographic data, The National Map, SPARQL Endpoint, geographic features
1. Introduction
The USGS is a primary supplier of geospatial and
environmental datasets that are used extensively in
mapping, planning, resource and land management,
emergency response, and many other applications. A
sampling of these public domain data is presented in
Table 1 with URLs for access. Use of these data
often requires combining one or more of these
datasets or combining these data with user-generated
data. Since the data exist in many different formats,
some proprietary, the integration or conflation of the
data for use in a specific application requires
significant data processing and manipulation by the
user. The National Map (Figure 1), which is the
21st-century topographic map for the USGS, is viewed as
a primary basis for these integration processes.
The Semantic Web offers an alternative approach
to data formatting, access, and integration for use in
applications [55]. By use of the standard triple
model of the Resource Description Framework
(RDF) of the Semantic Web [54], applications are
able to link to other data and to use and share data
effectively to answer queries and support specific
applications [20]. The USGS has begun exploring
the potential of the Semantic Web, particularly for
geospatial data access, integration, synthesis, and
use in applications. This paper provides a case study
description of that initial exploration with the
following three primary objectives:
* Corresponding author. E-mail: [email protected]
Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Table 1
Sample Datasets Managed by the U.S. Geological Survey
Dataset
National Hydrography
Dataset (NHD)
National Transportation
Dataset
National Boundaries
Dataset
National Structures
Dataset
Geographic Names
Information System
(GNIS)
National Elevation
Dataset (NED)
National Digital
Orthophotos
National Land Cover
Dataset (NLCD)
Global Land Cover
Dataset
LiDAR
Satellite images
Hazards
(Earthquakes,
Volcanoes)
Minerals
Energy
Landscapes and Coasts
Astrogeology
Geologic Map Database
Geologic Data
Digital Data Series
National Water
Information System
Floods and High Flow
Drought
Monthly Stream Flow
Ground Water
Water Quality
National Biological
Information Infrastructure (NBII)
Vegetation
Characterization
Wildlife
Invasive Species
Geometry/
Format
Vector
Attribution/
Scaling
Discrete/nominal
Vector; tables
Discrete/nominal
Vector
Discrete/nominal
http://viewer.nationalmap.gov/viewer/
http://gisdata.usgs.net/website/MRLC/viewer.htm
http://viewer.nationalmap.gov/viewer/
Vector
Discrete/nominal
http://viewer.nationalmap.gov/viewer/
Vector
Discrete/nominal
http://geonames.usgs.gov/domestic/download_data.htm
Raster
Continuous/ratio
Raster
Continuous/
interval
Raster
Discrete/nominal
Raster
Discrete/nominal
http://viewer.nationalmap.gov/viewer/
http://seamless.usgs.gov/website/seamless/viewer.htm
http://www.ndop.gov/data.html
http://viewer.nationalmap.gov/viewer/
http://gisdata.usgs.net/website/MRLC/viewer.htm
http://viewer.nationalmap.gov/viewer/
http://gisdata.usgs.net/website/MRLC/viewer.htm
http://landcover.usgs.gov/landcoverdata.php
Point
Raster
Graphics
Continuous/ratio
Continuous/
interval
Multiple forms
http://viewer.nationalmap.gov/viewer/
http://edcsns17.cr.usgs.gov/NewEarthExplorer/
http://glovis.usgs.gov/
http://earthquake.usgs.gov/hazards/
http://volcanoes.usgs.gov/activity/status.php
Vector; text
Discrete/nominal
Vector; graphics
databases;
Reports
Databases
Vector; maps; text
Maps; tables
Multiple forms
http://mrdata.usgs.gov/; http://tin.er.usgs.gov/mrds/
http://tin.er.usgs.gov/geochem/
http://crustal.usgs.gov/geophysics/index.html
http://energy.usgs.gov/search.html
Discrete/nominal
Discrete/nominal
Discrete/nominal
Discrete/nominal
http://geochange.er.usgs.gov/info/holdings.html
http://astrogeology.usgs.gov/DataAndInformation/
http://ngmdb.usgs.gov/
http://pubs.usgs.gov/dds/dds-060/
Graphics; charts;
tables
Graphics; charts;
tables
Graphics; tables
Graphics; tables
Vector; tables;
graphics;
Graphics
Continuous/ratio
http://wdr.water.usgs.gov/nwisgmap/
Continuous/ratio
http://waterwatch.usgs.gov/new/index.php?id=ww
Continuous/ratio
Continuous/ratio
Continuous/ratio
URL
http://viewer.nationalmap.gov/viewer/nhd.html?p=nhd
Graphics; vector;
geodatabases
Multiple forms
http://waterwatch.usgs.gov/new/index.php?id=ww
http://waterwatch.usgs.gov/new/index.php?id=ww
http://waterdata.usgs.gov/nwis/gw
http://groundwaterwatch.usgs.gov/
http://waterdata.usgs.gov/nwis/qw
http://waterwatch.usgs.gov/wqwatch/
http://www.nbii.gov/portal/server.pt/community/nbii_home/236
Vector; text ;
graphics;
databases; photos;
Vector; text;
graphics;
images; video
Vector; reports;
databases;
graphics, image
Multiple forms
http://biology.usgs.gov/npsveg/
Multiple forms
http://www.nwhc.usgs.gov/
Multiple forms
http://www.nbii.gov/portal/server.pt/community/invasive_species/221
Continuous/ratio
Fig. 1. Nationwide data-layers of The National Map.
− To present a USGS approach to building
semantics for topographic geospatial data
through the use of a taxonomy, ontology,
relations (particularly spatial), and data
formatting for semantic access, query, and
retrieval including geometry,
− To show an initial conversion of data to RDF to
provide interaction with the potential semantic
user community, and
− To provide an approach for connecting
semantics with the geometry of both vector
objects and raster pixels that allows generation
of graphic output in the form of maps or images
as the result of queries based on semantics.
The remainder of the paper is organized as
follows. Section 2 provides an overview of previous
research focused on conversion of topographic data
to the Semantic Web. Section 3 introduces the
ontology for The National Map and describes the
general approach to building semantics for USGS
geospatial data. Section 4 describes an initial
conversion of geospatial data for point and vector
objects to RDF and an approach for raster data
conversion to RDF. Section 5 describes a process
for connecting the semantics and geometry and
provides a method to access, download, and query
the converted data with SPARQL Protocol and RDF
Query Language (SPARQL) with a sample result.
Section 6 presents conclusions based on the current
work and directions for future research.
2. Previous Research
A sample of the anticipated problems to be
addressed by a USGS semantic approach is rooted in
the broader geographic information science research
agenda [52]. Specific solutions to challenges of
establishing spatial semantics, designing ontology,
and converting existing and new data sources to
triples build on research findings reported in
geospatial, ontological, and semantic literature.
Examples of existing research in these areas are
briefly documented below.
Topographic data are a subset of geospatial data.
The national mapping agency of Great Britain, the
Ordnance Survey (OS), has published ontologies
and a number of research papers on various aspects
of relating topography and geography to geospatial
semantic technology. Some of these topics are the
extraction of RDF data and OWL files from
relational databases, conceptual ontology, and
reasoning software [30,11]. In the context of
science-driven national mapping agencies similar to
the USGS, Brodaric [3] developed a framework for
geographical categorization that integrates the range
of topographical feature categories with the
foundational, upper-level ontology DOLCE [26] and
is aligned with the OntoClean analytical method
[53,17]; see also [21]. Semantic richness is created
by category criteria based on such characteristics as
feature qualities, processes, roles, and relations.
An important approach in ontology design stems
from temporal, activity, or event-based geographical
representation [38]. These ontologies are presented
as aligned with geographic theory of
human-environmental interactions. Ontological
representations are in part based on the forces and
motivations driving events and actions in space, and
themselves are influenced by intentions that impact
the design of the semantic information and
representation [6,9]. Though these intentional
aspects of ontology development are an influence on
topographical semantics represented by The
National Map, the ontology approach applied in this
research is based on natural language discourse of
topographical features.
Semantic interoperability is a broad field of
research for purposes of linking data across a
semantic network. Spatial reference systems were
conceptualized to provide a framework for
connecting data [22,18]. Crucial aspects of data
integration require the ontology of content data
characteristics, such as the data resolution affecting
geographical feature detail, data sources and
uncertainty, or data maintenance [10].
Technical formalizations have emerged that are
centered on linked geospatial or geoinformatics data
[1,29]. The GeoVocab group defined a vocabulary
for geometric coordinates and spatial object relation
properties [36]. Though informal, GeoVoCamps
have produced vocabulary developments for scales,
complex geometries, metadata, and temporal change
[19,31].
3. Ontology and Semantics Development for The
National Map
The ontology development combines a top-down
approach based on the organization of general
categories taken from standard feature classes and
bottom-up approaches shaped by legacy data
models. Some categories, such as transportation,
which is not feature-based, require more work to
align the conceptual and database models than
others that are feature-based, such as NHD. The
vocabulary of topographic features, to be
represented as triple subjects and/or objects, was
developed from standard feature list sources derived
from more than a century of topographic feature data
collection [41,42]. The semantic commitments of
these feature lists were discussed and debated over
time in a centralized way within the USGS, with
input from a wide range of user communities
[37,40]. Feature terms were reviewed for currency
and relevance to the geographical areas within the
domestic United States, so that terms such as
“demilitarized zone” were removed from the list.
Features that have become common since the
development of the standards, such as ‘windfarm,’
were reviewed as new vocabulary without the full
development and review of a new standard. Features
are classified into six taxonomic modules: terrain,
surface water, ecological regime, structures,
divisions, and events [49]. These reflect topographic
science modeling needs and closely resemble the
geographic information system (GIS) thematic layers
of The National Map. The classification was guided
with regard to regional context, feature morphology
as natural or engineered structures, and descriptive
attributes, such as shape and texture (fluid vs.
frozen), in accordance with empirical experience
and scientific concepts. The digital files form a
vocabulary in OWL format, and consist of feature
type classes under the taxonomic module domain.
Each class has a URI, a definition, the definition
source from on-line documentation, and an initial
logical axiom list. The hierarchy is flat [44]. The
URIs will be released to the public in the near
future.
The actual implementation of conceptual systems
from legacy data models is complicated by the
individually created data layers contributed by
partners. For example, The National Map includes
data from the U.S. Census Bureau, a Federal partner.
Thematic integration of The National Map data
layers occurs to support graphic map production of
the U.S. Topo product
(http://nationalmap.gov/ustopo/). Data layers, such
as the National Hydrography Dataset (NHD), closely
resemble the USGS ontology because the NHD data
model defines features [43,48]. Other layers, such as
transportation, are poorly matched to the conceptual
ontology because they were not developed under
feature-based system guidelines.
The legacy semantics extracted from standards
lists that were originally developed for topographic
mapping and digital data are simplistic compared to
the semantic richness potentially available through
the geospatial semantic web [12,50]. Engineering
semantic topographic data allows complexity and
decomposition that were difficult to produce in
layer-based systems. The representation of topography
combines natural and built-up (human-constructed)
features in complex assemblages. Complex features
require spatial relations among their basic
components, such as the relation between an airport
runway and control tower; together, the components
and their relations build the
complex feature identity. Spatial relations are often
considered to form the predicate between
semantically distinct feature subjects and objects of
triples, but topographic features and their relations
together form the semantics of complex features.
Complex features are particularly common in the
largest group of topographic features in the USGS
vocabulary, built-up structures. In these cases, the
base vocabulary allows relating simple classes into
complexes for ontology design patterns (ODP)
[15,16]. ODP have spatial relations that are essential
to feature meaning, but a greater variety of spatial
relations can be applied between distinct features
when ODP are reused as specific instances.
In addition to quantitative spatial relations of
location, such as coordinate pairs or geometric
distances between features, spatial relation terms for
the ontology development are also drawn from a set
of Open Geospatial Consortium (OGC) standards for
topological relations, mereological models, and
verb/preposition pairs identified from the
topographic feature type standards [27]. Samples of
USGS topographic data reside in a triple store
enabling topological reasoning according to the
OGC GeoSPARQL standard [32].
Topographic features may specifically include
spatial relations within the scope of the feature class
meaning, although the relation term may vary. For
example, a tributary is a body of water that flows
into a larger stream, or in the science vocabulary,
‘drains’ into another stream. In such cases, the
appropriate spatial relation can be modeled with
mereo-topological relations, such as ‘part’ or as a
network ‘connects,’ or with logical concepts, such as
the Web Ontology Language (OWL)
FunctionalProperty relation [8,39]. The logical
axioms to be applied to the topographic triples are
those of the W3C standards and the functionalities
offered by specific reasoning software platforms.
To capture spatial relations that support semantic
identity, predicates in the form of verb/preposition
pairs are presently (2011) being researched [7] in
which preposition semantics reflect geometric
cognition. Several categories of relations were
found, including descriptive terms, such as aligned,
depth, sloped, or narrowing; geometric terms, e.g.,
angled, confluent, curved, or extend; generative
(process) terms, such as eroded, forced, suspended,
and swing; and terms of intentionality, including
established, determined, designated, and defined.
4. Initial Data Conversion Approach
The USGS approach to using the Semantic Web
is to convert specific datasets from The National
Map to RDF and make these data available for
download and/or direct query in the RDF format. As
a pilot project, the USGS selected nine test areas
based on specific geographic characteristics,
extracted all data of the eight layers of The National
Map for these areas, and converted the vector and
point data to the Geography Markup Language
(GML) based on the OGC standard [33]. The nine
research test areas include six watershed sub-basin
areas defined from the NHD that reflect differing
combinations of physiography and climate (Figure
2). In addition to the watershed areas, the sites
include three urban areas: Atlanta, Georgia; St.
Louis, Missouri; and New Haven, Connecticut, the
last included as an urban coastal site. Each of these test
areas includes the eight standard layers of The
National Map: land cover, structures, boundaries,
hydrography, geographic names, transportation,
elevation, and orthoimagery (see Figure 1).
To make USGS data available to the Semantic
Web and the Linked Open Data Community, the
USGS converted data for the nine research test areas
to RDF and GML. Conversion of the sample site
datasets to RDF has followed the general approach
of defining the subject, predicate, and object of RDF
as the feature identifier, feature name or other
attribute or relation, and feature instance or object of
the relation, respectively. A requirement is the
capability to pose SPARQL queries from which
results can be graphically displayed on a map. Thus,
the coordinates must be associated with the RDF
resource. This association is done through GML and
allows access and use by any traditional program
that can process GML. A SPARQL query of the
RDF data can retrieve the needed result and the final
output can be used to generate a map from the GML
coordinate store as needed. All GML entities and
operations used in the data conversion and semantic
queries follow the OGC standard for GML [28].
In the initial conversion, data in the native format
(usually ArcGIS Geodatabase [14]) were converted to
GML, with each entity possessing a unique identifier.
The eight standard topological relations defined by
OGC were precomputed from the GML (see Figure
3 for an example). The feature data were converted
from GML to RDF triples maintaining identifiers
from the GML.
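The subject, predicate, object convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the USGS conversion code: the namespace strings are modelled on the prefixes shown in the paper's figures, the function names are hypothetical, and the attribute value in the usage note is made up.

```python
# Namespaces modelled on the prefixes in the paper's figures (assumed layout).
STRUCT = "http://cegis.usgs.gov/rdf/struct#"
STRUCT_FID = "http://cegis.usgs.gov/rdf/struct/featureID#"
GEOM = "http://cegis.usgs.gov/rdf/geometry#"


def escape_literal(value):
    """Escape a value for use inside a double-quoted N-Triples literal."""
    return (str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r"))


def feature_to_triples(feature_id, attributes, gml):
    """Return N-Triples lines for one feature (hypothetical helper).

    subject   = the feature identifier carried over from the GML
    predicate = the attribute name, or the geometry link
    object    = the attribute value, or the GML text
    """
    subject = f"<{STRUCT_FID}_{feature_id}>"
    lines = []
    for name, value in attributes.items():
        lines.append(f'{subject} <{STRUCT}{name}> "{escape_literal(value)}" .')
    # Keep the geometry attached to the same subject so query results can be mapped.
    lines.append(f'{subject} <{GEOM}gml> "{escape_literal(gml)}" .')
    return lines
```

For example, `feature_to_triples("CT001425", {"name": "Example Hall"}, "<gml:Point/>")` yields one triple for the name and one for the geometry, both keyed on the same feature identifier.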
The required conversion processes and structure
of the resulting data with access to the original
geometry are different based on the original
geometry of the geographic data sources. The
following discussion is separated into point, vector,
and raster to describe the different processes
required for conversion. The structures and
geographic names layers use point objects as the
geometric base of the data elements. The boundary,
hydrography, and transportation layers use vector
geometry with point, line, and area objects as the
basic data elements. The land cover, elevation, and
orthoimage layers use raster geometry with pixels or
cells as the basic geometric unit. Objects in the
raster layers must be defined and referenced over the
cell geometry for access and manipulation.
Fig. 2. Location of USGS research datasets for developing ontology and semantics for The National Map.
4.1. Point Data
The point datasets for The National Map include
geographic names and structures. Whereas structures
data in The National Map will eventually be
generated using the polygonal boundary for the
structure outline, currently available data use a
single point at the proximate center of the building
or other structure. Thus, at present structures are
converted to RDF using a point geometry model.
The basic conversion for the point data proceeded
as follows. Point data for The National Map are
stored in Esri geodatabase or shape file formats [14].
These files are used to create GML documents to
store the geometric data. The output of the
conversion process is written to an N3 document [2].
Complete description of this process including
conversion from geodatabase, personal geodatabase,
and shape files to GML and to N3 is presented in
[4].
Each point feature in the Geographic Names
Information System (GNIS) is formatted as a name
associated with a location. The conversion of this
format to RDF triples uses the simple convention
that the feature identifier is the subject in the RDF
triple (Figure 3). Figure 3 also presents the result in
GML including the coordinates for the structure
location.
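As a rough illustration of the geometry side of this conversion, the sketch below writes a single point feature as a GML point element using only the Python standard library. The function name, the choice of `gml:pos`, the `srsName` default, and the coordinates in the usage note are assumptions made for the example; the actual conversion is documented in [4].

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def point_to_gml(feature_id, x, y, srs_name="EPSG:4269"):
    """Serialize one point feature as a gml:Point string (hypothetical helper).

    The feature identifier is stored as gml:id so that RDF triples can keep
    the same identifier and link back to the coordinates.
    """
    ET.register_namespace("gml", GML_NS)
    point = ET.Element(
        f"{{{GML_NS}}}Point",
        {f"{{{GML_NS}}}id": feature_id, "srsName": srs_name},
    )
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{x} {y}"
    return ET.tostring(point, encoding="unicode")
```

Calling `point_to_gml("CT001425", -72.92, 41.31)` returns a small XML fragment whose identifier matches the one used as the triple subject; the coordinates here are invented for illustration.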
Query Text
PREFIX struct: <http://cegis.usgs.gov/rdf/struct#>
PREFIX gt: <http://cegis.usgs.gov/rdf/geometry#>
PREFIX structfid: <http://cegis.usgs.gov/rdf/struct/featureID#>
PREFIX transfid: <http://cegis.usgs.gov/rdf/trans/featureID#>
SELECT ?name ?gml WHERE
{
structfid:_CT001425 struct:name ?name .
structfid:_CT001425 gt:gml ?gml
}
Fig. 3. Query text for a structure feature from RDF data of a sample from The National Map. The resulting GML from the query is shown in
the bottom of the figure.
4.2. Vector Data
Table 2
Count and volume for converted triples
Dataset          Triple Count    File Size
Hydrography      20,000,000      2.7 GB
Transportation   25,000,000      2.4 GB
Boundaries       52,000          189 MB
Structures       388,000         37 MB
The conversion of vector formatted geospatial
data for hydrography, transportation, boundaries,
and structures (Table 2) for the test sites to the
linked data format of the Semantic Web proceeded
with the following general approach. The subject,
predicate, object format of RDF for the Semantic
Web was constructed from the entities as defined in
the formats of The National Map. For example, for a
stream in the NHD of The National Map, the flowline is
the primary feature of the stream reach that provides
connections of the hydrographic network. The
subject is the feature identifier; in the case of a
flowline, it is the reach code as defined in NHD
(fid: 77127453 in Figure 4). The predicate is the
particular property of the flowline being modeled in
the triple, for example its length:
http://cegis.usgs.gov/rdf/geometry#length.
There are 17 objects, and each depends on the predicate.
For example, the object of the predicate
geometry#length is a literal number; the object of
geometry#intersects is another flowline. The object
of geometry#gml is the set of coordinates of the flowline.
Figure 4 shows a query and the detailed set of
flowline characteristics that are the distinct
properties or predicates of the flowline. Each subject
(reach code identifier) has many distinct predicates
and objects associated with it to capture the stream
characteristics. As with the point data, the geometry
of the flowline is represented by coordinates stored
in GML.
Query Text
PREFIX qgis: <http://cegis.usgs.gov/rdf/geometry#>
PREFIX fid: <http://cegis.usgs.gov/rdf/nhd/featureID#>
PREFIX road: <http://cegis.usgs.gov/rdf/trans/featureID#>
PREFIX nhd: <http://cegis.usgs.gov/rdf/nhd#>
PREFIX trans: <http://cegis.usgs.gov/rdf/trans#>
SELECT DISTINCT ?predicate WHERE {
fid:_77127453 ?predicate []
}
Result at:
http://131.151.2.169:8890/sparql?default-graph-uri=&query=PREFIX+fid%3A+++%3Chttp%3A%2F%2Fcegis.usgs.gov%2Frdf%2Fnhd%2FfeatureID%23%3E%0D%0A%0D%0Aselect+distinct+%3Fpredicate+where+{%0D%0Afid%3A_77127453+%3Fpredicate+[]%0D%0A%0D%0A}%0D%0A&format=text%2Fhtml
Fig. 4. Query for stream flowline from RDF data. The results are the predicates of the flowline for a sample from the National Hydrography
Dataset of The National Map.
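The "Result at" address in Figure 4 is simply the query text passed to the endpoint as URL parameters. The following sketch shows one way such an address could be assembled with the Python standard library. It makes no network request; the parameter names are read off the Figure 4 address, the function name is hypothetical, and the endpoint itself may no longer be reachable.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint address as it appears in Figure 4 (assumed to be historical).
ENDPOINT = "http://131.151.2.169:8890/sparql"

QUERY = """PREFIX fid: <http://cegis.usgs.gov/rdf/nhd/featureID#>
SELECT DISTINCT ?predicate WHERE {
  fid:_77127453 ?predicate []
}"""


def build_query_url(endpoint, query, default_graph="", result_format="text/html"):
    """Return a GET address that carries the SPARQL query as URL parameters."""
    params = {
        "default-graph-uri": default_graph,
        "query": query,
        "format": result_format,
    }
    # urlencode percent-encodes reserved characters such as '<', '#', and '?'.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

Decoding the parameters of the generated address with `parse_qs` gives back the original query text, which is a convenient check that nothing was lost in the encoding.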
4.3. Raster Data
Query and access to raster data on the Semantic
Web pose unique problems since geographic
features to be represented as ontological objects are
not defined in the structure of the data, which is a
grid of pixel values or digital numbers. Traditional
processing of raster data has treated the entire raster
grid as a coverage, as in Web Coverage Services, or
provided procedures to extract vector objects from
the raster matrix. Unless each pixel in a raster data
matrix is treated as a separate entity in an ontology,
definition of geographic features or ontological
objects over the raster grid is required. Although a
significant literature exists on image segmentation
and object extraction from raster image data (see
standard texts on remote sensing and image
processing, such as [24]), there has been little work
on ontology and semantics with raster geometry. In
general, the approach to this problem is first to
develop vector objects from image segmentation
then use existing methods for building ontology and
semantics for the vector objects. However, for
relational data, [25] proposed methods to extend the
Geographic Structured Query Language (GSQL) to
support raster data. By defining specific abstract
data types (ADTs), such as Pixel, Raster Region,
and RasterCoverage and formalizing data objects
and operations on these ADTs, GSQL has been
extended to query raster objects. [35] also provide
an approach to raster data semantics. Their approach
is in three stages requiring conceptualization,
synthesis, and description of objects in the raster
data. Neither of these approaches is directly
implemented for the Semantic Web and neither uses
an RDF structure for the raster objects.
The raster data layers in The National Map are
land cover, elevation, and orthographic images (see
Figure 1). Geomorphic entities are typical examples
of geographic features dependent on a raster
representation. For a specific feature example, this
discussion will use the feature crater with the
particular feature instance of Meteor Crater (Figure
5 a and b), Arizona. Note that on a topographic map,
Meteor Crater is represented only by a name and the
map user must interpret the feature from the extent
of the name and contours or from the orthographic
image (orthographic images are now included as a
layer of US Topo). Thus, a part of the task of
representing the crater feature is the definition of its
extent in a form a user will understand. Whereas
Meteor Crater is a graphically well-defined feature
and easily interpreted by most users from the image
or contour map, other geomorphic features, such as
hills, are more difficult to identify and have
indeterminate boundaries [5].
Fig. 5. Orthographic image (A) and topographic map (B) representation of Meteor Crater.
Unlike other approaches that extract the semantic
objects from the raster data, our approach is to
determine relevant objects and maintain the raster
matrix as the geometric basis of the geographic
features of interest. This is essential since a user may
want to see a source map or image of the feature in
concert with a query result or with other data. This
can be understood by examining Meteor Crater as
presented in Figure 5a and 5b. A single vector
polygon outline of Meteor Crater would not convey
the feature characteristics nearly as well as the image
or contour map, both of which are raster. The
contours could be shown as vector lines and provide
the same presentation, but in that case the entities
are individual contour lines and not a single entity
that is Meteor Crater. The interpretation of the lines
as Meteor Crater is again left to the user. Thus, the
connection between the ontological object and actual
geographical entity in the real world and the raster
representation is essential.
The steps involved in the conversion of these
types of entities to a semantic representation require
that the features be identified in the raster source and
a pixel or set of pixels selected as the basic
geometric footprint for the feature [48]. This
identification results in a single pixel for features
that can be treated as point features at the resolution
of the raster data. Examples are a well or a spring. A
linear set of pixels can be used to represent line
types of features, such as roads or rivers, based on
size of the feature and resolution of the data.
Features that span areas, such as Meteor Crater,
require contiguous groups of pixels or in some cases
non-contiguous groups of pixels to be identified
[46,47]. The identification step must be followed by
an identification of the relations of the specified
feature to other neighboring features.
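The distinction between a single pixel, a contiguous group, and non-contiguous groups can be made concrete with a small connected-components check. The sketch below is an illustration only: the function names are hypothetical, pixels are assumed to be (row, column) pairs, and 8-connectivity is an assumed definition of "contiguous". Telling line features from area features would need an additional shape test that is not attempted here.

```python
from collections import deque


def connected_groups(pixels):
    """Split a collection of (row, col) pixels into 8-connected groups."""
    remaining = set(pixels)
    groups = []
    while remaining:
        seed = remaining.pop()
        group = {seed}
        queue = deque([seed])
        while queue:
            row, col = queue.popleft()
            # Visit the eight neighbours (and the pixel itself, already removed).
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbour = (row + d_row, col + d_col)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        group.add(neighbour)
                        queue.append(neighbour)
        groups.append(group)
    return groups


def describe_footprint(pixels):
    """Label a feature footprint by how its pixels hang together."""
    pixels = set(pixels)
    if not pixels:
        raise ValueError("a feature footprint needs at least one pixel")
    if len(pixels) == 1:
        return "single pixel"
    if len(connected_groups(pixels)) == 1:
        return "contiguous group"
    return "non-contiguous groups"
```

A well or spring at the data resolution would fall in the first case, while a feature such as Meteor Crater would be expected to fall in the second.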
The specification of the definition, attributes, and
relationships of a feature, a prototype from category
theory [34,23,45], provides an ODP, which can be
used as a basis for similarity matching to classify
and identify features. Such patterns are for actual
geographical features and may be used for features
represented with vector geometry [51] or raster
geometry as in the case of Meteor Crater. For
Meteor Crater, the ODP would only include the
definitional characteristics appropriate for all craters
whereas Table 3 provides the set of attributes and
relationships of the particular feature instance. For
example, the ODP for the class crater includes the
relations: has definition: circular-shaped depression
…; has attribute: depth; has attribute: shape, etc.
This ODP is generic for all craters since all craters
share the definition and all have attributes of depth
and shape. Meteor Crater has other attributes and
relationships that may not be shared by all craters.
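The split between what the class pattern requires and what a particular instance adds can be sketched as follows. This is a deliberately simple required-attribute check standing in for the similarity matching mentioned above, not the method used by the authors; the dictionary layout and function names are assumptions, and the instance values are taken from Table 3.

```python
# Hypothetical ODP for the class crater: all craters have depth and shape.
CRATER_ODP = {
    "feature": "Crater",
    "required_attributes": {"depth", "shape"},
}

# Instance values copied from Table 3; the structure is an assumption.
METEOR_CRATER = {
    "name": "Meteor Crater",
    "attributes": {
        "depth": "600 ft",
        "shape": "Circular",
        "elevation_high": "5,723 ft",  # instance-specific, not part of the ODP
    },
    "relationships": ["surrounded by roads", "near well"],
}


def missing_attributes(pattern, instance):
    """List pattern attributes that the instance does not provide."""
    return sorted(pattern["required_attributes"] - set(instance["attributes"]))


def matches_pattern(pattern, instance):
    """An instance matches when every required attribute is present."""
    return not missing_attributes(pattern, instance)
```

Extra attributes and relationships on the instance do not affect the match, which mirrors the point that Meteor Crater carries properties not shared by all craters.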
Table 3
Meteor Crater Attributes and Relationships
Feature: Crater
Definition: Circular-shaped depression at the summit of a volcanic cone or
one on the surface of the land caused by the impact of a meteorite;
a manmade depression caused by an explosion (caldera, lua).
Instance: Meteor Crater, GNIS ID 7945
Attributes
  Location
    UTM: E 497,959.94 m; N 3,876,020.68 m; Zone 12
    PLSS: T 19 N, R 12 1/2 E, Section 13 and 24
    MBR: Max E 498,536.79 m; Min E 497,317.62 m; Max N 3,876,632.29 m; Min N 3,875,479.58 m
  Elevation: High 5,723 ft; Low 5,123 ft
  Depth: 600 ft
  Shape: Circular
  Rim width: 0.125 mi (0.2 km)
  Contour at outer perimeter: 5,600 ft
  Contour at inner perimeter: 5,180 ft
  Inner Diameter: 0.50 mi (0.833 km)
  Outer Diameter: 0.75 mi (1.25 km)
Relationships
  Surrounded by roads
  Adjacent to Museum (Museum Name: Meteor Crater Museum)
  Near sand pits
  Near well
  Benchmarks on crater: BM 5723; BM East 5706
Once the features and relations, as specified in the
ontology, are identified, the feature is matched to an
existing ODP and additional attributes and
relationships are defined for the feature instance, as
with Meteor Crater above. The newly defined
feature instance is linked to the geometric pixel
patterns of the raster image. At this point an RDF
structure can be created for the feature. Similar to
the representation of point and vector data in RDF
above, the conversion of the feature and relations to
RDF is performed and the raster geometry, pixel,
linear set of pixels, or pixel aggregation, is
structured in GML, using the GML coverage. To
define the gml:Grid element, a minimum bounding
rectangle (MBR) is used for the feature since at this
point GML does not allow storage of pixels in other
than a rectangular fashion. Eventually, the exact set
of pixels that represent a line or polygon will be
stored, but currently to remain within the GML
standard, only the MBR is used.
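Computing the rectangle that stands in for a feature's exact pixel set is straightforward. The sketch below assumes pixels are (row, column) indices into the raster and reports the rectangle both in raster indices and as low and high corners relative to the rectangle's own origin, in the style of a grid envelope. The function names and the "column row" ordering of the corner strings are assumptions for illustration.

```python
def minimum_bounding_rectangle(pixels):
    """Return (min_row, min_col, max_row, max_col) for a set of (row, col) pixels."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("cannot bound an empty pixel set")
    rows = [row for row, _ in pixels]
    cols = [col for _, col in pixels]
    return (min(rows), min(cols), max(rows), max(cols))


def grid_envelope(pixels):
    """Return low/high corner strings for a grid covering the MBR.

    Indices are relative to the upper-left cell of the MBR and are written
    as "column row" (an assumed axis order).
    """
    min_row, min_col, max_row, max_col = minimum_bounding_rectangle(pixels)
    return {
        "low": "0 0",
        "high": f"{max_col - min_col} {max_row - min_row}",
    }
```

Cells inside the rectangle that do not belong to the feature are included, which is the limitation noted above for staying within a rectangular grid.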
5. Access to USGS RDF Data for Research Test
Sites of The National Map
To provide access to the research test data
converted to RDF, the USGS established a server
accessible to the public
(http://usgs-ybother.srv.mst.edu). On this server users external to
the USGS Intranet can access and download the data
in the original Esri and image formats (Geodatabase,
shapefile, TIFF) of The National Map or in RDF.
The USGS has also established a SPARQL Endpoint
at http://usgs-ybother.srv.mst.edu:8890/sparql that
allows direct query of the data using SPARQL. To
illustrate the use of the SPARQL Endpoint, the
USGS implemented the relations standardized by
OGC from the 9-intersection model [13]. Figure 6
shows an example use case for the SPARQL Endpoint
and the converted data. The relation is touches, and
the use case is “For a given feature, find all other
features that touch the given feature.” Placed in
the geographic space of data from The National Map,
the query can be phrased about a specific feature:
“Find all the tributaries of West Hunter Creek.” The
query returns a series of URIs; when the GML
coordinates of the returned features are placed on a
background map, the graphic in Figure 7 results.
The current capabilities of the endpoint are
restricted to the precomputed relationships provided
and the values included from the native datasets. For
example, one can ask "Which features intersect any
feature with the NHD reach code X?" and receive a
correct result. However, one could not ask "Which
features inside rectangle R have reach code X?"
because containment in rectangle R is not a
precomputed relationship and is not stored as a
predicate. We continue to refine our conversion
processes and expand the capabilities of the RDF
data. Our current research aims to eliminate the
precomputation in the conversion and rely on the
ontology with defined relationships to drive the
query processing. We anticipate applications of
these data in environmental modeling and graphical
display of model results.
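The first question above ("Which features intersect any feature with the
NHD reach code X?") might be written roughly as follows. This is a sketch
only. It assumes that the intersects relation is stored as precomputed
triples between geometries in the same way as touches in Fig. 6. The
predicate name ogc:intersects, the nhd: namespace, and the nhd:reachCode
predicate are illustrative assumptions rather than names confirmed for the
USGS data, and "X" is a placeholder for an actual reach code.

```sparql
# Sketch under stated assumptions: ogc:intersects and nhd:reachCode are
# hypothetical predicate names, and "X" stands in for a real reach code.
PREFIX ogc: <http://www.opengis.net/rdf#>
PREFIX nhd: <http://example.org/nhd#>   # hypothetical namespace

SELECT DISTINCT ?feature ?type
WHERE {
  # Any feature whose reach code attribute equals X, with its geometry
  ?reach nhd:reachCode "X" ;
         ogc:hasGeometry ?geo1 .
  # Follow the precomputed intersects triples between geometries
  ?geo1 ogc:intersects ?geo2 .
  # Recover the intersecting feature and its type
  ?feature ogc:hasGeometry ?geo2 ;
           a ?type .
}
```

The second question has no counterpart in this style: no triple connects a
geometry to the rectangle R, so answering it would require a spatial test
evaluated at query time rather than a lookup of stored predicates.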
Default Graph URI: http://cegis.usgs.gov/rdf/ontologytest/
Query:
PREFIX ogc: <http://www.opengis.net/rdf#>
PREFIX fid: <http://cegis.usgs.gov/rdf/nhd/featureID#>
SELECT ?feature ?type
WHERE {
  fid:_102217454 ogc:hasGeometry ?geo1 .
  ?geo1 ogc:touches ?geo2 .
  ?feature ogc:hasGeometry ?geo2 .
  ?feature a ?type
}
Fig. 6. Initial screen accessed on the USGS SPARQL Endpoint with example query using relation touches.
http://cegis.usgs.gov/rdf/nhd/featureID#_102216432
http://cegis.usgs.gov/rdf/nhd/featureID#_102216448
http://cegis.usgs.gov/rdf/nhd/featureID#_102216340
http://cegis.usgs.gov/rdf/nhd/featureID#_102216320
http://cegis.usgs.gov/rdf/nhd/featureID#_102217454
http://cegis.usgs.gov/rdf/nhd/featureID#_102216276
http://cegis.usgs.gov/rdf/nhd/featureID#_102216358
Fig. 7. Graphical result of the query in Fig. 6. West Hunter Creek (http://cegis.usgs.gov/rdf/nhd/featureID#_102217454) is shown in red and
its tributaries are shown in blue with associated URIs. The background image is a standard USGS Digital Raster Graphic for the quadrangle
that includes West Hunter Creek, Colorado.
6. Conclusions
The USGS is researching the capabilities of the
Semantic Web for supporting query and analysis of
geographic data from The National Map. As a part
of that research, point and vector data for nine
research test areas have been converted to RDF and
made available to the public. A vocabulary of
topographic terms has been developed to form the
basis for an ontology for The National Map. To support
user interaction with the converted data, the USGS
provides access for download of the research test
data in original formats of The National Map,
RDF formatted data, and a SPARQL Endpoint for
direct query of the data. The USGS is participating
with these data in testing the evolving GeoSPARQL
standard and providing methods for users to
semantically interact with the data.
Raster data representation on the Semantic Web
requires constructing object representations and
developing the complete set of attributes and
relationships that comprise the ontology for the
entities while maintaining the pixel geometry for
user access. Approaches to date have relied on
conversion from raster to vector geometry thus
losing the original geometric source of the data. The
USGS approach is to maintain the pixel structure of
the entity from the raster image and build ontology
from ODP and specific feature instance attributes
and relationships.
References
[1] Auer, S., Lehmann, J., and Stadler, C. 2011.
LinkedGeoData. Agile Knowledge Engineering and
Semantic Web (AKSW), University of Leipzig, accessed
August 2, 2011, at URL http://linkedgeodata.org.
[2] Berners-Lee, T., 2005, Primer: Getting into RDF &
Semantic Web using N3: v. 1.61, World Wide Web
Consortium, accessed March 16, 2011, at
http://www.w3.org/2000/10/swap/Primer.
[3] Brodaric, B. 2008. A Foundational Framework for
Structuring Geographical Categories. First International
Workshop on Informational Semantics and its Implications
for Geographical Analysis, GIScience 2008. Park City,
Utah, September 23, 2008.
http://cogsci.uni-osnabrueck.de/~isga08/Brodaric.pdf.
[4] Bulen, A., Carter, J., and Varanka, D., 2011. A Program
For The Conversion of The National Map Data From
Proprietary Format to Resource Description Framework
(RDF), U.S. Geological Survey Open-File Report
2011-1142, 9 p. http://pubs.usgs.gov/of/2011/1142/.
[5] Burrough, P.A. and A.U. Frank, (eds.), 1996. Geographic
Objects with Indeterminate Boundaries, Taylor and
Francis, London, 352 p.
[6] Câmara, G., A.M. Vieira-Monteiro, J. Paiva, and R.C.M.
deSouza, 2000. Action-driven Ontologies of the
Geographical Space: Beyond the Field-Object Debate. In
M.J. Egenhofer and D.M. Mark, editors, GIScience 2000,
First International Conference on Geographic Information
Science, Savannah, GA. pp. 52-54.
[7] Caro, H.K. and Varanka, D.E. 2011. Analysis of Spatial
Relation Predicates in U.S. Geological Survey Feature
Definitions. U.S. Geological Survey Open-File Report
2011-1235, 37 p. http://pubs.usgs.gov/of/2011/1235.
[8] Casati, R. and Varzi, A. 1999. Parts and Places, The
Structures of Spatial Representation. Cambridge, Mass.
The MIT Press.
[9] Couclelis, H. 2010. Ontologies of Geographic Information.
International Journal of Geographical Information Science
24(12): 1785–1809.
[10] Dean, D.J. 2007. Characterizing Spatial Databases via
Their Derivation: A Complement to Content Ontologies.
Transactions in GIS 11(3): 399–412.
[11] Dolbear, C., Hart, G., and Goodwin, J. 2006. What OWL
Has Done for Geography and Why We Don’t Need It to
Map Read. Bernardo Cuenca, Pascal Hitzler, Conor
Shankey, and Evan Wallace, eds. Proceedings of the OWL
Experiences and Directions (OWLED) 2006 Workshop.
Athens, Georgia, USA, CEUR Workshop Proceedings 216
CEUR-WS.org.
[12] Dolbear, C. and Goodwin, J. 2007. Position Paper on
Expressing Relational Data as RDF. W3C Workshop on
RDF Access to Relational Databases. 25–26 October,
2007, W3C, Cambridge, MA., USA.
http://www.w3.org/2007/03/RdfRDB/papers/dolbear.pdf.
[13] Egenhofer, M.F. and Franzosa, R.D., 1991. Point Set
Topological Spatial Relations, International Journal of
Geographical Information Systems, vol 5, no 2, 161–174.
[14] Esri, 2011, Geodatabases. Esri, accessed March 30, 2011,
at
http://resources.arcgis.com/content/geodatabases/10.0/about
[15] Gangemi, A. 2005. Ontology Design Patterns for Semantic
Web Content. M. Musen, et. al, eds. Proceedings of the
Fourth International Semantic Web Conference. Lecture
Notes in Computer Science, vol. 3729/2005, Berlin:
Springer, p. 262–276.
[16] Gangemi A. and Presutti V. 2009. Ontology Design
Patterns. In S. Staab, R. Studer (Ed.), Handbook of
Ontologies (2nd edition). Springer: Berlin.
[17] Guarino, N. and Welty, C. 2002. Evaluating ontological
decisions with OntoClean. Communications of the ACM
45(2): 61–65. New York, ACM Press.
[18] Hahmann, S. and Burghard, D. 2010. Connecting
LinkedGeoData and Geonames in the Spatial Semantic
Web. R. Purves and R. Weibel, eds., GIScience 2010
Extended Abstracts, Zurich, Switzerland, p. 28–34.
[19] Hart, G., Goodwin, J., and Pehle, T. 2011.
GeoVoCampSouthampton2011. Accessed Nov. 10, 2011, at:
http://vocamp.org/wiki/GeoVoCampSouthampton2011.
[20] Heath, T. and Bizer, C. 2011. Linked Data: Evolving the
Web into a Global Data Space (1st edition). James
Hendler and Frank van Harmelen, eds. Synthesis Lectures
on the Semantic Web: Theory and Technology, 1:1, 1–136.
Morgan & Claypool accessed May 31, 2011, at
http://linkeddatabook.com/editions/1.0/.
[21] Kokla, M. and Kavouras, M. 2001. Fusion of top-level and
geographical domain ontologies based on context
formation and complementarity. International Journal of
Geographical Information Science 15(7): 679–687.
[22] Kuhn, W. 2005. Geospatial Semantics: Why, of What, and
How? In Spaccapietra, S. Zimányi, E., Eds., Journal on
Data Semantics III. Lecture Notes in Computer Science,
3534 (3), 1-24.
[23] Lakoff, G., 1987. Women, Fire, and Dangerous Things:
What Categories Reveal about the Mind. Chicago:
University of Chicago Press, 602 p.
[24] Lillesand, T.M., Kiefer, R.W., and Chipman, J., 2007.
Remote Sensing and Image Interpretation, Sixth Edition,
John Wiley and Sons, New York, 804 p.
[25] Liu, Y. Lin, Y., Qin, S., Zhang, Y., Wu, L., 2005. Research
on GSQL Extension Supporting Raster Data, Journal of
Image and Graphics, accessed May 31, 2011, at
http://en.cnki.com.cn/Article_en/CJFDTOTALZGTB20050100J.htm
[26] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., and
Oltramari, A. 2003. WonderWeb Deliverable D18,
Ontology Library (final). Laboratory for Applied Ontology,
http://www.loa-cnr.it/Papers/D18.pdf.
[27] OGC, 2010, Open Geospatial Consortium, Inc.: accessed
March 25, 2010, at: http://www.opengeospatial.org/.
[28] OGC, 2011. Geography Markup Language (GML)
Encoding Standard, Open Geospatial Consortium, Inc.,
accessed May 31, 2011, at
http://www.opengeospatial.org/standards/gml
[29] Ontology Engineering Group (OEG). 2011. GeoLinked
Data. Universidad Politecnica de Madrid, accessed August
2, 2011, at http://geo.linkeddata.es.
[30] Ordnance Survey (OS). 2011. Ordnance Survey
Ontologies. Accessed August 8, 2011, at:
http://www.ordnancesurvey.co.uk/oswebsite/ontology/
[31] Pehle, T. 2009. NeoGeoVoCamp Summary Report.
Neogeo, accessed August 9, 2011, at:
http://sites.google.com/site/neogswvocs/.
[32] Perry, M., and Herring, J., eds. 2011. GeoSPARQL – A
Geographic Query Language for RDF Data. Open
Geospatial Consortium Inc., project document reference
number OGC 09-157-r1.
[33] Portele, C., 2007, OpenGIS Geography Markup Language
(GML) Encoding Standard, v. 3.2.1: Open Geospatial
Consortium, Inc., OGC 07–036, accessed March 30, 2011,
at http://www.opengeospatial.org/standards/gml.
[34] Rosch, E., 1978. Principles of Categorization. In E. Rosch
and B.B. Lloyd, eds, Cognition and Categorization.
New York: Halstead Press, 27–48.
[35] Quintero, R., Torres, M., Moreno, M., and Guzman, G.
2009. Towards A Semantic Representation of Raster
Spatial Data. GeoSpatial Semantics, Third International
Conference, GeoS 2009, Mexico City, Mexico, December
2009 Proceedings. Springer, LNCS 5892, p. 63–82.
[36] Salas, J.M. and Harth, A. 2011. NeoGeoVocabulary:
Defining a shared RDF representation for GeoData.
Accessed August 9, 2011, at:
http://geovocab.org/doc/survey.html.
[37] Spatial Data Transfer Standard Technical Review Board,
1997, Spatial Data Transfer Standard (SDTS) – Part 2,
Spatial Features; Draft for Review, Federal Geographic
Data Committee.
[38] Sen, Sumit. 2007. Two Types of Hierarchies in Geospatial
Ontologies. In F. Fonseca, M.A. Rodrigues, and S.
Levashkin, eds., GeoS 2007, LNCS 4853 pp. 1–19.
Springer-Verlag Berlin Heidelberg 2007.
[39] Smith, M.K., Welty, C. and McGuinness, D.L. 2009. OWL
Web Ontology Language. W3C Recommendation 10
February 2004.
http://www.w3.org/TR/2004/REC-owl-guide-20040210/#FunctionalProperty.
[40] Sugarbaker, Larry, Kevin Coray, and Barbara Poore. 2009.
The National Map Customer Requirements: Findings from
Interviews and Surveys. USGS Open-File Report 2009–
1222, USGS, Reston, VA.
[41] U.S. Board on Geographic Names, 2010, Geographic
Names Information System: accessed March 25, 2010, at:
http://geonames.usgs.gov/domestic/index.html.
[42] U.S. Geological Survey, 2001, National Geospatial
Program Standards, accessed March 23, 2010, at
http://rockyweb.cr.usgs.gov/nmpstds/dlgstds.html.
[43] U.S. Geological Survey, 2010. National Hydrography
Dataset (NHD) Model (v 2.0), accessed August 15, 2011,
at: http://nhd.usgs.gov/NHDv2.0_poster_6_2_2010.pdf
[44] U.S. Geological Survey, 2011, A Topographic Feature
Vocabulary for Geospatial Ontology Development. U.S.
Geological Survey. http://cegis.usgs.gov/ontology.html.
[45] Usery, E.L., 1993. "Category Theory and the Structure of
Features in Geographic Information Systems,"
Cartography and Geographic Information Systems, v. 20,
no. 1, p. 5–12.
[46] Usery, E.L., 1994a. "Implementation Constructs for Raster
Features," Proceedings, American Society for
Photogrammetry and Remote Sensing Annual Convention,
Reno, Nevada, ASPRS, Bethesda, MD, pp. 661–670.
[47] Usery, E.L., 1994b. "Display of Geographic Features from
Multiple Image and Map Databases," Proceedings,
International Society for Photogrammetry and Remote
Sensing, Commission IV Symposium on Mapping and
Geographic Information Systems, International Archives
of Photogrammetry, Volume XXX, Part B4, Athens, GA, pp.
1–9.
[48] Usery, E.L., 1996. "A Feature-Based Geographic
Information System Model," Photogrammetric
Engineering and Remote Sensing, v. 62, no. 7, p. 833–838.
[49] Varanka, D., 2009a, Landscape Features, Technology
Codes, and Semantics in U.S. National Topographic
Mapping Databases, The International Conference on
Advanced Geographic Information Systems & Web
Services (GEOWS), Cancun, Mexico, February 1–7, 2009.
[50] Varanka, D., 2009b. A Topographic Feature Taxonomy for
a U.S. National Topographic Mapping Ontology,
Proceedings, International Cartographic Conference,
Santiago, Chile, ICA CD-ROM publication.
[51] Varanka, D., 2011. Ontology Patterns for Complex
Topographic Feature Types, Cartography and Geographic
Information Science, v. 38, no. 2, p. 126–136.
[52] Varanka, D. and Usery, E.L., 2010, Special Section:
Ontological Issues for The National Map: Cartographica:
The International Journal for Geographic Information
and Visualization, v. 45, n. 2, p. 103–104.
[53] Welty, C., Guarino, N. 2001. Supporting ontological
analysis of taxonomic relations. Data and Knowledge
Engineering 39(2001): 51–74.
[54] W3C 2004. Resource Description Framework (RDF),
accessed May 31, 2011, at http://www.w3.org/RDF/.
[55] W3C 2011. Semantic Web, accessed May 31, 2011, at
http://www.w3.org/standards/semanticweb/.