Monday, January 16, 2012

Mapping New York - more polygons

Last time, we got the map to a pretty decent topographic road map. (In a future installment, I may discuss curating the OpenStreetMap data; it has recurrent errors that come from the Census Bureau TIGER files. But that's a side issue.) The next thing that I want to get into place is some polygon data: "Who owns this land? What's it used for? Does the public have right of access? Where are the landmark buildings?"

The relevant part of the first set of questions can mostly be answered by a database of publicly-owned lands. This database is getting into territory where the public databases aren't quite up to snuff. Within the Adirondack and Catskill Blue Line, the data are readily available from NYSGIS. The usual drill of using ‘ogr2ogr’ loads them into PostGIS:

ogr2ogr -f PostgreSQL -overwrite -t_srs EPSG:32618 \
"PG:dbname=gis" DEC_Lands.shp \
-nln nys_dec_lands -nlt MULTIPOLYGON -lco PRECISION=NO

Once again, the ‘-t_srs’ option is there to reproject the data into the projection that I intend to use for the finished map, and the ‘-lco PRECISION=NO’ works around a bug that causes a failure in inserting some of the numeric data.

Outside the Blue Line, the data come from a different place: the New York State Office of Cybersecurity. (I'd be intrigued to know why they became the custodian of the data.) In any case, they have a collection of files available on the NYSGIS web site.

Rather than putting each of these files into a separate table in the database, and hence needing a separate layer to show them, I decided to integrate them into a single table, and add a column to the data representing which data set a given row came from. For this, I decided to resort to scripting. Pulling out my handy-dandy Tcl interpreter, I ran the following loadall.tcl script:

set firsttime true
# Find all the shapefiles in the working directory
foreach file [glob *.shp] {

    # Extract the base name of each file, and create an 'ogr2ogr' command to load it
    set base [file rootname [file tail $file]]
    set cmd [list ogr2ogr -f "PostgreSQL"]
    if {$firsttime} {
        set firsttime false
        lappend cmd -overwrite
    } else {
        lappend cmd -append
    }
    # Add a 'data_source' column to identify which file we loaded
    lappend cmd -sql "SELECT *, '$base' AS data_source FROM $base" \
        -t_srs EPSG:32618 \
        -skipfailures \
        "PG:dbname=gis" \
        $file \
        -nln nys_public_land_boundaries \
        -nlt MULTIPOLYGON \
        -lco PRECISION=NO
    # Report on the console which file we're processing, and load it
    puts $cmd
    exec {*}$cmd >@stdout 2>@stderr
}
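
For readers without a Tcl interpreter, roughly the same loop can be sketched in POSIX shell. This is a minimal sketch, not a drop-in replacement: the function names load_mode and load_all and the DRY_RUN variable are illustrative names, not part of ogr2ogr, and it assumes that each shapefile's base name is a plain identifier (no spaces or quotes) that can be used as the layer name in the -sql clause.

```shell
#!/bin/sh
# Sketch: load every shapefile in the current directory into one PostGIS table,
# tagging each row with the name of the file it came from.
# Set DRY_RUN to any non-empty value to print the commands without running them.

# Print the mode flag: -overwrite for the first file, -append for the rest.
load_mode() {
    if [ "$1" -eq 1 ]; then echo "-overwrite"; else echo "-append"; fi
}

load_all() {
    n=0
    for shp in *.shp; do
        [ -e "$shp" ] || continue   # no shapefiles: the glob stays literal
        n=$((n + 1))
        base=$(basename "$shp" .shp)
        # Build the command as positional parameters so each argument
        # stays a single word when it is finally executed.
        set -- ogr2ogr -f PostgreSQL "$(load_mode "$n")" \
            -sql "SELECT *, '$base' AS data_source FROM $base" \
            -t_srs EPSG:32618 -skipfailures \
            "PG:dbname=gis" "$shp" \
            -nln nys_public_land_boundaries \
            -nlt MULTIPOLYGON -lco PRECISION=NO
        # Report which file is being processed (printed without shell quoting)
        echo "$*"
        [ -n "${DRY_RUN:-}" ] || "$@"
    done
}
```

Calling load_all with DRY_RUN=1 set in a directory of shapefiles prints one ogr2ogr command per file, which is a cheap way to check the -sql clause before touching the database; calling it with DRY_RUN unset runs the commands.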

With these boundaries, what I mostly care about is “recreational” (go ahead and access) versus “nonrecreational” (permission needed, or special land use such as prisons and schools). (And I also want to treat the ‘AdirondackCatskill’ file specially, because that's the Blue Line, rather than reflecting public ownership.)

Both of these layers need some styling. Rather than walk through that whole process, I have QML files attached at the end of the post.

Dealing with OpenStreetMap polygon data is rather more complicated, because it's got so many different things in the same file. I therefore made several different layers, with SQL queries to extract specific features.

(1) The first layer, I just left with the name, new_york_osm_polygon. This layer really represents "here are polygons that I don't know what to do with, yet." It is stacked behind everything else, and I usually leave it unchecked unless I'm actively working on styling for polygons. Its query looks like:

admin_level IS NULL
AND ("boundary" IS NULL OR "boundary" NOT IN ('national_park'))
AND ("waterway" IS NULL OR "waterway" IN ('boatyard','dam','dock','rapids','waterfall'))
AND ("natural" IS NULL OR "natural" NOT IN ('bay', 'marsh', 'pond', 'swamp', 'water','waterway','wetland'))

This query excludes:

  • Anything with an ‘admin_level’ attribute: these are administrative regions (states, counties, cities, towns, etc.)

  • National park boundaries. I have these more accurately in the NYS Public Lands file.

  • All waterways, other than man-made assets on the water.

  • All wetlands; I've taken care of those already.


I style this layer either in bright yellow or icky purple, just to call attention to the unclassified features.

(2) The next layer is the layer where I describe land use. Its SQL query looks like:

"landuse" IS NOT NULL
OR ("leisure" IS NOT NULL AND "leisure" NOT IN ('ice_rink','pitch', 'track', 'tennis_court'))
OR "aeroway" IS NOT NULL
OR (amenity IN ('school', 'college', 'university', 'hospital') AND building IS NULL)

which translates to:

  • Land use polygons

  • Polygons marked 'leisure', except for a handful that usually appear inside parks and want to be rendered at a higher level.

  • Polygons marked 'aeroway'.

  • Polygons marked 'school', 'college', 'university', or 'hospital', except for buildings (these allow highlighting of campuses).


This set may need to be considered a work in progress; I expect these rules will need to be tweaked depending on the theme of the map.

For this layer, I created a fairly complex style with rules that fill the polygons in different colors according to land use. The QML is attached.

(3) Next up are the 'public lands' and 'DEC lands' layers, which I already discussed.

(4) Next, I have a few more types of region from OpenStreetMap. I call the layer OSM Subregion, and its query looks like:

(amenity IN ('parking')
OR leisure IN ('Dog Run', 'pitch', 'tennis court', 'track')
OR aeroway IN ('apron'))
AND building IS NULL

so that it includes parking lots, dog runs, playing fields, tennis courts, racetracks, and airport aprons. What these types of object have in common is that they are usually layered atop another object (a shopping mall, industrial facility, park, airport, etc.), and so they look better rendered in an upper layer. Once again, I've attached a QML file to style them. This layer goes above the 'public lands' layers, but below the contour lines.

(5) Finally, there are polygons for a few man-made features:

building IS NOT NULL
OR leisure IN ('pool','swimmin_pool','swimming_pool','wading_pool')

that is to say, buildings and swimming pools. This layer goes very high in the stack - above the roads. It provides footprints for these structures. Once again, I've attached the QML that styles it.

Wow, that's a lot of layers from one data set. But with them all in place, our map now has quite a lot of detail.
Added polygon data to the basemap
Next time, we'll make the map prettier, by adding shaded relief.

Attachments:

4 comments:

Andy said...

The reason why the NYS Department of Homeland Security curates the park data is they oversee the NYSGIS consortium, at least post-9/11. For a while they were very restrictive about that kind of stuff.

Another Kevin said...

Yeah, God forbid the terrorists should know where the state parks are! They might, uhm, do what exactly? No matter, be afraid!

Anonymous said...

I didn't have Tcl installed, so I tried issuing the commands manually, but I keep getting an unrecognized field error. I've tried searching the web but I can't seem to figure out what I'm doing wrong. Anyone have any ideas?

cwyse:~/gis/NYS.Public_land_boundaries$ ogr2ogr -f "PostgreSQL" -overwrite -t_srs EPSG:32618 -skipfailures "PG:dbname=gis" AdirondackCatskill.shp -nln nys_public_land_boundaries -nlt MULTIPOLYGON -lco PRECISION=NO -sql "SELECT *, 'AdirondackCatskill' AS data_source FROM AdirondackCatskill "
ERROR 1: SQL: Unrecognised field name AdirondackCatskill

Anonymous said...

Well, after upgrading to the latest version of ogr2ogr and getting over some issues with my shell command, I was able to get it working.