Joining US Census data + Tiger Shapefiles Using PostGIS

In this walk through, I want to go over a very simple join of shapefile + data at the database level. If you are not familiar with PostgreSQL (and for our purposes, its PostGIS extension) you might want to read more about that before proceeding. This tutorial assumes that you have installed PostgreSQL+PostGIS on your machine and have saved both the .csv file containing population counts by block group, as well as the Tiger Shapefile of block groups. In this example, we will work with Los Angeles County (CA) data.

If you have done this join from inside a GIS GUI such as ArcMap, or QGIS you might remember that one must first start by creating a new field and assigning a double data type. The reason for this is the .csv we downloaded is formatted so as the geoid, the common key used to make the join, has a data type of double. If we look at the data type of the current Tiger shapefile we’ll notice it is a string. String and numbers do not mix. Let’s start by running the following to create the field:

ALTER TABLE ca_block_groups
SET geoid_2 bigint;

 

Now let’s populate our new, empty column with the values in the geoid field. Remember, we want the same records, just different data type.

UPDATE ca_block_groups
SET geoid_2 = geoid::bigint

 

We are ready to make the join! Type the following snippet of code into Postgres:

SELECT 

      bg.gid,  

     bg.geom, 

     bg.geoid_2, 

     bg.countyfp, 

     bg.namelsad,  

     bg.aland,            

     pop."GEO_id2",  

     pop.estimate

INTO laco_pop_by_bg

FROM ca_block_groups as bg

JOIN laco_blockgroup_pop as pop ON bg.geoid_2 = pop."GEO_id2"

WHERE bg.countyfp = '037' AND bg.aland > 0

Here, the SQL is taking our two tables and telling the database “Hey, the columns from these two separate tables have this key in common, let’s grab all the attributes from this non-spatial table and copy them onto this neat shapefle.”

Note that the WHERE clause is telling the databse to only select block groups with the FIPS code 037, which is California. The AND clause that follows is telling it to further scrutinize the dataset and out of the California block groups, only return those records that are land– CA has quite a number of block groups which are just off shore but obviously have no population.

 


Pro Tip: Postgres is quite picky some times. While it is more forgiving than Python in terms of indentation, it is picky about column names! Postgres does not like having columns with mixed case ie) in some columns you use capital letters, and the rest you don’t. Postgres wants you to specifically tell it that you acknowledge it’s a capital but that’s indeed the column you want. For example

SELECT 
    pop.GEO_ID

has to be written as:

SELECT
     pop."GEO_ID"

I know. *Rolls eyes.*


 

 

 

This is the result

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s