I first came across the idea of Gravity Models about a month before the time of this writing (December 2017). The model I saw was Carto’s interactive Gravity Model that predicts how many customers might patron a store location and the probability of them doing so. It’s a beautiful map with user-engaging design that also allows for user input into the model. Go check it out!
I have been using SQL for a few months now and so their model appears quite complicated to me, but I thought I’d give it a shot at re-creating something similar using the SQL I do feel comfortable with. This is work in progress and I will write a new blog post when I solve the issues I have. For now, here’s my attempt.
I used data I downloaded from Los Angeles GeoHub Portal—the libraries dataset and selected only libraries in Long Beach. I believe this is perhaps a year or two dated but it will do. I also used block group population data which I downloaded and joined (see how I did that here). The data reflects American Community Survey estimates for the year 2016.
Things to remember:
|Dataset||Long Beach Libraries
|Long Beach Block Group Population lb_bg_pop|
|Number of Rows||13||336|
I am using this tutorial as my guide more or less. It explains the components of Huff’s Model but uses ArcMap. The first step was to populate the block group table with amenity information for the libraries. Here I went with a very simplistic approach and assigned 2,000 square feet to the Main Library and 850 square feet to the rest, which can be seen in the first part of the query:
Disregard the first set of parentheses for now, I will explain them soon. As you can see, I am selecting the geometry of the block groups, the population estimated, calculating distance from each block group to each library, and populating amenity_size in that order.
Once I had the distance from every block group to every library, it was time to calculate the “attractiveness” score for each. In other words, the attractiveness score will show a numeric representation of each block group’s attraction to each library. If I live downtown, it makes more sense that my block group will have a higher attraction score to libraries near downtown and lower attractiveness scores to North Long Beach Branch.
In order to access the newly created fields in the query above and keep writing in the same query, I used a Common Expression Table, CET—that’s what the parentheses are for! I wrapped the first part of the query in the parenthesis and gave the able an alias. In the following chunk of code, I wanted to keep the geometry of the block groups, the name of the library, the average distance, and amenity_size.
As for the new fields being calculated, the attract_score is the size of the amenity divided by the distance squared. The tot_attract_score is the sum of all the attract_scores. This bit was tricky to me—in order to calculate the total attractiveness score, I replicated the calculation of the attract score and divided it over the sum of the squared distance. The resulting dataset has 4,355 records and that is because we are calculating attractiveness levels for each block group to each library, so (13 libraries) *(335 block groups) = 4,355 records. Also, I did not want to run into the issue of dividing by zero so I chose to ignore the block groups for which the distance is zero. This too, needs work.
Currently, I have not figured out a way to combine them all in one dataset. In other words, the goal for me is to have a single dataset where I can choose the visualize data for a given library and consider the fact that certain areas are in other library territories. I want to be able to identify which block groups are “up for grabs.” [I know this, model is best applied to retail locations.] I can’t do this now, but I’m working on it. For now, what I could do was “query out” the visualization for each library by using a “view” in PGAdmin for each one of the 13 libraries. Here’s the query with all its components.
And just in case you were curious, here’s the query used for making the view—pretty simple. Notice that the row_number() over () as new_id adds a number that increments by 1, which will act as our unique identifier in QGIS when we go to visualize it.
The resulting maps show the likelihood of each block group to gravitate towards one of the individual 13 libraries.
Final notes: This is still work in progress and of course if you have suggestions or corrections I would love to hear from you. Below is a brief list of improvements I can think of right off the bat.
- Getting the data for all on one dataset and calculating the likelihood of patronage for each library while accounting for the fact that other libraries exist in the surroundings and will “grab” certain areas.
- Keeping the population data—this is essential. I have been using PostGIS for 2 months and so for me it was a challenge to try to keep it and get away with the grouping but I am sure there is a way.
- Keeping the block group (and population data) for block groups where the library sits i.e.) where the distance is zero. If you look closely, said block group is missing from the visualizations.