Setting Up Local Geocoding in the US with TIGER / CENSUS Data via PostGIS for PostgreSQL

Small Introduction

Transforming addresses into geographic coordinates (geocoding) is not just a technical process, but a vital one for precise location data applications. We require exact spot data, making the conversion of addresses into map points a critical step for apps like the HealthPorta API. The local geocoding system is synonymous with enhancing the speed of geo-search functionalities. This is particularly crucial for services that rely on high location precision. In essence, we need to obtain the coordinates of the address or someplace locally to enable quick and efficient searches around that location.

Internal geocoding is important for a platform like HealthPorta API, as we focus on delivering precise healthcare location services in less time. The integration of a geocoding system directly into the framework, HealthPorta API offers more efficient, location-based search capabilities. The functionality is indispensable for users who need to find healthcare services within specific geographical boundaries quickly and accurately, for instance, by searching for the closest pharmacy or doctor.

We use PostgreSQL, so the best solution to avoid creating a zoo of different technologies is to combine the search by provider types and their health insurance coverage with the geo-search. PostgreSQL has PostGIS as the particular extension that works with GEO data. It provides special data types and indexes for Geometry and Geography operations. Additionally to the data types, we need the actual data about all streets, cities, and states. As an official source, the TIGER database provided by the U.S. Census Bureau is an essential data source for local geocoding. It doesn’t look hard initially, as most of the work is done by others, and you need to send the pack of commands to get it locally. However, some difficulties still need to be addressed, as you need to import every state separately, and running this operation manually is quite dull. To keep the data current, you will need automation in your operations.

Step-by-Step Guide to Importing TIGER / CENSUS Data

  • Preparing Your Environment

Ensure PostgreSQL and PostGIS are Installed: Verify that your system has PostgreSQL with the PostGIS extension installed and properly configured.

  • Follow the official docs, Steps 1-5

Please open the official docs from Postgis website and follow the steps 1-5 (works for the current version of Postgis 3.4.2). This includes checking extensions and applying settings for zip codes importing.

  • You can follow steps 6 and 7 from Postgis docs

Steps 6-7 propose saving all the downloaded CENSUS data into the /gisdata folder on your server. You can follow their recommendations or use another folder; our script will allow you to modify this.

  • Download the script on your server

You can easily download the Census Postgis geocoding import script into the folder on your server. If you work with PostgreSQL, you will have no problems having Python3 of any version on your server.

wget https://pharmacy-near-me.com/static/share/postgis-tiger-geocoding-import.py
  • Edit the script for your local configuration

The Postgis code for shell script generation for national script and per-state data provides a template you need to edit. Our script modifies these templates according to your settings, simplifying the geo data import process. Please give the connection configuration.

vim postgis-tiger-geocoding-import.py
...
## Configuration
POSTGRES_HOST = 'localhost'
DATABASE = "healthporta"
YOUR_POSTGRES_PORT = 5434
YOUR_POSTGRES_PGUSER = 'postgres'
YOUR_POSTGRES_PGPASSWORD = ''
POSTGRES_CONNECTION_EXTRA = '?sslmode=require'
LOCAL_PATH = "/gisdata"
SERVER_PSQL_BIN = '/usr/lib/postgresql/16/bin'
LOCAL_PSQL_BIN = '/usr/lib/postgresql/16/bin'
## /Configuration
...

One of the main problems of the process described in the official docs is the fact, that you need to create shell scripts for national level and for each steps one by one. And, as it generates “templates” – you get the run with default data. Like default PostgreSQL port is 5432. On our dev environment we have it on 5434 port, so to make modifications, please just apply YOUR_POSTGRES_PORT = 5434 in this case in Configuration part.

We assume you’re on the v16 version of PostgreSQL, but your client might be a different version (older). You can “play” with the server and local PSQL_BIN folders if required to apply the appropriate paths. We also assume that psql is in your path when you run the Python script. If not, please modify it to get the script as it is straightforward.

As PostGIS generates default values for tiger data, the script just reapplies your connection settings according to configurations, including the LOCAL_PATH where import is going to store all downloaded files (steps 6-7 from the original docs).

Also, there is a list of states started with debie. If you need only few states, please remove others, but keep the debie in the list first, as it generates national script, which must be run first.

STATES = ('debie', 'AK','AL','AR','AZ','CA','CO','CT','DE','FL', ....)

When you’re done, please run:

python3 postgis-tiger-geocoding-import.py

After you run it, you will get a bunch of files in this folder: national script and per state scripts, however you will need only one that includes all other commands. If you check the result all_import.sh file you will something like:

sh ./nation_script_load.sh
sh ./state_load_AK.sh
sh ./state_load_AL.sh
sh ./state_load_AR.sh
sh ./state_load_AZ.sh
sh ./state_load_CA.sh
....
  • Run the actual import
sh all_import.sh

It will run the national script import and then go by all the states you defined in the initial script. It might take an hour or so to go over all the data. This depends on many factors.

  • Once the process is done, please return to the official docs and run part 15 from the docs.

This will install all required indexes and analyze all the tiger tables to update the stats (including inherited stats), which is quite an essential step in case of the speed of work of your database.

Few additional notes

I suggest checking that you have Postgis 3.4.2 as the default version in many systems (3.4.1, date: 12th of May 2024), which had some issues with zip code import for many states. At least, that was our case.

The re-import or updating of the data. It is pretty easy to do, just make

DROP SCHEMA tiger_data CASCADE;

in your database, and then go over the process again.

The speed of import: the national script must be always run first. There was an idea to process the state’s data in parallel, as a lot of time is taken by downloading the data. However, there is an issue with dependency on the tiger_staging schema during the import process. You might get conflicts and wrong data import if two internal import functions are run together. We may return to the idea of this improvement, but for now, it is paused as CENSUS data updates are coming not more often than once every 6 months.

You got what you need, but there should be something in the end

Hope the process was simple for you or you got some ideas on improving things for your special case. Anyway, setting up local geocoding with TIGER data in PostGIS significantly bolsters the capabilities for our case: the HealthPorta API, as we can operate GEO data locally in relatively a fast way enough precisely.  Of course, Google Maps or MapBox are much more precise in their data, and HealthPorta API can use commercial systems too. However, the local way is always faster when you work with some areas or just count some stats in the areas. Combining different approaches is always the compromise between the high cost and speed.

Be the first to comment on "Setting Up Local Geocoding in the US with TIGER / CENSUS Data via PostGIS for PostgreSQL"

Leave a comment

Your email address will not be published.


*