How To Navigate JSON Trees in Postgres using Recursive CTEs
February 9th, 2017
I recently ran into an interesting challenge trying to use Postgres to work with JSON trees. Specifically, I had to figure out how to navigate a tree structure stored as JSONB
in a Postgres table. Here’s how I did it.
The Setup
Let’s start by defining the schema. We’ll create a table that will store org charts as a JSONB
object:
CREATE TABLE org_charts ( id BIGSERIAL PRIMARY KEY, chart JSONB );
Next, let’s define the JSON. It’s going to be a recursive structure of employee nodes with a reports
collection containing the employee’s direct reports. Since there can be multiple levels of reporting, our org chart can be arbitrarily deep:
{ "id": 1, "name": "Charles Montgomery Burns", "title": "Owner", "reports": [ { "id": 2, "name": "Waylon Smithers, Jr." }, { "id": 3, "name": "Inanimate carbon rod", "reports": [ { "id": 4, "name": "Homer Simpson", "title": "Safety Engineer" } ] }, { "id": 5, "name": "Lenny Leonard" }, { "id": 6, "name": "Carl Carlson" }, { "id": 7, "name": "Frank Grimes", "status": "deceased" } ] }
The Query
We would like to write a query that can navigate this tree and, for instance, print out the names of all employees. To do that, we need to use Postgres’ powerful recursive common table expression (CTE). Recursive CTEs sound very intimidating, but are actually fairly straight forward to write. Here’s the one I wrote for this purpose:
WITH RECURSIVE reports (id, json_element) AS ( -- non recursive term SELECT id, chart FROM org_charts UNION -- recursive term SELECT id, CASE WHEN jsonb_typeof(json_element) = 'array' THEN jsonb_array_elements(json_element) WHEN jsonb_exists(json_element, 'reports') THEN jsonb_array_elements(json_element -> 'reports') END AS json_element FROM reports WHERE jsonb_typeof(json_element) = 'array' OR jsonb_typeof(json_element) = 'object' )
Let’s dissect this a bit. The query starts with:
WITH RECURSIVE reports (id, json_element) AS (
Here, we’re declaring a recursive CTE called reports
, which takes two parameters id
and json_element
. Note that parameters represent the columns being returned by the overall query as well as internal subqueries.
Next, we define the non-recursive term:
-- non recursive term SELECT id, chart FROM org_charts
This query tells Postgres how to start evaluating the CTE. It’s evaluated first (and only once) and its results drive the rest of the query. Note that this query must return two columns that match those defined by the CTE (id
and json_element
in this case).
After the non-recursive term, we need to define the recursive term:
-- recursive term SELECT id, CASE WHEN jsonb_typeof(json_element) = 'array' THEN jsonb_array_elements(json_element) WHEN jsonb_exists(json_element, 'reports') THEN jsonb_array_elements(json_element -> 'reports') END AS json_element FROM reports WHERE jsonb_typeof(json_element) = 'array' OR jsonb_typeof(json_element) = 'object'
A few things to note here:
- this query references the CTE itself (
FROM reports
), which is what allows it to be recursive. - this query must also return two columns that represent the
id
andjson_element
. - You must combine the results of the non recursive term and the recursive term using
UNION
orUNION ALL
.
Now, let’s talk about what this query is doing. Essentially, it’s unrolling the nested JSON structure into rows. If it runs into a JSON array, it creates one row per each member using jsonb_array_elements()
function. If it runs into an element that contains the reports
collection, it unrolls that as well using jsonb_array_elements()
. Finally, it only looks at JSON arrays or objects (and not values or NULLs).
The Results
So, how do we actually run this CTE? By simply treating reports
as any other table:
SELECT * FROM reports;
Running the query above results in the following:
id | json_element |
---|---|
1 |
{ "id": 1, "name": "Charles Montgomery Burns", "title": "Owner", "reports": [ { "id": 2, "name": "Waylon Smithers, Jr." }, { "id": 3, "name": "Inanimate carbon rod", "reports": [ { "id": 4, "name": "Homer Simpson", "title": "Safety Engineer" } ] }, { "id": 5, "name": "Lenny Leonard" }, { "id": 6, "name": "Carl Carlson" }, { "id": 7, "name": "Frank Grimes", "status": "deceased" } ] } |
1 |
{ "id": 2, "name": "Waylon Smithers, Jr." } |
1 |
{ "id": 3, "name": "Inanimate carbon rod", "reports": [ { "id": 4, "name": "Homer Simpson", "title": "Safety Engineer" } ] } |
1 |
{ "id": 5, "name": "Lenny Leonard" } |
1 |
{ "id": 6, "name": "Carl Carlson" } |
1 |
{ "id": 7, "name": "Frank Grimes", "status": "deceased" } |
1 | NULL |
1 |
{ "id": 4, "name": "Homer Simpson", "title": "Safety Engineer" } |
As you can see here, our query created one row per employee plus a NULL. To be honest, I’m not sure why the NULL row is being returned (I tried but failed to get rid of it). That said, it’s easy enough to filter the NULL row out.
To get the list of names (which was our original goal), we can do this:
SELECT json_element -> 'name' AS employee_name FROM reports WHERE jsonb_exists(json_element, 'name');
Here’s we’re looking only at those rows that contain the element name
and then returning the value of name:
employee_name |
---|
“Charles Montgomery Burns” |
“Waylon Smithers, Jr.” |
“Inanimate carbon rod” |
“Lenny Leonard” |
“Carl Carlson” |
“Frank Grimes” |
“Homer Simpson” |
Overall, using Postgres recursive CTEs to navigate JSON trees proved to be a challenging, but rewarding exercise. If you grok the basics, it’s fairly straight forward to build up complex queries.
You may also like:
Did you love / hate / were unmoved by this post?
Then show your support / disgust / indifference by following me on
Twitter!