I recently ran into an interesting challenge trying to use Postgres to work with JSON trees. Specifically, I had to figure out how to navigate a tree structure stored as JSONB in a Postgres table. Here’s how I did it.

The Setup

Let’s start by defining the schema. We’ll create a table that will store org charts as a JSONB object:

CREATE TABLE org_charts (
  id    BIGSERIAL PRIMARY KEY,
  chart JSONB
);

Next, let’s define the JSON. It’s going to be a recursive structure of employee nodes with a reports collection containing the employee’s direct reports. Since there can be multiple levels of reporting, our org chart can be arbitrarily deep:

{
    "id": 1,
    "name": "Charles Montgomery Burns",
    "title": "Owner",
    "reports": [
        {
            "id": 2,
            "name": "Waylon Smithers, Jr."
        },
        {
            "id": 3,
            "name": "Inanimate carbon rod",
            "reports": [
                {
                    "id": 4,
                    "name": "Homer Simpson",
                    "title": "Safety Engineer"
                }
            ]
        },
        {
            "id": 5,
            "name": "Lenny Leonard"
        },
        {
            "id": 6,
            "name": "Carl Carlson"
        },
        {
            "id": 7,
            "name": "Frank Grimes",
            "status": "deceased"
        }
    ]
}

The Query

We would like to write a query that can navigate this tree and, for instance, print out the names of all employees. To do that, we need to use Postgres’ powerful recursive common table expression (CTE). Recursive CTEs sound very intimidating, but are actually fairly straight forward to write. Here’s the one I wrote for this purpose:

WITH RECURSIVE reports (id, json_element) AS (
  -- non recursive term
  SELECT
    id,
    chart
  FROM org_charts

  UNION

  -- recursive term
  SELECT
    id,
    CASE
    WHEN jsonb_typeof(json_element) = 'array'
      THEN jsonb_array_elements(json_element)
    WHEN jsonb_exists(json_element, 'reports')
      THEN jsonb_array_elements(json_element -> 'reports')
    END AS json_element
  FROM
    reports
  WHERE
    jsonb_typeof(json_element) = 'array' OR jsonb_typeof(json_element) = 'object'
)

Let’s dissect this a bit. The query starts with:

WITH RECURSIVE reports (id, json_element) AS (

Here, we’re declaring a recursive CTE called reports, which takes two parameters id and json_element. Note that parameters represent the columns being returned by the overall query as well as internal subqueries.

Next, we define the non-recursive term:

-- non recursive term
  SELECT
    id,
    chart
  FROM org_charts

This query tells Postgres how to start evaluating the CTE. It’s evaluated first (and only once) and its results drive the rest of the query. Note that this query must return two columns that match those defined by the CTE (id and json_element in this case).

After the non-recursive term, we need to define the recursive term:

-- recursive term
  SELECT
    id,
    CASE
    WHEN jsonb_typeof(json_element) = 'array'
      THEN jsonb_array_elements(json_element)
    WHEN jsonb_exists(json_element, 'reports')
      THEN jsonb_array_elements(json_element -> 'reports')
    END AS json_element
  FROM
    reports
  WHERE
    jsonb_typeof(json_element) = 'array' OR jsonb_typeof(json_element) = 'object'

A few things to note here:

Now, let’s talk about what this query is doing. Essentially, it’s unrolling the nested JSON structure into rows. If it runs into a JSON array, it creates one row per each member using jsonb_array_elements() function. If it runs into an element that contains the reports collection, it unrolls that as well using jsonb_array_elements(). Finally, it only looks at JSON arrays or objects (and not values or NULLs).

The Results

So, how do we actually run this CTE? By simply treating reports as any other table:

SELECT * FROM reports;

Running the query above results in the following:

id json_element
1
{
    "id": 1,
    "name": "Charles Montgomery Burns",
    "title": "Owner",
    "reports": [
        {
            "id": 2,
            "name": "Waylon Smithers, Jr."
        },
        {
            "id": 3,
            "name": "Inanimate carbon rod",
            "reports": [
                {
                    "id": 4,
                    "name": "Homer Simpson",
                    "title": "Safety Engineer"
                }
            ]
        },
        {
            "id": 5,
            "name": "Lenny Leonard"
        },
        {
            "id": 6,
            "name": "Carl Carlson"
        },
        {
            "id": 7,
            "name": "Frank Grimes",
            "status": "deceased"
        }
    ]
}
1
{
    "id": 2, 
    "name": "Waylon Smithers, Jr."
}
1
{
    "id": 3, 
    "name": "Inanimate carbon rod", 
    "reports": [
        {
            "id": 4, 
            "name": "Homer Simpson", 
            "title": "Safety Engineer"
        }
    ]
}
1
{
    "id": 5, 
    "name": "Lenny Leonard"
}
1
{
    "id": 6, 
    "name": "Carl Carlson"
}
1
{
    "id": 7, 
    "name": "Frank Grimes", 
    "status": "deceased"
}
1 NULL
1
{    
    "id": 4, 
    "name": "Homer Simpson", 
    "title": "Safety Engineer"
}

As you can see here, our query created one row per employee plus a NULL. To be honest, I’m not sure why the NULL row is being returned (I tried but failed to get rid of it). That said, it’s easy enough to filter the NULL row out.

To get the list of names (which was our original goal), we can do this:

SELECT json_element -> 'name' AS employee_name
FROM reports
WHERE jsonb_exists(json_element, 'name');

Here’s we’re looking only at those rows that contain the element name and then returning the value of name:

employee_name
“Charles Montgomery Burns”
“Waylon Smithers, Jr.”
“Inanimate carbon rod”
“Lenny Leonard”
“Carl Carlson”
“Frank Grimes”
“Homer Simpson”

Overall, using Postgres recursive CTEs to navigate JSON trees proved to be a challenging, but rewarding exercise. If you grok the basics, it’s fairly straight forward to build up complex queries.

You may also like:

Did you love / hate / were unmoved by this post?
Then show your support / disgust / indifference by following me on Twitter!

Comments are closed.