Using SQL To Work With JavaScript Arrays In DuckDB WASM

Problem

You want to use DuckDB WASM in-browser to work with JavaScript arrays of objects in a SQL context.

Solution

Use DuckDB’s inherent ability to work with JavaScript arrays of objects.

Discussion

The introductory chapter showed a very simple two element array of ojects, each with one field. This toy example was just enough to explain the core loading idiom and some initial database machinations, but is not very realistic. You’re more likely to have an array of objects as a result of deliberately loading them, or because you made an API call and have some results you needs to work with.

The Observable team curated a number of example datasets for their various tutorials and sets of documentation, so we’ll use one of those, and lean on some D3 helpers load it into the browser.

They (somewhat) recently added a pizza dataset, so we’ll use that. Here’s a sample of it:

order_date,day_of_week,category,name,price,orders,revenue
2020-01-01T00:00:00.000Z,Wednesday,Classic,Pepperoni Pizza Small,13,16,208
2020-01-01T00:00:00.000Z,Wednesday,Classic,Hawaiian Pizza Medium,16,7,112
2020-01-01T00:00:00.000Z,Wednesday,Classic,Margherita Pizza Extra Large,21,14,294
2020-01-01T00:00:00.000Z,Wednesday,Classic,Margherita Pizza Small,11,20,220

We’ll load it up and use it to answer the following questions:

🔵 DuckDB loading…

  • How many orders are in the dataset? ()
  • Which category is most ordered? ()
  • How much revenue was brought in each year? ()

One of the first differences from the example in the DuckDB WASM introductory chapter is the need to import the D3 csv loader and use it to populate our database. Here’s the code for those two operations:

1import { csv } from 'https://cdn.jsdelivr.net/npm/d3-fetch@latest/+esm';

2const db = await DuckDBClient().of({
  pizza: await csv("https://static.observableusercontent.com/files/c653108ab176088cacbb338eaf2344c4f5781681702bd6afb55697a3f91b511c6686ff469f3e3a27c75400001a2334dbd39a4499fe46b50a8b3c278b7d2f7fb5", (d) => {
    d.order_date = new Date(d.order_date);
    d.revenue = +d.revenue;
    return d
  })
})
1
Import just what we need from D3’s fetch library
2
Make the call to the CSV loader and ensure we have proper data types for the fields we’ll be working with.

There are three <span>s in the question and answer block and we put the results of the three queries there.

Our first query was for the total number of orders:

1const res1 = (await db.sql`FROM pizza SELECT COUNT (*) AS nrows`)[0].toJSON()

2document.getElementById("orders").innerHTML =
  res1.nrows.toLocaleString()
1
Make the query and retrieve the first (and, only) row; then, convert it to something we can use in plain JavaScript
2
Render the value to a string in the format of reader

The db.sql construct is a tagged template (sometimes called a tag function). It’s an advanced form of the template literals which are supercharged strings that can excute java code between adorned braces. For example:

const msg = "'sup, world"
const fancy = `Yo, ${msg}!! Today is ${new Date()}`

This is how the sql is defined (i.e., there is no “magic”):

async sql(strings, ...args) {
 return await this.query(strings.join("?"), args);
}

The parser/interpreter knows that a construct like sql can be used like template literals. It’s a shortcut that usually results in cleaner looking code, but you can 100% use db.query() or the other db operations if you are more comfortable doing so.

The await is necessary since db.sql returns a Promise.

The result of a successful query is an Arrow table. While you can perform all the usual operation on this iterable object, it is not a plain JavaScript array, so we have to do some surgery to get to the data.

We know there is only one row in the resultset, so we use indexing [0] to get to it, then convert it into something we can use in JavaScript via the oddly named toJSON() function.

The second answer just involves a bit more SQL:

const res2 = (await db.sql`
FROM pizza 
SELECT 
  category, 
  COUNT (*) AS n_orders
GROUP BY 1
ORDER BY 2 DESC
`)[0].toJSON()

document.getElementById("category").innerHTML = 
  res2.category + ": " + res2.n_orders.toLocaleString()

and, the third just does more post-resultset JavaScript machinations:

const res3 = (await db.sql`
FROM pizza 
SELECT 
  date_part('year', order_date) AS year, 
  SUM(revenue) AS total_revenue
GROUP BY 1
ORDER BY 2 DESC
`).map(d => d.toJSON())

document.getElementById("revenue").innerHTML = 
  res3.map(d => 
    `${d.year}: $${d.total_revenue.toLocaleString()}`
  ).join("; ")

Adding in automatic (like Lit, Svelte, React, Vue, etc.) “reactivity” (where component changes trigger other actions) or manual (think “form submit”) “reactivity” is something you will need to layer on. DuckDB WASM is just another data component for you to work with.