You need to read and work with JSON data from the internets.
Solution
Treat JSON web content as database tables
Discussion
Most of the responses we get from APIs ends up being JSON data. This is great, since JSON is easy to work with both inside and outside of DuckDB. However, it’s not so great in that every REST API (friends don’t let friends jsut GraphQL) seems to pick a bespoke format for the responses.
NOTE: reading in remote JSON relies on the httpfs extension being installed. After you do that once, it will auto load when DuckDB sees https URLs.
We’ve already seen how this works in the chapter on installing DuckDB. Let’s re-visit that example:
The response from this GitHub API endpoint is pretty interesting. DuckDB figured out the *_at fields are TIMESTAMPs, and the data was clean enough that we have accessible STRUCT sub-fields in some of the top-level fields.
We got lucky in that there have only been 38 releases. We’re also lucky in that these endpoints are not authenticated. By the time I’m done with this tome, this chapter will be re-written to take advantage of some new capabilities DuckDB will have in sending arbitrary headers along with HTTP requests.
Let’s see how excited folks were about the releases:
duckdb-markdown-c"WITH ReleaseReactions AS ( FROM read_json_auto('https://api.github.com/repos/duckdb/duckdb/releases?per_page=100') SELECT name, tag_name, UNNEST(reactions))SELECT * EXCLUDE (url, total_count, confused)FROM ReleaseReactions"
name
tag_name
+1
-1
laugh
hooray
heart
rocket
eyes
v0.10.1 Bugfix Release
v0.10.1
3
0
0
27
0
12
0
DuckDB 0.10.0 “Fusca”
v0.10.0
3
0
0
69
11
33
0
0.9.2 Bugfix Release
v0.9.2
14
0
0
0
0
2
0
0.9.1 Bugfix Release
v0.9.1
14
0
0
9
0
8
0
0.9.0 Preview Release “Undulata”
v0.9.0
9
0
0
30
10
17
0
0.8.1 Bugfix Release
v0.8.1
0
0
0
29
0
0
0
0.8.0 Preview Release “Fulvigula”
v0.8.0
18
0
0
34
3
2
0
0.7.1 Bugfix Release
v0.7.1
6
0
0
14
0
8
0
0.7.0 Preview Release “Labradorius”
v0.7.0
15
0
0
21
0
0
0
0.6.1 Bugfix Release
v0.6.1
0
0
0
11
0
0
0
0.6.0 Preview Release “Oxyura”
v0.6.0
0
0
0
18
0
0
0
0.5.1 Bugfix Release
v0.5.1
6
0
0
12
0
2
0
0.5.0 Preview Release “Pulchellus”
v0.5.0
11
0
0
8
0
9
0
0.4.0 Preview Release “Ferruginea”
v0.4.0
10
0
0
12
0
0
0
0.3.4 Bugfix Release
v0.3.4
9
0
0
0
0
0
0
0.3.3 Preview Release “Sansaniensis”
v0.3.3
3
0
0
10
1
0
0
0.3.2 Preview Release “Gibberifrons”
v0.3.2
2
0
0
13
0
0
0
0.3.1 Preview Release “Spectabilis”
v0.3.1
10
0
0
5
0
2
0
0.3.0 Preview Release “Gracilis”
v0.3.0
1
0
0
0
0
0
0
0.2.9 Preview Release “Platyrhynchos”
v0.2.9
0
0
0
3
0
0
0
0.2.8 Preview Release “Ceruttii”
v0.2.8
4
0
0
0
0
2
0
0.2.7 Preview Release “Mollissima”
v0.2.7
3
0
0
7
0
0
0
0.2.6 Preview Release “Jamaicensis”
v0.2.6
0.2.5 Preview Release “Falcata”
v0.2.5
0.2.4 Preview Release “Jubata”
v0.2.4
0.2.3 Preview Release “Serrator”
v0.2.3
0.2.2 Preview Release “Clypeata”
v0.2.2
0.2.1 Preview Release
v0.2.1
0.2.0 Preview Release
v0.2.0
0.1.9 Preview Release
v0.1.9
0.1.8 Preview Release
v0.1.8
0.1.7 Preview Release
v0.1.7
0.1.6 Preview Release
v0.1.6
0.1.5 Preview Release
v0.1.5
0.1.3 Preview Release
v0.1.3
0.1.2 Preview Release
v0.1.2
0.1.1 Preview Release
v0.1.1
0.1.0 Preview Release
v0.1.0
Ah, there’s that friendly SQL dialect again, making our queries less ugly. This comes in especially handy for gnarly JSON responses.