Use SQLite Databases From The Filesystem
Problem
You need to read and work with data in SQLite databases stored on your your filesystem.
Solution
Use DuckDB’s ATTACH and USE to work directy with SQLite databases.
Discussion
Via the sqlite extension, DuckDB can work with any existing SQLite databases you may have. You should read the DuckDB documentation on SQLite as there are a few caveats around types and other database characteristics due the the way DuckDB was designed.
We’ll show how to work with SQLite tables by using the classic “chinook” database. For those not familiar with chinook, it’s a database with eleven tables that model a record store. These are the tables:
employeestable stores employees data such as employee id, last name, first name, etc. It also has a field named ReportsTo to specify who reports to whom.customerstable stores customers data.invoices&invoice_itemstables: these two tables store invoice data. Theinvoicestable stores invoice header data and theinvoice_itemstable stores the invoice line items data.artiststable stores artists data. It is a simple table that contains only the artist id and name.albumstable stores data about a list of tracks. Each album belongs to one artist. However, one artist may have multiple albums.media_typestable stores media types such as MPEG audio and AAC audio files.genrestable stores music types such as rock, jazz, metal, etc.trackstable stores the data of songs. Each track belongs to one album.playlists&playlist_tracktables: playlists table store data about playlists. Each playlist contains a list of tracks. Each track may belong to multiple playlists. The relationship between the playlists table and tracks table is many-to-many. The playlist_track table is used to reflect this relationship.
To access the copy of chinook that ships with this book, we’ll use the ATTACH statement which adds a new database file to the catalog, and USE statment which will let us switch the default schema
duckdb -markdown <<EOF
ATTACH 'data/chinook.db';
USE chinook;
SHOW TABLES;
EOF| name |
|---|
| albums |
| artists |
| customers |
| employees |
| genres |
| invoice_items |
| invoices |
| media_types |
| playlist_track |
| playlists |
| sqlite_sequence |
| sqlite_stat1 |
| tracks |
Because this is a frequent use-case for DuckDB, we can also open SQLite databases directly. Let’s do that and find the top 10 best-selling tracks:
duckdb data/chinook.db -markdown <<EOF
FROM
tracks t,
invoice_items i
SELECT
t.Name AS Track,
SUM(i.Quantity) AS "Total Sold"
WHERE
t.TrackId = i.TrackId
GROUP BY t.Name
ORDER BY "Total Sold" DESC
LIMIT 10
EOF| Track | Total Sold |
|---|---|
| The Trooper | 5 |
| Hallowed Be Thy Name | 4 |
| Untitled | 4 |
| The Number Of The Beast | 4 |
| Eruption | 4 |
| Sure Know Something | 4 |
| Good Golly Miss Molly | 3 |
| Heaven Can Wait | 3 |
| Flying High Again | 3 |
| Sweetest Thing | 3 |