Read A CSV File From The Web

Problem

You need to read and work with CSV data stored on the internets.

Solution

Treat web-hosted CSV files as database tables

Discussion

We’ll continue use the CSV data from Datablist. Except this time, we won’t worry about the curl part.

To work with CSV files from the web, we need the help of the DuckDB httpfs extension. This extension allows DuckDB to read and write CSV files from the web (amongst other things).

To use this extension, we need to install it and load it:

duckdb -c "
INSTALL httpfs;
LOAD httpfs;
"

However, this can be annoying to do every time we start a new session. And, do we reall have to tell DuckDB about all new file formats and remember all the names?

The answer is “no”, and we can use the DuckDB config file (located at ~/.duckdbrc) to help us avoid repeition.

Add the following to that file:

SET autoload_known_extensions = true;
SET autoinstall_known_extensions = true;

and DuckDB will automatically install and load the extension for us if it successfully guesses what we’re trying to do.

Now we can load the CSV file from the web:

duckdb -markdown -c "
FROM read_csv('https://drive.google.com/uc?id=1N1xoxgcw2K3d-49tlchXAWw4wuxLj7EV&export=download')
LIMIT 5;
"
Index Customer Id First Name Last Name Company City Country Phone 1 Phone 2 Email Subscription Date Website
1 ffeCAb7AbcB0f07 Jared Jarvis Sanchez-Fletcher Hatfieldshire Eritrea 274.188.8773x41185 001-215-760-4642x969 gabriellehartman@benjamin.com 2021-11-11 https://www.mccarthy.info/
2 b687FfC4F1600eC Marie Malone Mckay PLC Robertsonburgh Botswana 283-236-9529 (189)129-8356x63741 kstafford@sexton.com 2021-05-14 http://www.reynolds.com/
3 9FF9ACbc69dcF9c Elijah Barrera Marks and Sons Kimbury Barbados 8252703789 459-916-7241x0909 jeanettecross@brown.com 2021-03-17 https://neal.com/
4 b49edDB1295FF6E Sheryl Montgomery Kirby, Vaughn and Sanders Briannaview Antarctica (the territory South of 60 deg S) 425.475.3586 (392)819-9063 thomassierra@barrett.com 2020-09-23 https://www.powell-bryan.com/
5 3dcCbFEB17CCf2E Jeremy Houston Lester-Manning South Brianna Micronesia +1-223-666-5313x4530 252-488-3850x692 rubenwatkins@jacobs-wallace.info 2020-09-18 https://www.carrillo.com/

You can do all the same SQL operations as you would with a local CSV file.

If you’re curious, here’s the User-Agent DuckDB sends along with requests:

duckdb -c "PRAGMA user_agent"
┌───────────────────────────────┐
│          user_agent           │
│            varchar            │
├───────────────────────────────┤
│ duckdb/v0.10.1(osx_arm64) cli │
└───────────────────────────────┘

We’ll be able to set all headers for HTTP connections very soon, so monitor the DuckDB changelog (or check back here every so often).