duckdb -c "
INSTALL httpfs;
LOAD httpfs;
"
Read A CSV File From The Web
Problem
You need to read and work with CSV data stored on the internets.
Solution
Treat web-hosted CSV files as database tables
Discussion
We’ll continue use the CSV data from Datablist. Except this time, we won’t worry about the curl
part.
To work with CSV files from the web, we need the help of the DuckDB httpfs
extension. This extension allows DuckDB to read and write CSV files from the web (amongst other things).
To use this extension, we need to install it and load it:
However, this can be annoying to do every time we start a new session. And, do we reall have to tell DuckDB about all new file formats and remember all the names?
The answer is “no”, and we can use the DuckDB config file (located at ~/.duckdbrc
) to help us avoid repeition.
Add the following to that file:
SET autoload_known_extensions = true;
SET autoinstall_known_extensions = true;
and DuckDB will automatically install and load the extension for us if it successfully guesses what we’re trying to do.
Now we can load the CSV file from the web:
duckdb -markdown -c "
FROM read_csv('https://drive.google.com/uc?id=1N1xoxgcw2K3d-49tlchXAWw4wuxLj7EV&export=download')
LIMIT 5;
"
Index | Customer Id | First Name | Last Name | Company | City | Country | Phone 1 | Phone 2 | Subscription Date | Website | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | ffeCAb7AbcB0f07 | Jared | Jarvis | Sanchez-Fletcher | Hatfieldshire | Eritrea | 274.188.8773x41185 | 001-215-760-4642x969 | gabriellehartman@benjamin.com | 2021-11-11 | https://www.mccarthy.info/ |
2 | b687FfC4F1600eC | Marie | Malone | Mckay PLC | Robertsonburgh | Botswana | 283-236-9529 | (189)129-8356x63741 | kstafford@sexton.com | 2021-05-14 | http://www.reynolds.com/ |
3 | 9FF9ACbc69dcF9c | Elijah | Barrera | Marks and Sons | Kimbury | Barbados | 8252703789 | 459-916-7241x0909 | jeanettecross@brown.com | 2021-03-17 | https://neal.com/ |
4 | b49edDB1295FF6E | Sheryl | Montgomery | Kirby, Vaughn and Sanders | Briannaview | Antarctica (the territory South of 60 deg S) | 425.475.3586 | (392)819-9063 | thomassierra@barrett.com | 2020-09-23 | https://www.powell-bryan.com/ |
5 | 3dcCbFEB17CCf2E | Jeremy | Houston | Lester-Manning | South Brianna | Micronesia | +1-223-666-5313x4530 | 252-488-3850x692 | rubenwatkins@jacobs-wallace.info | 2020-09-18 | https://www.carrillo.com/ |
You can do all the same SQL operations as you would with a local CSV file.
If you’re curious, here’s the User-Agent DuckDB sends along with requests:
duckdb -c "PRAGMA user_agent"
┌───────────────────────────────┐
│ user_agent │
│ varchar │
├───────────────────────────────┤
│ duckdb/v0.10.1(osx_arm64) cli │
└───────────────────────────────┘
We’ll be able to set all headers for HTTP connections very soon, so monitor the DuckDB changelog (or check back here every so often).