Read Remote Files From AWS/Object Storage

Problem

You need to read and work with data stored in AWS or a compatible object storage service.

Solution

Treat files in AWS/Object Storage as database tables.

Discussion

DuckDB has extensive support for services with S3-compatible APIs.

The example below is one way to access files on S3. It is the most straightforward approach, but it is not the preferred one; you should strongly consider using DuckDB temporary secrets instead.

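time duckdb -c "
-- Load the aws extension, which provides load_aws_credentials
LOAD aws;
-- Load credentials from the 'rud' AWS profile
CALL load_aws_credentials('rud');
-- Count the rows across every Parquet file in the bucket
FROM read_parquet('s3://is.rud.taxi/*.parquet')
SELECT
  COUNT(*);
"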
┌──────────────────────┬──────────────────────────┬──────────────────────┬───────────────┐
│ loaded_access_key_id │ loaded_secret_access_key │ loaded_session_token │ loaded_region │
│       varchar        │         varchar          │       varchar        │    varchar    │
├──────────────────────┼──────────────────────────┼──────────────────────┼───────────────┤
│ AKIAYJHTHWVGZKR3GW7O │ <redacted>               │                      │ us-east-1     │
└──────────────────────┴──────────────────────────┴──────────────────────┴───────────────┘
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│      5980721 │
└──────────────┘

real    0m0.935s
user    0m0.093s
sys     0m0.020s
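
The temporary-secret approach recommended above can be sketched as follows. This is a minimal example rather than a definitive setup. It assumes a DuckDB version that includes the Secrets Manager (0.10 or later) and that the rud profile used in the command above exists in your local AWS configuration. The secret name rud_s3 is a hypothetical label chosen for illustration, and the region is copied from the output shown above.

```sql
-- Create a temporary secret (the default when PERSISTENT is not specified).
-- The credential_chain provider comes from the aws extension; with
-- CHAIN 'config' it reads credentials from the local AWS config files,
-- here from the 'rud' profile.
CREATE OR REPLACE SECRET rud_s3 (  -- rud_s3 is a hypothetical name
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'config',
    PROFILE 'rud',
    REGION 'us-east-1'             -- region taken from the output above
);

-- Same query as before: the Parquet files behave like a single table.
FROM read_parquet('s3://is.rud.taxi/*.parquet')
SELECT
  COUNT(*);
```

A secret created without the PERSISTENT keyword is held in memory only and is gone when the session ends, so these statements need to run in the same DuckDB session as the query.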