You need to read and work with CSV data that is streaming into stdin.
Solution
Leverage DuckDB’s build-in methods for working with data from stdin.
Discussion
There are times when you don’t feel like taking the extra step make a temporary file associated with a running shell command process just to get access to the output from said command. One example might be analyzing various aspects of Git history.
We can take advantage of DuckDB’s ability to read from stdin to avoid creating unnecessary cruft.
Let’s see who are the top authors of Git commits to RStudio’s IDE (what, you don’t keep an up-to-date fork/clone of RStudio lying around?):
cd ~/Development/clones/rstudiogit pull > /dev/null # get the latest commitsgit log --pretty=format:%as,%an |\duckdb-c"FROM read_csv('/dev/stdin', columns = { 'day': 'DATE', 'author': 'VARCHAR'})SELECT author, COUNT(*) AS n_commitsGROUP BY 1ORDER BY 2 DESC"
By having this data in a SQL context, we can ask questions that require some wrangling or use of many tools in other contexts. Like, Who is the top author this month?
cd ~/Development/clones/rstudiogit pull > /dev/null # get the latest commitsgit log --pretty=format:%as,%an |\duckdb-markdown-c"FROM read_csv('/dev/stdin', auto_detect = false, columns = { 'day': 'DATE', 'author': 'VARCHAR' })SELECT strftime(date_trunc('month', day), '%Y %B') AS month, author, COUNT(*) AS n_commits_this_monthWHERE date_trunc('month', day) = date_trunc('month', now()::DATE)GROUP BY 1, 2ORDER BY 3 DESC"
month
author
n_commits_this_month
2024 March
Kevin Ushey
50
2024 March
Gary Ritchie
41
2024 March
olivroy
16
2024 March
Zach Hannum
4
2024 March
GCRev
4
2024 March
Ashley Henry
4
2024 March
Graham Held
3
2024 March
Tim Mok
1
2024 March
Melissa Barca
1
2024 March
petetronic
1
2024 March
Mervin Fansler
1
2024 March
Pete
1
2024 March
Adam Foryś
1
Remember, you can always use tee to store a copy of the data you’re piping into DuckDB if you think you’ll need to reprocess it, especially if generating the CSV is resource intensive.