You’re coming from a R/{tidyverse}-esque environment and really need to be able to use factors for sorting and other operations.
Solution
Use DuckDB’s enums like you would R/{tidyverse} factors.
Discussion
Factors are very handy for ensuring values are in a certain domain and also for turning (sorting) chaos into order.
DuckDB has the concept of an enum type which represents a dictionary data structure with all possible unique values of a column. They’re similar to R’s factor type in that the numeric value is stored in the column, but you have access to the string value. This can save a great deal of space, especially in larger datasets.
┌────────────────────────────┬───────────────────┬───────┐
│ Activity │ measure │ value │
│ varchar │ varchar │ int64 │
├────────────────────────────┼───────────────────┼───────┤
│ Using a landline │ Very secure │ 16 │
│ Calling on your cell phone │ Very secure │ 9 │
│ Sending text messages │ Very secure │ 7 │
│ Sending email │ Very secure │ 5 │
│ Using chat or IM │ Very secure │ 4 │
│ Using social media sites │ Very secure │ 2 │
│ Using a landline │ Somewhat secure │ 51 │
│ Calling on your cell phone │ Somewhat secure │ 43 │
│ Sending text messages │ Somewhat secure │ 32 │
│ Sending email │ Somewhat secure │ 35 │
│ Using chat or IM │ Somewhat secure │ 25 │
│ Using social media sites │ Somewhat secure │ 14 │
│ Using a landline │ Not very secure │ 19 │
│ Calling on your cell phone │ Not very secure │ 29 │
│ Sending text messages │ Not very secure │ 37 │
│ Sending email │ Not very secure │ 36 │
│ Using chat or IM │ Not very secure │ 38 │
│ Using social media sites │ Not very secure │ 28 │
│ Using a landline │ Not at all secure │ 12 │
│ Calling on your cell phone │ Not at all secure │ 17 │
│ Sending text messages │ Not at all secure │ 22 │
│ Sending email │ Not at all secure │ 21 │
│ Using chat or IM │ Not at all secure │ 32 │
│ Using social media sites │ Not at all secure │ 53 │
├────────────────────────────┴───────────────────┴───────┤
│ 24 rows 3 columns │
└────────────────────────────────────────────────────────┘
If we were to sort the questions or answers, they’d be in “alphabetical” order. We can changer that by defining an enum. We’ll create one manually for the “feeling” folks had about various communications channels:
duckdb<<EOFCREATE TYPE feeling AS ENUM ('Activity', 'Very secure', 'Somewhat secure', 'Not very secure', 'Not at all secure');FROM read_ndjson('https://data.hrbrmstr.dev/pew-secure-comms.ndjson')SELECT activity, measure::feeling AS measure, valueORDER BY 2 DESCEOF
┌──────────────────────┬───────────────────────────────────────────────────────────────────────────────────────┬───────┐
│ Activity │ measure │ value │
│ varchar │ enum('activity', 'very secure', 'somewhat secure', 'not very secure', 'not at all s… │ int64 │
├──────────────────────┼───────────────────────────────────────────────────────────────────────────────────────┼───────┤
│ Using a landline │ Not at all secure │ 12 │
│ Calling on your ce… │ Not at all secure │ 17 │
│ Sending text messa… │ Not at all secure │ 22 │
│ Sending email │ Not at all secure │ 21 │
│ Using chat or IM │ Not at all secure │ 32 │
│ Using social media… │ Not at all secure │ 53 │
│ Using a landline │ Not very secure │ 19 │
│ Calling on your ce… │ Not very secure │ 29 │
│ Sending text messa… │ Not very secure │ 37 │
│ Sending email │ Not very secure │ 36 │
│ Using chat or IM │ Not very secure │ 38 │
│ Using social media… │ Not very secure │ 28 │
│ Using a landline │ Somewhat secure │ 51 │
│ Calling on your ce… │ Somewhat secure │ 43 │
│ Sending text messa… │ Somewhat secure │ 32 │
│ Sending email │ Somewhat secure │ 35 │
│ Using chat or IM │ Somewhat secure │ 25 │
│ Using social media… │ Somewhat secure │ 14 │
│ Using a landline │ Very secure │ 16 │
│ Calling on your ce… │ Very secure │ 9 │
│ Sending text messa… │ Very secure │ 7 │
│ Sending email │ Very secure │ 5 │
│ Using chat or IM │ Very secure │ 4 │
│ Using social media… │ Very secure │ 2 │
├──────────────────────┴───────────────────────────────────────────────────────────────────────────────────────┴───────┤
│ 24 rows 3 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
We aren’t limited to statically crafted enums. We can use the results of a SELECT statement to create one as well:
duckdb-markdown<<EOFCREATE TYPE feeling AS ENUM ('Activity', 'Very secure', 'Somewhat secure', 'Not very secure', 'Not at all secure');CREATE TABLE responses AS ( FROM read_ndjson('https://data.hrbrmstr.dev/pew-secure-comms.ndjson') SELECT activity, measure::feeling AS measure, value ORDER BY 2 DESC);CREATE TYPE activity AS ENUM ( FROM responses SELECT activity WHERE measure = 'Very secure' ORDER BY value DESC);PIVOT ( FROM responses SELECT activity::activity AS activity, measure, value ORDER BY 1, 2) ON measure USING first(value)EOF