
Global Variables

Bringing SQL to a Python Fight

A couple weeks ago I read two posts by Dr. Drang. He was documenting the creation of a table for a previous blog post using Pandas, a data analysis package for Python. Working with data is my day job now, so it was interesting to follow his process and the updates he made in the follow-up post. Of course I got nerd-sniped, and just had to work out how I’d approach the problem with my own preferred tools.

This will be the most technical piece I’ve written here, so if wrangling code and crunching numbers sounds like a good time, read on.[1]

The Problem

The original post—and the table in question—compared each state's percentage of the Electoral College vote with its population as a percentage of the US total. He started with a CSV containing data for each state. The header and first ten rows look like this:

State Abbrev Population Electors
Alabama AL 5108468 9
Alaska AK 733406 3
Arizona AZ 7431344 11
Arkansas AR 3067732 6
California CA 38965193 54
Colorado CO 5877610 10
Connecticut CT 3617176 7
Delaware DE 1031890 3
District of Columbia DC 678972 3
Florida FL 22610726 30

From that he calculated this table for his post:

Electors States Pop Pct EC Pct
3 AK, DE, DC, ND, SD, VT, WY 1.61% 3.90%
4 HI, ID, ME, MT, NH, RI, WV 3.04% 5.20%
5 NE, NM 1.22% 1.86%
6 AR, IA, KS, MS, NV, UT 5.60% 6.69%
7 CT, OK 2.29% 2.60%
8 KY, LA, OR 3.98% 4.46%
9 AL, SC 3.13% 3.35%
10 CO, MD, MN, MO, WI 8.93% 9.29%
11 AZ, IN, MA, TN 8.49% 8.18%
12 WA 2.33% 2.23%
13 VA 2.60% 2.42%
14 NJ 2.77% 2.60%
15 MI 3.00% 2.79%
16 GA, NC 6.53% 5.95%
17 OH 3.52% 3.16%
19 IL, PA 7.62% 7.06%
28 NY 5.84% 5.20%
30 FL 6.75% 5.58%
40 TX 9.11% 7.43%
54 CA 11.63% 10.04%

Both Dr. Drang and I write our posts in Markdown, so the plaintext version of the table looks like this:

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |

The Tools

My go-to language for working with data is SQL. I turn to Python[2] for things that involve more scripting (like loops or complex functions). But for most data analysis needs, I find SQL better suited to the job.

From personal projects to one-off data transformation/analysis tasks at work, I keep finding more and more uses for DuckDB—in its own words, “a fast in-process analytical database”. DuckDB can import (and export) a variety of file and database formats or even query them directly. It can also be used from within Python, which allows for workflows combining DuckDB and Pandas.

The Solution

I worked through this a couple different ways. The first time through was more piece-by-piece, and then I condensed that down to a one-shot query, which is what I used in a (very short) Python script to generate the final Markdown table.

I started by importing the CSV to a new table in the database:[3]

create table states as from "states.csv";

I queried that table to get something like Dr. Drang’s initial summary table:

with totals as (
     -- Sum population and electors for all states
     select sum(population) as total_pop,
            sum(electors) as total_electors
       from states
)

   select electors,
          string_agg(abbrev, ', ') as states,
          count(*) as num_states,
          sum(population) as population,
          sum(population/total_pop) as pop_pct,
          sum(electors) as electors_sum,
          sum(electors/total_electors) as ec_pct

     from states,
          totals
 group by electors;

This is where the meat of the aggregation happens.

In the first part, I sum the population and electors columns for the whole states table. This gives me a single-row result set, which I use along with the states table in the second part.

The group by electors clause tells the query what to group on; every other column is an aggregate function computed over all the rows (or states) with that number of electors. string_agg does the thing Dr. Drang wrote a custom lambda function for, combining the column values with a given separator—in this case, a comma and space.
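To make the mechanics concrete, here’s a rough plain-Python sketch of what that grouping and aggregation does—my own illustration, not code from this workflow—using a few rows from the sample data:

```python
from collections import defaultdict

# A few rows in the shape of states.csv (from the sample above)
rows = [
    ("Alaska", "AK", 733406, 3),
    ("Delaware", "DE", 1031890, 3),
    ("Florida", "FL", 22610726, 30),
]

# The "totals" CTE: a single row of grand totals
total_pop = sum(pop for _, _, pop, _ in rows)
total_electors = sum(ec for _, _, _, ec in rows)

# group by electors
groups = defaultdict(list)
for state, abbrev, pop, electors in rows:
    groups[electors].append((abbrev, pop))

for electors in sorted(groups):
    members = groups[electors]
    states = ", ".join(abbrev for abbrev, _ in members)   # string_agg(abbrev, ', ')
    pop_pct = sum(pop for _, pop in members) / total_pop  # sum(population/total_pop)
    ec_pct = electors * len(members) / total_electors     # sum(electors/total_electors)
    print(f"{electors:>8}  {states:<10}  {pop_pct:.2%}  {ec_pct:.2%}")
```

With just these three rows it prints two groups: AK and DE together at 3 electors, and FL alone at 30.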

Here’s DuckDB’s output:

┌──────────┬────────────────────────────┬────────────┬────────────┬──────────────────────┬──────────────┬──────────────────────┐
│ Electors │           states           │ num_states │ population │       pop_pct        │ electors_sum │        ec_pct        │
│  int64   │          varchar           │   int64    │   int128   │        double        │    int128    │        double        │
├──────────┼────────────────────────────┼────────────┼────────────┼──────────────────────┼──────────────┼──────────────────────┤
│        3 │ AK, DE, DC, ND, SD, VT, WY │          7 │    5379033 │ 0.016060895111876104 │           21 │ 0.039033457249070626 │
│        4 │ HI, ID, ME, MT, NH, RI, WV │          7 │   10196485 │ 0.030445003050700387 │           28 │ 0.052044609665427496 │
│        5 │ NE, NM                     │          2 │    4092750 │ 0.012220268674524016 │           10 │  0.01858736059479554 │
│        6 │ AR, IA, KS, MS, NV, UT     │          6 │   18766882 │  0.05603477862637313 │           36 │  0.06691449814126393 │
│        7 │ CT, OK                     │          2 │    7671000 │  0.02290432618710494 │           14 │ 0.026022304832713755 │
│        8 │ KY, LA, OR                 │          3 │   13333261 │  0.03981089285383978 │           24 │  0.04460966542750929 │
│        9 │ AL, SC                     │          2 │   10482023 │ 0.031297571880163765 │           18 │  0.03345724907063197 │
│       10 │ CO, MD, MN, MO, WI         │          5 │   29902889 │  0.08928503762127392 │           50 │   0.0929368029739777 │
│       11 │ AZ, IN, MA, TN             │          4 │   28421431 │  0.08486165119649276 │           44 │  0.08178438661710037 │
│       12 │ WA                         │          1 │    7812880 │ 0.023327956196155443 │           12 │ 0.022304832713754646 │
│       13 │ VA                         │          1 │    8715698 │ 0.026023620119971076 │           13 │ 0.024163568773234202 │
│       14 │ NJ                         │          1 │    9290841 │  0.02774090116236843 │           14 │ 0.026022304832713755 │
│       15 │ MI                         │          1 │   10037261 │ 0.029969586751284978 │           15 │ 0.027881040892193308 │
│       16 │ GA, NC                     │          2 │   21864718 │  0.06528440008617711 │           32 │  0.05947955390334572 │
│       17 │ OH                         │          1 │   11785935 │  0.03519083557033198 │           17 │ 0.031598513011152414 │
│       19 │ IL, PA                     │          2 │   25511372 │    0.076172700530384 │           38 │  0.07063197026022305 │
│       28 │ NY                         │          1 │   19571216 │  0.05843638575704434 │           28 │  0.05204460966542751 │
│       30 │ FL                         │          1 │   22610726 │  0.06751185551183085 │           30 │ 0.055762081784386616 │
│       40 │ TX                         │          1 │   30503301 │  0.09107776768184646 │           40 │  0.07434944237918216 │
│       54 │ CA                         │          1 │   38965193 │  0.11634356543025655 │           54 │  0.10037174721189591 │
├──────────┴────────────────────────────┴────────────┴────────────┴──────────────────────┴──────────────┴──────────────────────┤
│ 20 rows                                                                                                            7 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Finally, I rewrote the query to cut out the import step and select just the columns for the final table. For some reason, when doing it this way the table was no longer sorted by the electors column, even though it had been in the two-step query before. That can be remedied with order by electors:

with totals as (
        -- Sum population and electors for all states
     select sum(population) as total_pop,
            sum(electors) as total_electors
       from "states.csv"
)

   select electors as "Electors",
          string_agg(abbrev, ', ') as "States",
          sum(population/total_pop) as "Pop Pct",
          sum(electors/total_electors) as "EC Pct"

     from "states.csv",
          totals
 group by electors
 order by electors;

This gets closer, but it’s not Markdown yet:

┌──────────┬────────────────────────────┬──────────────────────┬──────────────────────┐
│ Electors │           States           │       Pop Pct        │        EC Pct        │
│  int64   │          varchar           │        double        │        double        │
├──────────┼────────────────────────────┼──────────────────────┼──────────────────────┤
│        3 │ AK, DE, DC, ND, SD, VT, WY │ 0.016060895111876104 │ 0.039033457249070626 │
│        4 │ HI, ID, ME, MT, NH, RI, WV │ 0.030445003050700387 │ 0.052044609665427496 │
│        5 │ NE, NM                     │ 0.012220268674524016 │  0.01858736059479554 │
│        6 │ AR, IA, KS, MS, NV, UT     │  0.05603477862637313 │  0.06691449814126393 │
│        7 │ CT, OK                     │  0.02290432618710494 │ 0.026022304832713755 │
│        8 │ KY, LA, OR                 │  0.03981089285383978 │  0.04460966542750929 │
│        9 │ AL, SC                     │ 0.031297571880163765 │  0.03345724907063197 │
│       10 │ CO, MD, MN, MO, WI         │  0.08928503762127392 │   0.0929368029739777 │
│       11 │ AZ, IN, MA, TN             │  0.08486165119649276 │  0.08178438661710037 │
│       12 │ WA                         │ 0.023327956196155443 │ 0.022304832713754646 │
│       13 │ VA                         │ 0.026023620119971076 │ 0.024163568773234202 │
│       14 │ NJ                         │  0.02774090116236843 │ 0.026022304832713755 │
│       15 │ MI                         │ 0.029969586751284978 │ 0.027881040892193308 │
│       16 │ GA, NC                     │  0.06528440008617711 │  0.05947955390334572 │
│       17 │ OH                         │  0.03519083557033198 │ 0.031598513011152414 │
│       19 │ IL, PA                     │    0.076172700530384 │  0.07063197026022305 │
│       28 │ NY                         │  0.05843638575704434 │  0.05204460966542751 │
│       30 │ FL                         │  0.06751185551183085 │ 0.055762081784386616 │
│       40 │ TX                         │  0.09107776768184646 │  0.07434944237918216 │
│       54 │ CA                         │  0.11634356543025655 │  0.10037174721189591 │
├──────────┴────────────────────────────┴──────────────────────┴──────────────────────┤
│ 20 rows                                                                   4 columns │
└─────────────────────────────────────────────────────────────────────────────────────┘

To get it into a Markdown table, we can run this query from Python, convert the result to a Pandas dataframe, and run that through the to_markdown method.

import duckdb
import pandas

df = duckdb.sql("""
    with totals as (
         select sum(population) as total_pop,
                sum(electors) as total_electors
           from 'states.csv'
    )
       select electors as "Electors",
              string_agg(abbrev, ', ') as "States",
              sum(population/total_pop) as "Pop Pct",
              sum(electors/total_electors) as "EC Pct"
         from 'states.csv',
              totals
     group by electors
     order by electors;
""").df()

print(df.to_markdown(
    index=False,
    floatfmt='.2%',
    colalign=['center', 'center', 'right', 'right']
))

I’ve made two tweaks to the to_markdown call from Dr. Drang’s code. First, it was adding an index column by default, so I’ve disabled that. Second, since I’ve already renamed the columns in SQL, the headers parameter is no longer needed.
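The floatfmt='.2%' argument—passed through by Pandas to tabulate—is just Python’s standard format spec: '.2%' multiplies by 100, keeps two decimal places, and appends a percent sign. A quick check with the raw fractions from the query output:

```python
# Raw fractions from the 3-elector row of the query output
pop_pct = 0.016060895111876104
ec_pct = 0.039033457249070626

# The same '.2%' spec that tabulate applies via floatfmt
print(format(pop_pct, ".2%"))  # 1.61%
print(format(ec_pct, ".2%"))   # 3.90%
```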

This code combines everything: reading from the CSV, transforming it to the new table, and converting the results (via Pandas) to Markdown. So all we have to do is run this script in the same working directory as states.csv.

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |

Success!


  1. Joke’s on me: writing this post and updating my website to handle all these tables and code blocks nicely ended up being way more involved than solving the original problem. ↩︎

  2. Some Pandas, but mostly Snowpark Python, which is Pandas-inspired but native to Snowflake, the data warehouse we use at work. ↩︎

  3. To pick this apart a bit: I could read the contents of the file with select * from "states.csv", which gives me every column (and row) of the data. DuckDB has a convenient syntax for those select * queries—namely, you can drop the select * and just type from "states.csv". Here I’m taking it a step further and creating a table from the results of that query. ↩︎

Three Sixty-Five

When I wrote back in January I was just embarking on a 365. I planned to take a photo a day for the duration of the year, publishing them to Glass as I went. Somehow, the year is already drawing to a close—and as I write I have one photo left to take.

I’m glad I decided to take the plunge, and I’m glad it’s over. This was a drastic shift from only posting photos from time to time, and only sharing the ones I was reasonably proud of. It felt like a tilt towards quantity over quality. I remember approaching 100 days and wondering how I was going to keep it up for another 265. After the novelty of the first couple weeks had worn off, the next couple months were the drought. I regularly found myself camera in hand after 11pm, trying to find a photo of something around the house so I could just go to bed. The real low point was a couple sad slices of Domino’s pizza.

As I pushed through, though, things improved. I got more diligent about taking photos earlier in the day. Even when I did end up shooting things around the house in the evening, once I got past the obvious stuff, I started finding more interesting subjects, lighting, and camera angles. Or sometimes I just went for a late-night walk around the neighborhood.

When possible, I found inspiration in Glass’s monthly categories. In lieu of hashtags, there is a list of available categories, up to three of which can be added to each photo. Each month the Glass team adds and highlights a new one, which becomes a fun focal point for the community. Given the abundance of street cats here, I expected to take more cat photos than I did when they made April the month of the cat—but I did get featured among the staff favorites at the end of the month, and I’ve continued to add to the ranks of the cats of Glass as the year has gone on.

Overall, I think my average quality towards the end of the year was markedly better than the first part. The practice of carrying a camera almost everywhere and looking for photos in the everyday began to reinforce itself. Getting a lightweight, weather-resistant pancake lens made a huge difference to the ease of keeping a camera on hand. This past month I’ve also been using my new iPhone camera a lot, which has paired well with Glass’s new Everyday category for December.

I’m looking forward to putting this project behind me while taking some of the gains with me. I’m looking forward to feeling the freedom to be more selective in what I post, and to publishing photos from the archives—especially finding good ones when new monthly categories are added. On the other hand, I hope to settle into more frequent sharing and more everyday/street photography than my pre-2023 average. In short, I hope the stretching I have felt through this project is at least somewhat inelastic.

Diapers & Coffee

I’ve started what I guess could be described as a parenting blog. This idea came up sometime toward the end of last year, when Megan commented on my habit of reasoning very matter-of-factly with our daughter about the kind of ridiculous things that come up in parenting a toddler. She thought it would be entertaining to save some of these situations, exchanges, and one-liners for…posterity?

As recounted in my opening post, the name came from an exchange in probably the spring or fall, when we were still eating brunch on our terrace in pleasant weather. The phrase “diapers and coffee” came back to me when I started kicking around the idea for this blog, and then I had the inspiration for the domain, diapers.coffee, which was too perfect.[1]

I expressly wanted to keep this a very lightweight, short-form blog to which I could post quickly and often—but I couldn’t help but spin my wheels for two months trying to put together the perfect system to allow me to do that. I let the perfect be the enemy of the good until this weekend, when I realized I needed to cut loose the distraction that was all the custom code, and focus on launching with something simple.

So that’s what I’ve done.[2] Enjoy.


  1. I just looked and saw that I registered diapers.coffee two months ago to the day. ↩︎

  2. For those interested, I’m using Blot, a static site generator that automatically turns a folder of Markdown files into a website. I stuck with the default blog template instead of rolling my own, at least for now. (Update 2023-03-19: I’m now using Eleventy—a more configurable, open-source static site generator—deployed on GitHub Pages (Update 2024-01-11: Cloudflare Pages).) ↩︎

Olive branches on the tree, in focus with heavy background blur.

2023 in Focus

It’s been a couple years since I’ve written about a yearly theme, but I wanted to post something about this year’s. I think it is helpful to put a public stake in the ground, and since I don’t have a podcast to talk about these on anymore, here we are.

The word that kept coming to mind towards the end of last year was focus. It initially felt a bit on the nose, but what made me come around on it was thinking of focus as a photography term. I picture spinning a focus ring to isolate one thing at a time, or adjusting the aperture to bring more things into focus at once if I want the bigger picture. It’s a useful lens[1] to approach the year with.

The impetus for this theme came from two observations. The glaring one is that I’ve been trying to do way too many things in life lately. Secondarily, I’ve had a hard time doing any one thing for long without getting sidetracked. Sure, these are nothing new—but I hit a breaking point with them in the last third or so of 2022.

A large part of what I need to do this year is to make the hard calls about what is important for me to give my time and focus to, and to pare back everything else. The important things include being a husband to my wife and a dad to our one-and-a-half-year-old daughter. They include my work—not just because it pays the bills, but because going into this year I have an exciting new role that I want to have the headspace to do well. That means making the space to focus on things that will contribute to my mental and emotional wellbeing—particularly creative outlets like photography, writing, and perhaps filmmaking. It also means taking intentional steps to ensure that when I’m doing any one of these things, it is the one thing in focus and everything else fades into a nice background blur.[2]

One creative outlet I want to pay attention to is photography. The launch of Glass in late 2021 and my purchase of a new camera this past spring together spurred a renewed interest in this creative hobby for me. I’ve been inspired to branch out from the travel and landscape photos I take on trips and hikes and to start carrying my camera around town to take photos of the everyday. This year I want to foster that artistic expression by embarking on a 365: inspired by another photographer on Glass, I plan to post a photo each day that was taken that day. The first one is up; the rest will be published to my profile each day. This is the most resolution-y (no pun intended) part of this theme, but it is a daily habit I’d like to cultivate, and it dovetails well with the idea of Focus.

As usual, a yearly theme is meant to be a North Star in making decisions throughout the year, not a specific set of pass/fail objectives.[3] All of this is just a snapshot of some of my thoughts going into this year.[4]

So begins the Year of Focus.


  1. See what I did there? ↩︎

  2. A couple paragraphs into this, I pulled out a notebook and pen to write the rest of it; doing so made a night-and-day difference to being able to tune everything else out and let the words flow onto the page. ↩︎

  3. Basically the opposite of those SMART goals they used to make us write in school. ↩︎

  4. I also need to go in for another vision checkup and probably get a new pair of glasses. ↩︎

A screenshot of the first few blocks of the Thing Builder shortcut on an iPad

Thing Builder

Today I’m taking the wraps off a side project I started almost three years ago. It started with me wishing for project templates in my task manager, Things. At the time, I was using Drafts a lot for text automation, and I found a couple different user-created text parsers for Things in the Action Directory. Inspired by these, I wanted something more Markdown-like, and as I thought about it, something in Shortcuts and not tied to a single third-party app.[1]

I created the Thing Builder, a 250-action shortcut that accepts text using a particular markup, which can be used to define and create a new Things project with all its associated tasks and dates, or to batch-add tasks to existing projects or areas. In addition to using the Markdown syntax for headings and bullets, I chose a handful of symbols, which can prefix a string of text to mark it as a list, date, tag, or note.

I did this in July 2019, but I decided to sit on it for a while before sharing it—mainly to actually put the tool through its paces and iron out any wrinkles. I’d also run into a bug with one of Shortcuts’ actions while initially working on this. (I was running an iOS beta, after all.) I found a workaround for the time being, but I did want to come back and use the more elegant solution whenever that got taken care of.[2]

This has been longer of “a while” than I expected—but in the meantime, I’ve used it a lot, made a few tweaks, and built several other shortcuts that themselves call the Thing Builder to generate a project. I have several templates for work that I use on a regular basis. This outlasted another fling with the Bullet Journal and was waiting for me when I came back to the warm embrace of Things in January. At some point I’d seen that that buggy action was working again, so I’d had swapping those actions out on my to-do list for some time, along with writing documentation and sharing the shortcut (all in a Things project, of course). Yesterday that caught my eye, I made the updates, and then I went ahead with writing docs. And here we are.

If you’re looking for a way to create reusable project templates for Things—or just add a bunch of tasks from a text editor—you can find a link to download the shortcut and an explanation of the syntax in the documentation.


  1. Though there were plenty of times during development where I would’ve rather been working with code in a text editor for something of this scale, using Shortcuts turned out to be a good move: I’ve long since moved on from Drafts, but the shortcut I built is alive and well, and accessible anywhere through the iOS share sheet. ↩︎

  2. It was the “Get Group from Matched Text” action, for extracting sub-patterns from text matched with a regular expression. In those cases, I ended up using the “Match Pattern” action again to match within the match. ↩︎