Global Variables

Tuesdays and Travels

I started this draft at 5:30 in the morning on Monday, waiting to board a flight in Delhi. I’m picking it back up to (hopefully) publish it as I wait for another flight at Heathrow, after three days at a data conference in London. This was my second time at this particular conference, and the fourth year running I’ve gone to a data conference in London in the fall.

This year was the first time I had to request an ETA for entry into the UK—not an estimated time of arrival, but an electronic travel authorization. Some weeks ago I downloaded the UK ETA app and used it to submit a scan of my passport, a scan of my face, and other information. I got the approval in my inbox almost immediately, but it was an extra step where in the past I’d been able to just land in the UK, walk through their automated gates with my passport, and be in. My arrival this year was the same experience, but only because the ETA on my passport was already in their system for this next year.

Why am I writing about data and passports? Well, fittingly, I just did some recreational data visualization on the topic as part of Tidy Tuesday, a weekly data project I’ve started participating in over the past month or so. The one from a couple weeks ago was on the Henley Passport Index, which measures the strength of passports based on visa-free access to other countries. The contributor of this dataset was inspired by an article about the strength of US passports declining. The US passport is still #10 in the rankings, but that’s down from #1 in 2014. So that news is pretty relevant to me, one of a generation of TCKs and now digital nomads who grew up thinking the world was our oyster—and a US passport-holder myself.

I could have done a visualization tracking the change in rankings over time as addressed in that article, but as I explored this dataset, I became more intrigued by the idea of looking at the data as a two-way street—essentially, how does access to a country for other nationalities correspond to the access granted to its nationals abroad? It took a few attempts to figure out how best to represent that, but I landed on what’s called a Sankey chart—typically used to represent flow. I set it up to show the breakdown of visa requirement categories as designated in the data—for entering a given country on the left side, and for passport holders entering others on the right. The page on my Tidy Tuesday site has an interactive version with a dropdown menu to pick the country. Here’s the US:

[Image: Sankey diagram titled 'Oyster Quotient: Visa requirements to and from United States', showing the flow of visa requirements between the United States and other countries. On the left, the largest flow is 'Visa Required' for foreign nationals traveling to the US; on the right, the largest flow is 'Visa-free Access' for US citizens traveling abroad. Other categories include Electronic Travel Authorization, Online Visa, and Visa on Arrival. The chart illustrates the asymmetric nature of visa requirements: most foreign nationals need visas to enter the US, while US citizens enjoy visa-free access to most international destinations.]

So I can see myself in that little slice of US passport holders applying for ETAs in one of 10 countries. But this also clearly shows what I was already pretty sure I knew: that while a majority of countries (at least for now) allow US citizens to visit with no visa at all—a privilege that I’ve taken advantage of plenty—an even larger majority of nationalities are required to apply for a visa to enter the United States. The number allowed to arrive visa-free? Four.

I wonder how many Americans would even think twice about this imbalance. Or how many would defend it?

This also doesn’t account for the difficulty of attaining visas, where visas are required.[1]

Anyway, I’d been wanting to write about these Tidy Tuesday projects since I started working through them, and then this particular one hit pretty close to home for me. There are a few other datasets up there too—including last week’s, where I tried to look at bread (or more technically wheat/flour) and rice across cuisines.

You may have noticed it says “Global Variables” at the top of the page now. I’ve been sitting on that name for a couple months. I was going to tie it to a redesigned site, but while I haven’t gotten around to the redesign, this felt like an appropriate first post under that masthead. Global variables are, of course, a programming concept, one with an appropriate double meaning for someone like me—someone who, as I mentioned, grew up in a world that seemed increasingly connected and closer together—where “globalization” was only considered a positive word, bringing with it a literal world of opportunities. That starry-eyed optimism is starting to show some cracks in the cold light of reality—as countries around the world seem to be pulling away from that glimmer of a more interconnected globe. Megan and I still choose this life for ourselves and for our kids. We want to give them the same global perspective we had the privilege of growing up with. That feels increasingly important as so much of the world pendulum-swings in the other direction. I hope something of that world survives long enough for my kids to experience it and learn from it.


  1. Like, say, an H-1B visa. Just to name a totally random example. ↩︎

New Horizons

I definitely didn’t plan this, but as I’m finally catching a few minutes to write, I find myself once again eleven days from an international move. I remember the final stretch feeling like a crunch last time, but seven years later, this is so much more overwhelming—because this time we’re moving a family of four, and trying to sell off and pack up a much larger place than half of the little apartment I shared back then. No matter how much progress we make, there always seems to be so much more left to do before we can move out of our apartment at the end of the month, and board a plane two days later.

I say “we”, but the vast majority of the work going into this move is being pushed forward by my incredible wife, with me playing a supporting role at best—trying to juggle keeping on top of my job, taking kids so that Megan can get focused time to work on this move, and somewhere in there handling the pieces of the move that can only come down to me. We’re both so tired.

But when I manage to get my head enough above water to see past all that, I am really excited for what’s next. For our next adventure. A fresh start. New horizons.

“New Horizons” has been my theme for 2025. We started this year knowing it was time to move this summer, though we didn’t know where. The first four or five months of the year were figuring that part out. We considered—and ultimately ruled out—a return to the US for a few years. Instead, it became clear to us that the right next step for our family was to move somewhere where Megan could pick up her career in international education again, and where our 4-year-old could start preschool—ideally at the same school. So Megan spent months submitting applications, interviewing, looking at openings from the Philippines to Italy to Uganda before—at about the eleventh hour—connecting with an international school in India that felt like a really good fit for her, for our daughter, and for our family. So with not much more than a month before we’d have to fly, we decided we were moving to India.

Needless to say, it has been a bit of a whirlwind since that decision. But when I actually stop and look past all the stuff filling my field of view, I’m excited for the future. I’m looking forward to being in South Asia again—though India itself will be new. Navigating a familiar but different-in-some-key-ways culture will be interesting. We managed to find a house already through some mutual contacts. We saw a video walkthrough last week and the place looks really great.

The other day I found myself reading that post from before that last big move, and what struck me was how bittersweet that move was. On some level, I was ready and eager to get back overseas, but I was also really sad to leave Blacksburg and my church there. This time, I’m mostly just ready for the new chapter. This season will always be meaningful. This is where I met my wife and we had our two kids. I’ll always look back on these memories fondly. The greatest loss will be no longer having grandparents in the same city.

But this time, with much less internal conflict, it feels like time. We’re leaving the babies stage behind; we’re entering the school-aged era, and returning to both of us working. We’re moving to a part of the world that is meaningful to both of us and that we’re excited to share with our kids. We will be joining a community and environment that seems like it will be the restful, rejuvenating space we desperately need right now. Sitting here I’m looking at a lot that still needs to happen before wheels up in a week and a half, but I can’t wait to explore new horizons.

Unearthing My One Post From 2024

I just came across this draft from September, in which I not only wrote a post about some recreational coding/data analysis, but made a bunch of updates to my website to better handle code and tables. Instead of updating and publishing it with today’s date, I decided to pull a Gruber and backdate it to when it was basically done and just never published.[1]


  1. Of course, in my case, it’s not as if it’s getting buried under 11 years of prolific writing. It’s literally the post right below this one. ↩︎

Bringing SQL to a Python Fight

A couple weeks ago I read two posts by Dr. Drang. He was documenting the creation of a table for a previous blog post using Pandas, a data analysis package for Python. Working with data is my day job now, so it was interesting to follow his process and the updates he made in the follow-up post. Of course I got nerd-sniped, and just had to work out how I’d approach the problem with my own preferred tools.

This will be the most technical piece I’ve written here, so if wrangling code and crunching numbers sounds like a good time, read on.[1]

The Problem

The original post—and the table in question—was looking at states’ percentage of the Electoral College vote compared to their population as a percentage of the US total. He started with a CSV containing data for each state. The header and first ten rows look like this:

State,Abbrev,Population,Electors
Alabama,AL,5108468,9
Alaska,AK,733406,3
Arizona,AZ,7431344,11
Arkansas,AR,3067732,6
California,CA,38965193,54
Colorado,CO,5877610,10
Connecticut,CT,3617176,7
Delaware,DE,1031890,3
District of Columbia,DC,678972,3
Florida,FL,22610726,30

From that he calculated this table for his post:

Electors States Pop Pct EC Pct
3 AK, DE, DC, ND, SD, VT, WY 1.61% 3.90%
4 HI, ID, ME, MT, NH, RI, WV 3.04% 5.20%
5 NE, NM 1.22% 1.86%
6 AR, IA, KS, MS, NV, UT 5.60% 6.69%
7 CT, OK 2.29% 2.60%
8 KY, LA, OR 3.98% 4.46%
9 AL, SC 3.13% 3.35%
10 CO, MD, MN, MO, WI 8.93% 9.29%
11 AZ, IN, MA, TN 8.49% 8.18%
12 WA 2.33% 2.23%
13 VA 2.60% 2.42%
14 NJ 2.77% 2.60%
15 MI 3.00% 2.79%
16 GA, NC 6.53% 5.95%
17 OH 3.52% 3.16%
19 IL, PA 7.62% 7.06%
28 NY 5.84% 5.20%
30 FL 6.75% 5.58%
40 TX 9.11% 7.43%
54 CA 11.63% 10.04%

Both Dr. Drang and I write our posts in Markdown, so the plaintext version of the table looks like this:

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |

The Tools

My go-to language for working with data is SQL. I turn to Python[2] for things that involve more scripting (like loops or complex functions). But for most data analysis needs, I find SQL better suited to the job.

From personal projects to one-off data transformation/analysis tasks at work, I keep finding more and more uses for DuckDB—in its own words, “a fast in-process analytical database”. DuckDB can import (and export) a variety of file and database formats or even query them directly. It can also be used from within Python, which allows for workflows combining DuckDB and Pandas.
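
For instance—just a minimal sketch, assuming the CSV shown above is saved as states.csv in the working directory—you can point a query straight at the file, and DuckDB will infer the column names and types from the header:

-- Query the CSV file directly; no import step required.
-- (Identifiers are case-insensitive, so the CSV's capitalized headers can be written in lowercase.)
select state, abbrev, population, electors
  from "states.csv"
 limit 5;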

The Solution

I worked through this a couple different ways. The first time through was more piece-by-piece, and then I condensed that down to a one-shot query, which is what I used in a (very short) Python script to generate the final Markdown table.

I started by importing the CSV to a new table in the database:[3]

create table states as from "states.csv";

I queried that table to get something like Dr. Drang’s initial summary table.

with totals as (
     -- Sum population and electors for all states
     select sum(population) as total_pop,
            sum(electors) as total_electors
       from states
)

   select electors,
          string_agg(abbrev, ', ') as states,
          count(*) as num_states,
          sum(population) as population,
          sum(population/total_pop) as pop_pct,
          sum(electors) as electors_sum,
          sum(electors/total_electors) as ec_pct

     from states,
          totals
 group by electors;

This is where the meat of the aggregation happens.

In the first part, I sum the population and electors columns for the whole states table. This gives me a single-row result set, which I use along with the states table in the second part—because totals has only that one row, listing it in the from clause alongside states simply attaches total_pop and total_electors to every state row.

The group by electors clause tells the query which rows to group together, and all the other selected columns are aggregate functions computed over the rows (or states) with that number of electors. Because total_pop and total_electors are the same on every row, summing the per-state ratios within a group yields that group’s share of the national totals. string_agg does the thing Dr. Drang wrote a custom lambda function for, combining the column values with a given separator—in this case, a comma and space.
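
To see string_agg on its own—a minimal sketch, assuming the states table created earlier—this collapses just the three-elector states into one string:

-- string_agg joins a column's values using the given separator.
select string_agg(abbrev, ', ') as three_elector_states
  from states
 where electors = 3;
-- Result: AK, DE, DC, ND, SD, VT, WY
-- (The order follows scan order here; an order by inside the aggregate would make it explicit.)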

Here’s DuckDB’s output for the full query:

┌──────────┬────────────────────────────┬────────────┬────────────┬──────────────────────┬──────────────┬──────────────────────┐
│ Electors │           states           │ num_states │ population │       pop_pct        │ electors_sum │        ec_pct        │
│  int64   │          varchar           │   int64    │   int128   │        double        │    int128    │        double        │
├──────────┼────────────────────────────┼────────────┼────────────┼──────────────────────┼──────────────┼──────────────────────┤
│        3 │ AK, DE, DC, ND, SD, VT, WY │          7 │    5379033 │ 0.016060895111876104 │           21 │ 0.039033457249070626 │
│        4 │ HI, ID, ME, MT, NH, RI, WV │          7 │   10196485 │ 0.030445003050700387 │           28 │ 0.052044609665427496 │
│        5 │ NE, NM                     │          2 │    4092750 │ 0.012220268674524016 │           10 │  0.01858736059479554 │
│        6 │ AR, IA, KS, MS, NV, UT     │          6 │   18766882 │  0.05603477862637313 │           36 │  0.06691449814126393 │
│        7 │ CT, OK                     │          2 │    7671000 │  0.02290432618710494 │           14 │ 0.026022304832713755 │
│        8 │ KY, LA, OR                 │          3 │   13333261 │  0.03981089285383978 │           24 │  0.04460966542750929 │
│        9 │ AL, SC                     │          2 │   10482023 │ 0.031297571880163765 │           18 │  0.03345724907063197 │
│       10 │ CO, MD, MN, MO, WI         │          5 │   29902889 │  0.08928503762127392 │           50 │   0.0929368029739777 │
│       11 │ AZ, IN, MA, TN             │          4 │   28421431 │  0.08486165119649276 │           44 │  0.08178438661710037 │
│       12 │ WA                         │          1 │    7812880 │ 0.023327956196155443 │           12 │ 0.022304832713754646 │
│       13 │ VA                         │          1 │    8715698 │ 0.026023620119971076 │           13 │ 0.024163568773234202 │
│       14 │ NJ                         │          1 │    9290841 │  0.02774090116236843 │           14 │ 0.026022304832713755 │
│       15 │ MI                         │          1 │   10037261 │ 0.029969586751284978 │           15 │ 0.027881040892193308 │
│       16 │ GA, NC                     │          2 │   21864718 │  0.06528440008617711 │           32 │  0.05947955390334572 │
│       17 │ OH                         │          1 │   11785935 │  0.03519083557033198 │           17 │ 0.031598513011152414 │
│       19 │ IL, PA                     │          2 │   25511372 │    0.076172700530384 │           38 │  0.07063197026022305 │
│       28 │ NY                         │          1 │   19571216 │  0.05843638575704434 │           28 │  0.05204460966542751 │
│       30 │ FL                         │          1 │   22610726 │  0.06751185551183085 │           30 │ 0.055762081784386616 │
│       40 │ TX                         │          1 │   30503301 │  0.09107776768184646 │           40 │  0.07434944237918216 │
│       54 │ CA                         │          1 │   38965193 │  0.11634356543025655 │           54 │  0.10037174721189591 │
├──────────┴────────────────────────────┴────────────┴────────────┴──────────────────────┴──────────────┴──────────────────────┤
│ 20 rows                                                                                                            7 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Finally, I rewrote the query to cut out the import step and select just the columns for the final table. For some reason, when doing it this way the table was no longer sorted by the electors column, even though it had been in the two-step query before. That can be remedied with order by electors:

with totals as (
        -- Sum population and electors for all states
     select sum(population) as total_pop,
            sum(electors) as total_electors
       from "states.csv"
)

   select electors as "Electors",
          string_agg(abbrev, ', ') as "States",
          sum(population/total_pop) as "Pop Pct",
          sum(electors/total_electors) as "EC Pct"

     from "states.csv",
          totals
 group by electors
 order by electors;

This gets closer, but it’s not Markdown yet:

┌──────────┬────────────────────────────┬──────────────────────┬──────────────────────┐
│ Electors │           States           │       Pop Pct        │        EC Pct        │
│  int64   │          varchar           │        double        │        double        │
├──────────┼────────────────────────────┼──────────────────────┼──────────────────────┤
│        3 │ AK, DE, DC, ND, SD, VT, WY │ 0.016060895111876104 │ 0.039033457249070626 │
│        4 │ HI, ID, ME, MT, NH, RI, WV │ 0.030445003050700387 │ 0.052044609665427496 │
│        5 │ NE, NM                     │ 0.012220268674524016 │  0.01858736059479554 │
│        6 │ AR, IA, KS, MS, NV, UT     │  0.05603477862637313 │  0.06691449814126393 │
│        7 │ CT, OK                     │  0.02290432618710494 │ 0.026022304832713755 │
│        8 │ KY, LA, OR                 │  0.03981089285383978 │  0.04460966542750929 │
│        9 │ AL, SC                     │ 0.031297571880163765 │  0.03345724907063197 │
│       10 │ CO, MD, MN, MO, WI         │  0.08928503762127392 │   0.0929368029739777 │
│       11 │ AZ, IN, MA, TN             │  0.08486165119649276 │  0.08178438661710037 │
│       12 │ WA                         │ 0.023327956196155443 │ 0.022304832713754646 │
│       13 │ VA                         │ 0.026023620119971076 │ 0.024163568773234202 │
│       14 │ NJ                         │  0.02774090116236843 │ 0.026022304832713755 │
│       15 │ MI                         │ 0.029969586751284978 │ 0.027881040892193308 │
│       16 │ GA, NC                     │  0.06528440008617711 │  0.05947955390334572 │
│       17 │ OH                         │  0.03519083557033198 │ 0.031598513011152414 │
│       19 │ IL, PA                     │    0.076172700530384 │  0.07063197026022305 │
│       28 │ NY                         │  0.05843638575704434 │  0.05204460966542751 │
│       30 │ FL                         │  0.06751185551183085 │ 0.055762081784386616 │
│       40 │ TX                         │  0.09107776768184646 │  0.07434944237918216 │
│       54 │ CA                         │  0.11634356543025655 │  0.10037174721189591 │
├──────────┴────────────────────────────┴──────────────────────┴──────────────────────┤
│ 20 rows                                                                   4 columns │
└─────────────────────────────────────────────────────────────────────────────────────┘

To get it into a Markdown table, we can run this query from Python, convert it to a Pandas dataframe, and run that through the to_markdown function.

import duckdb
import pandas

df = duckdb.sql("""
    with totals as (
         select sum(population) as total_pop,
                sum(electors) as total_electors
           from 'states.csv'
    )
       select electors as 'Electors',
              string_agg(abbrev, ', ') as 'States',
              sum(population/total_pop) as 'Pop Pct',
              sum(electors/total_electors) as 'EC Pct'
         from 'states.csv',
              totals
     group by electors
     order by electors;
""").df()

print(df.to_markdown(
    index=False,
    floatfmt='.2%',
    colalign=['center', 'center', 'right', 'right']
))

I’ve made two tweaks to the to_markdown call from Dr. Drang’s code. First, it was adding an index column by default, so I’ve disabled that. Second, since I’ve already renamed the columns in SQL, the headers parameter is no longer needed.

This code combines everything: reading from the CSV, transforming it into the final summary table, and converting the results (via Pandas) to Markdown. So all we have to do is run this script in the same working directory as states.csv.

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |

Success!


  1. Joke’s on me: writing this post and updating my website to handle all these tables and code blocks nicely ended up being way more involved than solving the original problem. ↩︎

  2. Some Pandas, but mostly Snowpark Python, which is Pandas-inspired but native to Snowflake, the data warehouse we use at work. ↩︎

  3. To pick this apart a bit: I could read the contents of the file with select * from "states.csv", which gives me every column (and row) of the data. DuckDB has a convenient syntax for those select * queries—namely, you can drop the select * and just type from "states.csv". Here I’m taking it a step further and creating a table from the results of that query. ↩︎

Three Sixty-Five

When I wrote back in January I was just embarking on a 365. I planned to take a photo a day for the duration of the year, publishing them to Glass as I went. Somehow, the year is already drawing to a close—and as I write I have one photo left to take.

I’m glad I decided to take the plunge, and I’m glad it’s over. This was a drastic shift from only posting photos from time to time, and only sharing the ones I was reasonably proud of. It felt like a tilt towards quantity over quality. I remember approaching 100 days and wondering how I was going to keep it up for another 265. After the novelty of the first couple weeks had worn off, the next couple months were the drought. I regularly found myself camera in hand after 11pm, trying to find a photo of something around the house so I could just go to bed. The real low point was a couple sad slices of Domino’s pizza.

As I pushed through, though, things improved. I got more diligent about taking photos earlier in the day. Even when I did end up shooting things around the house in the evening, once I got past the obvious stuff, I started finding more interesting subjects, lighting, and camera angles. Or sometimes I just went for a late-night walk around the neighborhood.

When possible, I found inspiration in Glass’s monthly categories. In lieu of hashtags, there is a list of available categories, up to three of which can be added to each photo. Each month the Glass team adds and highlights a new one, which becomes a fun focal point for the community. Given the abundance of street cats here, I expected to take more cat photos than I did when they made April the month of the cat—but I did get featured among the staff favorites at the end of the month, and I’ve continued to add to the ranks of the cats of Glass as the year has gone on.

Overall, I think my average quality towards the end of the year was markedly better than the first part. The practice of carrying a camera almost everywhere and looking for photos in the everyday began to reinforce itself. Getting a lightweight, weather-resistant pancake lens made a huge difference to the ease of keeping a camera on hand. This past month I’ve also been using my new iPhone camera a lot, which has paired well with Glass’s new Everyday category for December.

I’m looking forward to putting this project behind me while taking some of the gains with me. I’m looking forward to feeling the freedom to be more selective in what I post, and to publishing photos from the archives—especially finding good ones when new monthly categories are added. On the other hand, I hope to settle into more frequent sharing and more everyday/street photography than my pre-2023 average. In short, I hope the stretching I have felt through this project is at least somewhat inelastic.