Global Variables

Tuesdays and Travels

I started this draft at 5:30 in the morning on Monday, waiting to board a flight in Delhi. I’m picking it back up to (hopefully) publish it as I wait for another flight at Heathrow, after three days at a data conference in London. This was my second time at this particular conference, and the fourth year running I’ve gone to a data conference in London in the fall.

This year was the first time I had to request an ETA for entry into the UK—not an estimated time of arrival, but an electronic travel authorization. Some weeks ago I downloaded the UK ETA app and used it to submit a scan of my passport, a scan of my face, and other information. I got the approval in my inbox almost immediately, but it was an extra step where in the past I’d been able to just land in the UK, walk through their automated gates with my passport, and be in. My arrival this year was the same experience, but only because the ETA on my passport was already in their system for this next year.

Why am I writing about data and passports? Well, fittingly, I just did some recreational data visualization on the topic as part of Tidy Tuesday, a weekly data project I’ve started participating in over the past month or so. The one from a couple weeks ago was on the Henley Passport Index, which measures the strength of passports based on visa-free access to other countries. The contributor of this dataset was inspired by an article about the strength of US passports declining. The US passport is still #10 in the rankings, but that’s down from #1 in 2014. So that news is pretty relevant to me, one of a generation of TCKs and now digital nomads who grew up thinking the world was our oyster—and a US passport-holder myself.

I could have done a visualization tracking the change in rankings over time as addressed in that article, but as I explored this dataset, I became more intrigued by the idea of looking at the data as a two-way street—essentially, how does access to a country for other nationalities correspond to the access granted to its nationals abroad? It took a few attempts to figure out how best to represent that, but I landed on what’s called a Sankey chart—typically used to represent flow. I set it up to show the breakdown of visa requirement categories as designated in the data—for entering a given country on the left side, and for passport holders entering others on the right. The page on my Tidy Tuesday site has an interactive version with a dropdown menu to pick the country. Here’s the US:

[Image: Sankey diagram titled 'Oyster Quotient: Visa requirements to and from United States', showing the flow of visa requirements between the United States and other countries. On the left, the largest flow is 'Visa Required' for foreign nationals traveling to the US; on the right, the largest flow is 'Visa-free Access' for US citizens traveling abroad. Other categories include Electronic Travel Authorization, Online Visa, and Visa on Arrival. The chart illustrates the asymmetric nature of visa requirements: most foreign nationals need visas to enter the US, while US citizens enjoy visa-free access to most international destinations.]

So I can see myself in that little slice of US passport holders applying for ETAs in one of 10 countries. But this also clearly shows what I was already pretty sure I knew: that while a majority of countries (at least for now) allow US citizens to visit with no visa at all—a privilege that I’ve taken advantage of plenty—an even larger majority of nationalities are required to apply for a visa to enter the United States. The number allowed to arrive visa-free? Four.

I wonder how many Americans would even think twice about this imbalance. Or how many would defend it?

This also doesn’t account for the difficulty of attaining visas, where visas are required.[1]

Anyway, I’d been wanting to write about these Tidy Tuesday projects since I started working through them, and then this particular one hit pretty close to home for me. There are a few other datasets up there too—including last week’s, where I tried to look at bread (or more technically wheat/flour) and rice across cuisines.

You may have noticed it says “Global Variables” at the top of the page now. I’ve been sitting on that name for a couple months. I was going to tie it to a redesigned site, but while I haven’t gotten around to the redesign, this felt like an appropriate first post under that masthead. Global variables are, of course, a programming concept, one with an appropriate double meaning for someone like me—someone who, as I mentioned, grew up in a world that seemed increasingly connected and closer together—where “globalization” was only considered a positive word, bringing with it a literal world of opportunities. That starry-eyed optimism is starting to show some cracks in the cold light of reality—as countries around the world seem to be pulling away from that glimmer of a more interconnected globe. Megan and I still choose this life for ourselves and for our kids. We want to give them the same global perspective we had the privilege of growing up with. That feels increasingly important as so much of the world pendulum-swings in the other direction. I hope something of that world survives long enough for my kids to experience it and learn from it.


  1. Like, say, an H-1B visa. Just to name a totally random example. ↩︎

New Horizons

I definitely didn’t plan this, but as I’m finally catching a few minutes to write, I find myself once again eleven days from an international move. I remember the final stretch feeling like a crunch last time, but seven years later, this is so much more overwhelming—because this time we’re moving a family of four, and trying to sell off and pack up a much larger place than half of the little apartment I shared back then. No matter how much progress we make, there always seems to be so much more left to do before we can move out of our apartment at the end of the month, and board a plane two days later.

I say “we”, but the vast majority of the work going into this move is being pushed forward by my incredible wife, with me playing a supporting role at best—trying to juggle keeping on top of my job, taking kids so that Megan can get focused time to work on this move, and somewhere in there handling the pieces of the move that can only come down to me. We’re both so tired.

But when I manage to get my head enough above water to see past all that, I am really excited for what’s next. For our next adventure. A fresh start. New horizons.

“New Horizons” has been my theme for 2025. We started this year knowing it was time to move this summer, though we didn’t know where. The first four or five months of the year were figuring that part out. We considered—and ultimately ruled out—a return to the US for a few years. Instead, it became clear to us that the right next step for our family was to move somewhere where Megan could pick up her career in international education again, and where our 4-year-old could start preschool—ideally at the same school. So Megan spent months submitting applications, interviewing, looking at openings from the Philippines to Italy to Uganda before—at about the eleventh hour—connecting with an international school in India that felt like a really good fit for her, for our daughter, and for our family. So with not much more than a month before we’d have to fly, we decided we were moving to India.

Needless to say, it has been a bit of a whirlwind since that decision. But when I actually stop and look past all the stuff filling my field of view, I’m excited for the future. I’m looking forward to being in South Asia again—though India itself will be new. Navigating a familiar but different-in-some-key-ways culture will be interesting. We managed to find a house already through some mutual contacts. We saw a video walkthrough last week and the place looks really great.

The other day I found myself reading that post from before that last big move, and what struck me was how bittersweet that move was. On some level, I was ready and eager to get back overseas, but I was also really sad to leave Blacksburg and my church there. This time, I’m mostly just ready for the new chapter. This season will always be meaningful. This is where I met my wife and we had our two kids. I’ll always look back on these memories fondly. The greatest loss will be no longer having grandparents in the same city.

But this time, with much less internal conflict, it feels like time. We’re leaving the babies stage behind; we’re entering the school-aged era, and returning to both of us working. We’re moving to a part of the world that is meaningful to both of us and that we’re excited to share with our kids. We will be joining a community and environment that seems like it will be the restful, rejuvenating space we desperately need right now. Sitting here I’m looking at a lot that still needs to happen before wheels up in a week and a half, but I can’t wait to explore new horizons.

Unearthing My One Post From 2024

I just came across this draft from September, in which I not only wrote a post about some recreational coding/data analysis, but made a bunch of updates to my website to better handle code and tables. Instead of updating and publishing it with today’s date, I decided to pull a Gruber and backdate it to when it was basically done and just never published.[1]


  1. Of course, in my case, it’s not as if it’s getting buried under 11 years of prolific writing. It’s literally the post right below this one. ↩︎

Bringing SQL to a Python Fight

A couple weeks ago I read two posts by Dr. Drang. He was documenting the creation of a table for a previous blog post using Pandas, a data analysis package for Python. Working with data is my day job now, so it was interesting to follow his process and the updates he made in the follow-up post. Of course I got nerd-sniped, and just had to work out how I’d approach the problem with my own preferred tools.

This will be the most technical piece I’ve written here, so if wrangling code and crunching numbers sounds like a good time, read on.[1]

The Problem

The original post—and the table in question—was looking at states’ percentage of the Electoral College vote compared to their population as a percentage of the US total. He started with a CSV containing data for each state. The header and first ten rows look like this:

State,Abbrev,Population,Electors
Alabama,AL,5108468,9
Alaska,AK,733406,3
Arizona,AZ,7431344,11
Arkansas,AR,3067732,6
California,CA,38965193,54
Colorado,CO,5877610,10
Connecticut,CT,3617176,7
Delaware,DE,1031890,3
District of Columbia,DC,678972,3
Florida,FL,22610726,30

From that he calculated this table for his post:

Electors States Pop Pct EC Pct
3 AK, DE, DC, ND, SD, VT, WY 1.61% 3.90%
4 HI, ID, ME, MT, NH, RI, WV 3.04% 5.20%
5 NE, NM 1.22% 1.86%
6 AR, IA, KS, MS, NV, UT 5.60% 6.69%
7 CT, OK 2.29% 2.60%
8 KY, LA, OR 3.98% 4.46%
9 AL, SC 3.13% 3.35%
10 CO, MD, MN, MO, WI 8.93% 9.29%
11 AZ, IN, MA, TN 8.49% 8.18%
12 WA 2.33% 2.23%
13 VA 2.60% 2.42%
14 NJ 2.77% 2.60%
15 MI 3.00% 2.79%
16 GA, NC 6.53% 5.95%
17 OH 3.52% 3.16%
19 IL, PA 7.62% 7.06%
28 NY 5.84% 5.20%
30 FL 6.75% 5.58%
40 TX 9.11% 7.43%
54 CA 11.63% 10.04%

Both Dr. Drang and I write our posts in Markdown, so the plaintext version of the table looks like this:

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |

The Tools

My go-to language for working with data is SQL. I turn to Python[2] for things that involve more scripting (like loops or complex functions). But for most data analysis needs, I find SQL better suited to the job.

From personal projects to one-off data transformation/analysis tasks at work, I keep finding more and more uses for DuckDB—in its own words, “a fast in-process analytical database”. DuckDB can import (and export) a variety of file and database formats or even query them directly. It can also be used from within Python, which allows for workflows combining DuckDB and Pandas.
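
For instance—just a minimal sketch, assuming the CSV shown above is saved as states.csv in the working directory—you can point a query straight at the file, and DuckDB will infer the column names and types from the header:

-- Query the CSV file directly; no import step required.
-- (Identifiers are case-insensitive, so the CSV's capitalized headers can be written in lowercase.)
select state, abbrev, population, electors
  from "states.csv"
 limit 5;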

The Solution

I worked through this a couple different ways. The first time through was more piece-by-piece, and then I condensed that down to a one-shot query, which is what I used in a (very short) Python script to generate the final Markdown table.

I started by importing the CSV to a new table in the database:[3]

create table states as from "states.csv";

I queried that table to get something like Dr. Drang’s initial summary table.

with totals as (
     -- Sum population and electors for all states
     select sum(population) as total_pop,
            sum(electors) as total_electors
       from states
)

   select electors,
          string_agg(abbrev, ', ') as states,
          count(*) as num_states,
          sum(population) as population,
          sum(population/total_pop) as pop_pct,
          sum(electors) as electors_sum,
          sum(electors/total_electors) as ec_pct

     from states,
          totals
 group by electors;

This is where the meat of the aggregation happens.

In the first part, I sum the population and electors columns for the whole states table. This gives me a single-row result set, which I use along with the states table in the second part—because totals has only that one row, listing it in the from clause alongside states simply attaches total_pop and total_electors to every state row.

The group by electors clause tells the query which rows to group together, and all the other selected columns are aggregate functions computed over the rows (or states) with that number of electors. Because total_pop and total_electors are the same on every row, summing the per-state ratios within a group yields that group’s share of the national totals. string_agg does the thing Dr. Drang wrote a custom lambda function for, combining the column values with a given separator—in this case, a comma and space.
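
To see string_agg on its own—a minimal sketch, assuming the states table created earlier—this collapses just the three-elector states into one string:

-- string_agg joins a column's values using the given separator.
select string_agg(abbrev, ', ') as three_elector_states
  from states
 where electors = 3;
-- Result: AK, DE, DC, ND, SD, VT, WY
-- (The order follows scan order here; an order by inside the aggregate would make it explicit.)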

Here’s DuckDB’s output for the full query:

┌──────────┬────────────────────────────┬────────────┬────────────┬──────────────────────┬──────────────┬──────────────────────┐
│ Electors │           states           │ num_states │ population │       pop_pct        │ electors_sum │        ec_pct        │
│  int64   │          varchar           │   int64    │   int128   │        double        │    int128    │        double        │
├──────────┼────────────────────────────┼────────────┼────────────┼──────────────────────┼──────────────┼──────────────────────┤
│        3 │ AK, DE, DC, ND, SD, VT, WY │          7 │    5379033 │ 0.016060895111876104 │           21 │ 0.039033457249070626 │
│        4 │ HI, ID, ME, MT, NH, RI, WV │          7 │   10196485 │ 0.030445003050700387 │           28 │ 0.052044609665427496 │
│        5 │ NE, NM                     │          2 │    4092750 │ 0.012220268674524016 │           10 │  0.01858736059479554 │
│        6 │ AR, IA, KS, MS, NV, UT     │          6 │   18766882 │  0.05603477862637313 │           36 │  0.06691449814126393 │
│        7 │ CT, OK                     │          2 │    7671000 │  0.02290432618710494 │           14 │ 0.026022304832713755 │
│        8 │ KY, LA, OR                 │          3 │   13333261 │  0.03981089285383978 │           24 │  0.04460966542750929 │
│        9 │ AL, SC                     │          2 │   10482023 │ 0.031297571880163765 │           18 │  0.03345724907063197 │
│       10 │ CO, MD, MN, MO, WI         │          5 │   29902889 │  0.08928503762127392 │           50 │   0.0929368029739777 │
│       11 │ AZ, IN, MA, TN             │          4 │   28421431 │  0.08486165119649276 │           44 │  0.08178438661710037 │
│       12 │ WA                         │          1 │    7812880 │ 0.023327956196155443 │           12 │ 0.022304832713754646 │
│       13 │ VA                         │          1 │    8715698 │ 0.026023620119971076 │           13 │ 0.024163568773234202 │
│       14 │ NJ                         │          1 │    9290841 │  0.02774090116236843 │           14 │ 0.026022304832713755 │
│       15 │ MI                         │          1 │   10037261 │ 0.029969586751284978 │           15 │ 0.027881040892193308 │
│       16 │ GA, NC                     │          2 │   21864718 │  0.06528440008617711 │           32 │  0.05947955390334572 │
│       17 │ OH                         │          1 │   11785935 │  0.03519083557033198 │           17 │ 0.031598513011152414 │
│       19 │ IL, PA                     │          2 │   25511372 │    0.076172700530384 │           38 │  0.07063197026022305 │
│       28 │ NY                         │          1 │   19571216 │  0.05843638575704434 │           28 │  0.05204460966542751 │
│       30 │ FL                         │          1 │   22610726 │  0.06751185551183085 │           30 │ 0.055762081784386616 │
│       40 │ TX                         │          1 │   30503301 │  0.09107776768184646 │           40 │  0.07434944237918216 │
│       54 │ CA                         │          1 │   38965193 │  0.11634356543025655 │           54 │  0.10037174721189591 │
├──────────┴────────────────────────────┴────────────┴────────────┴──────────────────────┴──────────────┴──────────────────────┤
│ 20 rows                                                                                                            7 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Finally, I rewrote the query to cut out the import step and select just the columns for the final table. For some reason, when doing it this way the table was no longer sorted by the electors column, even though it had been in the two-step query before. That can be remedied with order by electors:

with totals as (
        -- Sum population and electors for all states
     select sum(population) as total_pop,
            sum(electors) as total_electors
       from "states.csv"
)

   select electors as "Electors",
          string_agg(abbrev, ', ') as "States",
          sum(population/total_pop) as "Pop Pct",
          sum(electors/total_electors) as "EC Pct"

     from "states.csv",
          totals
 group by electors
 order by electors;

This gets closer, but it’s not Markdown yet:

┌──────────┬────────────────────────────┬──────────────────────┬──────────────────────┐
│ Electors │           States           │       Pop Pct        │        EC Pct        │
│  int64   │          varchar           │        double        │        double        │
├──────────┼────────────────────────────┼──────────────────────┼──────────────────────┤
│        3 │ AK, DE, DC, ND, SD, VT, WY │ 0.016060895111876104 │ 0.039033457249070626 │
│        4 │ HI, ID, ME, MT, NH, RI, WV │ 0.030445003050700387 │ 0.052044609665427496 │
│        5 │ NE, NM                     │ 0.012220268674524016 │  0.01858736059479554 │
│        6 │ AR, IA, KS, MS, NV, UT     │  0.05603477862637313 │  0.06691449814126393 │
│        7 │ CT, OK                     │  0.02290432618710494 │ 0.026022304832713755 │
│        8 │ KY, LA, OR                 │  0.03981089285383978 │  0.04460966542750929 │
│        9 │ AL, SC                     │ 0.031297571880163765 │  0.03345724907063197 │
│       10 │ CO, MD, MN, MO, WI         │  0.08928503762127392 │   0.0929368029739777 │
│       11 │ AZ, IN, MA, TN             │  0.08486165119649276 │  0.08178438661710037 │
│       12 │ WA                         │ 0.023327956196155443 │ 0.022304832713754646 │
│       13 │ VA                         │ 0.026023620119971076 │ 0.024163568773234202 │
│       14 │ NJ                         │  0.02774090116236843 │ 0.026022304832713755 │
│       15 │ MI                         │ 0.029969586751284978 │ 0.027881040892193308 │
│       16 │ GA, NC                     │  0.06528440008617711 │  0.05947955390334572 │
│       17 │ OH                         │  0.03519083557033198 │ 0.031598513011152414 │
│       19 │ IL, PA                     │    0.076172700530384 │  0.07063197026022305 │
│       28 │ NY                         │  0.05843638575704434 │  0.05204460966542751 │
│       30 │ FL                         │  0.06751185551183085 │ 0.055762081784386616 │
│       40 │ TX                         │  0.09107776768184646 │  0.07434944237918216 │
│       54 │ CA                         │  0.11634356543025655 │  0.10037174721189591 │
├──────────┴────────────────────────────┴──────────────────────┴──────────────────────┤
│ 20 rows                                                                   4 columns │
└─────────────────────────────────────────────────────────────────────────────────────┘

To get it into a Markdown table, we can run this query from Python, convert it to a Pandas dataframe, and run that through the to_markdown function.

import duckdb
import pandas

df = duckdb.sql("""
    with totals as (
         select sum(population) as total_pop,
                sum(electors) as total_electors
           from 'states.csv'
    )
       select electors as 'Electors',
              string_agg(abbrev, ', ') as 'States',
              sum(population/total_pop) as 'Pop Pct',
              sum(electors/total_electors) as 'EC Pct'
         from 'states.csv',
              totals
     group by electors
     order by electors;
""").df()

print(df.to_markdown(
    index=False,
    floatfmt='.2%',
    colalign=['center', 'center', 'right', 'right']
))

I’ve made two tweaks to the to_markdown call from Dr. Drang’s code. First, it was adding an index column by default, so I’ve disabled that. Second, since I’ve already renamed the columns in SQL, the headers parameter is no longer needed.

This code combines everything: reading from the CSV, transforming it into the final summary table, and converting the results (via Pandas) to Markdown. So all we have to do is run this script in the same working directory as states.csv.

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |

Success!


  1. Joke’s on me: writing this post and updating my website to handle all these tables and code blocks nicely ended up being way more involved than solving the original problem. ↩︎

  2. Some Pandas, but mostly Snowpark Python, which is Pandas-inspired but native to Snowflake, the data warehouse we use at work. ↩︎

  3. To pick this apart a bit: I could read the contents of the file with select * from "states.csv", which gives me every column (and row) of the data. DuckDB has a convenient syntax for those select * queries—namely, you can drop the select * and just type from "states.csv". Here I’m taking it a step further and creating a table from the results of that query. ↩︎

Three Sixty-Five

When I wrote back in January I was just embarking on a 365. I planned to take a photo a day for the duration of the year, publishing them to Glass as I went. Somehow, the year is already drawing to a close—and as I write I have one photo left to take.

I’m glad I decided to take the plunge, and I’m glad it’s over. This was a drastic shift from only posting photos from time to time, and only sharing the ones I was reasonably proud of. It felt like a tilt towards quantity over quality. I remember approaching 100 days and wondering how I was going to keep it up for another 265. After the novelty of the first couple weeks had worn off, the next couple months were the drought. I regularly found myself camera in hand after 11pm, trying to find a photo of something around the house so I could just go to bed. The real low point was a couple sad slices of Domino’s pizza.

As I pushed through, though, things improved. I got more diligent about taking photos earlier in the day. Even when I did end up shooting things around the house in the evening, once I got past the obvious stuff, I started finding more interesting subjects, lighting, and camera angles. Or sometimes I just went for a late-night walk around the neighborhood.

When possible, I found inspiration in Glass’s monthly categories. In lieu of hashtags, there is a list of available categories, up to three of which can be added to each photo. Each month the Glass team adds and highlights a new one, which becomes a fun focal point for the community. Given the abundance of street cats here, I expected to take more cat photos than I did when they made April the month of the cat—but I did get featured among the staff favorites at the end of the month, and I’ve continued to add to the ranks of the cats of Glass as the year has gone on.

Overall, I think my average quality towards the end of the year was markedly better than the first part. The practice of carrying a camera almost everywhere and looking for photos in the everyday began to reinforce itself. Getting a lightweight, weather-resistant pancake lens made a huge difference to the ease of keeping a camera on hand. This past month I’ve also been using my new iPhone camera a lot, which has paired well with Glass’s new Everyday category for December.

I’m looking forward to putting this project behind me while taking some of the gains with me. I’m looking forward to feeling the freedom to be more selective in what I post, and to publishing photos from the archives—especially finding good ones when new monthly categories are added. On the other hand, I hope to settle into more frequent sharing and more everyday/street photography than my pre-2023 average. In short, I hope the stretching I have felt through this project is at least somewhat inelastic.