You can't unit test for taste

By Karl Tryggvason | Developer Blog | Published: 22 June 2026 | Reading Time: ~12 mins

I am currently developing In the Long Run, an application where athletes can participate in virtual runs along the world's most iconic routes. The system tracks your actual Strava mileage and maps that progress against massive, continent-spanning paths.

The core philosophy is simple:

"Life is a marathon, not a sprint."

The goal is to foster long-term motivation. Even if you have a sluggish season or a bad month, you are still incrementally traversing the globe. While the app already features interactive maps for exploration, I wanted to elevate the experience by integrating Points of Interest (POIs)—historical landmarks and scenic sights.

While I could manually curate lists for routes I know personally, that approach ~~scales poorly~~ is impossible for global routes. I needed an automated data pipeline. In the process, I encountered the friction between algorithmic data and human "taste," and dealt with the occasional hallucinations of an LLM.

Interestingly, while I initially thought AI would be the "star" feature, it ended up as a supporting actor to traditional data processing and signal filtering.

🛠️ Dataset and Toolset

I started with GeoNames, a comprehensive, Creative Commons-licensed database of global locations and categories. To transform these raw dumps into a user-facing feature, I collaborated with my AI assistant, Claude.

The Technical Stack

Language: Python (chosen for its robust library ecosystem).
Storage: Apache Parquet (for efficient local data storage).
Query Engine: DuckDB (providing a high-performance SQL layer).

Since this was my first time using Parquet and DuckDB, I let Claude guide me through their features. I generally believe that introducing one or two new technologies per project is the optimal way to learn. If the entire stack is unfamiliar, the learning curve becomes a wall, potentially killing the project's momentum.

While AI coding agents shift this dynamic, I still find that having a foundational understanding of the tools allows me to steer the agent effectively rather than following its suggestions blindly.

Example: A runner encountering a POI near Springfield, Illinois.

The Workflow Strategy

I used a structured approach to avoid the "context drift" that often plagues long AI sessions:

Define a high-level project plan.
Create detailed specs for individual pipeline steps.
Iterate on specs based on findings from previous steps.
Start a fresh agent session for every milestone.

By condensing the results of one milestone into a brief set of instructions for the next, I maintained higher response quality and avoided the degradation that occurs with massive context windows.

📍 Notability and the Bias of Data

The first phase involved downloading the GeoNames files and ensuring large data dumps were added to .gitignore. The initial processing logic was as follows:

Join the raw files on common columns.
Filter out irrelevant data (e.g., administrative divisions like states or regions).
Select specific feature codes: castles, monuments, mountains, parks, and historic sites.
Apply thresholds: Population minimums for towns and elevation minimums for peaks.

While this likely created some false negatives, it provided a solid baseline.
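The filter-and-threshold steps above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the DuckDB SQL the pipeline actually runs, so it works without extra dependencies. The feature classes and codes (`A`, `P`, `CSTL`, `MNMT`, `MT`, `PRK`, `HSTS`) follow the GeoNames documentation, but the dictionary keys, the function name `is_candidate`, and both threshold values are assumptions made for the example, and the sample rows are made up.

```python
# Feature codes as documented by GeoNames: castle, monument, mountain,
# park, historical site.
SIGHT_CODES = {"CSTL", "MNMT", "MT", "PRK", "HSTS"}

# Hypothetical thresholds; the post does not state the real values.
MIN_TOWN_POPULATION = 5_000
MIN_PEAK_ELEVATION_M = 1_000


def is_candidate(row: dict) -> bool:
    """Return True if a GeoNames-style row should stay in the candidate set."""
    feature_class = row["feature_class"]
    feature_code = row["feature_code"]

    if feature_class == "A":  # administrative divisions (states, regions)
        return False
    if feature_class == "P":  # populated places: apply the population floor
        return (row["population"] or 0) >= MIN_TOWN_POPULATION
    if feature_code == "MT":  # peaks: apply the elevation floor
        return (row["elevation"] or 0) >= MIN_PEAK_ELEVATION_M
    return feature_code in SIGHT_CODES


# Made-up rows for illustration only; elevation may be missing (None).
rows = [
    {"name": "Example Region", "feature_class": "A", "feature_code": "ADM1",
     "population": 2_000_000, "elevation": None},
    {"name": "Example Town", "feature_class": "P", "feature_code": "PPL",
     "population": 12_000, "elevation": None},
    {"name": "Example Hamlet", "feature_class": "P", "feature_code": "PPL",
     "population": 300, "elevation": None},
    {"name": "Example Castle", "feature_class": "S", "feature_code": "CSTL",
     "population": 0, "elevation": None},
    {"name": "Example Peak", "feature_class": "T", "feature_code": "MT",
     "population": 0, "elevation": 2_400},
    {"name": "Example Hill", "feature_class": "T", "feature_code": "MT",
     "population": 0, "elevation": 150},
]

candidates = [row["name"] for row in rows if is_candidate(row)]
print(candidates)
```

Rows with a missing elevation are treated as 0 here, which is one way a legitimate peak could be dropped as a false negative.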

The "Wikipedia Signal"

I discovered that the alternateNames.txt file contains Wikipedia links (specifically, rows where isolanguage=link and the name matches %en.wikipedia.org%). Although this feels like a "bolted-on" schema choice by GeoNames, it is an incredibly powerful signal for notability. Since Wikipedia summaries are also Creative Commons-licensed, they served as the source for our POI blurbs.
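As a rough sketch of how that signal can be used, the snippet below keeps only places that have an English Wikipedia link among their alternate names. It is plain Python standing in for the SQL join, and the dictionary keys, function names, IDs, and URLs are all hypothetical; only the `isolanguage == "link"` convention and the `en.wikipedia.org` match come from the description above.

```python
def english_wikipedia_links(alt_name_rows):
    """Map geonameid -> first English Wikipedia URL found in the rows."""
    links = {}
    for row in alt_name_rows:
        is_link = row["isolanguage"] == "link"
        is_english_wiki = "en.wikipedia.org" in row["alternate_name"]
        if is_link and is_english_wiki:
            # Keep the first link per place; later duplicates are ignored.
            links.setdefault(row["geonameid"], row["alternate_name"])
    return links


def attach_links(places, links):
    """Inner join: keep only places that have an English Wikipedia link."""
    return [
        {**place, "wikipedia_url": links[place["geonameid"]]}
        for place in places
        if place["geonameid"] in links
    ]


# Made-up rows for illustration only.
places = [
    {"geonameid": 1, "name": "Example Castle"},
    {"geonameid": 2, "name": "Example Monument"},
    {"geonameid": 3, "name": "Example Park"},
]
alt_name_rows = [
    {"geonameid": 1, "isolanguage": "link",
     "alternate_name": "https://en.wikipedia.org/wiki/Example_Castle"},
    {"geonameid": 2, "isolanguage": "link",
     "alternate_name": "https://de.wikipedia.org/wiki/Beispiel_Denkmal"},
    {"geonameid": 3, "isolanguage": "en",
     "alternate_name": "Example Park"},
]

notable = attach_links(places, english_wikipedia_links(alt_name_rows))
print([place["name"] for place in notable])
```

Only the first place survives: the second has a link but not an English one, and the third has an English name but no link row.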

The "Taste" Problem: During sanity checks, I noticed a bias. For example, the pipeline initially flagged a rural locality in Australia called Stonehenge but missed the actual prehistoric megaliths in England. To fix this, I had to ensure the pipeline correctly handled:

Local language canonical names.
Cross-referencing the correct English Wikipedia URLs.

The reduction in dataset size was significant: $13{,}000{,}000 \text{ rows (initial set)} \rightarrow 725{,}000 \text{ rows (filtered set)}$, which leaves roughly 5.6% of the original rows.

🗺️ Spatial Matching and Route Integration

We didn't want every single hamlet to appear; we only wanted POIs relevant to the specific run.

The Pipeline Logic

To implement this, I used Shapely and Pyproj for geospatial calculations. The logic follows a two-step filter:

Bounding Box: A quick rectangular crop to eliminate 99% of irrelevant points.
Distance Check: A precise calculation to ensure the POI is within a specific radius (defaulting to $50\,\text{km}$) of the route coordinates.
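The two-step filter can be illustrated with a dependency-free sketch. Note the substitution: instead of Shapely and Pyproj, it uses a padded latitude/longitude box and the haversine great-circle formula from the standard library. The function names, dictionary keys, and sample points are hypothetical, and the sketch measures distance to route vertices only, so it assumes the route is densely sampled and does not cross the antimeridian or the poles.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius
KM_PER_DEG_LAT = 111.32       # approximate length of one degree of latitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def padded_bbox(route, radius_km):
    """Bounding box of the route, grown by roughly radius_km on each side."""
    lats = [lat for lat, _ in route]
    lons = [lon for _, lon in route]
    lat_pad = radius_km / KM_PER_DEG_LAT
    # Degrees of longitude shrink toward the poles, so size the padding
    # at the box latitude farthest from the equator (capped at 89 degrees).
    max_abs_lat = min(max(abs(min(lats) - lat_pad), abs(max(lats) + lat_pad)), 89.0)
    lon_pad = radius_km / (KM_PER_DEG_LAT * math.cos(math.radians(max_abs_lat)))
    return (min(lats) - lat_pad, min(lons) - lon_pad,
            max(lats) + lat_pad, max(lons) + lon_pad)


def pois_near_route(pois, route, radius_km=50.0):
    """Step 1: cheap bounding-box crop. Step 2: exact distance check."""
    south, west, north, east = padded_bbox(route, radius_km)
    in_box = [p for p in pois
              if south <= p["lat"] <= north and west <= p["lon"] <= east]
    return [p for p in in_box
            if any(haversine_km(p["lat"], p["lon"], lat, lon) <= radius_km
                   for lat, lon in route)]


# Made-up route and POIs for illustration only.
route = [(0.0, 0.0), (0.0, 1.0)]
pois = [
    {"name": "near", "lat": 0.2, "lon": 0.9},               # inside the radius
    {"name": "box corner", "lat": 0.44, "lon": -0.44},      # in box, too far
    {"name": "far away", "lat": 10.0, "lon": 10.0},         # outside the box
]

print([p["name"] for p in pois_near_route(pois, route)])
```

The box does the cheap rejection, and the "box corner" point shows why the second step is still needed: it passes the rectangular crop but lies more than 50 km from every route vertex.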

Results by Route

The output is a route-specific Parquet file. The variance in POI density was striking:

Route	Length (km)	POI Count
Iceland Ring Road	1,321	511
Cape Town to Magadan	23,257	10,000
Route 66	3,787	14,181

This disparity was the first clear indicator that our "notability signal" (English Wikipedia) was heavily biased toward Anglophone regions.
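Normalizing the counts by route length makes the gap easier to see. The short calculation below uses only the figures from the table; the helper name `pois_per_100_km` is just a label chosen for this example.

```python
# (route name, length in km, POI count), copied from the table above.
routes = [
    ("Iceland Ring Road", 1321, 511),
    ("Cape Town to Magadan", 23257, 10000),
    ("Route 66", 3787, 14181),
]


def pois_per_100_km(length_km, poi_count):
    """POI density expressed as points per 100 km of route."""
    return poi_count / length_km * 100


for name, length_km, poi_count in routes:
    print(f"{name}: {pois_per_100_km(length_km, poi_count):.1f} POIs per 100 km")

# Iceland Ring Road: 38.7 POIs per 100 km
# Cape Town to Magadan: 43.0 POIs per 100 km
# Route 66: 374.5 POIs per 100 km
```

Per kilometre, Route 66 comes out roughly nine times denser than either of the other two routes.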