Determining the Optimal Dog Treat via Statistical Analysis

By Adam Wespiser | Published June 19, 2026

Meet Bebop, an 83 lb, 33-inch-tall Greyhound with three primary passions: sprinting, shadowing me around the house, and consuming treats. Whether it is a dedicated chew, a slice of pizza snatched from an unsuspecting child at a party, or a stray tray of cat food, Bebop possesses both the olfactory precision and the athletic prowess to secure his prize.

Despite years of observation, I realized I lacked a definitive answer to a crucial question: What is his absolute favorite snack? Since Bebop cannot communicate his preferences verbally, I turned to mathematics.

The Theoretical Framework: The Bradley-Terry Model

To quantify "treat strength," I utilized the Bradley-Terry model, which derives a global ranking from a series of pairwise comparisons.

The Mathematics of Preference

The model assigns a positive strength score, $p_i$, to each competitor (treat) $i$. The probability that treat $i$ is chosen over treat $j$ is expressed as:

$Pr(i > j) = \frac{p_i}{p_i + p_j}$

Equivalently, if we write each strength on a log scale as $p_i = e^{\beta_i}$, the formula becomes:

$Pr(i > j) = \frac{e^{\beta_i}}{e^{\beta_i} + e^{\beta_j}}$

Essentially, the log-odds of treat $i$ beating treat $j$ equal the difference in their latent log-strengths, $\beta_i - \beta_j$.
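The relationship can be checked numerically. The sketch below is illustrative only: the function names `bt_win_probability` and `log_odds` are made up for this example, and the scores passed in are arbitrary rather than fitted values.

```python
import math


def bt_win_probability(beta_i: float, beta_j: float) -> float:
    """Probability that item i beats item j under Bradley-Terry.

    Algebraically the same as e^beta_i / (e^beta_i + e^beta_j),
    rewritten as a logistic function of the score difference.
    """
    return 1.0 / (1.0 + math.exp(-(beta_i - beta_j)))


def log_odds(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Arbitrary example scores, not fitted to any data.
    p = bt_win_probability(1.2, 0.5)
    print(p, log_odds(p))  # the log-odds recover the difference 1.2 - 0.5
```

Only the difference between the two scores matters, so adding the same constant to every $\beta$ leaves all win probabilities unchanged.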

Bradley-Terry vs. Elo

This logic is closely related to Elo ratings. In Elo, the probability is calculated as:

$Pr(i > j) = \frac{10^{R_i/400}}{10^{R_i/400} + 10^{R_j/400}}$
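A short derivation, included here as a sketch, makes the link explicit. Setting $p_i = 10^{R_i/400}$ turns the Elo formula into the Bradley-Terry formula, so Elo ratings and log-strengths differ only by a scale factor:

$\beta_i = \ln p_i = \frac{\ln 10}{400} R_i \approx 0.00576 \, R_i$

For example, a 400-point rating gap corresponds to $p_i / p_j = 10$, which gives the stronger side a win probability of $10/11 \approx 90.9\%$.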

However, Elo is designed for online updates. After a match, a rating $R_A$ is adjusted based on the gap between the actual outcome ($S_A$) and the expected outcome ($E_A$):

$R_A' = R_A + K(S_A - E_A)$

$S_A$: Actual score (1 for win, 0.5 for draw, 0 for loss).
$E_A$: Expected score, i.e. the win probability given by the formula above.
$K$: A constant determining the magnitude of the rating shift.
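To make the update rule concrete, here is a minimal sketch of a single Elo update. The function names and the default of `k=32` are assumptions chosen for illustration (32 is a commonly used value); they are not part of the treat experiment.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo formula."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, s_a: float, k: float = 32.0):
    """Return updated (r_a, r_b) after one match.

    s_a is A's actual score: 1 for a win, 0.5 for a draw, 0 for a loss.
    k is an assumed constant; larger values make ratings move faster.
    """
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (s_a - e_a)
    # B's actual and expected scores are the complements of A's.
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated players; A wins.
    print(elo_update(1500.0, 1500.0, 1.0))
```

With equal ratings the expected score is 0.5, so a win moves the winner up by half of `k` and the loser down by the same amount.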

While Elo is ideal for chess, where new games keep arriving and ratings must be updated after each one, the Bradley-Terry model is better suited to small, fixed datasets where we can fit the model all at once.
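As a rough illustration of what "fitting all at once" can look like, the sketch below estimates Bradley-Terry strengths with the classic iterative (minorization-maximization) update. This is one standard way to fit the model, not necessarily the method used for Bebop's data; the function name `fit_bt_strengths` and the `(winner, loser)` input format are assumptions made for this example.

```python
from collections import Counter


def fit_bt_strengths(trials, iterations=500):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumes winner != loser in every trial (same-treat trials must be
    filtered out first). Returns a dict of strengths that sum to 1.
    """
    wins = Counter(winner for winner, _ in trials)
    pair_counts = Counter(frozenset(t) for t in trials)  # games per pairing
    items = sorted({x for t in trials for x in t})
    p = {i: 1.0 / len(items) for i in items}

    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum of n_ij / (p_i + p_j) over every opponent j that i faced.
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (p[i] + p[j])
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        p = {i: v / total for i, v in updated.items()}
    return p


if __name__ == "__main__":
    # Made-up data: X beats Y three times, Y beats X once.
    print(fit_bt_strengths([("X", "Y")] * 3 + [("Y", "X")]))
```

A treat that never wins ends up with strength 0, and the estimates are only meaningful when the treats are reasonably well connected by comparisons, which is worth keeping in mind for a dataset this small.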

Key Takeaway: Bradley-Terry is the go-to solution when you require a global ranking but only possess head-to-head comparison data.

Pairwise rating schemes of this kind have appeared in high-profile contexts, such as the "Facemash" algorithm depicted in The Social Network and the Chatbot Arena leaderboard for ranking LLMs.

The Experimental Design

The goal was simple: present a variety of treats in pairs and record the winner.

Methodology

Timing: Conducted daily at approximately 11:00 PM.
Procedure: Select two treats $\rightarrow$ say the word "choice" $\rightarrow$ present one in each hand.
Constraint: Bebop may only take one; the other returns to the bag.
Conditioning: Bebop was trained to sniff both options before selecting.

Treat Selection

I chose a mix of established favorites (e.g., Greenies) and various formats found on Amazon.

Treat ID	Description	Note
Treat A	MON2SUN Duck + Rawhide	[Amazon Link]
Treat B	Greenies	Historical favorite
Treat C	Pork Chomps	New addition
Treat D	Various	Amazon find
Treat E	Various	Amazon find

Note: I ignored size differences to avoid the tedious task of weighing and cutting treats. To mitigate "hunger bias," trials occurred two hours after dinner.

Data Collection & Refinement

I maintained a strict schedule of head-to-head matchups. For example, a single day of trials might look like this:

Trial	Left Hand	Right Hand	Winner
1	Treat C	Treat B	B
2	Treat E	Treat B	E
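One possible way to record rows like these is sketched below. The `Trial` type and its field names are hypothetical, introduced only for illustration; the two rows are the ones from the table above.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    left: str    # treat offered in the left hand
    right: str   # treat offered in the right hand
    winner: str  # treat that was taken

    @property
    def loser(self) -> str:
        return self.right if self.winner == self.left else self.left

    @property
    def chosen_side(self) -> str:
        # Only meaningful when the two hands hold different treats;
        # same-treat trials would need the side recorded separately.
        return "left" if self.winner == self.left else "right"


# The two example rows from the table.
day = [Trial("C", "B", "B"), Trial("E", "B", "E")]

# (winner, loser) pairs are the usual input format for a Bradley-Terry fit.
pairs = [(t.winner, t.loser) for t in day]
print(pairs)
```

Keeping the hand assignment alongside the winner means the same records can later be used to check for a side preference.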

Optimizing the Sample

Halfway through, it became evident that Treat C (Pork Chomps) and Treat B (Greenies) were consistently losing. To gather more data on the remaining contenders, I skipped planned trials involving B or C and added more matchups among A, D, and E.
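The narrowed schedule can be generated mechanically. This is a small sketch with a hypothetical helper, `remaining_matchups`, that lists every pairing among the treats still in contention.

```python
from itertools import combinations


def remaining_matchups(treats, dropped):
    """All unordered pairs of treats that involve no dropped treat."""
    dropped = set(dropped)
    return [
        pair for pair in combinations(treats, 2)
        if not dropped.intersection(pair)
    ]


if __name__ == "__main__":
    # Dropping B and C leaves the three pairings among A, D, and E.
    print(remaining_matchups(["A", "B", "C", "D", "E"], ["B", "C"]))
```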

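# Conceptual bootstrap logic for stability.
# `trials` (the recorded matchups) and `fit_bradley_terry` are placeholders.
import random

results = []
for _ in range(1000):
    sample = random.choices(trials, k=len(trials))  # resample with replacement
    model = fit_bradley_terry(sample)
    results.append(model.top_treat)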

Final Results and Analysis

The "Right-Side" Bias

Interestingly, in trials where the same treat was presented in both hands, Bebop consistently chose the one on his right (my left hand).

While this doesn't prove he is "right-pawed" (since I tracked side selection, not paw usage), it indicates a spatial bias. This may have been caused by a window fan on the left side of the kitchen, which introduced an uncontrolled variable.
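One way to judge whether such a side preference could be chance is an exact binomial test against a 50/50 split. The sketch below is illustrative only: the counts in the example are hypothetical, since the exact numbers are not given above, and `two_sided_binomial_p` is a made-up helper name.

```python
from math import comb


def two_sided_binomial_p(successes: int, n: int) -> float:
    """Exact two-sided p-value for H0: each side is chosen with probability 0.5.

    The null distribution is symmetric, so doubling the upper tail of the
    more extreme count gives the exact two-sided value (capped at 1).
    """
    k = max(successes, n - successes)
    tail = sum(comb(n, x) for x in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # HYPOTHETICAL counts: right side chosen in 9 of 10 same-treat trials.
    print(two_sided_binomial_p(9, 10))
```

A small p-value would suggest a real spatial preference, though it would not distinguish between a handedness effect and an environmental cause such as the fan.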

The Winner

Treat E is the current champion. However, the margin is slim:

E vs. A: The head-to-head score is 3–2.
Probability: The model suggests a $57.5\%$ chance of E beating A.
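
As a worked check on what that probability implies, and assuming $57.5\%$ is the fitted value of $Pr(E > A)$, the ratio of the two strengths follows directly from the Bradley-Terry formula:

$\frac{p_E}{p_A} = \frac{Pr(E > A)}{1 - Pr(E > A)} = \frac{0.575}{0.425} \approx 1.35$

$\beta_E - \beta_A = \ln(1.35) \approx 0.30$

The raw 3 to 2 record on its own would give $3/5 = 60\%$; the fitted value differs, presumably because the model also draws on each treat's results against the other treats.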

To determine the absolute winner, future trials should focus almost exclusively on E vs. A.

Bootstrap Confidence

To test the stability of the results, I ran a bootstrap experiment (resampling the data and refitting the model). The frequency with which each treat ranked first was:

Treat E: $63\%$
Treat A: $33\%$
Treat D: $4\%$
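
For readers who want to reproduce this kind of analysis, here is a self-contained sketch of the bootstrap. It is not the code that produced the percentages above: the helper names are hypothetical, the fit uses a standard iterative update, and the example trials are invented purely so that the script runs.

```python
import random
from collections import Counter


def fit_bt_strengths(trials, iterations=100):
    """Bradley-Terry strengths from (winner, loser) pairs, winner != loser."""
    wins = Counter(w for w, _ in trials)
    pair_counts = Counter(frozenset(t) for t in trials)
    items = sorted({x for t in trials for x in t})
    p = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (p[i] + p[j])
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        p = {i: v / total for i, v in updated.items()}
    return p


def bootstrap_top_share(trials, n_resamples=1000, seed=0):
    """Fraction of resamples in which each treat has the highest strength."""
    rng = random.Random(seed)
    firsts = Counter()
    for _ in range(n_resamples):
        # Resample trials with replacement, keeping the sample size fixed.
        sample = rng.choices(trials, k=len(trials))
        strengths = fit_bt_strengths(sample)
        # Ties go to the alphabetically first treat.
        firsts[max(strengths, key=strengths.get)] += 1
    return {treat: count / n_resamples for treat, count in firsts.items()}


if __name__ == "__main__":
    # INVENTED example data, not Bebop's actual results.
    example = (
        [("E", "A")] * 3 + [("A", "E")] * 2
        + [("E", "D")] * 3 + [("D", "E")] * 1
        + [("A", "D")] * 3 + [("D", "A")] * 2
    )
    print(bootstrap_top_share(example, n_resamples=200))
```

With so few trials, individual resamples can leave a treat undefeated or winless, which pushes its fitted strength to an extreme; that instability is part of what the first-place frequencies are measuring.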

Conclusion: For now, Treat E reigns supreme.