Finding the Best Dog Treat with Statistics
Determining the Optimal Dog Treat via Statistical Analysis
By Adam Wespiser | Published June 19, 2026
Meet Bebop: an 83lb, 33-inch tall Greyhound with three primary passions: sprinting, shadowing me around the house, and consuming treats. Whether it is a dedicated chew, a slice of pizza snatched from an unsuspecting child at a party, or a stray tray of cat food, Bebop possesses both the olfactory precision and the athletic prowess to secure his prize.
Despite years of observation, I realized I lacked a definitive answer to a crucial question: What is his absolute favorite snack? Since Bebop cannot communicate his preferences verbally, I turned to mathematics.
The Theoretical Framework: The Bradley-Terry Model
To quantify "treat strength," I utilized the Bradley-Terry model, which derives a global ranking from a series of pairwise comparisons.
The Mathematics of Preference
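The model assigns a positive strength score, $p_i$, to each competitor (treat) $i$. The probability that treat $i$ is chosen over treat $j$ is expressed as:
$$P(i \succ j) = \frac{p_i}{p_i + p_j}$$
Alternatively, if we represent strength as an exponential score where $p_i = e^{\beta_i}$, the formula becomes:
$$P(i \succ j) = \frac{e^{\beta_i}}{e^{\beta_i} + e^{\beta_j}} = \frac{1}{1 + e^{-(\beta_i - \beta_j)}}$$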
Essentially, the log-odds of one treat beating another are determined by the difference in their latent strengths.
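As a quick illustration of the two equivalent forms, here is a minimal Python sketch. The function names and the strengths 3.0 and 1.0 are made up for the example; they are not Bebop's fitted values.

```python
import math


def win_probability(p_i: float, p_j: float) -> float:
    """Bradley-Terry probability that item i is chosen over item j."""
    return p_i / (p_i + p_j)


def win_probability_from_scores(beta_i: float, beta_j: float) -> float:
    """Same probability on the log scale: a logistic function of beta_i - beta_j."""
    return 1.0 / (1.0 + math.exp(-(beta_i - beta_j)))


# Hypothetical strengths, chosen only for illustration.
p_a, p_b = 3.0, 1.0

prob = win_probability(p_a, p_b)  # 3 / (3 + 1) = 0.75
same = win_probability_from_scores(math.log(p_a), math.log(p_b))

# The log-odds equal the difference of the log-strengths.
log_odds = math.log(prob / (1.0 - prob))
difference = math.log(p_a) - math.log(p_b)
```

Both functions return the same probability, and `log_odds` matches `difference` up to floating-point error, which is the "difference in latent strengths" statement in numeric form.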
Bradley-Terry vs. Elo
This logic is closely related to Elo ratings. In Elo, the probability that player A beats player B is calculated as:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$
However, Elo is designed for online updates. After a match, a rating is adjusted based on the gap between the actual outcome ($S_A$) and the expected outcome ($E_A$):
$$R_A' = R_A + K(S_A - E_A)$$
- $S_A$: Actual score (1 for win, 0.5 for draw, 0 for loss).
- $K$: A constant determining the magnitude of the rating shift.
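The update rule can be sketched in a few lines of Python. This assumes the conventional Elo scale of 400 and a K of 32, which is a common choice rather than anything specified in this post; the function names are illustrative.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the conventional 400-point Elo scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_rating(r_a: float, r_b: float, s_a: float, k: float = 32.0) -> float:
    """Return A's new rating after one match; k=32 is an assumed, common value."""
    return r_a + k * (s_a - expected_score(r_a, r_b))


# Two equally rated players: the expected score is 0.5,
# so a win (s_a = 1) moves A up by k * 0.5 = 16 points.
new_rating = update_rating(1500.0, 1500.0, 1.0)
```

Because each match changes the rating immediately, the final Elo numbers depend on the order in which matches were played, which is one reason a batch fit is attractive for a small fixed dataset.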
While Elo is ideal for chess (where games keep arriving and ratings must be updated after each one), the Bradley-Terry model is better suited to small, static datasets where we can fit the model all at once.
Key Takeaway: Bradley-Terry is the go-to solution when you require a global ranking but only possess head-to-head comparison data.
Pairwise-comparison ranking of this kind has been used in high-profile contexts, such as the "Facemash" algorithm depicted in The Social Network and the current Chatbot Arena for ranking LLM performance.
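To make "fit the model all at once" concrete, here is a minimal sketch of one standard way to estimate Bradley-Terry strengths: the iterative minorization-maximization (MM) update. This is not necessarily the fitting routine used for the results in this post, and the trial list is invented for illustration.

```python
from collections import Counter


def fit_bradley_terry(trials, iterations=200):
    """Estimate strengths from (winner, loser) pairs with the MM update.

    Assumes winner != loser in every trial. Items that never win end up
    with strength 0; items that never lose absorb nearly all the mass.
    """
    items = sorted({item for pair in trials for item in pair})
    wins = Counter(winner for winner, _ in trials)
    games = Counter(frozenset(pair) for pair in trials)  # comparisons per pair
    strengths = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum only over opponents that i actually faced.
            denom = sum(
                games[frozenset((i, j))] / (strengths[i] + strengths[j])
                for j in items
                if j != i and games[frozenset((i, j))] > 0
            )
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strengths = {i: s / total for i, s in updated.items()}  # normalise to sum 1
    return strengths


# Hypothetical (winner, loser) records, not Bebop's data.
trials = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"), ("C", "A"),
    ("B", "C"), ("C", "B"),
]
strengths = fit_bradley_terry(trials)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

In this toy dataset A wins two of three against each rival while B and C split their games, so the fit converges to roughly 0.5 for A and 0.25 each for B and C. Unlike Elo, the result does not depend on the order of the trials.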
The Experimental Design
The goal was simple: present a variety of treats in pairs and record the winner.
Methodology
- Timing: Conducted daily at approximately 11:00 PM.
- Procedure: Select two treats, say the word "choice," and present one in each hand.
- Constraint: Bebop may only take one; the other returns to the bag.
- Conditioning: Bebop was trained to sniff both options before selecting.
Treat Selection
I chose a mix of established favorites (e.g., Greenies) and various formats found on Amazon.
| Treat ID | Description | Note |
|---|---|---|
| Treat A | MON2SUN Duck + Rawhide | [Amazon Link] |
| Treat B | Greenies | Historical favorite |
| Treat C | Pork Chomps | New addition |
| Treat D | Various | Amazon find |
| Treat E | Various | Amazon find |
Note: I ignored size differences to avoid the tedious task of weighing and cutting treats. To mitigate "hunger bias," trials occurred two hours after dinner.
Data Collection & Refinement
I maintained a strict schedule of head-to-head matchups. For example, a single day of trials might look like this:
| Trial | Left Hand | Right Hand | Winner |
|---|---|---|---|
| 1 | Treat C | Treat B | B |
| 2 | Treat E | Treat B | E |
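One way to hold trial records like these in code is a small record type plus a couple of tallies. The class and field names below are my own illustration, not a format the experiment actually used; the two rows are the example trials from the table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    left_hand: str
    right_hand: str
    winner: str

    @property
    def loser(self) -> str:
        return self.right_hand if self.winner == self.left_hand else self.left_hand

    @property
    def chosen_hand(self) -> str:
        # Only meaningful when the two hands hold different treats;
        # same-treat trials would need the chosen side recorded separately.
        return "left" if self.winner == self.left_hand else "right"


# The two example trials from the table above.
trials = [
    Trial(left_hand="C", right_hand="B", winner="B"),
    Trial(left_hand="E", right_hand="B", winner="E"),
]

# (winner, loser) counts are the input a Bradley-Terry fit needs.
pair_wins = Counter((t.winner, t.loser) for t in trials)
# Hand counts make any side preference easy to spot later.
hand_counts = Counter(t.chosen_hand for t in trials)
```

Keeping the hand alongside the winner costs nothing at recording time and is what makes a side-bias check possible afterwards.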
Optimizing the Sample
Halfway through, it became evident that Treat C (Pork Chomps) and Treat B (Greenies) were consistently losing. To increase the statistical power of the remaining contenders, I skipped planned trials involving B or C and added more matchups between A, D, and E.
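```python
# Conceptual bootstrap logic for stability.
# resample() and fit_bradley_terry() are placeholder helpers, not library calls.
results = []
for _ in range(1000):
    sample = resample(trials)          # draw the trials again, with replacement
    model = fit_bradley_terry(sample)  # refit the model on the resampled trials
    results.append(model.top_treat)    # record which treat ranked first
```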
Final Results and Analysis
The "Right-Side" Bias
Interestingly, in trials where the same treat was presented in both hands, Bebop consistently chose the treat on the right side (my left hand).
While this doesn't prove he is "right-pawed" (since I tracked side selection, not paw usage), it indicates a spatial bias. This may have been caused by a window fan on the left side of the kitchen, which introduced an uncontrolled variable.
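A simple way to check whether a side preference like this is more than chance is an exact two-sided binomial test against a 50/50 split. The sketch below uses made-up counts (9 right-side picks in 10 same-treat trials), since the actual counts are not reported here.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided p-value for H0: each side is chosen with probability 0.5."""
    extreme = max(successes, n - successes)
    # Probability of a result at least this lopsided in one direction.
    tail = sum(comb(n, k) for k in range(extreme, n + 1)) / 2 ** n
    # Double it for the two-sided test (valid because p = 0.5 is symmetric).
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only.
right_picks, same_treat_trials = 9, 10
p_value = binomial_two_sided_p(right_picks, same_treat_trials)  # 22 / 1024
```

With these invented numbers the p-value is about 0.02, which would suggest a real side preference. A fuller treatment could add a side term to the Bradley-Terry model itself, much like a home-field advantage parameter.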
The Winner
Treat E is the current champion. However, the margin is slim:
- E vs. A: The head-to-head score is 3–2.
- Probability: The model suggests a chance of E beating A.
To determine the absolute winner, future trials should focus almost exclusively on E vs. A.
Bootstrap Confidence
To test the stability of the results, I ran a bootstrap experiment (resampling the data and refitting the model). The frequency with which each treat ranked first was:
- Treat E:
- Treat A:
- Treat D:
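For readers who want to try the bootstrap themselves, here is a self-contained sketch. It assumes an MM-style Bradley-Terry fit and a fixed random seed. Only the 3–2 split between E and A comes from the results above; every other record in the list is invented, so the printed frequencies will not match the real experiment.

```python
import random
from collections import Counter


def fit_strengths(trials, iterations=50):
    """MM fit of Bradley-Terry strengths from (winner, loser) pairs."""
    items = sorted({item for pair in trials for item in pair})
    wins = Counter(winner for winner, _ in trials)
    games = Counter(frozenset(pair) for pair in trials)
    p = {item: 1.0 for item in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            # Sum only over opponents that i faced in this sample.
            denom = sum(
                games[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i and games[frozenset((i, j))] > 0
            )
            new[i] = wins[i] / denom
        total = sum(new.values())
        p = {i: v / total for i, v in new.items()}
    return p


def bootstrap_top_frequency(trials, n_resamples=1000, seed=0):
    """Fraction of resampled datasets in which each item ranks first."""
    rng = random.Random(seed)
    top = Counter()
    for _ in range(n_resamples):
        sample = rng.choices(trials, k=len(trials))  # resample with replacement
        strengths = fit_strengths(sample)
        top[max(strengths, key=strengths.get)] += 1
    return {item: count / n_resamples for item, count in top.items()}


# E vs. A is 3-2 as reported; all trials involving D are hypothetical.
trials = (
    [("E", "A")] * 3 + [("A", "E")] * 2
    + [("A", "D")] * 2 + [("D", "A")]
    + [("E", "D")] * 2 + [("D", "E")]
)
frequencies = bootstrap_top_frequency(trials)
print(frequencies)
```

With so few trials, the top spot can change hands often between resamples, which is exactly the instability the bootstrap is meant to expose.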
Conclusion: For now, Treat E reigns supreme.