โ† Back to news

Chopped, Stored, Secured โ€“ The Story of the Hash Function

0xkrt26.github.io|32 points|7 comments|by denismenace|Jun 10, 2026

Chopped, Stored, Secured: The Evolution of the Hash Function

While we frequently discuss "hashes" in technical circles, the actual definition of a hash function is often overlooked. Before diving into the modern era, it is worth noting that the foundation was laid by figures like Peter Luhn (verification math), the Birthday Problem (explaining hash collisions), and the Rabin-Karp algorithm (utilizing rolling hashes for string searches).


๐Ÿ“œ The Origins of Indexing

The concept of hashing was first formalized in 1956 by Arnold Dumey. Dumey's journey into this field was driven by a lifelong passion for cryptography (starting at age 14) and a mathematics degree from Columbia University. His professional path took him through the US Army Signal Corps and eventually to Potter Instrument, a company specializing in printers and computer memory engineering.

Reflecting on his contributions in an interview with the Charles Babbage Institute, Dumey noted:

โ€œI wrote a paper, which was the first paper on hash coding, based on the work I did there [at Potter Instrument].โ€

In this seminal work, Dumey introduced "indexing"โ€”a method of using mathematical transformations to map data directly to memory addresses to accelerate search speeds.

๐Ÿงฎ The Math of Early Indexing

To implement this, Dumey suggested converting data into a base-37 number system (or using ASCII). The base-37 choice was practical: 26 English letters + 10 digits + 1 space = 37 symbols.

Example: Hashing the word "BOX"

  1. Assign letters to positions: A=1,B=2,โ€ฆA=1, B=2, \dots
  2. Calculate the numeric value: Value=2ร—372+15ร—371+24ร—370=2738+555+24=3317\text{Value} = 2 \times 37^2 + 15 \times 37^1 + 24 \times 37^0 = 2738 + 555 + 24 = 3317
  3. Select a prime number slightly smaller than the total memory addresses (e.g., 9797).
  4. Apply the modulo operation (keep the remainder, discard the quotient): 3317(mod97)=193317 \pmod{97} = 19

Result: The word BOX is stored at memory address 19. This technique is known today as a polynomial hash.


๐Ÿ”ช From "Indexing" to "Hashing"

The term "hash" is derived from the French word hacher, meaning "to chop" or "to mince." This is a perfect metaphor for how the function processes data before storage. While it existed as jargon among cryptographers for years, it didn't appear in official literature until the late 1960s in Herbert Hellermanโ€™s Digital Computer System Principles.

๐Ÿ› ๏ธ The Hashing Workflow


๐Ÿ” The Shift to Cryptographic Hashing

Modern security requires more than just fast retrieval; it requires protection. A pivotal moment occurred in 1976 with the paper "New Directions in Cryptography" by Whitfield Diffie and Martin Hellman. (This paper is highly recommended as it also introduces RSA).

๐Ÿ”‘ Password Security

Websites do not store your actual password (PW) because that would be a security nightmare. Instead, they store a hash:

  • Registration: User enters PW โ†’\rightarrow Website computes f(PW) โ†’\rightarrow f(PW) is stored.
  • Login: User enters PW โ†’\rightarrow Website computes f(x) โ†’\rightarrow If f(x) == f(PW), access is granted.

Security Concept

Note: Because passwords can be intercepted during transmission, hashing is paired with encryption protocols like TLS to secure data in transit.


๐Ÿ“‰ The Mathematics of Irreversibility

The core of a cryptographic hash is the one-way function. It is computationally impossible to reverse the process. "Computationally" implies that even with massive computing power, the time required to find the original input would span ages.

๐Ÿšซ The "Undo" Problem

In basic algebra, most operations have an inverse:

  • Addition โ†”\leftrightarrow Subtraction
  • Multiplication โ†”\leftrightarrow Division
  • Exponentiation โ†”\leftrightarrow Logarithms

However, some functions lack an "undo button."

1. High-Degree Polynomials

A polynomial of degree nn is expressed as: p(x)=anxn+anโˆ’1xnโˆ’1+โ‹ฏ+a1x+a0p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 While calculating p(x)p(x) is easy (linear multiplication and addition), solving for xx is hard. According to the Abel-Ruffini Theorem:

โ€œThe general polynomial of degree nn is not solvable by radicals for nโ‰ฅ5n \ge 5.โ€

This means there are no shortcuts; an attacker must guess the roots iteratively.

2. Finite Fields (FpF_p or GF(p)GF(p))

To make these functions secure, they operate over a finite field. This means a modulo of a large prime is taken after every operation. y=ax(modq)y = a^x \pmod{q} Calculating yy is simple, but finding xx (the discrete logarithm problem) is incredibly difficult.

๐Ÿ“Š Comparison: Real Fields vs. Finite Fields

FeatureReal Field (R\mathbb{R})Finite Field (FpF_p)
ElementsInfiniteFinite (e.g., 00 to pโˆ’1p-1)
Visual GraphSmooth, predictable curvesChaotic, scattered dots
AnalysisCan be traced/predictedRequires brute force
Attacker Experience"Hot and Cold" gameTotal guesswork

๐Ÿ Conclusion: The Brute Force Wall

When dealing with a finite field, an attacker cannot "narrow in" on the answer. If they are looking for a specific hash, they must check every possible input. For a 256-bit hash, this means checking 22562^{256} possibilities.

Security Checklist for Hash Functions:

  • Irreversible (One-way)
  • Deterministic (Same input = same output)
  • Collision-resistant
  • Computationally expensive to reverse
# Simplified conceptual example of a modulo-based hash
def simple_hash(text, prime_mod=97):
    value = 0
    for char in text:
        value += ord(char) * (37 ** len(text)) # Simplified polynomial
    return value % prime_mod

print(f"Hash of 'BOX': {simple_hash('BOX')}")