Chopped, Stored, Secured: The Evolution of the Hash Function

While we frequently discuss "hashes" in technical circles, the actual definition of a hash function is often overlooked. Before diving into the modern era, it is worth noting that the foundation was laid by figures like Peter Luhn (verification math), the Birthday Problem (explaining hash collisions), and the Rabin-Karp algorithm (utilizing rolling hashes for string searches).

📜 The Origins of Indexing

The concept of hashing was first formalized in 1956 by Arnold Dumey. Dumey's journey into this field was driven by a lifelong passion for cryptography (starting at age 14) and a mathematics degree from Columbia University. His professional path took him through the US Army Signal Corps and eventually to Potter Instrument, a company specializing in printers and computer memory engineering.

Reflecting on his contributions in an interview with the Charles Babbage Institute, Dumey noted:

“I wrote a paper, which was the first paper on hash coding, based on the work I did there [at Potter Instrument].”

In this seminal work, Dumey introduced "indexing"—a method of using mathematical transformations to map data directly to memory addresses to accelerate search speeds.

🧮 The Math of Early Indexing

To implement this, Dumey suggested converting data into a base-37 number system (or using ASCII). The base-37 choice was practical: 26 English letters + 10 digits + 1 space = 37 symbols.

Example: Hashing the word "BOX"

Assign letters to positions: $A=1, B=2, \dots$
Calculate the numeric value: $\text{Value} = 2 \times 37^2 + 15 \times 37^1 + 24 \times 37^0 = 2738 + 555 + 24 = 3317$
Select a prime number slightly smaller than the total memory addresses (e.g., $97$ ).
Apply the modulo operation (keep the remainder, ~~discard the quotient~~): $3317 \pmod{97} = 19$

Result: The word BOX is stored at memory address 19. This technique is known today as a polynomial hash.

🔪 From "Indexing" to "Hashing"

The term "hash" is derived from the French word hacher, meaning "to chop" or "to mince." This is a perfect metaphor for how the function processes data before storage. While it existed as jargon among cryptographers for years, it didn't appear in official literature until the late 1960s in Herbert Hellerman’s Digital Computer System Principles.

🛠️ The Hashing Workflow

🔐 The Shift to Cryptographic Hashing

Modern security requires more than just fast retrieval; it requires protection. A pivotal moment occurred in 1976 with the paper "New Directions in Cryptography" by Whitfield Diffie and Martin Hellman. (This paper is highly recommended as it also introduces RSA).

🔑 Password Security

Websites do not store your actual password (PW) because that would be a security nightmare. Instead, they store a hash:

Registration: User enters PW $\rightarrow$ Website computes f(PW) $\rightarrow$ f(PW) is stored.
Login: User enters PW $\rightarrow$ Website computes f(x) $\rightarrow$ If f(x) == f(PW), access is granted.

Note: Because passwords can be intercepted during transmission, hashing is paired with encryption protocols like TLS to secure data in transit.

📉 The Mathematics of Irreversibility

The core of a cryptographic hash is the one-way function. It is computationally impossible to reverse the process. "Computationally" implies that even with massive computing power, the time required to find the original input would span ages.

🚫 The "Undo" Problem

In basic algebra, most operations have an inverse:

Addition $\leftrightarrow$ Subtraction
Multiplication $\leftrightarrow$ Division
Exponentiation $\leftrightarrow$ Logarithms

However, some functions lack an "undo button."

1. High-Degree Polynomials

A polynomial of degree $n$ is expressed as: $p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$ While calculating $p(x)$ is easy (linear multiplication and addition), solving for $x$ is hard. According to the Abel-Ruffini Theorem:

“The general polynomial of degree $n$ is not solvable by radicals for $n \ge 5$ .”

This means there are no shortcuts; an attacker must guess the roots iteratively.

2. Finite Fields ( $F_p$ or $GF(p)$ )

To make these functions secure, they operate over a finite field. This means a modulo of a large prime is taken after every operation. $y = a^x \pmod{q}$ Calculating $y$ is simple, but finding $x$ (the discrete logarithm problem) is incredibly difficult.

📊 Comparison: Real Fields vs. Finite Fields

Feature	Real Field ( $\mathbb{R}$ )	Finite Field ( $F_p$ )
Elements	Infinite	Finite (e.g., $0$ to $p-1$ )
Visual Graph	Smooth, predictable curves	Chaotic, scattered dots
Analysis	Can be traced/predicted	Requires brute force
Attacker Experience	"Hot and Cold" game	Total guesswork

🏁 Conclusion: The Brute Force Wall

When dealing with a finite field, an attacker cannot "narrow in" on the answer. If they are looking for a specific hash, they must check every possible input. For a 256-bit hash, this means checking $2^{256}$ possibilities.

Security Checklist for Hash Functions:

Irreversible (One-way)
Deterministic (Same input = same output)
Collision-resistant
Computationally expensive to reverse

# Simplified conceptual example of a modulo-based hash
def simple_hash(text, prime_mod=97):
    value = 0
    for char in text:
        value += ord(char) * (37 ** len(text)) # Simplified polynomial
    return value % prime_mod

print(f"Hash of 'BOX': {simple_hash('BOX')}")