Show HN: Microcrad – Micrograd Reimplemented in C

Microcrad is a lightweight, scalar-based automatic differentiation engine written in C. It includes a basic neural network implementation layered on top of the core engine.

Essentially, this project is a C port of Andrej Karpathy's famous micrograd. It is designed specifically for developers and students who want to peel back the curtain and see how backpropagation actually functions under the hood.

🎯 Core Philosophy

Unlike modern deep learning frameworks, microcrad does not use tensors. Instead, it operates entirely on scalars.

"This repository is first and foremost an educational implementation."

Because of this focus, it is ~~a production-ready autograd package~~ not intended for production use, nor is it a practical framework for large-scale deep learning or high numerical robustness.

Key Characteristics:

No Vectorization: No SIMD or batching.
No GPU Acceleration: Everything runs on the CPU.
No Complex Optimizations: It is a pure application of the chain rule.
Scalar-Centric: Every single number is treated as a node in a graph.

🏗️ Technical Architecture

The engine is built upon two fundamental pillars: the Value object and Reference Counting.

1. The `Value` Struct

Every number involved in a calculation is wrapped in a Value struct. If the value is the result of a mathematical operation, it maintains a record of the operands used to create it.

Field	Type	Description
`ref_count`	`uint32_t`	Tracks how many pointers reference this node.
`n_prevs`	`uint32_t`	The number of operand nodes that produced this value.
`prev`	`Value **`	An array of pointers to the preceding nodes.
`magic`	`uint32_t`	A debug "canary" used to detect stale or invalid pointers.

2. The Computation Graph

Because every operation links its output back to its inputs, the resulting structure is a Directed Acyclic Graph (DAG).

3. Automatic Differentiation

To find the gradient, microcrad performs a backward pass. It traverses the graph in reverse, applying the chain rule to compute the derivative of the final output with respect to every input node.

In mathematical terms, for a composition of functions: $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$

💻 Usage Example

Below is a minimal implementation demonstrating how to create values, perform a multiplication, and compute gradients.

// Initialize leaf nodes (inputs/constants)
Value * a = value_create_leaf(2.0);
Value * b = value_create_leaf(3.0);

// Perform operation: c = a * b
Value * c = value_mul(a, b); // Result: 6.0

// Trigger the backward pass to calculate gradients
value_backward(c); 

// Output results
printf("c = %f\n", c->data);       // 6.000000
printf("dc/da = %f\n", a->grad);   // 3.000000 (which is b)
printf("dc/db = %f\n", b->grad);   // 2.000000 (which is a)

// Memory cleanup via reference counting
value_release(c); // Frees c and decrements refs for a and b
value_release(a); // a is now freed
value_release(b); // b is now freed

🛠️ API Reference

Value Creation

Value * value_create(double data, int32_t n_prevs, Value ** prev);
- The general constructor used internally by operations.
Value * value_create_leaf(double data);
- The primary constructor for users to create weights, biases, or inputs.

Mathematical Operations

Each of these functions returns a new Value node and automatically increments the reference counts of the input operands to ensure they aren't deleted prematurely.

value_add(v1, v2) $\rightarrow v_1 + v_2$
value_mul(v1, v2) $\rightarrow v_1 \cdot v_2$
value_pow(b, e) $\rightarrow b^e$
value_exp(v) $\rightarrow e^v$
value_log(v) $\rightarrow \ln(v)$
value_relu(v) $\rightarrow \max(0, v)$

Memory & Gradients

value_backward(Value * v): Computes the gradient of the provided node relative to all its ancestors.
value_release(Value * v): Decrements the reference count; if it hits zero, the memory is freed.

📝 Summary Checklist for Learners

If you are using microcrad to study deep learning, ensure you can answer these:

How does the prev array enable the backward pass?
Why is reference counting necessary for a dynamic computation graph?
How does the chain rule manifest in the value_backward logic?
What is the difference between a "leaf" node and an "operation" node?

Figure 1: Conceptual representation of scalar-based autograd.