Neural Network

Hidden Layers in Neural Networks (Geometric View)

A hidden layer does not produce a single value. It produces a vector of values, one per neuron.

A hidden layer can be written as:

$H (x) = (h_{1} (x), h_{2} (x), \dots, h_{m} (x))$

So instead of mapping an input to a single real number:

$x \to R$ it maps the input to a vector with one entry per neuron: $x \to R^{m}$

Each neuron is a separate function:

$h_{i} (x) = σ (w_{i}^{T} x + b_{i})$

So:

each neuron has its own weights $w_{i}$ and bias $b_{i}$
each neuron defines its own hyperplane
each neuron detects a different pattern
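
The per-neuron definitions can be stacked into one matrix expression. This is a sketch using notation that is not in the original: $W$ is the matrix whose $i$-th row is $w_{i}^{T}$, $b$ is the vector of biases, $n$ is the input dimension, and $σ$ is applied elementwise.

$H (x) = σ (W x + b), \quad W \in R^{m \times n}, \quad b \in R^{m}$

Entry $i$ of $W x + b$ is exactly $w_{i}^{T} x + b_{i}$, so entry $i$ of $H (x)$ is $h_{i} (x)$.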

Example in 2D input space

Let $x = (x_{1}, x_{2})$ and a hidden layer with 3 neurons.

Neuron 1

$h_{1} (x) = σ (x_{1} + x_{2} - 1)$

Detects:

whether the point lies above the line $x_{1} + x_{2} = 1$

Neuron 2

$h_{2} (x) = σ (x_{1} - x_{2})$

Detects:

which side of the line $x_{1} = x_{2}$ the point lies on

Neuron 3

$h_{3} (x) = σ (- x_{1} + 2 x_{2})$

Detects:

which side of the line $x_{1} = 2 x_{2}$ the point lies on

The layer output is:

$H (x) = (h_{1} (x), h_{2} (x), h_{3} (x))$

So instead of the original coordinates $(x_{1}, x_{2})$, the network now works with a transformed representation, a new set of coordinates:

$(h_{1}, h_{2}, h_{3})$

where each coordinate means:

“how strongly feature detector 1 activates”
“how strongly feature detector 2 activates”
“how strongly feature detector 3 activates”
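
The three-neuron example can be evaluated directly. The sketch below assumes $σ$ is the logistic sigmoid (the text does not say which activation it uses). The names `W`, `b`, `hidden_layer`, and the test point are illustrative choices, not part of the original.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid; an assumption, since the text only writes a generic sigma.
    return 1.0 / (1.0 + np.exp(-z))

# Each row holds the weight vector w_i of one example neuron.
W = np.array([[ 1.0,  1.0],   # neuron 1:  x1 + x2 - 1
              [ 1.0, -1.0],   # neuron 2:  x1 - x2
              [-1.0,  2.0]])  # neuron 3: -x1 + 2*x2
b = np.array([-1.0, 0.0, 0.0])

def hidden_layer(x):
    # Maps a 2D input to the 3D feature vector (h1, h2, h3).
    return sigmoid(W @ x + b)

x = np.array([2.0, 0.5])   # arbitrary test point
h = hidden_layer(x)
print(h.shape)             # one activation per neuron
print(h > 0.5)             # which side of each line the point lies on
```

For the point $(2, 0.5)$ the pre-activations are $(1.5, 1.5, -1)$, so neurons 1 and 2 output values above 0.5 and neuron 3 outputs a value below 0.5.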

A single neuron:

produces one linear decision boundary
can only separate space with one hyperplane

A layer of neurons:

produces many hyperplanes
creates a rich partition of space
builds complex nonlinear representations
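
To make the partition concrete, the sketch below counts the regions cut out by the three lines from the 2D example. It labels every point of a dense grid by the side of each line it falls on and counts the distinct labels. The grid window and resolution are arbitrary choices made for this illustration.

```python
import numpy as np

# Weights and biases of the three example neurons (one row per neuron).
W = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 2.0]])
b = np.array([-1.0, 0.0, 0.0])

# Dense grid over a square window of the input plane (window size is arbitrary).
ticks = np.linspace(-3.0, 3.0, 601)
X1, X2 = np.meshgrid(ticks, ticks)
points = np.stack([X1.ravel(), X2.ravel()], axis=1)    # shape (N, 2)

# For each point, record which side of each neuron's line it lies on.
patterns = (points @ W.T + b) > 0                      # shape (N, 3), boolean

# Encode each 3-bit pattern as an integer 0..7 and count the distinct ones.
codes = patterns.astype(int) @ np.array([1, 2, 4])
num_regions = len(np.unique(codes))
print(num_regions)
```

Three lines in general position split the plane into 7 regions, so one of the 8 possible on/off patterns never occurs. Here it is the pattern where only neuron 1 is on.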

Each layer feeds into the next. In an image model, for example:

Layer 1: simple patterns (edges, lines)
Layer 2: combinations of patterns (curves, corners)
Layer 3: higher-level structures (objects)

Example:

neuron detects “edge”
neuron detects “curve”
neuron detects “eye-like shape”

Next layer combines them:

“eye + nose + mouth → face”

Hidden layers produce feature vectors:

$x \to R^{d} \to R^{m} \to \dots$

The final layer compresses the features into a prediction:

10-class classification: $R^{10}$ (e.g. digit probabilities)
binary classification: $R^{1}$

Example:

Image classifier: $R^{784} \to R^{128} \to R^{64} \to R^{10}$
Binary classifier: $R^{64} \to R^{1}$
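
The shape changes in the image classifier can be traced with a small forward pass. This is a sketch under assumptions: the weights are random rather than trained, the hidden activation is taken to be the sigmoid, and a softmax is assumed on the output to produce the digit probabilities. The names `sizes`, `params`, and `forward` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

# Layer widths from the example: 784 -> 128 -> 64 -> 10.
sizes = [784, 128, 64, 10]
# Random (untrained) weights and zero biases, one pair per layer.
params = [(rng.normal(0.0, 0.05, size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # Returns class probabilities and the shape of every intermediate vector.
    shapes = [x.shape]
    h = x
    for W, b in params[:-1]:
        h = sigmoid(W @ h + b)          # hidden layer: new feature vector
        shapes.append(h.shape)
    W, b = params[-1]
    probs = softmax(W @ h + b)          # final layer: features to prediction
    shapes.append(probs.shape)
    return probs, shapes

x = rng.random(784)                     # stand-in for a flattened 28x28 image
probs, shapes = forward(x)
print(shapes)
print(probs.sum())
```

Each entry of `shapes` is one of the spaces in the chain, and the output sums to 1 because of the softmax.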

Hidden layers learn new coordinate systems where each axis corresponds to a learned feature detector defined by a hyperplane in the previous space.

A hidden layer maps inputs into a new feature space where each coordinate represents the activation of a different learned hyperplane-based detector, enabling progressively more abstract representations.

A neural network progressively partitions space into regions and learns increasingly useful coordinate systems (representations) in which the target problem becomes simpler.

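Neural networks act as adaptive coordinate systems: with piecewise-linear activations such as ReLU they partition input space into polyhedral regions and apply a simple linear model within each region, while smooth activations such as the sigmoid give a smooth version of the same picture.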
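
The "one linear model per region" claim can be checked numerically for a ReLU network. The sketch below assumes ReLU activations, which the text above does not specify, and uses a tiny random network. The names `net`, `local_affine`, and the sample point are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny ReLU network R^2 -> R^4 -> R^1 with random (untrained) weights.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

def net(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def local_affine(x):
    # Inside one region the on/off pattern of the hidden units is fixed,
    # so the network reduces to a single affine map A @ x + c.
    active = (W1 @ x + b1) > 0
    D = np.diag(active.astype(float))
    A = W2 @ D @ W1
    c = W2 @ D @ b1 + b2
    return A, c, active

x = np.array([0.3, -0.2])               # arbitrary point
A, c, active = local_affine(x)

# Nearby points that share x's activation pattern lie in the same region
# and should be predicted by the same affine map.
samples = x + 0.05 * rng.normal(size=(200, 2))
same_region = [p for p in samples
               if np.array_equal((W1 @ p + b1) > 0, active)]
ok = all(np.allclose(net(p), A @ p + c) for p in same_region)
print(len(same_region), ok)
```

Points whose pattern differs from that of `x` fall in a different region and get a different pair `A`, `c`.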

Agney's Digital Garden
