This work presents a computational implementation of a Single Layer Perceptron model trained via gradient descent. A set of weights and a bias term are iteratively optimized to minimize the prediction error on a continuous dataset. The system evaluates the final model by comparing the predicted outputs against the ground truth values, demonstrating fundamental machine learning convergence.
This work implements an iterative learning algorithm on a toy dataset with the following objectives:
- Simulate the training loop of a basic neural network architecture
- Constrain the model to a single linear layer without non-linear activation functions
- Compute the Mean Squared Error (MSE) cost function across iterations
- Update parameters using a full-batch gradient descent algorithm
- Evaluate the model's accuracy through a comparative reporting utility
The system is designed as an educational and exploratory tool for understanding forward propagation, backward propagation, and numerical optimization.
The simulation operates on a continuous dataset defined as:
(x1, x2) ∈ ℝ², y ∈ ℝ
Where:
X ∈ ℝ^(2 × m) : feature matrix (input features)
Y ∈ ℝ^(1 × m) : label vector (target outputs)
m ∈ ℕ : number of training examples
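A dataset of this shape can be sketched in NumPy; the specific values below are hypothetical and chosen only to match the dimensions defined above (examples stored as columns):

```python
import numpy as np

# Hypothetical toy dataset: m = 4 examples, 2 features each.
# Columns are examples, rows are features, so X has shape (2, m).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 0.0, 3.0]])   # feature matrix, X ∈ ℝ^(2 × m)
Y = np.array([[3.0, 3.0, 3.0, 7.0]])   # label vector, Y ∈ ℝ^(1 × m)
m = X.shape[1]                         # number of training examples
```

Storing examples as columns keeps the forward pass a single matrix product, W @ X, with no transposes.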
The model state at iteration t is defined by its parameters:
S(t) = {W(t), b(t)}
Where:
W(t) ∈ ℝ^(1 × 2) : weight matrix
b(t) ∈ ℝ : bias scalar
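The initial state S(0) can be set up as follows; the small random scale and the fixed seed are illustrative choices, not part of the specification above:

```python
import numpy as np

# Initialize the model state S(0) = {W(0), b(0)}.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(1, 2))  # weight matrix, W ∈ ℝ^(1 × 2)
b = 0.0                                  # bias scalar, b ∈ ℝ
```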
At each iteration, the model performs a deterministic update:
- A forward pass computes the linear prediction:
Y_hat(t) = W(t)X + b(t)
- A backward pass computes the gradients of the cost function:
dW(t) = (1/m) * (Y_hat(t) - Y) * X^T
db(t) = (1/m) * sum(Y_hat(t) - Y)
- Parameters are updated using a learning rate α:
W(t+1) = W(t) - α * dW(t)
b(t+1) = b(t) - α * db(t)
This ensures:
- Iterative minimization of the loss landscape
- Continuous parameter adjustment
- Convergence toward the global minimum (the MSE cost of a linear model is convex, so convergence holds for a suitably small learning rate α)
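The forward pass, backward pass, and parameter update above can be sketched as one function; the dataset and learning rate in the example are illustrative values, not the project's actual configuration:

```python
import numpy as np

def gradient_step(W, b, X, Y, alpha):
    """One deterministic full-batch update following the equations above."""
    m = X.shape[1]
    Y_hat = W @ X + b                      # forward pass: Y_hat = W X + b
    dW = (Y_hat - Y) @ X.T / m             # dW = (1/m) (Y_hat - Y) X^T
    db = np.sum(Y_hat - Y) / m             # db = (1/m) sum(Y_hat - Y)
    return W - alpha * dW, b - alpha * db  # gradient descent update

# One step on a tiny illustrative dataset, starting from zero parameters.
X = np.array([[1.0, 2.0], [0.0, 1.0]])
Y = np.array([[1.0, 3.0]])
W, b = np.zeros((1, 2)), 0.0
W, b = gradient_step(W, b, X, Y, alpha=0.1)
# W is now [[0.35, 0.15]] and b is 0.2
```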
To analyze model performance, a Mean Squared Error (MSE) metric is computed:
J(t) = sum((Y_hat(t) - Y)^2) / (2 * m)
Where:
(Y_hat(t) - Y) is the residual error vector for all examples
m is the number of training examples
This produces a non-negative scalar (zero only when the predictions match the targets exactly) used to monitor convergence across training iterations. The factor of 2 in the denominator simplifies the gradient expressions above.
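The cost J(t) translates directly into code; the residual values in the example are illustrative:

```python
import numpy as np

def mse_cost(Y_hat, Y):
    """J = sum((Y_hat - Y)^2) / (2m), the cost defined above."""
    m = Y.shape[1]
    return np.sum((Y_hat - Y) ** 2) / (2 * m)

# Example: residuals are 1 and -2, so J = (1 + 4) / (2 * 2) = 1.25
Y_hat = np.array([[1.0, 2.0]])
Y = np.array([[0.0, 4.0]])
J = mse_cost(Y_hat, Y)
```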
The system produces:
- A trained linear regression model capable of numerical prediction
- An iterative loss reduction log over time
- A comparative report of actual versus predicted values
Convergence logs reveal:
- Rapid initial descent of the cost function
- Stable asymptotic behavior near the optimal weights
- A high degree of accuracy on data with a linear input-output relationship
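An end-to-end training loop exhibiting this behavior can be sketched as follows; the synthetic dataset (y = 2·x1 + 1·x2), learning rate, and iteration count are illustrative assumptions, not the project's actual settings:

```python
import numpy as np

# Synthetic linear data: y = 2*x1 + 1*x2, so the optimum is W = [2, 1], b = 0.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2, 50))
Y = 2.0 * X[0:1] + 1.0 * X[1:2]
W, b = np.zeros((1, 2)), 0.0
alpha, m = 0.5, X.shape[1]

losses = []
for t in range(200):
    Y_hat = W @ X + b                        # forward pass
    losses.append(np.sum((Y_hat - Y) ** 2) / (2 * m))  # loss log over time
    dW = (Y_hat - Y) @ X.T / m               # backward pass
    db = np.sum(Y_hat - Y) / m
    W, b = W - alpha * dW, b - alpha * db    # parameter update
```

The recorded `losses` list shows the rapid initial descent followed by asymptotic flattening described above, with W approaching the true coefficients.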
- The model is strictly linear and cannot solve non-linear problems (e.g., XOR)
- Lack of an activation function reduces the network to simple linear regression
- Gradient descent uses the full batch, which is computationally expensive for large datasets
- Hand-coded mathematical derivatives lack the flexibility of automatic differentiation