Single Neuron Linear Regression Trainer

Introduction to Linear Regression

Linear regression is a fundamental technique in machine learning and statistics. It models the relationship between a dependent variable (y) and one or more independent variables (x) by fitting a straight line to the data points (or a plane, when there are several inputs). The equation is typically written as:

y = w₁x₁ + w₂x₂ + ... + b

where w₁, w₂, ... are the weights (one slope per input) and b is the bias (the y-intercept).

How Does a Single Neuron Work?

Example: Predicting House Price

Suppose we want to predict the price of a house (y) based on its size (x₁ = 50 m²) and number of bedrooms (x₂ = 2).
Assume our neuron has learned the weights: w₁ = 2,000, w₂ = 10,000, and b = 5,000.

The neuron computes:
y = 2,000 × 50 + 10,000 × 2 + 5,000 = 100,000 + 20,000 + 5,000 = 125,000
So, the estimated house price is 125,000 (currency units).

This is how a single neuron can be used for linear regression: it learns weights for each input and a bias to make predictions.
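
The calculation above can be sketched in a few lines of plain JavaScript. This is a minimal illustration rather than the demo's actual code: `neuronPredict` is a hypothetical helper name, and the weights are simply the ones assumed in the example.

```javascript
// A single linear neuron: weighted sum of the inputs plus a bias.
// `neuronPredict` is a hypothetical helper name, not a library function.
function neuronPredict(inputs, weights, bias) {
  if (inputs.length !== weights.length) {
    throw new Error("inputs and weights must have the same length");
  }
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += weights[i] * inputs[i];
  }
  return sum;
}

// Values from the house price example: 50 m² and 2 bedrooms.
const price = neuronPredict([50, 2], [2000, 10000], 5000);
console.log(price); // 125000
```

Because there is no activation function, the output is a plain linear combination of the inputs, which is exactly the linear regression equation.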

TensorFlow.js Discussion

TensorFlow.js is an open-source library that enables machine learning directly in the browser or in Node.js using JavaScript. With TensorFlow.js, developers can train and run machine learning models without needing Python or server-side computation. This is particularly useful for interactive web-based applications, real-time inference, and privacy-preserving computation since all processing can occur on the client side.

Runs on all major operating systems and on any device with a web browser.
Provides GPU acceleration through WebGL for fast computations.
Allows importing models trained in Python (TensorFlow/Keras) and running them in JavaScript environments.
Ideal for education, interactive demos, or production-ready AI in web apps.

In this demo, we use TensorFlow.js to create and train a basic model: a single neuron performing linear regression.

Classic Libraries for Model Definition, Training, and Visualization

Model Definition and Training Libraries

TensorFlow.js (v0.x): The first versions of TensorFlow.js (e.g., v0.6.0, released in early 2018) enabled users to define, train, and run neural network models directly in the browser using JavaScript. These early releases provided layers, optimizers, and basic data handling, establishing the foundation for machine learning in the browser.
ConvNetJS: Released by Andrej Karpathy in 2014, ConvNetJS is one of the earliest neural network libraries for browsers. It supports defining and training neural nets (including linear regression, multi-layer perceptrons, and convolutional nets) entirely in JavaScript with no dependencies.

Visualization Libraries

D3.js (v3): D3.js v3 is a pioneering JavaScript library for data-driven document manipulation (visualization), released in 2012. It was widely used for plotting data, drawing graphs, and making interactive visualizations in the browser. Many early machine learning demos and model visualizations used D3 v3 for rendering.
Chart.js (v1): Chart.js v1 (2014) is a simple library for drawing charts on web pages using the HTML5 canvas. It was popular for visualizing training loss, accuracy, and regression lines in early web-based machine learning projects.

These libraries paved the way for interactive, browser-based machine learning and visualization tools that are now common in education and research.

Visualization of Cleaned Data: Horsepower vs. Miles per Gallon

Now that the dataset is cleaned and available as cleanedData, we can visualize the relationship between Horsepower and Miles_per_Gallon. This scatter plot will help us understand the data points that will be used for training our model.

Splitting the Dataset: Training and Testing Sets

To evaluate a machine learning model fairly, it's important to train it on a portion of the data and test it on data it hasn't seen. The dataset is typically split into a training set (for model learning) and a testing set (for evaluation). A common split is 70% for training and 30% for testing.
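
One possible way to perform such a split is to shuffle a copy of the data and cut it at the 70% mark. The sketch below uses plain JavaScript. `splitDataset`, `trainSet`, and `testSet` are illustrative names, and the sample rows are made up to resemble the cleaned data, not taken from it.

```javascript
// Shuffle a copy of the data (Fisher-Yates), then cut it at the requested fraction.
// `splitDataset` and its parameters are illustrative names, not part of any library.
function splitDataset(data, trainFraction = 0.7, random = Math.random) {
  const shuffled = data.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.round(shuffled.length * trainFraction);
  return { trainSet: shuffled.slice(0, cut), testSet: shuffled.slice(cut) };
}

// Made-up rows shaped like the cleaned data (not real measurements).
const sampleRows = Array.from({ length: 10 }, (_, i) => ({
  Horsepower: 60 + 10 * i,
  Miles_per_Gallon: 40 - 2 * i,
}));

const { trainSet, testSet } = splitDataset(sampleRows);
console.log(trainSet.length, testSet.length); // 7 3
```

Shuffling before cutting matters when the rows are stored in a meaningful order (for example, sorted by model year); otherwise the test set could contain only one kind of car.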

Building an Artificial Neural Network Model

Neural networks are powerful tools for modeling complex nonlinear relationships in data. Here, we’ll create a simple artificial neural network (ANN) that learns to predict Miles_per_Gallon from Horsepower using the training data we prepared earlier.

What is an Artificial Neural Network?

An artificial neural network (ANN) is loosely inspired by biological brains. It consists of interconnected layers of simple processing units called neurons. Each neuron receives inputs, multiplies them by weights, adds a bias to the sum, and then applies an activation function to determine its output.

The input layer receives features (here, horsepower).
The hidden layer processes the input through several neurons.
The output layer produces the final prediction (here, miles per gallon).

Figure: An example of a simple neural network with an input layer, a hidden layer, and an output layer.
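
To make the layer structure concrete, here is a small sketch of a forward pass through one hidden layer and one output layer in plain JavaScript. The helper `denseLayer`, the ReLU activation, and all weights and biases are assumptions chosen for illustration; they are not trained values and not the demo's configuration.

```javascript
// ReLU activation: passes positive values through, clamps negatives to 0.
const relu = (z) => Math.max(0, z);

// One dense layer: every neuron has its own weight vector and bias.
// `denseLayer` is an illustrative helper, not a library function.
function denseLayer(inputs, weights, biases, activation) {
  return weights.map((neuronWeights, n) => {
    const sum = neuronWeights.reduce(
      (acc, w, i) => acc + w * inputs[i],
      biases[n]
    );
    return activation(sum);
  });
}

// Arbitrary illustrative parameters (not learned from data).
const hidden = denseLayer([2], [[1], [-1]], [0, 3], relu);    // two hidden neurons
const output = denseLayer(hidden, [[0.5, 2]], [1], (z) => z); // one linear output neuron
console.log(hidden, output); // [ 2, 1 ] [ 4 ]
```

With the hidden layer removed and the identity activation on the output, this reduces to the single neuron used for linear regression.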

In this project, the model will learn from the training set you created to estimate the relationship between horsepower and miles per gallon, so that it can predict a vehicle's fuel efficiency from its engine power.

Training the Neural Network Model

Training means letting the neural network learn patterns from data. Here, the model will learn to predict Miles_per_Gallon from Horsepower using the training set split from the cleanedData you loaded previously.
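
As a rough picture of what training does for a single neuron, the sketch below fits y = w·x + b by full-batch gradient descent on the mean squared error. It is written in plain JavaScript rather than TensorFlow.js so it runs without dependencies. The function name `trainSingleNeuron`, the learning rate, the epoch count, and the synthetic data (generated from y = 2x + 1) are all assumptions for illustration.

```javascript
// Fit y = w * x + b by gradient descent on the mean squared error.
// `trainSingleNeuron` and its default hyperparameters are illustrative choices.
function trainSingleNeuron(xs, ys, { learningRate = 0.1, epochs = 2000 } = {}) {
  let w = 0;
  let b = 0;
  const n = xs.length;
  for (let epoch = 0; epoch < epochs; epoch++) {
    let gradW = 0;
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      const error = w * xs[i] + b - ys[i]; // prediction minus target
      gradW += (2 / n) * error * xs[i];    // d(MSE)/dw
      gradB += (2 / n) * error;            // d(MSE)/db
    }
    w -= learningRate * gradW;
    b -= learningRate * gradB;
  }
  return { w, b };
}

// Synthetic data generated from y = 2x + 1 (not the car dataset).
const xs = [0, 0.25, 0.5, 0.75, 1];
const ys = xs.map((x) => 2 * x + 1);

const fitted = trainSingleNeuron(xs, ys);
console.log(fitted.w.toFixed(3), fitted.b.toFixed(3)); // 2.000 1.000
```

The inputs here lie between 0 and 1. Raw horsepower values are in the tens and hundreds, so in practice the inputs are usually scaled to a similar range first; otherwise a learning rate like this one would make the updates diverge.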

Evaluating the Trained Neural Network Model

Once the model is trained, it’s important to check how well it performs on new data it hasn't seen before. This is done using the test set that was separated from the cleanedData. Here, we’ll compare the model’s predictions with the real values to understand its performance.
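
For a regression model, a common way to compare predictions with real values is the mean squared error (MSE) over the test set. The sketch below is a minimal plain-JavaScript version. The model parameters and the test rows are hypothetical, chosen only so the arithmetic is easy to follow.

```javascript
// Mean squared error between actual and predicted values.
function meanSquaredError(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("arrays must be non-empty and of equal length");
  }
  let total = 0;
  for (let i = 0; i < actual.length; i++) {
    const diff = predicted[i] - actual[i];
    total += diff * diff;
  }
  return total / actual.length;
}

// Hypothetical trained parameters and made-up test rows (not real results).
const model = { w: -0.125, b: 40 };
const testRows = [
  { Horsepower: 100, Miles_per_Gallon: 26.5 },
  { Horsepower: 160, Miles_per_Gallon: 21 },
  { Horsepower: 200, Miles_per_Gallon: 13 },
];

const actualMpg = testRows.map((row) => row.Miles_per_Gallon);
const predictedMpg = testRows.map((row) => model.w * row.Horsepower + model.b);
const mse = meanSquaredError(actualMpg, predictedMpg);
console.log(mse, Math.sqrt(mse)); // MSE is 2; its square root (RMSE) is in MPG units
```

The square root of the MSE is often easier to interpret because it is expressed in the same units as the target, here miles per gallon.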

Understanding the Confusion Matrix

The confusion matrix is a useful tool for evaluating classification models. It shows the number of correct and incorrect predictions made by the model compared to the actual outcomes (labels). The matrix helps you see not only the overall accuracy, but also the types of errors the model makes.

True Positive (TP): Model correctly predicts the positive class.
True Negative (TN): Model correctly predicts the negative class.
False Positive (FP): Model incorrectly predicts positive when it is actually negative (Type I error).
False Negative (FN): Model incorrectly predicts negative when it is actually positive (Type II error).

A confusion matrix for a binary classifier looks like this:

                |  Actual 1  |  Actual 0
    ------------+------------+------------
    Predicted 1 |     TP     |     FP
    ------------+------------+------------
    Predicted 0 |     FN     |     TN

To use a confusion matrix with this regression demo, we will convert the regression output into two classes: for example, cars with MPG above 23 are "Efficient" (1), and those with MPG 23 or below are "Not Efficient" (0). The threshold can be adjusted as needed.
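
The thresholding step and the four counts can be sketched as follows in plain JavaScript. `confusionMatrix` is an illustrative helper name, and the MPG values are invented for the example; only the threshold of 23 comes from the text above.

```javascript
// Turn actual and predicted MPG values into classes with a threshold,
// then count the four confusion-matrix cells.
// `confusionMatrix` is an illustrative helper, not a library function.
function confusionMatrix(actualMpg, predictedMpg, threshold = 23) {
  const counts = { tp: 0, tn: 0, fp: 0, fn: 0 };
  for (let i = 0; i < actualMpg.length; i++) {
    const actualEfficient = actualMpg[i] > threshold;       // class 1 if above threshold
    const predictedEfficient = predictedMpg[i] > threshold;
    if (predictedEfficient && actualEfficient) counts.tp++;
    else if (!predictedEfficient && !actualEfficient) counts.tn++;
    else if (predictedEfficient && !actualEfficient) counts.fp++;
    else counts.fn++;
  }
  return counts;
}

// Invented values for illustration (not model output).
const actual = [30, 25, 20, 18, 23];
const predicted = [28, 21, 24, 15, 23.5];

const cm = confusionMatrix(actual, predicted);
const accuracy = (cm.tp + cm.tn) / actual.length;
console.log(cm, accuracy); // { tp: 1, tn: 1, fp: 2, fn: 1 } 0.4
```

Note that the last car has an actual MPG of exactly 23, so it counts as "Not Efficient"; a prediction of 23.5 for it is therefore a false positive.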

Train a Single Neuron for Linear Regression