Train a Single Neuron for Linear Regression

Introduction to Linear Regression

Linear regression is a fundamental technique in machine learning and statistics. It models the relationship between a dependent variable (y) and one or more independent variables (x) by fitting a straight line through the data points. The equation of this line is typically written as:

y = w₁x₁ + w₂x₂ + ... + b

where each wᵢ is a weight (the slope for input xᵢ) and b is the bias (the y-intercept).

How Does a Single Neuron Work?

Example: Predicting House Price

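Suppose we want to predict the price of a house (y) from its size (x₁) and its number of bedrooms (x₂). Take a house with x₁ = 50 m² and x₂ = 2 bedrooms.
Assume our neuron has learned the weights w₁ = 2,000 and w₂ = 10,000 and the bias b = 5,000.
Its prediction is then y = w₁x₁ + w₂x₂ + b = 2,000 × 50 + 10,000 × 2 + 5,000 = 125,000.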

This is how a single neuron can be used for linear regression: it learns weights for each input and a bias to make predictions.
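
The weighted sum behind this example can be sketched in a few lines of plain JavaScript. This is only an illustration: the function name neuronOutput and the array layout are assumptions made for the sketch, not part of the demo.

```javascript
// A single linear neuron: weighted sum of the inputs plus a bias.
// `neuronOutput` is a hypothetical helper name used only for this sketch.
function neuronOutput(inputs, weights, bias) {
  if (inputs.length !== weights.length) {
    throw new Error("inputs and weights must have the same length");
  }
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += weights[i] * inputs[i];
  }
  return sum;
}

// House example from the text: 50 m², 2 bedrooms.
const housePrice = neuronOutput([50, 2], [2000, 10000], 5000);
console.log(housePrice); // 125000
```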

Train a Single Neuron for Linear Regression

The demo takes data points as text, one (x, y) pair per line with the two values separated by a comma, and then trains the single neuron to fit a line through them.
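
What training on such input involves can be sketched without any library. The following is a minimal illustration, not the demo's actual code: parsePairs and trainNeuron are hypothetical names, and plain full-batch gradient descent on the mean squared error stands in for whatever optimizer the demo uses.

```javascript
// Parse text such as "1, 2\n2, 4.1" into [{x, y}, ...], skipping invalid lines.
function parsePairs(text) {
  return text
    .split("\n")
    .map((line) =>
      line.split(",").map((part) => (part.trim() === "" ? NaN : Number(part)))
    )
    .filter((nums) => nums.length === 2 && nums.every(Number.isFinite))
    .map(([x, y]) => ({ x, y }));
}

// Fit y ≈ w * x + b by full-batch gradient descent on the mean squared error.
function trainNeuron(points, learningRate = 0.05, epochs = 2000) {
  let w = 0;
  let b = 0;
  const n = points.length;
  for (let epoch = 0; epoch < epochs; epoch++) {
    let gradW = 0;
    let gradB = 0;
    for (const { x, y } of points) {
      const error = w * x + b - y; // prediction minus target
      gradW += (2 / n) * error * x;
      gradB += (2 / n) * error;
    }
    w -= learningRate * gradW;
    b -= learningRate * gradB;
  }
  return { w, b };
}

// Points that lie exactly on y = 2x + 1.
const points = parsePairs("0, 1\n1, 3\n2, 5\n3, 7\n4, 9");
const fitted = trainNeuron(points);
console.log(fitted.w.toFixed(2), fitted.b.toFixed(2)); // approximately 2.00 1.00
```

The default learning rate suits small x values like those above. With large inputs the same step size can make the updates diverge, which is one reason inputs are usually scaled before training.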


TensorFlow.js Discussion

TensorFlow.js is an open-source library that enables machine learning directly in the browser or in Node.js using JavaScript. With TensorFlow.js, developers can train and run machine learning models without needing Python or server-side computation. This is particularly useful for interactive web-based applications, real-time inference, and privacy-preserving computation since all processing can occur on the client side.

In this demo, we use TensorFlow.js to create and train a basic model: a single neuron performing linear regression.
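
A single-neuron model along these lines might be defined as follows. This is a sketch under assumptions rather than the demo's source: the function names, the learning rate, and the epoch count are illustrative, and the TensorFlow.js namespace is passed in as the tf argument so the same code can be used in the browser or in Node.js.

```javascript
// Build a one-neuron model: a single dense layer with one unit and one input.
// `tf` is the TensorFlow.js namespace, supplied by the caller.
function buildSingleNeuronModel(tf, learningRate = 0.01) {
  const model = tf.sequential();
  model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
  model.compile({
    optimizer: tf.train.sgd(learningRate),
    loss: "meanSquaredError",
  });
  return model;
}

// Train on parallel arrays of x and y values, then return the trained model.
async function fitLine(tf, model, xValues, yValues, epochs = 200) {
  const xs = tf.tensor2d(xValues, [xValues.length, 1]);
  const ys = tf.tensor2d(yValues, [yValues.length, 1]);
  await model.fit(xs, ys, { epochs });
  // Free the tensor memory once training is done.
  xs.dispose();
  ys.dispose();
  return model;
}
```

In a browser, tf is the global created by loading the TensorFlow.js script. In Node.js it would come from importing the @tensorflow/tfjs package. After training, model.predict(tf.tensor2d([5], [1, 1])) returns a tensor holding the predicted y for x = 5.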

Classic Libraries for Model Definition, Training, and Visualization

Model Definition and Training Libraries

Visualization Libraries

These libraries paved the way for interactive, browser-based machine learning and visualization tools that are now common in education and research.

Discussion: The Cars Dataset

The dataset in carsData.json contains information about various car models, including attributes such as miles per gallon (mpg) and horsepower. It is often used for regression tasks and for demonstrating machine learning techniques in JavaScript tutorials.

The data may contain missing values or non-numeric entries. Before using it for training models, it is important to clean the dataset by removing records with incomplete or invalid data.
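
A cleaning step of this kind could look like the sketch below. The field names Miles_per_Gallon and Horsepower follow the names used later in the text. The helper name cleanCars, the Name field, and the sample records are made up for illustration.

```javascript
// Keep only records whose Miles_per_Gallon and Horsepower are finite numbers.
// Numeric strings such as "130" are rejected here; converting them is a design choice.
function cleanCars(records) {
  const isNumber = (v) => typeof v === "number" && Number.isFinite(v);
  return records.filter(
    (car) =>
      car != null && isNumber(car.Miles_per_Gallon) && isNumber(car.Horsepower)
  );
}

// Hypothetical sample records, two of them invalid.
const rawCars = [
  { Name: "car a", Miles_per_Gallon: 18, Horsepower: 130 },
  { Name: "car b", Miles_per_Gallon: null, Horsepower: 115 },
  { Name: "car c", Miles_per_Gallon: 25, Horsepower: "n/a" },
  { Name: "car d", Miles_per_Gallon: 32, Horsepower: 65 },
];
const cleanedData = cleanCars(rawCars);
console.log(cleanedData.length); // 2
```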

Visualization of Cleaned Data: Horsepower vs. Miles per Gallon

Now that the dataset is cleaned and available as cleanedData, we can visualize the relationship between Horsepower and Miles_per_Gallon. This scatter plot will help us understand the data points that will be used for training our model.
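
One way to draw such a scatter plot is sketched below. The text does not name a plotting library, so the use of tfjs-vis here is an assumption, as are the helper names toScatterPoints and renderScatter. The tfjs-vis namespace is passed in as the tfvis argument.

```javascript
// Convert cleaned records into the {x, y} points a scatter plot expects.
function toScatterPoints(cleanedData) {
  return cleanedData.map((car) => ({
    x: car.Horsepower,
    y: car.Miles_per_Gallon,
  }));
}

// Render the points with tfjs-vis (assumed library, supplied by the caller).
function renderScatter(tfvis, points) {
  return tfvis.render.scatterplot(
    { name: "Horsepower vs. Miles per Gallon" },
    { values: points },
    { xLabel: "Horsepower", yLabel: "Miles per Gallon", height: 300 }
  );
}
```

Keeping the data conversion separate from the rendering call means the points can be reused with a different charting library.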

Splitting the Dataset: Training and Testing Sets

To evaluate a machine learning model fairly, it's important to train it on a portion of the data and test it on data it hasn't seen. The dataset is typically split into a training set (for model learning) and a testing set (for evaluation). A common split is 70% for training and 30% for testing.
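
A 70/30 split can be sketched as a shuffle followed by a cut. The helper name trainTestSplit and its parameters are illustrative. The random argument is an assumption added so the shuffle can be made reproducible.

```javascript
// Shuffle a copy of the data (Fisher-Yates), then cut it at the training fraction.
// `random` is injectable so the split can be made reproducible.
function trainTestSplit(data, trainFraction = 0.7, random = Math.random) {
  const shuffled = data.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.round(shuffled.length * trainFraction);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}

const split = trainTestSplit([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
console.log(split.train.length, split.test.length); // 7 3
```

Shuffling before cutting matters when the file is ordered, for example by model year, because an unshuffled cut would give the two sets different distributions.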

Building an Artificial Neural Network Model

Neural networks are powerful tools for modeling complex nonlinear relationships in data. Here, we’ll create a simple artificial neural network (ANN) that learns to predict Miles_per_Gallon from Horsepower using the training data we prepared earlier.

What is an Artificial Neural Network?

An artificial neural network (ANN) is inspired by biological brains. It consists of interconnected layers of simple processing units called neurons. Each neuron receives inputs, applies weights, computes a sum, and then applies an activation function to determine its output.

[Figure: a simple neural network with an input layer, a hidden layer, and an output layer.]
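
The computation inside each neuron, and how layers chain together, can be sketched in plain JavaScript. The network shape (one input, two hidden neurons, one output) and every weight below are made-up values chosen only to make the arithmetic easy to follow.

```javascript
// One neuron: weighted sum of inputs plus bias, passed through an activation.
function neuron(inputs, weights, bias, activation) {
  const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], bias);
  return activation(sum);
}

const relu = (z) => Math.max(0, z); // nonlinear activation
const identity = (z) => z; // linear output, as used for regression

// Tiny network: 1 input -> 2 hidden ReLU neurons -> 1 linear output neuron.
function forward(x) {
  const hidden = [
    neuron([x], [1.0], -1.0, relu),
    neuron([x], [-1.0], 2.0, relu),
  ];
  return neuron(hidden, [0.5, 2.0], 0.5, identity);
}

console.log(forward(3)); // 1.5
```

For x = 3 the first hidden neuron outputs 2 and the second is clipped to 0 by the ReLU, so the output is 0.5 + 0.5 × 2 + 2.0 × 0 = 1.5.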

In this project, the model learns from the training set you created to estimate the relationship between horsepower and miles per gallon, so that it can predict a car's fuel efficiency from its horsepower.
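
A small network of this shape might be defined with the TensorFlow.js layers API as sketched below. The function name buildMpgModel, the number of hidden units, and the choice of ReLU are assumptions for illustration, and the text does not specify the architecture. The tf argument is the TensorFlow.js namespace supplied by the caller.

```javascript
// Build a small feed-forward network: 1 input -> hidden layer -> 1 output.
// `hiddenUnits` is an illustrative parameter, not a value taken from the text.
function buildMpgModel(tf, hiddenUnits = 16) {
  const model = tf.sequential();
  // Hidden layer with a nonlinear activation so the model can fit a curve.
  model.add(
    tf.layers.dense({ inputShape: [1], units: hiddenUnits, activation: "relu" })
  );
  // Output layer: one linear unit producing the predicted Miles_per_Gallon.
  model.add(tf.layers.dense({ units: 1 }));
  return model;
}
```

Without the activation on the hidden layer, the two dense layers would collapse into a single linear function, which is the single-neuron model again.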

Training the Neural Network Model

Training means letting the neural network adjust its weights to fit patterns in the data. Here, the model will learn to predict Miles_per_Gallon from Horsepower using the training set split from cleanedData, so that the test set stays unseen until evaluation.
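
A training step could be sketched as follows. Everything here is an assumption for illustration: the helper names, the min-max scaling of both variables, the Adam optimizer, and the epoch and batch-size defaults are not taken from the text. The tf argument is the TensorFlow.js namespace supplied by the caller.

```javascript
// Min-max scale an array of numbers to the range [0, 1].
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // avoid dividing by zero for constant input
  return { scaled: values.map((v) => (v - min) / range), min, max };
}

// Train `model` on the training records and return the scaling constants.
async function trainModel(tf, model, trainData, epochs = 50, batchSize = 32) {
  const x = minMaxScale(trainData.map((car) => car.Horsepower));
  const y = minMaxScale(trainData.map((car) => car.Miles_per_Gallon));
  const xs = tf.tensor2d(x.scaled, [x.scaled.length, 1]);
  const ys = tf.tensor2d(y.scaled, [y.scaled.length, 1]);

  model.compile({ optimizer: tf.train.adam(), loss: "meanSquaredError" });
  const history = await model.fit(xs, ys, { epochs, batchSize, shuffle: true });

  xs.dispose();
  ys.dispose();
  // The constants are needed later to map predictions back to MPG.
  return { history, xMin: x.min, xMax: x.max, yMin: y.min, yMax: y.max };
}
```

The scaling constants are computed from the training records only, so that nothing about the test set leaks into training.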

Evaluating the Trained Neural Network Model

Once the model is trained, it’s important to check how well it performs on new data it hasn't seen before. This is done using the test set that was separated from the cleanedData. Here, we’ll compare the model’s predictions with the real values to understand its performance.
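
For a regression model, comparing predictions with real values usually comes down to an error metric. The sketch below computes the mean squared error and the mean absolute error. The helper name regressionErrors and the sample MPG values are made up for illustration.

```javascript
// Mean squared error and mean absolute error between two equal-length arrays.
function regressionErrors(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("arrays must be non-empty and of equal length");
  }
  let squared = 0;
  let absolute = 0;
  for (let i = 0; i < actual.length; i++) {
    const diff = predicted[i] - actual[i];
    squared += diff * diff;
    absolute += Math.abs(diff);
  }
  return { mse: squared / actual.length, mae: absolute / actual.length };
}

// Hypothetical actual and predicted MPG values for three test cars.
const errors = regressionErrors([20, 30, 25], [22, 27, 25]);
console.log(errors.mse.toFixed(2), errors.mae.toFixed(2)); // 4.33 1.67
```

The mean absolute error is in the same unit as the target, so a value of 1.67 reads as "off by about 1.67 MPG on average".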

Understanding the Confusion Matrix

The confusion matrix is a useful tool for evaluating classification models. It shows the number of correct and incorrect predictions made by the model compared to the actual outcomes (labels). The matrix helps you see not only the overall accuracy, but also the types of errors the model makes.

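A confusion matrix for a binary classifier looks like this, where TP, FP, FN, and TN are the counts of true positives, false positives, false negatives, and true negatives: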

                     Actual
                 |  1   |  0
          -------+------+------
Predicted    1   |  TP  |  FP
          -------+------+------
             0   |  FN  |  TN

To use a confusion matrix with this regression demo, we convert both the actual and the predicted MPG values into two classes: for example, cars with MPG above 23 are "Efficient" (1), and those with MPG of 23 or below are "Not Efficient" (0). The threshold can be adjusted as needed.
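
The thresholding and counting described above can be sketched as follows. The helper names toClass and confusionMatrix and the sample values are made up for illustration. The threshold of 23 comes from the text.

```javascript
// Label a car "Efficient" (1) when its MPG is above the threshold, else 0.
const toClass = (mpg, threshold) => (mpg > threshold ? 1 : 0);

// Count TP, FP, FN, TN after thresholding actual and predicted MPG values.
function confusionMatrix(actualMpg, predictedMpg, threshold = 23) {
  const counts = { tp: 0, fp: 0, fn: 0, tn: 0 };
  actualMpg.forEach((actual, i) => {
    const a = toClass(actual, threshold);
    const p = toClass(predictedMpg[i], threshold);
    if (p === 1 && a === 1) counts.tp++;
    else if (p === 1 && a === 0) counts.fp++;
    else if (p === 0 && a === 1) counts.fn++;
    else counts.tn++;
  });
  const total = counts.tp + counts.fp + counts.fn + counts.tn;
  const accuracy = total === 0 ? 0 : (counts.tp + counts.tn) / total;
  return { ...counts, accuracy };
}

// Hypothetical actual and predicted MPG values for four test cars.
const cm = confusionMatrix([30, 18, 25, 23], [28, 24, 21, 20]);
console.log(cm); // { tp: 1, fp: 1, fn: 1, tn: 1, accuracy: 0.5 }
```

The fourth car has an actual MPG of exactly 23, so it falls into the "Not Efficient" class, and the prediction of 20 counts as a true negative.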