Convolutional Neural Networks (CNNs) are a specialized type of deep learning model designed for data with a grid-like topology, such as images. Their layers automatically and adaptively learn spatial hierarchies of features, which makes them well suited to image recognition, object detection, and similar tasks.
TensorFlow.js is an open-source library that allows you to define, train, and run machine learning models directly in the browser and in Node.js environments. This makes it possible to create, train, and deploy CNNs using JavaScript, leveraging GPU acceleration where available.
Convolutional layers are defined with tf.layers.conv2d, and training runs through model.fit().
// Basic CNN in TensorFlow.js
const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1],
  filters: 32,
  kernelSize: 3,
  activation: 'relu'
}));
model.add(tf.layers.maxPooling2d({poolSize: 2, strides: 2}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 64, activation: 'relu'}));
model.add(tf.layers.dense({units: 10, activation: 'softmax'}));
model.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy']
});
With TensorFlow.js, you can train this network on image data directly in the browser, making deep learning accessible to web developers.
Several JavaScript libraries are used to build, train, and visualize deep learning models interactively within a web page.
The MNIST dataset is a well-known collection of handwritten digits used for training and evaluating image processing systems. It consists of 28x28 pixel grayscale images of digits (0–9). Each image is paired with a label that indicates which digit it represents.
Before using this data in a model, the images and labels must be extracted and organized. Each image will be converted into a flat array of pixel values (normalized between 0 and 1), and each label will be stored as an integer. The cleaned dataset is made easily accessible in the browser as cleanedData for further experiments.
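The preprocessing step described above can be sketched roughly as follows. The raw record shape (`pixels` as 784 values in 0-255 plus a `label`) and the function names `cleanRecord` and `cleanDataset` are assumptions made for illustration; the page's actual loader may organize the data differently.

```javascript
// Hypothetical raw record: { pixels: number[784] with values 0-255, label: string | number }.
// This shape is an assumption for illustration, not the page's actual format.
function cleanRecord(raw) {
  if (raw.pixels.length !== 28 * 28) {
    throw new Error(`expected 784 pixels, got ${raw.pixels.length}`);
  }
  return {
    // Scale 8-bit intensities into the range [0, 1].
    pixels: Float32Array.from(raw.pixels, (v) => v / 255),
    // Store the label as an integer digit.
    label: Number.parseInt(raw.label, 10),
  };
}

// Clean every record; the result plays the role of cleanedData.
function cleanDataset(rawRecords) {
  return rawRecords.map(cleanRecord);
}
```

Under these assumptions, `const cleanedData = cleanDataset(rawRecords);` would give an array of `{ pixels, label }` objects for the later steps.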
After loading and preprocessing the MNIST data, it’s helpful to visualize some samples to confirm that the images and labels are correct. Below, you can display a selection of random handwritten digit images from cleanedData along with their corresponding labels.
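One way to draw random samples is sketched below. It assumes each entry of cleanedData is an object with a flat `pixels` array (values in 0 to 1) and an integer `label`; the helper names are hypothetical, and `drawSample` needs a browser because it relies on canvas and `ImageData`.

```javascript
// Pick `count` distinct random indices from 0..n-1 (partial Fisher-Yates shuffle).
function pickRandomIndices(n, count, random = Math.random) {
  const indices = Array.from({ length: n }, (_, i) => i);
  const k = Math.min(count, n);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(random() * (n - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices.slice(0, k);
}

// Convert grayscale values in [0, 1] into RGBA bytes, the layout ImageData expects.
function toRgba(pixels) {
  const rgba = new Uint8ClampedArray(pixels.length * 4);
  for (let i = 0; i < pixels.length; i++) {
    const v = Math.round(pixels[i] * 255);
    rgba[4 * i] = v;       // red
    rgba[4 * i + 1] = v;   // green
    rgba[4 * i + 2] = v;   // blue
    rgba[4 * i + 3] = 255; // fully opaque
  }
  return rgba;
}

// Browser only: paint one 28x28 sample onto a canvas and expose its label as a tooltip.
// Assumes sample = { pixels, label } (hypothetical shape).
function drawSample(canvas, sample) {
  canvas.width = 28;
  canvas.height = 28;
  const ctx = canvas.getContext('2d');
  ctx.putImageData(new ImageData(toRgba(sample.pixels), 28, 28), 0, 0);
  canvas.title = `label: ${sample.label}`;
}
```

For example, `pickRandomIndices(cleanedData.length, 8)` selects eight samples, each of which can be passed to `drawSample` with its own canvas element.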
Before building or evaluating a machine learning model, it’s important to split the dataset into two separate parts: a training set (used to train the model) and a testing set (used to evaluate model performance on unseen data). A common practice is to use 80% of the data for training and 20% for testing. Because the model never sees the testing set during training, its accuracy on that set reflects how well it generalizes rather than how well it memorized the training examples.
Click the button below to perform the split. After splitting, you can access the sets with trainingSet and testingSet in your scripts.
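A minimal sketch of such a split is shown below: shuffle a copy of the data, then cut it at the 80% mark. The function name `trainTestSplit` and its parameters are illustrative; the button on the page may implement the split differently.

```javascript
// Shuffle a copy of the data (Fisher-Yates), then cut it into training and testing parts.
function trainTestSplit(data, trainFraction = 0.8, random = Math.random) {
  const shuffled = data.slice(); // leave the caller's array untouched
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.floor(shuffled.length * trainFraction);
  return {
    trainingSet: shuffled.slice(0, cut),
    testingSet: shuffled.slice(cut),
  };
}
```

Usage would look like `const { trainingSet, testingSet } = trainTestSplit(cleanedData);`. Shuffling first matters when the source data is ordered, for instance by digit.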
Convolutional Neural Networks (CNNs) are specialized for processing images. With the button below, you can programmatically define a simple CNN architecture in JavaScript, ready for training on the MNIST dataset. After creation, the layer configuration will be displayed.
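To make the layer configuration concrete, the sketch below works out the output shape and parameter count of each layer in the basic CNN from the introduction (32 filters of size 3, 2x2 pooling, dense layers of 64 and 10 units). It assumes the tf.layers.conv2d defaults of stride 1 and 'valid' padding; the helper names are hypothetical.

```javascript
// Output size along one dimension for 'valid' padding (no zero padding at the borders).
function convOutputSize(input, kernel, stride = 1) {
  return Math.floor((input - kernel) / stride) + 1;
}

// Shapes and parameter counts for the basic CNN defined earlier on a 28x28x1 input.
function describeSimpleCnn() {
  const conv = convOutputSize(28, 3, 1);   // 26
  const pool = convOutputSize(conv, 2, 2); // 13
  const flat = pool * pool * 32;           // 5408
  return [
    // weights: kernelHeight * kernelWidth * inputChannels * filters, plus one bias per filter
    { layer: 'conv2d', outputShape: [conv, conv, 32], params: 3 * 3 * 1 * 32 + 32 },
    { layer: 'maxPooling2d', outputShape: [pool, pool, 32], params: 0 },
    { layer: 'flatten', outputShape: [flat], params: 0 },
    // weights: inputs * units, plus one bias per unit
    { layer: 'dense (relu)', outputShape: [64], params: flat * 64 + 64 },
    { layer: 'dense (softmax)', outputShape: [10], params: 64 * 10 + 10 },
  ];
}
```

Almost all of the parameters sit in the first dense layer, which is typical for small CNNs that flatten a feature map directly into a dense layer. In TensorFlow.js, `model.summary()` prints a comparable table for a real model.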
A Convolutional Neural Network (CNN) uses a series of layers to extract patterns from image data and perform classification. Here’s a visual explanation of how it works:
The final layer applies softmax to predict the probability of each digit (0-9).
Image source: Wikimedia Commons (Typical_cnn.png)
In this animation, you can see how filters slide across the image, producing feature maps that capture patterns useful for recognizing digits.
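The sliding-filter operation can also be written out by hand. The following is a plain JavaScript sketch for a single filter on a single-channel image, with stride 1, no padding, and a ReLU applied to each output value. It is meant only to illustrate the idea; TensorFlow.js runs optimized kernels instead.

```javascript
// Slide one filter over a 2D image (array of rows) and return the feature map.
// Like deep learning libraries, this computes cross-correlation (the kernel is not flipped).
function convolve2d(image, kernel) {
  const kh = kernel.length;
  const kw = kernel[0].length;
  const outH = image.length - kh + 1;
  const outW = image[0].length - kw + 1;
  const featureMap = [];
  for (let y = 0; y < outH; y++) {
    const row = [];
    for (let x = 0; x < outW; x++) {
      // Weighted sum of the image patch currently under the filter.
      let sum = 0;
      for (let ky = 0; ky < kh; ky++) {
        for (let kx = 0; kx < kw; kx++) {
          sum += image[y + ky][x + kx] * kernel[ky][kx];
        }
      }
      row.push(Math.max(0, sum)); // ReLU activation
    }
    featureMap.push(row);
  }
  return featureMap;
}
```

For instance, a filter such as `[[-1, 1], [-1, 1]]` responds only where the image changes from dark to bright moving left to right, so its feature map highlights vertical edges like the strokes of a handwritten "1".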
After defining and understanding your CNN model, the next step is to train it using your training dataset. Training involves feeding your model many labeled examples so it can learn to recognize digit patterns. As training progresses, you can monitor the accuracy and loss to see how well the model is learning.
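A training loop along these lines might look like the sketch below. It assumes each training sample is an object with a flat `pixels` array of 784 values and an integer `label`, that `model` was compiled with categoricalCrossentropy as in the basic CNN shown earlier, and that the TensorFlow.js namespace is passed in as `tf`. The helper names, batch size, epoch count, and validation split are illustrative choices.

```javascript
// One-hot encode an integer label, as categoricalCrossentropy expects.
function oneHot(label, numClasses = 10) {
  const row = new Array(numClasses).fill(0);
  row[label] = 1;
  return row;
}

// Pack all images into one flat Float32Array laid out as [n, 28, 28, 1].
function packImages(samples) {
  const flat = new Float32Array(samples.length * 784);
  samples.forEach((s, i) => flat.set(s.pixels, i * 784));
  return flat;
}

// Train the model and log the loss and accuracy metrics after every epoch.
// `tf` is the TensorFlow.js namespace (a global in the browser when loaded via a script tag).
async function trainModel(tf, model, trainingSet, epochs = 5) {
  const xs = tf.tensor4d(packImages(trainingSet), [trainingSet.length, 28, 28, 1]);
  const ys = tf.tensor2d(trainingSet.map((s) => oneHot(s.label)));
  try {
    return await model.fit(xs, ys, {
      epochs,
      batchSize: 64,
      validationSplit: 0.1, // hold out 10% of the training data to monitor progress
      shuffle: true,
      callbacks: {
        onEpochEnd: (epoch, logs) => console.log(`epoch ${epoch + 1}`, logs),
      },
    });
  } finally {
    // Free the tensor memory once training finishes or fails.
    xs.dispose();
    ys.dispose();
  }
}
```

Calling `await trainModel(tf, model, trainingSet)` returns the history object produced by `model.fit()`, which holds the per-epoch metrics for plotting.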
A confusion matrix is a table that visualizes the performance of a classification model by showing how many predictions were correct and where errors occurred. Each row corresponds to the actual class, and each column to the predicted class. The diagonal values represent correct predictions, while off-diagonal values indicate misclassifications.
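Building such a matrix takes only a few lines. The sketch below assumes two parallel arrays of integer class labels, the true labels and the model's predictions, and follows the layout described above: rows are actual classes and columns are predicted classes. The function names are illustrative.

```javascript
// Count predictions into a numClasses x numClasses table:
// rows = actual class, columns = predicted class.
function confusionMatrix(actual, predicted, numClasses = 10) {
  const matrix = Array.from({ length: numClasses }, () => new Array(numClasses).fill(0));
  for (let i = 0; i < actual.length; i++) {
    matrix[actual[i]][predicted[i]] += 1;
  }
  return matrix;
}

// Overall accuracy is the sum of the diagonal divided by the total count.
function accuracyFromMatrix(matrix) {
  let correct = 0;
  let total = 0;
  matrix.forEach((row, r) => {
    row.forEach((count, c) => {
      total += count;
      if (r === c) correct += count;
    });
  });
  return total === 0 ? 0 : correct / total;
}
```

A large value in row 4, column 9, for example, would mean the model often mistakes a handwritten 4 for a 9.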