Convolutional Neural Networks (CNNs) are a specialized type of deep neural network designed to process grid-structured data, such as images. CNNs are highly effective for image and video recognition, classification, and many other visual tasks.
CNNs work by sliding small filters (kernels) over the input image to produce feature maps. These maps highlight important features like edges, textures, and shapes. The network learns which filters are most useful during training.
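The sliding-filter idea can be sketched in plain JavaScript, without any library. This is a minimal illustration under stated assumptions: the 4x4 image and the 2x2 vertical-edge kernel are made-up values chosen by hand, whereas a real CNN learns its kernel values during training. The helper name convolve2d is hypothetical.

```javascript
// Slide a kernel over a 2D grid with stride 1 and no padding ("valid").
// Like most deep learning libraries, this does not flip the kernel
// (strictly speaking it is a cross-correlation).
function convolve2d(image, kernel) {
  const kernelHeight = kernel.length;
  const kernelWidth = kernel[0].length;
  const outHeight = image.length - kernelHeight + 1;
  const outWidth = image[0].length - kernelWidth + 1;
  const featureMap = [];
  for (let r = 0; r < outHeight; r++) {
    const row = [];
    for (let c = 0; c < outWidth; c++) {
      // Multiply the kernel with the image patch under it and sum the products.
      let sum = 0;
      for (let i = 0; i < kernelHeight; i++) {
        for (let j = 0; j < kernelWidth; j++) {
          sum += image[r + i][c + j] * kernel[i][j];
        }
      }
      row.push(sum);
    }
    featureMap.push(row);
  }
  return featureMap;
}

// Hypothetical image: dark left half (0), bright right half (1).
const image = [
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
];

// Hand-picked kernel that responds where brightness increases left to right.
const verticalEdgeKernel = [
  [-1, 1],
  [-1, 1],
];

const edgeMap = convolve2d(image, verticalEdgeKernel);
console.log(edgeMap); // [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is nonzero only in the middle column, which is where the dark-to-bright edge sits in the input.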
Training a CNN involves multiple steps:
const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1],
  filters: 32,
  kernelSize: 3,
  activation: 'relu'
}));
model.add(tf.layers.maxPooling2d({ poolSize: [2, 2] }));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
model.add(tf.layers.dense({ units: 10, activation: 'softmax' }));
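It can help to trace how the tensor shape changes through the layers above. The arithmetic below is a sketch that assumes the layer defaults as I understand them: no padding ('valid') and stride 1 for conv2d, a pooling stride equal to the pool size, and a bias term on every layer. The helper validOutputSize is a hypothetical name, not a TensorFlow.js function.

```javascript
// Output length along one dimension for a window that slides without padding.
function validOutputSize(inputSize, windowSize, stride) {
  return Math.floor((inputSize - windowSize) / stride) + 1;
}

const numFilters = 32;
const convSize = validOutputSize(28, 3, 1);           // 28x28 input, 3x3 kernel
const pooledSize = validOutputSize(convSize, 2, 2);   // 2x2 pooling window
const flatSize = pooledSize * pooledSize * numFilters; // length after flatten()

// Trainable parameters = weights + biases for each layer.
const convParams = 3 * 3 * 1 * numFilters + numFilters;
const dense1Params = flatSize * 64 + 64;
const dense2Params = 64 * 10 + 10;
const totalParams = convParams + dense1Params + dense2Params;

console.log(convSize, pooledSize, flatSize); // 26 13 5408
console.log(totalParams);                    // 347146
```

Under these assumptions, almost all of the parameters sit in the first dense layer, because it connects every one of the 5408 flattened values to each of its 64 units.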
TensorFlow.js is a JavaScript library for training and deploying machine learning models directly in the browser or in Node.js. In the browser, computations run on the client side and can use WebGL for fast parallel processing on the GPU.
Use tf.sequential() or tf.model() to construct models.
Available layers include conv2d, maxPooling2d, flatten, dense, and more, making it suitable for CNNs.
Load and prepare data with the tf.data and tf.tensor utilities.
Train with model.fit(), specifying epochs, batch size, and callbacks for monitoring.
model.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy']
});
await model.fit(trainImages, trainLabels, {
  epochs: 10,
  validationData: [testImages, testLabels],
  callbacks: tf.callbacks.earlyStopping({monitor: 'val_loss'})
});
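The 'categoricalCrossentropy' loss used above expects each label as a one-hot vector and compares it with the predicted class probabilities. The following plain JavaScript sketch shows the idea for a single example. The helper names oneHot and categoricalCrossentropy are hypothetical stand-ins written for illustration, not the library's implementation, and the probabilities are made up.

```javascript
// Turn an integer class label into a one-hot vector, e.g. 2 -> [0, 0, 1].
function oneHot(label, numClasses) {
  const vector = new Array(numClasses).fill(0);
  vector[label] = 1;
  return vector;
}

// Cross-entropy for one example: -sum(target[i] * log(predicted[i])).
// A small epsilon guards against log(0).
function categoricalCrossentropy(target, predicted) {
  const epsilon = 1e-7;
  let loss = 0;
  for (let i = 0; i < target.length; i++) {
    loss -= target[i] * Math.log(Math.max(predicted[i], epsilon));
  }
  return loss;
}

const target = oneHot(2, 3);

// Made-up predictions: one puts most probability on the true class, one does not.
const confidentLoss = categoricalCrossentropy(target, [0.1, 0.2, 0.7]);
const unsureLoss = categoricalCrossentropy(target, [0.4, 0.4, 0.2]);

console.log(target);                     // [0, 0, 1]
console.log(confidentLoss < unsureLoss); // true
```

The loss shrinks as the predicted probability of the true class approaches 1, which is the quantity the optimizer tries to reduce during model.fit().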
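Below is a skeleton page showing where each step goes. For a complete example, you need to supply the trainImages, trainLabels, testImages, and testLabels tensors in shapes that match the model: for the model above, images of shape [N, 28, 28, 1] and one-hot labels of shape [N, 10].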
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script>
// Define and compile the model as shown above
// Load and preprocess your image data as tensors
// Train the model with model.fit()
</script>
Below is a simple schematic of a Convolutional Neural Network. Each circle represents a neuron. The network consists of an input image, a convolutional layer, a pooling layer, and a fully connected layer.
Imagine you want to teach a computer to recognize whether an image is a cat or a dog.
const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1],
  filters: 32,
  kernelSize: 3,
  activation: 'relu'
}));
model.add(tf.layers.maxPooling2d({ poolSize: [2, 2] }));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
model.add(tf.layers.dense({ units: 2, activation: 'softmax' })); // 2 outputs: cat or dog
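The final softmax layer turns the two raw output scores into probabilities that sum to 1. Here is a minimal plain JavaScript sketch of that step. The raw scores are invented for illustration, and softmax here is a hand-written helper rather than the library's own function.

```javascript
// Convert raw scores into probabilities that sum to 1.
// Subtracting the maximum first keeps Math.exp from overflowing.
function softmax(scores) {
  const maxScore = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const classNames = ['cat', 'dog'];
const rawScores = [2.0, 0.5]; // hypothetical outputs before softmax
const probabilities = softmax(rawScores);

// Pick the class with the highest probability.
const predictedIndex = probabilities.indexOf(Math.max(...probabilities));
console.log(classNames[predictedIndex]); // cat
```

The larger raw score always receives the larger probability, so the predicted class is simply the position of the maximum.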
To define and train neural network models directly in the browser, as well as to visualize results and model structure, the following libraries are commonly used:
These libraries enable you to build, train, and visualize deep learning models interactively and entirely client-side.
The MNIST dataset is a classic benchmark in machine learning, consisting of images of handwritten digits (0–9) and their corresponding labels. Each image is a 28x28 grayscale pixel grid, making it ideal for training and evaluating image classification models such as convolutional neural networks (CNNs).
The dataset is widely used to evaluate computer vision algorithms' ability to recognize handwritten numbers. Each image is associated with a single label, indicating which digit it represents.
Visualizing the data is an important step in understanding the structure and variety of the MNIST dataset. Below, a selection of digit images and their labels from cleanedData is displayed as a grid. This helps verify that the data has loaded correctly and gives a sense of the input your models will see.
In machine learning, it is important to split your dataset into a training set and a testing set. The training set is used to train your model, while the testing set is used to evaluate how well your model performs on unseen data. A typical split is 80% for training and 20% for testing.
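An 80/20 split can be sketched in a few lines of plain JavaScript. This is one possible approach, assuming the samples fit in an in-memory array. The function name trainTestSplit and the placeholder samples are hypothetical, and shuffling before splitting is a common precaution added here, not a step stated in the text.

```javascript
// Shuffle a copy of the samples (Fisher-Yates), then cut it at trainFraction.
// The random source is a parameter so a seeded generator can be swapped in.
function trainTestSplit(samples, trainFraction = 0.8, random = Math.random) {
  const shuffled = samples.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const trainCount = Math.floor(shuffled.length * trainFraction);
  return {
    trainSet: shuffled.slice(0, trainCount),
    testSet: shuffled.slice(trainCount),
  };
}

// Ten placeholder samples standing in for (image, label) pairs.
const samples = Array.from({ length: 10 }, (_, i) => ({ id: i }));
const { trainSet, testSet } = trainTestSplit(samples);

console.log(trainSet.length, testSet.length); // 8 2
```

Every sample ends up in exactly one of the two sets, so the test set stays unseen during training.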
Convolutional Neural Networks (CNNs) are a popular architecture for image classification tasks, such as recognizing handwritten digits in the MNIST dataset.
How does a CNN work?
A CNN processes image data through several layers:
This layered approach allows the network to learn increasingly complex patterns, making CNNs highly effective for image recognition.
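To make one of these layers concrete, the sketch below shows what a 2x2 max pooling step followed by flattening does to a small feature map. It is plain JavaScript written for illustration. The function maxPool2d is a hypothetical helper, the feature map values are made up, and the stride is assumed to equal the pool size.

```javascript
// Keep only the largest value in each non-overlapping poolSize x poolSize block.
function maxPool2d(featureMap, poolSize) {
  const outHeight = Math.floor(featureMap.length / poolSize);
  const outWidth = Math.floor(featureMap[0].length / poolSize);
  const pooled = [];
  for (let r = 0; r < outHeight; r++) {
    const row = [];
    for (let c = 0; c < outWidth; c++) {
      let max = -Infinity;
      for (let i = 0; i < poolSize; i++) {
        for (let j = 0; j < poolSize; j++) {
          max = Math.max(max, featureMap[r * poolSize + i][c * poolSize + j]);
        }
      }
      row.push(max);
    }
    pooled.push(row);
  }
  return pooled;
}

// Made-up 4x4 feature map.
const featureMap = [
  [1, 3, 2, 0],
  [4, 2, 1, 1],
  [0, 1, 5, 6],
  [2, 2, 7, 8],
];

const pooled = maxPool2d(featureMap, 2);
const flattened = pooled.flat(); // what a flatten layer hands to the dense layers

console.log(pooled);    // [[4, 2], [2, 8]]
console.log(flattened); // [4, 2, 2, 8]
```

Pooling halves each spatial dimension while keeping the strongest response in each region, which reduces the amount of data the later layers must process.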
After defining the CNN model, the next step is to train it using the training set. This process teaches the model to recognize patterns in the data by repeatedly adjusting its parameters to minimize prediction errors.
During training, the model uses the images and labels from the training set to learn. It repeatedly makes predictions and compares them to the actual labels, adjusting itself each time to improve. Over multiple epochs, the loss should decrease and the accuracy should increase, indicating that the model is learning to recognize the images.
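The predict, compare, adjust cycle can be illustrated with a deliberately tiny stand-in for a network: a single weight w fitted by gradient descent. This is a toy sketch, not how model.fit() is implemented. The data, learning rate, and epoch count are arbitrary choices for illustration.

```javascript
// Toy data generated from y = 2 * x, so the ideal weight is 2.
const xs = [1, 2, 3, 4];
const ys = [2, 4, 6, 8];

let w = 0; // the single trainable parameter, starting from a poor guess
const learningRate = 0.01;
const lossHistory = [];

for (let epoch = 0; epoch < 20; epoch++) {
  let loss = 0;
  let gradient = 0;
  for (let i = 0; i < xs.length; i++) {
    const error = w * xs[i] - ys[i];             // prediction minus actual label
    loss += (error * error) / xs.length;         // mean squared error
    gradient += (2 * error * xs[i]) / xs.length; // derivative of the loss w.r.t. w
  }
  lossHistory.push(loss);
  w -= learningRate * gradient; // adjust the parameter to reduce the loss
}

console.log(lossHistory[0], lossHistory[lossHistory.length - 1]);
console.log(w);
```

Each epoch moves w closer to 2 and the recorded loss falls from one epoch to the next, which is the same trend to look for in the loss reported while training the CNN.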
After training, it's important to evaluate the model on a separate test set. This helps us understand how well the model generalizes to new, unseen data.
Testing the model on data it has never seen before gives a realistic sense of its performance in real-world scenarios. High test accuracy suggests that the model has learned to generalize well rather than just memorizing the training data.
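Test accuracy is the fraction of test examples whose most probable predicted class matches the true label. The plain JavaScript sketch below computes it from made-up probability rows. The helpers argMax and accuracy are hypothetical names written for this illustration.

```javascript
// Index of the largest value, i.e. the predicted class.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Fraction of examples where the predicted class equals the true label.
function accuracy(probabilityRows, labels) {
  let correct = 0;
  probabilityRows.forEach((row, i) => {
    if (argMax(row) === labels[i]) correct++;
  });
  return correct / labels.length;
}

// Made-up model outputs for four test examples and three classes.
const probabilityRows = [
  [0.8, 0.1, 0.1],
  [0.2, 0.7, 0.1],
  [0.3, 0.4, 0.3],
  [0.1, 0.1, 0.8],
];
const trueLabels = [0, 1, 2, 2];

console.log(accuracy(probabilityRows, trueLabels)); // 0.75
```

Three of the four predictions match their labels here; the third example is predicted as class 1 although its label is 2.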
A confusion matrix is a table used to evaluate the performance of a classification model. It shows how many predictions were correct and where mistakes were made for each class.
The diagonal cells show correct predictions. Off-diagonal cells show misclassifications. A perfect model has nonzero values only on the diagonal.
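A confusion matrix can be built by counting (true label, predicted label) pairs. The sketch below uses plain JavaScript and made-up labels. It assumes the convention that rows are true classes and columns are predicted classes; some tools use the transpose. The function name confusionMatrix is hypothetical.

```javascript
// matrix[t][p] counts examples whose true class is t and predicted class is p.
function confusionMatrix(trueLabels, predictedLabels, numClasses) {
  const matrix = Array.from({ length: numClasses }, () =>
    new Array(numClasses).fill(0)
  );
  for (let i = 0; i < trueLabels.length; i++) {
    matrix[trueLabels[i]][predictedLabels[i]] += 1;
  }
  return matrix;
}

// Made-up labels for six examples and three classes.
const trueLabels = [0, 0, 1, 1, 2, 2];
const predictedLabels = [0, 1, 1, 1, 2, 0];

const matrix = confusionMatrix(trueLabels, predictedLabels, 3);
console.log(matrix); // [[1, 1, 0], [0, 2, 0], [1, 0, 1]]

// Correct predictions are the sum of the diagonal.
const correct = matrix.reduce((sum, row, i) => sum + row[i], 0);
console.log(correct); // 4
```

In this example the off-diagonal entries show one class-0 example mistaken for class 1 and one class-2 example mistaken for class 0.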