Convolutional Neural Networks (CNNs) are a specialized type of deep learning model best suited for processing data with a grid-like topology, such as images. CNNs use layers with learnable filters (kernels) that scan across the input, automatically learning features like edges, textures, and shapes. The main building blocks of CNNs include convolutional layers, activation functions (e.g., ReLU), pooling layers, and fully connected layers. CNNs have achieved state-of-the-art performance in computer vision tasks such as image classification, object detection, and segmentation.
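To make the idea of a filter scanning across the input concrete, here is a minimal plain-JavaScript sketch of a single-channel "valid" convolution followed by ReLU. It is illustrative only and is not how TensorFlow.js implements its layers. The names conv2dValid, relu, and edgeKernel are hypothetical, and the kernel is written by hand, whereas a CNN would learn its weights during training. As in most deep learning libraries, the operation is technically a cross-correlation (the kernel is not flipped).

```javascript
// Slide a small kernel over a 2D image and sum the element-wise products.
// No padding, stride 1: the output shrinks by (kernel size - 1) per axis.
function conv2dValid(image, kernel) {
  const outH = image.length - kernel.length + 1;
  const outW = image[0].length - kernel[0].length + 1;
  const out = [];
  for (let r = 0; r < outH; r++) {
    const row = [];
    for (let c = 0; c < outW; c++) {
      let sum = 0;
      for (let kr = 0; kr < kernel.length; kr++) {
        for (let kc = 0; kc < kernel[0].length; kc++) {
          sum += image[r + kr][c + kc] * kernel[kr][kc];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// ReLU keeps positive responses and zeroes out negative ones.
const relu = (matrix) => matrix.map((row) => row.map((v) => Math.max(0, v)));

// A 4x4 image with a vertical edge: dark on the left, bright on the right.
const image = [
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
];

// Hand-written kernel that responds to dark-to-bright vertical edges.
const edgeKernel = [
  [-1, 1],
  [-1, 1],
];

const featureMap = relu(conv2dValid(image, edgeKernel));
console.log(featureMap); // 3x3 map; the middle column marks the edge
```

The feature map is largest exactly where the edge sits, which is the sense in which a filter "detects" a feature.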
TensorFlow.js is an open-source library that brings machine learning capabilities to JavaScript. Using TensorFlow.js, you can build, train, and run neural networks directly in the browser, with no server-side code, or in Node.js. For training CNNs in the browser, TensorFlow.js provides:
tf.sequential() and tf.model() for easy model construction
tf.layers.conv2d(), tf.layers.maxPooling2d(), and tf.layers.dense()
Note: This example creates a simple CNN for digit classification. For demonstration, synthetic random data is used; in practice, replace this with real image data.
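// Assumes the global `tf` object is available (for example, loaded via the TensorFlow.js script tag).
// The code is wrapped in an async function because `await` is not allowed at the top level of a classic script.
async function run() {
  // Define the CNN model
  const model = tf.sequential();
  model.add(tf.layers.conv2d({
    inputShape: [28, 28, 1], // 28x28 grayscale images
    filters: 16,
    kernelSize: 3,
    activation: 'relu'
  }));
  model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({ units: 10, activation: 'softmax' })); // one output per digit
  model.compile({
    optimizer: 'adam',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
  });

  // Generate random training data (replace with real images)
  const xs = tf.randomNormal([100, 28, 28, 1]);
  const ys = tf.oneHot(tf.randomUniform([100], 0, 10, 'int32'), 10);

  // Train the model and log the loss and accuracy after each epoch
  await model.fit(xs, ys, {
    epochs: 5,
    callbacks: { onEpochEnd: (epoch, logs) => console.log(epoch, logs) }
  });
}

run();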
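The layer shapes and parameter counts of the model above follow from simple arithmetic, worked through here in plain JavaScript (no TensorFlow.js needed). The sketch assumes the library defaults that the example relies on: 'valid' padding and stride 1 for the convolution, and a pooling stride equal to poolSize. The variable names are chosen for illustration.

```javascript
// Hyperparameters copied from the model definition above.
const input = { h: 28, w: 28, c: 1 };
const kernelSize = 3;
const filters = 16;
const poolSize = 2;
const classes = 10;

// Convolution with 'valid' padding and stride 1 trims (kernelSize - 1) pixels per axis.
const conv = {
  h: input.h - kernelSize + 1,
  w: input.w - kernelSize + 1,
  c: filters,
};

// Max pooling with stride equal to poolSize divides each spatial axis by poolSize.
const pool = {
  h: Math.floor(conv.h / poolSize),
  w: Math.floor(conv.w / poolSize),
  c: conv.c,
};

// Flatten turns the pooled volume into one long vector.
const flatLength = pool.h * pool.w * pool.c;

// Each filter has kernelSize * kernelSize * inputChannels weights plus one bias.
const convParams = kernelSize * kernelSize * input.c * filters + filters;
// The dense layer has one weight per (input, output) pair plus one bias per output.
const denseParams = flatLength * classes + classes;

console.log(conv);                     // { h: 26, w: 26, c: 16 }
console.log(pool);                     // { h: 13, w: 13, c: 16 }
console.log(flatLength);               // 2704
console.log(convParams + denseParams); // 27210
```

Almost all of the parameters sit in the dense layer, which is typical for small CNNs of this shape.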
The image below shows a very simplified structure of a CNN. Each circle represents a "neuron" (unit) in each layer. The input image is processed through convolutional and pooling layers to extract features, then passed to dense (fully connected) layers for classification.
Imagine you want to teach a computer to tell apart pictures of cats and dogs. A CNN model is like a set of smart filters that automatically learn the features that make cats and dogs different—like detecting ears, fur patterns, and shapes. At first, the network knows nothing. You show it thousands of labeled pictures, and after training, it learns which visual patterns most often appear with “cat” and which with “dog.” When you give it a new photo, it can predict whether it's more likely a cat or a dog—just like a simple visual detective!
To build, train, and visualize deep learning models in the browser, you’ll need a few important JavaScript libraries:
TensorFlow.js: https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.7.4/dist/tf.min.js
tfjs-vis: https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-vis@1.0.1/dist/tfjs-vis.umd.min.js
These libraries are essential for building interactive machine learning demos and tools in JavaScript.
The MNIST dataset is a large collection of 28x28 pixel grayscale images of handwritten digits (0 through 9). It is commonly used for training and evaluating image classification systems. Each image represents a single digit and is paired with a corresponding label indicating which digit is shown.
Each image in mnist_images.png is placed sequentially in rows, and each is paired with its label from mnist_labels_uint8. The dataset is often used to teach and test Convolutional Neural Networks (CNNs) and other machine learning models in recognizing handwritten digits.
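As a rough sketch of how an image is paired with its label, the snippet below indexes into two flat typed arrays. It assumes the sprite has already been decoded into one Float32Array with 784 values (28x28) per image, and that the labels file stores one one-hot group of 10 bytes per image. Both layout details are assumptions about the files, and getExample, fakeImages, and fakeLabels are hypothetical names; tiny synthetic arrays stand in for the real data.

```javascript
const IMAGE_SIZE = 28 * 28; // pixels per image
const NUM_CLASSES = 10;     // digits 0 through 9

// Return the pixels and the digit label of the image at `index`.
// subarray() creates a view, so no pixel data is copied.
function getExample(images, labels, index) {
  const pixels = images.subarray(index * IMAGE_SIZE, (index + 1) * IMAGE_SIZE);
  const oneHot = labels.subarray(index * NUM_CLASSES, (index + 1) * NUM_CLASSES);
  const digit = oneHot.indexOf(1); // position of the single 1 in the one-hot group
  return { pixels, digit };
}

// Synthetic stand-in for the decoded files: 3 blank images labeled 7, 0, 3.
const fakeImages = new Float32Array(3 * IMAGE_SIZE);
const fakeLabels = new Uint8Array(3 * NUM_CLASSES);
[7, 0, 3].forEach((digit, i) => {
  fakeLabels[i * NUM_CLASSES + digit] = 1;
});
fakeImages[2 * IMAGE_SIZE] = 1; // mark the first pixel of image 2

const example = getExample(fakeImages, fakeLabels, 2);
console.log(example.digit, example.pixels.length);
```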
After loading the MNIST data, it's helpful to visualize some samples to better understand the dataset. Below, you can display a selection of handwritten digit images from cleanedData along with their labels.
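In the browser you would normally draw samples to a canvas or a tfjs-vis surface. As a dependency-free alternative for quick inspection in a console, the sketch below renders a flat array of pixel intensities as text. It assumes intensities are scaled to the range 0 to 1; renderAscii and the sample data are hypothetical, and the structure of cleanedData is not assumed here.

```javascript
// Map each pixel intensity in [0, 1] to one of four characters, darkest to brightest.
function renderAscii(pixels, width) {
  const shades = ' .:#';
  const lines = [];
  for (let start = 0; start < pixels.length; start += width) {
    let line = '';
    for (let i = start; i < start + width; i++) {
      // Clamp so that an intensity of exactly 1 maps to the last character.
      const level = Math.min(shades.length - 1, Math.floor(pixels[i] * shades.length));
      line += shades[level];
    }
    lines.push(line);
  }
  return lines.join('\n');
}

// A 2x2 toy "image"; a real MNIST digit would use width 28.
const tinyImage = [0, 1, 0.5, 0.25];
const rendered = renderAscii(tinyImage, 2);
console.log(rendered);
```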
When building machine learning models, it's essential to separate your data into training and testing sets. The training set is used to teach the model, while the testing set evaluates how well the model can generalize to unseen data. A common practice is to use 80% of the data for training and 20% for testing.
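A minimal sketch of an 80/20 split in plain JavaScript follows. It shuffles a copy of the examples first so that both sets are drawn from the whole dataset. The function name trainTestSplit and the example objects are hypothetical; with tensors you would apply the same idea to indices.

```javascript
// Shuffle a copy of the examples (Fisher-Yates), then cut at the requested fraction.
// `random` is injectable so the split can be made reproducible in tests.
function trainTestSplit(examples, trainFraction = 0.8, random = Math.random) {
  const shuffled = examples.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.floor(shuffled.length * trainFraction);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}

// Ten placeholder examples; real ones would hold pixels and a label.
const examples = Array.from({ length: 10 }, (_, i) => ({ id: i }));
const { train: trainSet, test: testSet } = trainTestSplit(examples);
console.log(trainSet.length, testSet.length); // 8 2
```

Shuffling before cutting matters when the data is stored in a meaningful order (for example, sorted by label), because an unshuffled split could leave some classes out of one set.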
Convolutional Neural Networks (CNNs) are a type of deep learning model especially effective for image recognition tasks like classifying handwritten digits. In this section, you can define a simple CNN model, review its architecture, and learn how it processes images.
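CNNs are loosely inspired by how the visual cortex processes information. They use layers that automatically learn to detect features in images, such as edges, shapes, or even complex objects. A typical CNN passes an image through convolutional and pooling layers to extract features, then through dense (fully connected) layers to produce a classification.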
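The pooling step in such a network can be sketched in plain JavaScript. This is an illustration of 2x2 max pooling on a single feature map, not the TensorFlow.js implementation; maxPool2d is a hypothetical helper and it assumes a stride equal to the pool size.

```javascript
// Keep only the largest value in each non-overlapping poolSize x poolSize block.
function maxPool2d(featureMap, poolSize = 2) {
  const outH = Math.floor(featureMap.length / poolSize);
  const outW = Math.floor(featureMap[0].length / poolSize);
  const out = [];
  for (let r = 0; r < outH; r++) {
    const row = [];
    for (let c = 0; c < outW; c++) {
      let best = -Infinity;
      for (let pr = 0; pr < poolSize; pr++) {
        for (let pc = 0; pc < poolSize; pc++) {
          best = Math.max(best, featureMap[r * poolSize + pr][c * poolSize + pc]);
        }
      }
      row.push(best);
    }
    out.push(row);
  }
  return out;
}

const pooled = maxPool2d([
  [1, 3, 2, 0],
  [4, 2, 1, 1],
  [0, 0, 5, 6],
  [1, 2, 7, 8],
]);
console.log(pooled); // [ [ 4, 2 ], [ 2, 8 ] ]
```

Pooling halves each spatial dimension here, which reduces computation in later layers and makes the detected features less sensitive to small shifts in the image.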
Once your model and dataset are ready, training is the process where the model learns patterns from the training data. The model adjusts its internal parameters to minimize the difference between its predictions and the actual labels. This process repeats over several epochs (full passes through the training data).
During training, the CNN model receives batches of images and their correct labels. It predicts the classes for the images, calculates how far its predictions are from the correct ones (loss), and uses an optimizer to adjust its internal weights. This process is repeated over several epochs, continually improving the model's accuracy.
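The predict, measure loss, adjust loop can be illustrated on a toy scale. The sketch below computes softmax probabilities and categorical cross-entropy for one example with three classes, then applies plain gradient descent. To stay short it treats the raw class scores (logits) as the trainable parameters, which is a simplification: in a real CNN the gradient is propagated back to the filter and dense-layer weights, and an optimizer such as Adam uses a more elaborate update rule. All names here are hypothetical.

```javascript
// Turn raw scores into probabilities; subtracting the max improves numerical stability.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Categorical cross-entropy for a single one-hot label.
function crossEntropy(probs, oneHot) {
  return -oneHot.reduce((acc, y, i) => acc + y * Math.log(probs[i]), 0);
}

let logits = [0, 0, 0];  // the "model" starts with no preference
const label = [0, 1, 0]; // the correct class is index 1
const learningRate = 0.5;
const losses = [];

for (let step = 0; step < 5; step++) {
  const probs = softmax(logits);
  losses.push(crossEntropy(probs, label));
  // For softmax with cross-entropy, the gradient with respect to the logits is (probs - label).
  logits = logits.map((z, i) => z - learningRate * (probs[i] - label[i]));
}

console.log(losses); // the loss shrinks at every step
```

The first loss equals ln(3), the value for a uniform guess over three classes, and each update moves probability toward the correct class.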
Illustration: The CNN is shown learning from labeled images, adjusting filters and connections to improve performance every epoch.
(Source: Medium, for educational use)
After training, it's important to check how well the CNN model performs on data it hasn't seen before. This is done with the test set, which helps you understand the model’s generalization power and ensure it hasn't just memorized the training data.
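Test accuracy is simply the fraction of test examples whose highest-probability class matches the true label. The sketch below computes it in plain JavaScript from made-up prediction vectors; the data and the names argMax and accuracy are hypothetical, and with a trained model the probabilities would come from its predictions on the test set.

```javascript
// Index of the largest value, i.e. the predicted class.
const argMax = (values) => values.indexOf(Math.max(...values));

// Fraction of examples where the predicted class equals the true label.
function accuracy(predictions, labels) {
  let correct = 0;
  predictions.forEach((probs, i) => {
    if (argMax(probs) === labels[i]) correct++;
  });
  return correct / labels.length;
}

// Made-up class probabilities for four test examples with three classes.
const predictions = [
  [0.1, 0.8, 0.1], // predicts class 1
  [0.7, 0.2, 0.1], // predicts class 0
  [0.2, 0.3, 0.5], // predicts class 2
  [0.6, 0.3, 0.1], // predicts class 0
];
const labels = [1, 0, 2, 1]; // the last example is misclassified

const testAccuracy = accuracy(predictions, labels);
console.log(testAccuracy); // 0.75
```

A test accuracy well below the training accuracy is a sign that the model has memorized the training data rather than learned patterns that generalize.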