Demystifying Convolutional Neural Networks (CNN)


Written by Prasun Biswas and Chandan Durgia

Ask an engineer for the best advice they have ever received on understanding complex machines, and the most probable answer you will get is "reverse engineer". To expand on it:

“The best way to learn about a machine is to open it, assess the role of each element individually, and finally understand how all the elements together create the desired functionality. And, one fine day, you will be able to design such machines from individual elements yourself.”

This way of learning has proven effective in giving individuals a deep understanding of concepts and principles. This article aims to leverage and extend this learning methodology, not only to understand and demystify Convolutional Neural Networks (CNNs), but also to provide guidance on how one can conceptualize complex CNN model frameworks.

This can be achieved with the following key steps:

1. Create or find a generalized model which can perform image classification with good accuracy (let's say 90%+).

2. Visualize and understand the model framework components.

3. Break the model into components and understand how each component contributes to the overall functionality.

With the context being set, let’s deep dive into each of these.

1. Finding a generalized image classification model

There are numerous ways of finding a good generalized model, and if you ask various practitioners you will surely get different answers. Going by experience, however, the following are proposed:

a. Models developed by individuals (these can be sourced from public Git repositories).

b. Transfer learning techniques/frameworks: as explained in the Keras documentation, these are deep learning models made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning. Examples of these applications include ResNet, DenseNet, VGG, etc. (https://keras.io/api/applications/); a minimal usage sketch is shown after this list.

c. Training a model using “Teachable Machine” by Google (https://teachablemachine.withgoogle.com/). Teachable Machine allows users to train a model to identify a variety of images with strong performance. The key benefit of this approach is that the code for the generated model can be exported, analyzed and amended.
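
To make option (b) concrete, here is a minimal sketch of using a pre-trained model from keras.applications for prediction. It assumes TensorFlow 2.x is installed, and "sample.jpg" is a placeholder image path; swapping ResNet50 for DenseNet or VGG only changes the import.

```python
# Minimal sketch: option (b), a Keras pre-trained model used for prediction.
# Assumes TensorFlow 2.x; "sample.jpg" is a placeholder image path.
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions,
)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")  # downloads pre-trained ImageNet weights

img = image.load_img("sample.jpg", target_size=(224, 224))  # ResNet50 expects 224x224
x = np.expand_dims(image.img_to_array(img), axis=0)          # shape: (1, 224, 224, 3)
x = preprocess_input(x)                                      # same preprocessing as training

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 (class_id, label, probability)
```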

Of the three, models developed by individuals usually do not have acceptable accuracy or are not stable enough. Models developed through transfer learning techniques are comparatively stable; however, they do not leave much leeway for core architecture changes. In contrast, models developed using Teachable Machine are not only stable with good accuracy, but the architecture/framework behind them can also be amended to further improve model performance.

2. Understanding various model framework components

Once we have a good, potentially complex, model, it is important to visualize its components to understand the ingredients of a CNN model, i.e. the number of layers, the activation functions, the overall architecture/framework, how components are linked, the number of Conv2D and max-pooling layers, the kernel sizes, the associated weights, etc. A toy example of these ingredients is sketched below.
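
As an illustration only (not the actual Teachable Machine architecture), the toy Keras model below wires the usual components together; the layer sizes and the two-class output are arbitrary placeholders.

```python
# Toy CNN showing the typical components: Conv2D layers, kernel sizes,
# max pooling, activation functions, and a dense classification head.
# The sizes and the 2-class output are arbitrary, for illustration only.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                          # RGB input image
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),   # 32 filters, 3x3 kernels
    layers.MaxPooling2D(pool_size=(2, 2)),                      # downsample feature maps
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),                      # e.g. a 2-class classifier
])

model.summary()  # lists each layer, its output shape, and its weight count
```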

For models developed by individuals or through transfer learning techniques, visualizing the architecture is easy: in the first case the code itself specifies the details, and in the latter case there are sufficient published articles available on the internet.

For the models developed using Teachable Machine, the same can be achieved by exporting the model (let's say as a TensorFlow .h5 file) and visualizing the file in a neural network visualization tool, for example TensorBoard (https://www.tensorflow.org/tensorboard/graphs) or Netron (https://www.electronjs.org/apps/netron). The figure below shows a snippet from Netron for one of the trained models. Note that by clicking on a node one can view its various parameters as well.

Snippet of a CNN model using Netron
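
Beyond visualizing the graph, the exported model can also be inspected programmatically. The sketch below assumes the Teachable Machine export was saved as a Keras file named keras_model.h5 (a placeholder filename) in the working directory.

```python
# Minimal sketch: load the exported .h5 model and inspect its layers.
# "keras_model.h5" is a placeholder for whatever filename the export produced.
import tensorflow as tf

model = tf.keras.models.load_model("keras_model.h5", compile=False)

model.summary()  # layer-by-layer view: types, output shapes, parameter counts

# Per-layer details: name, layer type, and number of weights
for layer in model.layers:
    print(layer.name, layer.__class__.__name__, layer.count_params())
```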

3. Understanding individual components and overall functionality

Finally, let's talk about the elephant in the room.

Once the model components are visualized, understanding the model's nuances and intricacies requires assessing each component individually and then all components as a whole.

To become a subject matter expert on CNN models, arguably the best way is to play with the parameters of each and every component and see how they impact the inputs to the next layers and, in turn, the final output. With more and more iterations over different parameters, together with a good knowledge of the mathematics behind these components, one can improve one's understanding of CNN models significantly. One concrete way of probing a single layer is sketched below.
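
The sketch below shows one way to see what an intermediate layer passes to the next one, so that the effect of changing its parameters (kernel size, number of filters, pooling, ...) becomes visible. It assumes "model" is a built Keras model and "x" is a preprocessed input batch, as in the earlier snippets; the layer index is arbitrary.

```python
# Minimal sketch: probe an intermediate layer's output. Assumes "model" is a
# built Keras model and "x" a preprocessed input batch (see earlier snippets);
# layers[2] is an arbitrary choice of intermediate layer.
from tensorflow.keras import Model

probe = Model(inputs=model.input, outputs=model.layers[2].output)

feature_maps = probe.predict(x)
print(feature_maps.shape)  # e.g. (1, height, width, n_filters) for a Conv2D output
```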

Interestingly, and somewhat expectedly, while analyzing the details of various image classification frameworks derived from Teachable Machine, one will notice that many of them are similar to the frameworks used in transfer learning techniques (like ResNet, VGG, etc.).

Wrapping up

Given their complexity, it is usually not easy even to understand the nuances of the components and overall architecture of CNN models, let alone develop them. This article attempted to provide a methodology for attaining a good level of understanding, and eventually expertise, with these kinds of models.

One of the fastest ways to understand the intricacies of a CNN model is to start with a comprehensive model, visualize its layers leveraging tools like Netron or TensorBoard, and understand how the various parameters in the layers impact the model output. This, together with a strong understanding of the math behind the layers, can significantly improve one's grasp of how CNNs work and can surely pave the path to strong CNN proficiency.

Happy learning!

In the next article, we will cover how dimensionality reduction (using t-SNE) helps to improve the CNN accuracy.

