Source: Deep Learning on Medium
❄️How EfficientNet Works
It's possible to understand why it is more efficient in three basic steps:
- Depthwise Convolution + Pointwise Convolution: Splits the original convolution into two stages to significantly reduce the computational cost, with minimal loss of accuracy.
- Inverted residuals: The original ResNet blocks consist of a layer that squeezes the channels, then a layer that expands them, so the skip connections link channel-rich layers. In MBConv, blocks first expand the channels and then compress them, so the skip connections link the layers that have fewer channels.
- Linear bottleneck: Uses a linear activation in the last layer of each block to prevent the information loss that ReLU causes in low-dimensional spaces.
The main building block of EfficientNet is MBConv, an inverted-bottleneck convolution block originally introduced in MobileNetV2. By placing the shortcuts between the bottlenecks, which have far fewer channels than the expansion layers, and combining this with depthwise separable convolution, it reduces computation by almost a factor of k² compared to traditional convolution layers, where k denotes the kernel size, i.e. the height and width of the 2-D convolution window.
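A back-of-the-envelope multiply-accumulate count makes the ~k² saving concrete. This is only a sketch; the feature-map and channel sizes below are arbitrary, and the formulas count only the convolution arithmetic:

```python
# Approximate multiply-accumulate counts for one conv layer on an
# H x W feature map with c_in input and c_out output channels.
def standard_conv_flops(h, w, c_in, c_out, k):
    # every output pixel mixes a k x k x c_in neighborhood per output channel
    return h * w * c_out * k * k * c_in

def depthwise_separable_flops(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv mixes the channels
    return depthwise + pointwise

h = w = 56
c_in = c_out = 128
k = 3

ratio = standard_conv_flops(h, w, c_in, c_out, k) / depthwise_separable_flops(h, w, c_in, c_out, k)
print(round(ratio, 2))  # → 8.41, close to k**2 = 9
```

The ratio approaches k² as the channel count grows, which is why the saving is "almost k²" rather than exactly k².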
from keras.layers import Conv2D, DepthwiseConv2D, Add

def inverted_residual_block(x, expand=64, squeeze=16):
    # 1x1 expansion, 3x3 depthwise, then 1x1 projection back down
    block = Conv2D(expand, (1, 1), activation='relu')(x)
    block = DepthwiseConv2D((3, 3), activation='relu', padding='same')(block)
    # linear bottleneck: no ReLU on the projection layer
    block = Conv2D(squeeze, (1, 1))(block)
    # the skip connection requires x to already have `squeeze` channels
    return Add()([block, x])
Larger networks with greater width, depth, or resolution tend to achieve higher accuracy, but the gain saturates quickly once accuracy reaches about 80%. This shows the limitation of scaling a single dimension. Scaling all three dimensions together in a fixed ratio is called compound scaling.
If we want the total FLOPS to double at each step from EfficientNetB0 up to EfficientNetB7 (so that, for example, EfficientNetB1 has 2x the FLOPS of EfficientNetB0), the scaling coefficients must satisfy:
depth: d = α^φ
width: w = β^φ
resolution: r = γ^φ
s.t. α · β² · γ² ≈ 2, with α ≥ 1, β ≥ 1, γ ≥ 1
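With the coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15, found by grid search at φ = 1), the constraint and the per-φ multipliers can be checked numerically. A quick sketch:

```python
# Coefficients from the EfficientNet paper's grid search at phi = 1
alpha, beta, gamma = 1.2, 1.1, 1.15

# The constraint alpha * beta**2 * gamma**2 ≈ 2 means each increment of
# phi roughly doubles the total FLOPS.
constraint = alpha * beta**2 * gamma**2
print(round(constraint, 3))  # → 1.92, close to 2

# Depth, width, and resolution multipliers for the first few phi values
for phi in range(1, 4):
    d, w_mult, r = alpha**phi, beta**phi, gamma**phi
    print(phi, round(d, 3), round(w_mult, 3), round(r, 3))
```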
❄️Creating an EfficientNet Model
Step 1️⃣: Build the baseline model, called EfficientNet-B0, using MBConv (MobileNetV2's inverted residual bottleneck) as the main building block.
Step 2️⃣: Fix ϕ = 1 and run a grid search for α, β, and γ to scale from B0 to B1.
Step 3️⃣: With α, β, and γ fixed, scale up from B2 to B7 by choosing ϕ between 2 and 7. Below, you can see class activation maps for models with different scaling methods.
It appears that the compound scaling method focuses more effectively on the object-relevant regions of the image. Worth trying! 🤖