cnn_learner is a utility function which creates a Learner from a given pretrained CNN architecture, such as resnet18.
```python
def cnn_learner(dls, arch, normalize=True, n_out=None, pretrained=True, config=None,
                # learner args
                loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=None, cbs=None,
                metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False,
                train_bn=True, moms=(0.95,0.85,0.95),
                # other model args
                **kwargs):
    "Build a convnet style learner from `dls` and `arch`"
    ...
    meta = model_meta.get(arch, _default_meta)
    if normalize: _add_norm(dls, meta, pretrained)
    if n_out is None: n_out = get_c(dls)
    assert n_out, "`n_out` is not defined, and could not be inferred from data, set `dls.c` or pass `n_out`"
    model = create_cnn_model(arch, n_out, pretrained=pretrained, **kwargs)
    splitter = ifnone(splitter, meta['split'])
    learn = Learner(dls=dls, model=model, loss_func=loss_func, opt_func=opt_func, lr=lr,
                    splitter=splitter, cbs=cbs, metrics=metrics, path=path,
                    model_dir=model_dir, wd=wd, wd_bn_bias=wd_bn_bias,
                    train_bn=train_bn, moms=moms)
    if pretrained: learn.freeze()
    ...
    return learn
```
To do that, it uses the model metadata from the model_meta registry, which is simply a dictionary mapping an architecture to its metadata.
The cut value is used for stripping off the existing classification head of the network so that we can add a custom head and fine-tune it for our task.
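To illustrate the idea with a made-up list of layer names (not real modules): slicing at the cut index keeps everything up to, but not including, the head.

```python
# Hypothetical layer sequence of a ResNet-style network.
layers = ['conv1', 'bn1', 'relu', 'maxpool',
          'layer1', 'layer2', 'layer3', 'layer4',
          'avgpool', 'fc']

cut = -2                   # strip the pooling layer and the final classifier
backbone = layers[:cut]    # kept and fine-tuned
head = layers[cut:]        # replaced by a task-specific custom head

print(backbone[-1], head)  # → layer4 ['avgpool', 'fc']
```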
The split function is used when a discriminative learning rate scheme is applied, so that different layer groups of the model are trained with different learning rates.
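As a rough sketch in plain Python (not fastai's implementation): the split function partitions the parameters into groups, and the optimizer then assigns each group its own learning rate, typically smaller for the earlier layers.

```python
# Hypothetical parameter names, split into backbone and head groups.
def split_params(param_names, cut):
    """Partition parameters into two groups at the given cut index."""
    return [param_names[:cut], param_names[cut:]]

params = ['conv1.weight', 'layer1.weight', 'layer4.weight', 'fc.weight']
groups = split_params(params, -1)

# Discriminative learning rates: earlier groups train with smaller rates.
lrs = [1e-4, 1e-2]
for group, lr in zip(groups, lrs):
    print(lr, group)
```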
The stats refer to the per-channel means and standard deviations of the images in the ImageNet dataset, on which the model is pretrained.
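For ImageNet these are mean ≈ (0.485, 0.456, 0.406) and std ≈ (0.229, 0.224, 0.225) per RGB channel; normalization subtracts the mean and divides by the std, channel-wise:

```python
# ImageNet per-channel statistics (RGB), as used for pretrained models.
imagenet_mean = (0.485, 0.456, 0.406)
imagenet_std = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one RGB pixel (values in [0, 1]) with ImageNet stats."""
    return tuple((c - m) / s for c, m, s in zip(rgb, imagenet_mean, imagenet_std))

# A pixel equal to the channel means normalizes to zero in every channel.
print(normalize_pixel((0.485, 0.456, 0.406)))  # → (0.0, 0.0, 0.0)
```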
There are two alternative ways to use a custom model not present in the model_meta registry:

1. Create a new helper function similar to cnn_learner that splits the network into backbone and head. Check out Zachary Mueller’s awesome blog post to see how it’s done.
2. Register the architecture in model_meta and use cnn_learner.
We will cover the second option in this post.
Let’s first inspect an architecture that is already registered, e.g. resnet18.
Similarly, we need to determine the cut index for our custom model. Let’s try the EfficientNetB0 architecture available in the torchvision library. First, we inspect the network’s layers to find out where to split it into backbone and head.
```python
from torchvision.models import efficientnet_b0

m = efficientnet_b0()
pprint_model(m)
```
As can be seen, the pooling layer is at index -2, which is therefore the cut value. We’ll use default_split for the split function, and ImageNet stats since the model is pretrained on ImageNet.
Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-3dd342df.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-3dd342df.pth
Let’s verify that the body and head are created correctly.