Fine-tuning TorchVision Models

This post walks through how to fine-tune a model in PyTorch.

You can pick any of the torchvision models; all of them are pre-trained on the 1000-class ImageNet dataset.

The reference tutorial linked at the end of this post covers two types of transfer learning: fine-tuning and feature extraction.

1) Fine-tuning: start from a pre-trained model and update all of the model's parameters for the new task. In essence, the whole model is retrained.

2) Feature extraction: start from a pre-trained model and update only the weights of the final layer, the one that produces the predictions.

(It is called feature extraction because the pre-trained CNN is used as a fixed feature extractor and only the final layer is changed.)

Both approaches follow the same steps:

- Initialize the pretrained model
- Reshape the final layer(s) to have the same number of outputs as the number of classes in the new dataset
- Define for the optimization algorithm which parameters we want to update during training
- Run the training step

Import libraries

from __future__ import print_function
from __future__ import division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

Output

PyTorch Version:  1.6.0+cu101
Torchvision Version:  0.7.0+cu101

Inputs

- These are all the parameters you can change for a run.

- The example uses the hymenoptera_data dataset, which can be downloaded from the link below.

download.pytorch.org/tutorial/hymenoptera_data.zip

The dataset contains two classes, bees and ants, and is laid out so that it can be loaded with the ImageFolder dataset class.

Download the data and set data_dir to the root directory of the dataset.
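ImageFolder expects one subdirectory per class inside each split directory and assigns label indices from the sorted subdirectory names. The sketch below only mimics that convention with the standard library (it does not call torchvision); the temporary directory, the helper list_classes, and the file name example.jpg are made up for illustration.

```python
import os
import tempfile


def list_classes(split_dir):
    """Return {class_name: index} from sorted subdirectory names (ImageFolder-style)."""
    names = sorted(
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )
    return {name: idx for idx, name in enumerate(names)}


# Build a throwaway tree shaped like hymenoptera_data/{train,val}/{ants,bees}/
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("bees", "ants"):
        os.makedirs(os.path.join(root, split, cls))
        # Empty placeholder file standing in for a real image (hypothetical name)
        open(os.path.join(root, split, cls, "example.jpg"), "w").close()

class_to_idx = list_classes(os.path.join(root, "train"))
print(class_to_idx)
```

With this layout, data_dir would point at the directory that contains train and val.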

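model_name is the name of the model to use. The architectures come from torchvision.models (link below); the value itself must be one of the names that initialize_model handles: resnet, alexnet, vgg, squeezenet, densenet, inception.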

pytorch.org/docs/stable/torchvision/models.html

num_classes : number of classes in the dataset
batch_size : batch size used for training
num_epochs : number of training epochs to run
feature_extract (boolean) : False for fine-tuning, True for feature extraction

# Top level data directory. Here we assume the format of the directory conforms
#   to the ImageFolder structure
data_dir = "./data/hymenoptera_data"

# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "squeezenet"

# Number of classes in the dataset
num_classes = 2

# Batch size for training (change depending on how much memory you have)
batch_size = 8

# Number of epochs to train for
num_epochs = 15

# Flag for feature extracting. When False, we finetune the whole model,
#   when True we only update the reshaped layer params
feature_extract = True

Model Training and Validation Code

train_model function: handles training and validation for a given model.

Inputs

1) PyTorch model

2) dataloaders dictionary

3) loss function

4) optimizer

5) number of epochs: the function trains for the specified number of epochs and runs a full validation step after each epoch.

6) boolean flag indicating whether the model is an Inception model

(The is_inception flag accommodates the Inception v3 model: that architecture has an auxiliary output, and the overall training loss takes both the auxiliary output and the final output into account.)

The function also keeps track of the best-performing model (in terms of validation accuracy) and returns it when training ends.

Training and validation accuracies are printed after each epoch.
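One detail worth spelling out is why the best weights are saved with copy.deepcopy. The tensors returned by state_dict() share storage with the live model, so without a copy the "best" snapshot would silently follow later training updates. A minimal sketch with a toy nn.Linear model (the in-place add_ only simulates a training update):

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(2, 1)

live_view = model.state_dict()                # tensors share storage with the model
snapshot = copy.deepcopy(model.state_dict())  # independent copy of the weights

# Simulate a later training update by changing the weights in place
with torch.no_grad():
    model.weight.add_(1.0)

view_follows_model = torch.equal(live_view["weight"], model.weight)
snapshot_is_stale = not torch.equal(snapshot["weight"], model.weight)
print(view_follows_model, snapshot_is_stale)
```

The deep-copied snapshot keeps the old values, which is what load_state_dict needs at the end of training.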

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
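The statistics lines multiply loss.item() by inputs.size(0) because nn.CrossEntropyLoss returns the mean loss over a batch by default. Weighting each batch by its size and dividing by the dataset size gives a per-sample average even when the last batch is smaller. A plain-Python sketch of that arithmetic, with made-up loss values and a hypothetical helper epoch_average:

```python
def epoch_average(batch_losses, batch_sizes):
    """Per-sample average loss computed from per-batch mean losses."""
    running_loss = 0.0
    for loss, size in zip(batch_losses, batch_sizes):
        running_loss += loss * size  # undo the per-batch mean
    return running_loss / sum(batch_sizes)


# 20 samples with batch_size = 8 gives batches of 8, 8, and 4 (made-up losses)
batch_losses = [0.50, 0.40, 1.00]
batch_sizes = [8, 8, 4]

weighted = epoch_average(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)  # over-weights the small last batch
print(round(weighted, 4), round(naive, 4))
```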

Set Model Parameters’ .requires_grad attribute

For feature extraction, set the .requires_grad attribute of the model's parameters to False.

When training from scratch or fine-tuning, leave .requires_grad set to True.

In feature extraction only the newly initialized layer needs gradients, so there is no need to compute gradients for every parameter.

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
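To see the effect of the flag, the following sketch freezes a toy model, swaps in a new head, and checks which parameters receive gradients after backward(). The nn.Sequential model is a stand-in for a pretrained network, not one of the torchvision models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pretrained network: a "backbone" layer plus a 1000-way head
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1000))

# Same idea as set_parameter_requires_grad(model, feature_extracting=True)
for param in model.parameters():
    param.requires_grad = False

# A newly constructed layer has requires_grad=True by default
model[2] = nn.Linear(8, 2)

loss = model(torch.randn(5, 4)).sum()
loss.backward()

backbone_has_grad = model[0].weight.grad is not None
head_has_grad = model[2].weight.grad is not None
print(backbone_has_grad, head_has_grad)
```

Only the new head ends up with a populated .grad; the frozen layer's .grad stays None.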

Initialize and Reshape the Networks

To reuse an existing model for your own task, the network has to be reshaped.

Every model pre-trained on ImageNet has an output layer of size 1000 (one node per class).

So the final layer must be changed to have as many outputs as there are classes in your dataset.

(output 1000 -> number of classes in the dataset)

For feature extraction, only the parameters of the final layer, that is, the layer(s) being reshaped, are updated. There is no need to compute gradients for the layers that stay unchanged, so for efficiency their .requires_grad attribute is set to False.

(The default value of .requires_grad is True.)

When the new layer is initialized, its parameters have .requires_grad=True by default, so only the new layer's parameters are updated.

For fine-tuning, all .requires_grad attributes can be left at their default value of True.

Pattern for replacing a layer of a pre-trained model with a new one

Resnet

(fc): Linear(in_features=512, out_features=1000, bias=True)

The fc layer above has 512 input features and 1000 output features.

To replace it with a new layer, define it as follows.

model.fc = nn.Linear(512, num_classes)

Here model is the whole pre-trained model; we access its fc attribute and assign a new nn.Linear module to it.

Alexnet

(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
 )

In AlexNet, the final layer sits at index 6 of the classifier block.

So index into the classifier block and define the new layer there.

model.classifier[6] = nn.Linear(4096,num_classes)

As the examples above show, many models have a similar output structure, but each one has to be handled slightly differently.

After reshaping, print the model architecture and check that the number of output features equals the number of classes in the dataset.
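Besides reading the printed architecture, a dummy forward pass is a quick way to confirm the new output size. The sketch below uses TinyNet, a hypothetical stand-in for a pretrained model whose head is called fc, and reads in_features from the existing layer instead of hard-coding it.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained model whose head is called `fc`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, 1000)

    def forward(self, x):
        x = self.features(x)
        return self.fc(torch.flatten(x, 1))


num_classes = 2
model = TinyNet()

# Read the input width from the existing head instead of hard-coding it
model.fc = nn.Linear(model.fc.in_features, num_classes)

model.eval()
with torch.no_grad():
    out = model(torch.randn(4, 3, 224, 224))  # batch of 4 dummy RGB images
print(out.shape)
```

The second dimension of the output should equal num_classes.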

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        """ Resnet18
        """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG11_bn
        """
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxilary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299

    else:
        print("Invalid model name, exiting...")
        exit()

    return model_ft, input_size

# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Print the model we just instantiated
print(model_ft)

To experiment with densenet161, one of the pre-trained models, I redefined initialize_model as follows.

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "densenet161":
        """ Densenet
        """
        model_ft = models.densenet161(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    else:
        print("Invalid model name, exiting...")
        exit()

    return model_ft, input_size

Here is the full code up to this point.

"""
2021-01-30 Densenet161 fine-tuning attempt
"""

from __future__ import print_function
from __future__ import division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

# Top level data directory. Here we assume the format of the directory conforms
#   to the ImageFolder structure
data_dir = "./data/hymenoptera_data"

# Model to use (the initialize_model defined below only handles densenet161)
model_name = "densenet161"

# Number of classes in the dataset
num_classes = 2

# Batch size for training (change depending on how much memory you have)
batch_size = 8

# Number of epochs to train for
num_epochs = 15

# Flag for feature extracting. When False, we finetune the whole model,
#   when True we only update the reshaped layer params
feature_extract = True


def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "densenet161":
        """ Densenet
        """
        model_ft = models.densenet161(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    else:
        print("Invalid model name, exiting...")
        exit()

    return model_ft, input_size

# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Print the model we just instantiated
print(model_ft)

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 96, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
    ..............
    ..............
    ..............
    (norm5): BatchNorm2d(2208, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (classifier): Linear(in_features=2208, out_features=2, bias=True)
)

As shown above, the final out_features has been changed to 2, the number of classes.

Load Data

Once the input size is known, we can initialize the data transforms, image datasets, and dataloaders.

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
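transforms.ToTensor() scales pixel values to the range [0, 1], and transforms.Normalize then applies (value - mean) / std per channel using the ImageNet statistics from the transforms above. A plain-Python sketch of that arithmetic for single pixels (normalize_pixel is a hypothetical helper, not a torchvision function):

```python
# Per-channel ImageNet statistics used in the transforms
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


white = normalize_pixel([1.0, 1.0, 1.0])
black = normalize_pixel([0.0, 0.0, 0.0])
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

Both training and validation inputs go through the same normalization, since the pre-trained weights were fitted to inputs in this range.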

Create the Optimizer

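The final step is to create an optimizer that updates only the desired parameters.

Recall that after loading the pretrained model, and before reshaping it, we manually set every parameter's .requires_grad attribute to False when feature_extract = True (the frozen layers are not trained; only the newly defined layer is).

The parameters of the re-initialized layer then have .requires_grad = True by default.

So every parameter with .requires_grad = True is a parameter that should be optimized.

Next, we collect those parameters in a list and pass the list to the SGD constructor.

The parameters to be updated can be printed at this point.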

# Send the model to GPU
model_ft = model_ft.to(device)

# Gather the parameters to be optimized/updated in this run. If we are
#  finetuning we will be updating all parameters. However, if we are
#  doing feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  is True.
params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = [] # build the list of parameters to update
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)

# Optimize only the parameters collected in params_to_update
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9) # pass in only the parameters to update

Params to learn:
	 classifier.weight
	 classifier.bias
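As a sanity check, one optimizer step should change the new head and leave the frozen layers untouched. The sketch below shows this on a toy nn.Sequential model, using a list comprehension as a compact equivalent of the loop that builds params_to_update. The model, the random data, and the larger learning rate (0.1, chosen only to make the change easy to see) are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy model: frozen "backbone" layer followed by a fresh 2-class head
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for param in model[0].parameters():
    param.requires_grad = False

# Compact equivalent of the loop that builds params_to_update
params_to_update = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(params_to_update, lr=0.1, momentum=0.9)

backbone_before = model[0].weight.detach().clone()
head_before = model[2].weight.detach().clone()

inputs = torch.randn(6, 4)
labels = torch.tensor([0, 1, 0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(inputs), labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone_before, model[0].weight)
head_changed = not torch.equal(head_before, model[2].weight)
print(len(params_to_update), backbone_unchanged, head_changed)
```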

Run Training and Validation Step

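The last step is to set up the loss for the model and then run the training and validation function for the configured number of epochs.

The default learning rate is not optimal for every model, so to reach the best accuracy it should be tuned for each model separately.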

# Setup the loss fxn
criterion = nn.CrossEntropyLoss()

# Train and evaluate
model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))

If you get an error like the one below,

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable

import freeze_support from the multiprocessing library as shown below, and put all of the executable statements inside if __name__ == '__main__'.

from multiprocessing import Process, freeze_support

......
......
.....

if __name__ == '__main__':
    freeze_support()
    # Initialize the model for this run
    model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)
    
    # Print the model we just instantiated
    print(model_ft)
    .......    
    .......    
    .......
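The error typically appears when DataLoader worker processes (num_workers > 0) are started on a platform that spawns new processes, such as Windows: each worker re-imports the main script, so any top-level code that starts workers runs again. A minimal sketch of the guarded layout, assuming a made-up TensorDataset in place of the image folders and a hypothetical main function:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def main(num_workers=0):
    # Made-up tensors standing in for the image dataset
    dataset = TensorDataset(torch.randn(20, 3), torch.randint(0, 2, (20,)))
    loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=num_workers)
    # Worker processes (if any) are started when iteration begins
    return sum(1 for _ in loader)


if __name__ == '__main__':
    # With num_workers > 0 on platforms that spawn worker processes,
    # the loader must be created and iterated from inside this guard.
    print(main(num_workers=0))
```

Setting num_workers=0 loads data in the main process and avoids worker processes altogether, which can be a quick workaround while debugging.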

pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
