SSD(single shot detector) 코드 분석 0. (prior box)

딥러닝관련/Detection

SSD(single shot detector) 코드 분석 0. (prior box)

머리올리자 2022. 5. 17. 13:12

SSD에서는 ground truth box, predict box 이외에도 prior box를 미리 정의하여 학습에 사용

1. Feature map size

- [38, 19, 10, 5, 3, 1]

2. shrink range(stride)

- [8, 16, 32, 64, 100, 300]

3. box size(min, max)

- [30, 60, 111, 162, 213, 264] / [60, 111, 162, 213, 264, 315]

4. aspect_ratio

- [[2], [2, 3], [2, 3], [2, 3], [2], [2]]

실제 구현상은 어떻게 되어 있나 보자

# https://github.com/qfgaohao/pytorch-ssd        
    
    """Generate SSD Prior Boxes.

    It returns the center, height and width of the priors. The values are relative to the image size
    Args:
        specs: SSDSpecs about the shapes of sizes of prior boxes. i.e.
            specs = [
                SSDSpec(38, 8, SSDBoxSizes(30, 60), [2]),
                SSDSpec(19, 16, SSDBoxSizes(60, 111), [2, 3]),
                SSDSpec(10, 32, SSDBoxSizes(111, 162), [2, 3]),
                SSDSpec(5, 64, SSDBoxSizes(162, 213), [2, 3]),
                SSDSpec(3, 100, SSDBoxSizes(213, 264), [2]),
                SSDSpec(1, 300, SSDBoxSizes(264, 315), [2])
            ]
        image_size: image size.
        clamp: if true, clamp the values to make fall between [0.0, 1.0]
    Returns:
        priors (num_priors, 4): The prior boxes represented as [[center_x, center_y, w, h]]. All the values
            are relative to the image size.
    """


    for spec in specs:
        scale = image_size / spec.shrinkage
        for j, i in itertools.product(range(spec.feature_map_size), repeat=2):
            x_center = (i + 0.5) / scale # normalize
            y_center = (j + 0.5) / scale # normalize

            # SQUARE BOX : SMALL SIZE
            size = spec.box_sizes.min
            h = w = size / image_size # ORIGINAL BOX SIZE / IMAGE SCALE
            # feature map point, feature map point, relative box scale, relative box scale
            priors.append([x_center, y_center, w, h])

            # SQUARE BOX : BIG SIZE
            size = math.sqrt(spec.box_sizes.max * spec.box_sizes.min) # SQUARE_ROOT(BIG * SMALL)
            h = w = size / image_size
            priors.append([x_center, y_center, w, h])

            # CHANGE RATIO OF THE SMALL SIZED BOX
            size = spec.box_sizes.min
            h = w = size / image_size
            for ratio in spec.aspect_ratios:
                ratio = math.sqrt(ratio)
                priors.append([x_center, y_center, w * ratio, h / ratio]) # increase width, decrease height 
                priors.append([x_center, y_center, w / ratio, h * ratio]) # decrease width, increase height

38 x 38을 예시로 들면

scale = 300 / 8 = 37.5

j, i 가 0, 0 일 때

x_center = (0 + 0.5) / 37.5 (scale)

y_center = (0 + 0.5) / 37.5 (scale)

- 0.5를 더해주는 이유는 center 지점을 찾으려하는 이유로 생각되며

- 37.5로 나누는 이유는 normalize를 위함이라고 생각됨

square box(min)

미리정의한 minimum box 사이즈(30)를 가져와 image_size를 나눠 비율을 계산하고

h = w = 30 / 300 = 0.1

prior 리스트에 append 한다

([x_center, y_center, w, h]) -> ([0.013333333333333334, 0.013333333333333334, 0.1, 0.1])

square box(max)

미리정의한 maximum box 사이즈(60)를 가져오며 값을 새로 다시 정의한다.

size = math.sqrt(spec.box_sizes.max * spec.box_sizes.min)

size = root(60 * 30) = 42.426406871

h = w = 42.426406871 / 300 = 0.14142135

이를 또 append 한다

([x_center, y_center, w, h]) -> ([0.013333333333333334, 0.013333333333333334, 0.14142135, 0.14142135])

Rectangle Box(modified from the small box)

minimum box 사이즈(30)를 가져와 ratio에 따라 변형한다.

h = w = 30 / 300 = 0.1

38일 때의 aspect ratio는 [2]

1. aspect ratio에 root 장착 -> root(2) -> 1.4142

2-1) width 늘리고, height 줄이고 -> (x_center, y_center, w * ratio, h / ratio)

2-2) width 줄이고, height 늘리고 -> (x_center, y_center, w / ratio, h * ratio)

결국 feature map size 38 x 38에서는 4개의 box(square 2개, rectangle 2개) 가 생기며총 38 x 38 x 4 = 5776개의 box가 생겨나는 것이다.

center 좌표를 계산할 때는 실제 연산시 줄어들 사이즈만큼 normalize해서 계산하는 반면 (x_center = (i + 0.5) / 37.5 (scale))

box의 height width 구할 때는 pre-defined한 box size에서 pre-defined image size를 나눈다. (h=w=size / image_size)

한 눈에 들어올 수 있게 아래와 같이 정리해봄

여기서

center 좌표는 feature map 사이즈를 기반으로,

width, height는 input image 사이지를 기반으로

되어 있으며

모두 해당 사이즈로 나누어 0~1로 normalize되어 있다는 것이다.

저작자표시 (새창열림)