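Object detection is a very popular task in computer vision: given an image, you predict the bounding boxes (usually rectangular) of the objects in it and identify the type of each object. An image may contain multiple objects, and a range of advanced techniques and frameworks exist to solve this problem, such as Faster-RCNN and YOLOv3.
This article covers the case where the image contains only one object of interest. The focus is on how to read images and their bounding boxes, resize them, and perform augmentations correctly, rather than on the model itself. The goal is to get a good grasp of the fundamental ideas behind object detection, which you can then extend to better understand more complex techniques.
All the code in this article is available at: https://jovian.ai/aakanksha-ns/road-signs-bounding-box-prediction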
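Given an image containing a road sign, predict the bounding box around the sign and identify the type of sign. The signs fall into four classes: speed limit, stop, crosswalk, and traffic light.
This is a multi-task learning problem because it involves two tasks: 1) regression to find the bounding box coordinates, and 2) classification to identify the type of road sign.
I used the Road Sign Detection dataset from Kaggle: https://www.kaggle.com/andrewmvd/road-sign-detection
It consists of 877 images. The dataset is quite imbalanced, with most images belonging to the speed limit class, but since we are more focused on bounding box prediction, we can ignore the imbalance.
The annotations for each image are stored in a separate XML file. I built the training DataFrame from them as follows: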
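import os
import random
import xml.etree.ElementTree as ET
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, DataLoader
from torchvision import models

# Assumed dataset layout (not stated in the article); adjust to your download.
images_path = Path('./road_signs/images')
anno_path = Path('./road_signs/annotations')

def filelist(root, file_type):
    """Returns a fully-qualified list of filenames under root directory"""
    return [os.path.join(directory_path, f)
            for directory_path, directory_name, files in os.walk(root)
            for f in files if f.endswith(file_type)]

def generate_train_df(anno_path):
    annotations = filelist(anno_path, '.xml')
    anno_list = []
    for anno_file in annotations:
        root = ET.parse(anno_file).getroot()
        anno = {}
        anno['filename'] = Path(str(images_path) + '/' + root.find("./filename").text)
        anno['width'] = root.find("./size/width").text
        anno['height'] = root.find("./size/height").text
        anno['class'] = root.find("./object/name").text
        anno['xmin'] = int(root.find("./object/bndbox/xmin").text)
        anno['ymin'] = int(root.find("./object/bndbox/ymin").text)
        anno['xmax'] = int(root.find("./object/bndbox/xmax").text)
        anno['ymax'] = int(root.find("./object/bndbox/ymax").text)
        anno_list.append(anno)
    return pd.DataFrame(anno_list)

df_train = generate_train_df(anno_path)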
# label encode target
class_dict = {'speedlimit': 0, 'stop': 1, 'crosswalk': 2, 'trafficlight': 3}
df_train['class'] = df_train['class'].apply(lambda x: class_dict[x])
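To make the annotation format concrete, here is a minimal sketch that parses a single annotation from a string. The XML below is a made-up example that only mirrors the element paths read by generate_train_df; the file name and all numbers are hypothetical and not taken from the dataset, and parse_annotation is an illustrative helper rather than part of the original code.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation; the values are illustrative only.
SAMPLE_XML = """
<annotation>
    <filename>example.png</filename>
    <size><width>400</width><height>300</height></size>
    <object>
        <name>stop</name>
        <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>200</xmax><ymax>160</ymax></bndbox>
    </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Parse one annotation string into a flat dict (first object only)."""
    root = ET.fromstring(xml_text)
    anno = {
        "filename": root.find("./filename").text,
        "width": int(root.find("./size/width").text),
        "height": int(root.find("./size/height").text),
        "class": root.find("./object/name").text,
    }
    # The four box corners live under object/bndbox.
    for key in ("xmin", "ymin", "xmax", "ymax"):
        anno[key] = int(root.find(f"./object/bndbox/{key}").text)
    return anno

anno = parse_annotation(SAMPLE_XML)
print(anno)
```

Like the article's code, this reads only the first object element, which matches the single-object setting used here.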
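Since training a computer vision model requires all images to be the same size, we need to resize our images along with their corresponding bounding boxes. Resizing an image is straightforward, but resizing a bounding box is a little trickier because each box is defined relative to its image and that image's dimensions.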
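Here is how resizing a bounding box works:
Convert the bounding box into a mask of the same size as the image, resize that mask to the required dimensions, and then extract the bounding box coordinates from the resized mask.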
def create_mask(bb, x):
    """Creates a mask for the bounding box of same shape as image"""
    rows, cols, *_ = x.shape
    Y = np.zeros((rows, cols))
    bb = bb.astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
    Y[bb[0]:bb[2], bb[1]:bb[3]] = 1.
    return Y

def mask_to_bb(Y):
    """Convert mask Y to a bounding box, assumes 0 as background nonzero object"""
    rows, cols = np.nonzero(Y)  # np.nonzero returns row indices first, then columns
    if len(rows) == 0:
        return np.zeros(4, dtype=np.float32)
    top_row = np.min(rows)
    left_col = np.min(cols)
    bottom_row = np.max(rows)
    right_col = np.max(cols)
    return np.array([top_row, left_col, bottom_row, right_col], dtype=np.float32)

def create_bb_array(x):
    """Generates bounding box array [ymin, xmin, ymax, xmax] from a train_df row"""
    return np.array([x[5], x[4], x[7], x[6]])
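def read_image(path):
    """Reads an image as RGB (helper not shown in the article; RGB is assumed
    because the image is converted with COLOR_RGB2BGR before writing)"""
    return cv2.cvtColor(cv2.imread(str(path)), cv2.COLOR_BGR2RGB)

def resize_image_bb(read_path, write_path, bb, sz):
    """Resize an image and its bounding box and write image to new path"""
    im = read_image(read_path)
    im_resized = cv2.resize(im, (int(1.49*sz), sz))
    Y_resized = cv2.resize(create_mask(bb, im), (int(1.49*sz), sz))
    new_path = str(write_path/read_path.parts[-1])
    cv2.imwrite(new_path, cv2.cvtColor(im_resized, cv2.COLOR_RGB2BGR))
    return new_path, mask_to_bb(Y_resized)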
# Populating Training DF with new paths and bounding boxes
new_paths = []
new_bbs = []
train_path_resized = Path('./road_signs/images_resized')
# cv2.imwrite does not create missing directories, so make sure the target exists
train_path_resized.mkdir(parents=True, exist_ok=True)
for index, row in df_train.iterrows():
    new_path, new_bb = resize_image_bb(row['filename'], train_path_resized,
                                       create_bb_array(row.values), 300)
    new_paths.append(new_path)
    new_bbs.append(new_bb)
df_train['new_path'] = new_paths
df_train['new_bb'] = new_bbs
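The mask-based resizing described above can be traced on a toy example. The sketch below is a simplified stand-in, not the article's implementation: it uses plain Python lists instead of NumPy arrays, and a hand-written nearest-neighbour resize (resize_nearest, a hypothetical helper) instead of cv2.resize, which interpolates differently by default. The image size and box are made up.

```python
def create_mask(bb, rows, cols):
    """Binary mask (list of lists) with 1 inside bb = [ymin, xmin, ymax, xmax]."""
    ymin, xmin, ymax, xmax = bb
    return [[1 if ymin <= r < ymax and xmin <= c < xmax else 0
             for c in range(cols)] for r in range(rows)]

def resize_nearest(mask, new_rows, new_cols):
    """Nearest-neighbour resize: each output pixel copies one source pixel."""
    rows, cols = len(mask), len(mask[0])
    return [[mask[r * rows // new_rows][c * cols // new_cols]
             for c in range(new_cols)] for r in range(new_rows)]

def mask_to_bb(mask):
    """Smallest box [ymin, xmin, ymax, xmax] containing all nonzero pixels."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not hits:
        return [0, 0, 0, 0]
    rs = [r for r, _ in hits]
    cs = [c for _, c in hits]
    return [min(rs), min(cs), max(rs), max(cs)]

# Hypothetical 8x12 image with one box, shrunk to half size in each direction.
bb = [2, 4, 6, 10]
mask = create_mask(bb, rows=8, cols=12)
small = resize_nearest(mask, new_rows=4, new_cols=6)
new_bb = mask_to_bb(small)
print(new_bb)  # [1, 2, 2, 4]
```

The box shrinks together with the image without any explicit coordinate arithmetic, which is the appeal of going through a mask: the same trick keeps working for rotations and crops.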
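Data augmentation is a technique that helps a model generalize better by creating new training images from variations of the existing ones. Our dataset has fewer than 900 images, so data augmentation is very important to keep the model from overfitting.
For this problem, I used flipping, rotation, center cropping, and random cropping.
The one thing to remember is that the bounding box must be transformed in exactly the same way as the image.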
# modified from fast.ai
def crop(im, r, c, target_r, target_c):
    return im[r:r+target_r, c:c+target_c]

# random crop to the original size
def random_crop(x, r_pix=8):
    """ Returns a random crop"""
    r, c, *_ = x.shape
    c_pix = round(r_pix*c/r)
    rand_r = random.uniform(0, 1)
    rand_c = random.uniform(0, 1)
    start_r = np.floor(2*rand_r*r_pix).astype(int)
    start_c = np.floor(2*rand_c*c_pix).astype(int)
    return crop(x, start_r, start_c, r-2*r_pix, c-2*c_pix)

def center_crop(x, r_pix=8):
    r, c, *_ = x.shape
    c_pix = round(r_pix*c/r)
    return crop(x, r_pix, c_pix, r-2*r_pix, c-2*c_pix)

def rotate_cv(im, deg, y=False, mode=cv2.BORDER_REFLECT, interpolation=cv2.INTER_AREA):
    """ Rotates an image by deg degrees"""
    r, c, *_ = im.shape
    M = cv2.getRotationMatrix2D((c/2, r/2), deg, 1)
    if y:
        return cv2.warpAffine(im, M, (c, r), borderMode=cv2.BORDER_CONSTANT)
    return cv2.warpAffine(im, M, (c, r), borderMode=mode,
                          flags=cv2.WARP_FILL_OUTLIERS+interpolation)

def random_cropXY(x, Y, r_pix=8):
    """ Returns a random crop"""
    r, c, *_ = x.shape
    c_pix = round(r_pix*c/r)
    rand_r = random.uniform(0, 1)
    rand_c = random.uniform(0, 1)
    start_r = np.floor(2*rand_r*r_pix).astype(int)
    start_c = np.floor(2*rand_c*c_pix).astype(int)
    xx = crop(x, start_r, start_c, r-2*r_pix, c-2*c_pix)
    YY = crop(Y, start_r, start_c, r-2*r_pix, c-2*c_pix)
    return xx, YY

def transformsXY(path, bb, transforms):
    x = cv2.imread(str(path)).astype(np.float32)
    x = cv2.cvtColor(x, cv2.COLOR_BGR2RGB)/255
    Y = create_mask(bb, x)
    if transforms:
        rdeg = (np.random.random()-.50)*20
        x = rotate_cv(x, rdeg)
        Y = rotate_cv(Y, rdeg, y=True)
        if np.random.random() > 0.5:
            x = np.fliplr(x).copy()
            Y = np.fliplr(Y).copy()
        x, Y = random_cropXY(x, Y)
    else:
        x, Y = center_crop(x), center_crop(Y)
    return x, mask_to_bb(Y)

def create_corner_rect(bb, color='red'):
    bb = np.array(bb, dtype=np.float32)
    return plt.Rectangle((bb[1], bb[0]), bb[3]-bb[1], bb[2]-bb[0],
                         color=color, fill=False, lw=3)

def show_corner_bb(im, bb):
    plt.imshow(im)
    plt.gca().add_patch(create_corner_rect(bb))
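To see why the box has to follow the image, consider a horizontal flip. The sketch below flips a tiny mask the same way the image would be flipped and reads the box back, then checks the result against the closed-form flip of the coordinates. It is only an illustration under assumed conventions: plain lists stand in for NumPy arrays, boxes are [ymin, xmin, ymax, xmax] with inclusive pixel indices, and all helper names and sizes are made up.

```python
def flip_lr(grid):
    """Mirror a 2-D list left to right (the list analogue of np.fliplr)."""
    return [row[::-1] for row in grid]

def flip_bb_lr(bb, cols):
    """Flip an inclusive box [ymin, xmin, ymax, xmax] inside an image of width cols."""
    ymin, xmin, ymax, xmax = bb
    return [ymin, cols - 1 - xmax, ymax, cols - 1 - xmin]

def bb_from_mask(mask):
    """Smallest inclusive box containing every nonzero pixel of the mask."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rs = [r for r, _ in hits]
    cs = [c for _, c in hits]
    return [min(rs), min(cs), max(rs), max(cs)]

# Hypothetical 6x10 image with a box near the left edge.
rows, cols = 6, 10
bb = [1, 2, 3, 4]
mask = [[1 if bb[0] <= r <= bb[2] and bb[1] <= c <= bb[3] else 0
         for c in range(cols)] for r in range(rows)]

flipped_from_mask = bb_from_mask(flip_lr(mask))
flipped_from_formula = flip_bb_lr(bb, cols)
print(flipped_from_mask, flipped_from_formula)  # [1, 5, 3, 7] [1, 5, 3, 7]
```

If only the image were flipped, the label would still point at columns 2 to 4 while the sign now sits at columns 5 to 7, so the model would be trained on a wrong target.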
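Now that the data augmentations are in place, we can do the train-validation split and create our PyTorch dataset. We normalize the images with ImageNet statistics because we are using a pretrained ResNet model, and we apply the augmentations inside the dataset class at training time.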
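# Features are the resized image paths and boxes; the target is the encoded class
X = df_train[['new_path', 'new_bb']]
Y = df_train['class']
X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.2, random_state=42)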
def normalize(im):
    """Normalizes images with Imagenet stats."""
    imagenet_stats = np.array([[0.485, 0.456, 0.406], [0.229, 0.224, 0.225]])
    return (im - imagenet_stats[0])/imagenet_stats[1]
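As a small worked example of what this normalization does to a single RGB pixel (values already scaled to the 0 to 1 range), the sketch below applies the same per-channel formula with plain tuples. The helper names normalize_pixel and denormalize_pixel are illustrative and not part of the original code; the inverse is included because it is handy when displaying a normalized image.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Per channel: subtract the ImageNet mean, then divide by the ImageNet std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def denormalize_pixel(rgb_norm):
    """Inverse of normalize_pixel."""
    return tuple(v * s + m for v, m, s in zip(rgb_norm, IMAGENET_MEAN, IMAGENET_STD))

# A pixel equal to the channel means maps to zero in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)
print(mean_pixel)  # (0.0, 0.0, 0.0)

# A pure white pixel ends up a couple of standard deviations above zero.
white = normalize_pixel((1.0, 1.0, 1.0))
print(white)
```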
class RoadDataset(Dataset):
    def __init__(self, paths, bb, y, transforms=False):
        self.transforms = transforms
        self.paths = paths.values
        self.bb = bb.values
        self.y = y.values

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        y_class = self.y[idx]
        x, y_bb = transformsXY(path, self.bb[idx], self.transforms)
        x = normalize(x)
        x = np.rollaxis(x, 2)  # HWC -> CHW, the layout PyTorch expects
        return x, y_class, y_bb

train_ds = RoadDataset(X_train['new_path'], X_train['new_bb'], y_train, transforms=True)
valid_ds = RoadDataset(X_val['new_path'], X_val['new_bb'], y_val)

batch_size = 64
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=batch_size)
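For the model, I used a very simple pretrained ResNet-34. Since we have two tasks to accomplish, there are two final layers: the bounding box regressor and the image classifier.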
class BB_model(nn.Module):
    def __init__(self):
        super(BB_model, self).__init__()
        # Newer torchvision releases deprecate pretrained=True in favor of
        # weights=models.ResNet34_Weights.DEFAULT
        resnet = models.resnet34(pretrained=True)
        layers = list(resnet.children())[:8]  # drop the average pool and fc head
        self.features1 = nn.Sequential(*layers[:6])
        self.features2 = nn.Sequential(*layers[6:])
        self.classifier = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 4))
        self.bb = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 4))

    def forward(self, x):
        x = self.features1(x)
        x = self.features2(x)
        x = F.relu(x)
        x = nn.AdaptiveAvgPool2d((1, 1))(x)
        x = x.view(x.shape[0], -1)
        return self.classifier(x), self.bb(x)
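For the loss, we need to account for both the classification loss and the bounding box regression loss, so we use a combination of cross-entropy and L1 loss (the sum of all absolute differences between the true and predicted coordinates). I divide the L1 loss by 1000 so that the classification and regression losses fall in a similar range. Apart from this, it is a standard PyTorch training loop (using a GPU):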
def update_optimizer(optimizer, lr):
    for i, param_group in enumerate(optimizer.param_groups):
        param_group["lr"] = lr
def train_epocs(model, optimizer, train_dl, val_dl, epochs=10, C=1000):
    idx = 0
    for i in range(epochs):
        model.train()
        total = 0
        sum_loss = 0
        for x, y_class, y_bb in train_dl:
            batch = y_class.shape[0]
            x = x.cuda().float()
            y_class = y_class.cuda()
            y_bb = y_bb.cuda().float()
            out_class, out_bb = model(x)
            loss_class = F.cross_entropy(out_class, y_class, reduction="sum")
            loss_bb = F.l1_loss(out_bb, y_bb, reduction="none").sum(1)
            loss_bb = loss_bb.sum()
            loss = loss_class + loss_bb/C
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            idx += 1
            total += batch
            sum_loss += loss.item()
        train_loss = sum_loss/total
        # use the val_dl argument rather than the global valid_dl
        val_loss, val_acc = val_metrics(model, val_dl, C)
        print("train_loss %.3f val_loss %.3f val_acc %.3f" % (train_loss, val_loss, val_acc))
    return sum_loss/total
def val_metrics(model, valid_dl, C=1000):
    model.eval()
    total = 0
    sum_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for x, y_class, y_bb in valid_dl:
            batch = y_class.shape[0]
            x = x.cuda().float()
            y_class = y_class.cuda()
            y_bb = y_bb.cuda().float()
            out_class, out_bb = model(x)
            loss_class = F.cross_entropy(out_class, y_class, reduction="sum")
            loss_bb = F.l1_loss(out_bb, y_bb, reduction="none").sum(1)
            loss_bb = loss_bb.sum()
            loss = loss_class + loss_bb/C
            _, pred = torch.max(out_class, 1)
            correct += pred.eq(y_class).sum().item()
            sum_loss += loss.item()
            total += batch
    return sum_loss/total, correct/total
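To make the combined loss used in the two functions above concrete, here is a small worked example for a single sample, written with the standard library only. It follows the same recipe (cross-entropy on the class logits plus the summed L1 box error divided by C), but it is a sketch rather than the PyTorch code path, and the logits and box coordinates are invented for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample from raw logits (stable log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def l1(pred_bb, true_bb):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred_bb, true_bb))

def combined_loss(logits, target, pred_bb, true_bb, C=1000):
    """Classification loss plus the box loss scaled down by C."""
    return cross_entropy(logits, target) + l1(pred_bb, true_bb) / C

# Hypothetical sample: an undecided classifier and a box that is off by 50 pixels in total.
logits = [0.0, 0.0, 0.0, 0.0]
pred_bb = [100, 150, 200, 250]
true_bb = [110, 140, 190, 270]

loss_class = cross_entropy(logits, target=1)   # log(4), about 1.386
loss_bb = l1(pred_bb, true_bb)                 # 10 + 10 + 10 + 20 = 50
loss = combined_loss(logits, 1, pred_bb, true_bb)
print(loss_class, loss_bb, loss)
```

Without the division by C, the raw pixel error of 50 would dwarf the classification term of about 1.4, and the optimizer would mostly ignore the class prediction.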
model = BB_model().cuda()
parameters = filter(lambda p: p.requires_grad, model.parameters())
optimizer = torch.optim.Adam(parameters, lr=0.006)

train_epocs(model, optimizer, train_dl, valid_dl, epochs=15)
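Now that training is done, we can pick a random image and test the model on it. Even though we had a fairly small number of training images, we end up with a pretty decent prediction on the test image.
It would be a fun exercise to take a real photo with your phone and test the model on it. Another interesting experiment would be to train the model without any data augmentation and compare the two models.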
# resizing test image
im = read_image('./road_signs/images_resized/road789.png')
im = cv2.resize(im, (int(1.49*300), 300))
cv2.imwrite('./road_signs/road_signs_test/road789.jpg', cv2.cvtColor(im, cv2.COLOR_RGB2BGR))

# test Dataset
test_ds = RoadDataset(pd.DataFrame([{'path': './road_signs/road_signs_test/road789.jpg'}])['path'],
                      pd.DataFrame([{'bb': np.array([0, 0, 0, 0])}])['bb'],
                      pd.DataFrame([{'y': [0]}])['y'])
x, y_class, y_bb = test_ds[0]

xx = torch.FloatTensor(x[None,])
xx.shape

# prediction
out_class, out_bb = model(xx.cuda())
out_class, out_bb
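The raw outputs are class logits and four box coordinates, so a final decoding step is needed to get a readable answer. The sketch below shows one way to do it on plain Python lists (for tensors, something like out_class[0].tolist() would produce such lists). The decode_prediction helper is hypothetical, and the numbers are invented inputs, not results from the trained model.

```python
CLASS_DICT = {'speedlimit': 0, 'stop': 1, 'crosswalk': 2, 'trafficlight': 3}
IDX_TO_CLASS = {idx: name for name, idx in CLASS_DICT.items()}

def decode_prediction(class_logits, bb):
    """Pick the highest-scoring class and round the box to whole pixels.

    bb is assumed to be [ymin, xmin, ymax, xmax], matching the training targets.
    """
    best = max(range(len(class_logits)), key=lambda i: class_logits[i])
    return IDX_TO_CLASS[best], [round(v) for v in bb]

# Hypothetical model outputs for one image (not real results).
label, box = decode_prediction([-1.2, 3.4, 0.1, -0.5], [58.7, 120.2, 131.9, 205.4])
print(label, box)  # stop [59, 120, 132, 205]
```

The decoded box can then be drawn over the test image with show_corner_bb to inspect the prediction visually.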
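Now that we have covered the fundamentals of object detection and implemented it from scratch, you can extend these ideas to the multi-object case and try out more complex models such as RCNN and YOLO!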