MS COCO Dataset - かすブログ

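While looking for images to use for deep learning training and validation, I found that the MS COCO (Microsoft) dataset and API are well organized and look very easy to use.
The amount of data also looks ample.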

It can be used for object detection, but it is not meant for uses such as face recognition.
(There is no face data.)

APIs for Python and MATLAB are published.
Since I am used to Python, I used the Python API to check that the MS COCO API works.

Environment

macOS Mojave 10.14

python 3.6.5

Setup

Download the MS COCO API Git repository

https://github.com/cocodataset/cocoapi.git

The images and annotations are needed separately, so download them from the official site

COCO - Common Objects in Context

- 2014 val images
- 2014 Train/Val Annotations

Place each of these directly under the API directory downloaded from Git (at the same level as PythonAPI, README.txt, etc.), using the directory names
images
annotations
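As a quick sanity check on the layout described above, the following is a minimal sketch that lists which of the expected paths are missing. The function name missing_paths and the default val2014 are assumptions made for illustration; the annotation file name follows the instances_{dataType}.json pattern that the demo uses.

```python
from pathlib import Path


def missing_paths(api_root, data_type="val2014"):
    """Return the expected dataset paths that do not exist under api_root."""
    root = Path(api_root)
    expected = [
        root / "images",
        root / "annotations",
        root / "annotations" / f"instances_{data_type}.json",
    ]
    return [str(p) for p in expected if not p.exists()]
```

Calling missing_paths("..") from the PythonAPI directory should return an empty list once both downloads are in place.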

Install the required packages with pip

pip install cython
pip install setuptools
pip install stringio
pip install scikit-image
pip install jupyter

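Without cython and setuptools, the make step for the coco api apparently fails with errors.
scikit-image and jupyter are needed to display the MS COCO examples.
Note: other packages may be needed as well.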
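To see which packages are still missing before running make, a small check along the following lines may help. It is only a sketch: missing_modules is a hypothetical helper, and the import names (Cython, skimage) differ from the pip package names. matplotlib and numpy are included because the demo notebook imports them.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names, not pip names: cython -> Cython, scikit-image -> skimage.
REQUIRED = ["Cython", "setuptools", "skimage", "matplotlib", "numpy"]
```

Running print(missing_modules(REQUIRED)) shows what still needs to be installed; an empty list means all of them were found.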

coco api make

Run make directly under coco/PythonAPI

[coco/PythonAPI$]make

Jupyter notebook

Run the MS COCO demo ipynb in a Jupyter notebook.

[coco/PythonAPI$]jupyter notebook

Following the demo as it is lets you check how the API behaves from start to finish.

pycocoDemo

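Running jupyter notebook opens a page like the one below in the browser.

[Figure: Jupyter notebook running]

Selecting pycocoDemo shows several code blocks with their execution results already displayed.
Pressing the run button at the top of the screen for each block in turn and checking what happens gives you a general idea of how to use the API.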

pycocoDemo screen

%matplotlib inline
from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

dataDir='..'
dataType='val2014'
annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType)

dataDir is set to the parent directory, and dataType to the downloaded val2014.

# initialize COCO api for instance annotations
coco=COCO(annFile)
# display COCO categories and supercategories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))

This displays the list of categories. cat stands for category.
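The demo prints categories and supercategories as two separate flat lists. If it helps to see which category belongs to which supercategory, the records returned by loadCats can be grouped as in the sketch below. group_by_supercategory is a hypothetical helper, and the sample list is hand-written in the same shape (id, name, supercategory) rather than loaded from the dataset.

```python
from collections import defaultdict


def group_by_supercategory(cats):
    """Map each supercategory to the sorted names of its categories."""
    groups = defaultdict(list)
    for cat in cats:
        groups[cat["supercategory"]].append(cat["name"])
    return {sup: sorted(names) for sup, names in groups.items()}


# Hand-written sample in the shape that coco.loadCats(...) returns.
sample = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 18, "name": "dog", "supercategory": "animal"},
    {"id": 17, "name": "cat", "supercategory": "animal"},
]
```

In the notebook, group_by_supercategory(cats) can be called directly on the cats list from the block above.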

# get all images containing given categories, select one at random
catIds = coco.getCatIds(catNms=['person','dog','skateboard']);
imgIds = coco.getImgIds(catIds=catIds );
imgIds = coco.getImgIds(imgIds = [324158])
img = coco.loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]

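This picks an image containing a person, a dog, and a skateboard.
Note that the third line narrows imgIds to the single image 324158, so as written the same image is always chosen; remove that line to pick at random from all matches.
Changing the categories in catNms lets you search for all sorts of things.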
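The search can be wrapped in a small helper, sketched below under a few assumptions. pick_random_image is a hypothetical name; it relies only on the getCatIds, getImgIds, and loadImgs calls used in the demo. getImgIds returns images that contain all of the given categories, and with an empty catIds list it returns every image, which is why the helper checks that every requested name was found.

```python
import random


def pick_random_image(coco, category_names, seed=None):
    """Pick one image record containing all given categories, or None."""
    cat_ids = coco.getCatIds(catNms=category_names)
    # An unknown name is silently dropped by getCatIds, so check the count.
    if len(cat_ids) != len(category_names):
        return None
    img_ids = coco.getImgIds(catIds=cat_ids)
    if not img_ids:
        return None
    rng = random.Random(seed)
    return coco.loadImgs(rng.choice(img_ids))[0]
```

For example, img = pick_random_image(coco, ['person', 'dog', 'skateboard']) gives a record with the same keys (id, file_name, coco_url) as the demo's img.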

# load and display image
# I = io.imread('%s/images/%s/%s'%(dataDir,dataType,img['file_name']))
# use url to load image
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()

Printing img['coco_url'] gave
http://images.cocodataset.org/val2014/COCO_val2014_000000324158.jpg
so it holds a cocodataset URL. To use a local image file instead,

I = io.imread(path + img['file_name'])

you can specify path yourself.
plt.axis('off') only turns off the axis display.
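Switching between the local copy and the URL can be automated with something like the following sketch. image_source is a hypothetical helper and image_dir is whatever directory you placed the images in; the file_name and coco_url keys are the ones shown in the image record above.

```python
import os


def image_source(img, image_dir):
    """Return the local file path if it exists, otherwise the record's coco_url."""
    local_path = os.path.join(image_dir, img["file_name"])
    return local_path if os.path.isfile(local_path) else img["coco_url"]
```

The result can be passed straight to io.imread, for example I = io.imread(image_source(img, '../images')), since the demo already reads from both a path and a URL with that function.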

# load and display instance annotations
plt.imshow(I); plt.axis('off')
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)
coco.showAnns(anns)

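ann stands for annotation; this marks the regions of the person, the dog, and the skateboard.
Each region is stored under the segmentation key as a polygon made of multiple 2D points.
coco.showAnns(anns) marks these segmentation regions, but
separately from segmentation you can also get bbox, a rectangle enclosing the whole object.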

ann["bbox"]

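Once an ann is loaded, the box is available under the bbox key as [x, y, width, height], where x, y is the top-left corner.
The values are floats, but converting them with int() and cropping an ROI with OpenCV gave plausible-looking regions,
so perhaps the decimals simply arose somewhere during processing?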
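Truncating with int() can shave up to a pixel off the right and bottom edges. One alternative, shown here only as a sketch and not what this post uses, is to floor the top-left corner, ceil the bottom-right corner, and clamp to the image size. bbox_to_bounds is a hypothetical helper name.

```python
import math


def bbox_to_bounds(bbox, width, height):
    """Convert a COCO bbox [x, y, w, h] to integer (x0, y0, x1, y1) inside the image."""
    x, y, w, h = bbox
    # Floor the top-left and ceil the bottom-right so the box is not shrunk,
    # then clamp so the bounds never leave the image.
    x0 = max(0, math.floor(x))
    y0 = max(0, math.floor(y))
    x1 = min(width, math.ceil(x + w))
    y1 = min(height, math.ceil(y + h))
    return x0, y0, x1, y1
```

With an image array of shape (height, width, channels), the crop is then image[y0:y1, x0:x1].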

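import os
import cv2
# plt.show()
# Directory holding the downloaded images (adjust to where you placed them)
imgDir = os.path.join(dataDir, 'images', dataType)
# Use a new name so the img record from loadImgs is not overwritten
image = cv2.imread(os.path.join(imgDir, img['file_name']))
cv2.imwrite("test_image.png", image)
# Crop each annotation's bbox (x, y, width, height) and save it as its own file
for i, ann in enumerate(anns):
    x, y, w, h = ann["bbox"]
    roi = image[int(y):int(y+h), int(x):int(x+w)]
    cv2.imwrite("test_image_" + str(i) + ".png", roi)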
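For comparison with the rectangular crop, the outline that showAnns draws comes from ann['segmentation']. For non-crowd objects this is a list of polygons, each a flat list [x1, y1, x2, y2, ...]; crowd regions use run-length encoding (a dict) instead. The sketch below unpacks the polygon form into (x, y) pairs. polygon_points is a hypothetical helper, not part of pycocotools.

```python
def polygon_points(segmentation):
    """Convert COCO polygon segmentation into lists of (x, y) tuples."""
    if not isinstance(segmentation, list):
        # Crowd regions are stored as run-length encoding, not polygons.
        raise ValueError("segmentation is not in polygon format")
    # Even indices are x coordinates, odd indices are y coordinates.
    return [list(zip(poly[0::2], poly[1::2])) for poly in segmentation]
```

In the notebook, polygon_points(ann['segmentation']) returns one list of points per polygon of that annotation.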

[Figure: original image]

[Figure: bbox regions cropped out as ROIs]
