YOLO — ‘You only look once’ for Object Detection explained

Mar 26, 2017

While there are many implementations of YOLO across a plethora of frameworks, there isn't a single clear explanation of how it works. I will explain how it actually works and point to an implementation of it on the self-driving car vehicle detection dataset by Udacity. We will use pre-trained weights.

You can refer to the original paper here.

First, how is YOLO different from other object detectors? YOLO uses a single CNN for both classifying objects and localising them with bounding boxes. This is the architecture of YOLO:

In the end, you will get an output tensor of shape 7*7*30.

For every grid cell, you will get two bounding boxes, which make up the first 10 values of the 1*30 tensor (x, y, width, height and a confidence score for each box). The remaining 20 values correspond to the 20 classes. Each of them is a class score: the conditional probability that the object belongs to class i, given that an object is present.
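To make the layout concrete, here is a minimal NumPy sketch that splits a 7*7*30 tensor into its box and class parts. The random array is only a stand-in for a real network output, and the ordering inside each 30-value vector (two boxes of `[x, y, w, h, confidence]` followed by 20 class scores) is an assumption that follows the description above; a given implementation may store the values in a different order.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Stand-in for the network output (random values, for illustration only).
rng = np.random.default_rng(0)
output = rng.random((S, S, B * 5 + C))  # shape (7, 7, 30)

# Assumed layout per cell: box 1 (5 values), box 2 (5 values), 20 class scores.
boxes = output[..., : B * 5].reshape(S, S, B, 5)
class_probs = output[..., B * 5 :]

box_coords = boxes[..., :4]     # x, y, w, h    -> shape (7, 7, 2, 4)
box_confidence = boxes[..., 4]  # confidence    -> shape (7, 7, 2)

print(boxes.shape, class_probs.shape)
```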

Next, we multiply these class scores by each bounding box's confidence to get class-specific scores for every bounding box. We do this for all the grid cells, which gives 7*7*2 = 98 bounding boxes in total.
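The multiplication can be sketched with NumPy broadcasting. The arrays below are random placeholders, and the names `box_confidence` and `class_probs` are chosen here for illustration, not taken from any particular implementation.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Placeholder inputs (random values, for illustration only).
rng = np.random.default_rng(0)
box_confidence = rng.random((S, S, B))  # one confidence per bounding box
class_probs = rng.random((S, S, C))     # one set of class scores per grid cell

# Broadcast (7, 7, 2, 1) * (7, 7, 1, 20) -> (7, 7, 2, 20):
# every box in a cell gets that cell's class scores scaled by its confidence.
scores = box_confidence[..., None] * class_probs[:, :, None, :]

# Flatten to one row of 20 class scores per bounding box.
scores = scores.reshape(S * S * B, C)
print(scores.shape)  # (98, 20)
```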

Now we have class scores for each bounding box (tensor dimension = 20*1). Let us focus on the dog in the image. The dog score is at position (1,1) of the score tensor of every bounding box. We now set the scores that fall below a threshold to zero and sort the remaining scores in descending order.
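A minimal sketch of the thresholding and sorting step for one class is shown below. The threshold of 0.2 and the choice of index 0 for the dog class are assumptions made for this example, and the scores are random placeholders.

```python
import numpy as np

# Placeholder class scores for 98 boxes and 20 classes (illustration only).
rng = np.random.default_rng(0)
scores = rng.random((98, 20)) * 0.5

threshold = 0.2  # assumed value; tune it for your own data
dog_index = 0    # assumed: the dog score is the first entry of each box's scores

# Zero out every score that falls below the threshold.
scores[scores < threshold] = 0.0

# Indices of the boxes ordered by descending dog score.
order = np.argsort(-scores[:, dog_index])
sorted_dog_scores = scores[order, dog_index]

print(order[:5], sorted_dog_scores[:5])
```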

Now we will use the non-max suppression algorithm to set the scores of redundant boxes to zero.

Suppose the dog score for bounding box 1 is 0.5, and let this be the highest score, while the dog score for box 47 is 0.3. We compute the Intersection over Union (IoU) of the two boxes, and if it is greater than 0.5, we set the score for box 47 to zero; otherwise, we continue to the next box. We do this for all boxes.
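Here is a small, plain-Python sketch of IoU and non-max suppression for a single class. It assumes boxes are given as corner coordinates `(x1, y1, x2, y2)`, so centre-format boxes would need converting first. The function names and the sample boxes are made up for this example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return a copy of `scores` (one class) with redundant boxes set to zero."""
    scores = list(scores)
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for pos, i in enumerate(order):
        if scores[i] == 0:
            continue  # already suppressed
        for j in order[pos + 1:]:
            # Suppress lower-scoring boxes that overlap box i too much.
            if scores[j] > 0 and iou(boxes[i], boxes[j]) > iou_threshold:
                scores[j] = 0.0
    return scores


# Made-up example: the second box overlaps the first heavily, the third does not.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260)]
scores = [0.5, 0.3, 0.4]
print(non_max_suppression(boxes, scores))  # [0.5, 0.0, 0.4]
```

The same routine would be run once per class, since a box suppressed for one class can still hold the best score for another.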

After all this has been done, we will be left with only 2–3 boxes with non-zero scores. All the others will be zero. Now we select which bounding boxes to draw based on their class score values. This is explained in the image.
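One way to do this final selection, sketched under the assumption that a box is drawn when its best class score is still above zero, is shown below. The helper name `boxes_to_draw` and the sample scores are hypothetical.

```python
import numpy as np


def boxes_to_draw(scores):
    """scores: array of shape (num_boxes, num_classes) after thresholding and NMS.

    Returns (box index, class index, score) for every box whose best score is > 0.
    """
    best_class = scores.argmax(axis=1)
    best_score = scores.max(axis=1)
    keep = np.flatnonzero(best_score > 0)
    return [(int(i), int(best_class[i]), float(best_score[i])) for i in keep]


# Made-up example: only two of the 98 boxes survive.
scores = np.zeros((98, 20))
scores[0, 0] = 0.5
scores[46, 6] = 0.4
print(boxes_to_draw(scores))  # [(0, 0, 0.5), (46, 6, 0.4)]
```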

For the code, you can check out this GitHub repo.

And if you want an explanation of the code, drop a comment or email me at aashay96@gmail.com! I would be more than happy to help.

P. S : Looking for a job in deep learning. If you like my post and can connect me with someone, I will be grateful! ☺

Aashay's Newsletter
