Lecture 10: Faster RCNN
courses.physics.illinois.edu › ece417 › fa2020

Faster RCNN assumes that the original image is 1064 × 1064 pixels, which is then downsampled to the 224 × 224-pixel size required as input to VGG16. There are 4 layers of max pooling before the last conv layer, so each feature vector in the last conv layer represents a block of

$$\left(2^4 \cdot \frac{1064}{224}\right) \times \left(2^4 \cdot \frac{1064}{224}\right) = 76 \times 76$$

pixels of the original image.

Feature vector: The last conv layer ...
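The 76 × 76 figure can be checked with a short Python sketch. It assumes, as the text does, that each of the four max-pooling layers halves the feature-map side, so the grid on the 224 × 224 input is 224 / 2^4 = 14 cells per side. The function names are illustrative, and `feature_cell_box` is a hypothetical helper that maps a feature-vector index back to its block of the original image. It treats the blocks as a non-overlapping grid, which is a simplification (the actual receptive field of a last-layer unit is larger).

```python
from fractions import Fraction

# Sizes taken from the lecture text.
ORIGINAL_SIZE = 1064   # side of the original image, in pixels
VGG_INPUT_SIZE = 224   # side of the VGG16 input, in pixels
NUM_POOLS = 4          # 2x2 max-pooling layers before the last conv layer


def original_pixels_per_feature(original=ORIGINAL_SIZE,
                                vgg_input=VGG_INPUT_SIZE,
                                num_pools=NUM_POOLS):
    """Side length, in original-image pixels, spanned by one feature vector."""
    resize_factor = Fraction(original, vgg_input)  # 1064/224 = 4.75
    pool_stride = 2 ** num_pools                   # each pool halves the grid: 2^4 = 16
    return resize_factor * pool_stride             # exact rational arithmetic


def feature_cell_box(row, col, step):
    """Hypothetical helper: original-image box (x0, y0, x1, y1) for feature (row, col)."""
    return (col * step, row * step, (col + 1) * step, (row + 1) * step)


step = original_pixels_per_feature()
grid = VGG_INPUT_SIZE // 2 ** NUM_POOLS  # assumes each pool halves 224 exactly

print(step)                               # 76
print(grid)                               # 14
print(grid * step)                        # 1064
print(feature_cell_box(2, 3, int(step)))  # (228, 152, 304, 228)
```

Using `Fraction` keeps the 1064/224 = 4.75 resize factor exact, so the check that 14 cells of 76 pixels tile the 1064-pixel side exactly involves no floating-point rounding.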