MIDWRSeg: Acquiring Adaptive Multi-Scale Contextual Information for Road-Scene Semantic Segmentation

Bing Su; Peng Jin; Yifeng Lin; Fuyang Wang

doi:10.31577/cai_2024_4_849

Authors

Bing Su, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China
Peng Jin, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China
Yifeng Lin, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China
Fuyang Wang, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China

DOI:

https://doi.org/10.31577/cai_2024_4_849

Keywords:

Deep convolutional network, attention mechanism, semantic segmentation, autonomous driving

Abstract

We present MIDWRSeg, a simple neural-network-based semantic segmentation model. In complex road scenes, large receptive fields gathered at multiple scales are crucial for semantic segmentation. CNN architectures therefore need a way to establish long-range dependencies (large receptive fields) akin to the attention mechanism of the Transformer architecture. However, the attention mechanism built on matrix operations over Query, Key and Value is too computationally expensive for real-time semantic segmentation models. We therefore construct a Multi-Scale Convolutional Attention (MSCA) block that uses inexpensive convolution operations to form long-range dependencies. The model adopts a Simple Inverted Residual (SIR) block for feature extraction in the initial encoding stage. After downsampling, the reduced-resolution feature maps pass through a sequence of stacked MSCA blocks, which form multi-scale long-range dependencies. Finally, to further enrich the range of adaptive receptive-field sizes, an Internal Depth Wise Residual (IDWR) block is introduced. In the decoding stage, a simple FCN-like decoder is used to reduce computational cost. Our method is competitive with existing real-time encoder-decoder semantic segmentation models on the Cityscapes and CamVid datasets: MIDWRSeg achieves 74.2 % mIoU at 88.9 FPS on the Cityscapes test set and 76.8 % mIoU at 95.2 FPS on the CamVid test set.
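
The cost argument in the abstract can be illustrated with a rough operation count. The sketch below is not taken from the paper: the feature-map size (1/8 of a 1024x2048 image), the channel count of 128, and the 7x7 kernel are assumptions chosen only for illustration, and both helper functions are hypothetical. It compares the multiply-accumulates (MACs) of Query-Key-Value attention, which grow quadratically with the number of spatial positions, against a single depthwise convolution, which grows linearly.

```python
def self_attention_macs(height, width, channels):
    """MACs for Q*K^T plus the attention-weighted sum over V.

    Linear projections and the softmax are ignored, so this is a lower bound.
    """
    tokens = height * width
    return 2 * tokens * tokens * channels


def depthwise_conv_macs(height, width, channels, kernel_h, kernel_w):
    """MACs for one depthwise convolution (stride 1, 'same' padding)."""
    return height * width * channels * kernel_h * kernel_w


# Assumed feature map: 1/8 of a 1024x2048 input with 128 channels (illustrative only).
h, w, c = 128, 256, 128

attn = self_attention_macs(h, w, c)
conv = depthwise_conv_macs(h, w, c, 7, 7)  # 7x7 kernel is an assumption

print(f"self-attention: {attn:.3e} MACs")
print(f"7x7 depthwise:  {conv:.3e} MACs")
print(f"ratio:          {attn / conv:.0f}x")
```

Under these assumptions the attention term is about three orders of magnitude more expensive than the depthwise convolution, which is consistent with the motivation for replacing Query-Key-Value attention with convolutional attention in a real-time model.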
