MIDWRSeg: Acquiring Adaptive Multi-Scale Contextual Information for Road-Scene Semantic Segmentation

Authors

  • Bing Su, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China
  • Peng Jin, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China
  • Yifeng Lin, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China
  • Fuyang Wang, Changzhou University, School of Computer Science and Artificial Intelligence, No. 2468, YanZeng West Rd., Wujin District, Changzhou City, Jiangsu Province, China

DOI:

https://doi.org/10.31577/cai_2024_4_849

Keywords:

Deep convolutional network, attention mechanism, semantic segmentation, autonomous driving

Abstract

We present MIDWRSeg, a simple neural-network-based semantic segmentation model. In complex road scenes, a large receptive field gathered at multiple scales is crucial for semantic segmentation. CNN architectures therefore need a way to establish long-range dependencies (large receptive fields) akin to those provided by the attention mechanism of the Transformer architecture. However, the attention mechanism built on matrix operations over Query, Key and Value is too computationally expensive for real-time semantic segmentation models. We therefore construct a Multi-Scale Convolutional Attention (MSCA) block that uses inexpensive convolution operations to form long-range dependencies. The model adopts a Simple Inverted Residual (SIR) block for feature extraction in the initial encoding stage. After downsampling, the reduced-resolution feature maps pass through a sequence of stacked MSCA blocks, which form multi-scale long-range dependencies. Finally, to further diversify the sizes of the adaptive receptive field, an Internal Depth Wise Residual (IDWR) block is introduced. In the decoding stage, a simple FCN-like decoder is used to reduce computational cost. Our method is competitive with existing real-time encoder-decoder semantic segmentation models on the Cityscapes and CamVid datasets: MIDWRSeg achieves 74.2 % mIoU at 88.9 FPS on the Cityscapes test set and 76.8 % mIoU at 95.2 FPS on the CamVid test set.
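
The cost argument in the abstract (Query/Key/Value attention is too expensive for real-time use, while convolutions are cheap) can be illustrated with a rough multiply-accumulate (MAC) count. The following is a minimal sketch, not the authors' MSCA design: the function names, the feature-map size, the channel count, and the kernel sizes (7, 11, 21) are all assumptions chosen for illustration, and the convolutional branch is modeled simply as one depthwise convolution per scale plus a 1x1 pointwise mixing layer.

```python
def self_attention_macs(height, width, channels):
    """Approximate MACs for single-head self-attention on an H x W x C map."""
    n = height * width                         # number of spatial tokens
    projections = 3 * n * channels * channels  # Q, K and V linear projections
    scores = n * n * channels                  # Q @ K^T
    weighted_sum = n * n * channels            # softmax(scores) @ V
    return projections + scores + weighted_sum


def depthwise_conv_macs(height, width, channels, kernel_h, kernel_w):
    """MACs for one stride-1 depthwise convolution with 'same' padding."""
    return height * width * channels * kernel_h * kernel_w


def multi_scale_conv_macs(height, width, channels, kernel_sizes=(7, 11, 21)):
    """Hypothetical multi-scale block: one depthwise k x k branch per scale,
    followed by a 1x1 pointwise convolution that mixes channels."""
    branches = sum(
        depthwise_conv_macs(height, width, channels, k, k) for k in kernel_sizes
    )
    pointwise = height * width * channels * channels
    return branches + pointwise


if __name__ == "__main__":
    # Assumed setting: a 1024 x 2048 input downsampled by 8, with 128 channels.
    h, w, c = 128, 256, 128
    attention = self_attention_macs(h, w, c)
    convolution = multi_scale_conv_macs(h, w, c)
    print(f"self-attention MACs:   {attention:,}")
    print(f"multi-scale conv MACs: {convolution:,}")
    print(f"ratio: {attention / convolution:.1f}x")
```

The attention cost grows quadratically with the number of pixels, whereas the convolutional cost grows linearly, which is why the gap widens at the high resolutions typical of road-scene images.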

Published

2024-08-31

How to Cite

Su, B., Jin, P., Lin, Y., & Wang, F. (2024). MIDWRSeg: Acquiring Adaptive Multi-Scale Contextual Information for Road-Scene Semantic Segmentation. Computing and Informatics, 43(4), 849–873. https://doi.org/10.31577/cai_2024_4_849