Correlation Analysis Algorithm for Massive Ultra-High-Dimensional Breast Ultrasound Radiomics Feature Data in a Distributed Environment

Authors

  • Yuehong Tang Tumor Hospital Affiliated to Xinjiang Medical University, Urumqi, Xinjiang, China & School of Public Health, Xinjiang Medical University, Urumqi, Xinjiang, China
  • Yan Chen The Medical School of Jiaxing University, Jiaxing, Zhejiang, China
  • Wen Liu Artificial Intelligence and Smart Mine Engineering Technology Center, Xinjiang Institute of Engineering, Urumqi, China & Xinjiang Changsen Data Technology Co., Ltd., Urumqi 830011, China
  • Zheng Gu Artificial Intelligence and Smart Mine Engineering Technology Center, Xinjiang Institute of Engineering, Urumqi, China
  • Hui Yao Xinjiang Changsen Data Technology Co., Ltd., Urumqi 830011, China

DOI:

https://doi.org/10.31577/cai_2024_3_756

Keywords:

Radiomics, massive high-dimensional data, correlation analysis, distributed computing

Abstract

Radiomics is a technology for the high-throughput extraction of large numbers of quantitative features from medical images, and it has become a focus of research. Combined with Big Data analysis algorithms, it can support disease diagnosis, therapy planning, and prognosis evaluation. Radiomics can extract hundreds or even tens of thousands of quantifiable features from medical images, and the resulting feature data can no longer fit into the memory of a single machine. Therefore, we propose a distributed correlation analysis algorithm (DFCA) for breast ultrasound radiomics feature datasets, based on the MapReduce distributed computing framework. While DFCA calculates the Pearson correlation coefficients of the radiomics features, each compute node produces a massive amount of intermediate data. As the volume of feature data and the number of dimensions increase, the data transmission cost grows quadratically. To reduce this cost, we propose a distributed correlation estimation algorithm (DFCEA) for radiomics features, built on DFCA. DFCEA estimates the Pearson correlation coefficient using an iterative method, which further reduces the I/O cost. Experiments show that our algorithms are more effective than the algorithms in the literature.
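
To make the idea of computing Pearson correlations over partitioned data concrete, the following is a minimal single-process sketch in a map/reduce style. It is not the authors' DFCA or DFCEA, whose details are not given in this abstract. It assumes the samples are split by rows into blocks, and all function names (`map_partition`, `combine`, `pearson_from_stats`, `distributed_pearson`) are hypothetical. Each "mapper" emits a d × d matrix of cross-products for d features, which illustrates why intermediate data grows quadratically with the number of dimensions.

```python
from functools import reduce

import numpy as np


def map_partition(block):
    """Mapper (hypothetical): reduce one row block (samples x features)
    to its sufficient statistics: row count, column sums, cross-products."""
    block = np.asarray(block, dtype=np.float64)
    return block.shape[0], block.sum(axis=0), block.T @ block


def combine(a, b):
    """Reducer (hypothetical): sufficient statistics are additive across blocks."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def pearson_from_stats(n, col_sums, cross):
    """Turn the aggregated statistics into a Pearson correlation matrix.
    Assumes no feature is constant (a zero variance would divide by zero)."""
    mean = col_sums / n
    cov = cross / n - np.outer(mean, mean)  # population covariance
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def distributed_pearson(blocks):
    """Map every block, reduce the partial results, then finalize."""
    n, col_sums, cross = reduce(combine, map(map_partition, blocks))
    return pearson_from_stats(n, col_sums, cross)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(1000, 5))   # synthetic stand-in data
    blocks = np.array_split(features, 4)    # pretend these live on 4 nodes
    print(np.round(distributed_pearson(blocks), 3))
```

In this sketch only the per-block statistics cross the "network", not the raw rows. The sum-of-products formula can lose precision when feature means are large relative to their spread, so a production version would likely center the data or use a more stable update.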

Published

2024-06-24

How to Cite

Tang, Y., Chen, Y., Liu, W., Gu, Z., & Yao, H. (2024). Correlation Analysis Algorithm for Massive Ultra-High-Dimensional Breast Ultrasound Radiomics Feature Data in a Distributed Environment. Computing and Informatics, 43(3), 756–776. https://doi.org/10.31577/cai_2024_3_756