Parallel Retrieval of Dense Vectors in the Vector Space Model

Authors

  • Tobias Berka
  • Marian Vajteršic

Keywords

Vector space model, symmetric multiprocessing, dense vector computations, message passing interface

Abstract

Modern information retrieval systems use distributed and parallel algorithms to meet their operational requirements and commonly operate on sparse vectors. Dimensionality-reducing techniques, however, produce dense and relatively short feature vectors. Motivated by the relevance of such dense vectors, we have parallelized the vector space model for dense matrices and vectors. Our algorithm uses a hybrid partitioning that splits both documents and features, and it operates on a mesh of hosts holding a block-partitioned corpus matrix. We show that the theoretical speed-up is optimal. The empirical evaluation of an MPI-based implementation reveals a super-linear speed-up on a cluster using Nehalem Xeon CPUs.
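
The hybrid partitioning described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation and uses no MPI: the mesh of hosts is only simulated with nested loops, the function names `split_ranges` and `block_scores` are hypothetical, and plain inner products are assumed as the similarity measure. Each simulated host holds one block of the corpus matrix (a slice of documents times a slice of features), computes partial scores on its block, and the partial scores are summed across each mesh row.

```python
# Sketch only: a mesh of hosts is simulated in one process (no MPI).
# split_ranges and block_scores are hypothetical names, not from the paper.

def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous, near-equal (start, stop) ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def block_scores(corpus, query, mesh_rows, mesh_cols):
    """Inner-product score of every document against the query, block by block."""
    doc_ranges = split_ranges(len(corpus), mesh_rows)   # document slices
    feat_ranges = split_ranges(len(query), mesh_cols)   # feature slices
    scores = []
    for d0, d1 in doc_ranges:          # one mesh row per document slice
        partial = [0.0] * (d1 - d0)
        for f0, f1 in feat_ranges:     # one simulated host per feature slice
            # Local work of the host owning block (d0:d1, f0:f1).
            for k, doc in enumerate(corpus[d0:d1]):
                partial[k] += sum(doc[f] * query[f] for f in range(f0, f1))
        # On a real cluster the accumulation above would be a reduction
        # across the hosts of one mesh row (assumption for illustration).
        scores.extend(partial)
    return scores


# Tiny dense corpus: 3 documents, 4 features (made-up values).
corpus = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 3.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
]
query = [1.0, 2.0, 0.0, 1.0]
scores = block_scores(corpus, query, mesh_rows=2, mesh_cols=2)
```

With a 1 x 1 mesh the function reduces to an ordinary dense matrix-vector product, so the blocked result can be checked against it directly.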

Author Biographies

Tobias Berka

Department of Computer Sciences
University of Salzburg
Jakob-Haringer-Strasse 2
5020 Salzburg, Austria

Marian Vajteršic

Department of Computer Sciences
University of Salzburg, Austria
and
Department of Informatics, Mathematical Institute
Slovak Academy of Sciences, Bratislava, Slovakia

Published

2012-01-26

How to Cite

Berka, T., & Vajteršic, M. (2012). Parallel Retrieval of Dense Vectors in the Vector Space Model. Computing and Informatics, 30(2), 247–265. Retrieved from http://147.213.75.17/ojs/index.php/cai/article/view/164

Section

Special Section Articles