Semantic Segmentation of Text Using Deep Learning

Tiziano Lattisi; Davide Farina; Marco Ronchetti

doi:10.31577/cai_2022_1_78

Authors

Tiziano Lattisi Department of Information Engineering and Computer Science, Università di Trento, 38123 Trento, Italy & TxC2, Via Strada Granda 41, 38069 Nago-Torbole (TN), Italy
Davide Farina Department of Information Engineering and Computer Science, Università di Trento, 38123 Trento, Italy
Marco Ronchetti Department of Information Engineering and Computer Science, Università di Trento, 38123 Trento, Italy

Keywords:

Text segmentation, semantic boundaries, BERT

Abstract

Given a text, can we automatically segment it into semantically coherent sections? Can we detect the semantic boundaries if we know how many there are? Can we determine how many semantically distinct sections the text contains? These are the questions we address in this paper. To answer them, we use Bidirectional Encoder Representations from Transformers (BERT) to analyze the text and evaluate a function that we call local incoherence, which we expect to show maxima at the points where a semantic boundary occurs. Our results, although preliminary, are encouraging and suggest that our approach can be applied successfully. However, they are quite sensitive to text quality, as happens when the text is derived from an audio stream via Automatic Speech Recognition (ASR).
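The procedure outlined above (score every gap between consecutive sentences, then take the strongest maxima as boundaries when their number is known) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define local incoherence, so here it is assumed to be one minus the cosine similarity between the mean embedding of the sentences just before a gap and the mean embedding of those just after it. The per-sentence vectors are assumed to come from a BERT encoder and are passed in precomputed, and the names `local_incoherence`, `pick_boundaries` and the `window` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def local_incoherence(embeddings: Sequence[Sequence[float]], window: int = 2) -> List[float]:
    """Hypothetical local incoherence: one score per gap.

    scores[j] rates the gap between sentence j and sentence j + 1 as
    1 - cos(mean of up to `window` sentences before, mean of up to `window` after).
    Assumes the embeddings were produced beforehand, e.g. by a BERT encoder.
    """
    scores = []
    for i in range(1, len(embeddings)):
        left = mean_vector(embeddings[max(0, i - window):i])
        right = mean_vector(embeddings[i:i + window])
        scores.append(1.0 - cosine(left, right))
    return scores


def pick_boundaries(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the sentences that start a new segment.

    Keeps only local maxima of the score curve, then the k highest of them
    (the case where the number of boundaries is known in advance).
    """
    peaks = []
    for j, s in enumerate(scores):
        prev_s = scores[j - 1] if j > 0 else -math.inf
        next_s = scores[j + 1] if j + 1 < len(scores) else -math.inf
        if s >= prev_s and s >= next_s:
            peaks.append(j)
    best = sorted(peaks, key=lambda j: scores[j], reverse=True)[:k]
    # Gap j lies just before sentence j + 1.
    return sorted(j + 1 for j in best)


if __name__ == "__main__":
    # Toy 2-D "embeddings": three sentences on one topic, three on another.
    emb = [[1, 0], [1, 0.1], [0.9, 0], [0, 1], [0.1, 1], [0, 0.9]]
    print(pick_boundaries(local_incoherence(emb, window=2), k=1))  # → [3]
```

On the toy input the score curve peaks at the gap between the third and fourth sentences, so the single requested boundary falls before sentence index 3. When the number of sections is not known, one plausible variant (again an assumption, not taken from the paper) is to keep every local maximum whose score exceeds a threshold instead of the top k.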
