Automating Test Case Identification in Java Open Source Projects on GitHub

Authors

  • Matej Madeja Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia
  • Jaroslav Porubän Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia
  • Michaela Bačíková Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia
  • Matúš Sulír Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia
  • Ján Juhár Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia
  • Sergej Chodarev Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia
  • Filip Gurbáľ Department of Computers and Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovakia

DOI:

https://doi.org/10.31577/cai_2021_3_575

Keywords:

Program comprehension, Java testing, testing practices, test smells, open-source projects, GitHub

Abstract

Software testing is one of the most important Quality Assurance (QA) components. Many researchers deal with the testing process in terms of tester motivation and how tests should or should not be written. However, these recommendations do not reveal how tests are actually written in real projects. In this paper, we investigated: (i) the denotation of the word "test" in different natural languages; (ii) whether the number of occurrences of the word "test" correlates with the number of test cases; and (iii) which testing frameworks are most commonly used. The analysis was performed on 38 GitHub open-source repositories carefully selected from a set of 4.3 M GitHub projects. We analyzed 20 340 test cases in 803 classes manually and 170 k classes using an automated approach. The results show that: (i) there exists a weak correlation (r = 0.655) between the number of occurrences of the word "test" and the number of test cases in a class; (ii) the proposed algorithm using static file analysis correctly detected 97 % of test cases; (iii) 15 % of the analyzed classes used the main() function, i.e., they are regular Java programs that test the production code without using any third-party framework. The identification of such tests is very complex due to their implementation diversity. The results may be leveraged to identify and locate test cases in a repository more quickly, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.
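To make the three signals mentioned in the abstract concrete (occurrences of the word "test", framework-annotated test cases, and main()-based test programs), the following is a minimal sketch of regex-based static file analysis. It is an illustration under stated assumptions, not the authors' algorithm: the class name TestHeuristics, its methods, and the regular expressions are hypothetical, and it assumes JUnit-style @Test annotations. Because it scans raw source text, it also matches comments and string literals; a parser-based analysis would avoid that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical heuristics for illustration only (not the paper's algorithm).
 * Works on raw source text, so comments and strings are also matched.
 */
public class TestHeuristics {

    // Case-insensitive occurrences of "test" (matches "Test", "testAdd", "TESTS", ...).
    private static final Pattern TEST_WORD =
        Pattern.compile("test", Pattern.CASE_INSENSITIVE);

    // Assumed JUnit-style annotation: @Test or a fully qualified form like @org.junit.Test.
    private static final Pattern TEST_ANNOTATION =
        Pattern.compile("@(?:[\\w.]+\\.)?Test\\b");

    // A conventional Java entry point: public static void main(...
    private static final Pattern MAIN_METHOD =
        Pattern.compile("public\\s+static\\s+void\\s+main\\s*\\(");

    // Counts non-overlapping matches of a pattern in the source text.
    static int count(Pattern p, String source) {
        Matcher m = p.matcher(source);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static int countTestWord(String source) {
        return count(TEST_WORD, source);
    }

    public static int countTestAnnotations(String source) {
        return count(TEST_ANNOTATION, source);
    }

    public static boolean hasMainMethod(String source) {
        return MAIN_METHOD.matcher(source).find();
    }

    public static void main(String[] args) {
        // A framework-based test class with two test cases.
        String junitClass = String.join("\n",
            "import org.junit.Test;",
            "public class CalculatorTest {",
            "  @Test public void testAdd() {}",
            "  @Test public void testSub() {}",
            "}");
        // A framework-free "test" written as a regular Java program.
        String mainClass = String.join("\n",
            "public class ManualCheck {",
            "  public static void main(String[] args) {",
            "    if (new Calculator().add(2, 2) != 4) throw new AssertionError();",
            "  }",
            "}");

        System.out.println(countTestWord(junitClass));        // 6
        System.out.println(countTestAnnotations(junitClass)); // 2
        System.out.println(hasMainMethod(junitClass));        // false
        System.out.println(countTestWord(mainClass));         // 0
        System.out.println(hasMainMethod(mainClass));         // true
    }
}
```

The sample output hints at why word counts only loosely track test cases (six occurrences of "test" for two test cases), and why main()-based tests are hard to spot: the second class contains no occurrence of "test" at all.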

Published

2021-11-30

How to Cite

Madeja, M., Porubän, J., Bačíková, M., Sulír, M., Juhár, J., Chodarev, S., & Gurbáľ, F. (2021). Automating Test Case Identification in Java Open Source Projects on GitHub. Computing and Informatics, 40(3), 575–605. https://doi.org/10.31577/cai_2021_3_575