"Evaluating the Quality of the Indonesian Scientific Journal References" by Ariani Indrawati, Ambar Yoganingrum et al.

Libraries at University of Nebraska-Lincoln


Ariani Indrawati, Ambar Yoganingrum, and Pradipta Yuwono, “Evaluating the Quality of the Indonesian Scientific Journal References Using ParsCit, CERMINE and GROBID” VOL, ISSUE (YEAR): PAGES.

List of References

  1. Z. Guo and H. Jin, “Reference Metadata Extraction from Scientific Papers,” in 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, 2011, pp. 45–49.
  2. M.-Y. Day, R. T.-H. Tsai, and C.-L. Sung, “Reference metadata extraction using a hierarchical knowledge representation framework,” Decis Support Syst, pp. 152–167, 2007.
  3. I. Councill, C. Giles, and M.-Y. Kan, “ParsCit: an open-source CRF reference string parsing package,” in International Conference on Language Resources and Evaluation, 2008.
  4. D. Tkaczyk, P. Szostek, M. Fedoryszak, P. J. Dendek, and Ł. Bolikowski, “CERMINE: automatic extraction of structured metadata from scientific literature,” Int J Doc Anal Recognit, vol. 18, no. 4, pp. 317–335, 2015.
  5. P. Lopez, “GROBID: combining automatic bibliographic data recognition and term extraction for scholarship publications,” Res Adv Technol Digit Libr, pp. 473–474, 2009.
  6. D. Matsuoka, M. Ohta, A. Takasu, and J. Adachi, “Examination of Effective Features for CRF-Based Bibliography Extraction from Reference Strings,” in The Eleventh International Conference on Digital Information Management (ICDIM 2016), 2016, pp. 243–248.
  7. D. Namikoshi, M. Ohta, A. Takasu, and J. Adachi, “CRF-Based Bibliography Extraction from Reference Strings Using a Small Amount of Training Data,” in The Twelfth International Conference on Digital Information Management (ICDIM 2017), 2017, pp. 59–64.
  8. M. Ohta, D. Arauchi, A. Takasu, and J. Adachi, “CRF-based Bibliography Extraction from Reference Strings Focusing on Various,” in 10th IAPR International Workshop on Document Analysis Systems, 2012, pp. 276–281.
  9. M. Ohta, D. Arauchi, A. Takasu, and J. Adachi, “Empirical Evaluation of CRF-Based Bibliography Extraction from Reference Strings,” in 11th IAPR International Workshop on Document Analysis Systems, 2014, pp. 287–292.
  10. B. Ojokoh, M. Zhang, and J. Tang, “A trigram hidden Markov model for metadata extraction from heterogeneous references,” Inf Sci (Ny), pp. 1538–1551, 2011.
  11. X. Zhang, J. Zou, D. Le, and G. Thoma, “A Structural SVM Approach for Reference Parsing,” in Ninth International Conference on Machine Learning and Applications, 2010, pp. 479–484.
  12. E. Suryawati and D. Widyantoro, “Combination of Heuristic, Rule-Based and Machine Learning for Bibliography Extraction,” in 5th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), 2017, pp. 276–281.
  13. “Anystyle-Parser.” [Online]. Available: https://www.rubydoc.info/gems/anystyle-parser. [Accessed: 26-May-2019].
  14. M. Carl Staelin, “Biblio: automatic meta-data extraction,” Int J Doc Anal Recognit, vol. 10, no. 2, pp. 113–126, 2007.
  15. C.-C. Chen, K.-H. Yang, C.-L. Chen, and J.-M. Ho, “BibPro: A Citation Parser Based on Sequence Alignment,” IEEE Trans Knowl Data Eng, vol. 24, no. 2, pp. 236–250, 2012.
  16. T. Nishimura, “Parse Citation List in Paper,” 2016. [Online]. Available: https://github.com/nishimuuu/citation.
  17. M. Romanello, “citation-parser 0.4.1,” 2017.
  18. P. Lopez, “GROBID from PDF to structured documents,” 2015. [Online]. Available: https://grobid.readthedocs.io/en/latest/grobid-04-2015.pdf. [Accessed: 20-May-2019].
  19. PDDI, “Statistik Jumlah Artikel,” 2018. [Online]. Available: http://isjd.pdii.lipi.go.id/.
  20. J. Fagan and J. Keach, “Build, buy, open source, or web 2.0? making an informed decision for your library,” Comput Libr, pp. 8–11, 2010.


Several open-source tools are available for extracting bibliographic references from PDF documents. These tools are based on various approaches, including rule-based, knowledge-based, and machine learning-based approaches, as well as combinations of them. To improve the services of the Indonesian Scientific Journal Database (ISJD), the Center for Scientific Data and Documentation of the Indonesian Institute of Sciences (PDDI-LIPI) intends to adopt an automatic bibliographic reference extraction tool. This paper analyzes the quality of the reference metadata of local journals using three open-source tools: ParsCit, CERMINE, and GROBID. The accuracy of all three tools is poor: 0.555, 0.633, and 0.605 for ParsCit, CERMINE, and GROBID, respectively. The cause is that many authors do not use a reference manager when they write the bibliography section. Given this condition, the paper proposes building an application that identifies and corrects errors in the bibliographic references of papers in ISJD. The application would act as a liaison between ISJD and the open-source reference extraction tool. The paper therefore proposes a combination of building software in-house and using open-source software.
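The abstract reports one accuracy score per tool but does not say how the score is computed. As an illustration only, the sketch below assumes a simple field-level definition: the share of metadata fields (author, title, year) whose extracted value matches a hand-checked value after light normalization. The field set, the normalization, and the function names are assumptions made for this example and are not taken from the paper or from any of the three tools.

```python
from typing import Dict, List

# Hypothetical field set; the paper may have evaluated other fields.
FIELDS = ("author", "title", "year")


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not counted as errors."""
    return " ".join(value.lower().split())


def field_accuracy(gold: List[Dict[str, str]], predicted: List[Dict[str, str]]) -> float:
    """Return the fraction of fields whose predicted value equals the gold value.

    Assumed definition, not the paper's. A field missing from a record is
    treated as an empty string, so it matches only another missing or empty field.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of references")
    total = 0
    correct = 0
    for gold_ref, pred_ref in zip(gold, predicted):
        for field in FIELDS:
            total += 1
            if normalize(gold_ref.get(field, "")) == normalize(pred_ref.get(field, "")):
                correct += 1
    return correct / total if total else 0.0


# Two hand-checked references and a made-up extraction result for them.
gold = [
    {"author": "I. Councill, C. Giles, M.-Y. Kan",
     "title": "ParsCit: an open-source CRF reference string parsing package",
     "year": "2008"},
    {"author": "P. Lopez",
     "title": "GROBID from PDF to structured documents",
     "year": "2015"},
]
predicted = [
    {"author": "I. Councill, C. Giles, M.-Y. Kan",
     "title": "ParsCit: an open-source CRF reference string parsing package",
     "year": "2008"},
    {"author": "P. Lopez",
     "title": "GROBID from PDF",  # truncated title counts as one wrong field
     "year": "2015"},
]

print(round(field_accuracy(gold, predicted), 3))  # prints 0.833
```

Under this assumed definition, five of the six fields match, so the score is 5/6. A stricter variant could count a reference as correct only when all of its fields match, which would give lower scores on the same data.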
