Identification and Harmonization of Material Values and Product Names in a Group of Companies Using NLP Methods

Authors

DOI:

https://doi.org/10.46991/BYSU.G.2026.17.1.083

Keywords:

Material classification, data standardization, ERP systems, Master Data Management, Entity Resolution, Named Entity Recognition, NLP, precision, recall, F1-score

Abstract

The article examines the problem of heterogeneous names for material values and products within a group of companies. The same physical material may be registered under different abbreviations, spellings, languages, internal codes, or incomplete descriptions in the accounting and enterprise systems of separate subsidiaries. This lowers the quality of consolidated reporting and complicates procurement analysis, inventory control, price comparison, and managerial decision-making at the group level. The problem is formulated as an Entity Resolution and product-matching task and is addressed with Natural Language Processing and machine learning methods. A dataset of 17,258 material and product names was annotated manually and used to train a domain-specific Named Entity Recognition model. The proposed pipeline extracts structured components from free-text descriptions and provides a basis for unified material classification, centralized procurement, and analytical control in a group of companies. The article also presents a model-evaluation framework based on the confusion matrix, precision, recall, and F1-score.
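
As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, and F1-score can be derived from confusion-matrix counts at the entity level. It is not the article's implementation: the entity labels (MATERIAL, GRADE, QUANTITY, UNIT), the sample description, and the helper names are hypothetical and chosen only for the example, and exact-match scoring of (label, text) pairs is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for extracted entities (hypothetical helper)."""
    tp: int  # entities predicted and present in the gold annotation
    fp: int  # entities predicted but absent from the gold annotation
    fn: int  # gold entities the model failed to predict


def score_entities(gold: set, predicted: set) -> Counts:
    """Count exact matches between gold and predicted (label, text) pairs."""
    return Counts(
        tp=len(gold & predicted),
        fp=len(predicted - gold),
        fn=len(gold - predicted),
    )


def precision(c: Counts) -> float:
    # Share of predicted entities that are correct; 0.0 if nothing was predicted.
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    # Share of gold entities that were found; 0.0 if there are no gold entities.
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall; 0.0 if both are zero.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical annotation for a description such as "cement M500 50 kg".
gold = {("MATERIAL", "cement"), ("GRADE", "M500"), ("QUANTITY", "50 kg")}
predicted = {("MATERIAL", "cement"), ("GRADE", "M500"), ("UNIT", "kg")}

counts = score_entities(gold, predicted)
print(counts)
print(f"precision={precision(counts):.3f} "
      f"recall={recall(counts):.3f} f1={f1(counts):.3f}")
```

In this toy case two of the three predicted entities match the gold annotation and one gold entity is missed, so precision, recall, and F1-score all equal 2/3. True negatives are not counted because they are not well defined for span extraction under this convention.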

Author Biography

  • Rafik Mashuryan, Yerevan State University

    PhD student in the Chair of Mathematical Modeling in Economics, Faculty of Economics and Management, Yerevan State University.

Published

2026-06-23

Issue

Section

Economic and mathematical modeling

How to Cite

Mashuryan, R. (2026). Identification and Harmonization of Material Values and Product Names in a Group of Companies Using NLP Methods. Bulletin of Yerevan University G: Economics, 17(1(47)), 83-91. https://doi.org/10.46991/BYSU.G.2026.17.1.083