Identification and Harmonization of Material Values and Product Names in a Group of Companies Using NLP Methods
DOI:
https://doi.org/10.46991/BYSU.G.2026.17.1.083
Keywords:
Material classification, data standardization, ERP systems, Master Data Management, Entity Resolution, Named Entity Recognition, NLP, precision, recall, F1-score
Abstract
The article examines the problem of heterogeneous material-value and product names in a group of companies. The same physical material may be registered under different abbreviations, spellings, languages, internal codes, or incomplete descriptions in the accounting and enterprise systems of separate subsidiaries. This reduces the quality of consolidated reporting, complicates procurement analysis, inventory control, price comparison, and managerial decision-making at group level. The problem is formulated as an Entity Resolution and product-matching task and is addressed through Natural Language Processing and machine learning methods. A dataset of 17,258 material and product names was annotated manually and used to train a domain-specific Named Entity Recognition model. The proposed pipeline extracts structured components from free-text descriptions and creates a basis for unified material classification, centralized procurement, and analytical control in a group of companies. The article also adds a model-evaluation framework based on the confusion matrix, precision, recall, and F1-score.
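The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the confusion matrix has already been reduced to true-positive, false-positive, and false-negative counts for one entity label; the class name `Counts`, the function name `precision_recall_f1`, and the numbers in the example are hypothetical and are not taken from the article's results.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one entity label (hypothetical container)."""
    tp: int  # entities predicted and present in the gold annotation
    fp: int  # entities predicted but absent from the gold annotation
    fn: int  # gold entities the model failed to predict


def precision_recall_f1(c: Counts) -> tuple[float, float, float]:
    """Return (precision, recall, F1); a ratio with a zero denominator is reported as 0.0."""
    predicted = c.tp + c.fp
    actual = c.tp + c.fn
    precision = c.tp / predicted if predicted else 0.0
    recall = c.tp / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative counts only, not results reported in the article.
example = Counts(tp=80, fp=20, fn=10)
p, r, f1 = precision_recall_f1(example)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Reporting the three values per entity label, rather than one pooled figure, makes it easier to see which extracted components of a material description are the weakest.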
License
Copyright (c) 2026 Rafik Mashuryan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.