On July 13th, Juris Rāts, research scientist at RIX Technologies, attended the 11th International Conference on Data Science, Technology and Applications (DATA 2022), where he presented the paper "Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents". The purpose of the conference is to bring together researchers, engineers and practitioners interested in databases, big data, data mining, data management, data security and other aspects of information systems and technology involving advanced applications of data.
The paper outlines the results of the research "Creating a model of the document representation allowing to improve the accuracy of the automated document classification", conducted by RIX Technologies. The goal of the research was to develop and evaluate a flexible, machine-learning-based model for automating the indexing and routing of incoming enterprise documents. The model provides methods for initial configuration that help a customer select the important document topics and label training sets for the topic recognition bots. It also provides methods for analysing and evaluating the topics and training sets, and so suggests changes to the topic set (e.g. merging or splitting particular topics) that could improve the model's performance.
The customer can use the results of the configuration to evaluate whether the model suits their company and to decide on its deployment. Once deployed, the model gradually expands the training sets for the topic recognition bots. This may allow more documents to be indexed and routed automatically, and with greater accuracy.