An Integrated Topic Modeling with Classification for Semantic Information Retrieval in Large Scale Text Documents
Abstract
Big data has attracted considerable attention across scientific and engineering domains because of its vast potential and wide-ranging applications. Despite these advantages, several challenges must still be addressed to improve quality of service. This is especially true in information retrieval (IR), a key area of computer science concerned with efficiently retrieving the information relevant to a user's query from large datasets. As demand grows for precise, expressive, and contextually relevant results, semantic IR over big data has become increasingly important for decision-making and analysis.

This work proposes a deep learning approach to semantic information retrieval on large-scale text datasets using a hybrid BERT-LDA model. After pre-processing, the model uses LDA to generate probabilistic topic distributions and BERT to capture sentence-level semantic embeddings. The two outputs are combined and fed into a deep learning framework. This framework includes a CNN module, which extracts relationships between features, and an Attention Mechanism (AM) module, which emphasizes the most significant features. Experimental evaluation on the DBpedia dataset shows that this approach improves retrieval performance in terms of Accuracy, Precision, Recall, and F1-measure.
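The data flow of the hybrid model can be illustrated with a minimal NumPy sketch of a single forward pass. This is not the authors' implementation. The abstract gives no dimensions, fusion rule, or layer configuration, so the following are all hypothetical:

- the number of topics, filter count, and kernel width;
- the learned-query form of attention;
- the 14-class output, which assumes the common DBpedia-14 ontology classification label set.

Fusion is assumed to be plain concatenation, and the CNN is assumed to slide over the fused vector as a one-channel sequence. The LDA topic mixture and the BERT embedding are random stand-ins, and all weights are untrained.

```python
import numpy as np

# Hypothetical sizes: the abstract specifies none of these.
K_TOPICS = 20    # assumed number of LDA topics
D_BERT = 768     # hidden size of BERT-base (assumed model variant)
N_FILTERS = 32   # assumed number of CNN filters
WIDTH = 3        # assumed convolution kernel width
N_CLASSES = 14   # assumed: DBpedia-14 ontology classes


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def fuse_features(topic_dist, sent_emb):
    """Assumed fusion rule: concatenate LDA topic proportions (K,) with a BERT embedding (D,)."""
    return np.concatenate([topic_dist, sent_emb])


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over the fused vector treated as a one-channel sequence.

    x: (L,), kernels: (F, W), bias: (F,) -> ReLU feature maps of shape (F, L - W + 1).
    Windows straddling the topic/embedding boundary mix the two feature types.
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])  # (T, W)
    return np.maximum(kernels @ windows.T + bias[:, None], 0.0)


def attention_pool(h, query):
    """Learned-query attention: score each position, softmax the scores, take the weighted sum.

    h: (F, T), query: (F,) -> context vector (F,), attention weights (T,).
    """
    alpha = softmax(query @ h)
    return h @ alpha, alpha


def forward(topic_dist, sent_emb, params):
    """One forward pass: fuse -> CNN -> attention -> class probabilities."""
    x = fuse_features(topic_dist, sent_emb)
    h = conv1d_relu(x, params["kernels"], params["conv_bias"])
    context, alpha = attention_pool(h, params["att_query"])
    logits = params["W_out"] @ context + params["b_out"]
    return softmax(logits), alpha


rng = np.random.default_rng(0)
params = {  # random, untrained weights: this only demonstrates the data flow
    "kernels": rng.normal(0.0, 0.05, (N_FILTERS, WIDTH)),
    "conv_bias": np.zeros(N_FILTERS),
    "att_query": rng.normal(0.0, 0.05, N_FILTERS),
    "W_out": rng.normal(0.0, 0.05, (N_CLASSES, N_FILTERS)),
    "b_out": np.zeros(N_CLASSES),
}

topic_dist = rng.dirichlet(np.ones(K_TOPICS))  # stand-in for a document's LDA topic mixture
sent_emb = rng.normal(0.0, 1.0, D_BERT)         # stand-in for a BERT sentence embedding

probs, alpha = forward(topic_dist, sent_emb, params)
print("predicted class:", int(probs.argmax()), "| attention positions:", alpha.shape[0])
```

To get a trainable classifier, replace the stand-ins with real outputs, such as a topic mixture from a fitted LDA model and a pooled BERT embedding. Then train the weights with a cross-entropy loss. How to balance the scale of the topic part against the embedding part before fusion is a free design choice that the abstract does not specify.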

This work is licensed under a Creative Commons Attribution 4.0 International License.