An Integrated Topic Modeling with Classification for Semantic Information Retrieval in Large Scale Text Documents

GEETHA M; Dr. N.VIMALA

Published: May 16, 2025

GEETHA M

Research Scholar, Department of Computer Science, LRG Govt. Arts College for Women, Tirupur, India.

Dr. N.VIMALA

Assistant Professor & Head, Department of Computer Science, Puratchi Thalaivi Amma Govt. Arts & Science College, Palladam, India.

Abstract

Big data has attracted considerable attention across scientific and engineering domains because of its vast potential and wide range of applications. Despite these advantages, several challenges must be addressed to improve quality of service, particularly in information retrieval (IR), the area of computer science concerned with efficiently retrieving information relevant to a user query from large datasets. As demand grows for precise, expressive, and contextually relevant results, semantic IR over big data has become increasingly important for decision-making and analysis. This work proposes a deep learning approach to semantic information retrieval that applies a hybrid BERT-LDA model to large-scale text datasets. After a pre-processing phase, the model uses Latent Dirichlet Allocation (LDA) to generate probabilistic topic distributions and BERT to capture sentence-level semantic embeddings. These outputs are combined and fed into a deep learning framework that incorporates a convolutional neural network (CNN) module to extract inter-feature relationships and an Attention Mechanism (AM) module to emphasize significant features. Experimental evaluation on the DBpedia dataset shows that this approach improves retrieval performance in terms of Accuracy, Precision, Recall, and F1-measure.
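The fusion pipeline described above (topic distribution plus sentence embedding, then CNN, then attention, then classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The topic vector and sentence embedding are random stand-ins for real LDA and BERT outputs, all weights are untrained, and the following are assumptions chosen for illustration: convolving over the concatenated feature vector, the filter count and kernel width, an attention form that scores each convolution position with a single learned query vector, a 768-dimensional embedding (the BERT-base hidden size), 14 output classes, and macro-averaged metrics. The function names (`fuse_features`, `conv1d`, `attention_pool`, `init_params`, `classify`, `macro_scores`) are hypothetical.

```python
import numpy as np

# Minimal sketch, NOT the authors' implementation. Weights are untrained and
# all sizes (filters, kernel width, 14 classes) are illustrative assumptions.

def fuse_features(topic_dist, sent_emb):
    """Concatenate an LDA topic distribution with a sentence embedding."""
    return np.concatenate([topic_dist, sent_emb])

def conv1d(x, kernels, bias):
    """Valid 1-D convolution of vector x with F kernels of width k, then ReLU.
    Returns a (len(x) - k + 1, F) feature map."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (T, k)
    return np.maximum(windows @ kernels.T + bias, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def attention_pool(h, query):
    """Score each position with a learned query vector, normalise with
    softmax, and return the attention-weighted sum of positions."""
    alpha = softmax(h @ query)  # (T,) attention weights
    return alpha @ h, alpha     # (F,) context vector, weights

def init_params(n_filters=8, width=3, n_classes=14, seed=0):
    """Random (untrained) parameters with hypothetical sizes."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0.0, 0.1, (n_filters, width)),
        "conv_bias": np.zeros(n_filters),
        "query": rng.normal(0.0, 0.1, n_filters),
        "W_out": rng.normal(0.0, 0.1, (n_classes, n_filters)),
        "b_out": np.zeros(n_classes),
    }

def classify(topic_dist, sent_emb, params):
    """Forward pass: fuse -> CNN -> attention pooling -> softmax classifier."""
    x = fuse_features(topic_dist, sent_emb)
    h = conv1d(x, params["kernels"], params["conv_bias"])
    context, alpha = attention_pool(h, params["query"])
    probs = softmax(params["W_out"] @ context + params["b_out"])
    return probs, alpha

def macro_scores(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    p, r, f = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        p.append(prec)
        r.append(rec)
        f.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, float(np.mean(p)), float(np.mean(r)), float(np.mean(f))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = rng.dirichlet(np.ones(20))  # stand-in for a 20-topic LDA distribution
    emb = rng.normal(size=768)          # stand-in for a BERT-base sentence vector
    probs, alpha = classify(theta, emb, init_params())
    print("predicted class:", int(probs.argmax()))
    print(macro_scores([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3))
```

Convolving over the concatenated vector is only one plausible reading of "extract inter-feature relationships"; the abstract does not specify the kernel layout, the attention formulation, or how Precision, Recall, and F1 are averaged across classes.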

Issue

Vol. 24 No. 01 (2025)

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.
