Research Article

Document retrieval using clustering-based Aquila hash-Q optimization with query expansion based on pseudo relevance feedback

Received 23 Sep 2023, Accepted 09 Apr 2024, Published online: 23 Apr 2024
 

Abstract

A document retrieval system helps users quickly and easily retrieve the documents relevant to their query. In practice, document retrieval is difficult because of high data volumes, unstructured data, and varied data formats. Although many techniques have been introduced, major problems such as vocabulary mismatch and non-linear matching remain unsolved. In this work, we propose the Aquila hash-q optimizer as a matching technique, combined with clustering, to retrieve documents for a user query in a time-efficient manner and without collisions. First, preprocessing eliminates stop words from the documents, applies stemming, and groups the documents in each cluster into a single document using Hierarchical Density-based Sampling Spatial Cluster of Applications with Noise (HDBSSCAN) clustering. This clustering algorithm is powerful, robust to noise, and scalable, and it identifies clusters of related documents. Additionally, the sampling technique used in this clustering algorithm increases clustering speed by reducing the size of the document, which improves the performance of document retrieval systems. Second, queries are searched using the Aquila hash-q optimizer matching technique, which retrieves the relevant documents. Aquila hash-q optimization works by precomputing a hash table of the terms in a document collection and then using this hash table to quickly identify the documents relevant to a given query. This can significantly improve retrieval speed, especially for large document collections. Aquila hash-q optimization can improve the accuracy, efficiency, and scalability of document retrieval systems.
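The idea of precomputing a hash table of terms and then looking up query terms in it can be illustrated with a plain inverted index. The sketch below is not the Aquila hash-q optimizer, whose details are not given in the abstract; it only shows generic stop-word removal followed by hash-table lookup. The stop-word list, the function names (`tokenize`, `build_index`, `search`), and the toy documents are all assumptions made for illustration, and stemming is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical, tiny stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def build_index(docs):
    """Precompute a hash table mapping each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by the number of distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        # .get avoids creating empty entries for unseen terms.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy collection (assumed for illustration only).
docs = {
    "d1": "Clustering of documents for retrieval",
    "d2": "Hash table lookup in retrieval systems",
    "d3": "Noise in spatial clustering",
}
index = build_index(docs)
print(search(index, "hash retrieval"))
```

Because the index is built once, each query costs roughly one hash lookup per query term plus the size of the matching posting sets, rather than a scan of every document.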
The effectiveness of the Hierarchical Density-Based Clustering Aquila Optimization approach is evaluated on the NPL, LISA, and CACM datasets in terms of precision@5 (0.497), precision@10 (0.425), and Mean Average Precision (MAP) (0.462), and by comparing our approach with various methods. As a result, the Aquila hash-q optimizer is proposed as a matching technique that retrieves documents for the user query in a time-efficient manner and without collisions.
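For readers unfamiliar with the reported metrics, the following minimal sketch shows how precision@k and MAP are conventionally computed from a ranked result list and a set of relevant documents. It uses the standard textbook definitions and is not taken from the paper; the function names and the toy rankings are assumptions, and the numbers it produces are unrelated to the results reported above.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even if fewer than k documents were returned,
    which is the usual convention.
    """
    top_k = ranked[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy rankings (assumed for illustration only).
ranked_q1 = ["d3", "d1", "d7", "d2", "d9"]
relevant_q1 = {"d1", "d2"}
ranked_q2 = ["a", "b"]
relevant_q2 = {"a"}

print(precision_at_k(ranked_q1, relevant_q1, 5))
print(average_precision(ranked_q1, relevant_q1))
print(mean_average_precision([(ranked_q1, relevant_q1), (ranked_q2, relevant_q2)]))
```

Precision@k rewards relevant documents anywhere in the top k, whereas average precision also rewards placing them earlier in the ranking, which is why the two are usually reported together.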

Data availability statement

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Bhushan Inje, Kapil Nagwanshi and Radha Krishna Rambola. The first draft of the manuscript was written by Bhushan Inje and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Conceptualization: Bhushan Inje; Methodology: Bhushan Inje, Kapil Nagwanshi; Formal analysis and investigation: Bhushan Inje, Radha Krishna Rambola; Writing – original draft preparation: Radha Krishna Rambola; Writing – review and editing: Bhushan Inje, Kapil Nagwanshi; Supervision: Radha Krishna Rambola.

Disclosure statement

We, the authors of the paper “Document Retrieval Using Clustering-based Aquila Hash-Q Optimization with Query Expansion based on Pseudo Relevance Feedback”, hereby declare that we have no conflict of interest of any kind in this work.

Ethical approval

This article does not contain any studies with human participants and/or animals performed by any of the authors.

Additional information

Notes on contributors

Bhushan Inje

Bhushan Inje is a research scholar at Amity University Jaipur and currently works as an assistant professor in the CSE Dept, NMIMS University, Shirpur. He received his B.E. from Pune University in 2007 and his M.E. from K.B.C. N.M.U. Jalgaon in 2015. His current research interests are in the areas of Artificial Intelligence, Machine Learning, Soft Computing, Nature Inspired Optimisation Algorithms, Information Retrieval, and Natural Language Processing.

Kapil Nagwanshi

Kapil Nagwanshi received his PhD from the Chhattisgarh Swami Vivekanand Technical University, Bhilai, India. He is currently working as an Associate Professor at SoS E&T, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, India. His primary domains of teaching and research include the internet of things, digital image processing, cyber forensics, data science and engineering, AI, and computer networking. He has guided 15 MTech scholars and is currently supervising six PhD scholars. He is a senior member of IEEE and YHAI, a life member of CSI and IETE, and a member of IAENG, IACSIT, and some other professional bodies. He is a reviewer for reputed journals such as IEEE Access, Imaging Science Journal, Journal of Real-Time Image Processing, and International Journal of Computer and Electrical Engineering.

Radhakrishna Rambola

Radhakrishna Rambola is an Associate Professor in the CSE Dept, NMIMS University Mumbai, Shirpur. He previously worked as an Associate Professor at Galgotias University, Noida (2014–2017) and as an Assistant Professor at Asia Pacific Institute of Information Technology, Panipat (2009–2014). He earned his B.E. (Computer Sci. and Engg) from the University of Madras, his M.Tech. (Computer Sci. and Engg) from Allahabad Agricultural Institute, Allahabad, and his PhD from T. M. B. University, Bhagalpur. He has also worked as a Software Engineer at Aspyre Systems, Chennai (1998–2000); Corporate Teaching Manager at Software Solution Integrated Ltd, Chennai, Hyderabad, Kanpur and New Delhi (2000–2003); Sr. Lecturer at Fr. Agnel Institute of Tech, New Delhi (2003–2004 and 2006–2007); and Project Manager at Eastern Software System Ltd, Delhi, Mumbai. He also worked in Congo in Central Africa (2007–2009).
