INTRODUCTION
Text retrieval is at the core of Information Retrieval. Many natural language applications involve automatically detecting semantic similarity between texts. In information retrieval experiments on unstructured documents and queries, documents are ranked by their similarity to a given user query, and the most similar documents are retrieved first. The domain considered here is the University, in particular information related to the Administration, Academics, Examination, Finance, and Planning & Development branches of universities. The collection includes a query set and a corresponding document set.
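The text above says documents are retrieved in order of similarity ranking but does not name a similarity measure. The sketch below assumes one common choice: TF-IDF term weighting with cosine similarity. It is an illustration only, not necessarily the method used with UDDSE and SQUSE. The function names (`tokenize`, `rank_documents`) and the three sample documents are hypothetical. The smoothed IDF follows the formula scikit-learn's `TfidfVectorizer` uses by default.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms present in every document still get weight 1.0.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_documents(query: str, documents: list[str]) -> list[tuple[int, float]]:
    doc_vecs, idf = tfidf_vectors([tokenize(d) for d in documents])
    # Query terms that never occur in the collection have no IDF and are dropped.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(doc_vecs)]
    # Most similar document first; ties broken by document index.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


# Hypothetical example collection, not taken from UDDSE.
documents = [
    "Notification regarding the revaluation of answer scripts for examinations",
    "Order on fee concession for students in financial need",
    "Circular on leave rules for teaching and non-teaching staff",
]
ranking = rank_documents("how to apply for revaluation of exam answer scripts", documents)
print(ranking[0][0])  # → 0
```

Exact term matching is the main limitation here. "exam" does not match "examinations", so a measure capturing semantic similarity, as the introduction describes, would need stemming, embeddings, or a similar technique on top of this baseline.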
DATASET DESCRIPTION
University Document Data Set in English (UDDSE)
This dataset consists of orders and notices on academic affairs issued by universities and other higher-education authorities across India for the welfare of students and staff. These documents are published on university and government websites, generally in Portable Document Format (PDF).
Sample document:
Short Queries for University Services in English (SQUSE)
This dataset consists of natural language queries from various stakeholders in University affairs. Students and employees often have doubts about the procedures followed in a University, which they express as natural language questions. The query set was built from commonly asked general questions collected with the support of the University enquiry section, and the staff and authorities of the enquiry branch helped frame these short queries about University services. Further queries were framed from information collected directly from students on the University campus, and some were generated directly from the available documents.
Sample queries taken from the SQUSE data set
People
- RESHMA P K, University of Calicut, Kerala, India.
- JYOTHY V A, University of Calicut, Kerala, India.
- Dr. LAJISH V L, University of Calicut, Kerala, India.