Document and Query dataset in University Domain

INTRODUCTION

Text retrieval is the core of Information Retrieval. Most natural language applications deal with the automatic detection of semantic text similarity between documents. This dataset supports information retrieval experiments on unstructured documents and queries, in which the documents most similar to a given user query are retrieved in order of similarity ranking. The domain considered here is the University domain and, in particular, information related to the Administration, Academics, Examination, Finance, and Planning & Development branches of Universities. The dataset includes a Query set and a corresponding Document set.
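As an illustration of retrieval in order of similarity ranking, the following is a minimal sketch that scores documents against a query with TF-IDF weights and cosine similarity. This is a lexical stand-in chosen for brevity, not a semantic similarity model and not necessarily the method used with this dataset. The function names (tokenize, cosine, rank_documents) are hypothetical, and the three document strings are invented placeholders, not texts from the dataset; only the query is taken from the sample queries.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document index) pairs, most similar first."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in doc_tokens:
        doc_freq.update(set(tokens))

    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def vectorize(tokens):
        # Terms that never occur in the documents are ignored.
        counts = Counter(tokens)
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    query_vec = vectorize(tokenize(query))
    scored = [
        (cosine(query_vec, vectorize(tokens)), index)
        for index, tokens in enumerate(doc_tokens)
    ]
    # Highest score first; ties are broken by document index.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Invented placeholder documents (assumption: not taken from the dataset).
documents = [
    "notification on the fee for confidential mark lists",
    "order on hostel admission for research scholars",
    "circular on revised examination fee",
]
query = "What is the fee for confidential mark lists"

ranking = rank_documents(query, documents)
for score, index in ranking:
    print(f"{score:.3f}  {documents[index]}")
```

A real experiment would replace the whitespace tokenizer and the TF-IDF weighting with whatever preprocessing and similarity model is under study; the ranking step itself stays the same.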

DATASET DESCRIPTION

University Document Data Set in English (UDDSE)

This dataset consists of documents, namely orders on academic affairs released by Universities and other higher-education officials across India for the wellbeing of students and staff. These documents, which are orders and notices, are available on University and Government websites, generally in Portable Document Format (PDF).

Dataset: University Document Data Set in English (UDDSE)

Category                    Total Number of Documents in the Category
Academics                   550
Administration              660
Examination                 690
Finance                     540
Planning & Development      360

Sample document: [image not available]

Short Queries for University Services in English (SQUSE)

This dataset consists of natural language queries from various stakeholders in University affairs. Students and employees often have doubts and queries about the processes followed in a University, which they express in natural language. The query dataset was created by including commonly asked general questions collected with the support of the University enquiry section. The short queries related to University services were framed with the assistance of the staff and authorities working in the University enquiry branch. Queries were also framed based on information collected directly from students on the University campus, and some were generated directly from the available documents.

Dataset: Short Queries for University Services in English (SQUSE)

Category                    Total Number of Queries in the Category
Academics                   1500
Administration              1800
Examination                 1800
Finance                     1500
Planning & Development      900
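For experiments that need the per-category sizes, the counts listed for the two datasets can be kept in a small lookup structure. The following is a minimal sketch; the names UDDSE_COUNTS, SQUSE_COUNTS, and summarize are hypothetical, and the queries-per-document ratio is only a descriptive figure derived from the counts, not a statistic reported with the dataset.

```python
# Per-category sizes as listed for the two datasets.
UDDSE_COUNTS = {
    "Academics": 550,
    "Administration": 660,
    "Examination": 690,
    "Finance": 540,
    "Planning & Development": 360,
}

SQUSE_COUNTS = {
    "Academics": 1500,
    "Administration": 1800,
    "Examination": 1800,
    "Finance": 1500,
    "Planning & Development": 900,
}


def summarize(doc_counts, query_counts):
    """Return one row per category: (category, documents, queries, ratio)."""
    rows = []
    for category, n_docs in doc_counts.items():
        n_queries = query_counts[category]
        rows.append((category, n_docs, n_queries, n_queries / n_docs))
    return rows


summary = summarize(UDDSE_COUNTS, SQUSE_COUNTS)
for category, n_docs, n_queries, ratio in summary:
    print(f"{category}: {n_docs} documents, {n_queries} queries, "
          f"{ratio:.2f} queries per document")

total_documents = sum(UDDSE_COUNTS.values())
total_queries = sum(SQUSE_COUNTS.values())
print(f"Total: {total_documents} documents, {total_queries} queries")
```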

 

Sample queries taken from the SQUSE data set

Sl.No.  Natural language queries
1       How to apply for an open degree
2       How to apply for TC from the distance education office
3       Can a student under regular mode continue through distance education during the course
4       What is the fee for confidential mark lists
6       What is the difference between private registration and distance education

People

  1. RESHMA P K, University of Calicut, Kerala, India.
  2. JYOTHY V A, University of Calicut, Kerala, India.
  3. Dr. LAJISH V L, University of Calicut, Kerala, India.