Details
This project began as part of the course CS657: Information Retrieval, taught in the Spring ’18 term by Prof. Arnab Bhattacharya, Department of Computer Science and Engineering, IIT Kanpur. It was later continued beyond the course as my second undergraduate research project.
Abstract
In this project, we introduce a novel pipeline for query-biased multi-document abstractive summarisation. Traditional information retrieval systems return a ranked list of whole documents as the answer to a query. However, in many cases, only parts of a document are relevant to the query. It is therefore desirable to produce, from the set of retrieved documents, a succinct summary containing only the required information, drawn from across all the relevant documents. The proposed approach first retrieves the relevant documents for each query and then extracts the relevant passages from each document. These passages are collated, and redundancy removal is performed to keep the length of the collated document manageable. Finally, an abstractive summarisation approach is used to generate single-line summaries for the information need.
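The stages described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the project: the term-overlap scoring, the Jaccard-based redundancy check, the thresholds, and every function name (`retrieve`, `extract_passages`, `remove_redundancy`, `query_biased_summary`) are hypothetical stand-ins chosen for brevity. The abstractive model is not specified here, so it is passed in as a `summarise` callable.

```python
import re
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a deliberately crude tokenizer (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two passages."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Stage 1: rank documents by query-term overlap and keep the top k."""
    scored = [(overlap(query, d), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked[:k] if score > 0]


def extract_passages(query: str, doc: str, threshold: float = 0.5) -> List[str]:
    """Stage 2: keep sentences that cover enough of the query terms."""
    parts = re.split(r"(?<=[.!?])\s+", doc)
    passages = [p.strip() for p in parts if p.strip()]
    return [p for p in passages if overlap(query, p) >= threshold]


def remove_redundancy(passages: List[str], max_sim: float = 0.6) -> List[str]:
    """Stage 3: drop a passage if it is too similar to one already kept."""
    kept: List[str] = []
    for p in passages:
        if all(jaccard(p, q) < max_sim for q in kept):
            kept.append(p)
    return kept


def query_biased_summary(
    query: str, docs: List[str], summarise: Callable[[str], str]
) -> str:
    """Run all stages; `summarise` is a placeholder for the abstractive model."""
    collated: List[str] = []
    for doc in retrieve(query, docs):
        collated.extend(extract_passages(query, doc))
    return summarise(" ".join(remove_redundancy(collated)))
```

Keeping the abstractive step behind a plain callable means the retrieval, extraction, and redundancy-removal stages can be tested on their own, for instance by passing `lambda text: text` as `summarise` to inspect the collated document before any model is attached.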