Question answering

Is a

Technology

Industry

Industry attributes

Parent Industry

Natural language processing (NLP)

Technology attributes

Related Industries

Computer science

Computational linguistics

Other attributes

Short Name

Wikidata ID

Q1074173

Overview

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP). QA systems enable users to retrieve exact answers for questions posed in natural language, using either a pre-structured database or a collection of natural language documents.

QA systems can be considered an advanced form of information retrieval that makes it possible to retrieve answers using natural language queries. With an increasing demand for systems that deliver short, precise, question-specific answers, QA is a growing area of research worldwide.

Question answering system architecture

Architecture

QA system architecture is typically broken down into three modules:

Question processing
Document processing
Answer processing

Question processing

The question processing module receives the user's natural language question and performs three steps: analysis (obtaining preliminary information), classification, and reformulation.

Question classification determines the type of question in order to identify the kind of answer expected. There are two main approaches to question classification: manual and automatic.

Manual classification applies handwritten rules to identify the expected answer type. While these rules can be accurate, they are time-consuming to write and do not extend easily to new question types. Some manual approaches improve answer detection by breaking questions down by type:

What questions
Why questions
Who questions
How questions
Where questions

In contrast, automatic classification can be extended to new question types with acceptable accuracy.
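The manual approach described above can be illustrated with a minimal sketch. The rule table and the answer-type labels (PERSON, LOCATION, and so on) are hypothetical and chosen only for illustration; they are not a standard taxonomy or the rules of any particular QA system.

```python
import re

# Hypothetical hand-written rules: leading question word -> expected answer type.
# The labels on the right are illustrative assumptions, not a standard taxonomy.
RULES = [
    (re.compile(r"^\s*who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*why\b", re.IGNORECASE), "REASON"),
    (re.compile(r"^\s*how\b", re.IGNORECASE), "MANNER"),
    (re.compile(r"^\s*what\b", re.IGNORECASE), "DEFINITION_OR_ENTITY"),
]


def classify_question(question: str) -> str:
    """Return the expected answer type for a question, or "UNKNOWN"."""
    for pattern, answer_type in RULES:
        if pattern.search(question):
            return answer_type
    # No rule fired: this is the extensibility limit of manual rules.
    return "UNKNOWN"
```

For example, classify_question("Who wrote Hamlet?") returns "PERSON", while a question starting with "When" returns "UNKNOWN" until someone writes a new rule for it.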

Reformulation converts the question into a vector representation using a model pre-trained on examples of question and answer pairs. The main types of answer provided by QA systems include the following:

Factoid: a simple fact
List: a set of entities that satisfy the criteria defined in the question
Definition: a summary or short passage explaining the meaning of the subject or object of the question
Complex question: an answer that draws on the context of the question and usually merges several retrieved passages using a range of techniques

Document processing

Document processing takes the reformulated question as its input and uses an internal information retrieval (IR) system to find the documents that most closely match it. A set of paragraphs, chosen according to the focus of the question, is extracted and sorted by similarity and relevance to the question.

The document processing module includes three main tasks:

Retrieve a set of relevant documents from the IR system
Filter the documents and reduce them to a concise set of paragraphs
Order and rank the documents by similarity and relevance to the question
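The three tasks above can be sketched as follows. This is a minimal illustration and not the method of any specific system: it assumes the documents have already been retrieved and are passed in as strings, that paragraphs are separated by blank lines, and that similarity is measured with bag-of-words cosine similarity. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_paragraphs(question: str, documents: list[str], top_k: int = 3):
    """Return up to top_k (score, paragraph) pairs, best match first."""
    q_vec = Counter(tokenize(question))
    # Task 1 (retrieval) is assumed done: `documents` comes from the IR system.
    # Task 2: reduce the documents to paragraphs (assumed blank-line separated).
    paragraphs = [
        p.strip() for doc in documents for p in doc.split("\n\n") if p.strip()
    ]
    scored = [(cosine_similarity(q_vec, Counter(tokenize(p))), p) for p in paragraphs]
    # Filter out paragraphs that share no term with the question.
    scored = [(score, p) for score, p in scored if score > 0.0]
    # Task 3: order by similarity to the question.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Real systems typically replace the raw term counts with weighted or learned representations; the overall retrieve, filter, and rank shape stays the same.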

Answer processing

The answer processing module applies extraction techniques to the output of the document processing module to present an answer to the question. Although the answer it returns is simple, producing it may require merging and summarizing information from different sources, as well as dealing with uncertainty or contradiction.

Answer processing can be broken down into three major tasks:

Identify statements/answers within the concise set of documents.
Extract the relevant output by selecting appropriate phrases and words that answer the question.
Validate the answer obtained in the previous step using evaluation metrics defined during the design of the QA system.
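A minimal sketch of these three tasks is shown below, under several assumptions: candidate answers are found with regular expressions keyed by a hypothetical answer type, the best candidate is the one whose sentence shares the most terms with the question, and validation is a simple overlap threshold standing in for whatever metrics a real system defines at design time.

```python
import re

# Hypothetical patterns mapping an expected answer type to a surface form.
ANSWER_PATTERNS = {
    "YEAR": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    "NUMBER": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


def extract_answer(question_terms: set[str], answer_type: str,
                   paragraphs: list[str], min_overlap: int = 1):
    """Return the best candidate answer string, or None if none validates."""
    pattern = ANSWER_PATTERNS[answer_type]
    best = None  # (overlap, candidate, sentence)
    for paragraph in paragraphs:
        # 1. Identify: split into sentences and keep those holding a candidate.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            match = pattern.search(sentence)
            if match is None:
                continue
            # 2. Extract: score the candidate by term overlap with the question.
            words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
            overlap = len(question_terms & words)
            if best is None or overlap > best[0]:
                best = (overlap, match.group(0), sentence)
    # 3. Validate: reject answers with too little supporting evidence.
    if best is None or best[0] < min_overlap:
        return None
    return best[1]
```

Returning None when validation fails lets the caller distinguish "no trustworthy answer" from a wrong answer, which matters when the retrieved passages are uncertain or contradictory.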

Types of question answering systems

Web-based

Web-based question answering systems use search engines to retrieve webpages that may contain answers to the question, then filter and rank the retrieved passages. Data available on the web is semi-structured, heterogeneous, and distributed.

Natural language processing (NLP)

NLP QA systems use linguistic intuitions and machine learning methods to extract answers from retrieved passages.

Knowledge-based

This type of system finds answers in structured data sources (knowledge bases) instead of unstructured text, and uses standard database queries in place of word-based searches. The structured data may take the form of an ontology, which is a formal representation of the concepts in a specific domain and the relationships between them.
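As a rough illustration of querying structured data rather than searching text, the sketch below stores a toy knowledge base as (subject, predicate, object) triples. The triples, the predicate names, and the helper functions are all hypothetical; production systems would typically use a graph database or a dedicated query language instead.

```python
# A toy knowledge base stored as (subject, predicate, object) triples.
# All facts and predicate names here are illustrative assumptions.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "located_in", "Europe"),
]


def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given fields; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def capital_of(country: str):
    """Answer "What is the capital of X?" with a structured lookup."""
    matches = query(predicate="capital_of", obj=country)
    return matches[0][0] if matches else None
```

Here capital_of("France") returns "Paris" by matching on the relationship itself, so the answer does not depend on how any document happens to phrase the fact.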

Hybrid

High-performance QA systems use multiple types of resources. A hybrid approach uses a combination of web-based, NLP, and knowledge-based QA.

Techniques

A range of techniques, algorithms, frameworks, and tools are utilized in QA systems:

Deep neural network
Graph-based
Lemmatization
Latent Semantic Analysis (LSA)
Multi-document summarization
Naive Bayes
Named entity recognition
Parser
Part-of-speech (POS) Tagging
Relation finding (Similarity Distance)
Shallow syntactical
Stemming
Support vector machine
Text chunking
Tokenization

Datasets

Training a QA system requires large datasets. There are many publicly available text and graph-based datasets that have been generated through crowd-sourcing or manual annotation.

NLP Question Answering Datasets

Four possible outcomes from a QA system.

Evaluation metrics

There are many methods for evaluating the performance of QA systems. Most metrics compare the answer the system predicts with the actual (reference) answer. The four possible outcomes are summarized in a 2 × 2 contingency table:

True positive—fragment correctly selected
False negative—fragment incorrectly not selected
False positive—fragment incorrectly selected
True negative—fragment correctly not selected

Basic evaluation metrics (precision, recall, and F1) can be calculated from the counts of these outcomes.
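The standard definitions can be written as a short function. Precision is the share of selected fragments that are correct, recall is the share of correct fragments that were selected, and F1 is their harmonic mean. Returning 0.0 when a denominator is zero is a convention assumed here, not something the definitions require.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from contingency-table counts.

    tp: true positives, fp: false positives, fn: false negatives.
    True negatives are not needed for these three metrics.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, with 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 8 / 10 = 0.8, so F1 is also 0.8 (up to floating-point rounding).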

Applications

With the amount of information available online, there has been a rise in the use of automated answering systems that can accurately extract information. These systems have a range of applications:

Customer support
Education
Search engines
Data analytics

Further Resources

Title: A literature review on question answering techniques, paradigms and systems
Author: Marco Antonio Calijorne Soares, Fernando Silva Parreiras
Link: https://www.sciencedirect.com/science/article/pii/S131915781830082X
Type: Web
Date: July 2020

Title: Question Answering Systems: Survey and Trends
Author: Abdelghani Bouziane, Djelloul Bouchiha, Noureddine Doumi, Mimoun Malki
Link: https://www.sciencedirect.com/science/article/pii/S1877050915034663
Type: Web
Date: 2015