Nuno Guimarães, Ricardo Campos, Alípio Jorge

Pre-trained language models: What do they know?

Diagram of pre-trained language models' common-sense capabilities and possible domains of application.

Abstract

Large language models (LLMs) have substantially advanced artificial intelligence (AI) research and applications in the last few years. They now achieve high effectiveness in a range of natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, and text summarization. Recently, the capabilities and highly accessible interface of OpenAI's GPT models have drawn significant attention. LLMs are routinely used and studied for downstream tasks and specific applications with great success, pushing forward the state of the art in almost all of them. However, they also exhibit impressive inference capabilities when used off the shelf, without further training. In this paper, we study the behavior of pre-trained language models (PLMs) in inference tasks they were not initially trained for. To that end, we focus on very recent research on the inference capabilities of PLMs in selected tasks such as factual probing and common-sense reasoning. We highlight relevant achievements made by these models, as well as some of their current limitations, which open opportunities for further research.

This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining Technologies > Artificial Intelligence
