General Language Understanding Evaluation (GLUE)
Name: General Language Understanding Evaluation
Acronym: GLUE
Year of creation: 2018
Homepage URL: https://gluebenchmark.com/
A well-known benchmark that measures the quality of language models across a variety of Natural Language Understanding tasks.
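All of the tasks below can be loaded, for example, through the Hugging Face `datasets` library; a minimal sketch, assuming `datasets` is installed:

```python
from datasets import load_dataset

# Each GLUE task is a configuration of the "glue" dataset; the other
# tasks below follow the same pattern with their own config name
# ("sst2", "qqp", "stsb", "mrpc", "mnli").
cola = load_dataset("glue", "cola")

# Every task ships with train/validation/test splits.
print(cola["train"][0])  # a single labeled example
```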
Included Datasets
Corpus of Linguistic Acceptability (CoLA)
Measures the grammatical acceptability of sentences.
Input: "They caused him to become angry by making him." Target: 0 (not acceptable)
Stanford Sentiment Treebank (SST-2)
Measures the sentiment of a sentence (binary: positive vs. negative).
Input: "that loves its characters and communicates something rather beautiful about human nature" Target: 1 (positive)
Quora Question Pairs (QQP)
Determines whether two questions are semantically equivalent (duplicates).
Inputs: 1. "What is the best self help book you have read? Why? How did it change your life?" 2. "What are the top self help books I should read?" Target: 1 (duplicate)
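For QQP, GLUE reports both accuracy and F1, since the duplicate and non-duplicate classes are imbalanced. A minimal sketch with scikit-learn, using hypothetical predictions:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and predictions (1 = duplicate, 0 = not).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
print(f1_score(y_true, y_pred))        # harmonic mean of precision/recall
```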
Semantic Textual Similarity Benchmark (STS-B)
Measures the semantic similarity of two sentences on a scale from 0 to 5. The evaluation metric is correlation (Pearson and Spearman).
Example 1: Input: 1. "A plane is taking off." 2. "An air plane is taking off." Target: 5
Example 2: Input: 1. "A man is slicing a bun." 2. "A man is slicing an onion." Target: 2.4
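Since STS-B is a regression task, GLUE evaluates it by correlating predicted scores with gold scores; a minimal sketch with SciPy, using hypothetical values:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical gold similarity scores and model predictions on the 0-5 scale.
gold = [5.0, 2.4, 3.8, 0.5]
pred = [4.6, 2.0, 4.1, 1.0]

r_pearson, _ = pearsonr(gold, pred)    # linear correlation
r_spearman, _ = spearmanr(gold, pred)  # rank correlation
print(r_pearson, r_spearman)
```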
Microsoft Research Paraphrase Corpus (MRPC)
Judges whether one sentence is a paraphrase of the other.
Input: 1. "Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier." 2. "With the scandal hanging over Stewart's company, revenue the first quarter of the year dropped 15 percent from the same period a year earlier." Target: 1 (paraphrase)
Multi-Genre Natural Language Inference (MNLI)
Given a premise and a hypothesis, classifies their relation as entailment, neutral, or contradiction.
... to be continued -> https://medium.com/@priyankads/evaluate-language-understanding-of-ai-models-66dd56269a45
Links