General Language Understanding Evaluation (GLUE)

From Wiki AI.
Version of 24 Mar 2024 at 17:10 by Alesaccoia (talk | contribs).

A well-known benchmark that measures the quality of language models across a range of Natural Language Understanding tasks.

Included Datasets

Corpus of Linguistic Acceptability (CoLA)

Measures the grammatical acceptability of sentences (1 = acceptable, 0 = unacceptable).

Input: "They caused him to become angry by making him."
Target: 0
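GLUE scores CoLA with the Matthews correlation coefficient (MCC), which is robust to the class imbalance of acceptability judgments. A minimal self-contained sketch (the function name is our own, not from a library):

```python
def matthews_corrcoef(y_true, y_pred):
    # Confusion-matrix counts for binary labels (1 = acceptable, 0 = unacceptable).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    num = tp * tn - fp * fn
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    # By convention MCC is 0 when any marginal count is empty.
    return num / den if den else 0.0
```

MCC ranges from -1 (total disagreement) through 0 (chance) to 1 (perfect prediction).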

Stanford Sentiment Treebank (SST-2)

Measures the sentiment of a sentence (1 = positive, 0 = negative).

Input: "that loves its characters and communicates something rather beautiful about human nature "
Target: 1

Quora Question Pairs (QQP)

Measures whether two questions are semantically equivalent (1 = duplicate, 0 = not duplicate).

Inputs:
1. "What is the best self help book you have read? Why? 
    How did it change your life?"
2. "What are the top self help books I should read?"
Target: 1
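For QQP, GLUE reports both F1 and accuracy, since duplicate pairs are a minority class. A minimal F1 sketch for binary labels (the function name is our own):

```python
def f1_score(y_true, y_pred):
    # Counts for the positive class (1 = duplicate pair).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```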

Semantic Textual Similarity Benchmark (STS-B)

Measures semantic similarity on a scale from 0 to 5. The metric used for evaluation is correlation.

Example 1:
Input:
1. "A plane is taking off."
2. "An air plane is taking off."
Target: 5
Example 2:
Input:
1. "A man is slicing a bun."
2. "A man is slicing an onion."
Target: 2.4
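The correlation used to evaluate STS-B can be computed as Pearson's r between predicted and gold similarity scores. A minimal sketch with no external libraries (the function name is our own):

```python
def pearson_r(xs, ys):
    # Pearson correlation: covariance divided by the product of standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A model whose predictions are a perfect linear function of the gold scores gets r = 1 even if the raw values differ, which is why correlation (rather than accuracy) suits this graded task.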

Microsoft Research Paraphrase Corpus (MRPC)

Judges whether one sentence is a paraphrase of the other (1 = paraphrase, 0 = not a paraphrase).

Input:
1. "Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier ."
2. "With the scandal hanging over Stewart 's company , revenue the first quarter of the year dropped 15 percent from the same period a year earlier ."
Target: 1

Multi-Genre Natural Language Inference (MNLI)

Measures natural language inference: given a premise and a hypothesis, the model classifies the pair as entailment, neutral, or contradiction.

... to be continued -> https://medium.com/@priyankads/evaluate-language-understanding-of-ai-models-66dd56269a45

Links

GLUE homepage