General Language Understanding Evaluation (GLUE)
Name: General Language Understanding Evaluation
Acronym: GLUE
Year of creation: 2018
Homepage URL: https://gluebenchmark.com/
A well-known benchmark that measures the quality of language models across a variety of Natural Language Understanding tasks.
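All of the tasks below can be loaded, for example, through the Hugging Face `datasets` library; a minimal sketch, assuming `datasets` is installed:

```python
from datasets import load_dataset

# Each GLUE task is a configuration of the "glue" dataset; the other
# tasks below follow the same pattern with their own config name
# ("sst2", "qqp", "stsb", "mrpc", "mnli").
cola = load_dataset("glue", "cola")

# Every task ships with train/validation/test splits.
print(cola["train"][0])  # a single labeled example
```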
Included Datasets
Corpus of Linguistic Acceptability (CoLA)
Measures the grammatical acceptability of sentences.
Input: "They caused him to become angry by making him." Target: 0 (not acceptable)
Stanford Sentiment Treebank (SST-2)
Measures the sentiment of a sentence (binary: positive vs. negative).
Input: "that loves its characters and communicates something rather beautiful about human nature" Target: 1 (positive)
Quora Question Pairs (QQP)
Determines whether two questions are semantically equivalent (duplicates).
Inputs: 1. "What is the best self help book you have read? Why? How did it change your life?" 2. "What are the top self help books I should read?" Target: 1 (duplicate)
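For QQP, GLUE reports both accuracy and F1, since the duplicate and non-duplicate classes are imbalanced. A minimal sketch with scikit-learn, using hypothetical predictions:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and predictions (1 = duplicate, 0 = not).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
print(f1_score(y_true, y_pred))        # harmonic mean of precision/recall
```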
Semantic Textual Similarity Benchmark (STS-B)
Measures the semantic similarity of two sentences on a scale from 0 to 5. The evaluation metric is correlation (Pearson and Spearman).
Example 1: Input: 1. "A plane is taking off." 2. "An air plane is taking off." Target: 5
Example 2: Input: 1. "A man is slicing a bun." 2. "A man is slicing an onion." Target: 2.4
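Since STS-B is a regression task, GLUE evaluates it by correlating predicted scores with gold scores; a minimal sketch with SciPy, using hypothetical values:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical gold similarity scores and model predictions on the 0-5 scale.
gold = [5.0, 2.4, 3.8, 0.5]
pred = [4.6, 2.0, 4.1, 1.0]

r_pearson, _ = pearsonr(gold, pred)    # linear correlation
r_spearman, _ = spearmanr(gold, pred)  # rank correlation
print(r_pearson, r_spearman)
```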
Microsoft Research Paraphrase Corpus (MRPC)
Judges whether one sentence is a paraphrase of the other.
Input: 1. "Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier." 2. "With the scandal hanging over Stewart's company, revenue the first quarter of the year dropped 15 percent from the same period a year earlier." Target: 1 (paraphrase)
Multi-Genre Natural Language Inference (MNLI)
Given a premise and a hypothesis, classifies their relation as entailment, neutral, or contradiction.
... to be continued -> https://medium.com/@priyankads/evaluate-language-understanding-of-ai-models-66dd56269a45
Links