General Language Understanding Evaluation (GLUE)
| General Language Understanding Evaluation (GLUE) | |
| --- | --- |
| Name | General Language Understanding Evaluation |
| Acronym | GLUE |
| Year created | 2018 |
| Homepage URL | https://gluebenchmark.com |
| Publication | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Wang et al., 2018) |
A well-known benchmark that measures the quality of language models on a variety of Natural Language Understanding tasks.
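As a minimal sketch, each GLUE task can be loaded individually, assuming the Hugging Face `datasets` library (an assumption; the official splits can also be downloaded from gluebenchmark.com):

```python
from datasets import load_dataset

# Each GLUE task is exposed as a configuration of the "glue" dataset.
cola = load_dataset("glue", "cola")

print(cola["train"][0])                  # {'sentence': ..., 'label': ..., 'idx': ...}
print(cola["train"].features["label"])   # ClassLabel(names=['unacceptable', 'acceptable'])
```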
Included Datasets
Corpus of Linguistic Acceptability (CoLA)
Measures the grammatical acceptability of sentences (target 0 = unacceptable, 1 = acceptable)
Input: "They caused him to become angry by making him." Target: 0
Stanford Sentiment Treebank (SST-2)
Measures the sentiment of a sentence (target 0 = negative, 1 = positive)
Input: "that loves its characters and communicates something rather beautiful about human nature " Target: 1
Quora Question Pairs (QQP)
Measures whether two questions are duplicates of each other (target 0 = not duplicates, 1 = duplicates)
Inputs: 1. "What is the best self help book you have read? Why? How did it change your life?" 2. "What are the top self help books I should read?" Target: 1
Semantic Textual Similarity Benchmark (STS-B)
Measures semantic similarity on a scale from 0 to 5. The metric used for evaluation is correlation (Pearson and Spearman).
Example 1: Input: 1. "A plane is taking off." 2. "An air plane is taking off." Target: 5
Example 2: Input: 1. "A man is slicing a bun." 2. "A man is slicing an onion." Target: 2.4
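Since STS-B is scored by the correlation between predicted and gold similarity scores, evaluation reduces to a few lines; a minimal sketch with SciPy, using hypothetical predictions:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical gold similarity scores (0-5) and model predictions.
gold = [5.0, 2.4, 4.2, 1.0]
pred = [4.8, 2.1, 4.5, 0.7]

r, _ = pearsonr(gold, pred)      # Pearson correlation
rho, _ = spearmanr(gold, pred)   # Spearman rank correlation
print(r, rho)
```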
Microsoft Research Paraphrase Corpus (MRPC)
Judges whether one sentence is a paraphrase of the other (target 1 = paraphrase)
Input: 1. "Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier ." 2. "With the scandal hanging over Stewart 's company , revenue the first quarter of the year dropped 15 percent from the same period a year earlier ." Target: 1
Multi-Genre Natural Language Inference (MNLI)
Measures natural language inference: given a premise and a hypothesis, the model must predict entailment, neutral, or contradiction.
... to be continued -> https://medium.com/@priyankads/evaluate-language-understanding-of-ai-models-66dd56269a45