
Current revision as of 13:47, 17 August 2024

General Language Understanding Evaluation (GLUE)
Name	General Language Understanding Evaluation
Acronym	GLUE
Year created	2018
Homepage URL	https://gluebenchmark.com
Publication

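A well-known benchmark that measures the quality of language models on a collection of nine Natural Language Understanding (NLU) tasks. The tasks cover single-sentence classification, sentence-pair similarity and paraphrase detection, and natural language inference.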
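GLUE condenses these results into a single leaderboard number. It takes the macro-average of the per-task scores, and a task reported with more than one metric contributes the unweighted mean of its metrics. The Python sketch below illustrates only that aggregation rule. The `glue_score` name is hypothetical, and the scores in the example are placeholder values, not real model results.

```python
from statistics import mean


def glue_score(task_metrics: dict[str, list[float]]) -> float:
    """Macro-average over tasks (hypothetical helper).

    A task with several metrics (e.g. accuracy and F1) first contributes
    the unweighted mean of those metrics; then all tasks are averaged.
    """
    per_task = [mean(values) for values in task_metrics.values()]
    return mean(per_task)


# Placeholder scores on a 0-100 scale, for illustration only.
example = {
    "CoLA": [50.0],        # Matthews correlation x 100
    "SST-2": [90.0],       # accuracy
    "MRPC": [80.0, 86.0],  # accuracy, F1
}
print(glue_score(example))
```

Averaging within a task first keeps tasks with two metrics from counting twice in the overall score.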

Datasets included

Corpus of Linguistic Acceptability (CoLA)

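Measures the grammatical acceptability of sentences. Labels are binary (1 = acceptable, 0 = unacceptable), and the evaluation metric is the Matthews correlation coefficient.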

Input: "They caused him to become angry by making him."
Target: 0
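CoLA is an unbalanced binary task, so GLUE evaluates it with the Matthews correlation coefficient (MCC) rather than plain accuracy. MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from -1 to 1, and chance-level predictions score around 0. The following is a minimal stdlib sketch that assumes labels are the integers 1 (acceptable) and 0 (unacceptable). The function name is our own. Returning 0 when the denominator is zero is a common convention rather than something GLUE specifies.

```python
import math


def matthews_corrcoef(y_true: list[int], y_pred: list[int]) -> float:
    """MCC for binary labels, 1 = acceptable, 0 = unacceptable."""
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): MCC is undefined when a row or column of the
    # confusion matrix is empty; report 0.0 in that case.
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(matthews_corrcoef([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # prints 0.16666666666666666
```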

Stanford Sentiment Treebank (SST-2)

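Measures sentiment by classifying movie-review sentences as positive (1) or negative (0). The evaluation metric is accuracy.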

Input: "that loves its characters and communicates something rather beautiful about human nature "
Target: 1

Quora Question Pairs (QQP)

Determines whether two questions are semantically equivalent (1) or not (0). The evaluation metrics are accuracy and F1.

Inputs:
1. "What is the best self help book you have read? Why? How did it change your life?"
2. "What are the top self help books I should read?"
Target: 1
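The two QQP classes are not balanced, so GLUE reports both accuracy and the F1 score of the positive class (1 = duplicate). Below is a minimal sketch using a hypothetical helper, `accuracy_and_f1`. It assumes integer labels 0/1 and non-empty inputs of equal length. MRPC is evaluated with the same pair of metrics, so the same function applies there.

```python
def accuracy_and_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Accuracy and positive-class F1 (1 = duplicate / paraphrase)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    # If there are no positives at all (predicted or gold), report 0.0.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, f1


print(accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

With a single number per task, GLUE averages the two values, as in the aggregation rule described for the overall score.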

Semantic Textual Similarity Benchmark (STS-B)

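Measures the semantic similarity of two sentences as a real-valued score from 0 to 5. This is a regression task, so evaluation uses the correlation (Pearson and Spearman) between predicted and human-annotated scores.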

Example 1:
Input:
1. "A plane is taking off."
2. "An air plane is taking off."
Target: 5

Example 2:
Input:
1. "A man is slicing a bun."
2. "A man is slicing an onion."
Target: 2.4
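For STS-B, GLUE reports two correlations. Pearson measures linear agreement between predicted and gold scores. Spearman is Pearson computed on ranks, so only the ordering matters. The following is a self-contained sketch with helper names of our own. Tied values receive the average of their ranks. The inputs are assumed to be non-constant, since the correlation is otherwise undefined and the code would divide by zero.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only, not real model output.
gold = [5.0, 2.4, 0.8, 3.6]
pred = [4.6, 2.9, 1.2, 3.1]
print(pearson(pred, gold), spearman(pred, gold))
```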

Microsoft Research Paraphrase Corpus (MRPC)

Judges whether one sentence is a paraphrase of the other (1) or not (0). The evaluation metrics are accuracy and F1.

Input:
1. "Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier ."
2. "With the scandal hanging over Stewart 's company , revenue the first quarter of the year dropped 15 percent from the same period a year earlier ."
Target: 1

Multi-Genre Natural Language Inference (MNLI)

... to be continued. For MNLI and the remaining tasks, see https://medium.com/@priyankads/evaluate-language-understanding-of-ai-models-66dd56269a45

Links

GLUE homepage: https://gluebenchmark.com

{{#seo:
            |title=Your page title
            |title_mode=append
            |keywords=evaluation, language models, Natural Language Understanding, datasets, grammar, sentiment, similarity, paraphrase, inference
            |description=GLUE (General Language Understanding Evaluation) is a well-known benchmark, created in 2018, that measures the quality of language models on a range of Natural Language Understanding tasks. It draws on several datasets, including CoLA for grammatical acceptability, SST-2 for sentiment, QQP for question similarity, STS-B for semantic similarity and MRPC for paraphrase detection. GLUE is widely used in NLP research.
            |image=
            }}