Transformer (Deep Learning Architecture): differences between versions

Current version as of 21:36, 27 Nov 2024

Transformer (Deep Learning Architecture)
English Name	Transformer
Acronym
Year Of Creation	2017
Publication	Attention Is All You Need (2017)
URL	https://arxiv.org/pdf/1706.03762
Topic	Generation, Neural networks, Translation, Natural Language Processing (NLP), Images

Architecture originally proposed in the paper Attention Is All You Need (2017), composed of an Encoder and a Decoder.

"Transformer Encoder" refers to bidirectional (encoder-only) models such as BERT. Because they use Masked-Language-Modeling (MLM) as the pre-training objective, their representation must fuse both the left and the right context of the token to be predicted, so they use bidirectional self-attention.
"Transformer Decoder" refers to models that use "left-to-right" attention, so when generating the next token they have access only to the past tokens.
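
The difference between the two variants comes down to which positions each token is allowed to attend to. The following is a minimal sketch in plain Python, not the implementation from the paper: it assumes a single attention head and uses the input vectors directly as queries, keys and values (real models apply learned projections). The names `attention_mask`, `softmax` and `self_attention` are hypothetical and chosen only for illustration.

```python
import math


def attention_mask(seq_len, causal):
    """mask[i][j] is True when position i may attend to position j."""
    # Bidirectional (encoder-style): every position sees every other position.
    # Causal / left-to-right (decoder-style): position i sees only j <= i.
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]


def softmax(scores):
    """Numerically stable softmax; masked scores (-inf) get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: queries, keys and values are all equal to x
    (identity projections), so only the effect of the mask is visible.
    """
    n, d = len(x), len(x[0])
    mask = attention_mask(n, causal)
    out = []
    for i in range(n):
        scores = [
            sum(q * k for q, k in zip(x[i], x[j])) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j in range(n)
        ]
        weights = softmax(scores)
        out.append([sum(w * x[j][c] for j, w in enumerate(weights))
                    for c in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoder_style = self_attention(tokens, causal=False)  # uses left and right context
decoder_style = self_attention(tokens, causal=True)   # uses past tokens only
```

With the causal mask, the output for the first token depends only on that token, and changing a later token never alters the outputs of earlier positions. With the bidirectional mask, every output can change when any token changes, which is what MLM-style pre-training relies on.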

Links

https://github.com/karpathy/minGPT

https://github.com/karpathy/nanoGPT

An implementation of GPT2 in 175 lines of Python code

https://towardsdatascience.com/transformers-141e32e69591

3B1B - But what is a GPT? Visual intro to transformers

An Introduction to Transformers (Turner)

Coding a GPT with Andrej Karpathy

Introduction to self-attention by John Hewitt

History of language models by Brit Cruise

Paper about examples like “woman - man”

@@ Line 1: / Line 1: @@
+{{template architettura
+|NomeInglese=Transformer
+|AnnoDiCreazione=2017
+|Pubblicazione=Attention Is All You Need (2017)
+|URLHomePage=https://arxiv.org/pdf/1706.03762
+|Topic=Generazione, Reti neurali, Traduzione, Elaborazione del Linguaggio Naturale (NLP), Immagini
+}}
 Architecture originally proposed in the paper [[Attention Is All You Need (2017)]], composed of an Encoder and a Decoder.
 * "Transformer Encoder" refers to bidirectional (encoder-only) models such as [[BERT]]. Because they use [[Masked-Language-Modeling (MLM)]] as the pre-training objective, their representation must fuse both the left and the right context of the token to be predicted, so they use bidirectional self-attention.
 * "Transformer Decoder" refers to models that use "left-to-right" [[Attention Is All You Need (2017)|attention]], so when <u>generating</u> the next token they have access only to the past tokens.
+[[File:Transformers Family Tree.jpg|miniatura|445x445px|Transformers Family Tree]]
 === Links ===
+https://github.com/karpathy/minGPT
+[https://github.com/karpathy/nanoGPT?tab=readme-ov-file https://github.com/karpathy/nanoGPT]
 [https://github.com/lutzroeder/gpt2/blob/main/gpt2.py An implementation of GPT2 in 175 lines of Python code]
+https://towardsdatascience.com/transformers-141e32e69591 [https://www.youtube.com/watch?v=wjZofJX0v4M 3B1B - But what is a GPT? Visual intro to transformers]
+[https://arxiv.org/pdf/2304.10557.pdf An Introduction to Transformers (Turner)]
+[https://www.youtube.com/watch?v=kCc8FmEb1nY&t=0s Coding a GPT with Andrej Karpathy]
+[https://web.stanford.edu/class/cs224n/readings/cs224n-self-attention-transformers-2023_draft.pdf Introduction to self-attention by John Hewitt]
+History of language models by Brit Cruise
+[https://arxiv.org/pdf/1301.3781.pdf Paper about examples like “woman - man”]
 [[Category:Architettura]]
+{{#seo:
+            |title=Transformer
+            |title_mode=append
+            |keywords=neural networks, "language models", "natural language processing", "NLP", "artificial intelligence", "machine learning", "transformer architecture", "encoder", "decoder", "self-attention", "BERT", "GPT"
+            |description=This page describes the Transformer architecture, a neural network model used in natural language processing (NLP). Introduced in 2017, the Transformer architecture relies on a self-attention mechanism to model the relationships between the words in a sentence. The page covers the Encoder and Decoder variants of the architecture, with examples such as BERT and GPT, and provides links to useful resources for further study.
+            |image=Uploaded_file.png
+            }}