MMLU
Intro
The MMLU benchmark (Massive Multitask Language Understanding) was designed to evaluate the knowledge and language understanding of large language models. It consists of multiple-choice questions with four options each, spanning 57 subjects that range from economics to physics. A model is scored on whether it interprets each question correctly and selects the right option, so the benchmark measures comprehension and subject knowledge rather than free-form text generation.
Example questions:
One of the reasons that the government discourages and regulates monopolies is that (A) producer surplus is lost and consumer surplus is gained. (B) monopoly prices ensure productive efficiency but cost society allocative efficiency. (C) monopoly firms do not engage in significant research and development. (D) consumer surplus is lost with higher prices and lower levels of output.
When you drop a ball from rest it accelerates downward at 9.8 m/s^2. If you instead throw it downward, assuming no air resistance, its acceleration immediately after leaving your hand is (A) 9.8 m/s^2 (B) more than 9.8 m/s^2 (C) less than 9.8 m/s^2 (D) Cannot say unless the speed of throw is given.
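To make the question format concrete, here is a minimal Python sketch of how one such item could be represented, turned into a prompt, and scored. The names `Question`, `format_prompt`, and `accuracy` are hypothetical helpers introduced for illustration and are not part of any official MMLU tooling. The gold answer "A" for the physics question is not given in the text above; it is filled in from basic mechanics (once the ball leaves the hand, only gravity acts on it).

```python
from dataclasses import dataclass
from typing import List, Sequence

LETTERS = "ABCD"  # MMLU questions have four options


@dataclass
class Question:
    """Hypothetical container for one multiple-choice item."""
    stem: str
    choices: List[str]  # options in order A-D
    answer: str         # gold letter, one of "A".."D"


def format_prompt(q: Question) -> str:
    """Render the question as text ending with an 'Answer:' cue."""
    lines = [q.stem]
    lines += [f"({letter}) {choice}" for letter, choice in zip(LETTERS, q.choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(questions: Sequence[Question], predictions: Sequence[str]) -> float:
    """Fraction of predicted letters that match the gold letters."""
    if not questions or len(questions) != len(predictions):
        raise ValueError("need one prediction per question, and at least one question")
    correct = sum(
        pred.strip().upper() == q.answer for q, pred in zip(questions, predictions)
    )
    return correct / len(questions)


# The physics example from the text; the gold letter is an assumption (see above).
physics = Question(
    stem=(
        "When you drop a ball from rest it accelerates downward at 9.8 m/s^2. "
        "If you instead throw it downward, assuming no air resistance, its "
        "acceleration immediately after leaving your hand is"
    ),
    choices=[
        "9.8 m/s^2",
        "more than 9.8 m/s^2",
        "less than 9.8 m/s^2",
        "Cannot say unless the speed of throw is given.",
    ],
    answer="A",
)
```

Calling `format_prompt(physics)` yields the text that would be sent to a model, and `accuracy` compares the letters the model returns against the gold letters. Real evaluation setups differ in details such as few-shot examples and how the model's letter is extracted, so treat this as an illustration of the scoring idea only.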