Are Open-Source Large Language Models Catching up?



LLM Capabilities

General (§3.1)
  Datasets: AlpacaEval, MT-bench, ELO rating, Open-LLM leaderboard
  Models: Llama-2, WizardLM, GodziLLa2, Zephyr

Agent (§3.2)
  Using Tools
    Datasets: API-Bank, ToolBench, APIBench, ToolAlpaca, etc.
    Models: Gorilla, ToolLLaMA
  Self-Debugging
    Datasets: InterCode-Bash, InterCode-SQL, MINT-MBPP, MINT-HumanEval, RoboCodeGen, etc.
  Following NL Feedback
    Datasets: MINT
  Exploring Environments
    Datasets: ALFWorld, InterCode-CTF, WebArena
  Models (across agent tasks): Lemur-chat, AgentLlama

Logical Reasoning (§3.3)
  Math
    Datasets: GSM8K, MATH, TheoremQA, etc.
    Models: WizardMath
  Coding
    Datasets: HumanEval, MBPP, APPs, etc.
    Models: WizardCoder

Modelling Long-context (§3.4)
  Datasets: SCROLLS, Zero-SCROLLS, LongBench, L-Eval, BAMBOO, M4LE, etc.
  Models: Llama-2-long

Application-specific (§3.5.5)
  Query-focused Summarization
    Datasets: QMSum, SQuALITY, CovidET, NEWTS
    Models: finetuned model
  Open-ended QA
    Datasets: NQ, TriviaQA, NewsQA, SQuAD, Quoref, NarrativeQA, DROP
    Models: InstructRetro
  Medical
    Datasets: Dreaddit, loneliness, MIMIC-CXR, OpenI, etc.
    Models: MentaLLaMA, Radiology-Llama-2
  Generating Structured Data
    Datasets: Rotowire, Struc-Bench-Latex, Struc-Bench-HTML
    Models: Struct-Bench
  Generating Critiques
    Datasets: AlpacaFarm, FairEval, CritiqueEval, etc.
    Models: Shepherd

Trustworthiness (§3.6)
  Hallucination
    Datasets: TruthfulQA, FactualityPrompt, FActScore, KoLA-KC, HaluEval, FACTOR
    Models: Platypus, Chain-of-Verification, etc.
  Safety
    Datasets: SafetyBench, XSTEST, etc.


