LLM Leaderboard: Human Feedback
This talk presents a leaderboard app for evaluating LLMs with human feedback, analyzing model performance across demographics and discussing future improvements.
We will present a leaderboard app we developed for evaluating LLMs, along with the methodology behind it. Our goal was to gather feedback from human participants on what it is like to work with these AI models on everyday tasks. We recruited verified participants and analysed how different models scored with people from different demographic groups. We’re working on the next iteration of the leaderboard, and we’d love to talk about what we have, share our plans, and get feedback from the crowd.
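As a rough illustration of the kind of analysis described above, the sketch below averages participant ratings per model and per demographic group, then ranks models by overall mean. It is not the app's actual code: the record layout, the 1-5 rating scale, the model and group names, and the function names `mean_by_model_and_group` and `leaderboard` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feedback records: (model, demographic_group, rating).
# The 1-5 scale and all names here are assumptions for illustration only.
ratings = [
    ("model-a", "18-29", 4),
    ("model-a", "30-49", 5),
    ("model-b", "18-29", 3),
    ("model-b", "30-49", 2),
    ("model-a", "18-29", 2),
]


def mean_by_model_and_group(records):
    """Return the mean rating for each (model, group) pair."""
    buckets = defaultdict(list)
    for model, group, rating in records:
        buckets[(model, group)].append(rating)
    return {key: mean(values) for key, values in buckets.items()}


def leaderboard(records):
    """Return (model, mean rating) pairs sorted from best to worst."""
    per_model = defaultdict(list)
    for model, _group, rating in records:
        per_model[model].append(rating)
    scores = [(model, mean(values)) for model, values in per_model.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (model, group), score in sorted(mean_by_model_and_group(ratings).items()):
        print(f"{model} / {group}: {score:.2f}")
    for rank, (model, score) in enumerate(leaderboard(ratings), start=1):
        print(f"{rank}. {model}: {score:.2f}")
```

A plain mean is the simplest possible aggregate; a real leaderboard might weight groups, report confidence intervals, or use pairwise comparisons instead.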
A containerized Hugging Face Space tracks and displays a running User Experience Leaderboard.