6 Comments

The new Hugging Face Leaderboard V2, which makes the benchmarks more difficult, is really cool. Thanks for sharing!

No problem! Leaderboards/benchmarks are such an interesting space.

As I have been saying for a while now, I read your issues with constant curiosity and interest. I appreciated, and strongly recommend, these two aspects, which could also inspire future pieces by others who write on the topic. I have added three brief suggestions on them:

1. As performance increasingly plateaus, will there be a wave of focus on non-performance aspects for differentiation, especially for non-expert consumers? Which will be the first ones to focus on?

2. The 'sweet talk' of AI assistants - will this also be differentiated when all companies introduce it as a feature?

Plus: how you can build a personal assistant. Thank you for sharing these interesting ideas and links.

Working on the personal assistant currently. Not a lot of time to tackle it, though, and it's pretty complex (especially making it generalizable).

1. I think performance is plateauing a lot less than people make it out to be, and there's already a huge focus on aspects of model performance other than just training. I think we've still got a long way to go in understanding AI, and novel methods for performance improvement will continue to pop up. It's hard to know what to focus on, but the AI community as a whole will likely focus on all of them.

2. Not sure about this one. It all depends on how companies reinforce their models, and I think each will do it differently. A lot is still being figured out about this.

Thank you very much for your answers, Logan. I'm very curious about these topics and how people who work on this every day view their evolution. Your view is one of my 'unmissables'. Looking forward to reading other issues on these topics!

Comment deleted
Jun 30

Thanks for sharing! I'll take a look
