ML for SWEs

LLM-as-a-Judge, Instruction Pretraining, Solving Benchmarks Instead of Real-World ML Problems, and More

Weekly updates and resources 7/22/24

Logan Thorneloe
Jul 23, 2024


I apologize for this week’s list of resources coming out a day late. I spent this weekend on vacation and took a break today to spend some time with my kids. :)

Here are the most important machine learning resources and updates from the past week. There were a lot of excellent articles this week. I highly recommend checking out at least the top 5. I’ve updated the ML Road Map to include some of this week’s resources.

I share more frequent ML updates on X if you want to follow me there. You can support Society's Backend for just $1/mo to get a full list of everything I’m reading in your inbox each week. You can find last week's updates here.

  1. Using LLMs for Evaluation

  2. 🥇Top ML Papers of the Week

  3. Instruction Pretraining LLMs

  4. SB 1047, AI regulation, and unlikely allies for open models

  5. Stuck on Benchmark Island

  6. The State of AI in China

  7. cosmic-cortex/mlfz: An educational machine learning library.

  8. 3 things parents and students told us about how generative AI can support learning

  9. Announcing the SCALE BETA

  10. The Engineer’s Guide To Deep Learning

  11. The Rise of Agentic Data Generation

  12. The State of Chinese AI

  13. Fast, accurate climate modeling with NeuralGCM

  14. Google DeepMind's Chatbot-Powered Robot Is Part of a Bigger Revolution

  15. How we built AlphaFold 3 to predict the structure and interaction of all of life’s molecules

  16. Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation

  17. Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

  18. Deepfake Detection: Building the Future with Advanced AI Techniques [Deepfakes]

  19. Introducing the Coalition for Secure AI (CoSAI) and founding member organizations

  20. Supervised Machine Learning for Science

  21. Apple shows off open AI prowess: new models outperform Mistral and Hugging Face offerings

  22. dair-ai/ML-Papers-Explained: Explanation to key concepts in ML

  23. Long context

  24. LLM Engineer's Handbook: Master the art of engineering Large Language Models from concept to production

  25. Hallucination Detector, Battle of the Image Generators, How Open Are Open Models?, Copyright Claim Fails Against GitHub

  26. Tips for Effectively Training Your Machine Learning Models

  27. Detecting Deepfakes: Leveraging AI Artifacts for Scalable and Generalizable Detection [Deepfakes]

  28. AI Nationalization is Inevitable – Leopold Aschenbrenner


Using LLMs for Evaluation

Researchers are using powerful large language models (LLMs) like GPT-4 to evaluate other models' outputs, a method called LLM-as-a-Judge. This technique is scalable, reduces the need for human evaluations, and aligns well with human preferences, although it introduces some biases. Various methods, such as position switching and using multiple LLMs, help mitigate these biases.
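The position-switching idea can be sketched in a few lines. Below is a minimal, hypothetical Python sketch (the `judge` callable and the prompt wording are illustrative assumptions, not the exact setup from the article): the judge is asked twice with the two answers in swapped order, and a winner is declared only when both verdicts agree, which cancels out position bias.

```python
def judged_winner(judge, question, answer_a, answer_b):
    """Pairwise LLM-as-a-Judge comparison with position switching.

    `judge` is any callable that takes a prompt string and returns "A" or "B"
    (in practice, a call to a strong LLM). To counter position bias, we ask
    twice with the answer order swapped and only declare a winner when the
    two verdicts are consistent.
    """
    prompt = (
        "Question: {q}\n"
        "Answer A: {a}\n"
        "Answer B: {b}\n"
        "Which answer is better? Reply with exactly 'A' or 'B'."
    )
    first = judge(prompt.format(q=question, a=answer_a, b=answer_b))
    # Second pass with positions swapped: answer_a is now shown as "B".
    second = judge(prompt.format(q=question, a=answer_b, b=answer_a))

    if first == "A" and second == "B":
        return "A"    # answer_a preferred in both orderings
    if first == "B" and second == "A":
        return "B"    # answer_b preferred in both orderings
    return "tie"      # inconsistent verdicts: position bias suspected
```

A judge that always answers "A" regardless of content, for example, produces a "tie" here rather than a spurious win, which is exactly the failure mode this check is meant to catch.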

Deep (Learning) Focus · Using LLMs for Evaluation · Cameron R. Wolfe, Ph.D.


🥇Top ML Papers of the Week

This week's top ML papers cover improvements in legibility and efficiency for large language models (LLMs) and new encoding methods for better spreadsheet understanding. They also explore advanced prompt engineering, weak supervision for strong reasoning, and the vulnerabilities in LLMs to past tense jailbreaking. Additionally, the papers present frameworks for long-context retrieval, efficient reasoning distillation, and practical tips for evaluating advanced LLMs.

NLP Newsletter · 🥇Top ML Papers of the Week · elvis


Instruction Pretraining LLMs

The article discusses the use of instruction finetuning to enhance the training of large language models (LLMs). It introduces methods for generating instruction-response data and compares the efficiency of instruction pretraining with traditional pretraining. Additionally, the article reviews recent research advancements and techniques in the field of LLMs.
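The core data-preparation step behind instruction pretraining can be sketched roughly as follows. This is a minimal illustration, not the article's exact pipeline: the Alpaca-style template and the `generate_pairs` synthesizer below are assumptions standing in for whatever instruction synthesizer and format a given setup uses.

```python
# Illustrative Alpaca-style template for one instruction-response pair.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(instruction, response):
    """Render one instruction-response pair as a single training string."""
    return TEMPLATE.format(instruction=instruction, response=response)

def build_corpus(raw_docs, generate_pairs):
    """Augment raw documents with synthetic instruction-response text.

    `generate_pairs` stands in for an instruction synthesizer (in
    instruction pretraining, an LLM that reads a document and emits
    question-answer pairs); here it is any callable returning a list of
    (instruction, response) tuples for a document.
    """
    corpus = []
    for doc in raw_docs:
        corpus.append(doc)  # keep the original raw text
        for instruction, response in generate_pairs(doc):
            corpus.append(format_example(instruction, response))
    return corpus
```

The resulting corpus mixes raw text with formatted instruction-response examples, which is the basic contrast the article draws against pretraining on raw text alone.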

Ahead of AI · Instruction Pretraining LLMs · Sebastian Raschka, PhD

SB 1047, AI regulation, and unlikely allies for open models

Efforts to regulate AI are increasing, but many experts argue that creating effective regulations is challenging due to the technology's complexity. California's SB 1047 bill, which impacts AI models, has sparked debate over its potential to hinder open-source development and favor big tech companies. The growing scrutiny of big tech and support for open-source AI models may shape the future landscape of AI regulation.

Interconnects · SB 1047, AI regulation, and unlikely allies for open models · Nathan Lambert


Stuck on Benchmark Island

Machine learning research often focuses on "benchmark islands" rather than solving real problems. Researchers are drawn to these islands due to ease of publishing and funding. This trend leads to numerous papers that may lack practical value, as seen in COVID-19 diagnosis studies from X-ray images.

Mindful Modeler · Stuck on Benchmark Island · Christoph Molnar


The State of AI in China

China is rapidly advancing in artificial intelligence due to its centralized government control, which allows for extensive data use and minimal regulation. However, it faces significant challenges from import restrictions, investment limitations, and sanctions imposed by other countries. Despite these obstacles, China is working on developing domestic AI technologies and has shown progress with an open Chinese AI model topping recent leaderboards.

https://societysbackend.com/p/the-state-of-ai-in-china


cosmic-cortex/mlfz: An educational machine learning library.

The mlfz library is an educational tool for learning machine learning by examining simple, reference implementations of algorithms. Users are encouraged to explore the source code and documentation, which serves as an interactive textbook on machine learning. Contributions are welcome, and the creator offers a Mathematics of Machine Learning book for further support.

https://github.com/cosmic-cortex/mlfz 


3 things parents and students told us about how generative AI can support learning

Google's research shows generative AI provides real-time feedback for parents, enhances learning on new subjects, and customizes pathways for students with learning differences. Teachers remain essential, with AI supporting rather than replacing them. Google prioritizes collaboration with educators to ensure AI tools are effective and beneficial.

https://blog.google/technology/ai/3-things-parents-and-students-told-us-about-how-generative-ai-can-support-learning/ 


Announcing the SCALE BETA

SCALE is a GPGPU toolkit that allows CUDA programs to run on AMD GPUs. It aims to make GPU programming more flexible, supporting multiple hardware vendors. Try the free SCALE beta and use one codebase for different GPUs.

https://scale-lang.com/posts/2024-07-12-release-announcement 


The Engineer’s Guide To Deep Learning

The Engineer's Guide To Deep Learning discusses the current golden age of AI, focusing on the transformative impact of the Transformer model introduced in 2017. The guide aims to help engineers quickly grasp the concepts of neural networks, RNNs, NLP, and the Transformer through concise explanations and practical Python code examples. The author, Hironobu SUZUKI, shares his expertise in software engineering and AI, emphasizing the importance of understanding these advanced technologies for future breakthroughs.

https://www.interdb.jp/dl/index.html 

