ML for SWEs

LLM-as-a-Judge, Instruction Pretraining, Solving Benchmarks Instead of Real-World ML Problems, and More

Weekly updates and resources 7/22/24

Logan Thorneloe
Jul 23, 2024


I apologize for this week’s list of resources coming out a day late. I spent this weekend on vacation and took a break today to spend some time with my kids. :)

Here are the most important machine learning resources and updates from the past week. There were a lot of excellent articles this week. I highly recommend checking out at least the top 5. I’ve updated the ML Road Map to include some of this week’s resources.

I share more frequent ML updates on X if you want to follow me there. You can support Society's Backend for just $1/mo to get a full list of everything I’m reading in your inbox each week. You can find last week's updates here.

  1. Using LLMs for Evaluation

  2. 🥇Top ML Papers of the Week

  3. Instruction Pretraining LLMs

  4. SB 1047, AI regulation, and unlikely allies for open models

  5. Stuck on Benchmark Island

  6. The State of AI in China

  7. cosmic-cortex/mlfz: An educational machine learning library.

  8. 3 things parents and students told us about how generative AI can support learning

  9. Announcing the SCALE BETA

  10. The Engineer’s Guide To Deep Learning

  11. The Rise of Agentic Data Generation

  12. The State of Chinese AI

  13. Fast, accurate climate modeling with NeuralGCM

  14. Google DeepMind's Chatbot-Powered Robot Is Part of a Bigger Revolution

  15. How we built AlphaFold 3 to predict the structure and interaction of all of life’s molecules

  16. Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation

  17. Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

  18. Deepfake Detection: Building the Future with Advanced AI Techniques [Deepfakes]

  19. Introducing the Coalition for Secure AI (CoSAI) and founding member organizations

  20. Supervised Machine Learning for Science

  21. Apple shows off open AI prowess: new models outperform Mistral and Hugging Face offerings

  22. dair-ai/ML-Papers-Explained: Explanation to key concepts in ML

  23. Long context

  24. LLM Engineer's Handbook: Master the art of engineering Large Language Models from concept to production

  25. Hallucination Detector, Battle of the Image Generators, How Open Are Open Models?, Copyright Claim Fails Against GitHub

  26. Tips for Effectively Training Your Machine Learning Models

  27. Detecting Deepfakes: Leveraging AI Artifacts for Scalable and Generalizable Detection [Deepfakes]

  28. AI Nationalization is Inevitable – Leopold Aschenbrenner


Using LLMs for Evaluation

Researchers are using powerful large language models (LLMs) like GPT-4 to evaluate other models' outputs, a method called LLM-as-a-Judge. This technique is scalable, reduces the need for human evaluations, and aligns well with human preferences, although it introduces some biases. Various methods, such as position switching and using multiple LLMs, help mitigate these biases.
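The position-switching idea can be sketched in a few lines. Below is a minimal, hypothetical Python sketch (the `judge` callable and the prompt wording are illustrative assumptions, not the exact setup from the article): the judge is asked twice with the two answers in swapped order, and a winner is declared only when both verdicts agree, which cancels out position bias.

```python
def judged_winner(judge, question, answer_a, answer_b):
    """Pairwise LLM-as-a-Judge comparison with position switching.

    `judge` is any callable that takes a prompt string and returns "A" or "B"
    (in practice, a call to a strong LLM). To counter position bias, we ask
    twice with the answer order swapped and only declare a winner when the
    two verdicts are consistent.
    """
    prompt = (
        "Question: {q}\n"
        "Answer A: {a}\n"
        "Answer B: {b}\n"
        "Which answer is better? Reply with exactly 'A' or 'B'."
    )
    first = judge(prompt.format(q=question, a=answer_a, b=answer_b))
    # Second pass with positions swapped: answer_a is now shown as "B".
    second = judge(prompt.format(q=question, a=answer_b, b=answer_a))

    if first == "A" and second == "B":
        return "A"    # answer_a preferred in both orderings
    if first == "B" and second == "A":
        return "B"    # answer_b preferred in both orderings
    return "tie"      # inconsistent verdicts: position bias suspected
```

A judge that always answers "A" regardless of content, for example, produces a "tie" here rather than a spurious win, which is exactly the failure mode this check is meant to catch.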

Deep (Learning) Focus · Using LLMs for Evaluation · Cameron R. Wolfe, Ph.D.


🥇Top ML Papers of the Week

This week's top ML papers cover improvements in legibility and efficiency for large language models (LLMs) and new encoding methods for better spreadsheet understanding. They also explore advanced prompt engineering, weak supervision for strong reasoning, and the vulnerabilities in LLMs to past tense jailbreaking. Additionally, the papers present frameworks for long-context retrieval, efficient reasoning distillation, and practical tips for evaluating advanced LLMs.

NLP Newsletter · 🥇Top ML Papers of the Week · elvis


Instruction Pretraining LLMs

The article discusses the use of instruction finetuning to enhance the training of large language models (LLMs). It introduces methods for generating instruction-response data and compares the efficiency of instruction pretraining with traditional pretraining. Additionally, the article reviews recent research advancements and techniques in the field of LLMs.
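The core data-preparation step behind instruction pretraining can be sketched roughly as follows. This is a minimal illustration, not the article's exact pipeline: the Alpaca-style template and the `generate_pairs` synthesizer below are assumptions standing in for whatever instruction synthesizer and format a given setup uses.

```python
# Illustrative Alpaca-style template for one instruction-response pair.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(instruction, response):
    """Render one instruction-response pair as a single training string."""
    return TEMPLATE.format(instruction=instruction, response=response)

def build_corpus(raw_docs, generate_pairs):
    """Augment raw documents with synthetic instruction-response text.

    `generate_pairs` stands in for an instruction synthesizer (in
    instruction pretraining, an LLM that reads a document and emits
    question-answer pairs); here it is any callable returning a list of
    (instruction, response) tuples for a document.
    """
    corpus = []
    for doc in raw_docs:
        corpus.append(doc)  # keep the original raw text
        for instruction, response in generate_pairs(doc):
            corpus.append(format_example(instruction, response))
    return corpus
```

The resulting corpus mixes raw text with formatted instruction-response examples, which is the basic contrast the article draws against pretraining on raw text alone.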

Ahead of AI · Instruction Pretraining LLMs · Sebastian Raschka, PhD

SB 1047, AI regulation, and unlikely allies for open models

Efforts to regulate AI are increasing, but many experts argue that creating effective regulations is challenging due to the technology's complexity. California's SB 1047 bill, which impacts AI models, has sparked debate over its potential to hinder open-source development and favor big tech companies. The growing scrutiny of big tech and support for open-source AI models may shape the future landscape of AI regulation.

Interconnects · SB 1047, AI regulation, and unlikely allies for open models · Nathan Lambert


Stuck on Benchmark Island

Machine learning research often focuses on "benchmark islands" rather than solving real problems. Researchers are drawn to these islands due to ease of publishing and funding. This trend leads to numerous papers that may lack practical value, as seen in COVID-19 diagnosis studies from X-ray images.

Mindful Modeler · Stuck on Benchmark Island · Christoph Molnar


The State of AI in China

China is rapidly advancing in artificial intelligence due to its centralized government control, which allows for extensive data use and minimal regulation. However, it faces significant challenges from import restrictions, investment limitations, and sanctions imposed by other countries. Despite these obstacles, China is working on developing domestic AI technologies and has shown progress with an open Chinese AI model topping recent leaderboards.

https://societysbackend.com/p/the-state-of-ai-in-china


cosmic-cortex/mlfz: An educational machine learning library.

The mlfz library is an educational tool for learning machine learning by examining simple, reference implementations of algorithms. Users are encouraged to explore the source code and documentation, which serves as an interactive textbook on machine learning. Contributions are welcome, and the creator offers a Mathematics of Machine Learning book for further support.

https://github.com/cosmic-cortex/mlfz 


3 things parents and students told us about how generative AI can support learning

Google's research shows generative AI provides real-time feedback for parents, enhances learning on new subjects, and customizes pathways for students with learning differences. Teachers remain essential, with AI supporting rather than replacing them. Google prioritizes collaboration with educators to ensure AI tools are effective and beneficial.

https://blog.google/technology/ai/3-things-parents-and-students-told-us-about-how-generative-ai-can-support-learning/ 


Announcing the SCALE BETA

SCALE is a GPGPU toolkit that allows CUDA programs to run on AMD GPUs. It aims to make GPU programming more flexible, supporting multiple hardware vendors. Try the free SCALE beta and use one codebase for different GPUs.

https://scale-lang.com/posts/2024-07-12-release-announcement 


The Engineer’s Guide To Deep Learning

The Engineer's Guide To Deep Learning discusses the current golden age of AI, focusing on the transformative impact of the Transformer model introduced in 2017. The guide aims to help engineers quickly grasp the concepts of neural networks, RNNs, NLP, and the Transformer through concise explanations and practical Python code examples. The author, Hironobu SUZUKI, shares his expertise in software engineering and AI, emphasizing the importance of understanding these advanced technologies for future breakthroughs.

https://www.interdb.jp/dl/index.html 

