LLM-as-a-Judge, Instruction Pretraining, Solving Benchmarks Instead of Real-World ML Problems, and More
Weekly updates and resources 7/22/24
I apologize for this week’s list of resources coming out a day late. I spent this weekend on vacation and took a break today to spend some time with my kids. :)
Here are the most important machine learning resources and updates from the past week. There were a lot of excellent articles this week. I highly recommend checking out at least the top 5. I’ve updated the ML Road Map to include some of this week’s resources.
I share more frequent ML updates on X if you want to follow me there. You can support Society's Backend for just $1/mo to get a full list of everything I’m reading in your inbox each week. You can find last week's updates here.
cosmic-cortex/mlfz: An educational machine learning library.
3 things parents and students told us about how generative AI can support learning
Google DeepMind's Chatbot-Powered Robot Is Part of a Bigger Revolution
How we built AlphaFold 3 to predict the structure and interaction of all of life’s molecules
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Deepfake Detection: Building the Future with Advanced AI Techniques [Deepfakes]
Introducing the Coalition for Secure AI (CoSAI) and founding member organizations
Apple shows off open AI prowess: new models outperform Mistral and Hugging Face offerings
dair-ai/ML-Papers-Explained: Explanation to key concepts in ML
Detecting Deepfakes: Leveraging AI Artifacts for Scalable and Generalizable detection [Deepfakes]
Using LLMs for Evaluation
Researchers are using powerful large language models (LLMs) like GPT-4 to evaluate other models' outputs, a method called LLM-as-a-Judge. This technique is scalable, reduces the need for human evaluations, and aligns well with human preferences, although it introduces some biases. Various methods, such as position switching and using multiple LLMs, help mitigate these biases.
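To make position switching concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the linked article: the `judge` callable, its "first" / "second" / "tie" return values, and the two toy judges are all hypothetical stand-ins for a real LLM call. The idea is to ask the judge twice with the answers in opposite order and only accept a verdict that survives the swap.

```python
from typing import Callable

# Hypothetical judge signature (an assumption for this sketch):
# (prompt, first_answer, second_answer) -> "first" | "second" | "tie"
Judge = Callable[[str, str, str], str]


def judge_with_position_swap(judge: Judge, prompt: str,
                             answer_a: str, answer_b: str) -> str:
    """Return "A", "B", or "tie" after querying the judge in both orders."""
    forward = judge(prompt, answer_a, answer_b)   # A is shown first
    backward = judge(prompt, answer_b, answer_a)  # B is shown first

    # Map positional verdicts back to the underlying answers.
    forward_pick = {"first": "A", "second": "B"}.get(forward, "tie")
    backward_pick = {"first": "B", "second": "A"}.get(backward, "tie")

    # Only trust a verdict that is the same in both orders.
    return forward_pick if forward_pick == backward_pick else "tie"


def always_first_judge(prompt: str, first: str, second: str) -> str:
    """Toy stand-in for an LLM call: a judge with pure position bias."""
    return "first"


def longer_answer_judge(prompt: str, first: str, second: str) -> str:
    """Toy stand-in: prefers the longer answer regardless of position."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


if __name__ == "__main__":
    print(judge_with_position_swap(always_first_judge, "q", "x", "yy"))
    print(judge_with_position_swap(longer_answer_judge, "q", "x", "yy"))
```

With the purely position-biased toy judge, the two orderings disagree and the result collapses to a tie, while the position-independent toy judge returns the same winner either way. The same wrapper could be called once per judge model and the results combined by majority vote to approximate the multiple-LLM approach.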
🥇Top ML Papers of the Week
This week's top ML papers cover improvements in legibility and efficiency for large language models (LLMs) and new encoding methods for better spreadsheet understanding. They also explore advanced prompt engineering, weak supervision for strong reasoning, and the vulnerability of LLMs to past-tense jailbreaking. Additionally, the papers present frameworks for long-context retrieval, efficient reasoning distillation, and practical tips for evaluating advanced LLMs.
Instruction Pretraining LLMs
The article discusses the use of instruction finetuning to enhance the training of large language models (LLMs). It introduces methods for generating instruction-response data and compares the efficiency of instruction pretraining with traditional pretraining. Additionally, the article reviews recent research advancements and techniques in the field of LLMs.
SB 1047, AI regulation, and unlikely allies for open models
Efforts to regulate AI are increasing, but many experts argue that creating effective regulations is challenging due to the technology's complexity. California's SB 1047 bill, which impacts AI models, has sparked debate over its potential to hinder open-source development and favor big tech companies. The growing scrutiny of big tech and support for open-source AI models may shape the future landscape of AI regulation.
Stuck on Benchmark Island
Machine learning research often focuses on "benchmark islands" rather than solving real problems. Researchers are drawn to these islands due to ease of publishing and funding. This trend leads to numerous papers that may lack practical value, as seen in COVID-19 diagnosis studies from X-ray images.
The State of AI in China
China is rapidly advancing in artificial intelligence due to its centralized government control, which allows for extensive data use and minimal regulation. However, it faces significant challenges from import restrictions, investment limitations, and sanctions imposed by other countries. Despite these obstacles, China is working on developing domestic AI technologies and has shown progress with an open Chinese AI model topping recent leaderboards.
https://societysbackend.com/p/the-state-of-ai-in-china
cosmic-cortex/mlfz: An educational machine learning library.
The mlfz library is an educational tool for learning machine learning by examining simple, reference implementations of algorithms. Users are encouraged to explore the source code and documentation, which serves as an interactive textbook on machine learning. Contributions are welcome, and the creator offers a Mathematics of Machine Learning book for further support.
https://github.com/cosmic-cortex/mlfz
3 things parents and students told us about how generative AI can support learning
Google's research shows generative AI provides real-time feedback for parents, enhances learning on new subjects, and customizes pathways for students with learning differences. Teachers remain essential, with AI supporting rather than replacing them. Google prioritizes collaboration with educators to ensure AI tools are effective and beneficial.
Announcing the SCALE BETA
SCALE is a GPGPU toolkit that allows CUDA programs to run on AMD GPUs. It aims to make GPU programming more flexible, supporting multiple hardware vendors. Try the free SCALE beta and use one codebase for different GPUs.
https://scale-lang.com/posts/2024-07-12-release-announcement
The Engineer’s Guide To Deep Learning
The Engineer's Guide To Deep Learning discusses the current golden age of AI, focusing on the transformative impact of the Transformer model introduced in 2017. The guide aims to help engineers quickly grasp the concepts of neural networks, RNNs, NLP, and the Transformer through concise explanations and practical Python code examples. The author, Hironobu SUZUKI, shares his expertise in software engineering and AI, emphasizing the importance of understanding these advanced technologies for future breakthroughs.
https://www.interdb.jp/dl/index.html