Pelosi opposes SB 1047, New LLM Training Paradigms, Prompt Caching to Save 90% of API Costs, and More
Machine learning resources and updates 8/19/2024
Here are the most important machine learning resources and updates from the past week. Follow me on X and/or LinkedIn for more frequent posts and updates. You can find last week's updates here:
Pelosi Statement in Opposition to California Senate Bill 1047
How NVIDIA is using structured weight pruning and knowledge distillation to build new Llama models
xAI introduces Grok-2 | Stronger than Claude 3.5 Sonnet!? (Tested)
How to jailbreak a million multi-modal AI Agents Exponentially Fast [Breakdowns]
Pelosi Statement in Opposition to California Senate Bill 1047
Nancy Pelosi opposes California Senate Bill 1047, believing it is well-intentioned but misguided. She argues it could harm AI innovation and suggests seeking advice from top AI experts like Fei-Fei Li. Pelosi urges comprehensive review of all AI proposals to find the best path forward for California.
This is huge. So far, the US government has been largely opposed to AI instead of seeking a better understanding of regulation's impact from AI experts. We need governing bodies that are educated and regulate properly.
🔥 Top ML Papers of the Week
As always, I'm including the top ML papers of the week:
This week's top ML papers introduce groundbreaking AI models like The AI Scientist, which can independently write and review scientific papers, and Grok-2, which excels in code, math, and reasoning tasks. LongWriter's AgentWrite enables LLMs to generate long texts coherently, while EfficientRAG improves information retrieval through iterative chunk tagging. Additionally, rStar enhances small language model reasoning via self-play mutual reasoning, and MedGraphRAG uses a graph-based framework to boost precision in medical Q&A tasks.
New LLM Pre-training and Post-training Paradigms
The article reviews recent advancements in pre-training and post-training methods for large language models (LLMs). It examines the pipelines of new models from Alibaba, Apple, Google, and Meta. All these models use multi-stage pre-training and specific post-training optimizations.
When looking for the most effective methods for training machine learning models, it pays to check what top companies are doing. They've done extensive research to identify the best approaches.
One Year of Society's Backend
Society's Backend is one year old! In last week's article, I went through what I learned, what I would do differently, and why I think writing is so beneficial.
Nous Hermes 3 and exploiting underspecified evaluations
OpenAI, Google, Anthropic, and others are pushing the boundaries of language models, but the criteria to join their elite ranks remain unclear. Nous Research's Hermes 3 models aim to compete by fine-tuning existing models, yet their evaluation methods and results lack transparency, raising questions about their true capabilities. Despite this, Hermes 3 shows promise for general chat and role-playing, though its performance lags behind established models like Llama 3.1.
How NVIDIA is using structured weight pruning and knowledge distillation to build new Llama models
NVIDIA is using structured weight pruning and knowledge distillation to make large language models like Llama smaller and more efficient. The work builds on Meta's Llama 3.1 release, which features its largest model yet alongside smaller, more deployable models. This research aims to make powerful AI more accessible without training new models from scratch.
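To make the two ideas concrete, here's a minimal, illustrative sketch in PyTorch: structured pruning that drops whole output neurons by L2 norm, and a distillation loss that trains a smaller student to match a larger teacher's softened predictions. The layer sizes, function names, and setup are placeholders for illustration, not NVIDIA's actual pipeline.

```python
import torch
import torch.nn.functional as F

def prune_linear_rows(layer: torch.nn.Linear, keep_ratio: float) -> torch.nn.Linear:
    """Structured pruning: keep only the output neurons (whole weight rows) with the largest L2 norm."""
    n_keep = max(1, int(layer.out_features * keep_ratio))
    row_norms = layer.weight.detach().norm(dim=1)            # importance score per output neuron
    keep_idx = row_norms.topk(n_keep).indices.sort().values  # rows to keep, in original order
    pruned = torch.nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep_idx])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep_idx])
    return pruned

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Knowledge distillation: KL divergence between softened teacher and student distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage: prune a layer to half its width and compute a distillation loss on random logits.
layer = torch.nn.Linear(1024, 512)
smaller = prune_linear_rows(layer, keep_ratio=0.5)
loss = distillation_loss(torch.randn(4, 32000), torch.randn(4, 32000))
print(smaller, loss.item())
```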
xAI introduces Grok-2 | Stronger than Claude 3.5 Sonnet!? (Tested)
xAI has introduced a new AI model called Grok-2. It is claimed to be stronger than Claude 3.5 Sonnet. The model's performance was tested and discussed.
Prompt caching with Claude
Prompt caching, which enables developers to cache frequently used context between API calls, is now available on the Anthropic API. With prompt caching, customers can provide Claude with more background knowledge and example outputs, all while reducing costs by up to 90% and latency by up to 85% for long prompts.
We need cheaper AI for it to be more useful and used more widely. Prompt caching is an effective and relatively simple way of achieving this.
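As a rough sketch of how this looks in practice, a large reusable system block is marked with `cache_control` so repeat calls can hit the cache instead of reprocessing the full prompt. This assumes the Anthropic Python SDK; the exact surface may differ by SDK version, and at launch this was a beta feature gated behind an `anthropic-beta` header.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_CONTEXT = "..."  # e.g. a large document or example outputs reused across many calls

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # The beta header may be required while prompt caching is in beta.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {"type": "text", "text": "You are an assistant answering questions about the document."},
        {
            "type": "text",
            "text": LONG_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # this block is cached between calls
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key points of the document."}],
)
print(response.content[0].text)
```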
Unreasonably Effective AI with Demis Hassabis
In a YouTube video by Google DeepMind, Demis Hassabis discusses the impressive capabilities of AI. He explains how AI can solve complex problems and learn tasks effectively. The video highlights AI's potential to impact various fields positively.
How to jailbreak a million multi-modal AI Agents Exponentially Fast [Breakdowns]
The article discusses how injecting a single adversarial image into one AI agent's memory can cause a chain reaction, spreading malicious influence to millions of other agents exponentially fast. The technique exploits adversarial perturbations, which are small, imperceptible changes to images that can trick AI classifiers. The author emphasizes understanding the math behind these perturbations and explores various methods, including gradient-based techniques and evolutionary algorithms, to improve future attacks.
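The article walks through the math; as a hedged illustration of the simplest gradient-based approach it alludes to, here's a minimal FGSM-style perturbation in PyTorch. The classifier and the `fgsm_perturb` helper are placeholders for illustration, not the multi-modal agents from the article.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.03):
    """Fast Gradient Sign Method: nudge each pixel a small step in the direction
    that increases the loss for the true label, pushing the classifier toward a wrong prediction."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Small, nearly imperceptible change: epsilon times the sign of the input gradient.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

# Toy usage with a placeholder classifier; a real attack would target the agent's vision model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
adv_image = fgsm_perturb(model, image, label)
print((adv_image - image).abs().max())  # perturbation magnitude is bounded by epsilon
```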
Introducing SWE-bench Verified
SWE-bench Verified is a new version of the SWE-bench benchmark, which evaluates large language models on real-world software issues from GitHub. The update addresses previous difficulties by removing ambiguous or unsolvable tasks, making evaluations more accurate. The new dataset was verified by professional developers and shows improved performance for models like GPT-4.
An argument for logging off
The author argues that focusing on things within our control reduces stress and anxiety, and that limiting exposure to irrelevant information can improve mental well-being. Expanding one's sphere of influence, or engaging with information purely as entertainment, can also be worthwhile.
I included this because it explains very clearly something I've been thinking about for a long time. It's worth reading for anyone looking to reduce the stress in their life.