ML for SWEs

Pelosi opposes SB 1047, New LLM Training Paradigms, Prompt Caching to Save 90% of API Costs, and More

Machine learning resources and updates 8/19/2024

Logan Thorneloe
Aug 19, 2024 ∙ Paid
Header image generated using FLUX.1 via Grok

Here are the most important machine learning resources and updates from the past week. Follow me on X and/or LinkedIn for more frequent posts and updates. You can find last week’s updates here:

Godmother of AI Warns Against Regulation, Apple Intelligence System Prompts, Faster and Cheaper AI, and More
Logan Thorneloe · August 12, 2024

Support the Society's Backend community for just $1/mo to get the full list each week. Society's Backend is reader-supported. Thanks to all paying subscribers! 😊

  1. Pelosi Statement in Opposition to California Senate Bill 1047

  2. 🥇Top ML Papers of the Week

  3. New LLM Pre-training and Post-training Paradigms

  4. One Year of Society's Backend

  5. Nous Hermes 3 and exploiting underspecified evaluations

  6. How NVIDIA is using structured weight pruning and knowledge distillation to build new Llama models

  7. xAI introduces Grok-2 | Stronger than Claude 3.5 Sonnet!? (Tested)

  8. Prompt caching with Claude

  9. Unreasonably Effective AI with Demis Hassabis

  10. How to jailbreak a million multi-modal AI Agents Exponentially Fast [Breakdowns]

  11. Introducing SWE-bench Verified

  12. An argument for logging off


Pelosi Statement in Opposition to California Senate Bill 1047

Nancy Pelosi opposes California Senate Bill 1047, believing it is well-intentioned but misguided. She argues it could harm AI innovation and suggests seeking advice from top AI experts like Fei-Fei Li. Pelosi urges a comprehensive review of all AI proposals to find the best path forward for California.

This is huge. So far the US government has been largely AI-opposed instead of seeking a better understanding of the impact of regulation from AI experts. We need governing bodies that are educated and regulate properly.

source


🥇Top ML Papers of the Week

As always, I’m including the top ML papers of the week:

This week's top ML papers introduce groundbreaking AI models like The AI Scientist, which can independently write and review scientific papers, and Grok-2, which excels in code, math, and reasoning tasks. LongWriter's AgentWrite enables LLMs to generate long texts coherently, while EfficientRAG improves information retrieval through iterative chunk tagging. Additionally, rStar enhances small language model reasoning via self-play mutual reasoning, and MedGraphRAG uses a graph-based framework to boost precision in medical Q&A tasks.

source


New LLM Pre-training and Post-training Paradigms

The article reviews recent advancements in pre-training and post-training methods for large language models (LLMs). It examines the pipelines of new models from Alibaba, Apple, Google, and Meta. All these models use multi-stage pre-training and specific post-training optimizations.

When looking for the most effective research for training machine learning models, it pays to check what top companies are doing. They’ve done extensive research to understand the best methods.
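
To make "multi-stage" concrete, the recipe across these labs looks roughly like the sketch below. This is my own illustration, not code from the article; the stage names and data mixes are hypothetical.

```python
# Hypothetical sketch of the shared multi-stage recipe: core pre-training,
# a shorter continued pre-training stage on higher-quality/longer-context
# data, then post-training (SFT followed by preference optimization).

def pretrain(model: str, data: str, stage: str) -> str:
    # Stand-in for a full pre-training run on the given data mix.
    print(f"[pre-train:{stage}] {model} on {data}")
    return model

def posttrain(model: str, data: str, method: str) -> str:
    # Stand-in for a post-training/alignment pass.
    print(f"[post-train:{method}] {model} on {data}")
    return model

model = "base-llm"
model = pretrain(model, "web-scale corpus", stage="core")
model = pretrain(model, "high-quality, long-context mix", stage="continued")
model = posttrain(model, "instruction-following examples", method="SFT")
model = posttrain(model, "human preference pairs", method="DPO/RLHF")
```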

source


One Year of Society's Backend

Society’s Backend is one year old! I went through what I learned, what I would do differently, and why I think writing is so beneficial in my article last week.

source


Nous Hermes 3 and exploiting underspecified evaluations

OpenAI, Google, Anthropic, and others are pushing the boundaries of language models, but the criteria to join their elite ranks remain unclear. Nous Research's Hermes 3 models aim to compete by fine-tuning existing models, yet their evaluation methods and results lack transparency, raising questions about their true capabilities. Despite this, Hermes 3 shows promise for general chat and role-playing, though its performance lags behind established models like Llama 3.1.

source


How NVIDIA is using structured weight pruning and knowledge distillation to build new Llama models

NVIDIA is using structured weight pruning and knowledge distillation to make large language models like Llama smaller and more efficient. Building on Meta's Llama 3.1 release, which featured Meta's largest model yet alongside two smaller, more deployable models, this research aims to make powerful AI more accessible without training new models from scratch.
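
For intuition on the distillation half: a small "student" model is trained to match the softened output distribution of a larger "teacher." Here's a minimal PyTorch sketch of the generic technique (my illustration, not NVIDIA's exact recipe):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # KL divergence between the softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-token vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

Pairing a loss like this with structured pruning (removing whole attention heads or layers, then re-training the smaller network against the teacher) is the general shape of the approach.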

source


xAI introduces Grok-2 | Stronger than Claude 3.5 Sonnet!? (Tested)

xAI has introduced Grok-2, a new AI model claimed to be stronger than Claude 3.5 Sonnet. The video puts that claim to the test and discusses the model's performance.


Prompt caching with Claude

Prompt caching, which enables developers to cache frequently used context between API calls, is now available on the Anthropic API. With prompt caching, customers can provide Claude with more background knowledge and example outputs—all while reducing costs by up to 90% and latency by up to 85% for long prompts.

We need cheaper AI for it to be more useful and used more widely. Prompt caching is an effective and relatively simple way of achieving this.
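
Here's roughly what this looks like with the Anthropic Python SDK. The `cache_control` field and beta header come from Anthropic's prompt-caching announcement; the model name and document text below are placeholders.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

reference_doc = "<your long, frequently reused context goes here>"

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions about the attached document."},
        {
            "type": "text",
            "text": reference_doc,
            # Mark the long block as cacheable; subsequent calls that share
            # this prefix read it from the cache at a fraction of the cost.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
    # Prompt caching was a beta feature at launch, gated behind this header.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```

The savings come from reuse: the cached prefix is billed at a reduced rate on cache hits, so long system prompts, documents, and few-shot examples become much cheaper to send repeatedly.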

source


Unreasonably Effective AI with Demis Hassabis

In a YouTube video by Google DeepMind, Demis Hassabis discusses the impressive capabilities of AI. He explains how AI can solve complex problems and learn tasks effectively. The video highlights AI's potential to impact various fields positively.


How to jailbreak a million multi-modal AI Agents Exponentially Fast [Breakdowns]

The article discusses how injecting a single adversarial image into one AI agent's memory can cause a chain reaction, spreading malicious influence to millions of other agents exponentially fast. The technique exploits adversarial perturbations, which are small, imperceptible changes to images that can trick AI classifiers. The author emphasizes understanding the math behind these perturbations and explores various methods, including gradient-based techniques and evolutionary algorithms, to improve future attacks.
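
To make "adversarial perturbation" concrete, here's a minimal targeted FGSM-style example in PyTorch. This illustrates the generic gradient-based idea the article builds on, not the paper's actual agent-to-agent attack.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(image: torch.Tensor, model: torch.nn.Module,
                  target: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    # One-step targeted attack: nudge each pixel slightly in the direction
    # that makes the model more confident in the attacker's chosen label.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), target)
    loss.backward()
    # Step against the gradient to minimize loss toward the target class,
    # keeping the change imperceptibly small (bounded by epsilon).
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Toy usage with a stand-in classifier over 3x32x32 images.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
target = torch.tensor([7])  # the attacker's desired class
adversarial_image = targeted_fgsm(image, model, target)
```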

source


Introducing SWE-bench Verified

SWE-bench Verified is a new version of the SWE-bench benchmark, which evaluates large language models on real-world software issues from GitHub. The update addresses previous difficulties by removing ambiguous or unsolvable tasks, making evaluations more accurate. The new dataset was verified by professional developers and shows improved performance for models like GPT-4.

source


An argument for logging off

The author argues that focusing on things within our control reduces stress and anxiety. Limiting exposure to irrelevant information can improve mental well-being, while expanding one's sphere of influence or engaging with information purely for entertainment can also be beneficial.

I included this because it explains very clearly something I’ve been thinking about for a long time. It’s something I think is worth reading for anyone looking to reduce the stress in their life.

source
