r/MachineLearning · 2h ago · 6 · research fine tuning workflow

RPS (Regressive Plasticity Schedule) is a two-stage training approach that combines curriculum learning with adaptive learning rate decay, with reported improvements on ARC-AGI benchmarks and program synthesis tasks. The method trains on easy data at a high learning rate, then on hard data at a reduced learning rate, and reports gains of 4% vs 2.4% relative to equal-learning-rate baselines.
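
The two-stage idea can be sketched as a lookup from training step to data stage and learning rate. This is only an illustration under stated assumptions: the piecewise-constant rates, the step counts, and the names `StagePlan`, `build_schedule`, and `lookup` are hypothetical and are not taken from the post, which describes the decay as adaptive without giving its form.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StagePlan:
    name: str    # which data split is used in this stage
    steps: int   # number of optimizer steps spent in this stage
    lr: float    # learning rate held constant within the stage (assumption)


def build_schedule(easy_steps: int, hard_steps: int,
                   high_lr: float, low_lr: float) -> List[StagePlan]:
    """Easy data at a high learning rate first, then hard data at a lower one."""
    if low_lr >= high_lr:
        raise ValueError("the second stage is expected to use a reduced learning rate")
    return [StagePlan("easy", easy_steps, high_lr),
            StagePlan("hard", hard_steps, low_lr)]


def lookup(plan: List[StagePlan], step: int) -> Tuple[str, float]:
    """Return (stage name, learning rate) for a zero-based global step."""
    start = 0
    for stage in plan:
        if step < start + stage.steps:
            return stage.name, stage.lr
        start += stage.steps
    raise IndexError("step is beyond the end of the schedule")


# Hypothetical numbers chosen only for illustration.
plan = build_schedule(easy_steps=1000, hard_steps=1000, high_lr=1e-3, low_lr=1e-4)
```

An equal-learning-rate baseline, as mentioned in the summary, would correspond to both stages sharing one rate; the guard in `build_schedule` would need to be relaxed to express it.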

r/MachineLearning · 3h ago · 7 · research inference open source

A proof-of-concept exploring inference-time learning within Mixture of Experts (MoE) architectures by inserting specialized expert modules that can update sibling expert weights dynamically. The work combines existing components in a novel way to enable adaptive behavior during inference, potentially useful for building more flexible AI systems without retraining.

r/MachineLearning · 3h ago · 6 · research inference

A Reddit discussion questioning why major AI labs haven't adopted adaptive/dynamic vision tokenization despite research showing potential efficiency gains. The post explores technical trade-offs like pipeline constraints requiring fixed token counts, uncertainty in scaling laws for adaptive methods, and whether marginal improvements justify implementation complexity.

Latent Space · 11h ago · 9 · new model research inference benchmark

OpenAI's general-purpose LLM achieved a novel research result on the Erdős unit distance problem through extended reasoning (125-page output), demonstrating that inference-time scaling enables frontier mathematical reasoning without domain-specific scaffolding. This validates test-time compute as a key scaling paradigm and suggests reasoning capabilities may generalize beyond competition math to open research problems.

r/MachineLearning · 14h ago · 8 · research agent open source benchmark

Research on masked diffusion language models (MDLMs) for world modeling in RL environments, addressing the mode collapse and limited diversity of autoregressive models. Introduces a GRPO training framework with zero-shot transfer across multiple open-source environments and agent backbones, and releases code and a dataset of 239K trajectories.

r/MachineLearning · 22h ago · 8 · research benchmark inference

OpenAI's reasoning model discovered a counterexample to a long-standing conjecture in discrete geometry (Erdős's unit-distance problem), with the proof verified by an AI grading pipeline and human mathematicians. The result is technically significant for AI-for-science, but lacks crucial experimental details (model name, sampling strategy, compute budget, full pipeline specs) needed to assess whether this represents genuine autonomous research capability or selective reporting from extensive search.

Simon Willison · 1d ago · 6 · tool inference

Interactive tool that visualizes LLM token generation speeds (5-800 tokens/second) to help developers understand what different inference throughput claims actually feel like in practice. Useful for evaluating model performance claims and understanding real-world latency implications.
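
To make the throughput range concrete, the arithmetic behind such a visualization is just response length divided by generation rate. The sketch below is not the tool's code; the 400-token response length and the intermediate 50 tokens/second rate are arbitrary assumptions, and `seconds_to_generate` is a hypothetical helper.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream num_tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# 5 and 800 tokens/second are the ends of the range quoted in the summary;
# 50 is an arbitrary point in between.
for rate in (5, 50, 800):
    elapsed = seconds_to_generate(400, rate)
    print(f"{rate:>4} tok/s: {elapsed:.1f} s for a 400-token response")
```

This ignores time to first token and any variation in rate during generation, both of which affect how a response feels in practice.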

r/MachineLearning · 1d ago · 8 · agent inference deployment benchmark

Practical cost-optimization study comparing five LLMs (Opus, GPT-5, Sonnet, DeepSeek V4, Hunyuan) on an MCP-based file management agent across 500+ tool calls, revealing surprisingly small quality gaps (96-99% success) despite 10x price differences. Author deployed Hunyuan locally via MLX on M2 Ultra for $5.5k, reducing daily inference costs from $40 to $9 through intelligent routing (local/cheap API for routine tasks, expensive models for complex failures).
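
The routing pattern described above (try the cheap tier first, escalate on failure) can be sketched as follows. This is a minimal illustration, not the author's implementation: the handlers are stand-in functions, and the names `route`, `payback_days`, `local_model`, and `premium_model` are hypothetical. The payback figure uses only the numbers in the summary and assumes the $9 daily cost already covers everything except the hardware purchase.

```python
from typing import Callable, Optional, Sequence, Tuple

# A handler returns a result string, or None to signal failure.
Handler = Callable[[str], Optional[str]]


def route(task: str, tiers: Sequence[Tuple[str, Handler]]) -> Tuple[str, str]:
    """Try each tier in order (cheapest first) and return the first success."""
    for name, handler in tiers:
        result = handler(task)
        if result is not None:
            return name, result
    raise RuntimeError("all tiers failed")


def payback_days(hardware_cost: float, daily_before: float, daily_after: float) -> float:
    """Days until the daily savings cover the one-time hardware cost."""
    savings = daily_before - daily_after
    if savings <= 0:
        raise ValueError("no daily savings, hardware never pays back")
    return hardware_cost / savings


# Stand-in handlers (hypothetical): the local tier "fails" on complex tasks.
def local_model(task: str) -> Optional[str]:
    return None if "complex" in task else f"local:{task}"


def premium_model(task: str) -> Optional[str]:
    return f"premium:{task}"


tiers = [("local", local_model), ("premium", premium_model)]
print(route("rename file", tiers))
print(route("complex merge", tiers))
print(round(payback_days(5500, 40, 9)), "days to recover the hardware cost")
```

With the quoted figures, saving $31 per day on a $5.5k machine implies a payback period of roughly 177 days, before accounting for electricity or maintenance.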

r/LocalLLaMA · 1d ago · 8 · new model tool inference open source deployment

Command A+ is a new open-source MoE model from Cohere with 25B active parameters, optimized for agentic and reasoning tasks with multimodal support. The article provides practical integration guides for Transformers, vLLM, SGLang, and Docker deployments, plus details on quantization options and model architecture, including sparse MoE with 128 experts and multilingual support across 48 languages.

Simon Willison · 1d ago · 6 · new model agent deployment

Google I/O 2026 introduced Gemini 3.5 Flash and Gemini Spark, a new AI agent product integrating with Google Workspace apps, running on Gemini 3.5 Flash and a closed-source Go binary called Antigravity. Key technical consideration: Spark uses isolated ephemeral VMs with DLP policies for enterprise security, though the author notes this is a critical area given prompt injection risks with sensitive data flows.

r/MachineLearning · 1d ago · 8 · open source research library agent

An engineer open-sourced NOML, a custom RL algorithm for continuous control that addresses instability in flight simulation by combining an anchor policy (a safe-action fallback), a hierarchical actor architecture (independent MLP heads per control axis), and mirror learning for data efficiency. The approach diverges from standard TD3 by eliminating exploration noise, maintaining stability through structural constraints rather than reward shaping.

r/LocalLLaMA · 1d ago · 7 · inference optimization open source

Pull request discussion on implementing MTP (multi-token prediction) speculative decoding for Gemma 4 models in llama.cpp, reporting >2x speedup on dense models. The thread documents real-world performance testing across different GPU setups, with results that vary by hardware configuration, and notes current limitations such as broken multi-GPU support and incompatibility with quantized KV cache.

r/MachineLearning · 1d ago · 8 · agent prompt engineering research open source

CANTANTE is a novel framework that automates multi-agent LLM system configuration by solving the credit assignment problem, allowing per-agent prompt optimization from global task rewards rather than manual tuning. The approach outperforms DSPy baselines (GEPA, MIPROv2) by 12-19 points on standard benchmarks while maintaining inference costs, with open-source code available.

r/MachineLearning · 1d ago · 7 · tutorial research workflow

This article explains Riemannian optimization techniques for machine learning on manifolds (like hyperspheres), focusing on how to adapt gradient descent to preserve geometric constraints using exponential maps and retractions. It provides practical implementation guidance for constraining neural network parameters to stay on spherical manifolds, with code examples using PyTorch.
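
A gradient step on the unit sphere can be sketched in three parts: project the Euclidean gradient onto the tangent space at the current point, take a step in that tangent direction, and map the result back onto the sphere with either a retraction (renormalization) or the exponential map (following a great circle). The article's own examples use PyTorch; the version below uses plain Python lists to stay dependency-free, and every function name is a hypothetical choice for this sketch.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def project_to_tangent(x: Vector, g: Vector) -> Vector:
    """Remove the component of g along x (x is assumed to have unit norm)."""
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]


def retract(x: Vector, v: Vector) -> Vector:
    """Retraction: step in the ambient space, then renormalize onto the sphere."""
    y = [xi + vi for xi, vi in zip(x, v)]
    n = norm(y)
    return [yi / n for yi in y]


def exp_map(x: Vector, v: Vector) -> Vector:
    """Exponential map: move along the great circle from x in tangent direction v."""
    n = norm(v)
    if n < 1e-12:  # zero step, stay at x
        return list(x)
    return [math.cos(n) * xi + math.sin(n) * vi / n for xi, vi in zip(x, v)]


def riemannian_sgd_step(x: Vector, euclid_grad: Vector, lr: float,
                        use_exp: bool = False) -> Vector:
    """One descent step that keeps the parameter on the unit sphere."""
    tangent = project_to_tangent(x, euclid_grad)
    v = [-lr * t for t in tangent]
    return exp_map(x, v) if use_exp else retract(x, v)


x = [1.0, 0.0, 0.0]
g = [0.5, 1.0, -2.0]
print(norm(riemannian_sgd_step(x, g, lr=0.1)))
print(norm(riemannian_sgd_step(x, g, lr=0.1, use_exp=True)))
```

The retraction is cheaper and agrees with the exponential map to first order in the step, which is why it is a common substitute when exact geodesics are not needed.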

Latent Space · 1d ago · 9 · new model api update agent workflow

Google released Gemini 3.5 Flash (GA immediately) with 1M context window, 65k max output, and agentic/coding capabilities, plus the new Gemini Omni multimodal family for video/audio generation and editing. The stack includes expanded Antigravity agents across desktop/CLI/SDK/API, with Google reporting 3.2 quadrillion tokens/month processed and 900M+ monthly users.

OpenAI Research · 1d ago · 6 · research benchmark

OpenAI's model solved a long-standing discrete geometry problem (the unit distance conjecture), demonstrating AI capability in mathematical reasoning and proof generation. While impressive as a research milestone, this is primarily a mathematics/science application story rather than a technical advancement for building AI systems.

OpenAI Blog · 1d ago · 6 · workflow tool

Ramp shares their workflow using Codex (OpenAI's code model) integrated with GPT-5.5 for automated code review, reducing feedback cycles from hours to minutes. The article highlights practical implementation of AI-assisted code review as part of their development process, offering insights into how organizations can adopt similar AI-powered review systems.

Simon Willison · 1d ago · 9 · new model api update deployment inference

Google released Gemini 3.5 Flash to general availability with limits of 1M input tokens and 65K output tokens, integrated into billions of consumer products, but at 3-6x higher pricing than previous Flash models ($1.50/$9 per million tokens). The release includes a new Interactions API (beta) for server-side history management and reflects an industry-wide trend of price increases for new model releases across OpenAI, Anthropic, and Google.
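
As a rough worked example of what the quoted prices imply per request, the sketch below assumes that "$1.50/$9" means $1.50 per million input tokens and $9 per million output tokens, and treats "1M" and "65K" as 1,000,000 and 65,000. The function name and constants are hypothetical, and the calculation ignores any caching discounts or tiered pricing that may apply.

```python
# Assumed reading of the quoted prices: input / output, USD per million tokens.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at flat per-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# A request that fills the full context window and the full output budget.
print(f"${request_cost_usd(1_000_000, 65_000):.3f}")
```

Under these assumptions a maximally sized request costs a little over two dollars, with most of that coming from the input side.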

r/LocalLLaMA · 1d ago · 6 · new model benchmark

Community discussion about HRM-Text, a new 1B parameter model with impressive benchmark claims. The post raises valid skepticism about the benchmarks and seeks technical explanation of the model's architecture and practical limitations for engineers evaluating whether to adopt it.