r/MachineLearning · 5h ago · 6 · research

A researcher shares a survey on weight-space learning—an emerging field focused on learning and reasoning directly in neural network parameter spaces rather than just input-output behavior. The post includes a pointer to a comprehensive arXiv survey and expresses interest in connecting with others working on related research problems.

r/MachineLearning · 7h ago · 6 · benchmark dataset open source

A multilingual speech language model challenge covering speaker diarization, ASR, and conversational understanding across 14 languages, backed by a free 2,100-hour dataset. Two tracks focus on speech recognition/diarization and on semantic understanding through QA, offering hands-on experience relevant to building production speech systems.

r/MachineLearning · 9h ago · 6 · prompt engineering research

Reddit discussion exploring why LLMs express reasoning through natural language chains-of-thought rather than operating directly in latent vector space, and the tradeoffs between vector-based and language-based reasoning for interpretability, efficiency, and task performance. Touches on practical considerations for model architecture and reasoning transparency that are relevant to LLM engineering but lacks concrete technical solutions or research findings.

r/MachineLearning · 12h ago · 8 · benchmark research open source

New structured output benchmark that measures value accuracy and faithfulness beyond just JSON schema validation, revealing significant gaps between schema compliance (90%+) and actual value correctness across all models. Includes comprehensive evaluation framework with 7 key metrics across text, image, and audio modalities, with open-source code and leaderboard showing GPT-4 leading and GLM-4 performing competitively.
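The gap the benchmark measures can be illustrated with a minimal sketch (the JSON values and the two-field check are hypothetical, not taken from the benchmark): an output can satisfy the schema perfectly while getting the values wrong, so value accuracy must be scored separately against gold labels.

```python
import json

# Gold answer vs. a hypothetical model response for an extraction task.
gold = {"title": "Attention Is All You Need", "year": 2017}
model_output = json.loads('{"title": "Attention Is All You Need", "year": 2015}')

# Schema compliance: right keys and right types -- where most JSON-schema
# validators stop, and where models already score 90%+.
schema_ok = (
    set(model_output) == set(gold)
    and isinstance(model_output["title"], str)
    and isinstance(model_output["year"], int)
)

# Value accuracy: field-level exact match against the gold labels.
correct = sum(model_output[k] == gold[k] for k in gold)
value_accuracy = correct / len(gold)

print(schema_ok, value_accuracy)  # True 0.5
```

Here the response is fully schema-compliant yet only half the values are correct, which is exactly the discrepancy the benchmark reports across models.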

Anthropic Blog · 15h ago · 7 · tool api update workflow open source

Anthropic released Claude connectors for creative tools including Blender, Autodesk, Adobe, Ableton, and Splice, built on the Model Context Protocol (MCP) standard. These connectors enable Claude to integrate directly with professional creative software, allowing developers to build AI-assisted workflows for 3D modeling, design, music production, and related tasks. The MCP-based approach ensures compatibility across multiple LLMs and emphasizes interoperability.

r/MachineLearning · 17h ago · 7 · tool visualization optimization

Interactive browser-based tool for visualizing neural network loss landscapes using dimensionality reduction techniques from Li et al. (NeurIPS 2018), allowing users to experiment with different architectures (MLPs to ResNet-8) and optimizers to understand how they navigate high-dimensional optimization spaces. Provides practical intuition-building for understanding local minima geometry and optimizer behavior, though acknowledges limitations of 2D/3D projections for representing true high-dimensional surfaces.
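The core idea from Li et al. (2018) that the tool builds on can be sketched in a few lines: pick two random directions in weight space, rescale them to the scale of the trained weights (a 1D analogue of the paper's filter-wise normalization), and evaluate the loss on a 2D grid around the minimum. The quadratic loss below is a toy stand-in for a real network's loss; all names and sizes are illustrative.

```python
import random
random.seed(0)

n = 16
theta = [random.gauss(0, 1) for _ in range(n)]        # "trained" weights
curv = [random.uniform(0.1, 2.0) for _ in range(n)]   # fixed curvature

def loss(w):
    # Toy quadratic loss with its minimum at theta; in the tool this
    # would be an MLP or ResNet-8 evaluated on a data batch.
    return 0.5 * sum(c * (wi - ti) ** 2 for c, wi, ti in zip(curv, w, theta))

def norm(v):
    return sum(x * x for x in v) ** 0.5

def direction():
    # Random direction rescaled to the weight norm, so the slice has a
    # meaningful scale (Li et al. normalize per filter in the real method).
    d = [random.gauss(0, 1) for _ in range(n)]
    s = norm(theta) / norm(d)
    return [x * s for x in d]

d1, d2 = direction(), direction()

# Loss surface on the 2D slice theta + alpha*d1 + beta*d2.
steps = [i / 12 - 1 for i in range(25)]  # 25 points in [-1, 1]
grid = [[loss([t + a * x + b * y for t, x, y in zip(theta, d1, d2)])
         for b in steps] for a in steps]

print(len(grid), len(grid[0]))  # 25 25
print(grid[12][12])             # 0.0 at alpha = beta = 0 (the minimum)
```

The browser tool renders exactly this kind of grid as a contour or surface plot, with the caveat the post notes: a 2D slice can hide structure of the true high-dimensional surface.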

r/LocalLLaMA · 18h ago · 8 · new model inference benchmark

NVIDIA released Nemotron 3 Nano Omni, a 31B multimodal model combining video, audio, image, and text understanding using a Mamba2-Transformer hybrid MoE architecture. Available commercially on Hugging Face/NGC with practical deployment guidance including vLLM 0.20.0+ requirements and ~62GB VRAM needs for inference.

HuggingFace Blog · 18h ago · 8 · new model inference agent open source

NVIDIA released Nemotron 3 Nano Omni, a multimodal model designed for efficient processing of documents, audio, video, and GUI-based agentic tasks with 7.4-9.2x higher system efficiency than comparable models. The 30B model uses Mamba state-space layers, MoE routing, and grouped-query attention to handle long-context reasoning across modalities while maintaining low latency for interactive workloads.

r/LocalLLaMA · 18h ago · 9 · new model open source agent inference benchmark

Ling-2.6-flash, a 104B-parameter model with 7.4B active parameters, is now open source and optimized for agent workloads with hybrid linear attention (MLA + Lightning Linear) and a sparse MoE architecture. The model achieves 4x throughput improvements over comparable models while reducing token consumption, a critical optimization for production agent deployments where token costs are a major barrier.

r/MachineLearning · 19h ago · 5 · agent workflow

A developer shares an experiment comparing two iterations of an AI agent playing Dark Hex against itself, with a Colab notebook for reproducibility. While it demonstrates agent training/iteration workflows, it lacks technical depth on the methodology, model architecture, or learnings that would be immediately useful for other builders.

r/MachineLearning · 21h ago · 7 · library open source tool inference

Dynabatch is a PyTorch sampler that dynamically adjusts batch sizes based on sequence lengths using XGBoost to predict GPU memory pressure, achieving 3.3x throughput improvement on encoder-decoder models like NLLB-200. The tool uses a practical approach of sorting by token length and selecting optimal batch sizes within memory constraints, with built-in fallbacks for OOM errors.
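The sort-then-pack strategy the post describes can be sketched without the learned component; here a simple padded-token budget stands in for Dynabatch's XGBoost memory predictor, and the budget value and function names are illustrative assumptions, not the library's API.

```python
import random
random.seed(0)

# Hypothetical corpus: 1000 sequences of varying token length, sorted
# ascending so that neighbours in a batch have similar lengths and
# padding waste stays low (the approach the post describes).
lengths = sorted(random.randint(8, 512) for _ in range(1000))

# Stand-in for the XGBoost memory model: predicted pressure is just
# batch_size * max_seq_len padded tokens. Budget is an assumed constant.
MEMORY_BUDGET = 16384

def dynamic_batches(sorted_lengths, budget):
    """Greedily grow each batch while predicted memory stays in budget."""
    batches, batch = [], []
    for n in sorted_lengths:
        # Padded cost if we add this sequence: every item pays the max
        # length, which is n because the input is sorted ascending.
        if batch and (len(batch) + 1) * n > budget:
            batches.append(batch)
            batch = []
        batch.append(n)
    if batch:
        batches.append(batch)
    return batches

batches = dynamic_batches(lengths, MEMORY_BUDGET)

# Short sequences pack into large batches, long ones into small batches,
# so GPU utilisation stays high without risking OOM on long inputs.
print(len(batches), len(batches[0]), len(batches[-1]))
```

The real sampler adds the pieces this sketch omits: a learned (XGBoost) memory predictor instead of the linear token budget, and runtime fallbacks that shrink the batch when an OOM error still occurs.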

Simon Willison · 1d ago · 7 · tool workflow deployment

pip 26.1 introduces lockfile support (pylock.toml) for reproducible Python dependency management, plus dependency cooldowns via the --uploaded-prior-to flag, which restricts installs to package versions uploaded before a given cutoff so teams can avoid freshly published releases. These features are particularly useful for AI/ML projects that depend on packages like Datasette and LLM, improving dependency reproducibility in production environments.

Simon Willison · 1d ago · 6 · new model open source fine tuning research

Talkie is a 13B language model trained exclusively on pre-1931 English text, with both base and instruction-tuned variants available under Apache 2.0 license. The project demonstrates novel approaches to training on out-of-copyright data and addresses contamination challenges, though the chat version relies on modern LLMs (Claude) for preference optimization, creating an interesting tension between data purity and practical fine-tuning.

r/LocalLLaMA · 1d ago · 7 · research benchmark training

Researchers trained 'vintage' language models on historical text (pre-1931) to study how LMs understand time, predict future events, and generate novel ideas. They evaluate these models on tasks like forecasting historical surprises and coding problems, providing insights into model capabilities and scaling behavior across different knowledge cutoffs.

HuggingFace Blog · 1d ago · 7 · new model open source deployment inference

NVIDIA and Siemens Healthineers released NV-Raw2Insights-US, an AI model that reconstructs ultrasound images directly from raw sensor data instead of traditional beamforming pipelines, enabling personalized speed-of-sound correction in real-time. The system uses Holoscan Sensor Bridge (open-source FPGA IP) to stream high-bandwidth ultrasound data to GPUs, demonstrating an end-to-end AI approach to medical imaging that learns adaptive physics-aware transformations for each patient.

OpenAI Blog · 1d ago · 6 · api update deployment

OpenAI's GPT models and Codex are now accessible through AWS, allowing developers to integrate these models within AWS infrastructure for enterprise deployments. This is primarily a deployment/infrastructure announcement rather than a technical capability breakthrough, but relevant for engineers deploying AI applications in AWS environments.