r/MachineLearning · 3h ago · 6 · workflow open source

A developer seeks architectural patterns for organizing benchmark infrastructure using type-safe data structures (dataclasses/Pydantic) to manage datasets, task schemas, and experiment composition. While this is a practical engineering question rather than news, it reflects real challenges in building reproducible ML benchmarks and may surface useful open-source projects or design patterns worth studying.
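The shape of such a setup can be sketched with stdlib dataclasses alone (all names below are illustrative, not taken from the post): immutable specs for datasets and tasks, composed into a mutable experiment.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch (invented names): frozen specs are hashable and
# safe to share; the Experiment composes them for a benchmark run.

@dataclass(frozen=True)
class DatasetSpec:
    name: str
    split: str
    n_examples: int

@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    dataset: DatasetSpec
    metric: str

@dataclass
class Experiment:
    name: str
    tasks: List[TaskSpec] = field(default_factory=list)

    def add(self, task: TaskSpec) -> "Experiment":
        self.tasks.append(task)
        return self

ds = DatasetSpec("mmlu", "test", 14042)
exp = Experiment("baseline").add(TaskSpec("mmlu-acc", ds, "accuracy"))
print(exp.tasks[0].metric)  # accuracy
```

Freezing the spec classes makes them usable as dict keys and prevents silent mutation across experiments; Pydantic would add validation on top of the same structure.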

r/MachineLearning · 3h ago · 6 · research benchmark

A technical critique of the 2024 'Ingenia Theorem' paper claiming AGI via ML is impossible, identifying a critical flaw: the proof equivocates between 'human-level classifier' and 'all polytime-sampleable distributions,' which would absurdly prove ImageNet classification is intractable. This is relevant for understanding the theoretical foundations and limitations arguments in AI/ML research.

r/MachineLearning · 4h ago · 5 · tutorial workflow

A developer discusses choosing between logistic regression and tree-based models (random forests) for a UFC fight prediction project, noting that MMA statistics exhibit nonlinear relationships and feature interactions that logistic regression may miss. The post highlights practical ML modeling decisions around feature engineering and model selection for binary classification with domain-specific constraints like betting value optimization.
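The interaction argument can be made concrete with a toy (invented data, not from the post): an XOR-style rule such as "win when exactly one of reach advantage / takedown edge holds" defeats any linear boundary but is captured exactly by a depth-2 tree.

```python
# XOR-like labels: win (1) iff exactly one binary feature is set.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear_pred(x, w, b):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def tree_pred(x):
    # depth-2 tree: split on x[0], then x[1] -- the interaction itself
    return 1 if x[0] != x[1] else 0

tree_acc = sum(tree_pred(x) == y for x, y in data) / 4

# brute-force the best linear classifier over a coarse weight grid
grid = [-1, -0.5, 0, 0.5, 1]
best_linear = max(
    sum(linear_pred(x, (w0, w1), b) == y for x, y in data) / 4
    for w0 in grid for w1 in grid for b in grid
)
print(tree_acc, best_linear)  # 1.0 0.75
```

In practice logistic regression can close the gap with hand-crafted interaction features, which is exactly the feature-engineering-vs-model-selection tradeoff the post weighs.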

r/LocalLLaMA · 5h ago · 9 · new model inference tool deployment

Ovis2.6-80B-A3B is a new multimodal LLM featuring a Mixture-of-Experts architecture with 80B total parameters but only ~3B active during inference, offering strong performance with low serving costs. Key improvements include 64K context window, up to 2880×2880 image resolution support, active visual reasoning via "Think with Image" capability, and enhanced OCR/document understanding—with practical implementation examples provided.

r/MachineLearning · 6h ago · 7 · research library open source

A novel Vision Transformer backbone using block-sparse core-periphery attention that reduces complexity from O(N²) to O(2NC + C²), trained with nested dropout for elastic inference-time cost adjustment. Achieves competitive accuracy with DINOv3 while maintaining stability across resolutions (256-1024) and demonstrates interesting emergent attention patterns.
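The complexity claim can be checked by counting allowed attention pairs under my reconstruction of the described pattern (not the authors' code): C "core" tokens attend globally, while periphery tokens attend only to the core.

```python
# Count (query, key) pairs that survive the core-periphery mask.
# A pair is computed iff at least one side is a core token, giving
# 2NC - C^2 pairs, i.e. within the O(2NC + C^2) regime the post reports,
# versus N^2 for dense attention.
def attention_pairs(n_tokens, n_core):
    pairs = 0
    for q in range(n_tokens):
        for k in range(n_tokens):
            if q < n_core or k < n_core:
                pairs += 1
    return pairs

N, C = 256, 16
sparse, dense = attention_pairs(N, C), N * N
print(sparse, dense)  # 7936 65536
```

With C fixed (or grown slowly), cost becomes linear in N, which is what makes the elastic inference-time adjustment via nested dropout plausible: shrinking C trades accuracy for compute smoothly.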

r/MachineLearning · 7h ago · 8 · research fine tuning prompt engineering workflow

Fast-Slow Training (FST) combines in-context learning via optimized prompts (fast weights) with parameter updates (slow weights) to achieve 3x better sample efficiency than pure RL while reducing catastrophic forgetting and preserving model plasticity. This dual-timescale approach maintains closer alignment to base models while enabling effective continual learning across multiple tasks.
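The dual-timescale idea can be illustrated with a scalar toy of my own (not the paper's method): a per-task "fast" component is fit cheaply and discarded, while "slow" parameters accumulate small updates, so the base model drifts less per task.

```python
# Toy: tasks share a slope (slow knowledge) but differ by an offset
# (fast, task-specific context). The fast bias absorbs the offset so the
# slow weight only has to learn what is common across tasks.
def fit_fast_bias(w, xs, ys):
    # closed-form per-task bias, analogous to an optimized prompt
    return sum(y - w * x for x, y in zip(xs, ys)) / len(xs)

def slow_update(w, xs, ys, bias, lr=0.01):
    # one small gradient step on squared error, retained across tasks
    grad = sum(2 * (w * x + bias - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad

w = 0.0
for task_shift in [1.0, 2.0, 3.0]:
    xs = [1.0, 2.0, 3.0]
    ys = [2.0 * x + task_shift for x in xs]  # shared slope 2, per-task offset
    b = fit_fast_bias(w, xs, ys)             # fast: absorbs the task offset
    w = slow_update(w, xs, ys, b)            # slow: inches toward the shared slope
print(round(w, 3))
```

Because each slow step is small and the fast bias soaks up task-specific variation, the slow weight moves gently toward shared structure rather than being yanked around per task, which is the forgetting-mitigation intuition the post describes.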

r/MachineLearning · 9h ago · 6 · rag workflow deployment

A post sharing conference decks from the Knowledge Graph Conference, highlighting production enterprise systems (Bloomberg, AbbVie, Morgan Stanley) that use knowledge graphs as reasoning infrastructure rather than mere retrieval layers. The decks demonstrate real compliance and governance implementations where KGs serve as the source of truth behind LLM interfaces.

Latent Space · 15h ago · 7 · fine tuning benchmark agent open source research api update

OpenAI is deprecating fine-tuning APIs, shifting the AI engineering landscape toward open models, longer context windows, and agentic systems. The piece covers emerging research benchmarks (FrontierMath, medical evals), agentic breakthroughs in math/physics/coding, and the practical move away from proprietary model fine-tuning toward prompt engineering and open-source RLFT alternatives.

r/MachineLearning · 18h ago · 8 · open source tutorial library

A minimal 160-200 line PyTorch implementation of JEPA (Joint-Embedding Predictive Architecture) algorithms that strips away scaling complexities to expose core mathematical concepts. Includes tutorial documentation mapping algorithm theory directly to implementation, making it valuable for understanding self-supervised learning approaches.
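The core loop can be compressed even further into a scalar toy of my own (far smaller than, and independent of, the repo's code): a context encoder and predictor regress the EMA target encoder's embedding of the masked part, with no gradients flowing into the target.

```python
import random

# Scalar JEPA sketch: "network" weights are single floats.
random.seed(0)
enc, pred = 0.5, 1.0   # context encoder and predictor (trained)
target_enc = enc       # EMA target encoder (never trained directly)
lr, tau = 0.05, 0.99

for step in range(500):
    x_ctx = random.uniform(-1, 1)   # visible patch
    x_tgt = x_ctx + 0.1             # masked patch, correlated with context
    z_pred = pred * (enc * x_ctx)   # predicted embedding of the masked patch
    z_tgt = target_enc * x_tgt      # target embedding (stop-gradient)
    err = z_pred - z_tgt
    # gradients of err^2 w.r.t. the trained weights only
    g_pred = 2 * err * (enc * x_ctx)
    g_enc = 2 * err * (pred * x_ctx)
    pred -= lr * g_pred
    enc -= lr * g_enc
    target_enc = tau * target_enc + (1 - tau) * enc  # EMA update

print(round(pred * enc, 3), round(target_enc, 3))
```

The EMA target is the piece that prevents the trivial collapse-to-zero solution from being chased directly, which is the main idea the tutorial maps from theory to code.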

Simon Willison · 1d ago · 9 · api update inference tool

OpenAI's reasoning-capable models now use a new /v1/responses endpoint instead of /v1/chat/completions, enabling interleaved reasoning across tool calls for GPT-5 class models. Developers can now view summarized reasoning tokens in their prompts with new command flags (-R/--hide-reasoning) to control visibility.

r/MachineLearning · 1d ago · 6 · tool rag workflow

A developer built a Steam game recommender system using custom vector embeddings to capture nuanced game characteristics (gameplay focus, music, vibe) instead of broad tags, enabling more personalized recommendations and discovery of underrated games. The project uses a database-driven approach with explanations for each recommendation and includes an advanced mode for fine-tuned filtering.
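The retrieval core of such a system is small; here is a sketch with invented embeddings and axes (not the project's data), ranking games by cosine similarity to a liked title.

```python
from math import sqrt

# Invented example: games as vectors over axes like
# [combat focus, soundtrack, cozy vibe].
embeddings = {
    "Hades":   [0.9, 0.8, 0.1],
    "Celeste": [0.6, 0.9, 0.4],
    "Stardew": [0.1, 0.7, 0.9],
    "Doom":    [1.0, 0.6, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def recommend(liked, k=2):
    scores = {g: cosine(embeddings[liked], v)
              for g, v in embeddings.items() if g != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("Hades"))  # ['Doom', 'Celeste']
```

The per-axis scores are also what makes per-recommendation explanations cheap: the largest componentwise products name the shared traits ("both combat-focused, strong soundtracks").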

r/MachineLearning · 1d ago · 9 · new model benchmark inference open source

TabPFN-3 releases a major tabular foundation model update enabling 1M-row inference on single H100s with 10-1000x faster inference and a novel thinking mode for test-time compute optimization. The model achieves 93% win rate over classical ML and demonstrates significant improvements in speed, scale, and multi-class support through architectural innovations like row-chunked inference and KV caching.
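The row-chunking idea generalizes beyond TabPFN: if test rows are scored independently given the fitted in-context training set, they can be streamed in chunks with bounded memory and bit-identical results. A sketch with a stand-in scoring function (the real model is a transformer; this is purely illustrative):

```python
def score(context, row):
    # stand-in "model": distance-weighted vote over the in-context train set
    return sum(y / (1 + abs(x - row)) for x, y in context)

def predict_chunked(context, rows, chunk=3):
    # stream test rows through the fitted context in fixed-size chunks
    out = []
    for i in range(0, len(rows), chunk):
        out.extend(score(context, r) for r in rows[i:i + chunk])
    return out

context = [(0.0, 1.0), (1.0, -1.0), (2.0, 1.0)]
rows = [0.5 * i for i in range(10)]
assert predict_chunked(context, rows) == [score(context, r) for r in rows]
```

KV caching plays the complementary role: the attention state over the (reused) training context is computed once and shared by every chunk, so only the test rows pay per-chunk cost.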

r/MachineLearning · 1d ago · 7 · research benchmark open source

Research revealing that the ratio of MLP to attention spectral norms in decoder transformers predicts rank collapse in the final layers, with optimal stability maintained at ratios between 0.5 and 2. This provides actionable guidance for model architecture design and debugging, with an accompanying open-source implementation for analysis.
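The diagnostic itself is easy to reproduce with standard power iteration (my implementation, not the paper's code): estimate the largest singular value of each weight matrix and flag layers whose MLP/attention ratio leaves the reported 0.5-2 band.

```python
# Power iteration on W^T W to estimate the spectral norm of a matrix
# given as a list of rows.
def spectral_norm(mat, iters=200):
    n = len(mat[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(n)) for row in mat]        # W v
        v = [sum(mat[i][j] * u[i] for i in range(len(mat))) for j in range(n)]  # W^T u
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    u = [sum(row[j] * v[j] for j in range(n)) for row in mat]
    return sum(x * x for x in u) ** 0.5

w_attn = [[2.0, 0.0], [0.0, 1.0]]   # spectral norm 2
w_mlp  = [[3.0, 0.0], [0.0, 0.5]]   # spectral norm 3
ratio = spectral_norm(w_mlp) / spectral_norm(w_attn)
print(round(ratio, 2), 0.5 <= ratio <= 2.0)  # 1.5 True
```

On a real checkpoint one would run this per layer over the projection matrices and watch how the ratio trends toward the final layers, where the paper locates the collapse.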

r/LocalLLaMA · 1d ago · 6 · tool open source benchmark

An open-source evaluation tool for distributed LLM assessment that supports multiple grading methods (LLM-based, regex, custom scripts) and distributes tasks across machines. The tool enables engineers to evaluate model outputs at scale, though discussions highlight concerns about LLM self-grading reliability and regex false-negatives.
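The regex false-negative concern is worth seeing concretely; a toy grader sketch (not the tool's API) shows how a strict pattern rejects a correct answer that a normalizing comparison accepts.

```python
import re

def regex_grade(output, pattern):
    # pass iff the pattern matches anywhere in the model output
    return re.search(pattern, output) is not None

def normalized_grade(output, answer):
    # strip case and punctuation before substring comparison
    canon = lambda s: re.sub(r"[^0-9a-z]", "", s.lower())
    return canon(answer) in canon(output)

out = "The answer is: forty-two (42)."
print(regex_grade(out, r"\b42\b"), regex_grade(out, r"^42$"))  # True False
print(normalized_grade(out, "42"))                             # True
```

The anchored pattern `^42$` is the classic failure mode: correct answers embedded in explanatory prose are graded wrong, silently deflating scores at scale.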

r/MachineLearning · 1d ago · 6 · tool benchmark inference

An engineer seeks specialized cache-simulation tools for LLM prompt-caching workloads with multi-tier hierarchies, token-weighted objects, and edit-driven traces; current options like libCacheSim don't model the cost/residency structure of systems like Anthropic's tiered prompt cache. This community question surfaces a real gap in tooling for LLM inference optimization and cache-policy research.
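A minimal version of what the poster wants can be sketched directly (all structure invented for illustration): token-weighted prefixes, a small fast tier with LRU eviction spilling into a larger slow tier, and recompute cost charged per token on a cold miss.

```python
from collections import OrderedDict

class TieredPromptCache:
    def __init__(self, fast_cap, slow_cap):
        self.fast, self.slow = OrderedDict(), OrderedDict()  # prefix -> tokens
        self.fast_cap, self.slow_cap = fast_cap, slow_cap    # token budgets
        self.recomputed_tokens = 0

    def _trim(self, tier, cap, spill=None):
        # evict least-recently-used prefixes until under the token budget
        while sum(tier.values()) > cap:
            k, v = tier.popitem(last=False)
            if spill is not None:
                spill[k] = v

    def access(self, prefix, tokens):
        if prefix in self.fast:
            self.fast.move_to_end(prefix)
            return "fast"
        hit = "slow" if prefix in self.slow else "miss"
        if hit == "slow":
            self.slow.pop(prefix)              # promote back to the fast tier
        else:
            self.recomputed_tokens += tokens   # cold prompt: full recompute cost
        self.fast[prefix] = tokens
        self._trim(self.fast, self.fast_cap, spill=self.slow)
        self._trim(self.slow, self.slow_cap)
        return hit

cache = TieredPromptCache(fast_cap=100, slow_cap=300)
print(cache.access("sys-prompt-A", 80))  # miss
print(cache.access("sys-prompt-A", 80))  # fast
print(cache.access("sys-prompt-B", 60))  # miss (A demoted to slow tier)
print(cache.access("sys-prompt-A", 80))  # slow
```

Replaying an edit-driven trace against variants of this loop (different budgets, eviction and promotion policies) is the experiment the poster cannot currently express in libCacheSim.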

Latent Space · 1d ago · 9 · new model benchmark inference

Thinking Machines released TML-Interaction-Small, a 276B-parameter MoE model optimized for real-time multimodal interaction with <200ms latency. It features encoder-free early fusion and introduces novel benchmarks (TimeSpeak, CueSpeak, RepCount-A, ProactiveVideoQA) designed to measure continuous, simultaneous interaction, where it exceeds GPT-4o Realtime and Gemini 3.1-Flash on audio/visual tasks. The approach prioritizes time-aligned microturns and synchronized audio-visual processing, advancing the practical implementation of responsive voice AI systems.

OpenAI Blog · 1d ago · 5 · workflow api update

AutoScout24 Group's case study demonstrates practical applications of Codex and ChatGPT for accelerating development workflows and code quality improvements. While showing real-world AI integration in software teams, the content is primarily business-focused with limited technical depth on implementation details or novel engineering techniques.

OpenAI Blog · 1d ago · 7 · benchmark research agent

Parameter Golf is a competition framework that challenged 1,000+ participants to optimize ML research, coding agents, and model design under computational constraints, covering practical techniques like quantization and efficient model architectures. The large submission volume suggests useful real-world patterns and techniques emerged for building efficient AI systems.