r/MachineLearning · 11h ago · 6 · rag workflow prompt engineering

Reddit discussion proposing a personalized cognitive profiling system that tracks not just facts but learning patterns, struggling points, and effective explanation styles to improve LLM context retrieval over time. The idea combines dynamic profiling with RAG-like personalization to create an evolving understanding of how individual users think, rather than basic chat memory.

r/MachineLearning · 12h ago · 7 · open source agent tool workflow

Spice is an open-source decision layer framework that sits above execution agents, providing context-aware task routing and decision-making through a perception → simulation → decision → execution → reflection loop. Rather than replacing agents like Claude or Codex, it adds orchestration capabilities including state modeling, option simulation, and outcome reflection to coordinate multi-agent workflows.
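
The perception → simulation → decision → execution → reflection loop can be pictured with a small sketch. This is not Spice's API: every name below (`State`, `perceive`, `simulate`, `decide`, `execute`, `reflect`, `run_loop`, the agent names, and the keyword-based scoring rule) is a hypothetical stand-in chosen only to show how the five stages might hand off to one another.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical state the decision layer keeps between stages."""
    task: str
    history: list = field(default_factory=list)


def perceive(state, observation):
    # Perception: fold a new observation into the state.
    state.history.append(("observed", observation))
    return state


def simulate(state, options):
    # Simulation: score each option without running anything.
    # Toy rule (assumption): favor the agent whose skill keyword appears in the task.
    return {name: (1.0 if skill in state.task else 0.1)
            for name, skill in options.items()}


def decide(scores):
    # Decision: pick the highest-scoring option.
    return max(scores, key=scores.get)


def execute(agent, task):
    # Execution: hand off to an execution agent (stubbed here as a string).
    return f"{agent} handled: {task}"


def reflect(state, agent, result):
    # Reflection: record the outcome so later decisions could use it.
    state.history.append(("outcome", agent, result))
    return state


def run_loop(task, options):
    state = perceive(State(task=task), observation=task)
    scores = simulate(state, options)
    agent = decide(scores)
    result = execute(agent, task)
    reflect(state, agent, result)
    return agent, result, state


agent, result, state = run_loop(
    "refactor the parser module",
    {"code_agent": "refactor", "docs_agent": "document"},
)
print(agent, "|", result)
```

The point of the sketch is the separation of concerns: the execution agent is only called in `execute`, while routing and bookkeeping live in the layer above it.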

r/MachineLearning · 13h ago · 7 · research inference open source library

SM1 (Scalar Mamba1) implements a closed-form solution for state-space models with d_state=1 using pure PyTorch operations, eliminating the selective scan bottleneck and reducing memory by 16x compared to standard Mamba implementations. The author demonstrates practical benefits: training a 130M parameter model on MIDI data with minimal memory footprint (56KB state, no KV cache) on consumer hardware, highlighting that scalar state dimensions can be sufficient when token representations already encode structure.
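
To see why a scalar state admits a closed form, consider the linear recurrence h_t = a_t · h_{t-1} + u_t with h_0 = 0, where u_t stands for the input term. With P_t = a_1 · … · a_t, the recurrence unrolls to h_t = P_t · Σ_{s≤t} u_s / P_s, which needs only a cumulative product and a cumulative sum. The sketch below checks that identity in plain Python. It is an illustration of the general idea, not the SM1 code: the function names and sample values are assumptions, and the post does not say which exact formulation SM1 uses. In PyTorch the two passes would map onto `torch.cumprod` and `torch.cumsum`.

```python
from itertools import accumulate
from operator import mul


def scan_sequential(a, u):
    """Reference: step-by-step recurrence h_t = a_t * h_{t-1} + u_t, with h_0 = 0."""
    h, out = 0.0, []
    for a_t, u_t in zip(a, u):
        h = a_t * h + u_t
        out.append(h)
    return out


def scan_closed_form(a, u):
    """Same values from one cumulative product and one cumulative sum."""
    p = list(accumulate(a, mul))  # P_t = a_1 * ... * a_t
    s = list(accumulate(u_t / p_t for u_t, p_t in zip(u, p)))  # running sum of u_s / P_s
    return [p_t * s_t for p_t, s_t in zip(p, s)]


# Hypothetical decay factors and input terms.
a = [0.9, 0.8, 0.95, 0.7]
u = [1.0, -0.5, 2.0, 0.25]

seq = scan_sequential(a, u)
closed = scan_closed_form(a, u)
print(all(abs(x - y) < 1e-9 for x, y in zip(seq, closed)))
```

Dividing by P_t can underflow on long sequences when the decay factors are below 1, so practical versions of this trick often work in log space or in chunks; whether SM1 does so is not stated in the summary.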

r/MachineLearning · 13h ago · 7 · rag workflow benchmark

This post demonstrates practical RAG optimization techniques including tiered retrieval scoring, corpus-quality awareness metrics, and empirical results across three real-world datasets with varying content density. The author introduces a 'yield score' metric to predict generation quality and notes that semantic relevance still performs reasonably well even on thin, positioning-heavy corpora—a pattern RAG benchmarks typically don't account for.

Latent Space · 14h ago · 6 · agent workflow api update

The industry is shifting from models as the primary product to agents as integrated systems combining models, harnesses, UI, and workflows. Major players (OpenAI, AI21, DeepSeek) are building dedicated agent teams and reducing their focus on standalone models; shipping examples such as OpenAI's Codex updates and Claude's auto-mode expansion show product differentiation moving beyond model quality alone.

r/MachineLearning · 14h ago · 7 · tutorial prompt engineering

A hands-on explanation of LLM architecture breaking down how token prediction works through embeddings, positional encoding, attention, and the LM Head—using a simple 4-sentence example to illustrate why models predict contextually appropriate tokens. Demystifies transformer mechanics by focusing on the core probability matching problem rather than advanced concepts, making it accessible for engineers learning from first principles.
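
The last step of that pipeline, the LM Head, can be sketched in a few lines: it turns the final hidden vector into one score (logit) per vocabulary token, and a softmax turns those scores into probabilities. The vocabulary, the 2-dimensional hidden state, and the weight values below are made-up toy numbers, not taken from the post.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lm_head(hidden, weights):
    # One logit per token: dot product of the hidden state with that token's weight row.
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights]


vocab = ["cat", "mat", "dog"]
hidden = [1.0, 2.0]                               # hypothetical final hidden state
weights = [[0.1, 0.2], [0.5, 1.0], [-0.3, 0.1]]   # hypothetical LM Head rows, one per token

probs = softmax(lm_head(hidden, weights))
next_token = vocab[probs.index(max(probs))]       # greedy pick of the most likely token
print(next_token, [round(p, 3) for p in probs])
```

In a real model the hidden state comes out of the attention layers, so the same LM Head yields different distributions for different contexts.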

r/LocalLLaMA · 15h ago · 8 · new model open source tool inference

LongCat-Video-Avatar 1.5 is an open-source framework for audio-driven human video generation with production-ready stability, supporting multiple input modalities (Audio-Text-to-Video, Audio-Text-Image-to-Video, Video Continuation) and compatible with Diffusers/Transformers libraries. The release includes comprehensive technical documentation, integration guides, and a detailed human evaluation benchmark across 6 application scenarios with both subjective and objective quality metrics.

r/MachineLearning · 18h ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture that adds persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU), which shows a 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.

HuggingFace Blog · 18h ago · 8 · new model inference open source tool

NVIDIA introduces Nemotron-Labs Diffusion, a new family of diffusion language models that generate multiple tokens in parallel and iteratively refine them, addressing latency bottlenecks in autoregressive generation. These models offer 3x-4x speedups on modern GPUs, support multiple generation modes (autoregressive, diffusion, self-speculation), and are available in 3B-14B scales with open licensing and training code via the Megatron framework.

Anthropic Research · 18h ago · 7 · new model benchmark tool

Anthropic's Project Glasswing has discovered 10,000+ high/critical vulnerabilities in critical infrastructure software using Claude Mythos Preview, demonstrating AI's capability in automated security testing at scale. The post discusses Mythos Preview's vulnerability detection performance, coordination challenges with the 90-day disclosure timeline, and implications for AI-assisted security workflows.

r/MachineLearning · 21h ago · 6 · inference deployment workflow

Discussion of whether to build a custom lightweight image encoder for video frame classification instead of using foundation models like CLIP/DINO, with a focus on CPU inference speed and deployment constraints. The poster describes a practical pipeline that processes video streams through embeddings into a small transformer, and asks whether custom training on domain-specific data (a few million images, 4-5 labels) would improve both speed and accuracy versus established encoders.

HuggingFace Blog · 1d ago · 8 · fine tuning benchmark open source inference

Dharma released DharmaOCR, a pair of specialized 3B-parameter language models that outperform frontier APIs on structured OCR tasks while being significantly cheaper to operate, challenging the industry assumption that the largest models are always best. The article explores how specialization, fine-tuning pipelines, and distributional alignment can yield better performance and cost-efficiency than scaling parameters, supported by benchmarks and research across multiple domains.

r/MachineLearning · 1d ago · 8 · new model open source tool deployment

NuExtract3 is a new 4B open-weight model (Apache-2.0) purpose-built for document understanding tasks like PDF extraction, table recognition, and structured data extraction from complex layouts. It is immediately practical, with a free HuggingFace Space, multiple quantization options (GPTQ, W8A8, FP8, Q4, Q6), and low resource requirements (4GB VRAM), making it a viable local alternative to API-based document extraction pipelines.

r/MachineLearning · 1d ago · 7 · benchmark workflow agent

Community discussion identifying gaps between standard benchmarks and real-world AI system robustness, particularly around ambiguous intent, context handling, and multi-turn sessions. Highlights the disconnect between optimizing for clean evaluation metrics and building production-resilient systems.

OpenAI Blog · 1d ago · 6 · tool deployment workflow

Virgin Atlantic leveraged OpenAI's Codex to accelerate mobile app development under tight deadline constraints, achieving high test coverage and production quality. The case study demonstrates practical application of AI code generation for shipping real-world products with strong quality metrics.

Latent Space · 1d ago · 7 · tool deployment agent inference

Daytona provides cloud-based sandboxed compute infrastructure optimized for AI agents, offering stateful environments that spin up instantly and run at large scale (850k+ sandboxes/day). The infrastructure supports agentic workflows that need composable computers, with dynamic resource scaling, a bare-metal architecture, and startup times of about 60ms, addressing the emerging market gap between traditional code execution and agent-specific compute needs.