AIPULSE
Live

Discover

Latest research papers and top Hacker News stories

Latest Papers (24)
1
Robotics · AI · Vision · cs.MA

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

Human driving behavior is inherently personal: it is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent. To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driv…

Zehao Wang, Huaide Jiang, Shuaiwu Dong·1 day ago
2
AI · NLP · cs.IR

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once…

Yuxing Lu, Xukai Zhao, Wei Wu·1 day ago
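The write-back step described above can be illustrated with a toy sketch. The index layout and the `distill` callable (which would be an LLM call in practice) are assumptions for illustration, not WriteBack-RAG's actual interfaces:

```python
def write_back(index, example, retrieved_docs, relevant_ids, distill):
    """For a labeled example where retrieval succeeded, isolate the
    relevant documents, distill them into a compact knowledge unit,
    and append the unit so it is indexed alongside the original corpus."""
    relevant = [d for d in retrieved_docs if d["id"] in relevant_ids]
    if not relevant:
        return index  # retrieval failed for this example; corpus unchanged
    unit = {
        "id": f"ku-{len(index)}",
        "query": example["query"],
        "text": distill(example["query"], [d["text"] for d in relevant]),
        "source_ids": [d["id"] for d in relevant],
    }
    index.append(unit)
    return index

# Toy usage: "distillation" is just concatenation here.
docs = [{"id": "d1", "text": "fact A"}, {"id": "d2", "text": "irrelevant"}]
units = write_back([], {"query": "q1"}, docs, {"d1"},
                   lambda q, texts: " ".join(texts))
```

Because the enrichment touches only the corpus, the retriever and generator stay frozen, which is what lets the method be applied once offline.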
3
Vision · AI

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the generation history through a novel three-partition KV-cache strategy. Specifically, we categorize the historical context into three distinct types: (1) Sink tokens, which preserve early anchor frames at full resolution to maintain global semantics; (2) Mid tokens, which achieve…

Xiaofeng Mao, Shaohao Rui, Kaining Ying·1 day ago
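The three-way split of the generation history can be sketched as a toy partition over a frame list. The partition sizes and the third ("recent") bucket are illustrative assumptions; the preview above is cut off before it describes how Mid tokens are actually treated:

```python
def partition_kv_cache(frames, n_sink, n_recent):
    """Toy three-way split of a frame history: early anchor frames are
    kept in full ("sink"), the newest frames are kept in full ("recent"),
    and everything in between ("mid") is the candidate for compression,
    which keeps cache growth bounded instead of linear."""
    sink = frames[:n_sink]                        # early anchors, full resolution
    recent = frames[len(frames) - n_recent:] if n_recent else []
    mid = frames[n_sink:len(frames) - n_recent]   # compressible middle
    return sink, mid, recent

sink, mid, recent = partition_kv_cache(list(range(10)), n_sink=2, n_recent=3)
# sink == [0, 1], mid == [2, 3, 4, 5, 6], recent == [7, 8, 9]
```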
4
Vision · AI

PixelSmile: Toward Fine-Grained Facial Expression Editing

Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off between expression editing and identity preservation. We propose PixelSmile, a diffusion framework that disentangles expression semantics via fully symmetric joint training. PixelSmile combines intensity supervision with contrastive learning to produce stronger and more distinguis…

Jiabin Hua, Hengyuan Xu, Aojie Li·1 day ago
5
AI · cs.MM

Back to Basics: Revisiting ASR in the Age of Voice Agents

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which conditions, in which languages, will cause what degree of degradation. We introduce WildASR, a multilingual (four-language) diagnostic benchmark sourced entirely from real human speech that factorizes ASR robustness along three axes: environmental degradation, demographic shift, and l…

Geeyang Tay, Wentao Ma, Jaewon Lee·1 day ago
6
NLP · AI

Natural-Language Agent Harnesses

Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externalized as a portable executable artifact. We introduce Natural-Language Agent Harnesses (NLAHs), which express harness behavior in editable natural language, and Intelligent Harness Runtime (IHR), a shared runtime that executes these harnesses through explicit contr…

Linyue Pan, Lexiao Zou, Shuo Guo·1 day ago
7
Vision · ML

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard negative samples. Hard negatives have been shown to improve performance on compositionality tasks, but are often specific to a single benchmark, do not generalize, and can cause substantial degradation of basic V&L capabilities such as zero-shot or retrieval performance…

Hai X. Pham, David T. Hoffmann, Ricardo Guerrero·1 day ago
8
AI · Vision

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify systematic biases, we show that cross-modal inconsistency provides a rich and natural signal for learning. We introduce R-C2, a reinforcement learning framework that resolves internal conflicts by enforcing cross-modal cycle consistency. By requiring a model to perform backward in…

Zirui Zhang, Haoyu Dong, Kexin Pei·1 day ago
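A cycle-consistency reward of the flavor described above can be sketched generically: map an input to the other modality and back, then reward agreement with the original. The lookup-table "modalities" and the exact-match scoring function are toy stand-ins, not the paper's RL objective:

```python
def cycle_consistency_reward(x, forward, backward, agree):
    """Reward the round trip: translate x into the other modality and
    back, then score agreement between x and its reconstruction."""
    y = forward(x)          # e.g. visual concept -> text description
    x_rec = backward(y)     # e.g. text description -> visual concept
    return agree(x, x_rec)  # in [0, 1]; 1 means perfectly consistent

# Toy modalities: dict lookups stand in for the forward/backward models.
fwd = {"cat": "a small furry feline"}.get
bwd = {"a small furry feline": "cat"}.get
reward = cycle_consistency_reward("cat", fwd, bwd,
                                  lambda a, b: float(a == b))
# reward == 1.0
```

An inconsistent backward model would score 0.0, and it is exactly that gap that the RL objective turns into a training signal.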
9
AI · ML · cs.AR

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

We present an empirical study of how far general-purpose coding agents -- without hardware-specific training -- can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage 1, the pipeline decomposes a design into sub-kernels, independently optimizes each using pragma and code-level transformations, and formulates an Integer Linear Program (ILP) to assemble globally promising configurations under an area constraint. In Stage 2, it launches N exper…

Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri·1 day ago
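The Stage 1 assembly step (pick one optimized configuration per sub-kernel to maximize estimated benefit under a total area budget) is structurally a multiple-choice knapsack. A brute-force sketch with made-up area/benefit numbers, where exhaustive search stands in for the ILP solver used at scale:

```python
from itertools import product

def assemble(configs_per_kernel, area_budget):
    """Choose one (area, benefit) configuration per sub-kernel,
    maximizing total benefit subject to sum(area) <= area_budget.
    Brute force over the cross product; an ILP solves this at scale."""
    best, best_benefit = None, float("-inf")
    for choice in product(*configs_per_kernel):
        area = sum(a for a, _ in choice)
        benefit = sum(b for _, b in choice)
        if area <= area_budget and benefit > best_benefit:
            best, best_benefit = choice, benefit
    return best, best_benefit

configs = [[(3, 5), (1, 2)],   # sub-kernel A: (area, benefit) options
           [(2, 4), (4, 9)]]   # sub-kernel B
choice, benefit = assemble(configs, area_budget=5)
# choice == ((1, 2), (4, 9)), benefit == 11
```

Note how the globally best assembly takes the cheaper option for A so it can afford the high-benefit option for B; that cross-kernel trade-off is what the ILP formulation captures.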
10
Vision · AI

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing subjects. To address this, we introduce Hybrid Memory, a novel paradigm requiring models to simultaneously act as precise archivists for static backgrounds and vigilant trackers for dynamic subjects, ensuring motion continuity during out-of-view intervals. To facilitate research in this direction, we…

Kaijin Chen, Dingkang Liang, Xin Zhou·1 day ago
11
ML · AI

Neural Network Conversion of Machine Learning Pipelines

Transfer learning and knowledge distillation have recently gained a lot of attention in the deep learning community. One transfer approach, student-teacher learning, has been shown to successfully create "small" student neural networks that mimic the performance of much bigger and more complex "teacher" networks. In this paper, we investigate an extension to this approach and transfer from a non-neural machine learning pipeline as teacher to a neural network (NN) student, which would allow for joint optimization of the various pipeline components and a single unified inference e…

Man-Ling Sung, Jan Silovsky, Man-Hung Siu·1 day ago
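Student-teacher learning typically minimizes a divergence between the teacher's soft outputs and the student's tempered softmax. A minimal plain-Python sketch of that distillation loss; the temperature value and the toy targets are illustrative, and the paper's teacher would be a non-neural pipeline rather than another network:

```python
import math

def soft_cross_entropy(teacher_probs, student_logits, T=2.0):
    """Distillation loss: cross-entropy of the student's temperature-
    scaled softmax against the teacher's soft targets."""
    scaled = [z / T for z in student_logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    student_probs = [e / total for e in exps]
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))

# A student whose logits agree with the teacher incurs a lower loss
# than one whose logits are reversed.
teacher = [0.7, 0.3]
aligned = soft_cross_entropy(teacher, [2.0, 0.0])
reversed_ = soft_cross_entropy(teacher, [0.0, 2.0])
```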
12
AI · cs.SE

The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase

Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an LLM agent exercises that surface as a synthetic power user at 1,000x human cadence; (3) Unbeatable Tests, ground-truth verification the code author cannot fake; and (4) Drift Control, continuous quality measurement with automated pause gates. We validate across two production syst…

Yannick Roy·1 day ago
13
ML · AI · cs.AR

A Unified Memory Perspective for Probabilistic Trustworthy AI

Trustworthy artificial intelligence increasingly relies on probabilistic computation to achieve robustness, interpretability, security and privacy. In practical systems, such workloads interleave deterministic data access with repeated stochastic sampling across models, data paths and system functions, shifting performance bottlenecks from arithmetic units to memory systems that must deliver both data and randomness. Here we present a unified data-access perspective in which deterministic access is treated as a limiting case of stochastic sampling, enabling both modes to be analyzed within a c…

Xueji Zhao, Likai Pei, Jianbo Liu·1 day ago
14
ML

On Neural Scaling Laws for Weather Emulation through Continual Training

Neural scaling laws, which in some domains can predict the performance of large neural networks as a function of model, data, and compute scale, are the cornerstone of building foundation models in Natural Language Processing and Computer Vision. We study neural scaling in Scientific Machine Learning, focusing on models for weather forecasting. To analyze scaling behavior in as simple a setting as possible, we adopt a minimal, scalable, general-purpose Swin Transformer architecture, and we use continual training with constant learning rates and periodic cooldowns as an efficient training strat…

Shashank Subramanian, Alexander Kiefer, Arnur Nigmetov·1 day ago
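The "constant learning rate with periodic cooldowns" strategy can be sketched as a step schedule. The cycle length, cooldown length, and linear decay shape below are assumptions for illustration; the paper's exact schedule may differ:

```python
def lr_schedule(step, base_lr=1e-3, cycle=1000, cooldown=100):
    """Hold the learning rate constant for most of each cycle, decay
    linearly to near zero over the final `cooldown` steps (to get a
    measurable checkpoint), then reset and continue training."""
    pos = step % cycle
    if pos < cycle - cooldown:
        return base_lr                    # constant-LR phase
    remaining = cycle - pos               # in (0, cooldown]
    return base_lr * remaining / cooldown # linear cooldown
```

The appeal for scaling studies is that one long constant-LR run yields many comparable loss measurements (one per cooldown) without retraining from scratch at each compute budget.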
15
Vision · AI

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties performance to large batches and hard negative mining, and it ignores both the geometric structure of maps and the coverage mismatch between street-view and overhead imagery. In particular, salient landmarks visible from the street view can fall outside a fixed satellite crop, makin…

Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz·1 day ago
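For context, the retrieval formulation the abstract argues against reduces to nearest-neighbor search in a shared embedding space. A minimal sketch with hand-made 2-D embeddings; real systems use learned high-dimensional features and approximate search:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def retrieve(street_emb, tile_embs):
    """Retrieval-style CVGL baseline: return the index of the overhead
    tile whose embedding is most similar to the street-view embedding."""
    return max(range(len(tile_embs)),
               key=lambda i: cosine(street_emb, tile_embs[i]))
```

Everything hinges on the embedding space being well trained, which is exactly where the large-batch and hard-negative requirements the abstract criticizes come from.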
16
NLP · AI · cs.CY

Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance comparable or superior to trained human raters, but have frequently been demonstrated to be vulnerable to the influence of construct-irrelevant factors (i.e., features of responses that are unrelated to the construct assessed) and adversarial conditions. Given the rising usage of large language models in automated scoring systems, there is a renewed focus on "hallucinations" and the robustness of these LLM-ba…

Cole Walsh, Rodica Ivan·1 day ago
17
ML

Longitudinal Digital Phenotyping for Early Cognitive-Motor Screening

Early detection of atypical cognitive-motor development is critical for timely intervention, yet traditional assessments rely heavily on subjective, static evaluations. The integration of digital devices offers an opportunity for continuous, objective monitoring through digital biomarkers. In this work, we propose an AI-driven longitudinal framework to model developmental trajectories in children aged 18 months to 8 years. Using a dataset of tablet-based interactions collected over multiple academic years, we analyzed six cognitive-motor tasks (e.g., fine motor control, reaction time). We appl…

Diego Jimenez-Oviedo, Ruben Vera-Rodriguez, Ruben Tolosana·1 day ago
18
ML · cs.SE

Uncertainty-Guided Label Rebalancing for CPS Safety Monitoring

Safety monitoring is essential for Cyber-Physical Systems (CPSs). However, unsafe events are rare in real-world CPS operations, creating an extreme class imbalance that degrades safety predictors. Standard rebalancing techniques perform poorly on time-series CPS telemetry, either generating unrealistic synthetic samples or overfitting on the minority class. Meanwhile, behavioral uncertainty in CPS operations, defined as the degree of doubt or uncertainty in CPS decisions, is often correlated with safety outcomes but remains unexplored in safety monitoring. To that end, we propose U-Balance, a supervi…

John Ayotunde, Qinghua Xu, Guancheng Wang·1 day ago
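The core idea, rebalancing guided by behavioral uncertainty rather than synthetic oversampling, can be sketched as a sample-weighting function. This is a generic uncertainty-weighted scheme under assumed inverse-frequency class weights, not U-Balance's actual formulation:

```python
def sample_weights(labels, uncertainties, minority_label=1, alpha=1.0):
    """Inverse-frequency class weights, with minority-class samples
    further scaled by behavioral uncertainty so that rare unsafe events
    with uncertain behavior dominate the rebalanced distribution."""
    n = len(labels)
    n_min = sum(1 for y in labels if y == minority_label) or 1
    n_maj = (n - n_min) or 1
    weights = []
    for y, u in zip(labels, uncertainties):
        if y == minority_label:
            weights.append((n / (2 * n_min)) * (1.0 + alpha * u))
        else:
            weights.append(n / (2 * n_maj))
    return weights

w = sample_weights([0, 0, 0, 1], [0.0, 0.0, 0.0, 0.5])
# w[3] == 3.0: inverse-frequency weight 2.0, scaled by (1 + 1.0 * 0.5)
```

The weights can feed a weighted loss or a weighted sampler; either way, no synthetic minority samples are generated.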
19
Robotics · AI · cs.HC

A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots

This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms. By holding behavior constant while varying the explanatory frame, the platform provides a controlled way to investigate how language and framing shape the adoption of the intentional stance in robotics.

Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier·2 days ago
20
NLP · AI · ML · cs.CY

Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers

Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the similarities among different LLMs, experiments show that current classifiers struggle to accurately determine which specific model generated a given text in multi-class classification tasks. Meanwhile, variations across LLMs also result in evolving patterns of word usage in acad…

Mingmeng Geng, Yuhang Dong, Thierry Poibeau·2 days ago
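The underlying measurement is word-frequency counting over title/abstract corpora. A minimal sketch with toy titles (the normalization and tokenizer here are simple assumptions; the study's actual methodology is not reproduced):

```python
from collections import Counter
import re

def token_freq(texts, words):
    """Frequency per 1000 tokens of each target word across a corpus."""
    counts, total = Counter(), 0
    for t in texts:
        toks = re.findall(r"[a-z]+", t.lower())
        total += len(toks)
        counts.update(tok for tok in toks if tok in words)
    return {w: 1000 * counts[w] / total for w in words}

titles = ["Beyond Retrieval: Alignment via Distillation",
          "A Study of the Dynamics of Training"]
freqs = token_freq(titles, {"beyond", "via", "the"})
```

Comparing such frequencies across submission years is what surfaces the "beyond"/"via" rise and "the"/"of" decline the abstract reports.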
21
ML

Anchored-Branched Steady-state WInd Flow Transformer (AB-SWIFT): a metamodel for 3D atmospheric flow in urban environments

Air flow modeling at a local scale is essential for applications such as pollutant dispersion modeling or wind farm modeling. To circumvent costly Computational Fluid Dynamics (CFD) computations, deep learning surrogate models have recently emerged as promising alternatives. However, in the context of urban air flow, deep learning models struggle to adapt to the high variations of the urban geometry and to large mesh sizes. To tackle these challenges, we introduce Anchored Branched Steady-state WInd Flow Transformer (AB-SWIFT), a transformer-based model with an internal branched structure uniq…

Armand de Villeroché, Rem-Sophia Mouradi, Vincent Le Guen·2 days ago
22
AI

Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?

Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship using the GSM8K and MATH subsets of PROCESSBENCH, a human-annotated benchmark for identifying the earliest erroneous step in mathematical reasoning. We evaluate two LLM-based math tutor agent settings, instantiated with GPT-4 and GPT-5, in two independent tasks on the same math pro…

Liang Zhang, Yu Fu, Xinyi Jin·2 days ago
23
Vision · ML

LanteRn: Latent Visual Structured Reasoning

While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches take steps toward thinking with images by invoking tools or generating intermediate images, they either rely on external modules, or incur unnecessary computation by reasoning directly in pixel space. In this paper, we introduce LanteRn, a framework that enables LMMs to interlea…

André G. Viveiros, Nuno Gonçalves, Matthias Lindemann·2 days ago
24
AI · cs.HC

Visual or Textual: Effects of Explanation Format and Personal Characteristics on the Perception of Explanations in an Educational Recommender System

Explanations are central to improving transparency, trust, and user satisfaction in recommender systems (RS), yet it remains unclear how different explanation formats (visual vs. textual) are suited to users with different personal characteristics (PCs). To this end, we report a within-subject user study (n=54) comparing visual and textual explanations and examine how explanation format and PCs jointly influence perceived control, transparency, trust, and satisfaction in an educational recommender system (ERS). Using robust mixed-effects models, we analyze the moderating effects of a wide rang…

Qurat Ul Ain, Mohamed Amine Chatti, Nasim Yazdian Varjani·2 days ago
Hacker News AI (24)
1
426

My minute-by-minute response to the LiteLLM malware attack

HN·2 days ago
2
359

Anatomy of the .claude/ folder

HN·about 20 hours ago
3
321

AI got the blame for the Iran school bombing. The truth is more worrying

HN·about 18 hours ago
4
319

Show HN: I put an AI agent on a $7/month VPS with IRC as its transport layer

HN·1 day ago
5
256

We rewrote JSONata with AI in a day, saved $500k/year

HN·1 day ago
6
229

HyperAgents: Self-referential self-improving agents

HN·4 days ago
7
206

Iran-linked hackers breach FBI director's personal email

HN·about 20 hours ago
8
182

Everything old is new again: memory optimization

HN·5 days ago
9
148

DOJ confirms FBI Director Kash Patel's personal email was hacked

HN·about 13 hours ago
10
122

Agent-to-agent pair programming

HN·1 day ago
11
108

Anthropic Subprocessor Changes

HN·1 day ago
12
77

Netflix raises prices for every subscription tier by up to 12.5 percent

HN·about 19 hours ago
13
77

Claude loses its >99% uptime in Q1 2026

HN·about 20 hours ago
14
74

21,864 Yugoslavian .yu domains

HN·3 days ago
15
67

Chroma Context-1: Training a Self-Editing Search Agent

HN·1 day ago
16
53

Why are executives enamored with AI, but ICs aren't?

HN·about 12 hours ago
17
44

Show HN: Open-Source Animal Crossing–Style UI for Claude Code Agents

HN·about 18 hours ago
18
38

HandyMKV for MakeMKV and HandBrake Automation

HN·1 day ago
19
29

AI bug reports went from junk to legit overnight, says Linux kernel czar

HN·about 14 hours ago
20
23

CERN uses tiny AI models burned into silicon for real-time LHC data filtering

HN·about 3 hours ago
21
21

Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)

HN·2 days ago
22
16

Memory chip stocks shed $100B as AI-driven shortage trade unwinds

HN·about 14 hours ago
23
13

Solving Semantle with the Wrong Embeddings

HN·4 days ago
24
13

Namespace: We've raised $23M to build the compute layer for code

HN·about 16 hours ago