AIPULSE
Discover

Latest research papers and top Hacker News stories

Latest Papers (24)
1
AI, cs.DC

Designing Datacenter Power Delivery Hierarchies for the AI Era

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an effici

Grant Wilkins, Fiodar Kazhamiaka, Alok Gautam Kumbhare·3 days ago
2
NLP, AI, ML, cs.DB

A Generative AI Framework for Intelligent Utility Billing, CO2 Analytics, and Sustainable Resource Optimisation

Distribution utilities are now expected to deliver bills that customers can actually read, attach a defensible carbon number to every kWh sold, and schedule load against grid stress and emissions constraints. We propose an end-to-end framework that unifies four production-grade capabilities under one architectural roof: a generative-AI agent that drafts each customer's natural-language billing statement from structured numeric inputs under a constrained decoding policy; a transformer-based forecaster that supplies the day-ahead consumption estimate with calibrated quantile bands

Pavan Manjunath, Thomas Pruefer·3 days ago
3
AI, NLP, ML, cs.CY

AI-Mediated Communication Can Steer Collective Opinion

Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X. While prior work has shown that AI can express biased opinions and shape individuals' opinions during human-AI interactions, less attention has been paid to its influence on collective opinion formation when mediating human-to-human communication. We address this gap via a combination of empirical and theoretical analyses. We show empirically that LLMs from multi

Stratis Tsirtsis, Kai Rawal, Chris Russell·3 days ago
4
Vision, AI

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation

Billion-parameter Vision-Language-Action (VLA) policies have recently shown impressive performance in robotic manipulation, yet their size and inference cost remain major obstacles for real-time closed-loop control. We introduce VLA-AD, a distillation framework that uses a Vision-Language Model as an offline semantic supervisor to transfer large VLA teachers into lightweight student policies. Instead of relying only on low-level action imitation, VLA-AD augments teacher-provided 7-DoF action targets with high-level semantic guidance, including task phase anchors and multi-frame operat

Jin Shi, Brady Zhang, Yishun Lu·3 days ago
5
ML

Dynamics-Level Watermarking of Flow Matching Models with Random Codes

We introduce a dynamics-level approach to watermarking generative models. Rather than embedding signals into model weights or outputs, we embed the watermark directly into the learned continuous dynamics, namely the velocity field of a flow matching model. We formulate this as random coding over a continuous channel: a key-dependent perturbation is added during training, and the message is recovered at detection time from black-box queries. The perturbation is designed to leave the generated distribution unchanged. Experiments on MNIST and CIFAR-10 across different architectures confirm reliable me

Shuchan Wang·3 days ago
6
AI

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we present an autonomous system using Large Language Model (LLM)-guided tree search to iteratively generate, evaluate, and optimize executable forecasting software. In a fully prospective, real-time evaluation during the 2025-2026 US respiratory season, the system autonomously discovered methodologically diverse models for influe

Sarah Martinson, Michael P. Brenner, Martyna Plomecka·3 days ago
7
ML, AI, NLP

Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find

When researchers ask whether two transformer layers are "equivalent" for compression, they often conflate distinct tests. Replacement asks whether one layer's map can substitute for another's in place; interchange asks whether two layers approximately commute when their positions are swapped. Both are output-grounded swap-KL probes, but they need not agree: on pretrained transformers the protocol gap can change which layers look safe to prune by several-fold under the same evaluator, especially when replacement distances are high. We measure both protocols across checkpoints and architectures.

Gabriel Garcia·3 days ago
8
AI, NLP, ML, cs.MA

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed), with an outer loop

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz·3 days ago
9
NLP, AI, ML, cs.ET

A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation

The accelerating convergence of smart metering, generative artificial intelligence, and quantum-inspired combinatorial optimisation is reshaping how energy utilities manage physical infrastructure, customer engagement, and environmental accountability.

Pavan Manjunath, Thomas Pruefer·3 days ago
10
ML

Universal Magnetic Structure Prediction from Atomic Coordinates with Near-Experimental Accuracy

Magnetic order is a fundamental property of materials, governing collective behavior and enabling a broad range of functionalities. Yet magnetic structure remains difficult to determine: experiments are costly and specialized, while first-principles methods often struggle with the noncollinear and incommensurate orders found in real materials. Here we introduce magnetic structure network (MSN), an E(3) equivariant graph neural network that predicts both collinear and non-collinear magnetic structures directly from atomic crystal structures, trained directly on experimentally determined structu

Abhijatmedhi Chotrattanapituk, Ryotaro Okabe, Eunbi Rha·3 days ago
11
AI, Vision, cs.GR

Evaluating Design Video Generation: Metrics for Compositional Fidelity

Generative video models are increasingly used in design animation tasks, yet no standardized evaluation framework exists for this domain. Unlike natural video generation, design animation imposes structured constraints: specific components shall animate with prescribed motion types, directions, speed and timing, while non-animated regions must remain stable and layout structure must be preserved. This paper provides a fully automated evaluation framework organized across four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity. This eliminates the reliance o

Adrienne Deganutti, Dingning Cao, Jaejung Seol·3 days ago
12
NLP, ML

Artificial Aphasias in Lesioned Language Models

Aphasias, selective language impairments which can arise from brain damage, reveal the functional organization of human language by providing causal links between affected brain regions and specific symptom profiles. Drawing on this literature, we introduce an aphasia-inspired technique to characterize the emergent functional organization of language models (LMs). We "lesion" (zero-out) model parameters and measure the effects of this intervention against clinical aphasia symptoms, as diagnosed by the Text Aphasia Battery (TAB). When applied to 112,426 outputs from five 1B-scale LMs, the ful

Nathan Roll, Jill Kries, Laura Gwilliams·3 days ago
13
ML

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

Differential privacy changes the effective sample size governing CVaR learning. For tail mass $\tau$, the privacy-relevant sample size is not $n$, but $n\tau$; equivalently, the effective private tail sample size is $\varepsilon n\tau$. Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. This decomposition is complete for scalar estimation and finite classes: scalar estimation has rate $\Theta(B \min\{1,(n\tau)^{-1/2}+(\varepsilon n\tau)^{-1}\})$, and finite classes of size $M$ have rate $\Theta(B \min\{1,\sqrt{\log(2M)/(n\tau)}+\log(2M)/(\varepsilon n\tau)\})$. These complete rates hold under pure DP, and the

El Mustapha Mansouri·3 days ago
14
NLP, AI, cs.IR

Argus: Evidence Assembly for Scalable Deep Research Agents

Deep research agents have achieved remarkable progress on complex information-seeking tasks. Even long ReAct-style rollouts explore only a single trajectory, while recent state-of-the-art systems scale inference-time compute via parallel search and aggregation. Yet deep research answers are composed of complementary pieces of evidence, which parallel rollouts often duplicate rather than complete, yielding diminishing returns while pushing the aggregation context toward the model's limit. We propose Argus, an agentic system in which a Searcher and a Navigator cooperate to treat deep research as

Zhen Zhang, Liangcai Su, Zhuo Chen·3 days ago
15
AI, NLP

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. Most "open" models are open-weight only, releasing parameters while withholding the data provenance, curation procedures, and generation pipelines that determine model behavior. Fully Open (FO) models, which expose the complete training stack end-to-end, do not currently exist in medicine. We introduce Fully Open Meditron, the first fully open pipeline for building LLM-CDSS, comprising a clinician-audited training corpu

Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan·3 days ago
16
ML

Hypothesis-driven construction of mesoscopic dynamics

Traditional scientific modeling typically begins with fixed, instance-wise effective equations and then carries out equation-specific analysis and computation, a procedure that becomes exceptionally challenging in complex applications such as multiscale systems. We propose an alternative paradigm by learning mesoscopic dynamics within a mathematically constrained hypothesis class. Building upon a generalized Onsager principle, we introduce a unified framework encompassing both dissipative and conservative mesoscopic dynamics. We establish uniform and a priori theoretical guarantees, including

Zhuoyuan Li, Aiqing Zhu, Qianxiao Li·3 days ago
17
ML

A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

Flexible continuous-time survival modeling is critical for capturing complex time-varying hazard dynamics in high-dimensional data; however, training such models remains challenging due to the intractable integral required for likelihood estimation. We introduce QSurv, a scalable deep learning framework that enables nonparametric continuous-time modeling without relying on time discretization or restrictive distributional assumptions. We propose a training objective based on Gauss-Legendre numerical quadrature, which approximates the cumulative hazard with high-order accuracy while facilitatin

Chaeyeon Lee, Sehwan Kim, Hyungrok Do·3 days ago
18
AI, NLP

Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Effective tutoring requires distinguishing optimal, valid but suboptimal, and incorrect student solutions, a distinction central to intelligent tutoring systems (ITS) but untested for LLM-based tutors. As LLMs are increasingly explored as conversational complements to ITS, evaluating their diagnostic precision is essential. We present a benchmark of seven LLM feedback agents in propositional logic using knowledge-graph-derived ground truth across 10,836 solution-feedback pairs and three feedback conditions. Models achieved near-ceiling performance on optimal steps but systematically over-reje

Tahreem Yasir, Wenbo Li, Sam Gilson·3 days ago
19
AI, NLP, ML, cs.MA

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs. We present a controlled study of compound LLM agent design in CybORG CAGE-2, a cyber defense environment modeled as a Partially Observable Markov Decision Process (POMDP). Reward is non-positive, so all configurations operate in a failure-mitigation

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz·3 days ago
20
AI, ML, cs.CY

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing. Combining principles from formal methods with SoTA machine learning, we propose techniques that enable AI-enabled product and service developers, as well as third party AI developers and evaluators, to perform offline auditing and online (runtime) monitoring of product-specific (temporally extended) behavioral constraints such as safety constraints, norms, rules and regulations with resp

Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith·3 days ago
21
AI, cs.DL

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2),

Arquimedes Canedo·3 days ago
22
ML, cs.DC

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remove this bottleneck by separating second-order optimization logic from the critical GPU training path. Rather than keeping all preconditioner state on the accelerator, Asteria dynamically distributes optimizer state across GPU memory, CPU memory, and optional NVMe storage according to architectural constraints and runtim

Yishun Lu, Junhao Zhang, Zeyu Yang·3 days ago
23
ML

Imitation learning for clinical decision support in pediatric ECMO

Pediatric critical care is a dynamic, high-stakes process involving constant monitoring and adjustments in life-saving treatments. Modeling these interventions is crucial for effective decision support. To address the challenges of high complexity and data scarcity in pediatric Extracorporeal Membrane Oxygenation (ECMO), we frame clinical decision-making as learning to act from trajectories, i.e., imitation learning that learns action models from observational data, with a key feature that actions are not directly observed. We consider TabPFN, a recent transformer-based approach for tabular da

Fateme Golivand, Michael Skinner, Saurabh Mathur·3 days ago
24
ML

BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control

Real-world control systems frequently operate under piecewise stationary conditions, where dynamics remain stable for extended periods before undergoing abrupt regime changes. Standard robust RL methods face a fundamental dilemma: a globally conservative policy wastes performance during stable periods, while a locally adaptive policy risks catastrophic failure when the regime changes undetected. We propose BAPR (Bayesian Amnesic Piecewise-Robust SAC), which unifies Bayesian Online Change Detection (BOCD) with robust ensemble RL. The BAPR operator -- a convex combination of mode

Yifan Zhang, Liang Zheng·3 days ago
Hacker News AI (24)
1
1.7k

Claude Opus 4.7

HN·about 1 month ago
2
1.4k

Google Chrome silently installs a 4 GB AI model on your device without consent

HN·13 days ago
3
1.3k

GPT-5.5

HN·25 days ago
4
1.3k

Project Glasswing: Securing critical software for the AI era

HN·about 1 month ago
5
1.2k

The Claude Code Source Leak: fake tools, frustration regexes, undercover mode

HN·about 2 months ago
6
1.1k

Claude Code refuses requests or charges extra if your commits mention "OpenClaw"

HN·18 days ago
7
1.1k

Tell HN: Docker pull fails in Spain due to football Cloudflare block

HN·about 1 month ago
8
1.1k

Issue: Claude Code is unusable for complex engineering tasks with Feb updates

HN·about 1 month ago
9
1.0k

Local AI needs to be the norm

HN·8 days ago
10
1.0k

Claude Design

HN·about 1 month ago
11
903

Canvas online again as ShinyHunters threatens to leak schools’ data

HN·11 days ago
12
883

Copy Fail

HN·19 days ago
13
864

I cancelled Claude: Token issues, declining quality, and poor support

HN·24 days ago
14
857

Microsoft and OpenAI end their exclusive and revenue-sharing deal

HN·21 days ago
15
810

Localsend: An open-source cross-platform alternative to AirDrop

HN·20 days ago
16
783

Postmortem: TanStack NPM supply-chain compromise

HN·7 days ago
17
772

Eight years of wanting, three months of building with AI

HN·about 1 month ago
18
754

ChatGPT Images 2.0

HN·27 days ago
19
742

Bitwarden CLI compromised in ongoing Checkmarx supply chain campaign

HN·25 days ago
20
708

An update on recent Claude Code quality reports

HN·25 days ago
21
696

Claude Code Routines

HN·about 1 month ago
22
676

System Card: Claude Mythos Preview [pdf]

HN·about 1 month ago
23
672

Show HN: Apfel – The free AI already on your Mac

HN·about 2 months ago
24
637

Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw

HN·about 1 month ago