The Inference Report

June 20, 2026
Research Papers — Focused

Today's papers cluster around three methodological currents: the integration of bidirectional constraints and cycle-consistency to enforce physical or biological coherence between forward and inverse processes (evident in SIMBA's joint retrieval-reconstruction framework for atmospheric modeling and FKRSBM's subgroup-aware transport for tau PET harmonization); the systematic application of conditional generative models, particularly diffusion and diffusion-Schrödinger bridge variants, to domain-specific synthesis tasks where anatomical or structural fidelity matters more than raw realism (synthetic AD-MRI, FCD lesion generation, virtual monochromatic CT); and a growing recognition that evaluation frameworks must account for hidden acquisition or measurement variables that DICOM metadata and standard benchmarks fail to capture, as illustrated by work on kernel-driven measurement instability in lung nodules, resolution-agnostic encoding for native fMRI, and the need for task-specific rather than borrowed-from-vision benchmarks in cryoET. Across these clusters, state-space models (Mamba architectures) and optimal-transport formulations appear as technical solutions to capture long-range dependencies and enforce distributional alignment without destructive preprocessing, while weakly supervised and unsupervised learning frameworks, from label-proportion segmentation to physics-informed priors in inverse problems, emerge as practical responses to the annotation bottleneck that persists even as deep learning methods mature. The field is increasingly asking not whether a model achieves high accuracy on a standard split, but whether its learned representations remain valid under the measurement conditions and domain shifts that define clinical or scientific practice.

Cole Brennan

SIMBA: A Bidirectional Retrieval Forward Simulation Framework for Modeling FY-4A GIIRS Hyperspectral Infrared Radiances Toward NWP Applications eess.IV

Hyperspectral infrared observations are an important data source for numerical weather prediction (NWP) because they provide rich information on the vertical structure of atmospheric temperature and humidity. However, most existing deep learning methods mainly focus on one-way retrieval from radiances to atmospheric profiles, while the reverse radiance simulation process and the consistency between atmospheric state space and radiance observation space are insufficiently considered. In this study, we propose SIMBA, a unified bidirectional retrieval-forward simulation framework for FY-4A GIIRS hyperspectral infrared radiance modeling toward NWP applications. The framework jointly performs atmospheric profile retrieval and radiance reconstruction, introduces a cycle-consistency constraint to strengthen the coupling between the two processes, and employs a bidirectional Mamba state-space module to capture long-range dependencies along pressure levels. Using collocated FY-4A GIIRS observations and ERA5 reanalysis data, the proposed method is evaluated for temperature retrieval, specific humidity retrieval, long-wave radiance reconstruction, and medium-wave radiance reconstruction. Experimental results show that SIMBA outperforms several representative deep learning baselines across both retrieval and reconstruction tasks, while ablation experiments confirm the contribution of the bidirectional design and cycle-consistency mechanism. These results demonstrate that the proposed framework is effective for joint atmospheric profile retrieval and hyperspectral infrared radiance modeling, and suggest potential for future Jacobian-related analysis and NWP-oriented extensions.
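The coupling described in this abstract can be illustrated with a small sketch. The abstract does not give SIMBA's loss, so everything here is an assumption made for illustration: the function names `retrieve` and `simulate`, the weight `lam`, the mean-squared-error terms, and the toy linear "forward model" `A`. The idea shown is only the generic one: a supervised term for each direction plus a cycle term that penalizes disagreement after a round trip.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def joint_loss(retrieve, simulate, radiance, profile, lam=0.1):
    """Supervised retrieval + reconstruction terms plus a cycle-consistency term.

    retrieve: radiance -> profile, simulate: profile -> radiance (hypothetical names).
    lam is a hypothetical weight on the cycle term.
    """
    profile_hat = retrieve(radiance)
    radiance_hat = simulate(profile)
    supervised = mse(profile_hat, profile) + mse(radiance_hat, radiance)
    # Round trips: radiance -> profile -> radiance and profile -> radiance -> profile.
    cycle = mse(simulate(profile_hat), radiance) + mse(retrieve(radiance_hat), profile)
    return supervised + lam * cycle

# Toy linear forward model (an assumption, not a radiative transfer model).
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 2.0]])
profile = np.array([0.5, -1.0, 2.0, 0.25])
radiance = A @ profile

simulate = lambda x: A @ x
exact_retrieve = lambda y: np.linalg.solve(A, y)      # perfect inverse
biased_retrieve = lambda y: 0.9 * np.linalg.solve(A, y)  # deliberately wrong inverse

loss_exact = joint_loss(exact_retrieve, simulate, radiance, profile)
loss_biased = joint_loss(biased_retrieve, simulate, radiance, profile)
```

With a perfect inverse both the supervised and cycle terms vanish; the biased retrieval is penalized twice, once against the reference profile and once because its round trip no longer reproduces the radiance.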

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks eess.IV

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.
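For readers less familiar with the metric quoted above, the Dice score for two binary masks is 2|P ∩ T| / (|P| + |T|). The sketch below is a minimal pure-Python version; the toy masks are invented for illustration and are not data from the paper, and treating two empty masks as a perfect match is a convention assumed here.

```python
def dice_score(pred, target):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy flattened masks (hypothetical values).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
score = dice_score(pred, target)  # 2 overlapping voxels, 3 + 3 foreground voxels
```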

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI eess.IV

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

Feynman Kac Reweighted Schrödinger Bridge Matching for Surface-Based Tau PET Harmonization eess.IV

Tau PET imaging is central to tracking Alzheimer's disease progression, but systematic differences between scanners, protocols, and radiotracers across sites introduce nonbiological variability that inflates biomarker variance, reduces sensitivity to disease effects, and can bias downstream clinical assessments. Harmonization methods aim to remove these site-induced shifts while preserving biologically meaningful signal, yet existing approaches struggle when source and target cohorts differ in subgroup composition, risking conflation of site effects with biological variation such as tau-positivity status. We propose the Feynman Kac Reweighted Schrödinger Bridge Matching (FKRSBM) model to address this problem. Rather than routing data through a Gaussian noise prior as in diffusion-based methods, FKRSBM learns a direct stochastic transport process between source and target distributions via entropy-regularized optimal transport. To enforce biologically consistent transport, FKRSBM incorporates a subgroup-aware endpoint proposal derived from a Feynman Kac reweighting of the reference bridge measure, implemented entirely through stratified importance sampling at the data level and requiring no changes to the underlying bridge-matching solver or network architecture. For surface-based neuroimaging, FKRSBM employs a spherical convolutional backbone operating on cortical meshes to perform vertex-level harmonization. We evaluate the method on tau PET SUVR maps, harmonizing PI-2620 data from the HABS-HD cohort into the AV-1451 domain of ADNI. Compared against ComBat, CycleGAN, a diffusion-based method (DF), and unregularized Diffusion Schrödinger Bridge Matching (DSBM), FKRSBM achieves superior distributional alignment, reduced tau-positivity sign mismatch, stronger APOE subgroup alignment, and improved downstream disease classification performance.

Input-Dependent Fisher Information for Local Sensitivity Analysis of Medical Image Classifiers eess.IV

Deep neural networks have achieved strong performance in medical image classification, but they often behave like black boxes. Commonly used post-hoc interpretation methods often provide heuristic visualizations whose relationship to the classifier's predictive distribution is indirect. This work introduces a local sensitivity analysis framework based on the input-dependent Fisher Information Matrix (iFIM) of a trained classifier. The iFIM characterizes how the classifier's predictive distribution changes under infinitesimal perturbations of the input image. By using a Gram-matrix formulation, the nonzero eigenspectrum of the iFIM can be recovered without explicitly forming the full image-dimensional Fisher matrix. The leading iFIM eigenspace is then used to project an input image into a high local-sensitivity component and its orthogonal component. These components provide a model-intrinsic description of local predictive sensitivity, rather than a conventional pixel-wise attribution heatmap or a causal segmentation of task-relevant anatomy. The framework is evaluated on controlled and clinical medical image classification tasks using multiple classifier architectures. Perturbation-based experiments show that high-sensitivity iFIM components are more strongly coupled to changes in predictive confidence and classification performance than lower-sensitivity complementary components. The results support the iFIM framework as a principled tool for analyzing local decision sensitivity and for complementing existing attribution-based interpretability methods in medical imaging.
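The Gram-matrix idea in this abstract can be sketched for a softmax classifier. The abstract does not spell out the construction, so the sketch assumes the standard form of the Fisher matrix with respect to the input, F = Σ_y p_y g_y g_yᵀ with g_y = ∇_x log p(y|x) = Jᵀ(e_y − p), where J is the Jacobian of the logits with respect to the input. The random `J`, `logits`, and `x` are stand-ins for a trained network and an image, and the sizes `C`, `D`, and `k` are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, k = 3, 50, 2                 # classes, input dimension, eigenspace size (toy values)
J = rng.normal(size=(C, D))        # stand-in for d(logits)/d(input) at one image
logits = rng.normal(size=C)
p = np.exp(logits - logits.max())
p /= p.sum()                       # softmax predictive distribution

# Column y of G is sqrt(p_y) * grad_x log p(y|x) = sqrt(p_y) * J^T (e_y - p),
# so the input-space Fisher matrix is F = G @ G.T (D x D).
G = (J.T @ (np.eye(C) - p[:, None])) * np.sqrt(p)[None, :]

# Gram trick: G.T @ G is only C x C but shares the nonzero eigenvalues of F.
gram_vals, gram_vecs = np.linalg.eigh(G.T @ G)      # eigenvalues in ascending order
top_vals = gram_vals[-k:]
U = G @ gram_vecs[:, -k:] / np.sqrt(top_vals)       # orthonormal eigenvectors of F

x = rng.normal(size=D)             # stand-in for a flattened image
x_high = U @ (U.T @ x)             # component in the leading sensitivity eigenspace
x_rest = x - x_high                # orthogonal complement
```

Only a C x C eigenproblem is solved, which is the point of the Gram formulation: for a real image D is in the hundreds of thousands while C is the number of classes. For a softmax output the matrix has rank at most C − 1, so `k` should not exceed that.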

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata eess.IV

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

Multimodal Brain Tumour Classification Using Feature Fusion eess.IV

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate the clinicians' multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. Both streams are fused via concatenation, gated, or bidirectional cross-modal attention strategies. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.

FlexiBrain: Resolution-Agnostic Voxel-Level Encoding for Native fMRI eess.IV

The success of large-scale deep learning models in neuroscience is fundamentally constrained by severe data heterogeneity. Native fMRI data aggregated from diverse sources exhibit substantial variation in both spatial and temporal resolutions. Consequently, most existing frameworks rely on lengthy, rigid preprocessing pipelines that enforce uniformity across datasets. This practice introduces two critical limitations: (1) potential degradation of subject-specific anatomical information; (2) significant computational overhead, often requiring hours of processing per subject. Here, we propose FlexiBrain, a resolution-agnostic voxel-level encoding framework for native fMRI based on Mamba-JEPA. FlexiBrain defines patch sizes in real-world physical units and employs dynamic patch resizing, thereby bypassing destructive spatial standardization while enabling direct ingestion of data in native space. We instantiate the framework using an efficient Mamba-JEPA backbone to model high-dimensional 4D fMRI signals. Across five diverse downstream neuroscience tasks, FlexiBrain consistently outperforms recent state-of-the-art methods, achieving gains of up to 12 percentage points without external data augmentation. Importantly, FlexiBrain functions as a seamless plug-in module, substantially reducing preprocessing costs and accelerating the development of robust voxel-level fMRI foundation models. Code is available at https://github.com/OneMore1/FlexiBrain.

POPSICLE: Benchmark Datasets for Segmentation and Localization in CryoET eess.IV

Cryo-electron tomography (cryoET) has emerged as a powerful tool in structural and cellular biology by enabling direct visualization of macromolecular structures within intact cells, thereby linking molecular architecture to cellular organization in a native context. Realizing the full potential of cryoET, however, increasingly depends on advances in computational analysis, particularly machine learning (ML), to interpret its complex and information-rich data. Despite rapid progress, ML development for cryoET remains bottlenecked by the lack of standardized, well-annotated benchmarks. Existing evaluations are typically small, task-specific, and assembled in isolation, limiting robust comparisons across methods. Here, we present POPSICLE, a benchmark suite for cryoET segmentation and macromolecular localization built from the CryoET Data Portal - an open, ML-ready repository of tomographic data, metadata, and annotations. POPSICLE spans eukaryotic and prokaryotic systems, both purified and fully in situ samples, and dense voxel-wise segmentation as well as sparse localization tasks. Built on a living data resource, it can expand as new datasets and annotations become available. Baseline experiments reveal substantial variation in model rankings across tasks, underscoring the need for benchmarks tailored to the unique characteristics of cryoET rather than evaluation practices adapted from adjacent biomedical imaging domains. POPSICLE thus provides an open and extensible foundation for reproducible ML evaluation in cryoET.

Unsupervised Deep Learning for Limited-Angle STEM-EDX Tomography -- Application to 3D Chemical Analysis of Phase-Change Memory Devices eess.IV

Energy Dispersive X-ray (EDX) tomography in Scanning Transmission Electron Microscopy (STEM) enables 3D compositional and elemental mapping at the nanoscale, but its use is limited by restricted tilt ranges and low-dose conditions required to avoid beam damage. Limited-angle acquisition introduces missing-wedge artefacts such as elongation and anisotropic resolution, while noisy low-dose data further degrade reconstruction quality and quantitative reliability. Here, we introduce an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV) for limited-angle STEM-EDX tomography. We extend it to a multi-channel formulation (DIPm-TV) that jointly reconstructs multiple elemental maps by exploiting spatial correlations. Using a synthetic 3-channel phantom, we show that the method compensates for severe missing-wedge artefacts corresponding to approximately $100^\circ$ of missing angular range under moderate noise, outperforming simultaneous iterative reconstruction technique and compressed sensing approaches. We apply the method to 3D chemical analysis of Ge-Sb-Te (GST) memory devices in virgin (as-fabricated) and SET (crystalline) operational states. Samples were prepared as cross-sectional focused ion beam lamellae and acquired under a limited-angle tilt range from $-40^\circ$ to $+40^\circ$ with $5^\circ$ steps and a dose of $2.0\times10^5$ $e^-/Ang^2$. The multi-channel approach enables voxel-by-voxel elemental reconstruction using only EDX signals without external structural priors such as high-angle annular dark-field imaging. The reconstructed volumes show near-isotropic spatial resolution and reveal compositional heterogeneities associated with device operation. This approach enables 3D chemical characterization in experimentally accessible sample geometries where conventional methods fail due to severe angular limitations.

++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation eess.IV

The nnU-Net has demonstrated continuous success in medical segmentation tasks, which heavily rely on the availability and diversity of annotated biomedical data. However, assembling medical imaging cohorts remains challenging due to numerous factors such as privacy regulations and annotation costs. As a result, data augmentation plays a crucial role in increasing data availability while maintaining anatomical feasibility. Hence, we propose the ++nnU-Net, a novel data augmentation module based on image registration that operates before preprocessing and training take place. Our framework was evaluated across five different 2D datasets. In this workflow, image data go through a two-stage registration process, generating new warped images. The transformations are then applied to the respective segmentation. In addition, the pipeline computes available disk space, generates supplementary binary synthetic masks, and generates checkpoints. We demonstrate that the ++nnU-Net outperforms the nnU-Net baseline, yielding improvements in Dice Similarity Coefficient scores. In the most prominent cases, we observe performance gains of approximately 22\%. These findings highlight the effectiveness of registration-based data augmentation, particularly for 2D medical imaging datasets, and suggest that the ++nnU-Net provides a practical and scalable approach for enhancing segmentation performance in data-limited settings. The source code for the ++nnU-Net is available at: https://github.com/sofia-adelie/plusplusnnunet.git

Impact of Synthetic Lesional MR Images in Automated Focal Cortical Dysplasia Detection in Low-Data Scenarios eess.IV

Background and Purpose: Automated detection of focal cortical dysplasia (FCD) requires large volumes of voxelwise lesion-delineated MRI data, which are difficult to acquire. This study aims to generate synthetic MRI data exhibiting FCD, assess their realism, and evaluate their impact on automated FCD detection, particularly in reducing the need for manual annotations. Methods: T1-weighted (T1w) and T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR) MRI scans from 131 FCD patients and 90 healthy controls from multiple (3) sites were retrospectively studied. Synthetic MRIs were generated by conditioning a generative network on binary FCD masks. Two neuroradiologists identified real images from a random set of 14 real and 14 synthetic scans. Three nnU-Net models were trained to detect FCD using: (i) real-only (35 FCD / 35 controls), (ii) real (35 FCD / 35 controls) plus synthetic augmentation, and (iii) expanded real data (70 FCD / 70 controls). Results: Experts showed limited ability to distinguish real from synthetic images, with classification accuracy of 60% for T1w and 70% for FLAIR (inter-rater agreement kappa = 0.86). Augmenting automated FCD detection with synthetic data increased sensitivity by 8.14% (p = 0.12) and improved model confidence at true lesion sites (0.83 +/- 0.11 to 0.89 +/- 0.12; p = 0.02). The expanded real-data model further improved sensitivity to 73.8% (p < 0.001) and confidence to 0.90 +/- 0.14 (p = 0.01). Conclusion: Conditional generative networks can generate realistic synthetic FCD-MRIs, reducing labeled data needs by approximately 20% while maintaining equivalent sensitivity. Equivalent amounts of real data, when available, remain more effective than synthetic augmentation.

Constructing efficient channels for ideal observers using the conjugate gradient method eess.IV

Task-based assessment of image quality (IQ) is critically important for the design and optimization of medical imaging systems. Ideal observers, including the Bayesian Ideal Observer (IO) and the ideal linear observer, i.e., the Hotelling observer (HO), provide objective figures of merit (FOMs) that quantify system performance on signal detection tasks. However, the application of ideal observers to high-dimensional image data is often computationally intractable. Channel mechanisms provide an effective framework for dimensionality reduction that can facilitate the computation of ideal observers. This work presents a conjugate gradient (CG)-based method to construct efficient channels for approximating the IO and HO performance.
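As background for the role conjugate gradients can play here: the Hotelling observer template is w = K⁻¹Δs, where K is the image covariance and Δs the mean signal difference, and the detectability is SNR² = Δsᵀw. The abstract does not describe how the paper turns CG into channels, so the sketch below shows only the plain textbook use of CG to solve K w = Δs without inverting K. The covariance `K` and signal `delta_s` are synthetic toy values, and the function name and tolerances are choices made for this illustration.

```python
import numpy as np

def conjugate_gradient(K, b, tol=1e-10, max_iter=None):
    """Solve K x = b for symmetric positive-definite K using only products with K."""
    x = np.zeros_like(b)
    r = b - K @ x              # residual
    d = r.copy()               # search direction
    rs = r @ r
    for _ in range(max_iter or 5 * len(b)):
        Kd = K @ d
        alpha = rs / (d @ Kd)  # exact line search along d
        x += alpha * d
        r -= alpha * Kd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d   # next direction, K-conjugate to the previous ones
        rs = rs_new
    return x

# Toy problem (hypothetical data): a well-conditioned covariance and a random signal.
rng = np.random.default_rng(0)
n = 20
A = rng.normal(size=(n, n))
K = A @ A.T / n + np.eye(n)         # symmetric positive definite
delta_s = rng.normal(size=n)        # mean signal-present minus signal-absent image

w = conjugate_gradient(K, delta_s)  # Hotelling template
snr2 = float(delta_s @ w)           # Hotelling detectability SNR^2
```

Because CG touches K only through matrix-vector products, the same loop works when K is too large to store and is available only as an operator.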

A unified deep learning framework for contrast-phase-specific virtual monochromatic imaging eess.IV

Dual-energy CT (DECT) enables virtual monochromatic imaging (VMI) and improved contrast resolution, but its clinical adoption is limited by hardware complexity and cost. In this work, we propose a unified deep learning framework that synthesizes contrast-phase-specific virtual monochromatic 50 keV images from single-energy CT (SECT) data by leveraging contrast phase information as a prior. The model is trained using DECT-derived 70 keV and 50 keV image pairs across four contrast phases -- Angio, Arterial, Portal, and Delayed -- using a novel prior conditioning architecture that integrates contrast phase priors into the energy transformation process. We demonstrate that the proposed unified model achieves contrast enhancement and generalizes well across contrast phases. Additionally, we show that the model can generate 50 keV-like images from SECT inputs, preserving contrast phase-specific dynamics.

Mapping Tomato Cropping Systems in California Using AlphaEarth Geospatial Embeddings and Deep Learning Analysis eess.IV

Field-scale crop maps support supply-chain forecasting and policy, yet statewide crop identification still often depends on retrospective surveys or remote-sensing workflows built around hand-engineered spectral features. Those pipelines can be accurate, but they require repeated preprocessing and often lose robustness across years. This study evaluated whether Google DeepMind's AlphaEarth geospatial embeddings can serve as an analysis-ready alternative for mapping processing tomato systems in California. LandIQ 2018 crop polygons were used to assemble a balanced reference dataset of 4,742 tomato and 4,742 non-tomato fields. For each polygon, 64-band AlphaEarth embedding chips were extracted and aligned with binary masks, then divided into spatially independent training (n = 6,638), validation (n = 1,422), and test (n = 1,424) sets. A U-Net segmentation model was trained on AWS SageMaker using a composite masked binary cross-entropy and soft Dice loss. To complement hard predictions, Monte Carlo dropout was retained at inference and repeated 100 times per chip to estimate predictive mean and variance. On the independent test set, the model achieved 99.19% pixel accuracy, 98.69% precision, 99.40% recall, 99.04% F1 score, 98.11% intersection over union, and 99.02% chip accuracy. Uncertainty maps were consistently highest near field edges and low within field interiors. The results show that AlphaEarth embeddings retain crop-relevant spatial and temporal structure and can support accurate, field-scale tomato mapping without manual feature engineering.
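A composite loss of the kind named in this abstract can be sketched as follows. The abstract gives no formula, so the equal weighting of the two terms, the smoothing constant `eps`, and the interpretation of "masked" as a per-pixel validity mask are all assumptions made for illustration; the arrays are toy values, not AlphaEarth chips.

```python
import numpy as np

def masked_bce_dice_loss(prob, target, valid, eps=1e-7):
    """Masked binary cross-entropy plus (1 - soft Dice), both over valid pixels only.

    prob: predicted probabilities, target: 0/1 labels, valid: 0/1 mask of pixels to score.
    Equal weighting of the two terms is an assumption.
    """
    prob = np.clip(prob, eps, 1.0 - eps)          # avoid log(0)
    bce = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    bce = (bce * valid).sum() / max(valid.sum(), 1.0)
    inter = (prob * target * valid).sum()
    denom = (prob * valid).sum() + (target * valid).sum()
    soft_dice = (2.0 * inter + eps) / (denom + eps)
    return float(bce + (1.0 - soft_dice))

# Toy 2x3 chip (hypothetical values); the last pixel is excluded by the mask.
target = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
valid = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]])

loss_perfect = masked_bce_dice_loss(target.copy(), target, valid)
loss_inverted = masked_bce_dice_loss(1.0 - target, target, valid)
```

The cross-entropy term scores each valid pixel independently, while the soft Dice term scores overlap over the whole chip, which is a common reason for combining them in segmentation.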

An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation eess.IV

The synergistic interpretation of anatomical information from computed tomography (CT) and metabolic information from positron emission tomography (PET) is important to oncologic imaging. However, existing deep learning methods for PET/CT remain largely task-specific, are often trained on single-center cohorts, or adopt dual-branch fusion schemes that delay cross-modal interaction and underutilize early spatial correspondence between PET and CT. To address these limitations, we present an open-source, multi-center, whole-body FDG PET/CT foundation model utilizing 4,997 harmonized scans from four public datasets. Our framework employs hierarchical UNet-shaped backbones with early channel-wise concatenation, enabling anatomical and metabolic features to interact from the first embedding layer onward. We further introduce a masked autoencoding objective based on zero-mean imputation, combined with a weighted global reconstruction loss. This design avoids non-physical intensity discontinuities at masked-region boundaries that arise from learnable mask tokens. On downstream AutoPET lesion segmentation, the proposed models demonstrate strong label efficiency: with only 10\% of the labeled training data, they achieve performance comparable to models trained from scratch on the full dataset. Under extreme 5-shot linear probing, joint PET/CT pretraining also achieves higher Dice scores than separated-modality pretraining. This multi-center foundation model demonstrates label efficiency and cross-modality representation learning for PET/CT tumor segmentation. It provides a robust, open-source basis for advancing automated oncologic imaging, significantly reducing the need for large-scale manual annotations in clinical practice.

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing eess.IV

Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.

NexOP: Joint Optimization of NEX-Aware k-space Sampling and Image Reconstruction for Low-Field MRI eess.IV

Modern low-field magnetic resonance imaging (MRI) technology offers a compelling alternative to standard high-field MRI, with portable, low-cost systems. However, its clinical utility is limited by a low Signal-to-Noise Ratio (SNR), which hampers diagnostic image quality. A common approach to increase SNR is through repetitive signal acquisitions, known as NEX, but this results in excessively long scan durations. Although recent work has introduced methods to accelerate MRI scans through k-space sampling optimization, the NEX dimension remains unexploited; typically, a single sampling mask is used across all repetitions. Here we introduce NexOP, a deep-learning framework for joint optimization of the sampling and reconstruction in multi-NEX acquisitions, tailored for low-SNR settings. NexOP enables optimizing the sampling density probabilities across the extended k-space-NEX domain, under a fixed sampling-budget constraint, and introduces a new deep-learning architecture for reconstructing a single high-SNR image from multiple low-SNR measurements. Experiments with raw low-field (0.3T) brain data demonstrate that NexOP consistently outperforms competing methods, both quantitatively and qualitatively, across diverse acceleration factors and tissue contrasts. The results also demonstrate that NexOP yields non-uniform sampling strategies, with progressively decreasing sampling across repetitions, hence exploiting the NEX dimension efficiently. Moreover, we present a theoretical analysis supporting these numerical observations. Overall, this work proposes a sampling-reconstruction optimization framework highly suitable for low-field MRI, which can enable faster, higher-quality imaging with low-cost systems and contribute to advancing affordable and accessible healthcare.
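The trade-off that motivates this work, that averaging repetitions raises SNR but lengthens the scan, can be checked numerically. The sketch assumes independent, zero-mean Gaussian noise on a constant real-valued signal, which is a simplification (magnitude MR images have Rician noise); `signal`, `sigma`, and `n_voxels` are arbitrary toy values. Under that assumption, averaging N repetitions divides the noise standard deviation by √N while acquisition time grows by a factor of N.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = 1.0        # constant true signal (toy value)
sigma = 0.5         # per-repetition noise standard deviation (toy value)
n_voxels = 200_000

def snr_after_averaging(nex):
    """Empirical SNR (mean / std) after averaging `nex` noisy repetitions."""
    reps = signal + sigma * rng.normal(size=(nex, n_voxels))
    averaged = reps.mean(axis=0)
    return averaged.mean() / averaged.std()

snr1 = snr_after_averaging(1)
snr4 = snr_after_averaging(4)
gain = snr4 / snr1   # close to sqrt(4) = 2 under the Gaussian assumption
```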

LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation eess.IV

Modern sensors generate rich, high-fidelity data, yet applications operating on wearable or remote sensing devices remain constrained by bandwidth and power budgets. Standardized codecs such as JPEG and MPEG achieve efficient trade-offs between bitrate and perceptual quality but are designed for human perception, limiting their applicability to machine-perception tasks and non-traditional modalities such as spatial audio arrays, hyperspectral images, and 3D medical images. General-purpose compression schemes based on scalar quantization or resolution reduction are broadly applicable but fail to exploit inherent signal redundancies, resulting in suboptimal rate-distortion performance. Recent generative neural codecs, or tokenizers, model complex signal dependencies but are often over-parameterized, data-hungry, and modality-specific, making them impractical for resource-constrained environments. We introduce a Lightweight, Versatile, and Asymmetric neural codec architecture (LiVeAction) that addresses these limitations through two key ideas. (1) To reduce the complexity of the encoder and meet the resource constraints of the execution environment, we impose an FFT-like structure and reduce the overall size and depth of the neural-network-based analysis transform. (2) To allow arbitrary signal modalities and simplify training, we replace adversarial and perceptual losses with a variance-based rate penalty. Our design produces codecs that deliver superior rate-distortion performance compared to state-of-the-art generative tokenizers, while remaining practical for deployment on low-power sensors. We release our code, experiments, and Python library at https://github.com/UT-SysML/liveaction.

Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy eess.IV

Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high-speed in vivo optical biopsy in handheld scenarios. However, Lissajous scanning traces a resonant trajectory and samples only the visited pixels in each frame; at high frame rates, many pixels remain unvisited, creating structured holes. In this work, we introduce the first benchmark for high-rate Lissajous CLE, consisting of low-quality video clips paired with high-quality reference images. The reference images are wide-FOV mosaics obtained by stitching stabilized, slow-scan frames of the same tissue, enabling temporally aligned supervision. Using this dataset, we propose MIRA, a lightweight recurrent framework for Lissajous CLE restoration that iteratively aggregates temporal context through feature reuse and displacement alignment. Our experiments demonstrate that MIRA outperforms both lightweight and high-complexity baselines in restoration quality while maintaining favorable computational efficiency suitable for clinical deployment.

FedKPer: Tackling Generalization and Personalization in Medical Federated Learning via Knowledge Personalization eess.IV

Federated learning (FL) holds great potential for medical applications. However, statistical heterogeneity across healthcare institutions poses a major challenge for FL, as the global model struggles both to generalize across unseen patient populations and to adapt to the unique data distributions of individual hospitals. This heterogeneity also exacerbates forgetting at both the global and local levels, causing previously learned patient patterns to be misclassified after model updates. While prior work has largely treated generalization and personalization as separate challenges, we show that a better balance between the two can be achieved through selective alignment with the global model and a modified aggregation scheme, which together mitigate the effects of statistical heterogeneity. Specifically, we propose FedKPer, which introduces knowledge personalization into the training stage of each local device. Generalization is then addressed in the global model aggregation process, where local updates that are reliable and label-diverse are emphasized. We evaluate the performance of FedKPer, devising additional metrics that relate to common consequences of forgetting. Overall, we demonstrate that FedKPer improves the generalization-personalization trade-off without sacrificing retention.

Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks eess.IV

With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on denoising low-dose computed tomography (CT) images using deep learning. Although low-dose CT reduces patients' radiation exposure, it also introduces more noise, which may interfere with visual interpretation by physicians and affect diagnostic results. To address this problem, and inspired by Cycle-GAN for unsupervised learning, this paper proposes an end-to-end unsupervised low-dose CT denoising framework. The proposed framework combines a U-Net structure for multi-scale feature extraction, an attention mechanism for feature fusion, and a residual network for feature transformation. It also introduces a perceptual loss to adapt the network to the characteristics of medical images. In addition, we construct a real low-dose CT dataset and run extensive comparative experiments to validate the proposed method, using both image-based evaluation metrics and medical evaluation criteria. Compared with classical methods, the main advantage of the proposed method is that it overcomes the limitation that real clinical data cannot be directly used for supervised learning, while still achieving excellent performance. The experimental results were also evaluated by imaging physicians and meet clinical needs.

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution eess.IV

Deep learning models for 12-lead electrocardiogram (ECG) analysis achieve high diagnostic performance but lack the intuitive interpretability required for clinical integration. Standard feature attribution methods are limited by the inherent difficulty in mapping abstract waveform fluctuations to physical anatomical pathologies. To resolve this, we propose a cross-modal method that projects feature attributions from high-performance 12-lead ECG models onto the CineECG 3D anatomical space. Our study reveals that while models trained directly on CineECG signals suffer from reduced accuracy and incoherent attributions, the proposed mapping mechanism effectively recovers clinically relevant feature rankings. Validated against a ground-truth dataset of 20 cases annotated by domain experts, the mapped explanations yield a Dice score of 0.56, significantly outperforming the 0.47 baseline of standard 12-lead attributions. These findings indicate that cross-modal averaging mapping effectively filters attribution instability and improves the localization of pathological features, combining the diagnostic expressiveness of standard ECG with the intuitive clarity of anatomical visualization.
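
For readers unfamiliar with the metric, the Dice score quoted above measures the overlap between two selections. The sketch below computes it for two sets of flagged items. It assumes the mapped attributions and the expert annotations have each been reduced to a set of region identifiers; how the paper performs that reduction is not stated in the abstract, and the region names are made up for the example.

```python
def dice(predicted, annotated):
    """Dice overlap between two collections of flagged items (1.0 = identical)."""
    predicted, annotated = set(predicted), set(annotated)
    if not predicted and not annotated:
        return 1.0  # convention: two empty selections agree perfectly
    return 2 * len(predicted & annotated) / (len(predicted) + len(annotated))


# Hypothetical region labels, purely for illustration.
model_regions = {"septum", "apex", "lateral_wall"}
expert_regions = {"apex", "lateral_wall", "inferior_wall"}
print(round(dice(model_regions, expert_regions), 3))
```

Two of the three regions on each side coincide here, so the score is 2 * 2 / (3 + 3), about 0.667.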

Diffusion-OAMP for Joint Image Compression and Wireless Transmission eess.IV

Joint image compression and wireless transmission remains relatively underexplored compared to generic image restoration, despite its importance in practical communication systems. We formulate this problem under an equivalent linear model, and propose Diffusion-OAMP, a training-free reconstruction framework that embeds a pre-trained diffusion model into the OAMP algorithm. In Diffusion-OAMP, the OAMP linear estimator produces pseudo-AWGN observations, while the diffusion model serves as a nonlinear estimator under an SNR-matching rule. This framework offers a way to incorporate multiple generative priors into OAMP. Experiments with varying compression ratios and noise levels show that Diffusion-OAMP performs favorably against classic methods in the evaluated settings.

Deep Learning-Enabled Dissolved Oxygen Sensing in Biofouling Environments for Ocean Monitoring eess.IV

The escalating climate crisis and ecosystem degradation demand intelligent, low-cost sensors capable of robust, long-term monitoring in real-world environments. Absolute dissolved oxygen (DO) concentration is a key parameter for predicting climate tipping points. Inexpensive optoelectronic sensors based on microstructured polymer films doped with phosphorescent dyes could be readily deployable; however, signal drift and marine biofouling remain major challenges. Here, we introduce a sensing paradigm that combines camera-based DO sensors with a vision transformer (ViT)-based physics-informed neural network (PINN) for high-fidelity sensing under biofouling conditions. Training and testing data were obtained from an algae-laden water tank over 14 days to capture accelerated biofouling. The ViT-PINN, which embeds the Stern-Volmer (SV) equation into the loss function, reduces mean absolute error (MAE) by 92% and 89% compared to classical statistical and ML approaches, respectively, achieving ~2 µmol/L absolute error. A deep ensemble further quantifies predictive uncertainty, enabling self-diagnostic sensing.
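
As a rough illustration of what embedding the Stern-Volmer equation into a loss can mean, the sketch below uses the textbook linear form I0 / I = 1 + K_SV * [O2] and adds its squared residual to an ordinary data term. The function names, the weighting scheme, and all numbers are assumptions for the example; the abstract does not give the paper's actual loss, and real sensor films may deviate from the linear form.

```python
def stern_volmer_do(i0, intensity, k_sv):
    """Invert the linear Stern-Volmer relation I0 / I = 1 + K_SV * [O2]."""
    return (i0 / intensity - 1.0) / k_sv


def physics_informed_loss(pred_do, true_do, i0, intensities, k_sv, weight=1.0):
    """Mean squared data error plus a weighted mean squared Stern-Volmer residual.

    `weight` and the exact form of both terms are illustrative assumptions.
    """
    n = len(pred_do)
    data_term = sum((p - t) ** 2 for p, t in zip(pred_do, true_do)) / n
    physics_term = sum(
        (i0 / i - (1.0 + k_sv * p)) ** 2 for p, i in zip(pred_do, intensities)
    ) / n
    return data_term + weight * physics_term


# Made-up values: unquenched intensity 2.0, quenching constant 0.01 L/umol.
intensities = [1.0, 0.5]
targets = [stern_volmer_do(2.0, i, 0.01) for i in intensities]
print(targets)
print(physics_informed_loss(targets, targets, 2.0, intensities, 0.01))
```

Predictions that match the labels but violate the quenching relation are still penalised by the second term, which is the general motivation for a physics-informed loss.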

Semantic Segmentation for Histopathology using Learned Regularization based on Global Proportions eess.IV

In pathology, the spatial distribution and proportions of tissue types are key indicators of disease progression, and are more readily available than fine-grained annotations. However, these assessments are rarely mapped to pixel-wise segmentation. The task is fundamentally underdetermined, as many spatially distinct segmentations can satisfy the same global proportions in the absence of pixel-wise constraints. To address this, we introduce Variational Segmentation from Label Proportions (VSLP), a two-stage framework that infers dense segmentations from global label proportions, without any pixel-level annotations. In the first stage, the framework leverages a pre-trained transformer model with test-time augmentation to produce pixel-wise confidence estimates. In the second stage, these estimates are fused by solving a variational optimization problem that incorporates a Wasserstein data fidelity term alongside a learned regularizer. Unlike end-to-end networks, our variational method can visualize the fidelity-regularization energy, resulting in more interpretable segmentation. We validate our approach on two public datasets, achieving superior performance over existing weakly supervised and unsupervised methods. For one of these datasets, proportions have been estimated by an experienced pathologist to provide a realistic benchmark for the community. Furthermore, the method scales to an in-house dataset with noisy pathologist labels, substantially outperforming state-of-the-art methods, thereby demonstrating practical applicability. The code and data will be made publicly available upon acceptance at https://github.com/xiaoliangpi/VSLP.

Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction? eess.IV

The emergence of large-scale pretrained foundation models has transformed computer vision, enabling strong performance across diverse downstream tasks. However, their potential for physics-based inverse problems, such as accelerated cardiac MRI reconstruction, remains largely underexplored. In this work, we investigate whether natural-domain foundation models can serve as effective image priors for accelerated cardiac MRI reconstruction, and compare their performance against domain-specific counterparts such as BiomedCLIP. We propose an unrolled reconstruction framework that incorporates pretrained, frozen visual encoders, such as CLIP, DINOv2, and BiomedCLIP, within each cascade to guide the reconstruction process. Through extensive experiments, we show that while task-specific state-of-the-art reconstruction models such as E2E-VarNet achieve superior performance in standard in-distribution settings, foundation-model-based approaches remain competitive. More importantly, in challenging cross-domain scenarios, where models are trained on cardiac MRI and evaluated on anatomically distinct knee and brain datasets, foundation models exhibit improved robustness, particularly under high acceleration factors and limited low-frequency sampling. We further observe that natural-image-pretrained models, such as CLIP, learn highly transferable structural representations, while domain-specific pretraining (BiomedCLIP) provides modest additional gains in more ill-posed regimes. Overall, our results suggest that pretrained foundation models offer a promising source of transferable priors, enabling improved robustness and generalization in accelerated MRI reconstruction.
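
Unrolled reconstruction networks generally alternate a learned refinement with a step that keeps the estimate consistent with the acquired k-space samples. The sketch below shows the simplest "hard" version of that consistency step on a flat list of k-space values. It is a generic illustration, not the cascade used in the paper: models such as E2E-VarNet use a soft, learned variant, and the function name and data are hypothetical.

```python
def hard_data_consistency(predicted_kspace, measured_kspace, mask):
    """Keep measured samples where the mask is 1 and the network's estimate elsewhere."""
    return [
        measured if sampled else predicted
        for predicted, measured, sampled in zip(predicted_kspace, measured_kspace, mask)
    ]


# Toy 1D k-space with three locations; the middle one was not acquired.
predicted = [1 + 1j, 2 + 0j, 3 - 1j]
measured = [9 + 0j, 0j, 7 + 2j]
mask = [1, 0, 1]
print(hard_data_consistency(predicted, measured, mask))
```

Whatever prior guides the refinement (a frozen CLIP or DINOv2 encoder in the abstract's setting), a step of this kind is what ties each cascade back to the physics of the acquisition.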

Useful nonrobust features are ubiquitous in biomedical images eess.IV

We study whether deep networks for medical imaging learn useful nonrobust features (predictive input patterns that are not human-interpretable and are highly susceptible to small adversarial perturbations) and how these features impact test performance. We show that models trained only on nonrobust features achieve well-above-chance accuracy across five MedMNIST classification tasks, confirming their predictive value in-distribution. Conversely, adversarially trained models that primarily rely on robust features sacrifice in-distribution accuracy but yield markedly better performance under controlled distribution shifts (MedMNIST-C). Overall, nonrobust features boost standard accuracy yet degrade out-of-distribution performance, revealing a practical robustness-accuracy trade-off in medical imaging classification tasks that should be tailored to the requirements of the deployment setting.
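
To show how a small, bounded perturbation can flip a prediction, the sketch below applies a fast-gradient-sign step to a two-feature linear classifier. This is a standard textbook construction chosen for illustration; the abstract does not say which attack or perturbation budget the authors use, and the weights and inputs are invented.

```python
def score(x, w):
    """Linear classifier score; the predicted label is its sign."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_linear(x, w, y, eps):
    """Shift every feature by at most `eps` in the direction that lowers the margin y * score."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


w = [1.0, -1.0]
x = [0.1, -0.2]          # correctly classified as +1
x_adv = fgsm_linear(x, w, y=1, eps=0.2)
print(score(x, w) > 0, score(x_adv, w) > 0)
```

No single feature moves by more than 0.2, yet the sign of the score changes; a model leaning on such fragile directions is using nonrobust features in the sense described above.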

Maximum Likelihood Reconstruction for Multi-Look Digital Holography with Markov-Modeled Speckle Correlation eess.IV

Multi-look acquisition is a widely used strategy for reducing speckle noise in coherent imaging systems such as digital holography. By acquiring multiple measurements, speckle can be suppressed through averaging or joint reconstruction, typically under the assumption that speckle realizations across looks are statistically independent. In practice, however, hardware constraints limit measurement diversity, leading to inter-look correlation that degrades the performance of conventional methods. In this work, we study the reconstruction of speckle-free reflectivity from complex-valued multi-look measurements in the presence of correlated speckle. We model the inter-look dependence as a first-order Markov process and derive the corresponding likelihood, resulting in a constrained maximum likelihood estimation problem. To solve this problem, we develop an efficient projected gradient descent framework that combines gradient-based updates with implicit regularization via deep image priors, and leverages Monte Carlo approximation and matrix-free operators for scalable computation. Simulation results demonstrate that the proposed approach remains robust under strong inter-look correlation, achieving performance close to the ideal independent-look scenario and consistently outperforming methods that ignore such dependencies. These results highlight the importance of explicitly modeling inter-look correlation and provide a practical framework for multi-look holographic reconstruction under realistic acquisition conditions. Our code is available at: https://github.com/Computational-Imaging-RU/MLE-Holography-Markov.
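
Why correlation hurts plain averaging can be seen from a short calculation. The sketch below assumes circular complex Gaussian speckle whose field correlation between looks i and j decays as rho^|i-j| (a first-order Markov structure), so that the intensity correlation is |rho|^(2|i-j|). Under those assumptions it returns the number of independent looks that would give the same variance after averaging. The parameterisation is an illustrative choice and may differ from the paper's model.

```python
def effective_looks(num_looks, rho):
    """Equivalent number of independent looks for averaged speckle intensity.

    Assumes field correlation rho**|i-j| between looks (first-order Markov),
    hence intensity correlation |rho|**(2*|i-j|). Illustrative model only.
    """
    total_correlation = sum(
        abs(rho) ** (2 * abs(i - j))
        for i in range(num_looks)
        for j in range(num_looks)
    )
    return num_looks ** 2 / total_correlation


for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, effective_looks(8, rho))
```

With rho = 0 all eight looks count; as rho approaches 1 the average behaves like a single look, which is the regime where ignoring inter-look dependence is most costly.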