This paper introduces LENS-DF, a novel framework for detecting and temporally localizing deepfake segments in long-form noisy speech. The model jointly analyzes temporal and acoustic cues to identify manipulated regions with higher granularity, advancing robustness in real-world noisy environments.