Unlocking the Potential of Diffusion Language Models through Template Infilling

Junhoo Lee¹, Seungyeon Kim¹, Nojun Kwak¹
ACL 2026
¹Seoul National University

The qualitative comparison highlights how generation trajectories diverge on complex reasoning tasks. Under pure parallel generation, both vanilla decoding and AR-style prefix prompting suffer from repetitive corruption and logical drift, whereas Template Infilling uses structural anchors to keep the response aligned with a coherent reasoning path.

Template Infilling qualitative comparison figure.

Abstract

Diffusion Language Models (DLMs) have emerged as a promising alternative to autoregressive language models, yet their inference strategies remain limited to prefix-based prompting inherited from the autoregressive paradigm. In this paper, we propose Template Infilling (TI), a conditioning methodology tailored to DLMs.

Instead of treating the prompt as a single prefix, TI distributes structural anchors across the entire target response so the model can establish a global blueprint before filling the masked spans. We further introduce Dynamic Segment Allocation (DSA), which expands low-confidence regions to provide extra reasoning space while preserving the structural template.

Across mathematics, code generation, and trip planning benchmarks, TI delivers consistent gains over baseline prompting, remains robust under accelerated sampling, and improves reflective safety behavior by enforcing a structured draft-critique-refine process directly inside the generation trajectory.

Method Overview

TI reformulates generation as a structured sequence [c, A1, M1, A2, M2, ..., An, Mn], where anchors act as persistent boundary conditions throughout the response. This gives masked spans access to both previous context and future structural checkpoints.
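The interleaved structure above can be sketched as a small template builder. This is a minimal illustrative sketch, not the paper's implementation: the `MASK` token string, the `build_template` helper, and the example anchors are all assumptions introduced for clarity.

```python
# Hypothetical sketch of Template Infilling (TI) sequence construction.
# The MASK token and anchor strings are illustrative assumptions, not the
# actual vocabulary used in the paper.

MASK = "<mask>"

def build_template(prompt, anchors, span_lengths):
    """Interleave anchors with masked spans: [c, A1, M1, ..., An, Mn].

    prompt:       the conditioning context c
    anchors:      structural anchor strings A1..An
    span_lengths: number of mask tokens reserved for each span M1..Mn
    """
    assert len(anchors) == len(span_lengths)
    sequence = [prompt]
    for anchor, length in zip(anchors, span_lengths):
        sequence.append(anchor)            # persistent boundary condition
        sequence.extend([MASK] * length)   # span for the DLM to infill
    return sequence

template = build_template(
    "Q: Natalia sold 48 clips in April and half as many in May. Total?",
    anchors=["Step 1:", "Step 2:", "Answer:"],
    span_lengths=[4, 4, 2],
)
```

Because each anchor is placed before denoising begins, every masked span is conditioned on both the anchors that precede it and those that follow it, which is what gives the model the "future structural checkpoints" described above.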

DSA complements this structure by monitoring uncertainty during refinement and allocating more space to spans that still need room to complete their reasoning. Together, TI and DSA preserve the original DLM flexibility while making global planning more stable.
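One way to picture the DSA step is as a confidence-gated expansion of span lengths between refinement rounds. The threshold, expansion size, and function name below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of Dynamic Segment Allocation (DSA): enlarge any masked
# span whose mean token confidence stays below a threshold, giving the model
# extra reasoning room while the anchors themselves stay fixed.

def expand_low_confidence_spans(span_lengths, confidences,
                                threshold=0.5, extra=4):
    """span_lengths: mask-token counts for spans M1..Mn.
    confidences:  mean model confidence per span (assumed provided by the DLM).
    Returns new span lengths with low-confidence spans expanded."""
    new_lengths = []
    for length, conf in zip(span_lengths, confidences):
        if conf < threshold:
            new_lengths.append(length + extra)  # allocate more room to reason
        else:
            new_lengths.append(length)          # confident span keeps its size
    return new_lengths

# The middle span is uncertain, so only it grows.
expand_low_confidence_spans([8, 8, 4], [0.9, 0.3, 0.7])
```

Because only the mask budgets change, the anchor sequence and hence the global template are preserved across expansions.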

The method overview illustrates how Template Infilling distributes anchors throughout the sequence instead of relying on a single prefix, while Dynamic Segment Allocation expands low-confidence spans so the model can preserve both structure and flexibility.

Overview of Template Infilling and Dynamic Segment Allocation.

Main Results

TI improves reasoning, coding, and planning performance across both native diffusion models and adapted diffusion models. The paper reports an average gain of 9.40 percentage points over the baseline, with especially large improvements on instruction-following and Dream-7B base settings.

The main results compare vanilla decoding, prefix prompting, and Template Infilling across GSM8K, MATH500, HumanEval, and Trip Planning. The table shows that structural conditioning is consistently more effective than standard prefix prompting for both LLaDA-8B and Dream-7B families.

Benchmark results table for Template Infilling.

Safety Guardrails

Beyond benchmark accuracy, TI uses globally placed anchors to enforce a draft-critique-refine workflow. This makes refusal behavior more stable under malicious or deceptive prompts.

The safety comparison shows that reserving space for draft, critique, and final response stages inside the sequence helps TI preserve reflective safety behavior more reliably than prefix-only prompting.

Safety comparison figure.

Analysis

TI remains more stable as generation length increases and also preserves performance when the number of sampling steps is reduced. The same structural prior also reshapes the model's generation order into a more global planning pattern, where anchors are established first and intervening spans are filled in afterward.

The robustness plots show how performance changes across longer generation lengths and fewer sampling steps, and in both settings TI maintains a consistent advantage over vanilla decoding under parallel generation.

The generation-order plot further shows that TI prioritizes structural anchors early and then fills the remaining gaps, producing a more coherent planning trajectory than vanilla diffusion decoding.

Robustness to generation length and sampling acceleration.
Analysis of generation mechanism and relative generation order.

BibTeX

@misc{lee2026unlockingpotentialdiffusionlanguage,
      title={Unlocking the Potential of Diffusion Language Models through Template Infilling},
      author={Junhoo Lee and Seungyeon Kim and Nojun Kwak},
      year={2026},
      eprint={2510.13870},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.13870},
}