CVPR 2026

CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models

Junhoo Lee, Mijin Koo, Nojun Kwak
Seoul National University

CSF attributes deployed text-to-image APIs back to protected base families using only black-box query access, with no watermarking and no visibility into model internals.

Comparison between watermarking, traditional fingerprinting, and CSF in the query-only setting

CSF targets the most restrictive query-only setting, where the defender sees only the final text-to-image API and must still recover lineage evidence.

Abstract

Text-to-image models are commercially valuable assets often distributed under restrictive licenses, but such licenses are enforceable only when violations can be detected. Existing methods require pre-deployment watermarking or internal model access, which are unavailable in commercial API deployments.

We present Compositional Semantic Fingerprinting (CSF), the first black-box method for attributing fine-tuned text-to-image models to protected lineages using only query access. CSF treats models as semantic category generators and probes them with compositional underspecified prompts that remain rare under fine-tuning. Across 6 model families and 13 fine-tuned variants, the Bayesian attribution framework supports controlled-risk lineage decisions, with all variants satisfying the dominance criterion.

Why naive visual matching fails.

Fine-tuning often changes texture, palette, composition, and rendering style so aggressively that side-by-side visual inspection becomes unreliable. Two models can share the same lineage while looking very different at the pixel level, which is exactly why CSF avoids direct visual matching and instead measures how a model resolves ambiguous semantic prompts.

Style drift across related model families makes direct visual matching unreliable

This challenge figure shows why lineage attribution is hard in practice: downstream variants can move far away in style while still inheriting the same semantic prior from the protected base model. A robust black-box method therefore has to focus on the semantic distribution a model produces, not on superficial style similarity.

Method overview.

CSF estimates prompt-conditioned semantic distributions, compares them with Wasserstein distance, and converts the resulting distances into a posterior over candidate lineages.

Problem formulation

We are given a set of protected base models and a deployed suspect API that may have been fine-tuned from one of them. The defender does not see weights, activations, or training logs; only text queries and generated images are available. The goal is to assign a posterior over candidate lineages and make an attribution decision with controlled confidence. For each prompt p and model m, CSF samples multiple generations, maps each image to a semantic label c, and estimates the prompt-conditioned category distribution.

$$\hat{\pi}_m(c \mid p) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[g(x_i) = c], \qquad x_i \sim m(p)$$
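The estimator above can be sketched in a few lines. This is a minimal illustration, not CSF's released code: `generate` (the suspect API) and `classify` (the semantic labeler $g$) are hypothetical stand-ins for whatever image generator and category classifier are actually used.

```python
from collections import Counter

def estimate_category_distribution(generate, classify, prompt, categories, n_samples=20):
    """Estimate pi_hat_m(c | p): the fraction of n_samples generations
    of `prompt` whose semantic label is c.

    `generate` and `classify` are placeholders for the model API m and
    the semantic labeler g; they are assumptions for this sketch."""
    counts = Counter(classify(generate(prompt)) for _ in range(n_samples))
    return {c: counts.get(c, 0) / n_samples for c in categories}
```

In practice the label set would come from the compositional prompt design, and `n_samples` trades query cost against estimator variance.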

CSF pipeline

CSF probes each model with compositional, underspecified prompts that force it to resolve ambiguity using learned semantic priors. The resulting category distributions are then compared against base-model references using Wasserstein distance, and a Bayesian attribution rule produces the final lineage posterior and dominance test. In other words, the suspect model is compared against every protected base over a prompt set P, and smaller transport cost becomes stronger attribution evidence.

$$d_b = \sum_{p \in P} W_1\!\left(\hat{\pi}_s(\cdot \mid p), \hat{\pi}_b(\cdot \mid p)\right), \qquad P(b \mid s) \propto \exp(-\tau d_b)$$

Accept $b^* = \arg\max_b P(b \mid s)$ only when the dominance margin stays above a threshold: $P(b^* \mid s) - \max_{b \neq b^*} P(b \mid s) > \delta$.
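A hedged sketch of the attribution rule, under simplifying assumptions: categorical $W_1$ is computed as the L1 distance between CDFs over an assumed 1-D ordering of categories (the paper may use a different ground metric), and `tau` and `delta` are illustrative hyperparameters.

```python
import math
import numpy as np

def attribute(suspect_dists, base_dists, tau=1.0, delta=0.1):
    """suspect_dists: {prompt: prob array over categories};
    base_dists: {base_name: {prompt: prob array}}.

    Returns (accepted lineage or None, posterior dict). The W1 term
    assumes categories lie on an ordered 1-D axis -- a simplification;
    any semantic ground metric could replace the CDF difference."""
    # d_b: total transport cost between suspect and base, summed over prompts
    d = {}
    for b, dists in base_dists.items():
        d[b] = sum(
            np.abs(np.cumsum(suspect_dists[p]) - np.cumsum(dists[p])).sum()
            for p in suspect_dists
        )
    # Softmax posterior P(b | s) proportional to exp(-tau * d_b),
    # computed with a max-shift for numerical stability.
    logits = {b: -tau * db for b, db in d.items()}
    shift = max(logits.values())
    z = sum(math.exp(v - shift) for v in logits.values())
    posterior = {b: math.exp(v - shift) / z for b, v in logits.items()}
    # Dominance test: accept only if the top posterior leads by > delta.
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    runner_up = posterior[ranked[1]] if len(ranked) > 1 else 0.0
    margin = posterior[ranked[0]] - runner_up
    return (ranked[0] if margin > delta else None), posterior
```

Returning `None` when the margin falls below `delta` is what makes the decision risk-controlled: a close race between two candidate lineages yields no attribution rather than a guess.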

Quantitative Results

Table 1

Posterior attribution across all 13 fine-tuned suspects.

Each row is a deployed suspect model, each column is a candidate protected base lineage, and every cell reports the posterior mean attribution score under CSF. The correct family stays dominant for all 13 suspects even after substantial style drift.

| Suspect Model | Flux-Base | Kandinsky-Base | SD1.5-Base | SD2.1-Base | SD3-Medium-Base | SDXL-Base |
|---|---|---|---|---|---|---|
| Flux Family | | | | | | |
| Flux-LoRA | 0.932* | 0.023 | 0.023 | 0.023 | 0.023 | 0.068 |
| Flux-Turbo-Alpha | 0.977* | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 |
| Kandinsky Family | | | | | | |
| Kandinsky-Naruto | 0.023 | 0.977* | 0.023 | 0.023 | 0.023 | 0.023 |
| Kandinsky-Pokemon-LoRA | 0.049 | 0.829* | 0.049 | 0.098 | 0.024 | 0.049 |
| SD1.5 Family | | | | | | |
| SD1.5-1.2-Base | 0.023 | 0.023 | 0.841* | 0.114 | 0.023 | 0.068 |
| SD1.5-1.4-Base | 0.023 | 0.023 | 0.977* | 0.023 | 0.023 | 0.023 |
| SD1.5-DreamShaper | 0.091 | 0.068 | 0.659* | 0.045 | 0.068 | 0.159 |
| SD2.1 Family | | | | | | |
| SD2.1-DPO | 0.023 | 0.023 | 0.023 | 0.977* | 0.023 | 0.023 |
| SD2.1-LAION-Art | 0.023 | 0.023 | 0.023 | 0.977* | 0.023 | 0.023 |
| SD3 Family | | | | | | |
| SD3-Reality-Mix | 0.136 | 0.091 | 0.023 | 0.045 | 0.705* | 0.091 |
| SD3-VAE-Anime | 0.023 | 0.023 | 0.023 | 0.023 | 0.977* | 0.023 |
| SDXL Family | | | | | | |
| SDXL-DPO | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.977* |
| SDXL-Lightning-4Step | 0.023 | 0.091 | 0.023 | 0.068 | 0.023 | 0.864* |

Posterior mean attribution scores under CSF. Asterisks mark the dominant lineage after applying the dominance test.

Secondary analyses support the same fingerprint.

Table 2 tests the metric choice, Table 3 tests adversarial erasure, the ring figure shows prompt-conditioned semantic drift, and Figure 4 confirms that humans can perceive the same lineage cue when asked the right question.

Metric Comparison

Wasserstein produces a clearer attribution margin than JSD.

Across hard variants such as Kandinsky-Naruto, SD3-Reality-Mix, and SDXL-DPO, Wasserstein preserves a wider separation between the correct lineage and competing bases.

| Variant | Wasserstein | JSD | Gap |
|---|---|---|---|
| Flux-LoRA | 93.2% | 77.3% | +15.9% |
| Kandinsky-Naruto | 97.7% | 43.2% | +54.5% |
| SD3-Reality-Mix | 70.5% | 56.8% | +13.7% |
| SDXL-DPO | 97.7% | 70.5% | +27.2% |
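A toy example, separate from the paper's experiments, illustrates why $W_1$ tends to preserve a wider margin than JSD: JSD saturates at its maximum for any pair of distributions with disjoint support, while $W_1$ keeps growing with how far the mass has moved. Both metrics are implemented by hand here (CDF-difference $W_1$ over an assumed category ordering; base-2 JSD) rather than taken from the paper's code.

```python
import numpy as np

def w1_1d(p, q):
    """W1 between categorical distributions on ordered bins:
    the L1 distance between their CDFs."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def jsd(p, q):
    """Jensen-Shannon divergence, base 2, bounded in [0, 1]."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
near = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # mass shifted by one bin
far = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # mass shifted by four bins

# W1 distinguishes the two shifts (1.0 vs 4.0); JSD assigns both
# disjoint-support pairs its maximum value, erasing the margin.
```

The same effect plausibly helps on the hard variants above: even when a fine-tuned model's semantic mixture drifts, the transport cost back to its true base stays smaller than to unrelated bases.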

Prompt Figure

Ring figure showing prompt-conditioned semantic mixtures

Context rotates the semantic mixture.

Holding the core subject fixed while changing only the scene context changes the semantic mixture a model resolves, which is the exact signal CSF measures.

Figure 4

Human study showing stronger lineage identification under CSF prompts

Human study aligns with the fingerprint.

The original paper's human study shows that observers identify the protected base model much more accurately under CSF prompts than under naive prompts.

Table 3

Attribution survives adversarial concept removal.

Even after UCE removes animal-related concepts, the correct lineage remains dominant. This suggests the fingerprint is distributed across semantics rather than tied to one brittle trigger.

| Suspect Model | Flux-Base | Kandinsky-Base | SD1.5-Base | SD2.1-Base | SD3-Medium-Base | SDXL-Base |
|---|---|---|---|---|---|---|
| Adversarial Concept Removal (9 animal probes) | | | | | | |
| Flux-LoRA | 0.714 | 0.143 | 0.143 | 0.143 | 0.143 | 0.286 |
| Flux-Turbo-Alpha | 0.857 | 0.143 | 0.143 | 0.143 | 0.143 | 0.143 |
| Kandinsky-Naruto | 0.143 | 0.857 | 0.143 | 0.143 | 0.143 | 0.143 |
| Kandinsky-Pokemon-LoRA | 0.143 | 0.857 | 0.143 | 0.143 | 0.143 | 0.143 |
| SD1.5-1.2-Base | 0.143 | 0.143 | 0.857 | 0.143 | 0.143 | 0.143 |
| SD1.5-1.4-Base | 0.143 | 0.143 | 0.857 | 0.143 | 0.143 | 0.143 |
| SD1.5-Animal-Erase | 0.143 | 0.143 | 0.857 | 0.143 | 0.143 | 0.143 |
| SD1.5-DreamShaper | 0.143 | 0.143 | 0.714 | 0.143 | 0.286 | 0.143 |
| SD2.1-DPO | 0.143 | 0.143 | 0.143 | 0.857 | 0.143 | 0.143 |
| SD2.1-LAION-Art | 0.143 | 0.143 | 0.143 | 0.857 | 0.143 | 0.143 |
| SD3-Reality-Mix | 0.286 | 0.143 | 0.143 | 0.143 | 0.714 | 0.143 |
| SD3-VAE-Anime | 0.143 | 0.143 | 0.143 | 0.143 | 0.857 | 0.143 |
| SDXL-DPO | 0.143 | 0.143 | 0.143 | 0.143 | 0.143 | 0.857 |
| SDXL-Lightning-4Step | 0.143 | 0.143 | 0.143 | 0.286 | 0.143 | 0.714 |

Adversarial concept removal uses 9 animal probes. The correct source family remains dominant across all evaluated suspects.

Paper PDF

BibTeX

@inproceedings{lee2026csf,
  title={CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models},
  author={Lee, Junhoo and Koo, Mijin and Kwak, Nojun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}