DiffusionPerturber

class nrtk.impls.perturb_image.generative.DiffusionPerturber(*, model_name: str = 'timbrooks/instruct-pix2pix', prompt: str = 'do not change the image', seed: int | None = None, is_static: bool = False, num_inference_steps: int = 50, text_guidance_scale: float = 8.0, image_guidance_scale: float = 2.0, device: str | None = None)

Diffusion-based implementation of the PerturbImage interface for prompt-guided perturbations.

This class uses diffusion models, specifically InstructPix2Pix, to generate realistic perturbations of input images guided by text prompts. The perturber can apply a range of effects and transformations described in natural language.

Args:
model_name: Name of the pre-trained diffusion model on Hugging Face. Default is “timbrooks/instruct-pix2pix”.
prompt: Text prompt describing the desired perturbation or transformation. Default is “do not change the image”.
seed: Random seed for reproducible results. Defaults to None for non-deterministic behavior.
is_static: If True and seed is provided, resets the RNG after each call for consistent results. Default is False.
num_inference_steps: Number of denoising steps. Default is 50.
text_guidance_scale: Guidance scale for the text prompt. Default is 8.0.
image_guidance_scale: Guidance scale for image conditioning. Default is 2.0.
device: Device for computation, e.g., “cpu” or “cuda”. If None, selects CUDA if available, otherwise CPU. Default is None.

Note:

The model is loaded lazily on first use and cached for subsequent calls. Input images are automatically resized for compatibility with the diffusion model: to a minimum dimension of 256 pixels, with both dimensions divisible by 8. When device is None, CUDA is used if available, otherwise CPU.
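The exact resizing rule is not spelled out here. One plausible reading of this note is sketched below: `compatible_size` is a hypothetical helper that upscales the image (preserving aspect ratio) until its shorter side reaches 256 pixels and then rounds each side down to a multiple of 8. The actual implementation may differ, for example in whether it also downscales large inputs.

```python
import math


def compatible_size(
    width: int, height: int, min_dim: int = 256, multiple: int = 8
) -> tuple[int, int]:
    """Hypothetical helper: return a (width, height) whose shorter side is at
    least `min_dim` and whose sides are both divisible by `multiple`."""
    # Upscale only (never shrink) so the shorter side reaches min_dim.
    scale = max(1.0, min_dim / min(width, height))
    new_w = math.ceil(width * scale)
    new_h = math.ceil(height * scale)
    # Round each side down to a multiple of 8; since 256 is itself a
    # multiple of 8, the shorter side cannot drop below min_dim.
    return (new_w // multiple) * multiple, (new_h // multiple) * multiple


print(compatible_size(100, 50))  # (512, 256)
```

Under this reading, an image that is already large enough on its shorter side is only trimmed to the nearest multiple of 8, e.g. 641 x 479 becomes 640 x 472.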

Methods

from_config

Instantiate a new instance of this class from a JSON-compliant configuration dictionary encapsulating initialization arguments.

get_config

Return the current configuration of the DiffusionPerturber.

get_default_config

Generate and return a default configuration dictionary for this class.

get_impls

Discover plugins, skipping any entrypoints that fail to load.

get_type_string

Return the fully qualified type string of the PerturbImage class or subclass.

is_usable

Check whether this class is available for use.

perturb

Generate a prompt-guided perturbed image using diffusion models.

__init__(*, model_name: str = 'timbrooks/instruct-pix2pix', prompt: str = 'do not change the image', seed: int | None = None, is_static: bool = False, num_inference_steps: int = 50, text_guidance_scale: float = 8.0, image_guidance_scale: float = 2.0, device: str | None = None) -> None

Initialize the DiffusionPerturber with configuration parameters.

Args:
model_name:

Name of the pre-trained diffusion model on Hugging Face. Default is “timbrooks/instruct-pix2pix”.

prompt:

Text prompt describing the desired perturbation. Examples include “add rain to the image”, “make it foggy”, “add snow”, “darken the scene”, etc. To apply a no-op, use “do not change the image”. Default is “do not change the image”.

seed:

Random seed for reproducible results. Defaults to None for non-deterministic behavior.

is_static:

If True and seed is provided, resets the RNG after each perturb call for consistent results across multiple calls (useful for video frame processing). Default is False.

num_inference_steps:

Number of denoising steps. Default is 50.

text_guidance_scale:

Guidance scale for the text prompt. Higher values make the output follow the prompt more closely. Default is 8.0.

image_guidance_scale:

Guidance scale for image conditioning. Higher values keep the output closer to the input image. Default is 2.0.

device:

Device for computation, e.g., “cpu” or “cuda”. If None, selects CUDA if available, otherwise CPU. Default is None.
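For intuition about how the two guidance scales interact: InstructPix2Pix (Brooks et al., 2023) combines three predictions of the denoiser $e_\theta$ at each step, where $z_t$ is the noisy latent, $c_I$ the image conditioning, $c_T$ the text conditioning, and $\varnothing$ a dropped condition. Assuming image_guidance_scale and text_guidance_scale play the roles of $s_I$ and $s_T$ (as guidance_scale and image_guidance_scale do in the Hugging Face diffusers InstructPix2Pix pipeline), the guided prediction is:

```latex
\tilde{e}_\theta(z_t, c_I, c_T)
  = e_\theta(z_t, \varnothing, \varnothing)
  + s_I \big( e_\theta(z_t, c_I, \varnothing) - e_\theta(z_t, \varnothing, \varnothing) \big)
  + s_T \big( e_\theta(z_t, c_I, c_T) - e_\theta(z_t, c_I, \varnothing) \big)
```

With $s_I = s_T = 1$ the terms telescope to the fully conditioned prediction $e_\theta(z_t, c_I, c_T)$. Raising $s_T$ pushes the edit further toward the prompt, and raising $s_I$ keeps the result closer to the input image.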

get_config() -> dict[str, Any]

Return the current configuration of the DiffusionPerturber.
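Because get_config and from_config exchange a JSON-compliant dictionary, a configuration can be saved and restored with the standard json module. The sketch below assumes the dictionary keys mirror the __init__ parameter names; it builds such a dictionary by hand rather than calling nrtk, so the real key set may differ.

```python
import json

# Assumed shape of DiffusionPerturber.get_config(): keys mirror __init__ parameters.
config = {
    "model_name": "timbrooks/instruct-pix2pix",
    "prompt": "add rain to the image",
    "seed": 42,
    "is_static": True,
    "num_inference_steps": 50,
    "text_guidance_scale": 8.0,
    "image_guidance_scale": 2.0,
    "device": None,  # serialized as JSON null
}

# Serialize and restore; a JSON-compliant config survives the round trip unchanged.
text = json.dumps(config, indent=2)
restored = json.loads(text)

# With nrtk installed, the restored dictionary would be passed back, e.g.:
# perturber = DiffusionPerturber.from_config(restored)
```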

perturb(*, image: ndarray[Any, Any], boxes: Iterable[tuple[AxisAlignedBoundingBox, dict[Hashable, float]]] | None = None, **kwargs: Any) -> tuple[ndarray[Any, Any], Iterable[tuple[AxisAlignedBoundingBox, dict[Hashable, float]]] | None]

Generate a prompt-guided perturbed image using diffusion models.

If the prompt is “do not change the image”, this method will perform a no-op and return the original image and bounding boxes.

Args:
image: Input image as a NumPy array; PIL handles format validation. Common supported formats: (H, W) grayscale, (H, W, 3) RGB, and (H, W, 4) RGBA. Input is automatically converted to RGB for processing.

boxes: Optional iterable of tuples containing AxisAlignedBoundingBox objects and their corresponding detection confidence dictionaries.

kwargs: Additional perturbation keyword arguments (currently unused).
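Input arrays can be checked ahead of time with Pillow. The sketch below mirrors the RGB conversion described above, assuming a Pillow path of Image.fromarray followed by convert("RGB"); `to_rgb_uint8` is a hypothetical helper, not part of nrtk, and it maps Pillow's conversion failure to ValueError as the Raises section describes.

```python
import numpy as np
from PIL import Image


def to_rgb_uint8(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert (H, W), (H, W, 3) or (H, W, 4) uint8 arrays
    to an (H, W, 3) uint8 RGB array."""
    try:
        pil_image = Image.fromarray(image)
    except (TypeError, ValueError) as exc:
        # Pillow raises TypeError for unsupported shape/dtype combinations
        # and ValueError for arrays with too many dimensions.
        raise ValueError(f"Cannot convert array with shape {image.shape} to PIL") from exc
    # convert("RGB") replicates grayscale into 3 channels and drops alpha.
    return np.asarray(pil_image.convert("RGB"), dtype=np.uint8)


gray = np.zeros((5, 7), dtype=np.uint8)
rgb = to_rgb_uint8(gray)  # shape (5, 7, 3)
```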

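Returns:

A tuple containing:
- The perturbed RGB image as a uint8 NumPy array of shape (H, W, 3) at the diffusion model's resolution.
- The bounding boxes, currently returned unchanged. Because they are not rescaled, their coordinates still refer to the input image's resolution if the image was resized.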
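Since the returned image may be at a different resolution than the input while the boxes pass through unchanged, a caller can map box coordinates onto the output image. The sketch below uses plain (x_min, y_min, x_max, y_max) tuples in place of AxisAlignedBoundingBox; `rescale_box` is a hypothetical helper.

```python
def rescale_box(
    box: tuple[float, float, float, float],
    src_size: tuple[int, int],
    dst_size: tuple[int, int],
) -> tuple[float, float, float, float]:
    """Hypothetical helper: map (x_min, y_min, x_max, y_max) from an image of
    size src_size = (width, height) onto one of size dst_size = (width, height)."""
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# Example: a 100 x 50 input returned at 512 x 256.
scaled = rescale_box((10, 5, 60, 40), src_size=(100, 50), dst_size=(512, 256))
```

With NumPy images, (width, height) is (image.shape[1], image.shape[0]), so the sizes can be read directly from the input and output arrays.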

Raises:
ValueError:

If the input image cannot be converted to PIL format.

RuntimeError:

If the diffusion model fails to load or process the image.