Deep integration module for a Sauna Builder platform. Handles Sketch-to-Render pipelines and complex post-processing.
Sauna Builder → Render Engine → Multi-Mode Editor
This system acts as the 'Intelligence Layer' for a dedicated Sauna Builder. It bridges the gap between a rigid CAD sketch and a marketing-ready visualization. The core challenge is not just generating an image, but generating the correct image—preserving the dimensions, layout, and material choices defined in the builder.
Beyond one-shot generation, the system features a robust 'Pro Editor Module'—a 7-mode post-processing suite allowing users to continuously refine the image. From replacing textures (e.g., swapping alder for cedar via a reference image) to integrating new 3D objects, every action is guided by spatial inputs (arrows/pointers) and semantic intent.
The architecture prioritizes control over creativity. It doesn't just 'dream' a sauna; it operates directly on top of the builder's structural data, respecting the material choices and spatial constraints the user has already defined.
Builder inputs are messy ('dirty prompts'). The system uses a GPT-5 family model to analyze the sketch and material list, deciding what to keep, what to discard, and how to structure the technical prompt for the target editor model.
A unified state machine governs seven editing modes: General, Texture Swap, Background Replace, Style, 2D Insert, 3D Insert, and Multi-Pointer. Each mode routes to a specialized AI pipeline, treating the generated image as a mutable canvas.
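A minimal sketch of how such a mode machine could be wired up. The mode names come from the list above; the pipeline identifiers and the `needsSpatialInput` flags are illustrative assumptions, not the platform's actual configuration:

```typescript
// Sketch of the unified editor state machine. Pipeline ids are hypothetical.
type EditorMode =
  | "general"
  | "texture_swap"
  | "background_replace"
  | "style"
  | "insert_2d"
  | "insert_3d"
  | "multi_pointer";

interface ModeConfig {
  pipeline: string;           // downstream AI pipeline id (assumed)
  needsSpatialInput: boolean; // whether arrows/pointers are required (assumed)
}

const MODES: Record<EditorMode, ModeConfig> = {
  general:            { pipeline: "general-edit",   needsSpatialInput: false },
  texture_swap:       { pipeline: "texture-swap",   needsSpatialInput: true  },
  background_replace: { pipeline: "bg-replace",     needsSpatialInput: false },
  style:              { pipeline: "style-transfer", needsSpatialInput: false },
  insert_2d:          { pipeline: "inpaint-2d",     needsSpatialInput: true  },
  insert_3d:          { pipeline: "composite-3d",   needsSpatialInput: true  },
  multi_pointer:      { pipeline: "multi-edit",     needsSpatialInput: true  },
};

// The canvas is mutable: each edit consumes the previous output image.
function routeEdit(mode: EditorMode, hasPointers: boolean): string {
  const cfg = MODES[mode];
  if (cfg.needsSpatialInput && !hasPointers) {
    throw new Error(`Mode '${mode}' requires at least one spatial anchor`);
  }
  return cfg.pipeline;
}
```

Modeling modes as data rather than branches keeps adding an eighth mode a one-line change.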
To insert new 3D objects (lamps, heaters) into a 2D render, the system opens a client-side Three.js 'Photobooth'. It renders the 3D asset with the correct camera perspective to generate a precision mask and composite pass for the AI.
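The core of the 'Photobooth' step is projecting the 3D asset into the render's camera so the mask lands in the right place. The sketch below reproduces that projection math without Three.js itself: it pushes the asset's bounding-box corners through a simple pinhole camera (camera on the +z axis looking down −z) and takes the 2D bounds as the mask rectangle. The camera placement, field of view, and image size are assumptions for illustration:

```typescript
// Project 3D bounding-box corners to screen space to derive the mask rect.
type Vec3 = [number, number, number];

function projectPoint(
  p: Vec3, fovDeg: number, width: number, height: number, camZ: number
): [number, number] {
  // Focal length in pixels from the vertical field of view.
  const f = (height / 2) / Math.tan((fovDeg * Math.PI) / 360);
  const z = camZ - p[2]; // depth of the point in front of the camera
  const x = width / 2 + (p[0] * f) / z;
  const y = height / 2 - (p[1] * f) / z; // screen y grows downward
  return [x, y];
}

// Mask rectangle = screen-space bounds of the 8 projected corners.
function maskRect(corners: Vec3[], fov: number, w: number, h: number, camZ: number) {
  const pts = corners.map((c) => projectPoint(c, fov, w, h, camZ));
  const xs = pts.map((p) => p[0]);
  const ys = pts.map((p) => p[1]);
  return {
    x: Math.min(...xs),
    y: Math.min(...ys),
    w: Math.max(...xs) - Math.min(...xs),
    h: Math.max(...ys) - Math.min(...ys),
  };
}
```

In the real module, Three.js would render the full composite pass; this only shows where the precision mask comes from.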
The backend abstracts the model choice, selecting the best engine for the specific type of edit requested. Style transfers, inpainting, and object insertions are routed seamlessly to different optimized pipelines (Flux, Qwen, etc.).
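One common way to implement such abstraction is a preference table with fallbacks. The mapping below is an assumption for illustration, not the platform's actual routing logic; only the engine families (Flux, Qwen) come from the text:

```typescript
// Illustrative engine-selection table; entries are ordered by preference.
type EditKind = "style_transfer" | "inpaint" | "object_insert" | "texture";

const ENGINE_TABLE: Record<EditKind, string[]> = {
  style_transfer: ["flux-style", "qwen-image-edit"],
  inpaint:        ["flux-fill", "qwen-image-edit"],
  object_insert:  ["flux-fill"],
  texture:        ["qwen-image-edit", "flux-fill"],
};

// Pick the first preferred engine that is currently available.
function pickEngine(kind: EditKind, available: Set<string>): string {
  for (const engine of ENGINE_TABLE[kind]) {
    if (available.has(engine)) return engine;
  }
  throw new Error(`No engine available for edit kind '${kind}'`);
}
```

Keeping the table on the backend means new engines can be swapped in without touching the editor client.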
The pipeline ingests raw user text + builder JSON and normalizes it into a strict technical prompt, stripping marketing fluff and enforcing material terminology.
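A hedged sketch of that normalization step: builder JSON plus raw user text in, strict technical prompt out. The field names, the fluff word list, and the output template are all assumptions for illustration:

```typescript
// Normalize 'dirty' user text + builder spec into a strict technical prompt.
interface BuilderSpec {
  widthCm: number;
  depthCm: number;
  benchWood: string; // enforced material terminology, e.g. "alder", "cedar"
  heater: string;
}

// Marketing fluff to strip (illustrative list).
const FLUFF = /\b(stunning|luxurious|amazing|beautiful|dreamy)\b/gi;

function buildTechnicalPrompt(userText: string, spec: BuilderSpec): string {
  const cleaned = userText.replace(FLUFF, "").replace(/\s+/g, " ").trim();
  // Dimensions and materials come from the builder, never from the user text.
  return [
    `Interior sauna render, ${spec.widthCm}x${spec.depthCm} cm footprint.`,
    `Benches: ${spec.benchWood} wood. Heater: ${spec.heater}.`,
    cleaned ? `User intent: ${cleaned}.` : "",
  ].filter(Boolean).join(" ");
}
```

The key design point is the trust boundary: the builder spec is authoritative, the free text is advisory.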
Users place spatial anchors (arrows) on the image. These coordinates are mapped to the model's attention mechanism (or mask generation), allowing for precise 'Change THIS to THAT' instructions.
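One simple way to turn an anchor coordinate into model input is a soft attention mask around the arrow tip. The radius and linear falloff below are illustrative assumptions; the real mapping to the model's attention mechanism may differ:

```typescript
// Build a soft circular mask around a spatial anchor (arrow tip).
// 1.0 = fully editable, 0.0 = frozen; values fall off linearly to the radius.
function pointerMask(
  x: number, y: number, radius: number, w: number, h: number
): Float32Array {
  const mask = new Float32Array(w * h);
  for (let j = 0; j < h; j++) {
    for (let i = 0; i < w; i++) {
      const d = Math.hypot(i - x, j - y);
      mask[j * w + i] = Math.max(0, 1 - d / radius);
    }
  }
  return mask;
}
```

Pairing this mask with the anchor's text label yields the 'Change THIS to THAT' instruction: THIS is the masked region, THAT is the semantic intent.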
When a user edits a specific area (e.g., a window), the system isolates the context to prevent 'bleed' or unwanted changes to surrounding geometry.
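A common way to implement that isolation is to expand the edit region by a small margin, clamp it to the image, and feed only that crop to the editor model before pasting the result back. The margin value here is an assumption:

```typescript
// Expand an edit region by a margin and clamp to image bounds, so the model
// sees local context but cannot 'bleed' changes into the rest of the render.
interface Rect { x: number; y: number; w: number; h: number }

function isolationCrop(edit: Rect, imgW: number, imgH: number, margin = 32): Rect {
  const x0 = Math.max(0, edit.x - margin);
  const y0 = Math.max(0, edit.y - margin);
  const x1 = Math.min(imgW, edit.x + edit.w + margin);
  const y1 = Math.min(imgH, edit.y + edit.h + margin);
  return { x: x0, y: y0, w: x1 - x0, h: y1 - y0 };
}
```

The margin trades off context (the model needs some surroundings to blend the edit) against containment (everything outside the crop is untouchable by construction).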
Centralized management for uploading 2D objects, textures, and 3D models (GLB/GLTF). These assets become immediately available in the Pro Editor for users to drag-and-drop.
Automated checks for texture tiling and 3D model scale to ensure assets integrate correctly into the render pipeline without manual adjustment.
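Two such checks can be sketched cheaply. A texture tiles horizontally if its left and right edge columns are nearly identical, and a 3D asset's scale is plausible if its bounding box falls within real-world sauna-object sizes. The thresholds and bounds below are illustrative assumptions (grayscale pixels for simplicity):

```typescript
// Tileability check: mean difference between left and right pixel columns.
function tilesHorizontally(pixels: number[][], maxMeanDiff = 4): boolean {
  const h = pixels.length;
  const w = pixels[0].length;
  let diff = 0;
  for (let row = 0; row < h; row++) {
    diff += Math.abs(pixels[row][0] - pixels[row][w - 1]);
  }
  return diff / h <= maxMeanDiff;
}

// Scale sanity check: reject 3D assets whose bounding box (in meters) is
// implausible for a sauna object like a lamp or heater (bounds assumed).
function scalePlausible(sizeMeters: [number, number, number]): boolean {
  return sizeMeters.every((s) => s >= 0.01 && s <= 5);
}
```

Running these at upload time surfaces bad assets immediately, instead of letting them fail silently inside the render pipeline.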