AI-Orchestrated Data Structuring Engine · Vision-Based Extraction · Dynamic Schema
Internal Tool for Technical Product Catalog Reconstruction. An AI-orchestrated data structuring engine originally built for a 3D construction configurator. The system solves the 'Manufacturer Catalog' problem: transforming chaotic, inconsistent PDF layouts full of mixed tables and scattered specs into clean, structured business objects.
Unlike traditional OCR or fixed-schema parsers, it employs a 'Meta-Schema' architecture where the data structure is defined dynamically (manually or by AI) without code changes. The system then orchestrates a multi-stage pipeline—grouping pages by product, denoising images, and using vision-model interaction to extract strict, type-safe data.
Product structure is not hardcoded in the application. Product parsing rules are stored as JSON ('product_schema'). Pydantic models are generated at runtime to validate extraction, allowing the system to switch from processing 'Ovens' to 'Cladding Panels' instantly without a deploy.
One AI agent generates the system prompt for another. The 'Architect' agent analyzes the active schema and compiles a strict, schema-compliant instruction set for the 'Worker' agent. This ensures the extraction model operates within clearly defined structural constraints.
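The 'Architect' step might look like the following sketch: compiling the active schema into a rigid instruction set for the 'Worker'. The function name, schema shape, and prompt wording are illustrative assumptions, not the system's actual prompt.

```python
# Hypothetical stored schema (same shape assumed as 'product_schema').
schema = {
    "name": "Oven",
    "fields": {
        "model_number": {"type": "str", "required": True},
        "power_kw": {"type": "float", "required": False},
    },
}

def compile_worker_prompt(schema: dict) -> str:
    """'Architect' agent (sketch): turn the active schema into a strict,
    schema-compliant instruction set for the extraction 'Worker' model."""
    lines = [
        f"Extract one '{schema['name']}' object per product.",
        "Return ONLY JSON with exactly these fields:",
    ]
    for name, spec in schema["fields"].items():
        req = "required" if spec["required"] else "optional"
        lines.append(f"- {name} ({spec['type']}, {req})")
    lines.append("If a field is absent from the catalog, use null. No prose.")
    return "\n".join(lines)

prompt = compile_worker_prompt(schema)
```

Because the prompt is derived from the schema rather than hand-written, the Worker's constraints update automatically whenever the schema changes.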
Images in catalogs introduce semantic noise ('...looks modern...'). The system extracts all images and replaces them with tokens ('<image_1>') in the text stream. This keeps the model focused on technical specifications while preserving image references.
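The tokenization step can be sketched as below. The mixed text/image block structure is an assumed input format; the key idea is the stable token plus a lookup table so image references survive.

```python
def tokenize_images(blocks):
    """Replace inline images with '<image_N>' tokens in the text stream,
    keeping a token -> payload table so references can be restored later.
    'blocks' is a hypothetical mixed list of text strings and image dicts."""
    text_parts, images = [], {}
    for block in blocks:
        if isinstance(block, dict) and block.get("kind") == "image":
            token = f"<image_{len(images) + 1}>"
            images[token] = block["data"]
            text_parts.append(token)
        else:
            text_parts.append(block)
    return " ".join(text_parts), images

blocks = ["Oven X-200.", {"kind": "image", "data": b"..."}, "Power: 3.5 kW"]
text, images = tokenize_images(blocks)
```

The model now sees only `Oven X-200. <image_1> Power: 3.5 kW`, with no photo to editorialize about.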
Catalogs are processed in empirically determined 6-page chunks (sliding window). This balances two pressures: small enough chunks to avoid overloading the LLM's context, but enough overlapping 'product boundary' context to capture items that span multiple pages.
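A minimal sliding-window chunker, assuming a one-page overlap between windows (the text fixes only the 6-page size; the overlap amount is an assumption):

```python
def sliding_chunks(pages, size=6, overlap=1):
    """Yield overlapping page chunks so a product straddling a chunk
    boundary still appears whole in at least one window."""
    step = size - overlap
    for start in range(0, len(pages), step):
        yield pages[start:start + size]
        if start + size >= len(pages):  # last window already covers the tail
            break

chunks = list(sliding_chunks(list(range(1, 14))))  # 13 pages
```

Here a 13-page catalog yields windows 1–6, 6–11, and 11–13, so every page boundary is inside some window.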
The model receives high-res page images, not just text. This allows it to correctly parse complex, broken, or multi-column tables where text-only extraction loses the row/column relationship.
Before extraction, a lightweight analysis pass groups pages into 'Product Units'. This ensures that a product spanning pages 4–5 is treated as a single extraction context, rather than having its data split across arbitrary page boundaries.
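One way to sketch this grouping: merge consecutive pages that the analysis pass labeled with the same product. The per-page labels are a hypothetical output of that lightweight pass.

```python
def group_product_units(page_labels):
    """Collapse consecutive pages sharing a product label into one
    'Product Unit' (a single extraction context)."""
    units = []
    for page_no, product in enumerate(page_labels, start=1):
        if units and units[-1]["product"] == product:
            units[-1]["pages"].append(page_no)
        else:
            units.append({"product": product, "pages": [page_no]})
    return units

# Pages 4-5 describe the same oven, so they form one unit.
units = group_product_units(["intro", "Panel-A", "Panel-A", "Oven-X", "Oven-X"])
```

Each unit's full page range is then handed to the extraction stage as one context.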
The system enforces a 'Strict Schema or Reject' policy. If the vision model returns a structure that doesn't validate against the dynamic Pydantic schema, the extraction result is rejected. No 'best guess' data enters the database.
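The reject policy reduces to a single validation gate. The `Panel` model below stands in for a runtime-generated schema model; `accept_or_reject` is a hypothetical name for the gate.

```python
from pydantic import BaseModel, ValidationError

class Panel(BaseModel):
    """Stand-in for a dynamically generated schema model."""
    sku: str
    thickness_mm: float

def accept_or_reject(raw: dict):
    """'Strict Schema or Reject': return a fully validated object,
    or None so the record never reaches the database."""
    try:
        return Panel.model_validate(raw)
    except ValidationError:
        return None

good = accept_or_reject({"sku": "P-1", "thickness_mm": 12.0})
bad = accept_or_reject({"sku": "P-1", "thickness_mm": "thick"})
```

There is no partial-acceptance path: either every field validates, or the whole extraction result is discarded and can be retried.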