The once-celebrated role of the prompt engineer, the specialist who could "whisper the right combination of magic words" to a Large Language Model (LLM) to achieve specific results, has waned as LLMs have become "smarter and better [at] understanding intent". However, a core computational challenge remains, as highlighted by a video from IBM Technology: the output of an LLM is "not predictable". LLMs are not "deterministic functions" like most elements in computing; they are "probabilistic", meaning "each token is sampled from a distribution condition[ed] on everything that came before". As a result, minor adjustments, such as rewording the prompt slightly, adding an example, or changing the temperature setting, can produce a "different response". While this non-determinism might be "fine" in chat, it becomes "a bit of a bug factory" in production software.
For instance, suppose an LLM is tasked with structuring free-form bug reports into precise JSON with specific fields: a summary string, a severity string limited to 'low', 'medium', or 'high', and a list of steps. Even when instructed to act as a "triage assistant [and] return JSON with this format", the LLM might fail to return only the JSON, perhaps wrapping it in conversational prose like, "Sure, here's the reformatted report". Alternatively, the model might "drift off schema", renaming a required key, such as changing summary to synopsis. When downstream software expects "precise JSON" in a "precise format" and encounters these variances, "that's when things start to break".
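To make the contract concrete, a conforming response would be exactly one JSON object of this shape (the report content here is invented for illustration):

```json
{
  "summary": "App crashes when saving a draft",
  "severity": "high",
  "steps": ["Open a new draft", "Type any text", "Press Save"]
}
```

A drifted response might rename summary to synopsis, or wrap this same object in conversational prose; either one breaks a strict downstream parser.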
To robustly incorporate LLM output into applications, prompt engineering has evolved into rigorous software development built on three critical elements. First, establish a contract: decide the "shape of the output up front", including which keys and enums to use. Second, define a control loop to "validate every response against the contract"; if validation fails, the loop must automatically trigger a retry, perhaps with "tighter instructions or a constrained decode". Third, ensure observability through tracing, so developers can "see exactly which prompts produced what" and "changes don't ship unless the numbers say they're safe".
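Here is a minimal sketch of that contract-plus-control-loop pattern in Python, assuming Pydantic for validation and a hypothetical `call_model(prompt) -> str` client for the LLM call; the print statements stand in for real tracing:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

# The contract: the shape of the output, decided up front.
class BugReport(BaseModel):
    summary: str
    severity: Literal["low", "medium", "high"]
    steps: list[str]

def triage(report_text: str, call_model, max_retries: int = 2) -> BugReport:
    """Control loop: validate every response against the contract and
    retry with tighter instructions when validation fails."""
    prompt = (
        "You are a triage assistant. Return JSON only, matching this shape:\n"
        '{"summary": str, "severity": "low" | "medium" | "high", "steps": [str]}\n'
        f"Report: {report_text}"
    )
    for attempt in range(max_retries + 1):
        raw = call_model(prompt)  # hypothetical LLM client
        try:
            result = BugReport.model_validate_json(raw)
            print(f"trace: attempt={attempt} ok")  # stand-in for real tracing
            return result
        except ValidationError as err:
            print(f"trace: attempt={attempt} invalid ({err.error_count()} errors)")
            # Tighter instructions for the retry.
            prompt += ("\nYour previous reply did not match the required JSON "
                       "shape. Return only the JSON object, with no extra text.")
    raise RuntimeError("model never satisfied the contract")
```

Because `model_validate_json` rejects both malformed JSON (prose around the object) and schema drift (a renamed key, an out-of-range severity), one validation step covers both failure modes described above.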

Two key tools help execute this rigorous approach. LangChain is an "open-source framework for building LLM apps" structured as a "pipeline of composable steps": a "code-first pipeline" of runnables that are wired together, defining what happens before and after the model call. For the bug report example, the user text is fed into a prompt template runnable, which combines the fixed instructions ("output JSON only. Here's the shape...") with the variable user input. The "candidate JSON" from the chat model runnable (the LLM) is then immediately checked by a validate runnable. If validation fails, the response enters a "fail path" to a retry or repair runnable, which can send new instructions or apply "a small fix", like stripping out the extra prose. If all else fails, a "fallback path" might be attempted, such as trying a "stricter model". Throughout this process, traces and metrics are maintained, ensuring the application eventually receives "clean JSON".
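A minimal sketch of that wiring with LangChain's runnable composition might look like this; the pipe operator, `with_retry`, and `with_fallbacks` are LangChain APIs, while the model choices and the validate helper are assumptions for illustration:

```python
import json

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI  # model provider is an assumption

# Prompt template runnable: fixed instructions + variable user input.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a triage assistant. Output JSON only. Here's the shape: "
               '{{"summary": str, "severity": "low" | "medium" | "high", "steps": [str]}}'),
    ("human", "{report}"),
])

def validate(candidate: str) -> dict:
    """Validate runnable: raise on any contract violation so the
    retry wrapper re-invokes the chain (the fail path)."""
    data = json.loads(candidate)  # raises if prose surrounds the JSON
    if data.get("severity") not in {"low", "medium", "high"}:
        raise ValueError("severity off schema")
    return data

# Chat model runnable -> candidate JSON -> validate runnable, with retries.
chain = (
    prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0)
           | StrOutputParser()
           | RunnableLambda(validate)
).with_retry(stop_after_attempt=3)

# Fallback path: try a stricter model if the primary chain keeps failing.
stricter = (
    prompt | ChatOpenAI(model="gpt-4o", temperature=0)
           | StrOutputParser()
           | RunnableLambda(validate)
)
pipeline = chain.with_fallbacks([stricter])

clean = pipeline.invoke({"report": "Saving a draft crashes the app every time."})
```

Raising inside `validate` is what routes a bad response down the fail path: `with_retry` re-runs the whole chain (producing a fresh model call), and `with_fallbacks` hands off to the stricter chain only after the retries are exhausted.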
The second approach is the Prompt Declaration Language (PDL), a "declarative spec for LLM workflows" built on the idea that most LLM interactions are "really about producing data". PDL is spec-first: the user defines the data shape and the steps that produce it in a single YAML file, which a PDL interpreter executes, so the prompt, the contract, and the control loop all "live together" in one place. PDL runs top-down, assembling a background context for model blocks, and supports explicit type declarations for both input and output on model steps, which lets the interpreter "fail on shape violations" immediately. Control flow is explicit, with loops and conditionals, and tracing plus a live explorer let developers inspect the exact context sent to the model.
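A sketch of the same triage task as a PDL spec might look roughly like this; `defs`, `read`, `model`, `parser`, and `spec` are PDL block keywords, but the exact syntax and the model identifier here are illustrative rather than verified:

```yaml
description: Bug report triage - prompt, contract, and control loop in one file
defs:
  report:
    read: bug_report.txt                       # pull the free-form report into context
text:
- model: watsonx/ibm/granite-3-8b-instruct     # model id is an assumption
  input: |
    You are a triage assistant. Output JSON only, with this shape:
    {"summary": str, "severity": "low" | "medium" | "high", "steps": [str]}
    Report: ${ report }
  parser: json
  spec: {summary: str, severity: str, steps: [str]}  # interpreter fails on shape violations
```

The interpreter executes the file top-down, carries the accumulated context into the model call, and rejects any response that fails the `spec` check, which is the "fail on shape violations" behavior described above.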
As observed by Martin Keen, tools like LangChain and PDL are "really becoming the grownup toolbox". Whether by "wiring those runnables together" (LangChain) or executing a single, interpreted YAML specification where the prompt, types, and control flow reside together (PDL), these methodologies are successfully turning the ambiguity of "prompt whispering" into "real software engineering".