Don't use OpenAI's outdated tools. Also, don't rely on prompt engineering to force the output to conform. Instead, use a local LLM and something like jsonformer or parserllm which can provably output well-formed/parseable text.

[–] lledrtx@lemmy.world 8 points 10 months ago (1 children)

Agree this is better, but neither of them actually seems "provable", though?

[–] Corbin@programming.dev 1 points 10 months ago

I'll be informal to boost your intuition. You know how a parser can reject invalid inputs? Parsers can be generated from grammars, so we can think of the grammars themselves as rejecting invalid inputs too. When we use a grammar for generation, every generated output will be a valid input when parsed, because the grammar can't build any invalid sentences (by definition!).
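To make that concrete, here's a rough sketch (not how jsonformer or parserllm work internally, just the idea). It assumes a hand-written toy grammar for a small subset of JSON; all the `gen_*` names are made up for illustration. Each function only ever emits text that its grammar rule allows, so the stock `json` parser accepts every output.

```python
import json
import random

# Toy grammar (an assumed subset of JSON, for illustration only):
#   value  := object | array | string | number | "true" | "false" | "null"
#   object := "{" [ string ":" value { "," string ":" value } ] "}"
#   array  := "[" [ value { "," value } ] "]"

def gen_string(rng, depth):
    letters = "".join(rng.choice("abc") for _ in range(rng.randint(0, 3)))
    return '"' + letters + '"'

def gen_number(rng, depth):
    return str(rng.randint(0, 99))

def gen_literal(rng, depth):
    return rng.choice(["true", "false", "null"])

def gen_object(rng, depth):
    members = [
        gen_string(rng, depth) + ":" + gen_value(rng, depth - 1)
        for _ in range(rng.randint(0, 2))
    ]
    return "{" + ",".join(members) + "}"

def gen_array(rng, depth):
    items = [gen_value(rng, depth - 1) for _ in range(rng.randint(0, 2))]
    return "[" + ",".join(items) + "]"

def gen_value(rng, depth):
    # Only allow nested rules while depth remains, so generation terminates.
    rules = [gen_string, gen_number, gen_literal]
    if depth > 0:
        rules += [gen_object, gen_array]
    return rng.choice(rules)(rng, depth)

# Every sentence the grammar builds is accepted by a real JSON parser.
for seed in range(100):
    text = gen_object(random.Random(seed), 3)
    json.loads(text)  # raises ValueError if the text were invalid
```

The random choices here stand in for the language model. Swapping them for a model's preferences changes which valid sentence comes out, not whether it is valid.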

For example, suppose we want to generate a JSON object. The grammar for JSON objects starts with an opening curly brace "{". This means that every parser which accepts JSON objects (and rejects everything else) must start by accepting "{". So, our generator must start by emitting a "{" as well. Since our language-modeling generators work over probability distributions, this can be accomplished by setting the probability of every token which doesn't start with "{" to zero and renormalizing what's left.
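Here's a minimal sketch of that masking step, assuming a toy five-token vocabulary with made-up scores (`toy_logits` and `constrained_distribution` are hypothetical names, not any library's API). A real implementation would do the same thing over the model's full logit vector at every decoding step, with the grammar deciding which tokens are allowed next.

```python
import math

def constrained_distribution(logits, allowed):
    """Softmax over `logits`, with disallowed tokens forced to probability zero."""
    # Disallowed tokens get a score of -inf, which softmax maps to exactly 0.
    masked = {
        tok: (score if allowed(tok) else -math.inf)
        for tok, score in logits.items()
    }
    peak = max(masked.values())
    if peak == -math.inf:
        raise ValueError("the grammar allows none of the tokens in the vocabulary")
    # Subtract the peak before exponentiating for numerical stability.
    weights = {tok: math.exp(score - peak) for tok, score in masked.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Hypothetical next-token scores: left alone, the model would rather chat.
toy_logits = {"Sure": 3.0, "Here": 2.5, "{": 1.0, '{"': 0.5, "[": 0.2}

# At the first step, the JSON-object grammar only allows tokens starting with "{".
probs = constrained_distribution(toy_logits, lambda tok: tok.startswith("{"))

for tok, p in probs.items():
    print(repr(tok), round(p, 3))
```

Whatever gets sampled from `probs` has to begin with "{", no matter how strongly the unconstrained model preferred "Sure".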