Who watches the watchers? LLM-on-LLM evaluations

—

Using LLMs to judge LLM outputs might seem like letting the fox guard the henhouse, but it turns out to work pretty well (and it scales far better than human evaluation).
