๐ŸƒTime for Your Exploration!

How to Choose a Reference Answer

Have you noticed that in both the development and deployment stages, the quality of the reference answer is critical? In real-world cases, the source documents behind your AI systems may be updated frequently. How do you plan to select reliable and up-to-date reference answers to ensure the quality of auto evaluation?

Here, Lady H. is intentionally not sharing the details, as they vary case by case. Time for you to explore! 😉

Auto Evaluation Confidence Score

Once the auto evaluation has generated its scores, you may want to know how confident each score is. However, when a single LLM is asked to rate its own confidence, the results tend to be inflated and biased.

One solution Lady H. applied was to have multiple LLMs trained on different data sources (such as models from OpenAI, Google, and Mistral AI) each generate an evaluation score independently. The confidence score is then calculated from the consistency among these scores.

It's important that the LLMs are trained on diverse data sources; otherwise, any shared bias is likely to persist across their scores.
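To make the idea concrete, here is a minimal sketch of one way to turn cross-model agreement into a confidence score: measure the spread of the scores and map it onto a 0-to-1 scale, so identical scores yield full confidence and maximally disagreeing scores yield zero. This is an illustrative formula, not Lady H.'s exact calculation (hers is behind the link below), and the function name and score range are assumptions.

```python
import statistics

def confidence_from_scores(scores, score_range=(1, 5)):
    """Estimate confidence from the agreement of multiple LLM evaluation scores.

    Illustrative formula (not Lady H.'s exact calculation): confidence is
    1 minus the normalized spread of the scores. Identical scores -> 1.0;
    scores split between the two extremes of the range -> 0.0.
    """
    lo, hi = score_range
    # Population standard deviation measures how much the LLMs disagree.
    spread = statistics.pstdev(scores)
    # The largest possible spread occurs when scores sit at both extremes,
    # giving a standard deviation of (hi - lo) / 2.
    max_spread = (hi - lo) / 2
    return 1.0 - spread / max_spread

# Three LLMs fully agree -> confidence 1.0
print(confidence_from_scores([4, 4, 4]))
# Two LLMs at opposite extremes -> confidence 0.0
print(confidence_from_scores([1, 5]))
```

A normalized spread is just one choice; you could instead use the fraction of pairwise agreements, or require a majority of models to land within one point of each other.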

🌻 Click to see Lady H.'s confidence score calculation >>

Now, a question for you: do you have a better solution for this?
