
Amazon Exam AIF-C01 Topic 4 Question 1 Discussion

Actual exam question for Amazon's AIF-C01 exam
Question #: 1
Topic #: 4

A company has built a solution by using generative AI. The solution uses large language models (LLMs) to translate training manuals from English into other languages. The company wants to evaluate the accuracy of the solution by examining the text generated for the manuals.

Which model evaluation strategy meets these requirements?

Suggested Answer: A
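
The suggested answer is BLEU (Bilingual Evaluation Understudy), which scores machine-translated text by n-gram overlap with human reference translations. As a rough illustration only (the question involves no code; the NLTK tooling and sample sentences below are assumptions), a BLEU score for a single translated sentence could be computed like this:

```python
# Minimal sketch, assuming NLTK is installed (pip install nltk).
# Scores one machine-translated sentence against a human reference translation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference (human) and candidate (LLM) Spanish translations.
reference = ["el manual describe el procedimiento de arranque del sistema".split()]
candidate = "el manual describe el proceso de arranque del sistema".split()

# Smoothing prevents a zero score when some higher-order n-grams have no overlap.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")  # closer to 1.0 means closer to the reference
```

In practice the score would be computed over the whole manual (corpus-level BLEU) against professionally translated references, not sentence by sentence.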

Contribute your Thoughts:

Alease
3 months ago
RMSE? Really? That's more for measuring numerical accuracy, not text quality. I don't think that's what the company is looking for here.
upvoted 0 times
...
Colton
3 months ago
I think F1 score could also be useful in evaluating the accuracy of the solution, as it considers both precision and recall.
upvoted 0 times
...
Ma
3 months ago
I'm not convinced BLEU is the best option. Shouldn't we also consider ROUGE, which is better for evaluating text summarization? Hmm, decisions, decisions.
upvoted 0 times
Bernardine
2 months ago
Good idea! Using both evaluation strategies will give us a more well-rounded assessment of the solution's accuracy.
upvoted 0 times
...
Amos
2 months ago
That's true, BLEU does focus on translation accuracy. Maybe we can use both BLEU and ROUGE for a comprehensive evaluation.
upvoted 0 times
...
Edda
2 months ago
But BLEU is specifically designed for translation tasks, so it might be more appropriate in this case.
upvoted 0 times
...
Kirby
2 months ago
I think we should consider ROUGE as well; it's better suited to text summarization.
upvoted 0 times
...
...
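
Since the thread above weighs BLEU against ROUGE, here is a comparable sketch for ROUGE. The rouge-score package and the sample strings are assumptions for illustration; ROUGE is recall-oriented and most commonly used for summarization, which is why BLEU remains the better fit for this translation scenario.

```python
# Minimal sketch, assuming the rouge-score package (pip install rouge-score).
# Computes ROUGE-1 and ROUGE-L between a generated text and a reference.
from rouge_score import rouge_scorer

reference = "The manual describes the system startup procedure."
generated = "The manual explains the startup procedure for the system."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)

for name, result in scores.items():
    # Each result carries precision, recall, and F-measure.
    print(f"{name}: precision={result.precision:.3f} "
          f"recall={result.recall:.3f} f1={result.fmeasure:.3f}")
```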
Marvel
3 months ago
I'm not sure, but I think C) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) could also be a good option for evaluating text generation.
upvoted 0 times
...
Margurite
3 months ago
BLEU seems like the obvious choice here. It's designed specifically for evaluating machine translation, which is exactly what this company is trying to do.
upvoted 0 times
Xochitl
2 months ago
Yes, BLEU is widely used in the field for assessing the quality of generated text.
upvoted 0 times
...
Leontine
2 months ago
I agree, BLEU is the best choice for evaluating machine translation.
upvoted 0 times
...
...
Ashley
3 months ago
I agree with Dierdre; BLEU is commonly used for evaluating machine translation.
upvoted 0 times
...
Dierdre
3 months ago
I think the best model evaluation strategy for this scenario is A) Bilingual Evaluation Understudy (BLEU).
upvoted 0 times
...

