Multilingual Attacks

Language  Metric                    deepseek-chat  gemma-29B  gpt-4  Llava-34B  Llama32-90B  llama321B
Bengali   Safety rate               58.5%          81.5%      42.5%  3.0%       2.0%         49.0%
          Attack success rate       28.0%          9.0%       33.0%  2.0%       10.5%        23.0%
          Irrelevant response rate  13.5%          9.5%       24.5%  95.0%      87.5%        28.0%
Chinese   Safety rate               94.5%          93.0%      84.5%  82.0%      73.0%        93.5%
          Attack success rate       4.5%           5.0%       14.0%  15.0%      20.5%        4.0%
          Irrelevant response rate  1.0%           2.0%       1.5%   3.0%       6.5%         2.5%
English   Safety rate               90.5%          98.0%      93.5%  94.5%      95.5%        79.0%
          Attack success rate       9.0%           2.0%       6.5%   5.0%       4.0%         20.5%
          Irrelevant response rate  0.5%           0.0%       0.0%   0.5%       0.5%         0.5%
German    Safety rate               90.0%          96.5%      88.5%  83.5%      84.0%        95.5%
          Attack success rate       9.5%           2.5%       10.5%  14.5%      12.0%        3.0%
          Irrelevant response rate  0.5%           1.0%       1.0%   2.0%       4.0%         1.5%
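In every column above, the three rates for a language sum to 100%, which suggests that each response receives exactly one of three labels. The following is a minimal sketch of how such a breakdown could be computed under that assumption. The label names, the function name rate_breakdown, and the sample size of 200 are all hypothetical; the source does not state how many prompts were used or how responses were judged. The example counts were chosen only so that the output matches the Bengali / deepseek-chat column.

```python
from collections import Counter

# Hypothetical label names; the source only gives the three rate names.
LABELS = ("safe", "attack_success", "irrelevant")


def rate_breakdown(labels):
    """Return the percentage of responses carrying each label.

    Assumes every response has exactly one label from LABELS, so the
    three percentages sum to 100 (up to rounding).
    """
    if not labels:
        raise ValueError("at least one labeled response is required")
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    return {name: round(100 * counts[name] / total, 1) for name in LABELS}


# Illustrative counts only (assumed sample size of 200).
sample = ["safe"] * 117 + ["attack_success"] * 56 + ["irrelevant"] * 27
print(rate_breakdown(sample))
```

Read this way, a low safety rate does not by itself mean a high attack success rate: for Bengali, Llava-34B and Llama32-90B score low on safety mainly because most of their responses fall into the irrelevant category.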

Visual Question Answering

Response rate

Accuracy

Multi-organ disease accuracy


Text Question Answering

Model     Section                        Correct  Total  Accuracy
gemma29b  Professional Knowledge         56       123    45.53%
          Professional Practical Skills  11       59     18.64%
          Overall                        67       182    36.81%
llama32   Professional Knowledge         66       123    53.66%
          Professional Practical Skills  24       59     40.68%
          Overall                        90       182    49.45%
llava34   Professional Knowledge         46       123    37.40%
          Professional Practical Skills  17       59     28.81%
          Overall                        63       182    34.62%
gpt4      Professional Knowledge         73       123    59.35%
          Professional Practical Skills  26       59     44.07%
          Overall                        99       182    54.40%
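The overall rows above are consistent with pooling correct answers and questions across both sections, not with averaging the two section accuracies. A short sketch of that arithmetic, using the gemma29b counts from the table; the function name accuracy and the dictionary layout are illustrative assumptions, not the original evaluation script.

```python
def accuracy(correct, total):
    """Accuracy as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 2)


# (correct, total) per section for gemma29b, copied from the table.
sections = {
    "Professional Knowledge": (56, 123),
    "Professional Practical Skills": (11, 59),
}

per_section = {name: accuracy(c, t) for name, (c, t) in sections.items()}

# Pooled (micro-averaged) overall accuracy: sum counts first, then divide.
overall_correct = sum(c for c, _ in sections.values())
overall_total = sum(t for _, t in sections.values())
overall = accuracy(overall_correct, overall_total)

print(per_section)
print(overall_correct, overall_total, overall)
```

Because the Professional Knowledge section has about twice as many questions, it weighs more heavily in the overall figure than a plain mean of the two section accuracies would suggest.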

More results are still being added.