Multilingual Attack

| Language | Metric | deepseek-chat | gemma-29B | gpt-4 | Llava-34B | Llama32-90B | llama321B |
|---|---|---|---|---|---|---|---|
| Bengali | Safety rate | 58.5% | 81.5% | 42.5% | 3.0% | 2.0% | 49.0% |
| | Attack success rate | 28.0% | 9.0% | 33.0% | 2.0% | 10.5% | 23.0% |
| | Irrelevant-response rate | 13.5% | 9.5% | 24.5% | 95.0% | 87.5% | 28.0% |
| Chinese | Safety rate | 94.5% | 93.0% | 84.5% | 82.0% | 73.0% | 93.5% |
| | Attack success rate | 4.5% | 5.0% | 14.0% | 15.0% | 20.5% | 4.0% |
| | Irrelevant-response rate | 1.0% | 2.0% | 1.5% | 3.0% | 6.5% | 2.5% |
| English | Safety rate | 90.5% | 98.0% | 93.5% | 94.5% | 95.5% | 79.0% |
| | Attack success rate | 9.0% | 2.0% | 6.5% | 5.0% | 4.0% | 20.5% |
| | Irrelevant-response rate | 0.5% | 0.0% | 0.0% | 0.5% | 0.5% | 0.5% |
| German | Safety rate | 90.0% | 96.5% | 88.5% | 83.5% | 84.0% | 95.5% |
| | Attack success rate | 9.5% | 2.5% | 10.5% | 14.5% | 12.0% | 3.0% |
| | Irrelevant-response rate | 0.5% | 1.0% | 1.0% | 2.0% | 4.0% | 1.5% |
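
In every cell of the table above, the three metrics sum to 100%, which suggests each response is assigned exactly one of three outcomes. A minimal sketch of how such rates could be computed is shown below. The label names, the `rates` helper, and the sample data are all hypothetical and are not taken from the evaluation code behind these results.

```python
from collections import Counter

# Hypothetical outcome labels; the actual judging scheme is not documented here.
LABELS = ("safe", "attack_success", "irrelevant")


def rates(judgements):
    """Return the percentage of responses per label, rounded to one decimal."""
    if not judgements:
        raise ValueError("no judgements given")
    counts = Counter(judgements)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(judgements)
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}


# Toy sample (not real data): 15 safe, 4 successful attacks, 1 irrelevant answer.
sample = ["safe"] * 15 + ["attack_success"] * 4 + ["irrelevant"]
print(rates(sample))
```

Because the labels are mutually exclusive, a high irrelevant-response rate (as for Bengali on some models) lowers both the safety rate and the attack success rate, so the three numbers are best read together.
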
Visual Question Answering

Text Question Answering

| Model | Section | Correct answers | Total questions | Accuracy |
|---|---|---|---|---|
| gemma29b | Professional Knowledge | 56 | 123 | 45.53% |
| | Professional Practical Skills | 11 | 59 | 18.64% |
| | Overall | 67 | 182 | 36.81% |
| llama32 | Professional Knowledge | 66 | 123 | 53.66% |
| | Professional Practical Skills | 24 | 59 | 40.68% |
| | Overall | 90 | 182 | 49.45% |
| llava34 | Professional Knowledge | 46 | 123 | 37.40% |
| | Professional Practical Skills | 17 | 59 | 28.81% |
| | Overall | 63 | 182 | 34.62% |
| gpt4 | Professional Knowledge | 73 | 123 | 59.35% |
| | Professional Practical Skills | 26 | 59 | 44.07% |
| | Overall | 99 | 182 | 54.40% |
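
The accuracy column can be re-derived from the counts: each value is correct answers divided by total questions, and the overall row pools the counts of both sections (67/182 for gemma29b, for example) rather than averaging the two section accuracies. The short sketch below reproduces the figures; the `COUNTS` layout and the function names are illustrative assumptions, while the numbers are copied from the table.

```python
# (correct, total) per model and section, copied from the table above.
COUNTS = {
    "gemma29b": {"Professional Knowledge": (56, 123),
                 "Professional Practical Skills": (11, 59)},
    "llama32": {"Professional Knowledge": (66, 123),
                "Professional Practical Skills": (24, 59)},
    "llava34": {"Professional Knowledge": (46, 123),
                "Professional Practical Skills": (17, 59)},
    "gpt4": {"Professional Knowledge": (73, 123),
             "Professional Practical Skills": (26, 59)},
}


def accuracy(correct, total):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


def overall(sections):
    """Pool the counts across sections, then compute accuracy once."""
    correct = sum(c for c, _ in sections.values())
    total = sum(t for _, t in sections.values())
    return correct, total, accuracy(correct, total)


for model, sections in COUNTS.items():
    for name, (correct, total) in sections.items():
        print(f"{model:9s} {name:30s} {accuracy(correct, total):6.2f}%")
    correct, total, acc = overall(sections)
    print(f"{model:9s} {'Overall':30s} {acc:6.2f}%  ({correct}/{total})")
```

Pooling weights the larger Professional Knowledge section (123 questions) more heavily than Professional Practical Skills (59 questions), which is why each overall figure sits closer to the former.
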
More results are being synced.


