# With So Many AI Models, How Do I Choose?

### C-Eval <a href="#f4tbf" id="f4tbf"></a>

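C-Eval is a comprehensive Chinese evaluation suite for foundation models. It consists of 13,948 multiple-choice questions covering 52 disciplines and four difficulty levels. The C-Eval website provides dataset examples under Explore, and the C-Eval paper gives further details. The leaderboard is available at the link below.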

<https://cevalbenchmark.com/static/leaderboard.html>
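
Because C-Eval questions are multiple choice and grouped by difficulty level, a natural way to read a model's results is accuracy per level. The following is a minimal sketch of that idea, not C-Eval's own evaluation code: the record layout, the level names (`level_1`, `level_2`), and the answers are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (difficulty_level, gold_answer, model_answer).
# Level names and answers are invented for illustration only.
results = [
    ("level_1", "A", "A"),
    ("level_1", "C", "B"),
    ("level_2", "D", "D"),
    ("level_2", "B", "B"),
]


def accuracy_by_level(records):
    """Return {level: fraction of questions answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, gold, predicted in records:
        total[level] += 1
        correct[level] += int(gold == predicted)
    return {level: correct[level] / total[level] for level in total}


scores = accuracy_by_level(results)
print(scores)
```

The same grouping could be applied per discipline instead of per level by changing the first field of each record.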

### SuperCLUE <a href="#e6f4l" id="e6f4l"></a>

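SuperCLUE, the Chinese General-Purpose Large Model Comprehensive Evaluation Benchmark, is an evaluation benchmark for general-purpose large models that can be used in Chinese.

The main question it sets out to answer is: with general-purpose large models developing rapidly, how well do Chinese large models perform? This includes, but is not limited to, how these models compare with one another, how far they have come relative to representative international models, and how they compare with human performance.

It tests a range of representative Chinese and international models across multiple capability dimensions. SuperCLUE is a further development of the Chinese Language Understanding Evaluation benchmark (CLUE) for the era of general artificial intelligence. It currently comprises three benchmarks: OPEN, a multi-turn open-ended benchmark; OPT, an objective-question benchmark covering three major capabilities; and 琅琊榜 (Langya Leaderboard), an anonymous head-to-head battle benchmark. It is updated monthly.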

<https://www.superclueai.com/>

## Overall Leaderboard (November 2023)

<table data-full-width="true"><thead><tr><th width="113" align="center">Rank</th><th width="234" align="center">Model</th><th width="162" align="center">Organization</th><th width="170" align="center">Total score</th><th width="212" align="center">OPEN (multi-turn open-ended questions)</th><th align="center">OPT (objective capability questions)</th></tr></thead><tbody>
<tr><td align="center">-</td><td align="center">GPT4-Turbo</td><td align="center">OpenAI</td><td align="center">89.79</td><td align="center">97.53</td><td align="center">78.18</td></tr>
<tr><td align="center">-</td><td align="center">GPT-4</td><td align="center">OpenAI</td><td align="center">75.14</td><td align="center">73.01</td><td align="center">78.33</td></tr>
<tr><td align="center">🏅️</td><td align="center">ERNIE Bot 4.0 (文心一言4.0)</td><td align="center">Baidu</td><td align="center">74.02</td><td align="center">73.62</td><td align="center">74.61</td></tr>
<tr><td align="center">🥈</td><td align="center">Moonshot</td><td align="center">Moonshot AI</td><td align="center">72.88</td><td align="center">71.47</td><td align="center">74.99</td></tr>
<tr><td align="center">🥉</td><td align="center">Yi-34B-Chat</td><td align="center">01.AI</td><td align="center">71.87</td><td align="center">71.21</td><td align="center">72.85</td></tr>
<tr><td align="center">4</td><td align="center">BlueLM</td><td align="center">vivo</td><td align="center">67.14</td><td align="center">64.88</td><td align="center">70.53</td></tr>
<tr><td align="center">5</td><td align="center">Tencent Hunyuan (腾讯混元)</td><td align="center">Tencent</td><td align="center">66.96</td><td align="center">62.27</td><td align="center">74</td></tr>
<tr><td align="center">6</td><td align="center">Tongyi Qianwen 2.0 (通义千问2.0, v1030)</td><td align="center">Alibaba</td><td align="center">66.94</td><td align="center">61.01</td><td align="center">75.83</td></tr>
<tr><td align="center">7</td><td align="center">ChatGLM3-Turbo</td><td align="center">Tsinghua &#x26; Zhipu</td><td align="center">66.5</td><td align="center">63.27</td><td align="center">71.34</td></tr>
<tr><td align="center">-</td><td align="center">Claude2</td><td align="center">Anthropic</td><td align="center">60.62</td><td align="center">57.82</td><td align="center">64.82</td></tr>
<tr><td align="center">8</td><td align="center">Skylark / Doubao (云雀大模型/豆包)</td><td align="center">ByteDance</td><td align="center">60.42</td><td align="center">55.96</td><td align="center">67.11</td></tr>
<tr><td align="center">-</td><td align="center">GPT3.5-Turbo</td><td align="center">OpenAI</td><td align="center">59.39</td><td align="center">57.16</td><td align="center">62.73</td></tr>
<tr><td align="center">9</td><td align="center">XVERSE-13B-2-Chat</td><td align="center">XVERSE Technology</td><td align="center">58.31</td><td align="center">49.95</td><td align="center">70.84</td></tr>
<tr><td align="center">10</td><td align="center">Qwen-14B-Chat</td><td align="center">Alibaba</td><td align="center">57.9</td><td align="center">49.05</td><td align="center">71.18</td></tr>
<tr><td align="center">11</td><td align="center">iFLYTEK Spark V3.0 (讯飞星火V3.0)</td><td align="center">iFLYTEK</td><td align="center">57.18</td><td align="center">51</td><td align="center">66.45</td></tr>
<tr><td align="center">12</td><td align="center">Baichuan2-13B-Chat</td><td align="center">Baichuan AI</td><td align="center">56.33</td><td align="center">50.33</td><td align="center">65.33</td></tr>
<tr><td align="center">13</td><td align="center">MiniMax-Abab5.5</td><td align="center">MiniMax</td><td align="center">55.08</td><td align="center">45.27</td><td align="center">69.8</td></tr>
<tr><td align="center">14</td><td align="center">360GPT_S2_V10</td><td align="center">360</td><td align="center">46.47</td><td align="center">33.35</td><td align="center">66.14</td></tr>
<tr><td align="center">15</td><td align="center">ChatGLM3-6B</td><td align="center">Tsinghua &#x26; Zhipu</td><td align="center">46.24</td><td align="center">38.01</td><td align="center">58.58</td></tr>
<tr><td align="center">16</td><td align="center">Chinese-Alpaca-2-13B</td><td align="center">yiming cui</td><td align="center">43.42</td><td align="center">38.09</td><td align="center">51.42</td></tr>
<tr><td align="center">-</td><td align="center">Llama-2-13B-Chat</td><td align="center">Meta</td><td align="center">31.47</td><td align="center">28.67</td><td align="center">35.67</td></tr>
</tbody></table>
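
When reading the table, it helps to know how the total relates to the two sub-scores. The page does not state the formula, but the numbers appear consistent with a 60/40 weighting of OPEN and OPT. The sketch below checks that assumption on five rows copied from the table and then re-ranks those rows by OPT alone, which shows that the "best" model can change with the metric you care about. The weights `W_OPEN` and `W_OPT` are inferred from the data, not taken from SuperCLUE documentation.

```python
# (model, total, OPEN, OPT) copied from five rows of the leaderboard table.
rows = [
    ("GPT4-Turbo", 89.79, 97.53, 78.18),
    ("GPT-4", 75.14, 73.01, 78.33),
    ("Moonshot", 72.88, 71.47, 74.99),
    ("Yi-34B-Chat", 71.87, 71.21, 72.85),
    ("BlueLM", 67.14, 64.88, 70.53),
]

# Assumed weights, inferred from the table rather than stated in the source.
W_OPEN, W_OPT = 0.6, 0.4


def weighted_total(open_score, opt_score):
    """Combine the two sub-scores using the assumed 60/40 weighting."""
    return W_OPEN * open_score + W_OPT * opt_score


# Allow 0.01 of slack because the published scores are rounded to two decimals.
consistent = [
    abs(weighted_total(open_s, opt_s) - total) < 0.01
    for _, total, open_s, opt_s in rows
]
print("all rows match 60/40 weighting:", all(consistent))

# Rank by OPT only, e.g. if objective-question accuracy matters most to you.
by_opt = [name for name, _, _, opt_s in sorted(rows, key=lambda r: r[3], reverse=True)]
print("ranked by OPT:", by_opt)
```

If open-ended dialogue quality matters more for your use case, sorting on the OPEN column (`r[2]`) instead gives the corresponding ranking.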

