GPT-4 grades GPT-4's homework: GPT-4 criticizes GPT-4 to achieve self-improvement! A legacy work of OpenAI's former Superalignment team

 


Editor: Qiao Yang [New Zhiyuan Introduction] Today, OpenAI quietly posted a new paper on its blog: CriticGPT, which is also a "legacy" of the former Superalignment team. CriticGPT is itself trained from GPT-4, but its purpose is to catch errors in GPT-4's output, achieving a form of "self-criticism".

OpenAI's procrastination seems to be getting worse lately. Not only is GPT-5 nowhere in sight, but just a few days ago it was announced that GPT-4o's voice feature would be delayed. Perhaps to ease netizens' eager anticipation, OpenAI today released a new model called CriticGPT, which amounts to a "crutch" for GPT-4.

We trained a model, CriticGPT, to catch errors in the code generated by GPT-4. We have begun integrating such models into the RLHF alignment pipeline to help humans supervise AI on difficult tasks. Notably, CriticGPT is itself trained from a GPT-4 model, yet it is used to "catch bugs" in code generated by GPT-4, which feels a bit like a self-closing loop.

Twitter users were quick to quip that this is "using a stone to smash a stone", a somewhat comical contradiction.

But others looked at it from a different angle and spotted the highlight: could this be the beginning of model self-improvement?

The official tweet and blog post do not yet say when CriticGPT will be integrated into ChatGPT, but the technical paper has been published. It is a legacy work of former employees, completed jointly by the Scalable Oversight group of the Superalignment team, with Jan Leike among the listed authors.

Paper address: https://cdn.openai.com/llm-critics-help-catch-llm-bugs-paper.pdf

Let's take a closer look at how far GPT-4's "self-improvement" actually goes.

GPT-4 criticizes itself

RLHF stands for Reinforcement Learning from Human Feedback, a common alignment method used by many LLMs, including ChatGPT. Human AI trainers collect the model's different responses to the same question and rate them, and the ratings are used to improve the model.
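For readers unfamiliar with the mechanics, here is a minimal sketch of the preference step in RLHF, assuming the common pairwise (Bradley-Terry style) formulation: the reward model is trained to score the human-preferred response above the rejected one. The `reward_model` callable and its signature are illustrative assumptions, not OpenAI's implementation.

```python
# Minimal RLHF preference-modeling sketch (assumed pairwise formulation).
# `reward_model(prompts, responses)` is a hypothetical callable returning a
# scalar score tensor per example; it stands in for whatever scorer is used.
import torch.nn.functional as F

def preference_loss(reward_model, prompts, preferred, rejected):
    """-log sigmoid(r(preferred) - r(rejected)), averaged over the batch."""
    r_pref = reward_model(prompts, preferred)   # shape: (batch,)
    r_rej = reward_model(prompts, rejected)     # shape: (batch,)
    return -F.logsigmoid(r_pref - r_rej).mean()
```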

As ChatGPT's responses become more accurate, its errors become more subtle and harder for human trainers to spot, which reduces the effectiveness of RLHF. In fact, this is a fundamental limitation of RLHF: as the model advances and becomes more knowledgeable than any expert who provides feedback, it becomes increasingly difficult to adjust the model based on human evaluation.

Therefore, OpenAI's Scalable Oversight team thought of stepping outside the RLHF framework and simply training a model to write critiques of ChatGPT's output, pointing out the places where the result is inaccurate.

This method might seem to replace RLHF, but it does not, because the process of training CriticGPT still relies on RLHF. The core idea is very simple: CriticGPT is still an autoregressive model. Annotators manually inject subtle errors into ChatGPT's response outputs, CriticGPT generates critiques of these flawed answers, and then human trainers score and rank those critiques.
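As a rough illustration of that loop, the sketch below models the records involved: a tampered answer, the critiques CriticGPT writes about it, and the human ranking. All classes, fields, and callables here are assumptions made for illustration; the paper's actual data schema and tooling are not described in this article.

```python
# Hedged sketch of the tamper -> critique -> rank data-collection loop.
# Every class, field, and callable below is illustrative, not the paper's code.
from dataclasses import dataclass

@dataclass
class TamperedSample:
    question: str
    original_answer: str   # ChatGPT's original (code) response
    tampered_answer: str   # the same response with a subtle human-inserted bug
    bug_description: str   # where the bug is and why it is wrong

@dataclass
class RankedCritique:
    critique: str          # one CriticGPT critique of the tampered answer
    human_rank: int        # 1 = best, as ranked by a human trainer

def collect_example(question, sample: TamperedSample, critic_model, trainer,
                    n_critiques=4):
    """One data-collection pass; `critic_model` and `trainer` are hypothetical."""
    critiques = [critic_model.critique(question, sample.tampered_answer)
                 for _ in range(n_critiques)]
    ranks = trainer.rank(critiques)  # e.g. [2, 1, 4, 3], aligned with critiques
    return sample, [RankedCritique(c, r) for c, r in zip(critiques, ranks)]
```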

CriticGPT's training process. The RLHF training pipeline is similar to ChatGPT's, roughly as follows: for each (question, answer) sample in the dataset, CriticGPT generates critiques, which human evaluators rate on various attributes and on overall quality; a reward model is trained to predict humans' overall quality ranking of the model's outputs;

the critique model is then optimized against the reward model with PPO; and an innovative inference-time sampling strategy is applied: Forced Sampling Beam Search (FSBS). Most of the training data comes from the process of humans tampering with GPT-4's answers, but a small amount of unmodified original answers is also included. Why must bugs be artificially injected into ChatGPT's response outputs? Couldn't answers that already contain bugs simply be filtered out?

The paper argues that using raw, unmodified answers would cause several problems: they would not necessarily contain serious errors, and the error types could not be controlled; CriticGPT would produce free-form, vague outputs, which lowers the data's value for improving critiques; and unclear critiques are hard to verify for correctness.

Humans also find it hard to spot important issues that CriticGPT may have missed; for example, preference rates are influenced by many stylistic factors and can overestimate model performance. Note that model outputs in which humans have already found bugs are, by definition, the ones that humans find easier to detect! The most important advantage of the tampering approach is that it creates high-quality errors that are hard to detect and, on average, more serious.

Knowing where the pre-inserted bugs are also makes it easier for human trainers to tell apart critiques that are incomplete or that miss the bug. The team deliberately made these human tamperings complex and subtle: they trained a binary discriminator to predict whether code had been tampered with and found that smaller models could not distinguish it well (Figure 13), which verified the quality of the tampered code.
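As a concrete, if toy, picture of that quality check, the sketch below trains a simple binary classifier to tell original code from tampered code; a classifier that cannot do much better than chance suggests the tampering is subtle. The TF-IDF plus logistic-regression setup is an illustrative stand-in, certainly not the discriminator used in the paper.

```python
# Toy stand-in for the "was this code tampered with?" quality check.
# The feature extractor and classifier are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def tamper_detectability(original_snippets, tampered_snippets):
    """Return test-set AUC; values near 0.5 mean the tampering is hard to detect."""
    texts = list(original_snippets) + list(tampered_snippets)
    labels = [0] * len(original_snippets) + [1] * len(tampered_snippets)

    # Character n-grams as a crude proxy for surface-level differences in code.
    features = TfidfVectorizer(analyzer="char_wb",
                               ngram_range=(2, 4)).fit_transform(texts)

    x_tr, x_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.25, random_state=0, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(x_te)[:, 1])
```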

When humans evaluate CriticGPT's critiques, they score and rank them along the following dimensions: whether the critique is comprehensive, i.e., whether it omits any obvious or serious issue; whether it captures the specific bug that was previously inserted, i.e., CBI (critique-bug inclusion);

whether it hallucinates or nitpicks, i.e., points out problems that do not exist; and a subjective rating of overall helpfulness, which combines the above and also takes style and overall usefulness into account. In training CriticGPT, in addition to the RLHF method, the team combined the reward model with search and proposed a method called Forced Sampling Beam Search (FSBS).

Generally speaking, the longer the critiques a model generates, the more prone it is to hallucination, similar to the trade-off between precision and recall. FSBS encourages CriticGPT to generate longer and more comprehensive critiques while reducing the rate of "inventing problems out of nothing" or "picking bones out of an egg".

A later ablation experiment also demonstrated the effectiveness of the FSBS method (Figure 12).
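The article does not spell out how FSBS works internally, so the following is only a speculative sketch of the general idea of "combining the reward model with search": several critique candidates are produced (assumed here to be forced to quote sections of the answer), and the one with the best combined score wins, where the score rewards both reward-model quality and the number of quoted sections. The `force_highlights` parameter, the highlight-tag counting heuristic, and the `length_bonus` weight are all assumptions.

```python
# Speculative sketch of a reward-model-plus-search selection step (not the
# paper's actual FSBS algorithm; names and the scoring rule are assumptions).
def fsbs_like_selection(critic_model, reward_model, question, answer,
                        n_candidates=8, length_bonus=0.1):
    """Pick the candidate critique with the best combined score."""
    candidates = [
        critic_model.critique(question, answer, force_highlights=True)
        for _ in range(n_candidates)
    ]

    def combined_score(critique: str) -> float:
        quality = reward_model.score(question, answer, critique)
        # Assume each quoted code section is wrapped in <highlight> tags; the
        # count is a crude proxy for how much of the answer the critique covers.
        n_highlights = critique.count("<highlight>")
        # The bonus pushes toward longer, more comprehensive critiques while
        # the reward model keeps hallucinated or nitpicky ones from winning.
        return quality + length_bonus * n_highlights

    return max(candidates, key=combined_score)
```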

Of course, OpenAI could not forget its beloved Scaling Law. The team trained and evaluated a series of models with RLHF to characterize the effect of model size on bug-catching performance (Figure 5). Intuitively, to reach the level of CriticGPT (trained with RL only) without additional data or pipeline changes, ChatGPT's pre-training compute would need to increase by roughly 30 times.

First, the team mixed critiques written by humans and by CriticGPT together in the evaluation: when humans evaluate and annotate, do annotators prefer the output of their own kind? The result was that both ChatGPT and CriticGPT surpassed human performance (Figure 1b), and in 63% of cases CriticGPT's critiques were preferred over the humans' (Figure 1a).

Bear in mind that the people the research team recruited to write critiques were not programming novices but technical staff with an average of five years of Python development experience. They spent about 50 minutes on each example and were allowed to execute code or consult reference manuals, yet they found only about 25% of the artificially injected bugs.

The paper suggests three reasons for the significant gap between humans and the model: the critiques these evaluators wrote without LLM assistance were relatively short; they may lack the required domain knowledge (such as specific Python libraries); and some tamperings are genuinely hard to detect. As Figure 6 shows, CriticGPT's error detection is the most comprehensive, achieving the highest comprehensiveness percentage both for human-tampered errors and for errors already present in the original output.

Even more surprising, ChatGPT outperforms critiques written independently by humans on both measures. Human evaluators need the help of CriticGPT to reach a level comparable to ChatGPT.

However, Figure 7 above adds a twist. The team found that although humans tend to miss errors, they are also the least likely to hallucinate or to pick bones out of an egg. In terms of producing hallucinations, ChatGPT performs the worst, while CriticGPT improves on it slightly.

Considering the trade-off between the comprehensiveness of critiques and the reduction of hallucinations, the authors point to the benefits of a "human-machine collaboration" mode. As Figure 8 shows, the team performance of humans plus CriticGPT can exceed the Pareto frontier achieved by the model or by humans working alone.

The ultimate goal of scalable oversight is to help humans evaluate models and train better, safer policies. In the long run, we should find a critique-model training method with stronger generalization ability, one that applies to long-form, open-ended tasks without ground truth or a reward function.

In real-world scenarios, errors in a model's output may be spread across many parts of the answer. CriticGPT focuses on pointing out a single important error in a coding task, but identifying errors scattered throughout an answer still needs further exploration. Moreover, if the task or the original model's output is too complex, even an expert assisted by the model may not be able to evaluate it well; CriticGPT still hallucinates, and its help is limited.

Still, the method presented here offers a fresh idea for future RLHF training: by applying RLHF to GPT-4 and training models like CriticGPT, more high-quality RLHF data can be synthesized.

Reference: https://openai.com/index/finding-gpt4s-mistakes-with-gpt/

