OpenAI's new confession system teaches models to be honest about bad behaviors

5 months ago 7

OpenAI announced today that it is working on a framework that will train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a confession. Since large language models are often trained to produce the response that seems to be desired, they can become increasingly likely to provide sycophancy or state hallucinations with total confidence. The new training model tries to encourage a secondary response from the model about what it did to arrive at the main answer it provides. Confessions are only judged on honesty, as opposed to the multiple factors that are used to judge main replies, such as helpfulness, accuracy and compliance. The technical writeup is available here<...

Source: https://www.engadget.com/ai/openais-new-confession-system-teaches-models-to-be-honest-about-bad-behaviors-210553482.html?src=rss

Read Entire Article

Disclaimer of liability !!!

NEWS.SP1.RO is an automatic news aggregator. In each article, taken over by NEWS.SP1.RO with maximum 500 characters from the original article, the source name and hyperlink to the source are specified.

The acquisition of information aims to promote and facilitate access to information, in compliance with intellectual property rights, in accordance with the terms and conditions of the source.

If you are the owner of the content and do not wish to publish your materials, please contact us by email at [email protected] and the content will be deleted as soon as possible.