Why DeepSeek's new AI model thinks it's ChatGPT
Earlier this week, DeepSeek, a well-funded Chinese AI lab, released an open AI model that beats many rivals on popular benchmarks. The model, DeepSeek V3, is large but efficient, handling text-based tasks like coding and writing essays with ease.

It also seems to think it's ChatGPT.

Posts on X and TechCrunch's own tests show that DeepSeek V3 identifies itself as ChatGPT, OpenAI's AI-powered chatbot platform. Asked to elaborate, DeepSeek V3 insists that it is a version of OpenAI's GPT-4 model released in June 2023.

The delusions run deep. If you ask DeepSeek V3 a question about DeepSeek's API, it'll give you instructions on how to use OpenAI's API. DeepSeek V3 even tells some of the same jokes as GPT-4, down to the punchlines.

So what's going on? Models like ChatGPT and DeepSeek V3 are statistical systems. Trained on billions of examples, they learn patterns in those examples to make predictions, like how "to whom" in an email typically precedes "it may concern."

DeepSeek hasn't revealed much about the source of DeepSeek V3's training data. But there's no shortage of public datasets containing text generated by GPT-4 via ChatGPT. If DeepSeek V3 was trained on these, the model might've memorized some of GPT-4's outputs and is now regurgitating them verbatim.

"Obviously, the model is seeing raw responses from ChatGPT at some point, but it's not clear where that is," Mike Cook, a research fellow at King's College London specializing in AI, told TechCrunch. "It could be accidental, but unfortunately, we have seen instances of people directly training their models on the outputs of other models to try and piggyback off their knowledge."

Cook noted that the practice of training models on outputs from rival AI systems can be "very bad" for model quality, because it can lead to hallucinations and misleading answers like the above. "Like taking a photocopy of a photocopy, we lose more and more information and connection to reality," Cook said.

It might also be against those systems' terms of service. OpenAI's terms prohibit users of its products, including ChatGPT customers, from using outputs to develop models that compete with OpenAI's own.

OpenAI and DeepSeek didn't immediately respond to requests for comment. However, OpenAI CEO Sam Altman posted what appeared to be a dig at DeepSeek and other competitors on X Friday afternoon.

"It is (relatively) easy to copy something that you know works," Altman wrote. "It is extremely hard to do something new, risky, and difficult when you don't know if it will work."

Granted, DeepSeek V3 is far from the first model to misidentify itself. Google's Gemini and others sometimes claim to be competing models. For example, prompted in Mandarin, Gemini says that it's Chinese company Baidu's Wenxinyiyan chatbot.

And that's because the web, where AI companies source the bulk of their training data, is becoming littered with AI "slop." Content farms are using AI to create clickbait. Bots are flooding Reddit and X. By one estimate, 90% of the web could be AI-generated by 2026.

This "contamination," if you will, has made it quite difficult to thoroughly filter AI outputs from training datasets.
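One common mitigation is heuristic filtering: scanning a training corpus for telltale chatbot phrasing and dropping documents that contain it. Below is a minimal sketch of the idea in Python; the marker phrases and threshold are illustrative assumptions, not anything DeepSeek or OpenAI has disclosed, and production pipelines layer trained classifiers and deduplication on top of rules like these.

```python
import re

# Illustrative markers of chatbot-style text. A real pipeline would pair
# hand-written rules like these with a trained classifier and deduplication.
CHATBOT_MARKERS = [
    r"\bas an ai language model\b",
    r"\bi(?:'m| am) chatgpt\b",
    r"\btrained by openai\b",
    r"\bknowledge cutoff\b",
]
MARKER_RE = re.compile("|".join(CHATBOT_MARKERS), re.IGNORECASE)

def looks_like_ai_output(doc: str, threshold: int = 1) -> bool:
    """Flag a document containing at least `threshold` marker phrases."""
    return len(MARKER_RE.findall(doc)) >= threshold

def filter_corpus(docs: list[str]) -> list[str]:
    """Keep only documents that don't trip the heuristic."""
    return [d for d in docs if not looks_like_ai_output(d)]

corpus = [
    "The mitochondria is the powerhouse of the cell.",
    "As an AI language model trained by OpenAI, I cannot browse the web.",
]
print(filter_corpus(corpus))  # only the first document survives
```

The catch, and the article's point, is that such heuristics are leaky: most AI-generated text carries none of these obvious markers, which is why contamination is so hard to scrub at scale.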
It's certainly possible that DeepSeek trained DeepSeek V3 directly on ChatGPT-generated text. Google was once accused of doing the same, after all.

Heidy Khlaaf, engineering director at consulting firm Trail of Bits, said the cost savings from "distilling" an existing model's knowledge can be attractive to developers, regardless of the risks.

"Even with internet data now brimming with AI outputs, other models that would accidentally train on ChatGPT or GPT-4 outputs would not necessarily demonstrate outputs reminiscent of OpenAI customized messages," Khlaaf said. "If it is the case that DeepSeek carried out distillation partially using OpenAI models, it would not be surprising."

More likely, however, is that a lot of ChatGPT/GPT-4 data made its way into the DeepSeek V3 training set. That means the model can't be trusted to self-identify, for one. But what is more concerning is the possibility that DeepSeek V3, by uncritically absorbing and iterating on GPT-4's outputs, could exacerbate some of the models' biases and flaws.
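To make "distillation" concrete: in its simplest form, a developer harvests a strong model's answers to a bank of prompts, then fine-tunes a cheaper model to imitate them. The sketch below illustrates that loop; teacher_generate, build_distillation_set, and fine_tune_student are hypothetical stubs invented for this example, not DeepSeek's or OpenAI's actual tooling.

```python
# A minimal, self-contained sketch of output distillation. The "teacher" is a
# stand-in stub; in practice it would be a call to a proprietary chatbot API,
# and the student would be a trainable open model.

def teacher_generate(prompt: str) -> str:
    """Hypothetical stub for querying a strong proprietary model."""
    return f"[teacher's answer to: {prompt}]"

def build_distillation_set(prompts: list[str]) -> list[dict]:
    """Pair each prompt with the teacher's answer to form training examples."""
    return [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

def fine_tune_student(dataset: list[dict]) -> None:
    """Hypothetical stub for supervised fine-tuning on (prompt, completion)
    pairs; a real run would minimize cross-entropy between the student's
    predictions and the teacher's completion tokens."""
    for example in dataset:
        pass  # a gradient step against example["completion"] would go here

prompts = ["Explain recursion simply.", "Who are you?"]
fine_tune_student(build_distillation_set(prompts))
# If "Who are you?" answers end up in the data, the student can learn to
# claim the teacher's identity, which is the misidentification above.
```

Seen through this lens, a model that insists it is GPT-4 is behaving exactly as imitation training would predict, whether the imitation was deliberate or an artifact of contaminated web data.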