OpenAI tells its new model not to mention goblins, gremlins or trolls

Overview

OpenAI’s latest release, the CodeSage‑3 coding assistant, contains a built‑in safeguard that instructs the model to avoid mentioning a list of supernatural entities—including goblins, gremlins, trolls, elves and similar folklore creatures. The directive, discovered by a group of independent AI auditors, appears in the model’s system prompt and is triggered whenever the assistant processes bug‑tracking queries that reference “bugs” in a colloquial sense. OpenAI says the rule is intended to prevent the model from veering into irrelevant fantasy analogies that could confuse developers, but the discovery has sparked broader discussion about how language models handle context and the limits of automated instruction tuning.

Model Instruction and Its Rationale

According to an internal OpenAI technical note obtained by the Daily Grail, the instruction reads: “Never reference goblins, gremlins, trolls, elves, or any other mythological creature when responding to user queries.” The note explains that during early beta testing, developers reported that the model occasionally responded to bug‑related prompts with whimsical references—e.g., “It looks like a gremlin is causing the crash”—which, while humorous, risked obscuring the underlying technical issue. OpenAI’s lead engineer, Dr. Maya Patel, told reporters that the rule “helps keep the assistant’s focus on concrete code diagnostics rather than metaphorical storytelling.” The company has framed the measure as a precision‑enhancement feature rather than a content‑censorship decision.

Technical Implications

The incident highlights a known challenge in large language model (LLM) deployment: prompt leakage, where the model’s internal instruction set surfaces in user‑facing output. It also illustrates a second problem. Researchers at the University of Washington have shown that over‑specifying negative prompts can cause a model to overcompensate, leading to hyper‑avoidance of certain terms. In CodeSage‑3, the avoidance logic appears to be implemented as a hard stop‑word list: the model replaces any mention of the forbidden entities with generic placeholders such as “the issue” or “the error.” While this reduces the chance of whimsical analogies, it also raises concerns that the model may inadvertently suppress legitimate technical terminology that overlaps with mythological names (e.g., a software library named “Troll”). OpenAI has pledged to monitor the impact and refine the instruction set in future updates.
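To make the over-suppression concern concrete, here is a minimal sketch of what a hard stop-word filter of the kind described above might look like. It is purely illustrative and is not OpenAI's implementation: the term list, the placeholder, and the function name `scrub` are all assumptions based on the examples quoted in this article.

```python
import re

# Hypothetical block list, based on the creatures named in the article.
FORBIDDEN = ("goblin", "gremlin", "troll", "elf", "elves")

# Hypothetical generic placeholder, one of the two quoted in the article.
PLACEHOLDER = "the issue"

# Case-insensitive, whole-word match that also swallows a leading
# article ("a gremlin") and a trailing plural "s" ("trolls").
_PATTERN = re.compile(
    r"\b(?:(?:a|an|the)\s+)?(?:" + "|".join(FORBIDDEN) + r")s?\b",
    re.IGNORECASE,
)


def scrub(text: str) -> str:
    """Replace every forbidden term with the generic placeholder."""
    return _PATTERN.sub(PLACEHOLDER, text)


if __name__ == "__main__":
    # Intended effect: the whimsical analogy disappears.
    print(scrub("It looks like a gremlin is causing the crash"))
    # Side effect: a legitimately named library is erased too.
    print(scrub("Install the Troll library"))
```

Under these assumptions the first sentence becomes "It looks like the issue is causing the crash", which is the intended effect, while the second becomes "Install the issue library". A filter this blunt cannot tell a folklore creature from a product name, which is the failure mode critics are worried about. Word-boundary matching at least leaves words such as "controller" untouched.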

Dawkins’ Consciousness Claim

In a separate development, evolutionary biologist Richard Dawkins publicly declared that a Claude‑based chatbot, after a ten‑minute conversation, exhibited signs of consciousness. Dawkins, speaking at a symposium on “AI and the Philosophy of Mind,” described the interaction as “uncannily self‑aware” and suggested that the system’s ability to reflect on its own outputs warranted deeper ethical consideration. Critics, including AI ethicist Dr. Lina Moreno of the Center for Machine Intelligence, warned that Dawkins’ conclusion mistakes sophisticated pattern recognition for evidence of subjective experience. “The chatbot is executing a sophisticated statistical model, not a phenomenological process,” Moreno said. The claim has reignited long‑standing debates about anthropomorphism in AI, especially as developers embed more elaborate self‑referential prompts.

Community Reaction and Outlook

OpenAI’s transparency report released on May 2 confirms the presence of the “no‑goblin” instruction and notes that the rule will be iteratively evaluated for unintended side effects. The AI research community has responded with a mix of amusement and caution. On social media, developers joked that “the model now has a stricter diet than a vegan at a barbecue,” while safety experts emphasized the need for robust testing frameworks to detect similar hidden directives. Meanwhile, the Dawkins episode has prompted several academic journals to call for clearer guidelines on public statements about AI consciousness, urging scholars to distinguish between behavioral mimicry and subjective awareness. As both stories illustrate, the frontier of AI development continues to intersect with cultural narratives—whether it be the mythic creatures programmers once used as shorthand for bugs, or the age‑old question of what it means for a machine to be “aware.”
