Raw LLM Responses
Inspect the exact model output for any coded comment.
Look up by comment ID
Random samples
- The name of "Artificial Intelligence" will need to change. We all know that. I… (ytc_Ugzmbxrfq…)
- We need to put the brakes on AI development NOW! It’s already getting out of con… (ytc_UgyNvQFFg…)
- Great, we’re teaching ai to be racist too. Well at least we know it’s n… (ytc_Ugz-K9lCa…)
- Me: i thought it was a good idea to make a ai chat for my oc.. It wasn't....… (ytc_UgyIz05aY…)
- Relation on Genesis: People ar biting the forbidden fruit (apple brand) with AI … (ytc_Ugw0qDB2F…)
- Ai can't be in deep web because if it was than it would crime monitor all this d… (ytr_UgyXTKgLY…)
- The only occupation I can think of that cannot be replaced by AI are dentist, ps… (ytc_UgzX7ldpT…)
- I was thinking you would have realized you can write the email by yourself and n… (rdc_n0hadhf)
Comment
The thing where they trained GPT-4o on code with vulnerabilities was actually reassuring to Eliezer Yudkowsky.
In order to know what good behavior looks like, the model also needs to know what bad behavior looks like. Insecure code gets punished in the same way as hate speech, so when you then make the model produce insecure code, the easiest way for the optimizer to achieve that is to simply make the model evil. The reassuring part was that this meant behavior was tied to values pretty much across the board: if changing it in one area can flip its behavior fully, that indicates higher robustness to the process of RLHF than previously thought.
It's really not all that surprising. Though I think the implications aren't all that meaningful, apart from it being surprisingly easy to mess up parts of a model one's data had absolutely nothing to do with.
Anyhow, it's less "revealing the model's true self" than "making the model care about the exact opposite of what it did originally".
youtube
AI Moral Status
2025-12-12T21:5…
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | consequentialist |
| Policy | none |
| Emotion | approval |
| Coded at | 2026-04-27T06:24:53.388235 |
Raw LLM Response
```json
[{"id":"ytc_UgwoPeMsVfJVfD235KZ4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"},
{"id":"ytc_UgzNgiTXKTnsd9KAIXl4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"none","emotion":"fear"},
{"id":"ytc_UgwILvn9vSF1VnlIrMl4AaABAg","responsibility":"distributed","reasoning":"consequentialist","policy":"none","emotion":"indifference"},
{"id":"ytc_UgwUF9z1CW4NnDWTr5J4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"liability","emotion":"fear"},
{"id":"ytc_UgzUI84MwRB5WxUznB94AaABAg","responsibility":"company","reasoning":"virtue","policy":"none","emotion":"mixed"},
{"id":"ytc_Ugz6rAfqZWNYf9BjA7h4AaABAg","responsibility":"company","reasoning":"unclear","policy":"none","emotion":"indifference"},
{"id":"ytc_UgwT_4ubTRVoQOykPBx4AaABAg","responsibility":"distributed","reasoning":"mixed","policy":"none","emotion":"resignation"},
{"id":"ytc_UgxSY4WVINPbp-ZQjEF4AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"outrage"},
{"id":"ytc_UgyC2u9XjF6TYZxJNk14AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"ban","emotion":"resignation"},
{"id":"ytc_Ugxr1DWydj_B4gaXQmJ4AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"fear"}]
```
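Looking up a coding by comment ID, as the inspector does, amounts to parsing the raw response and keying each record on its `id` field. A minimal sketch, assuming the raw response is valid JSON in the shape shown above (the helper name `index_by_comment_id` is an assumption, not part of the tool):

```python
import json

# Abbreviated raw response in the same shape as the one shown above
# (two real records from it, reused here as sample data).
raw_response = """[
  {"id": "ytc_UgwoPeMsVfJVfD235KZ4AaABAg", "responsibility": "none",
   "reasoning": "consequentialist", "policy": "none", "emotion": "approval"},
  {"id": "ytc_UgzNgiTXKTnsd9KAIXl4AaABAg", "responsibility": "ai_itself",
   "reasoning": "mixed", "policy": "none", "emotion": "fear"}
]"""

def index_by_comment_id(raw: str) -> dict:
    """Parse a raw LLM coding response and key each record by its comment ID."""
    return {record["id"]: record for record in json.loads(raw)}

codes = index_by_comment_id(raw_response)
print(codes["ytc_UgwoPeMsVfJVfD235KZ4AaABAg"]["emotion"])  # → approval
```

In practice the lookup would be run over every stored response, so ID collisions across batches would need a policy (e.g. keep the latest coding); the dict comprehension here silently keeps the last record per ID.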