Raw LLM — Corpus Dashboard

Comment

Summarized Article: Here are the key points from the paper "How Is ChatGPT's Behavior Changing over Time?":
- The paper evaluates how the behavior of GPT-3.5 and GPT-4 changed between March 2023 and June 2023 versions on 4 tasks: math problems, sensitive questions, code generation, visual reasoning.
- For math problems, GPT-4's accuracy dropped massively from 97.6% to 2.4% while GPT-3.5's improved from 7.4% to 86.8%. GPT-4 became much less verbose.
- For sensitive questions, GPT-4 answered fewer (21% to 5%) while GPT-3.5 answered more (2% to 8%). Both became more terse in refusing to answer. GPT-4 improved in defending against "jailbreaking" attacks but GPT-3.5 did not.
- For code generation, the percentage of directly executable code dropped for both models. Extra non-code text was often added in June versions, making the code not runnable.
- For visual reasoning, both models showed marginal 2% accuracy improvements. Over 90% of responses were identical between March and June.
- The major conclusion is that the behavior of the "same" GPT-3.5 and GPT-4 models can change substantially within a few months. This highlights the need for continuous monitoring and assessment of LLMs in production use.

reddit AI Harm Incident 1689798552.0 ♥ 2
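
The number in the metadata line above looks like a Unix epoch timestamp in seconds. That is an assumption, since the dashboard does not label the field. Under that reading, a minimal sketch of the conversion to a readable UTC time:

```python
from datetime import datetime, timezone

# Assumption: 1689798552.0 is seconds since the Unix epoch (the page does not say so).
posted_at = datetime.fromtimestamp(1689798552.0, tz=timezone.utc)

print(posted_at.isoformat())  # 2023-07-19T20:29:12+00:00
```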

Coding Result

Dimension	Value
Responsibility	none
Reasoning	unclear
Policy	none
Emotion	indifference
Coded at	2026-04-25T08:33:43.502452

Raw LLM Response

[
{"id":"rdc_jskk6er","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"outrage"},
{"id":"rdc_jsli3y1","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"resignation"},
{"id":"rdc_jslohgf","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"fear"},
{"id":"rdc_jsmf36x","responsibility":"company","reasoning":"deontological","policy":"none","emotion":"outrage"},
{"id":"rdc_jsmzofs","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"indifference"}
]
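
The raw response above is a JSON array covering a batch of five comments, while the Coding Result table shows a single comment. The page does not state which ID belongs to the displayed comment, so the sketch below only checks which batch entries carry the same four values as the table. The names `parse_codings` and `coded` are hypothetical, introduced for illustration.

```python
import json

# The raw batch response, copied from the page.
raw_response = """[
{"id":"rdc_jskk6er","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"outrage"},
{"id":"rdc_jsli3y1","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"resignation"},
{"id":"rdc_jslohgf","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"fear"},
{"id":"rdc_jsmf36x","responsibility":"company","reasoning":"deontological","policy":"none","emotion":"outrage"},
{"id":"rdc_jsmzofs","responsibility":"none","reasoning":"unclear","policy":"none","emotion":"indifference"}
]"""


def parse_codings(raw: str) -> dict[str, dict[str, str]]:
    """Index a batch response by comment ID (hypothetical helper)."""
    return {
        row["id"]: {key: value for key, value in row.items() if key != "id"}
        for row in json.loads(raw)
    }


codings = parse_codings(raw_response)

# Values from the Coding Result table (the "Coded at" field is not part of the response).
coded = {
    "responsibility": "none",
    "reasoning": "unclear",
    "policy": "none",
    "emotion": "indifference",
}

# IDs whose four dimensions equal the table's values.
matches = [cid for cid, dims in codings.items() if dims == coded]
print(matches)  # ['rdc_jsmzofs']
```

Only one entry matches all four dimensions, which suggests, but does not prove, that it is the row behind the table.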
