Raw LLM Responses
Inspect the exact model output for any coded comment.
Comment
A pattern is developing, with many posts describing degradation of outputs and alignment issues with prompts relative to the LLM and index. A smaller but still vocal group of ChatGPT users laments quality issues with prose, reasoning, and generally more semantics- and syntax-focused prompts. Yet I have read very few, if any, posts that compare the outputs from before and after the rollback. That would be most helpful.
Rather than a pure self-inflicted injury, there are other logical causes. First, OpenAI prioritized saving into memory any specific call-outs by users who wanted outputs, prompts, or entire chats to be available for recall or context. There is also the option to open all chats for access by an OpenAI model, and this influenced the experience. Second, there are not enough GPUs, and those available are throttled and allocated on a prioritized basis, with enterprise customers in the public and private sector at the top of the list. And third, which is my personal opinion, OpenAI realized that other for-profit companies across the globe focus on reasoning and inference, and that the optimal approach is RNN and neurosymbolic reasoning. This approach may explain the change in infrastructure to provide what they can now while they build for the future.
Until there are comparisons on a timeline of the same prompt, the same model, and the same settings, with different outputs, the experiences are anecdotal even if true, and may not be defining the problem accurately. So any "fix" is likely not solving for the root cause. If an event can't be measured, it's conjecture. The benchmarks for testing LLMs for hallucination propensity are there, but testing for hallucinations at the application or prompt layer is not as mature. When that capability is ubiquitous, model performance for a specific domain will be instructive in defining the problem, exploring solutions, and improving the user experience.
reddit
AI Harm Incident
1747016690.0
♥ 3
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | unclear |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-25T08:33:43.502452 |
Raw LLM Response
[
{"id":"rdc_mrv267f","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"rdc_mrut4mz","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"mixed"},
{"id":"rdc_mru7bs2","responsibility":"company","reasoning":"consequentialist","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrum80h","responsibility":"unclear","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrvvwd5","responsibility":"company","reasoning":"deontological","policy":"unclear","emotion":"outrage"}
]
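The raw response above is a batch: one JSON array with a record per comment, while the Coding Result table shows a single comment. A minimal Python sketch of how such a batch could be indexed by ID and matched against the table is given below. The function name `index_by_id` and the constant names are hypothetical, and the page does not state the displayed comment's ID, so the match is inferred from the values only.

```python
import json

# Raw batch response, copied verbatim from the page.
RAW_RESPONSE = """[
{"id":"rdc_mrv267f","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"rdc_mrut4mz","responsibility":"company","reasoning":"unclear","policy":"unclear","emotion":"mixed"},
{"id":"rdc_mru7bs2","responsibility":"company","reasoning":"consequentialist","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrum80h","responsibility":"unclear","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_mrvvwd5","responsibility":"company","reasoning":"deontological","policy":"unclear","emotion":"outrage"}
]"""

# The four coded dimensions shown in the Coding Result table.
DIMENSIONS = ("responsibility", "reasoning", "policy", "emotion")


def index_by_id(raw: str) -> dict[str, dict[str, str]]:
    """Parse a batch response and return {comment_id: {dimension: value}}.

    Hypothetical helper: raises ValueError if a record lacks an id
    or any of the expected dimensions.
    """
    indexed = {}
    for record in json.loads(raw):
        missing = [d for d in DIMENSIONS if d not in record]
        if "id" not in record or missing:
            raise ValueError(f"malformed record: {record!r}")
        indexed[record["id"]] = {d: record[d] for d in DIMENSIONS}
    return indexed


# Values as displayed in the Coding Result table.
TABLE_VALUES = {
    "responsibility": "unclear",
    "reasoning": "unclear",
    "policy": "unclear",
    "emotion": "indifference",
}

coded = index_by_id(RAW_RESPONSE)

# Assumption: the displayed comment is whichever record matches the table.
matches = [cid for cid, dims in coded.items() if dims == TABLE_VALUES]
print(matches)
```

Only one record in this batch matches the table on all four dimensions, which is what makes the inference possible here. In a batch where several comments share the same codes, a lookup by the comment's actual ID would be needed instead.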