Raw LLM — Corpus Dashboard



Comment

Fine then I'll talk.

1: The title has nothing to do with the paper. This is not a quote, doesn't take into account what the paper says about the various improvements of the model, etc.

2: The quote used isn't in full. To quote:

> Figure 4: Code generation. (a) Overall performance drifts. For GPT-4, the percentage of generations that are directly executable dropped from 52.0% in March to 10.0% in June. The drop was also large for GPT-3.5 (from 22.0% to 2.0%). GPT-4’s verbosity, measured by number of characters in the generations, also increased by 20%. (b) An example query and the corresponding responses. In March, both GPT-4 and GPT-3.5 followed the user instruction (“the code only”) and thus produced directly executable generation. **In June, however, they added extra triple quotes before and after the code snippet, rendering the code not executable.**

Which means that by the paper's own admission, the problem is not the code given but that their test doesn't work.

For the prime numbers, the problem was fixed in march notably because their prompt didn't work which means they didn't manage to test what they were trying to do. Quote:

> Figure 2: Solving math problems. (a): monitored accuracy, verbosity (unit: character), and answer overlap of GPT-4 and GPT-3.5 between March and June 2023. Overall, a large performance drifts existed for both services. (b) an example query and corresponding responses over time. GPT-4 followed the chain-of-thought instruction to obtain the right answer in March, but ignored it in June with the wrong answer. GPT-3.5 always followed the chain-of-thought, but it insisted on generating a wrong answer ([No]) first in March. This issue was largely fixed in June.
>
> [...] This interesting phenomenon indicates that the same prompting approach, even these widely adopted such as chain-of-thought, could lead to substantially different performance due to LLM drifts.

The "sensitive question

reddit AI Harm Incident 1689753378.0 ♥ 106

Coding Result

Dimension	Value
Responsibility	none
Reasoning	deontological
Policy	none
Emotion	outrage
Coded at	2026-04-25T08:33:43.502452

Raw LLM Response

[{"id":"rdc_jsm5wzy","responsibility":"company","reasoning":"deontological","policy":"none","emotion":"outrage"},
{"id":"rdc_jsl8ta1","responsibility":"company","reasoning":"consequentialist","policy":"none","emotion":"outrage"},
{"id":"rdc_jsl0p6a","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"indifference"},
{"id":"rdc_jskabl2","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"indifference"},
{"id":"rdc_jskaeh0","responsibility":"none","reasoning":"deontological","policy":"none","emotion":"outrage"}]
