Raw LLM Responses
Inspect the exact model output for any coded comment.
Comment
Part 2
One of the other rare studies of bias in machine scoring, published in 2012, was conducted at the New Jersey Institute of Technology, which was researching which tests best predicted whether first-year students should be placed in remedial, basic, or honors writing classes.
Norbert Elliot, the editor of the Journal of Writing Analytics who previously served on the GRE’s technical advisory committee, was a NJIT professor at the time, and led the study. It found that ACCUPLACER, a machine-scored test owned by the College Board, failed to reliably predict female, Asian, Hispanic, and African American students’ eventual writing grades. NJIT determined it couldn’t legally defend its use of the test if it were challenged under Title VI or VII of the federal Civil Rights Act.
The ACCUPLACER test has since been updated, but lots of big questions remain about machine scoring in general, especially when no humans are in the loop.
“The BABEL Generator proved you can have complete incoherence, meaning one sentence had nothing to do with another,” and still receive a high mark from the algorithms.
Several years ago, Les Perelman, the former director of writing across the curriculum at MIT, and a group of students developed the Basic Automatic B.S. Essay Language (BABEL) Generator, a program that patched together strings of sophisticated words and sentences into meaningless gibberish essays. The nonsense essays consistently received high, sometimes perfect, scores when run through several different scoring engines.
Motherboard replicated the experiment. We submitted two BABEL-generated essays—one in the “issue” category, the other in the “argument” category—to the GRE’s online ScoreItNow! practice tool, which uses E-rater. Both received scores of 4 out of 6, indicating the essays displayed “competent examination of the argument and convey(ed) meaning with acceptable clarity.”
Here’s the first sentence from the essay addressing technology’s impact on humans’ ability
reddit
AI Harm Incident
1566314357.0
♥ 4
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-25T08:33:43.502452 |
Raw LLM Response
[
{"id":"rdc_exhshyw","responsibility":"developer","reasoning":"consequentialist","policy":"liability","emotion":"outrage"},
{"id":"rdc_exhxyom","responsibility":"government","reasoning":"deontological","policy":"regulate","emotion":"outrage"},
{"id":"rdc_exhuddc","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_exhued5","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_dtxlv98","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"}
]
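
The raw response above is a JSON array with one object per coded comment, so a single record can be pulled out by its `id`. The following is a minimal sketch of that lookup, assuming the response is available as a string; the names `RAW_RESPONSE` and `index_by_id` are hypothetical and not part of the tool itself.

```python
import json

# The raw LLM response, copied verbatim from the batch shown above.
RAW_RESPONSE = """
[
{"id":"rdc_exhshyw","responsibility":"developer","reasoning":"consequentialist","policy":"liability","emotion":"outrage"},
{"id":"rdc_exhxyom","responsibility":"government","reasoning":"deontological","policy":"regulate","emotion":"outrage"},
{"id":"rdc_exhuddc","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_exhued5","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"rdc_dtxlv98","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"}
]
"""


def index_by_id(raw: str) -> dict[str, dict[str, str]]:
    """Parse a raw response and map each comment ID to its coded dimensions."""
    records = json.loads(raw)
    return {
        rec["id"]: {key: value for key, value in rec.items() if key != "id"}
        for rec in records
    }


coded = index_by_id(RAW_RESPONSE)

# Print the coded dimensions for one comment ID from the batch.
for dimension, value in coded["rdc_exhshyw"].items():
    print(f"{dimension}: {value}")
```

Indexing by ID rather than by position keeps the lookup correct even if the model returns the records in a different order than the comments were submitted.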