Raw LLM Responses
Inspect the exact model output for any coded comment.
Comment
5:13 <Krystal> "And he would keep asking it [for a diagnosis based on the exact same data, and the evaluations would change] You get a B [..] You get a D [..] You get an F"
Yes: this is a core "design feature" of LLM / GPT-based chat tools.
There are two inherent problems:
1) if you are asking for summary statistics of raw data - e.g. trend analysis, first and second derivative, etc - you might achieve good-enough results. However, as soon as you step into unbounded "future probabilities" prediction rather than historic analysis, your risk of a poor response increases substantially.
One way to reduce such problems might be to provide a verified set of known data profiles that result in a solid, expert-verified diagnosis that would act as known anchors or markers for your own analysis to be considered against.
2) all that said, you're essentially fighting against foundational design principles. If you attempt to eradicate response variation completely (exact repetition in responses based on a specific prompt and associated inputs), they essentially don't work (they don't produce responses humans find appealing).
Although you can tune "Temperature" - which increases or decreases the variability, randomness, or "creativity" of responses, you can only really adjust this so far before the results at either end of the scale are poor.
This parameter acts as a "weighting" mechanism on the probability distribution of the next predicted token (word or word part). Again, you can tune this a little bit.
youtube
2026-02-10T21:5…
♥ 1
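The temperature mechanism described in the comment above can be illustrated with a minimal sketch. It assumes the common formulation in which raw token scores (logits) are divided by the temperature before a softmax is applied. The function name `softmax_with_temperature` and the example logits are hypothetical and are not taken from any particular model or library.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw token scores into probabilities, rescaled by temperature.

    Assumption: temperature divides the logits before the softmax.
    Low values sharpen the distribution; high values flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all probability mass lands on the top-scoring token, so repeated runs tend to agree. At a high temperature the candidates become close to equally likely, which is one way to picture the response variation the comment describes.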
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-27T06:26:44.938723 |
Raw LLM Response
[
{"id":"ytc_Ugy4ZsFeJBrwcIx7kiZ4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_Ugzraf-Jcx6fmEZc1Ad4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgxRQEijIaqAPMS-Dct4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"},
{"id":"ytc_UgyZO_QLVDGzHcPAw914AaABAg","responsibility":"distributed","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"ytc_Ugw0VgCOin3q1KDQRG94AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"unclear","emotion":"resignation"},
{"id":"ytc_Ugzg-7JyTAzWpeuOMNF4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgyXF3aM3c6sKh79EDx4AaABAg","responsibility":"government","reasoning":"deontological","policy":"regulate","emotion":"outrage"},
{"id":"ytc_UgxB35mhJyV5uGQxqV94AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"},
{"id":"ytc_UgxmlsQAeRWPEpbI65V4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"ytc_UgxnuHhJTIp0ZhUAhHp4AaABAg","responsibility":"ai_itself","reasoning":"deontological","policy":"ban","emotion":"outrage"}
]
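A raw response in this shape can be inspected per comment by parsing the JSON array and indexing it by `id`. The sketch below assumes every record carries the four dimensions shown in the coding table. The helper name `index_by_id` is hypothetical, and only two of the records above are reproduced to keep the example short.

```python
import json

# Two records copied from the raw response above.
raw_response = """[
{"id":"ytc_Ugy4ZsFeJBrwcIx7kiZ4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgxRQEijIaqAPMS-Dct4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"}
]"""

# Assumption: these four keys are the complete coding scheme.
DIMENSIONS = ("responsibility", "reasoning", "policy", "emotion")


def index_by_id(raw):
    """Parse the model output and return {comment_id: {dimension: value}}."""
    index = {}
    for record in json.loads(raw):
        missing = [d for d in DIMENSIONS if d not in record]
        if "id" not in record or missing:
            raise ValueError(f"malformed record: {record!r}")
        index[record["id"]] = {d: record[d] for d in DIMENSIONS}
    return index


coded = index_by_id(raw_response)
print(coded["ytc_UgxRQEijIaqAPMS-Dct4AaABAg"])
```

Raising on a malformed record, rather than skipping it, is a deliberate choice in this sketch: a missing dimension in model output is easier to catch at parse time than after it has been written to a coding table.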