Methodology — Corpus Dashboard

Corpus Summary

Stage	Count
Raw comments collected	4.3K
After cleaning & de-noising	4.2K
Relevant to AI value alignment	607
LLM coded	607

AI Safety & Risk

236

AI Policy & Regulation

131

AI Research & Models

Workplace & Jobs

General AI Discourse

AI Products & Tools

AI Ethics & Trust

Collection Pipeline

Post selection

AI-related LinkedIn posts sampled across topic categories (AI ethics, product launches, workplace AI, AI policy, AI hype/critique)

Compliant export

Posts + comment threads captured via manual export / research-programme access — no ToS-violating scraping

Consolidation

CSV → SQLite via pipeline.py load-posts / load-comments; deduplication by content hash

Cleaning (pipeline.py clean)

Unicode normalisation, strip URLs and emoji-only bodies, drop pure congratulation/promo comments, word-count tagging

Relevance filter (pipeline.py filter)

Keyword + regex rule: AI term ∩ value/alignment term, min 20 chars

LLM coding (linkedin_value_coding.py)

Llama-3.3-70B (IONOS API), T=0 — VA-4S chain-of-thought, 4 value-alignment dimensions per comment

Data Access & Ethics Note

LinkedIn provides no public API for posts or comments, and automated scraping violates the platform's Terms of Service. The corpus is therefore assembled from compliant sources only: manual capture of public posts, participant-donated exports, and (where granted) LinkedIn research-programme access. Author names are retained solely for deduplication and are excluded from all published analyses; comments are analysed at the aggregate level.

VA-4S Coding Schema

Each comment is coded on four value-alignment dimensions using a zero-temperature Llama-3.3-70B call with a 4-step chain-of-thought (VA-4S). One comment per call, with the parent post supplied as context. Values are validated against the controlled vocabulary below; the emotion vocabulary is identical to the folk corpus to allow cross-study comparison.

Dimension	Valid Values
value_primary + value_secondary	safety · transparency · fairness · privacy · accountability · honesty · human_autonomy · beneficence · dignity · sustainability · economic_equity · none · unclear
target	individual_users · workers · organisations · society · humanity · vulnerable_groups · none · unclear
stance	demanding · optimistic · skeptical · critical · mixed · unclear
emotion	approval · fear · outrage · indifference · resignation · mixed · unclear

Relevance Filter Criteria

Comments pass pipeline.py filter if they contain at least one term from each of the AI and value/alignment keyword sets.

Rule	Description
AI keywords	ai, artificial intelligence, machine learning, llm, genai, chatgpt, claude, gemini, copilot, automation, agentic…
Value/alignment keywords	align, value, ethic, trust, responsib-, transparen-, fair, bias, privacy, safe, honest, autonom-, dignity, sustainab-, job, equit-, guardrail…
Min length	≥ 20 characters after strip
Excluded	Congratulation/engagement-bait comments ("Great post!", "CFBR"), URLs-only, emoji-only bodies

LLM System Prompt (excerpt)

You are a philosophical discourse analyst coding LinkedIn
comments about AI for a dissertation study on value alignment —
which human values people want AI systems to have, embody, or
be aligned with.

STEP 1 — VALUE IDENTIFICATION: which human value does the
         speaker want AI to have or respect?
STEP 2 — ALIGNMENT TARGET: aligned with whom, or for whose
         benefit?
STEP 3 — ALIGNMENT STANCE: demanding action, optimistic,
         skeptical, or critical of current systems?
STEP 4 — EMOTIONAL REGISTER.

RULES:
- Use ONLY the listed vocabulary values.
- Code what the COMMENT expresses, not the post.
- Return ONLY the JSON object, no markdown fences.