Methodology

Data collection pipeline, value-alignment coding schema, and corpus quality notes for the LinkedIn study.

Corpus Summary
StageCount
Raw comments collected4.3K
After cleaning & de-noising4.2K
Relevant to AI value alignment607
LLM coded607
AI Safety & Risk
236
AI Policy & Regulation
131
AI Research & Models
76
Workplace & Jobs
55
General AI Discourse
52
AI Products & Tools
45
AI Ethics & Trust
12
Collection Pipeline
1
Post selection
AI-related LinkedIn posts sampled across topic categories (AI ethics, product launches, workplace AI, AI policy, AI hype/critique)
2
Compliant export
Posts + comment threads captured via manual export / research-programme access — no ToS-violating scraping
3
Consolidation
CSV → SQLite via pipeline.py load-posts / load-comments; deduplication by content hash
4
Cleaning (pipeline.py clean)
Unicode normalisation, strip URLs and emoji-only bodies, drop pure congratulation/promo comments, word-count tagging
5
Relevance filter (pipeline.py filter)
Keyword + regex rule: AI term ∩ value/alignment term, min 20 chars
6
LLM coding (linkedin_value_coding.py)
Llama-3.3-70B (IONOS API), T=0 — VA-4S chain-of-thought, 4 value-alignment dimensions per comment
Data Access & Ethics Note

LinkedIn provides no public API for posts or comments, and automated scraping violates the platform's Terms of Service. The corpus is therefore assembled from compliant sources only: manual capture of public posts, participant-donated exports, and (where granted) LinkedIn research-programme access. Author names are retained solely for deduplication and are excluded from all published analyses; comments are analysed at the aggregate level.

VA-4S Coding Schema

Each comment is coded on four value-alignment dimensions using a zero-temperature Llama-3.3-70B call with a 4-step chain-of-thought (VA-4S). One comment per call, with the parent post supplied as context. Values are validated against the controlled vocabulary below; the emotion vocabulary is identical to the folk corpus to allow cross-study comparison.

DimensionValid Values
value_primary
+ value_secondary
safety · transparency · fairness · privacy · accountability · honesty · human_autonomy · beneficence · dignity · sustainability · economic_equity · none · unclear
target individual_users · workers · organisations · society · humanity · vulnerable_groups · none · unclear
stance demanding · optimistic · skeptical · critical · mixed · unclear
emotion approval · fear · outrage · indifference · resignation · mixed · unclear
Relevance Filter Criteria

Comments pass pipeline.py filter if they contain at least one term from each of the AI and value/alignment keyword sets.

RuleDescription
AI keywords ai, artificial intelligence, machine learning, llm, genai, chatgpt, claude, gemini, copilot, automation, agentic…
Value/alignment keywords align, value, ethic, trust, responsib-, transparen-, fair, bias, privacy, safe, honest, autonom-, dignity, sustainab-, job, equit-, guardrail…
Min length ≥ 20 characters after strip
Excluded Congratulation/engagement-bait comments ("Great post!", "CFBR"), URLs-only, emoji-only bodies
LLM System Prompt (excerpt)
You are a philosophical discourse analyst coding LinkedIn comments about AI for a dissertation study on value alignment — which human values people want AI systems to have, embody, or be aligned with. STEP 1 — VALUE IDENTIFICATION: which human value does the speaker want AI to have or respect? STEP 2 — ALIGNMENT TARGET: aligned with whom, or for whose benefit? STEP 3 — ALIGNMENT STANCE: demanding action, optimistic, skeptical, or critical of current systems? STEP 4 — EMOTIONAL REGISTER. RULES: - Use ONLY the listed vocabulary values. - Code what the COMMENT expresses, not the post. - Return ONLY the JSON object, no markdown fences.