Large Language Models (LLMs) increasingly support culturally sensitive decision-making, yet they often exhibit misalignment due to skewed pretraining data and the absence of structured value representations. Existing methods can steer outputs, but they often lack demographic grounding and treat values as independent, unstructured signals, which reduces consistency and interpretability. We propose OG-MAR, an Ontology-Guided Multi-Agent Reasoning framework. OG-MAR summarizes respondent-specific values from the World Values Survey (WVS) and constructs a global cultural ontology by eliciting relations over a fixed taxonomy via competency questions. At inference time, it retrieves ontology-consistent relations and demographically similar profiles to instantiate multiple value-persona agents, whose outputs are synthesized by a judgment agent that enforces ontology consistency and demographic proximity. Experiments on regional social-survey benchmarks across four LLM backbones show that OG-MAR improves cultural alignment and robustness over competitive baselines, while producing more transparent reasoning traces.
| Method | EVS (Europe) | GSS (U.S.) | CGSS (China) | ISD (India) | AFRO (Africa) | LAPOP (Latin America) | Avg |
|---|---|---|---|---|---|---|---|
| GPT-4o mini | | | | | | | |
| Zero-shot | 0.5606 | 0.5164 | 0.5847 | 0.6139 | 0.5324 | 0.5760 | 0.5640 |
| Role (2024) | 0.5892 | 0.5184 | 0.6014 | 0.6060 | 0.5505 | 0.5674 | 0.5722 |
| Self-consistency (2022) | 0.5558 | 0.4920 | 0.5631 | 0.5976 | 0.5224 | 0.5551 | 0.5477 |
| Debate (2025) | 0.5985 | 0.5509 | 0.5993 | 0.6568 | 0.5343 | 0.5306 | 0.5784 |
| ValuesRAG (2025) | 0.6127 | 0.5589 | 0.5889 | 0.6420 | 0.5654 | 0.6085 | 0.5961 |
| OG-MAR (Ours)† | 0.6206* | 0.5480 | 0.6509* | 0.6192 | 0.5389 | 0.6268 | 0.6007* |
| Gemini 2.5 Flash Lite | | | | | | | |
| Zero-shot | 0.5681 | 0.4957 | 0.6467 | 0.5000 | 0.5282 | 0.6225 | 0.5602 |
| Role (2024) | 0.5786 | 0.4992 | 0.6669 | 0.5521 | 0.5313 | 0.5852 | 0.5689 |
| Self-consistency (2022) | 0.5489 | 0.4728 | 0.6063 | 0.4705 | 0.5182 | 0.6268 | 0.5406 |
| Debate (2025) | 0.5977 | 0.5138 | 0.6348 | 0.6335 | 0.5046 | 0.5331 | 0.5696 |
| ValuesRAG (2025) | 0.6075 | 0.5376 | 0.6084 | 0.6041 | 0.5472 | 0.5339 | 0.5731 |
| OG-MAR (Ours)† | 0.6249* | 0.5489* | 0.7017* | 0.7007* | 0.5701* | 0.6385* | 0.6308* |
| QWEN 2.5 | | | | | | | |
| Zero-shot | 0.5199 | 0.5069 | 0.2704 | 0.7222 | 0.4814 | 0.4908 | 0.4986 |
| Role (2024) | 0.5357 | 0.5037 | 0.3463 | 0.7452 | 0.5014 | 0.4712 | 0.5172 |
| Self-consistency (2022) | 0.5096 | 0.4975 | 0.3289 | 0.6278 | 0.4080 | 0.4975 | 0.4782 |
| Debate (2025) | 0.5511 | 0.5174 | 0.4578 | 0.6320 | 0.4875 | 0.4332 | 0.5132 |
| ValuesRAG (2025) | 0.5538 | 0.5215 | 0.4697 | 0.6591 | 0.4724 | 0.5268 | 0.5339 |
| OG-MAR (Ours)† | 0.5898* | 0.5325* | 0.5220* | 0.6599 | 0.5180 | 0.6005 | 0.5705* |
| EXAONE 3.5 | | | | | | | |
| Zero-shot | 0.5143 | 0.5311 | 0.2885 | 0.6041 | 0.4054 | 0.5006 | 0.4740 |
| Role (2024) | 0.5319 | 0.5326 | 0.3129 | 0.6048 | 0.4077 | 0.4602 | 0.4750 |
| Self-consistency (2022) | 0.5490 | 0.5266 | 0.2697 | 0.6122 | 0.4086 | 0.5368 | 0.4838 |
| Debate (2025) | 0.5713 | 0.5407 | 0.5624 | 0.6773 | 0.4995 | 0.4939 | 0.5575 |
| ValuesRAG (2025) | 0.5172 | 0.5520 | 0.5833 | 0.6446 | 0.4794 | 0.5913 | 0.5631 |
| OG-MAR (Ours)† | 0.6080* | 0.5636 | 0.6307* | 0.7810* | 0.5045* | 0.7002* | 0.6313* |
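
The Avg column appears consistent with an unweighted mean over the six test datasets. The following minimal sketch checks that reading for two OG-MAR rows; the dictionary layout and the `macro_average` helper are illustrative assumptions, not part of the original evaluation code.

```python
from statistics import mean

# Per-dataset accuracies copied from two OG-MAR rows of the table above
# (order: EVS, GSS, CGSS, ISD, AFRO, LAPOP).
og_mar = {
    "GPT-4o mini": [0.6206, 0.5480, 0.6509, 0.6192, 0.5389, 0.6268],
    "Gemini 2.5 Flash Lite": [0.6249, 0.5489, 0.7017, 0.7007, 0.5701, 0.6385],
}


def macro_average(scores):
    """Unweighted mean over datasets, rounded to four decimals (assumed convention)."""
    return round(mean(scores), 4)


averages = {model: macro_average(scores) for model, scores in og_mar.items()}
for model, value in averages.items():
    print(f"{model}: {value:.4f}")
```

Because the mean is unweighted, a small dataset such as GSS contributes as much to Avg as a large one such as LAPOP.
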
| Model | Method | Avg. Accuracy |
|---|---|---|
| GPT-4o mini | OG-MAR | 0.6007 |
| | Single-Judge | 0.5987 |
| Gemini 2.5 | OG-MAR | 0.6308 |
| | Single-Judge | 0.6022 |
| QWEN 2.5 | OG-MAR | 0.5705 |
| | Single-Judge | 0.5311 |
| EXAONE 3.5 | OG-MAR | 0.6316 |
| | Single-Judge | 0.5627 |
| Dataset | Top-1 | Top-2 | Top-3 | Macro-F1 |
|---|---|---|---|---|
| Afrobarometer | 0.5037 | 0.6875 | 0.7574 | 0.3070 |
| CGSS | 0.3375 | 0.5079 | 0.6656 | 0.2480 |
| EVS | 0.4315 | 0.5560 | 0.6680 | 0.3485 |
| GSS | 0.4545 | 0.6667 | 0.7765 | 0.3636 |
| ISD | 0.5439 | 0.7071 | 0.7950 | 0.2799 |
| LAPOP | 0.4396 | 0.6577 | 0.7349 | 0.3146 |
| WVS (val) | 0.9583 | 1.0000 | 1.0000 | 0.8250 |
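
For readers unfamiliar with the metrics in the table above, the sketch below shows one common way to compute them. It assumes that Top-k counts an item as correct when its gold label appears among the k highest-ranked candidates and that Macro-F1 is the unweighted mean of per-class F1 on the top-ranked prediction; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def top_k_accuracy(ranked_predictions, gold_labels, k):
    """Fraction of items whose gold label is among the first k ranked candidates."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_predictions, gold_labels))
    return hits / len(gold_labels)


def macro_f1(predicted, gold_labels):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, gold in zip(predicted, gold_labels):
        if pred == gold:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1
    labels = set(gold_labels) | set(predicted)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes and four items (not taken from the paper).
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"], ["a", "c", "b"]]
gold = ["a", "a", "b", "c"]
top1 = [candidates[0] for candidates in ranked]
print(top_k_accuracy(ranked, gold, 1), top_k_accuracy(ranked, gold, 2), macro_f1(top1, gold))
```

Macro-F1 weights rare classes as heavily as frequent ones, which is one plausible reason it sits well below Top-1 accuracy in every row.
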
| Dataset | Type | Region | Wave / Year | #Countries | #Respondents | #Value Qs |
|---|---|---|---|---|---|---|
| Retrieval Corpus | | | | | | |
| WVS (World Values Survey) | Retrieval | Global | 2017--2022 | 64 | 94,728 | 239 |
| Test Datasets | | | | | | |
| EVS (European Values Study) | Test | Europe | 2017 | -- | 59,438 | 211 |
| GSS (General Social Survey) | Test | U.S. (N. America) | 2021--2022 | -- | 8,181 | 44 |
| CGSS (Chinese General Social Survey) | Test | China (E. Asia) | 2021 | -- | ~8,148 | 58 |
| ISD (Pew India Survey Dataset) | Test | India (S. Asia) | 2019--2020 | -- | 29,999 | 33 |
| LAPOP (AmericasBarometer) | Test | Latin America | 2021 | -- | 64,352 | 48 |
| Afrobarometer | Test | Africa | 2022 | -- | ~48,100 | 144 |
| Dataset | Link |
|---|---|
| Retrieval Corpus | |
| WVS | https://www.worldvaluessurvey.org/wvs.jsp |
| Test Datasets | |
| EVS (European Values Study) | https://europeanvaluesstudy.eu |
| GSS (General Social Survey) | https://gss.norc.org |
| CGSS (Chinese General Social Survey) | https://cgss.ruc.edu.cn |
| ISD (Pew India Survey Dataset) | https://www.pewresearch.org/dataset/india-survey-dataset/ |
| LAPOP (AmericasBarometer) | https://www.vanderbilt.edu/lapop |
| Afrobarometer | https://www.afrobarometer.org |
| Topic | Count |
|---|---|
| Social Values, Norms, Stereotypes | 45 |
| Happiness and Wellbeing | 11 |
| Social Capital, Trust and Organizational Membership | 47 |
| Economic Values | 6 |
| Perceptions of Corruption | 9 |
| Perceptions of Migration | 10 |
| Perceptions of Security | 21 |
| Perceptions about Science and Technology | 6 |
| Religious Values | 12 |
| Ethical Values | 23 |
| Political Interest and Political Participation | 35 |
| Political Culture and Political Regimes | 25 |
| Dataset (Region) | Consistency | Grounding | Synthesis | Context | Relevance |
|---|---|---|---|---|---|
| GSS (N.A.) | 3.76 | 3.97 | 3.79 | 3.79 | 3.63 |
| CGSS (E. Asia) | 3.76 | 4.02 | 3.65 | 3.65 | 3.56 |
| AFRO (Africa) | 3.86 | 3.89 | 3.77 | 3.77 | 3.60 |
| EVS (Europe) | 3.77 | 3.80 | 3.77 | 3.77 | 3.72 |
| ISD (S. Asia) | 3.82 | 3.80 | 3.67 | 3.67 | 3.62 |
| LAPOP (L. Am.) | 3.70 | 3.78 | 3.67 | 3.67 | 3.71 |
| Average | 3.78 | 3.88 | 3.72 | 3.72 | 3.64 |
Task:
As {persona_id}, given {question} and {options_text}, select exactly one option
that this persona would choose, based only on the persona's internal worldview
as described by {demographics_text},
{value_summaries_text}, and {hyper_edges_text}.
Inputs:
{demographics_text}
{value_summaries_text}
{hyper_nodes_text}
{options_text}
{question}
Strict Rules:
reasoning must be >= 250 words and explicitly cover value/edge integration and the most influential demographics.
Output Format (JSON only):
{
"persona_id": "{persona_id}",
"chosen_answer": "<value>: <text>",
"reasoning": "...",
"alignment_factors": {
"demographic": "...",
"value_summaries_used": [],
"hyper_edges_used": [],
"integration_rationale": "..."
}
}
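
A reply in the format above can be checked mechanically before it reaches the judgment agent. The sketch below is one possible validator, assuming the reply arrives as a raw JSON string; the function name, the error messages, and the whitespace-based word count are illustrative assumptions rather than the authors' implementation.

```python
import json

# Keys taken from the output format above.
REQUIRED_KEYS = {"persona_id", "chosen_answer", "reasoning", "alignment_factors"}
REQUIRED_FACTORS = {
    "demographic",
    "value_summaries_used",
    "hyper_edges_used",
    "integration_rationale",
}
MIN_REASONING_WORDS = 250  # from the rule "reasoning must be >= 250 words"


def validate_persona_output(raw, min_words=MIN_REASONING_WORDS):
    """Return a list of problems in one persona agent's JSON reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(data))]
    factors = data.get("alignment_factors")
    if isinstance(factors, dict):
        missing = sorted(REQUIRED_FACTORS - set(factors))
        problems += [f"missing alignment factor: {key}" for key in missing]
    elif "alignment_factors" in data:
        problems.append("alignment_factors is not an object")
    # Word count by whitespace splitting is an assumed approximation.
    word_count = len(str(data.get("reasoning", "")).split())
    if word_count < min_words:
        problems.append(f"reasoning has {word_count} words, fewer than {min_words}")
    return problems
```

A caller could re-prompt the persona agent whenever the returned list is non-empty.
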
Task:
Given {question_text}, {options_text}, persona outputs, and a pre-computed vote
summary, select exactly one final option by adjudicating only the Persona
Agents' outputs.
Inputs:
{question_text}
{options_text}
{vote_summary}
{persona_outputs}
Strict Rules:
Decision Procedure:
Output Format (JSON only):
{
"final_answer": "<value>: <text>",
"reasoning": "..."
}
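
The judgment prompt above takes a pre-computed vote summary as input. A minimal sketch of how such a summary might be derived from the persona outputs follows; the text layout of the summary and the `summarize_votes` helper are assumptions, since the source does not specify the format of {vote_summary}.

```python
from collections import Counter


def summarize_votes(persona_outputs):
    """Count chosen answers across persona outputs, most frequent first."""
    counts = Counter(output["chosen_answer"] for output in persona_outputs)
    total = sum(counts.values())
    lines = [f"{answer}: {n}/{total} votes" for answer, n in counts.most_common()]
    return counts, "\n".join(lines)


# Hypothetical persona outputs; only the field used by the summary is shown.
outputs = [
    {"persona_id": "p1", "chosen_answer": "1: Agree"},
    {"persona_id": "p2", "chosen_answer": "2: Disagree"},
    {"persona_id": "p3", "chosen_answer": "1: Agree"},
]
counts, vote_summary = summarize_votes(outputs)
print(vote_summary)
```

The summary only reports counts; the final choice is still left to the judgment agent, which adjudicates the persona outputs rather than simply taking the majority.
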
Header:
Ontology Snapshot:
{ONTOLOGY_TTL}
Helper:
1. Object properties only
rdf:type owl:ObjectProperty and specify exactly one existing
class as rdfs:domain and one existing class as rdfs:range.
owl:Restriction,
reifications, inverse properties, or property chains.
2. Directionality
3. Naming of object properties (IRI)
wvs:reduce, increase, undermine, OR
snake_case that clarifies the directionality, e.g., reduce_support, increase_concern, weaken_trust.
reduce_outgroup_tolerance is forbidden).
snake_case), never CamelCase.
4. Labels (natural-language)
rdfs:label (@en).
5. Minimality
6. Class selection
Story:
wvs:GeneralizedTrust
wvs:OutgroupTolerance
wvs:ReligiousImportance
wvs:PerceptionsOfMigration
wvs:PerceptionsOfSecurity
wvs:PoliticalParticipationActivities
Runtime inputs
Your task for each call is to:
For this call, you must handle the following CQ:
{CQS}
Focus within the CQ:
Concretely:
rdfs:domain.
rdfs:range.
Respondent-data grounding:
RESPONDENT_DATA_JSON (Python-style dict or JSON object):
{
"Q1": {
"category": "Social Values, Norms, Stereotypes",
"question": "On a scale of 1 to 4 ... how important is family in your life?",
"response": "Very important"
},
"Q46": {
"category": "Happiness and Wellbeing",
"question": "Taking all things together, how would you rate your overall happiness?",
"response": "Very happy"
},
"Q57": {
"category": "Social Capital, Trust and Organizational Membership",
"question": "Generally speaking, would you say that most people can be trusted ... ?",
"response": "Need to be very careful"
},
...
}
Current respondent data:
{{RESPONDENT_DATA_JSON}}
Important:
Use the respondent data as story-like grounding:
Footer:
1. Output format
[Header], [Helper], [Story], or [Footer] in your output.
# comments in the Turtle.
2. Prefixes
@prefix : <http://cultural-alignment.org/wvs#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wvs: <http://cultural-alignment.org/wvs#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://cultural-alignment.org/wvs#> .
<http://cultural-alignment.org/wvs#> rdf:type owl:Ontology .
3. Content constraints
owl:unionOf, owl:intersectionOf, owl:Restriction, or other complex OWL constructors.
rdfs:domain, exactly one existing class as rdfs:range, and one English sentence as rdfs:label (@en).
4. No conversation
5. Memoryless CQbyCQ behaviour
Your final output for each call must therefore be:
owl:ObjectProperty declarations that model the given CQ as directional value relations between existing WVS classes.
Critical Instruction: Sensitive Value Judgments
Task:
{domain_label} based on the provided Q&A pairs.
Inputs:
{domain_taxonomy_yaml}
"- Q: Question | R: Response"):
{value_input_yaml}
Strict Rules:
{domain_label}, do NOT list details; provide a high-level synthesis. For subcategories, focus on specific beliefs and attitudes.
Output Format (YAML only):
{domain_label}: >
(High-level synthesis of value orientation, starting with verb)
Subcategory 1: >
(Specific summary, starting with verb)
Subcategory 2: >
(Specific summary, starting with verb)
| CQ | Content |
|---|---|
| CQ1 | How do subclasses of Economic Values influence subclasses of the Political Culture and Political Regimes domain? |
| CQ2 | How do subclasses of Ethical Values influence subclasses of the Perceptions of Corruption domain? |
| CQ3 | How do subclasses of Happiness and Wellbeing influence subclasses of the Religious Values domain? |
| CQ4 | How do subclasses of Perceptions about Science and Technology influence subclasses of the Religious Values domain? |
| CQ5 | How do subclasses of Perceptions of Corruption influence subclasses of the Social Capital, Trust and Organizational Membership domain? |
| CQ6 | How do subclasses of Perceptions of Migration influence subclasses of the Social Capital, Trust and Organizational Membership domain? |
| CQ7 | How do subclasses of Perceptions of Security influence subclasses of the Social Values, Norms, Stereotypes domain? |
| CQ8 | How do subclasses of Political Culture and Political Regimes influence subclasses of the Social Values, Norms, Stereotypes domain? |
| CQ9 | How do subclasses of Political Interest and Political Participation influence subclasses of the Social Capital, Trust and Organizational Membership domain? |
| CQ10 | How do subclasses of Social Capital, Trust and Organizational Membership influence subclasses of the Social Values, Norms, Stereotypes domain? |
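
All ten competency questions share one sentence pattern over a (source domain, target domain) pair. The sketch below generates questions from that pattern; the template string mirrors the wording of the table, while the `build_cqs` helper and the two-pair list are illustrative assumptions (capitalization follows the value-domain taxonomy).

```python
# Sentence pattern shared by the competency questions in the table above.
CQ_TEMPLATE = "How do subclasses of {source} influence subclasses of the {target} domain?"

# First two (source, target) domain pairs from the table; the rest follow the same shape.
DOMAIN_PAIRS = [
    ("Economic Values", "Political Culture and Political Regimes"),
    ("Ethical Values", "Perceptions of Corruption"),
]


def build_cqs(pairs):
    """Map CQ identifiers (CQ1, CQ2, ...) to questions built from domain pairs."""
    return {
        f"CQ{index}": CQ_TEMPLATE.format(source=source, target=target)
        for index, (source, target) in enumerate(pairs, start=1)
    }


cqs = build_cqs(DOMAIN_PAIRS)
for cq_id, text in cqs.items():
    print(cq_id, text)
```

Each generated question would then fill the {CQS} placeholder in one call of the ontology-construction prompt.
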
| Value Domain | Fine-grained Categories |
|---|---|
| Economic Values | Economic Equality Preference, Environment Versus Growth Preference, Government Responsibility Preference, Market Competition Preference, Ownership Preference, Work Success Beliefs |
| Ethical Values | Justifiability of Dishonest Behaviors, Moral Ambiguity Perception, Sexual Behavior Ethics, State Surveillance Rights, Violence Ethics |
| Happiness and Wellbeing | Basic Needs Security, Health Status, Intergenerational Comparison, Perceived Life Control, Subjective Wellbeing |
| Perceptions about Science and Technology | Importance of Science Knowledge, Science and Technology Optimism, Technology World Impact Evaluation |
| Perceptions of Corruption | Accountability Risk Perception, Bribe Experience, Corruption Gender Stereotype, Corruption In Institutions |
| Perceptions of Migration | Immigration Effects Perception, Immigration Policy Preference, Specific Immigration Impact Beliefs |
| Perceptions of Security | Economic Security Worry, National Defense Willingness, Neighborhood Safety Incidence, Neighborhood Security Feelings, Political Security Concerns, Security-related Behavior, Value Trade-off Preferences, Victimization Experience |
| Political Culture and Political Regimes | Democratic Characteristics Importance, Democratic Governance Perception, Human Rights Perception, Ideological Self-placement, National Identity, Regime System Approval, Territorial Attachment |
| Political Interest and Political Participation | Election Importance and Voice, Electoral Integrity And Efficacy, News Media Use For Politics, Political Interest, Political Participation Activities, Voting Behavior |
| Religious Values | Belief in Religious Concepts, Religion versus Science, Religious Authority Attitudes, Religious Exclusivism, Religious Identity, Religious Importance |
| Social Capital, Trust and Organizational Membership | Civic Organization Membership, Generalized Trust, Institutional Confidence, Interpersonal Trust |
| Social Values, Norms, Stereotypes | Attitudes Toward Future Social Change, Child Rearing Values, Family and Social Duty Attitudes, Gender Role Attitudes, Importance In Life, Outgroup Tolerance, Work Obligation Attitudes |
| Domain Category | Ontology Triples |
|---|---|
| Economic Values | <Work Success Beliefs, reinforces, Work Obligation Attitudes>; <Government Responsibility Preference, reduces, Economic Security Worry>; <Market Competition Preference, may slightly increase, Political Interest> |
| Ethical Values | <State Surveillance Rights, may strengthen, Institutional Confidence>; <Justifiability of Dishonest Behaviors, consistently heightens perception of, Corruption In Institutions>; <Moral Ambiguity Perception, erodes feeling of, Perceived Life Control> |
| Happiness and Wellbeing | <Perceived Life Control, can weakly reduce, Economic Security Worry>; <Subjective Wellbeing, consistently fosters, Outgroup Tolerance>; <Basic Needs Security, tends to alleviate, Economic Security Worry> |
| Perceptions about Science and Technology | <Technology World Impact Evaluation, may foster openness to, Attitudes Toward Future Social Change>; <Science and Technology Optimism, tends to alleviate, Economic Security Worry>; <Science and Technology Optimism, tends to positively promote, Attitudes Toward Future Social Change> |
| Perceptions of Corruption | <Corruption In Institutions, dampens, Political Interest>; <Bribe Experience, may reduce, Interpersonal Trust>; <Accountability Risk Perception, may slightly increase, Economic Security Worry> |
| Perceptions of Migration | <Immigration Effects Perception, significantly reduces, Generalized Trust>; <Immigration Effects Perception, tends to polarize towards exclusivism, Religious Exclusivism>; <Specific Immigration Impact Beliefs, may motivate, Political Participation Activities> |
| Perceptions of Security | <Neighborhood Security Feelings, consistently enhances, Interpersonal Trust>; <Political Security Concerns, erodes, Institutional Confidence>; <Economic Security Worry, reinforces, Work Obligation Attitudes> |
| Political Culture and Political Regimes | <Democratic Governance Perception, fundamentally underpins, Institutional Confidence>; <National Identity, may boost, Voting Behavior>; <Regime System Approval, actively encourages participation in, Voting Behavior> |
| Political Interest and Participation | <Voting Behavior, may reinforce, Institutional Confidence>; <Political Participation Activities, strongly drives, Civic Organization Membership>; <Political Participation Activities, tends to foster acceptance of, Outgroup Tolerance> |
| Religious Values | <Religious Importance, strongly reinforces sense of, Family and Social Duty Attitudes>; <Religious Importance, actively promotes participation in, Civic Organization Membership>; <Religious Exclusivism, severely undermines, Outgroup Tolerance> |
| Social Capital, Trust and Org. Membership | <Generalized Trust, fundamentally underpins, Outgroup Tolerance>; <Interpersonal Trust, helps cultivate, Outgroup Tolerance> |
| * | <Subjective Wellbeing, tends to heighten appreciation of, Importance In Life>; <Work Success Beliefs, reinforces, Work Obligation Attitudes>; <Science and Technology Optimism, tends to positively promote, Attitudes Toward Future Social Change> |
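
Each triple in the table above can be read as one object-property declaration of the kind the ontology prompt requests: one rdfs:domain, one rdfs:range, and one English rdfs:label. The sketch below renders a triple as Turtle text under several assumptions that are not confirmed by the source: class IRIs are the CamelCase form of the category name (as in wvs:OutgroupTolerance), the property name is the snake_case form of the predicate phrase, and the label is the triple read as a sentence. The helper names are hypothetical.

```python
import re


def to_snake_case(phrase):
    """Lowercase a phrase and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", phrase.lower()).strip("_")


def to_camel_case(phrase):
    """Join capitalized words without separators, e.g. 'Outgroup Tolerance' -> 'OutgroupTolerance'."""
    words = [word for word in re.split(r"[^A-Za-z0-9]+", phrase) if word]
    return "".join(word.capitalize() for word in words)


def triple_to_turtle(subject, predicate, obj):
    """Render one <subject, predicate, object> triple as a Turtle object-property declaration."""
    label = f"{subject} {predicate} {obj}."
    return (
        f"wvs:{to_snake_case(predicate)} rdf:type owl:ObjectProperty ;\n"
        f"    rdfs:domain wvs:{to_camel_case(subject)} ;\n"
        f"    rdfs:range wvs:{to_camel_case(obj)} ;\n"
        f'    rdfs:label "{label}"@en .'
    )


print(triple_to_turtle("Religious Exclusivism", "severely undermines", "Outgroup Tolerance"))
```

The output relies on the wvs:, rdf:, rdfs:, and owl: prefixes declared in the prompt footer, and a real pipeline would likely need a scheme for keeping property names unique across triples.
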
owl:Thing, while the small nodes correspond to their subclasses. All grey edges in this figure
represent subClassOf relations.
| Model | Method | EVS | GSS | CGSS | ISD | AFRO | LAPOP | Avg. Acc. |
|---|---|---|---|---|---|---|---|---|
| GPT-4o mini | OG-MAR | 0.6206 | 0.5480 | 0.6509 | 0.6192 | 0.5389 | 0.6268 | 0.6007 |
| | Single-Judge | 0.5773 | 0.6000 | 0.6440 | 0.6996 | 0.5293 | 0.5419 | 0.5987 |
| Gemini 2.5 | OG-MAR | 0.6249 | 0.5489 | 0.7017 | 0.7007 | 0.5701 | 0.6385 | 0.6308 |
| | Single-Judge | 0.5870 | 0.6222 | 0.5960 | 0.6551 | 0.5411 | 0.6116 | 0.6022 |
| QWEN 2.5 | OG-MAR | 0.5898 | 0.5325 | 0.5220 | 0.6599 | 0.5180 | 0.6005 | 0.5705 |
| | Single-Judge | 0.5266 | 0.5777 | 0.4067 | 0.6485 | 0.4494 | 0.5779 | 0.5311 |
| EXAONE 3.5 | OG-MAR | 0.6080 | 0.5636 | 0.6307 | 0.7810 | 0.5045 | 0.7022 | 0.6316 |
| | Single-Judge | 0.5013 | 0.6444 | 0.4237 | 0.6900 | 0.4725 | 0.6444 | 0.5627 |