Meta-Analysis Pipeline

C5-C6-C7 System: AI-Assisted Meta-Analysis with Human Oversight

"Machines calculate, researchers decide: A partnership for rigorous meta-analysis"

Step 1: Understanding the C5-C7 System

The meta-analysis pipeline consists of three specialized agents working together:

Meta-Analysis Master

Orchestrator & Decision Authority

Responsibilities:

▸Overall research question interpretation
▸Effect size hierarchy selection
▸Meta-analytic model choice (fixed/random/MASEM)
▸Final authority on all meta-analysis decisions

Trigger Keywords:

meta-analysis, effect size synthesis, pooled estimate, MASEM

Data Integrity Guard

Extraction & Calculation Specialist

Responsibilities:

▸Extract effect sizes from papers (PDFs, tables, text)
▸Calculate Hedges' g with bias correction
▸SD recovery using multiple methods
▸Data completeness validation

Trigger Keywords:

extract effect size, calculate Hedges g, SD recovery

Error Prevention Engine

Validation & Quality Assurance

Responsibilities:

▸4-Gate validation system
▸Statistical anomaly detection
▸Warn about common meta-analysis pitfalls
▸Pre-publication quality checks

Trigger Keywords:

validation, error check, quality assurance

Agent Relationships

┌───────────────────────────────────────────┐
│  C5: Meta-Analysis Master                 │
│  (Orchestrator & Decision Authority)      │
└───────────┬───────────────────────────────┘
            │
            ├─── delegates to ───┐
            │                    │
            ▼                    ▼
┌─────────────────────┐  ┌──────────────────┐
│  C6: Data Integrity │  │  C7: Error       │
│  Guard              │  │  Prevention      │
│  (Extract & Calc)   │  │  Engine          │
└──────────┬──────────┘  └────┬─────────────┘
           │                  │
           └──── validates ◄──┘
                (4-Gate System)

Step 2: Starting a Meta-Analysis

C5 activates automatically when you mention meta-analysis intent:

User Input:

"I want to conduct a meta-analysis on AI tutoring effectiveness"

C5 Clarifying Questions:

Q1: What is your research question? (e.g., "Does AI tutoring improve learning outcomes?")
Q2: What type of effect size do you expect? (Cohen's d, correlation r, odds ratio)
Q3: Do you have studies identified already, or do you need help with a systematic search?
Q4: Are you analyzing direct effects only, or mediation/moderation relationships?

🔴 META_ANALYSIS_PROTOCOL

REQUIRED

When: Before data extraction begins

Decision: Approve research question, ES hierarchy, and meta-analytic approach

⚠ C5 will NOT proceed with extraction until you approve the protocol

Step 3: Data Extraction with C6

Once the protocol is approved, C6 extracts effect sizes from your studies through one of three input routes:

PDF Upload

Upload PDFs of studies

C6 reads tables, text, and figures using OCR + LLM

Manual Entry

Provide study IDs with statistics

Study A: M1=5.2, SD1=1.1, n1=30, M2=4.8, SD2=1.3, n2=28

CSV Import

Upload codebook with extracted data

studyID, author, year, intervention, outcome, n1, M1, SD1, n2, M2, SD2
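For the CSV route, a minimal reader might look like the sketch below. It assumes a comma-separated file with exactly the header shown above and treats a blank SD cell as "not reported" so the study can be routed to SD recovery. `read_codebook` and `REQUIRED_COLUMNS` are hypothetical names, not C6's actual importer, and the sample rows are placeholders.

```python
import csv
import io

REQUIRED_COLUMNS = ["studyID", "author", "year", "intervention", "outcome",
                    "n1", "M1", "SD1", "n2", "M2", "SD2"]


def read_codebook(text):
    """Parse codebook CSV text into dicts with numeric n, M, and SD fields."""
    # skipinitialspace tolerates "a, b, c" style separators like the header above
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"codebook is missing columns: {missing}")
    rows = []
    for row in reader:
        for col in ("n1", "n2"):
            row[col] = int(row[col])
        for col in ("M1", "SD1", "M2", "SD2"):
            # A blank cell means "not reported" and is kept as None
            row[col] = float(row[col]) if row[col].strip() else None
        rows.append(row)
    return rows


# Placeholder rows; study B did not report SD1
sample = (
    "studyID, author, year, intervention, outcome, n1, M1, SD1, n2, M2, SD2\n"
    "A, Author A, 2020, AI tutor, exam score, 30, 5.2, 1.1, 28, 4.8, 1.3\n"
    "B, Author B, 2021, AI tutor, exam score, 20, 3.1, , 22, 2.9, 0.8\n"
)
rows = read_codebook(sample)
```

Failing loudly on a missing column is deliberate: a silently misaligned codebook would corrupt every downstream effect size.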

Hedges' g Calculation

C6 automatically converts all effect sizes to Hedges' g (bias-corrected Cohen's d):

g = d × (1 - 3/(4(n1+n2)-9))

Why Hedges' g:

✓Approximately unbiased for small samples (Cohen's d overestimates the effect's magnitude when n is small)
✓Comparable across studies with different sample sizes
✓Standard metric in education/psychology meta-analyses
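A sketch of the calculation for two independent groups, assuming Cohen's d uses the pooled SD and that the variance of d uses the common large-sample approximation (n1+n2)/(n1×n2) + d²/(2(n1+n2)). The function name `hedges_g` is hypothetical; C6's internal implementation may differ. It is applied here to Study A from the manual-entry example.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for two independent groups (group 1 minus group 2)."""
    df = n1 + n2 - 2
    # Pooled within-group SD
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    # Small-sample correction J = 1 - 3/(4(n1+n2) - 9), as in the formula above
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    # Large-sample approximation to the sampling variance of d, rescaled by J^2
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


# Study A from the manual-entry example
g, v = hedges_g(5.2, 1.1, 30, 4.8, 1.3, 28)
print(f"g = {g:.3f}, variance = {v:.4f}, SE = {math.sqrt(v):.3f}")
```

A negative g means group 2 scored higher, so the group order (e.g., intervention as group 1) must be kept consistent across all studies.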

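SD Recovery and Conversion Methods (C6 Automatic):

1. SE to SD conversion: SD = SE × √n (n = size of that group)
2. t-statistic back-calculation (independent-samples t): d = t × √(1/n1 + 1/n2)
3. F-statistic to d (two-group F with 1 numerator df): d = √(F × (n1+n2)/(n1×n2)), with the sign taken from the group means
4. p-value approximation (last resort, flagged by C7)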
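The first three routes can be sketched as plain functions (hypothetical names, assuming an independent-samples design). For two groups F = t², which the check at the end uses to confirm that the t and F routes agree in magnitude. The p-value route is not sketched, since it needs an inverse t distribution and is the least reliable option.

```python
import math


def sd_from_se(se, n):
    """SD of a group from the standard error of its mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def d_from_f(f, n1, n2, sign=1):
    """Cohen's d from a two-group (df1 = 1) ANOVA F.

    F discards direction, so the caller supplies sign=+1 or -1 from the means.
    """
    return sign * math.sqrt(f * (n1 + n2) / (n1 * n2))


# For two groups F = t**2, so both routes give the same magnitude
t_stat, n1, n2 = 2.5, 30, 28
assert math.isclose(d_from_t(t_stat, n1, n2), d_from_f(t_stat ** 2, n1, n2))
```

The t and F routes return Cohen's d, not g, so the Hedges' correction still has to be applied afterwards.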

Step 4: 4-Gate Validation with C7

C7 runs a rigorous 4-gate validation system on extracted data:

Gate 1: Extraction Validation

Checks:

▸Are all required fields present? (n, M, SD for each group)
▸Are values within plausible ranges? (SD > 0, n ≥ 2, |g| < 5)
▸Do reported statistics match calculated effect sizes?

Common Errors:

Missing SD → C7 flags for recovery method
Negative SD → Data entry error
Extreme g (|g| > 3) → Verify with original paper
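Gate 1's range checks could be expressed as below: a minimal sketch that takes its thresholds from the checks listed above (SD > 0, n ≥ 2, |g| < 5 as plausible, |g| > 3 as worth verifying). `Group` and `gate1_flags` are hypothetical names; C7's actual rules may be broader.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    n: Optional[int]
    m: Optional[float]
    sd: Optional[float]


def gate1_flags(study_id: str, g1: Group, g2: Group,
                g: Optional[float] = None) -> List[str]:
    """Return Gate 1 flags for one two-group study (empty list = pass)."""
    flags = []
    for label, grp in (("group 1", g1), ("group 2", g2)):
        if grp.n is None:
            flags.append(f"{study_id}: missing n for {label}")
        elif grp.n < 2:
            flags.append(f"{study_id}: n < 2 for {label}")
        if grp.m is None:
            flags.append(f"{study_id}: missing M for {label}")
        if grp.sd is None:
            flags.append(f"{study_id}: missing SD for {label}, route to SD recovery")
        elif grp.sd <= 0:
            flags.append(f"{study_id}: non-positive SD for {label}, likely a data entry error")
    if g is not None:
        if abs(g) >= 5:
            flags.append(f"{study_id}: |g| = {abs(g):.2f} is outside the plausible range")
        elif abs(g) > 3:
            flags.append(f"{study_id}: extreme g ({g:.2f}), verify against the original paper")
    return flags
```

Returning messages instead of raising lets every problem in a study surface in one pass, which suits a report that a human reviews.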

Gate 2: Classification Validation

Checks:

▸Is the study correctly categorized by moderator variables?
▸Does the intervention match the meta-analysis scope?
▸Are outcome measures consistent across studies?

Common Errors:

Intervention mismatch → Exclude or reclassify
Outcome construct drift → Flag for sensitivity analysis

Gate 3: Statistical Validation

Checks:

▸Is heterogeneity (I²) within acceptable range?
▸Are there statistical outliers (|studentized residual| > 3)?
▸Is publication bias evident (funnel plot asymmetry)?

Common Errors:

High heterogeneity (I² > 75%) → Suggest random-effects or moderator analysis
Outliers detected → Flag specific studies
Publication bias → Recommend trim-and-fill or selection models
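The funnel-asymmetry check can be illustrated with the classical Egger regression: the standardized effect g/SE is regressed on precision 1/SE by ordinary least squares, and an intercept far from zero suggests asymmetry. This is a sketch with a hypothetical function name. It returns only the intercept and its t statistic (compare against a t distribution with k - 2 df), and its results may differ from metafor's `regtest()`, whose defaults fit a different model.

```python
import math


def egger_test(g, v):
    """Classical Egger regression; returns (intercept, t statistic with k-2 df)."""
    k = len(g)
    if k < 3:
        raise ValueError("Egger's test needs at least 3 studies")
    se = [math.sqrt(vi) for vi in v]
    y = [gi / si for gi, si in zip(g, se)]   # standardized effects
    x = [1 / si for si in se]                # precision
    xbar, ybar = sum(x) / k, sum(y) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    # Residual variance and the OLS standard error of the intercept
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + xbar ** 2 / sxx))
    return intercept, intercept / se_intercept
```

With few studies (a common rule of thumb is fewer than 10) the test has little power, so a non-significant result does not rule out publication bias.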

Gate 4: Independence Validation

Checks:

▸Are multiple effect sizes from the same sample correctly handled?
▸Is nesting structure (students within classrooms) accounted for?
▸Are dependent effect sizes modeled appropriately?

Common Errors:

Non-independence detected → Suggest averaging or multilevel MA
Clustering ignored → Warning about Type I error inflation
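Averaging dependent effect sizes can be sketched as follows. The composite variance treats the outcomes within a study as correlated with a common correlation r, which studies rarely report. The default r = 0.5 is an assumption for illustration, not a C7 setting, and it should be varied in a sensitivity analysis. `composite_by_study` is a hypothetical name.

```python
import math
from collections import defaultdict


def composite_by_study(rows, r=0.5):
    """Collapse several effect sizes per studyID into one composite.

    rows: iterable of (study_id, g, variance) tuples.
    r: ASSUMED common correlation between outcomes within a study.
    Composite variance = (1/m^2) * (sum v_i + sum over i != j of r*sqrt(v_i*v_j)).
    """
    by_study = defaultdict(list)
    for study_id, g, v in rows:
        by_study[study_id].append((g, v))

    composites = {}
    for study_id, items in by_study.items():
        m = len(items)
        mean_g = sum(g for g, _ in items) / m
        total = sum(v for _, v in items)
        # Covariance terms between every ordered pair of distinct outcomes
        for i in range(m):
            for j in range(m):
                if i != j:
                    total += r * math.sqrt(items[i][1] * items[j][1])
        composites[study_id] = (mean_g, total / m ** 2)
    return composites
```

Setting r = 0 treats the outcomes as independent and understates the composite variance whenever they are positively correlated, which is the Type I error problem Gate 4 warns about.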

Common Error Patterns Detected

Pattern	Severity	Recommendation
SD missing for >30% of studies	High	Request raw data from authors OR use imputation methods (flagged in report)
All effect sizes positive (no negative effects)	Medium	Check for publication bias using Egger test, trim-and-fill
Extreme heterogeneity (I² > 90%)	High	Do NOT pool. Conduct subgroup analysis or narrative synthesis
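The three patterns in the table reduce to threshold checks. A hypothetical sketch, assuming one missing-SD indicator and one g per study, and I² supplied as a percentage:

```python
def common_error_patterns(sd_missing, effects, i2):
    """Return (pattern, severity, recommendation) tuples for the table's patterns.

    sd_missing: list of bools, one per study (True if SD was not reported)
    effects: list of Hedges' g values
    i2: I^2 as a percentage
    """
    flags = []
    if sd_missing and sum(sd_missing) / len(sd_missing) > 0.30:
        flags.append(("SD missing for >30% of studies", "High",
                      "request raw data from authors or impute (flag in report)"))
    if effects and all(g > 0 for g in effects):
        flags.append(("All effect sizes positive", "Medium",
                      "check publication bias (Egger test, trim-and-fill)"))
    if i2 > 90:
        flags.append(("Extreme heterogeneity", "High",
                      "do not pool; use subgroup analysis or narrative synthesis"))
    return flags
```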

Step 5: Orchestration and Results

C5 coordinates the workflow and synthesizes findings:

C5: Define protocol

🔴 META_ANALYSIS_PROTOCOL

C6: Extract data from k studies

C7: Validate (Gates 1-2)

C6: Calculate Hedges' g

C7: Validate (Gates 3-4)

C5: Choose meta-analytic model

C5: Generate forest plot, funnel plot

C5: Interpret results

🟠 META_ANALYSIS_RESULTS (RECOMMENDED)

Decision Point: Fixed vs. Random Effects

When I² < 25% (low heterogeneity)

C5 Recommendation: Fixed-effect model (assumes a single true effect size)

Your Choice: You can override if you expect population heterogeneity
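The sketch below contrasts the two models on the same data. It swaps in the DerSimonian-Laird estimator of τ² for simplicity, whereas the R examples in this guide use REML, so τ² and the random-effects estimate can differ slightly. The `suggested` field mirrors C5's I² < 25% rule and is only a default you can override; `pool` is a hypothetical name.

```python
import math


def pool(g, v):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    fe = sum(wi * gi for wi, gi in zip(w, g)) / sw

    # Cochran's Q and I^2 (percent)
    q = sum(wi * (gi - fe) ** 2 for wi, gi in zip(w, g))
    df = len(g) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1 / (vi + tau2) for vi in v]
    re = sum(wi * gi for wi, gi in zip(w_re, g)) / sum(w_re)
    return {
        "fixed": (fe, math.sqrt(1 / sw)),
        "random": (re, math.sqrt(1 / sum(w_re))),
        "tau2": tau2,
        "i2": i2,
        "suggested": "fixed" if i2 < 25 else "random",
    }
```

When τ² is zero the two estimates coincide; as τ² grows, the random-effects weights become more equal and the pooled SE widens.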

Decision Point: Handling Outliers

When C7 flags 2 studies with extreme g values

C5 Recommendation: Run a sensitivity analysis (with/without outliers)

Your Choice: You decide whether to exclude permanently or report both analyses
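A with/without comparison for flagged studies might look like this. It assumes effects are stored as `{studyID: (g, variance)}` and uses fixed-effect inverse-variance pooling to keep the example short; the data layout and function names are hypothetical.

```python
def iv_pool(effects):
    """Fixed-effect inverse-variance estimate from {study_id: (g, variance)}."""
    weights = {s: 1 / v for s, (_, v) in effects.items()}
    total = sum(weights.values())
    return sum(weights[s] * g for s, (g, _) in effects.items()) / total


def with_without(effects, flagged):
    """Pooled g with all studies and with the flagged studies removed."""
    kept = {s: gv for s, gv in effects.items() if s not in flagged}
    if not kept:
        raise ValueError("every study is flagged; nothing left to pool")
    return {
        "all_studies": iv_pool(effects),
        "without_flagged": iv_pool(kept),
        "excluded": sorted(s for s in flagged if s in effects),
    }
```

Reporting both numbers side by side, together with the list of excluded IDs, keeps the decision about permanent exclusion with you.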

Decision Point: Publication Bias Correction

When funnel plot shows asymmetry (Egger p < .05)

C5 Recommendation: Report both unadjusted and trim-and-fill adjusted estimates

Your Choice: You choose which to emphasize in conclusions

🟠 META_ANALYSIS_RESULTS

RECOMMENDED

When: After pooled estimate calculated

Decision: Review and approve interpretation before finalizing manuscript

⚠ Not required, but strongly recommended: review the interpretation before writing the Discussion section

Step 6: Export and Integration

Export results in multiple formats for publication and reproducibility:

Universal Meta-Analysis Codebook

4-layer codebook with AI provenance tracking

▸Layer 1: Identifiers (studyID, author, year, DOI)
▸Layer 2: Statistics (n, M, SD, g, SE, 95% CI)
▸Layer 3: AI Provenance (extraction_method, confidence_score, verification_status)
▸Layer 4: Human Verification (verified_by, verification_date, notes)

Use Case: Gold standard for transparent AI-assisted meta-analysis
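One way to represent a codebook row is a dataclass whose fields follow the four layers listed above. The exact column names, types, example provenance values, and the 1.96 × SE normal-approximation CI are assumptions for illustration. The `hedges_g` and `variance` columns match what the R script below expects, and the study values are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class CodebookRow:
    # Layer 1: identifiers
    studyID: str
    author: str
    year: int
    doi: Optional[str]
    # Layer 2: statistics (two-group design assumed)
    n1: int
    M1: float
    SD1: float
    n2: int
    M2: float
    SD2: float
    hedges_g: float
    se_g: float
    # Layer 3: AI provenance
    extraction_method: str        # hypothetical values, e.g. "pdf_table", "manual"
    confidence_score: float       # assumed to lie in [0, 1]
    verification_status: str = "unverified"
    # Layer 4: human verification
    verified_by: Optional[str] = None
    verification_date: Optional[str] = None  # ISO date string assumed
    notes: str = ""


def to_csv(rows):
    """Serialize rows, adding variance and a normal-approximation 95% CI."""
    names = [f.name for f in fields(CodebookRow)] + ["variance", "ci_lower", "ci_upper"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for row in rows:
        record = asdict(row)
        record["variance"] = row.se_g ** 2
        record["ci_lower"] = row.hedges_g - 1.96 * row.se_g
        record["ci_upper"] = row.hedges_g + 1.96 * row.se_g
        writer.writerow(record)
    return buf.getvalue()


row = CodebookRow("StudyA", "Author A", 2020, None, 30, 5.2, 1.1, 28, 4.8, 1.3,
                  hedges_g=0.329, se_g=0.261,
                  extraction_method="manual", confidence_score=0.95)
print(to_csv([row]))
```

Deriving `variance` and the CI from `se_g` at export time, rather than storing them separately, keeps the three columns from drifting out of sync.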

R Script (metafor package)

Ready-to-run R code for replication

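library(metafor)
codebook <- read.csv("diverga_codebook_v2.2.csv")  # C6 codebook export
res <- rma(yi = hedges_g, vi = variance,
           method = "REML",  # random-effects model, REML estimate of tau^2
           data = codebook)
forest(res)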

Use Case: Reproducible analysis for journal submission

Stata .do file

Stata syntax for meta-analysis

* requires the user-written metan and metabias commands (available from SSC)
metan hedges_g se_g, random label(namevar=studyID)
metabias hedges_g se_g, egger

Use Case: For researchers using Stata

CSV for CMA/RevMan

Import into Comprehensive Meta-Analysis or RevMan

Fields: studyID, author, year, n1, M1, SD1, n2, M2, SD2

Use Case: GUI-based meta-analysis tools

PRISMA Diagram Generation

C5 can generate PRISMA 2020 flow diagrams showing:

1. Identification: k studies from databases
2. Screening: Excluded studies (with reasons)
3. Eligibility: Full-text assessed
4. Included: Final k studies in meta-analysis

✓ If you started with the I-category pipeline (I0-I3), PRISMA is auto-populated
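The counts in each box follow from simple subtraction. A sketch using PRISMA 2020-style boxes, where the Eligibility step above corresponds to full-text assessment; the function name, arguments, and example tallies are hypothetical.

```python
def prisma_flow(identified, duplicates_removed, excluded_at_screening,
                not_retrieved, excluded_full_text):
    """Derive PRISMA-style box counts from raw tallies.

    excluded_full_text maps an exclusion reason to a count, because PRISMA
    asks for reasons at the full-text stage.
    """
    screened = identified - duplicates_removed
    sought = screened - excluded_at_screening
    assessed = sought - not_retrieved
    included = assessed - sum(excluded_full_text.values())
    counts = {"identified": identified, "screened": screened,
              "sought_for_retrieval": sought, "assessed_full_text": assessed,
              "included": included}
    # A negative box means the tallies do not add up
    if any(v < 0 for v in counts.values()):
        raise ValueError(f"inconsistent tallies: {counts}")
    return counts


flow = prisma_flow(500, 100, 300, 10,
                   {"wrong population": 40, "no control group": 25})
```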

Code Examples

R (metafor)

library(metafor)

# Load codebook from C6 export
data <- read.csv("diverga_codebook_v2.2.csv")

# Random-effects meta-analysis
res <- rma(yi = hedges_g,
           vi = variance,
           method = "REML",
           data = data)

# Forest plot
forest(res,
       header = "Study",
       xlab = "Hedges' g")

# Funnel plot (publication bias)
funnel(res)
regtest(res)  # Egger test

Python (metaanalysis package)

import pandas as pd
from metaanalysis import MetaAnalysis

# Load codebook
df = pd.read_csv("diverga_codebook_v2.2.csv")

# Initialize meta-analysis
ma = MetaAnalysis(df,
                  effect_size="hedges_g",
                  variance="variance")

# Run random-effects model
results = ma.fit(method="REML")

# Generate forest plot
ma.plot_forest()

# Check heterogeneity
print(f"I²: {results.I2:.1f}%")
print(f"Q: {results.Q:.2f}, p={results.Q_pval:.3f}")

Common Pitfalls

⚠ Unit of Analysis Error

Description:

Treating multiple outcomes from the same study as independent

How C7 Catches This:

Gate 4 flags when same studyID appears multiple times

Solution: Average effect sizes within study OR use robust variance estimation

⚠ Apples and Oranges

Description:

Pooling incompatible outcome constructs (e.g., test scores + self-efficacy)

How C7 Catches This:

Gate 2 flags heterogeneous outcome_type values

Solution: Conduct separate meta-analyses by outcome category

⚠ Garbage In, Garbage Out

Description:

Including low-quality studies with biased effect sizes

How C7 Catches This:

Does NOT catch this automatically (requires domain knowledge)

Solution: You must quality-appraise studies before inclusion (use B2 agent)

⚠ File Drawer Problem

Description:

Missing unpublished studies with null results

How C7 Catches This:

Gate 3 flags funnel plot asymmetry

Solution: Search grey literature, contact authors, report bias-adjusted estimates

Ready to Start Your Meta-Analysis?

The C5-C6-C7 system handles the complexity while you maintain full control.
