
Meta-Analysis Pipeline

C5-C6-C7 System: AI-Assisted Meta-Analysis with Human Oversight

"Machines calculate, researchers decide: A partnership for rigorous meta-analysis"

Step 1: Understanding the C5-C7 System

The meta-analysis pipeline consists of three specialized agents working together:

C5

Meta-Analysis Master

Orchestrator & Decision Authority

Responsibilities:

  • Overall research question interpretation
  • Effect size hierarchy selection
  • Meta-analytic model choice (fixed/random/MASEM)
  • Final authority on all meta-analysis decisions

Trigger Keywords:

meta-analysis, effect size synthesis, pooled estimate, MASEM

C6

Data Integrity Guard

Extraction & Calculation Specialist

Responsibilities:

  • Extract effect sizes from papers (PDFs, tables, text)
  • Calculate Hedges' g with bias correction
  • SD recovery using multiple methods
  • Data completeness validation

Trigger Keywords:

extract effect size, calculate Hedges g, SD recovery

C7

Error Prevention Engine

Validation & Quality Assurance

Responsibilities:

  • 4-Gate validation system
  • Statistical anomaly detection
  • Warn about common meta-analysis pitfalls
  • Pre-publication quality checks

Trigger Keywords:

validation, error check, quality assurance

Agent Relationships
┌───────────────────────────────────────────┐
│  C5: Meta-Analysis Master                 │
│  (Orchestrator & Decision Authority)      │
└───────────┬───────────────────────────────┘
            │
            ├─── delegates to ───┐
            │                    │
            ▼                    ▼
┌─────────────────────┐  ┌──────────────────┐
│  C6: Data Integrity │  │  C7: Error       │
│  Guard              │  │  Prevention      │
│  (Extract & Calc)   │  │  Engine          │
└──────────┬──────────┘  └────┬─────────────┘
           │                  │
           └──── validates ◄──┘
                (4-Gate System)

Step 2: Starting a Meta-Analysis

C5 activates automatically when you state that you want to conduct a meta-analysis:

User Input:

"I want to conduct a meta-analysis on AI tutoring effectiveness"

C5 Clarifying Questions:

  • Q1: What is your research question? (e.g., "Does AI tutoring improve learning outcomes?")
  • Q2: What type of effect size do you expect? (Cohen's d, correlation r, odds ratio)
  • Q3: Do you have studies identified already, or do you need help with a systematic search?
  • Q4: Are you analyzing direct effects only, or mediation/moderation relationships?

🔴 META_ANALYSIS_PROTOCOL

REQUIRED

When: Before data extraction begins

Decision: Approve research question, ES hierarchy, and meta-analytic approach

C5 will NOT proceed with extraction until you approve the protocol

Step 3: Data Extraction with C6

Once the protocol is approved, C6 extracts effect sizes from your studies using one of three input methods:

PDF Upload

Upload PDFs of studies

C6 reads tables, text, and figures using OCR + LLM

Manual Entry

Provide study IDs with statistics

Study A: M1=5.2, SD1=1.1, n1=30, M2=4.8, SD2=1.3, n2=28

CSV Import

Upload a codebook containing the extracted data, with these columns:

studyID, author, year, intervention, outcome, n1, M1, SD1, n2, M2, SD2
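
The manual-entry line above can be parsed into a record whose keys match the statistic columns of the CSV codebook. This is a minimal sketch that assumes the exact "label: key=value, ..." layout shown in the Study A example. parse_manual_entry is a hypothetical helper for illustration, not C6's actual parser.

```python
import re

# Statistic columns shared with the CSV codebook layout
REQUIRED = ("n1", "M1", "SD1", "n2", "M2", "SD2")

def parse_manual_entry(line: str) -> dict:
    """Hypothetical helper: parse 'Study A: M1=5.2, SD1=1.1, n1=30, ...'."""
    label, _, stats = line.partition(":")
    if not stats:
        raise ValueError(f"expected 'label: key=value, ...', got {line!r}")
    record = {"studyID": label.strip()}
    for key, value in re.findall(r"(\w+)\s*=\s*([-+]?\d*\.?\d+)", stats):
        # Sample sizes are integers; means and SDs are floats
        record[key] = int(value) if key.startswith("n") else float(value)
    missing = [k for k in REQUIRED if k not in record]
    if missing:
        raise ValueError(f"{record['studyID']}: missing {missing}")
    return record

print(parse_manual_entry("Study A: M1=5.2, SD1=1.1, n1=30, M2=4.8, SD2=1.3, n2=28"))
```
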

Hedges' g Calculation

C6 automatically converts all effect sizes to Hedges' g (bias-corrected Cohen's d):

g = d × (1 - 3/(4(n1+n2)-9))
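
The correction factor in this formula equals 1 - 3/(4·df - 1), where df = n1 + n2 - 2. The sketch below computes g from group summary statistics, using the Study A values from the manual-entry example. It assumes d is computed with the pooled within-group SD. The variance line uses the common large-sample approximation; this page does not say which variance formula C6 applies, so treat it as an assumption. hedges_g is a hypothetical function name.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for two independent groups."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    # Small-sample correction J = 1 - 3/(4(n1+n2) - 9), i.e. 1 - 3/(4*df - 1)
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = j * d
    # Assumed large-sample variance of d, scaled by J^2 to get the variance of g
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return g, j**2 * var_d

g, v = hedges_g(5.2, 1.1, 30, 4.8, 1.3, 28)  # Study A
print(f"g = {g:.3f}, SE = {math.sqrt(v):.3f}")
```
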

Why Hedges' g:

  • Approximately unbiased in small samples (corrects the upward bias of Cohen's d)
  • Standardized, so comparable across studies with different sample sizes and outcome scales
  • Standard metric in education/psychology meta-analyses

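SD Recovery Methods (C6 Automatic; methods 2 and 3 yield d directly when SDs are not reported):

  • 1. SE to SD conversion: SD = SE × √n (n = group size)
  • 2. t-statistic back-calculation: d = t × √(1/n1 + 1/n2)
  • 3. F-statistic to d (two-group F only; take the sign from the reported means): d = √(F × (n1+n2)/(n1×n2))
  • 4. p-value approximation (last resort, flagged by C7)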
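
Methods 1 to 3 are direct formulas, so they map onto small functions. The function names and the sign argument below are hypothetical additions. Method 4 is left out because this page does not specify which p-value approximation C6 uses. Methods 2 and 3 return d, which still needs the Hedges correction to become g.

```python
import math

def sd_from_se(se, n):
    """Method 1: SD of a group from the standard error of its mean."""
    return se * math.sqrt(n)

def d_from_t(t, n1, n2):
    """Method 2: Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_f(f, n1, n2, sign=1):
    """Method 3: d from a two-group, one-way ANOVA F (F = t^2).

    F carries no direction, so the sign (+1 or -1) must come from the paper.
    """
    return sign * math.sqrt(f * (n1 + n2) / (n1 * n2))

print(sd_from_se(0.2, 25))
print(d_from_t(2.0, 20, 20), d_from_f(4.0, 20, 20))  # equal, since F = t^2
```
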

Step 4: 4-Gate Validation with C7

C7 runs a rigorous 4-gate validation system on extracted data:

Gate 1: Extraction Validation

Checks:

  • Are all required fields present? (n, M, SD for each group)
  • Are values within plausible ranges? (SD > 0, n ≥ 2, |g| < 5)
  • Do reported statistics match calculated effect sizes?

Common Errors:

  • Missing SD → C7 flags for recovery method
  • Negative SD → Data entry error
  • Extreme g (|g| > 3) → Verify with original paper
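
The Gate 1 range checks can be written as a function that returns flags for one study record. The thresholds are the ones listed above: SD > 0, n ≥ 2, |g| < 5, and |g| > 3 flagged for verification. The record keys follow the CSV codebook columns. gate1_flags is a hypothetical name and is not C7's implementation.

```python
from typing import Optional

REQUIRED = ("n1", "M1", "SD1", "n2", "M2", "SD2")

def gate1_flags(study: dict, g: Optional[float] = None) -> list:
    """Return Gate 1 messages for one study record (empty list = passes)."""
    flags = []
    missing = [k for k in REQUIRED if study.get(k) is None]
    if missing:
        flags.append(f"missing {missing}: flag for a recovery method")
    for k in ("SD1", "SD2"):
        sd = study.get(k)
        if sd is not None and sd <= 0:
            flags.append(f"{k}={sd}: non-positive SD, likely a data entry error")
    for k in ("n1", "n2"):
        n = study.get(k)
        if n is not None and n < 2:
            flags.append(f"{k}={n}: sample size below 2")
    if g is not None:
        if abs(g) >= 5:
            flags.append(f"|g|={abs(g):.2f}: outside the plausible range")
        elif abs(g) > 3:
            flags.append(f"|g|={abs(g):.2f}: extreme, verify with original paper")
    return flags

study_a = {"n1": 30, "M1": 5.2, "SD1": 1.1, "n2": 28, "M2": 4.8, "SD2": 1.3}
print(gate1_flags(study_a, g=0.33))
```
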

Gate 2: Classification Validation

Checks:

  • Is the study correctly categorized by moderator variables?
  • Does the intervention match the meta-analysis scope?
  • Are outcome measures consistent across studies?

Common Errors:

  • Intervention mismatch → Exclude or reclassify
  • Outcome construct drift → Flag for sensitivity analysis

Gate 3: Statistical Validation

Checks:

  • Is heterogeneity (I²) within acceptable range?
  • Are there statistical outliers (|studentized residual| > 3)?
  • Is publication bias evident (funnel plot asymmetry)?

Common Errors:

  • High heterogeneity (I² > 75%) → Suggest random-effects or moderator analysis
  • Outliers detected → Flag specific studies
  • Publication bias → Recommend trim-and-fill or selection models
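
This page does not say how C7 computes I². The sketch below uses the standard definitions, Cochran's Q with inverse-variance weights and I² = max(0, (Q - df)/Q), and applies the 75% threshold from the list above. The effect sizes in the usage line are toy values. Outlier and funnel-plot checks are not covered.

```python
def heterogeneity(yi, vi):
    """Cochran's Q, its degrees of freedom, and I^2 (%) for effects yi with variances vi."""
    w = [1 / v for v in vi]
    mean_fe = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    q = sum(wi * (y - mean_fe) ** 2 for wi, y in zip(w, yi))
    df = len(yi) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

q, df, i2 = heterogeneity([0.1, 0.5, 0.9], [0.02, 0.03, 0.02])  # toy values
if i2 > 75:
    print(f"I² = {i2:.0f}% (Q = {q:.2f}, df = {df}): suggest random-effects or moderator analysis")
```
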

Gate 4: Independence Validation

Checks:

  • Are multiple effect sizes from the same sample correctly handled?
  • Is nesting structure (students within classrooms) accounted for?
  • Are dependent effect sizes modeled appropriately?

Common Errors:

  • Non-independence detected → Suggest averaging or multilevel MA
  • Clustering ignored → Warning about Type I error inflation
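
Averaging within a study is the first remedy named above. Its variance depends on the correlation r between the outcomes from the same sample. The sketch below uses the usual composite formula V = (1/m²)(Σv_i + Σ_{i≠j} r·√(v_i·v_j)). The value of r is an assumption because primary studies rarely report it, and the default of 0.5 is a placeholder, not a recommendation. composite_effect is a hypothetical name, and multilevel models or robust variance estimation are not shown.

```python
import math

def composite_effect(yi, vi, r=0.5):
    """Mean of m dependent effect sizes from one sample and its variance.

    r is an ASSUMED common correlation between the outcomes.
    """
    m = len(yi)
    mean = sum(yi) / m
    var = sum(vi)
    # Add covariance terms r * sqrt(v_i * v_j) for every ordered pair i != j
    for i in range(m):
        for j in range(m):
            if i != j:
                var += r * math.sqrt(vi[i] * vi[j])
    return mean, var / m**2

print(composite_effect([0.35, 0.50], [0.04, 0.05], r=0.5))  # toy values
```

With r = 1 the composite is no more precise than a single outcome. With r = 0 it is treated as if the outcomes were independent, which is the optimistic extreme.
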

Common Error Patterns Detected

  • SD missing for >30% of studies (Severity: High) → Request raw data from authors OR use imputation methods (flagged in report)
  • All effect sizes positive, no negative effects (Severity: Medium) → Check for publication bias using Egger's test and trim-and-fill
  • Extreme heterogeneity, I² > 90% (Severity: High) → Do NOT pool. Conduct subgroup analysis or narrative synthesis

Step 5: Orchestration and Results

C5 coordinates the workflow and synthesizes findings:

  • 1. C5: Define protocol → 🔴 META_ANALYSIS_PROTOCOL
  • 2. C6: Extract data from k studies
  • 3. C7: Validate (Gates 1-2)
  • 4. C6: Calculate Hedges' g
  • 5. C7: Validate (Gates 3-4)
  • 6. C5: Choose meta-analytic model
  • 7. C5: Generate forest plot and funnel plot
  • 8. C5: Interpret results → 🟠 META_ANALYSIS_RESULTS (RECOMMENDED)

Decision Point: Fixed vs. Random Effects

When I² < 25% (low heterogeneity)

C5 Recommendation: Fixed-effects model (assumes a single true effect size)

Your Choice: You can override if you expect population heterogeneity
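
Switching between fixed and random effects changes only the study weights. The fixed-effect model weights each study by 1/v_i. The random-effects model adds a between-study variance τ² to each v_i. In this sketch τ² comes from the DerSimonian-Laird moment estimator, swapped in for simplicity. The R examples on this page use REML, so their estimates will differ somewhat. pool is a hypothetical name, and the usage values are toy data.

```python
import math

def pool(yi, vi, model="random"):
    """Inverse-variance pooled estimate; returns (estimate, SE, tau^2)."""
    if model not in ("fixed", "random"):
        raise ValueError("model must be 'fixed' or 'random'")
    w = [1 / v for v in vi]
    sw = sum(w)
    mean_fe = sum(wi * y for wi, y in zip(w, yi)) / sw
    if model == "fixed":
        # Fixed effect: every study estimates one common true effect
        return mean_fe, math.sqrt(1 / sw), 0.0
    if len(yi) < 2:
        raise ValueError("random-effects pooling needs at least 2 studies")
    # DerSimonian-Laird moment estimate of the between-study variance tau^2
    q = sum(wi * (y - mean_fe) ** 2 for wi, y in zip(w, yi))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)
    # Random effects: each study's variance is inflated by tau^2
    w_re = [1 / (v + tau2) for v in vi]
    mean_re = sum(wi * y for wi, y in zip(w_re, yi)) / sum(w_re)
    return mean_re, math.sqrt(1 / sum(w_re)), tau2

yi, vi = [0.1, 0.5, 0.9], [0.02, 0.03, 0.02]  # toy values
for model in ("fixed", "random"):
    est, se, tau2 = pool(yi, vi, model)
    print(f"{model}: {est:.3f} (SE {se:.3f}, tau^2 {tau2:.3f})")
```

When τ² is estimated as 0, the two models coincide. This is why low heterogeneity makes the choice matter less.
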

Decision Point: Handling Outliers

When C7 flags 2 studies with extreme g values

C5 Recommendation: Run a sensitivity analysis (with and without the outliers)

Your Choice: You decide whether to exclude permanently or report both analyses
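
A with/without sensitivity analysis pools the data twice. To stay self-contained, this sketch uses a plain inverse-variance (fixed-effect) mean as the default pooling function. In practice you would pass in the same random-effects estimator used for the main analysis. sensitivity and inverse_variance_mean are hypothetical names, and the study IDs and values are toy placeholders.

```python
def inverse_variance_mean(yi, vi):
    """Fixed-effect pooled mean, used here only as a simple default."""
    w = [1 / v for v in vi]
    return sum(wi * y for wi, y in zip(w, yi)) / sum(w)

def sensitivity(study_ids, yi, vi, flagged, pool=inverse_variance_mean):
    """Pooled estimate with all studies and with the flagged studies removed."""
    keep = [i for i, s in enumerate(study_ids) if s not in flagged]
    if not keep:
        raise ValueError("every study is flagged; nothing left to pool")
    return {
        "all": pool(yi, vi),
        "without_flagged": pool([yi[i] for i in keep], [vi[i] for i in keep]),
    }

ids = ["S1", "S2", "S3", "S4"]  # toy placeholders, not real studies
print(sensitivity(ids, [0.2, 0.3, 0.4, 2.8], [0.04, 0.05, 0.04, 0.06], flagged={"S4"}))
```
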

Decision Point: Publication Bias Correction

When funnel plot shows asymmetry (Egger p < .05)

C5 Recommendation: Report both the unadjusted and the trim-and-fill-adjusted estimates

Your Choice: You choose which to emphasize in conclusions
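
Egger's test, the asymmetry criterion used above, regresses each study's standardized effect (g/SE) on its precision (1/SE) by ordinary least squares. An intercept far from zero suggests funnel plot asymmetry. This is a pure-Python sketch of the classical formulation, and egger_test is a hypothetical name. It returns the intercept's t statistic with k - 2 degrees of freedom; if SciPy is available, the two-sided p-value is 2 * scipy.stats.t.sf(abs(t), k - 2). metafor's regtest() uses a different default model, so its output may not match exactly.

```python
import math

def egger_test(yi, sei):
    """Classical Egger regression; returns (intercept, SE of intercept, t)."""
    k = len(yi)
    if k < 3:
        raise ValueError("Egger's test needs at least 3 studies")
    x = [1 / s for s in sei]              # precision
    z = [y / s for y, s in zip(yi, sei)]  # standardized effect
    xbar, zbar = sum(x) / k, sum(z) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are equal; the test is undefined")
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    # Residual variance with k - 2 degrees of freedom
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = sse / (k - 2)
    se_int = math.sqrt(s2 * (1 / k + xbar**2 / sxx))
    t = intercept / se_int if se_int > 0 else math.inf
    return intercept, se_int, t
```
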

🟠 META_ANALYSIS_RESULTS

RECOMMENDED

When: After the pooled estimate is calculated

Decision: Review and approve the interpretation before finalizing the manuscript

Not required, but strongly recommended before you write the Discussion section

Step 6: Export and Integration

Export results in multiple formats for publication and reproducibility:

Universal Meta-Analysis Codebook

4-layer codebook with AI provenance tracking

  • Layer 1: Identifiers (studyID, author, year, DOI)
  • Layer 2: Statistics (n, M, SD, g, SE, 95% CI)
  • Layer 3: AI Provenance (extraction_method, confidence_score, verification_status)
  • Layer 4: Human Verification (verified_by, verification_date, notes)

Use Case: Gold standard for transparent AI-assisted meta-analysis
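
The four layers can be stored as one flat row per effect size. This sketch uses a dataclass and the standard csv module, with column names taken from the layer list above. Some details are assumptions: ci_lower and ci_upper are hypothetical names for the 95% CI bounds, the "unverified" default for verification_status is invented, and Layer 2 uses the group-level n1/M1/SD1 and n2/M2/SD2 columns of the CSV import format. The values in the usage line are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional

@dataclass
class CodebookRow:
    # Layer 1: Identifiers
    studyID: str
    author: str
    year: int
    DOI: Optional[str] = None
    # Layer 2: Statistics (ci_lower/ci_upper are hypothetical names)
    n1: Optional[int] = None
    M1: Optional[float] = None
    SD1: Optional[float] = None
    n2: Optional[int] = None
    M2: Optional[float] = None
    SD2: Optional[float] = None
    g: Optional[float] = None
    SE: Optional[float] = None
    ci_lower: Optional[float] = None
    ci_upper: Optional[float] = None
    # Layer 3: AI provenance
    extraction_method: Optional[str] = None
    confidence_score: Optional[float] = None
    verification_status: str = "unverified"  # assumed default value
    # Layer 4: Human verification
    verified_by: Optional[str] = None
    verification_date: Optional[str] = None
    notes: str = ""

def to_csv(rows):
    """Serialize rows in layer order; None becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CodebookRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()

print(to_csv([CodebookRow(studyID="S01", author="Doe", year=2020, g=0.33, SE=0.26)]))
```
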

R Script (metafor package)

Ready-to-run R code for replication

library(metafor)
codebook <- read.csv("diverga_codebook_v2.2.csv")  # codebook exported by C6
res <- rma(yi = hedges_g, vi = variance,
           method = "REML",
           data = codebook)
forest(res)

Use Case: Reproducible analysis for journal submission

Stata .do file

Stata syntax for meta-analysis

metan hedges_g se_g, random label(namevar=study)
metabias hedges_g se_g, egger

Use Case: For researchers using Stata

CSV for CMA/RevMan

Import into Comprehensive Meta-Analysis or RevMan

Fields: studyID, author, year, n1, M1, SD1, n2, M2, SD2

Use Case: GUI-based meta-analysis tools

PRISMA Diagram Generation

C5 can generate PRISMA 2020 flow diagrams showing:

  • 1. Identification: records identified from databases
  • 2. Screening: records excluded (with reasons)
  • 3. Eligibility: full-text reports assessed
  • 4. Included: final k studies in the meta-analysis

If you started with the I-category pipeline (I0-I3), the PRISMA diagram is auto-populated.
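
The counts in a flow diagram must reconcile from one stage to the next. This simplified sketch assumes a single path: records identified, minus duplicates, minus screening exclusions, minus full-text exclusions by reason. It ignores PRISMA 2020 details such as reports not retrieved or multiple reports per study. prisma_counts is a hypothetical name, and the usage numbers are toy values.

```python
def prisma_counts(identified, duplicates, excluded_screening, excluded_full_text):
    """Simplified PRISMA flow; excluded_full_text maps reason -> count."""
    screened = identified - duplicates
    assessed = screened - excluded_screening
    included = assessed - sum(excluded_full_text.values())
    if min(screened, assessed, included) < 0:
        raise ValueError("exclusions exceed the records available at that stage")
    return {
        "identified": identified,
        "screened": screened,
        "full_text_assessed": assessed,
        "excluded_full_text": dict(excluded_full_text),
        "included": included,
    }

print(prisma_counts(500, 100, 300, {"wrong population": 40, "no control group": 30}))
```
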

Code Examples

R (metafor)
library(metafor)

# Load codebook from C6 export
data <- read.csv("diverga_codebook_v2.2.csv")

# Random-effects meta-analysis
res <- rma(yi = hedges_g,
           vi = variance,
           method = "REML",
           data = data)

# Forest plot
forest(res,
       header = "Study",
       xlab = "Hedges' g")

# Funnel plot (publication bias)
funnel(res)
regtest(res)  # Egger test

Python (metaanalysis package)
import pandas as pd
from metaanalysis import MetaAnalysis

# Load codebook
df = pd.read_csv("diverga_codebook_v2.2.csv")

# Initialize meta-analysis
ma = MetaAnalysis(df,
                  effect_size="hedges_g",
                  variance="variance")

# Run random-effects model
results = ma.fit(method="REML")

# Generate forest plot
ma.plot_forest()

# Check heterogeneity
print(f"I²: {results.I2:.1f}%")
print(f"Q: {results.Q:.2f}, p={results.Q_pval:.3f}")

Common Pitfalls

Unit of Analysis Error

Description:

Treating multiple outcomes from the same study as independent

How C7 Catches This:

Gate 4 flags any studyID that appears more than once

Solution: Average effect sizes within study OR use robust variance estimation
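
The repeated-studyID check amounts to counting IDs. This sketch uses collections.Counter and only catches dependence that shows up as a repeated studyID, not separate papers that report the same sample. gate4_repeated_ids is a hypothetical name.

```python
from collections import Counter

def gate4_repeated_ids(study_ids):
    """Return {studyID: count} for IDs that contribute more than one effect size."""
    counts = Counter(study_ids)
    return {sid: n for sid, n in counts.items() if n > 1}

print(gate4_repeated_ids(["S1", "S1", "S2", "S3", "S3", "S3"]))  # toy IDs
```
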

Apples and Oranges

Description:

Pooling incompatible outcome constructs (e.g., test scores + self-efficacy)

How C7 Catches This:

Gate 2 flags heterogeneous outcome_type values

Solution: Conduct separate meta-analyses by outcome category
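
Running separate meta-analyses by outcome category starts with grouping rows on outcome_type. This sketch assumes each codebook row is a dict with an outcome_type key, the field Gate 2 checks. That field is not in the CSV column list earlier on this page, so its name is an assumption. More than one group is the condition Gate 2 flags. split_by_outcome is hypothetical, and the rows are toy data.

```python
from collections import defaultdict

def split_by_outcome(rows, key="outcome_type"):
    """Group codebook rows (dicts) by outcome category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

rows = [  # toy rows, not real studies
    {"studyID": "S1", "outcome_type": "test score", "hedges_g": 0.4},
    {"studyID": "S2", "outcome_type": "self-efficacy", "hedges_g": 0.2},
    {"studyID": "S3", "outcome_type": "test score", "hedges_g": 0.3},
]
groups = split_by_outcome(rows)
if len(groups) > 1:
    print(f"Gate 2: {len(groups)} outcome categories, analyze separately: {sorted(groups)}")
```
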

Garbage In, Garbage Out

Description:

Including low-quality studies with biased effect sizes

How C7 Catches This:

Does NOT catch this automatically (requires domain knowledge)

Solution: You must quality-appraise studies before inclusion (use the B2 agent)

File Drawer Problem

Description:

Missing unpublished studies with null results

How C7 Catches This:

Gate 3 flags funnel plot asymmetry

Solution: Search grey literature, contact authors, report bias-adjusted estimates

Ready to Start Your Meta-Analysis?

The C5-C6-C7 system handles the complexity while you maintain full control.