Meta-Analysis Pipeline
C5-C6-C7 System: AI-Assisted Meta-Analysis with Human Oversight
"Machines calculate, researchers decide: A partnership for rigorous meta-analysis"
Step 1: Understanding the C5-C7 System
The meta-analysis pipeline consists of three specialized agents working together:
Meta-Analysis Master
Orchestrator & Decision Authority
Responsibilities:
- Overall research question interpretation
- Effect size hierarchy selection
- Meta-analytic model choice (fixed/random/MASEM)
- Final authority on all meta-analysis decisions
Data Integrity Guard
Extraction & Calculation Specialist
Responsibilities:
- Extract effect sizes from papers (PDFs, tables, text)
- Calculate Hedges' g with small-sample bias correction
- Recover missing SDs using multiple methods
- Validate data completeness
Error Prevention Engine
Validation & Quality Assurance
Responsibilities:
- Run the 4-gate validation system
- Detect statistical anomalies
- Warn about common meta-analysis pitfalls
- Perform pre-publication quality checks
┌───────────────────────────────────────────┐
│ C5: Meta-Analysis Master │
│ (Orchestrator & Decision Authority) │
└───────────┬───────────────────────────────┘
            │
            ├────── delegates to ──────┐
            │                          │
            ▼                          ▼
 ┌─────────────────────┐          ┌──────────────────┐
 │ C6: Data Integrity  │          │ C7: Error        │
 │ Guard               │          │ Prevention       │
 │ (Extract & Calc)    │          │ Engine           │
 └──────────┬──────────┘          └────┬─────────────┘
            │                          │
            └──── validates ◄──────────┘
                  (4-Gate System)

Step 2: Starting a Meta-Analysis
C5 activates automatically when you mention meta-analysis intent:
User Input:
"I want to conduct a meta-analysis on AI tutoring effectiveness"
C5 Clarifying Questions:
- Q1: What is your research question? (e.g., "Does AI tutoring improve learning outcomes?")
- Q2: What type of effect size do you expect? (Cohen's d, correlation r, odds ratio)
- Q3: Do you have studies identified already, or do you need help with a systematic search?
- Q4: Are you analyzing direct effects only, or mediation/moderation relationships?
🔴 META_ANALYSIS_PROTOCOL
REQUIRED
When: Before data extraction begins
Decision: Approve research question, ES hierarchy, and meta-analytic approach
⚠ C5 will NOT proceed with extraction until you approve the protocol
Step 3: Data Extraction with C6
Once protocol is approved, C6 extracts effect sizes from your studies:
PDF Upload
Upload PDFs of studies
C6 reads tables, text, and figures using OCR + LLM
Manual Entry
Provide study IDs with statistics
Study A: M1=5.2, SD1=1.1, n1=30, M2=4.8, SD2=1.3, n2=28
CSV Import
Upload codebook with extracted data
studyID, author, year, intervention, outcome, n1, M1, SD1, n2, M2, SD2
C6 automatically converts all effect sizes to Hedges' g (bias-corrected Cohen's d):
Why Hedges' g:
- Unbiased estimate for small samples
- Comparable across studies with different sample sizes
- Standard metric in education/psychology meta-analyses
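As an illustration of the conversion C6 performs, here is a minimal sketch of the Hedges' g calculation: the small-sample correction J = 1 − 3/(4·df − 1) applied to Cohen's d. The function name and signature are illustrative, not C6's actual API.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled               # Cohen's d
    j = 1 - 3 / (4 * df - 1)                # small-sample correction J
    g = j * d
    # Common approximation for the sampling variance of g
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2 * df))
    return g, var_g

# Study A from the manual-entry example above:
g, v = hedges_g(5.2, 1.1, 30, 4.8, 1.3, 28)
```

Because df = 56 here, the correction shrinks d only slightly; for very small trials (df < 20) the difference between d and g becomes material.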
SD Recovery Methods (C6 Automatic):
1. SE-to-SD conversion: SD = SE × √n
2. t-statistic back-calculation: d = t × √(1/n1 + 1/n2)
3. F-statistic to d: d = √(F × (n1 + n2) / (n1 × n2))
4. p-value approximation (last resort, flagged by C7)
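The first three recovery methods translate directly into small helpers; a sketch with illustrative names (note that the sign of a d recovered from F must be taken from the paper, since F discards direction):

```python
import math

def sd_from_se(se, n):
    """Method 1: recover SD from a reported standard error."""
    return se * math.sqrt(n)

def d_from_t(t, n1, n2):
    """Method 2: back-calculate Cohen's d from an independent-samples t."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_f(f, n1, n2):
    """Method 3: convert a one-df F statistic to |d| (sign from the paper)."""
    return math.sqrt(f * (n1 + n2) / (n1 * n2))
```

Since F = t² for a two-group comparison, methods 2 and 3 agree in magnitude, which makes a handy cross-check during extraction.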
Step 4: 4-Gate Validation with C7
C7 runs a rigorous 4-gate validation system on extracted data:
Extraction Validation
Checks:
- Are all required fields present? (n, M, SD for each group)
- Are values within plausible ranges? (SD > 0, n ≥ 2, |g| < 5)
- Do reported statistics match calculated effect sizes?
Common Errors:
- Missing SD → C7 flags for recovery method
- Negative SD → Data entry error
- Extreme g (|g| > 3) → Verify with original paper
Classification Validation
Checks:
- Is the study correctly categorized by moderator variables?
- Does the intervention match the meta-analysis scope?
- Are outcome measures consistent across studies?
Common Errors:
- Intervention mismatch → Exclude or reclassify
- Outcome construct drift → Flag for sensitivity analysis
Statistical Validation
Checks:
- Is heterogeneity (I²) within an acceptable range?
- Are there statistical outliers (Studentized residuals beyond ±3)?
- Is publication bias evident (funnel plot asymmetry)?
Common Errors:
- High heterogeneity (I² > 75%) → Suggest random-effects or moderator analysis
- Outliers detected → Flag specific studies
- Publication bias → Recommend trim-and-fill or selection models
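The Q and I² quantities behind Gate 3's heterogeneity check can be computed as follows; this is a sketch using fixed-effect inverse-variance weights, matching Higgins' I² = max(0, (Q − df)/Q).

```python
def heterogeneity(effects, variances):
    """Cochran's Q and Higgins' I^2 under inverse-variance (fixed-effect) weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

Widely spread effects with tight variances push I² toward 100%, which is what triggers the "do not pool" warning in the table below.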
Independence Validation
Checks:
- Are multiple effect sizes from the same sample correctly handled?
- Is nesting structure (students within classrooms) accounted for?
- Are dependent effect sizes modeled appropriately?
Common Errors:
- Non-independence detected → Suggest averaging or multilevel MA
- Clustering ignored → Warning about Type I error inflation
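The "averaging" remedy for non-independence can be sketched as follows; averaging the variances (rather than dividing by k) is the conservative choice, equivalent to assuming the within-study estimates are perfectly correlated. A multilevel model or robust variance estimation remains the more rigorous option.

```python
from collections import defaultdict

def average_within_study(rows):
    """Collapse multiple effect sizes per studyID into one averaged row (naive fix)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["studyID"]].append(row)
    out = []
    for study_id, grp in grouped.items():
        k = len(grp)
        out.append({
            "studyID": study_id,
            "g": sum(r["g"] for r in grp) / k,
            # Mean of variances = conservative (assumes correlation 1 between estimates)
            "variance": sum(r["variance"] for r in grp) / k,
        })
    return out
```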
| Pattern | Severity | Recommendation |
|---|---|---|
| SD missing for >30% of studies | High | Request authors for raw data OR use imputation methods (flagged in report) |
| All effect sizes positive (no negative effects) | Medium | Check for publication bias using Egger test, trim-and-fill |
| Extreme heterogeneity (I² > 90%) | High | Do NOT pool. Conduct subgroup analysis or narrative synthesis |
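The Egger test mentioned in the table regresses the standardized effect (g/SE) on precision (1/SE); a nonzero intercept signals funnel-plot asymmetry (small-study effects). A bare-bones, dependency-free sketch follows; for real analyses use metafor's regtest() or Stata's metabias.

```python
import math

def egger_test(effects, ses):
    """Egger's regression: return (intercept, t statistic with df = n - 2)."""
    z = [y / s for y, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    resid = [zi - (intercept + slope * xi) for zi, xi in zip(z, x)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return intercept, intercept / se_int
```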
Step 5: Orchestration and Results
C5 coordinates the workflow and synthesizes findings:
Decision Point: Fixed vs. Random Effects
When I² < 25% (low heterogeneity)
C5 Recommendation: Fixed-effect model (assumes a single true effect size)
Your Choice: You can override if you expect population heterogeneity
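The two models differ only in their weights: the random-effects model adds a between-study variance τ² (estimated here with the DerSimonian-Laird formula as a sketch; C5's REML default is more refined) to every study's variance, which pulls weights toward equality.

```python
def pool(effects, variances, model="random"):
    """Inverse-variance pooled estimate; 'random' adds DerSimonian-Laird tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    if model == "fixed":
        return fixed
    # DerSimonian-Laird estimate of between-study variance tau^2
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in variances]
    return sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
```

With heterogeneous effects, the random-effects estimate sits closer to the unweighted mean than the fixed-effect estimate does, because precise studies lose some of their dominance.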
Decision Point: Handling Outliers
When C7 flags 2 studies with extreme g values
C5 Recommendation: Run a sensitivity analysis (with and without outliers)
Your Choice: You decide whether to exclude permanently or report both analyses
Decision Point: Publication Bias Correction
When funnel plot shows asymmetry (Egger p < .05)
C5 Recommendation: Report both the unadjusted and the trim-and-fill-adjusted estimates
Your Choice: You choose which to emphasize in conclusions
🟠 META_ANALYSIS_RESULTS
RECOMMENDED
When: After pooled estimate calculated
Decision: Review and approve interpretation before finalizing manuscript
⚠ Not a blocking checkpoint, but review is strongly recommended before writing the Discussion section
Step 6: Export and Integration
Export results in multiple formats for publication and reproducibility:
Universal Meta-Analysis Codebook
4-layer codebook with AI provenance tracking
- Layer 1: Identifiers (studyID, author, year, DOI)
- Layer 2: Statistics (n, M, SD, g, SE, 95% CI)
- Layer 3: AI Provenance (extraction_method, confidence_score, verification_status)
- Layer 4: Human Verification (verified_by, verification_date, notes)
Use Case: Gold standard for transparent AI-assisted meta-analysis
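One way to picture the 4-layer structure is a nested record per study; the constructor below is illustrative (field names follow the layers listed above, not a published schema).

```python
def make_codebook_row(study_id, author, year, doi,
                      n, m, sd, g, se, ci_low, ci_high,
                      extraction_method, confidence_score,
                      verified_by=None, verification_date=None, notes=""):
    """Build one 4-layer codebook row as a nested dict (illustrative schema)."""
    return {
        "identifiers": {"studyID": study_id, "author": author,
                        "year": year, "DOI": doi},
        "statistics": {"n": n, "M": m, "SD": sd, "g": g, "SE": se,
                       "CI95": (ci_low, ci_high)},
        "ai_provenance": {"extraction_method": extraction_method,
                          "confidence_score": confidence_score,
                          # Status flips once a human signs off (Layer 4)
                          "verification_status": "verified" if verified_by else "pending"},
        "human_verification": {"verified_by": verified_by,
                               "verification_date": verification_date,
                               "notes": notes},
    }
```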
R Script (metafor package)
Ready-to-run R code for replication
library(metafor)
res <- rma(yi = hedges_g, vi = variance,
           method = "REML",
           data = codebook)
forest(res)
Use Case: Reproducible analysis for journal submission
Stata .do file
Stata syntax for meta-analysis
metan hedges_g se_g, random label(namevar=study)
metabias hedges_g se_g, egger
Use Case: For researchers using Stata
CSV for CMA/RevMan
Import into Comprehensive Meta-Analysis or RevMan
Fields: studyID, author, year, n1, M1, SD1, n2, M2, SD2
Use Case: GUI-based meta-analysis tools
PRISMA Diagram Generation
C5 can generate PRISMA 2020 flow diagrams showing:
1. Identification: k studies from databases
2. Screening: Excluded studies (with reasons)
3. Eligibility: Full-text assessed
4. Included: Final k studies in meta-analysis
✓ If you started with the I-category pipeline (I0-I3), PRISMA is auto-populated
Code Examples
library(metafor)
# Load codebook from C6 export
data <- read.csv("diverga_codebook_v2.2.csv")
# Random-effects meta-analysis
res <- rma(yi = hedges_g,
           vi = variance,
           method = "REML",
           data = data)
# Forest plot
forest(res,
       header = "Study",
       xlab = "Hedges' g")
# Funnel plot (publication bias)
funnel(res)
regtest(res)  # Egger's regression test

import pandas as pd
# Note: 'metaanalysis' is an illustrative package name; substitute your MA library of choice
from metaanalysis import MetaAnalysis
# Load codebook
df = pd.read_csv("diverga_codebook_v2.2.csv")
# Initialize meta-analysis
ma = MetaAnalysis(df,
                  effect_size="hedges_g",
                  variance="variance")
# Run random-effects model
results = ma.fit(method="REML")
# Generate forest plot
ma.plot_forest()
# Check heterogeneity
print(f"I²: {results.I2:.1f}%")
print(f"Q: {results.Q:.2f}, p={results.Q_pval:.3f}")
Common Pitfalls
⚠ Unit of Analysis Error
Description:
Treating multiple outcomes from the same study as independent
How C7 Catches This:
Gate 4 flags when same studyID appears multiple times
Solution: Average effect sizes within study OR use robust variance estimation
⚠ Apples and Oranges
Description:
Pooling incompatible outcome constructs (e.g., test scores + self-efficacy)
How C7 Catches This:
Gate 2 flags heterogeneous outcome_type values
Solution: Conduct separate meta-analyses by outcome category
⚠ Garbage In, Garbage Out
Description:
Including low-quality studies with biased effect sizes
How C7 Catches This:
Does NOT catch this automatically (requires domain knowledge)
Solution: You must quality-appraise studies before inclusion (use B2 agent)
⚠ File Drawer Problem
Description:
Missing unpublished studies with null results
How C7 Catches This:
Gate 3 flags funnel plot asymmetry
Solution: Search grey literature, contact authors, report bias-adjusted estimates
Ready to Start Your Meta-Analysis?
The C5-C6-C7 system handles the complexity while you maintain full control.