IRT Assessment System

3-Parameter Logistic Model · SAT / UTBK-SNBT Compatible · Score 0–1000

How it works: Upload your binary response matrix (examinees × items), configure the scoring options, and click Run Analysis. The system estimates the 3PL item parameters with a simplified EM-based algorithm, estimates each examinee's ability θ (EAP / MLE approximation), scales scores to 0–1000, and produces charts and downloadable reports, all in the browser.

3PL IRT Model

P(θ) = c + (1 − c) / (1 + e^(−a(θ − b)))
a = discrimination · b = difficulty · c = guessing
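
The formula above can be evaluated directly. A minimal Python sketch (the function name `p_correct` is illustrative, not taken from the app, which runs in browser JS):

```python
import math

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic term equals 1/2, so P = c + (1 - c) / 2.
print(round(p_correct(0.0, a=1.2, b=0.0, c=0.2), 3))  # 0.6
```

As θ falls far below b the probability approaches the guessing floor c; as θ rises far above b it approaches 1.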

Question Weighting

w = a · (1 + |b|/3) · (1 − 2c)
Weights are normalized to sum to 1000.
Higher discrimination and more extreme difficulty (larger |b|) = higher weight; higher guessing = lower weight.
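
As a rough illustration of the weighting rule, here is a Python sketch that applies the formula and rescales the result to sum to 1000. The function name and the example parameters are made up. The source does not say how items with c ≥ 0.5 (where 1 − 2c is zero or negative) are handled, so the sketch simply refuses a non-positive total:

```python
def item_weights(params, total=1000.0):
    """params: list of (a, b, c) tuples. Returns weights that sum to `total`."""
    raw = [a * (1.0 + abs(b) / 3.0) * (1.0 - 2.0 * c) for a, b, c in params]
    s = sum(raw)
    if s <= 0:
        # Assumption: a non-positive total is treated as an error in this sketch.
        raise ValueError("raw weights must sum to a positive value")
    return [total * w / s for w in raw]

# Illustrative parameters: (a, b, c) per item.
example = [(1.0, 0.0, 0.2), (1.5, 1.5, 0.2), (0.8, -1.5, 0.25)]
print([round(w, 1) for w in item_weights(example)])  # [235.3, 529.4, 235.3]
```

Note that the third item is easy (b = −1.5) yet still receives the |b| bonus, because the formula uses the absolute value of b.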

Score Scaling

Perfect (all correct) → 1000
Near-perfect (1 wrong) → separated band below 1000
Others → linear map of θ to [0, minNP − 1], where minNP is the configured minimum near-perfect score
Zero raw score → lowest observed IRT score
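
The four rules above could be sketched in Python as follows. Several details are assumptions rather than the app's documented behavior: near-perfect examinees are spaced evenly by θ rank between the minimum near-perfect score and 1000 − gap, the linear map uses the observed θ range of the remaining examinees, and a zero raw score is pinned to the lowest score that map produces. All names are illustrative.

```python
def scale_scores(theta, n_wrong, raw, min_np=900.0, gap=30.0):
    """Return scaled scores in [0, 1000].

    theta   : ability estimate per examinee
    n_wrong : number of wrong answers per examinee
    raw     : number of correct answers per examinee
    """
    n = len(theta)
    scores = [0.0] * n
    near = [i for i in range(n) if n_wrong[i] == 1]
    rest = [i for i in range(n) if n_wrong[i] > 1]

    # Perfect: all correct maps to exactly 1000.
    for i in range(n):
        if n_wrong[i] == 0:
            scores[i] = 1000.0

    # Near-perfect: spread over [min_np, 1000 - gap], ordered by theta.
    near.sort(key=lambda i: theta[i])
    top = 1000.0 - gap
    for rank, i in enumerate(near):
        frac = rank / (len(near) - 1) if len(near) > 1 else 1.0
        scores[i] = min_np + frac * (top - min_np)

    # Others: linear map of theta onto [0, min_np - 1].
    if rest:
        lo_t = min(theta[i] for i in rest)
        span = max(theta[i] for i in rest) - lo_t
        for i in rest:
            frac = (theta[i] - lo_t) / span if span > 0 else 0.0
            scores[i] = frac * (min_np - 1.0)
        floor = min(scores[i] for i in rest)
        for i in rest:
            if raw[i] == 0:
                scores[i] = floor  # zero raw score gets the lowest mapped score
    return scores

print(scale_scores(theta=[-1.0, 0.0, 1.0, 1.8, 2.0, 2.5],
                   n_wrong=[10, 5, 3, 1, 1, 0],
                   raw=[0, 5, 7, 9, 9, 10]))
# [0.0, 449.5, 899.0, 900.0, 970.0, 1000.0]
```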

Features

CSV upload or synthetic demo data
ICC, TIF, ability distribution charts
Per-question score matrix
Cronbach's alpha reliability
CSV download of all tables

Data Input

Upload Response File

Drop CSV here or click to browse

Rows = examinees · Columns = items (binary 0/1)

ID Column Name (auto-detect if blank)

Expected format: the first row is a header of item names; the first column may optionally hold examinee IDs. Cell values: 0 / 1 / blank (omit) / "e" (omit).
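
For reference, a file in this format could be read with a short Python sketch like the one below. The auto-detect rule (treat the first column as IDs when it contains anything other than 0, 1, blank, or "e") and the case-insensitive match on "e" are assumptions. The app's actual detection logic is not described here.

```python
import csv
import io

OMIT_TOKENS = {"", "e"}

def parse_responses(text, id_column=None):
    """Parse CSV text into (ids, item_names, matrix); omitted cells become None."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    header, body = rows[0], rows[1:]
    valid = {"0", "1"} | OMIT_TOKENS
    if id_column:
        id_idx = header.index(id_column)
    else:
        # Hypothetical auto-detect: any non-response value in column 0 means IDs.
        first_col = {r[0].strip().lower() for r in body}
        id_idx = 0 if first_col - valid else None
    item_cols = [j for j in range(len(header)) if j != id_idx]
    items = [header[j].strip() for j in item_cols]
    ids, matrix = [], []
    for n, row in enumerate(body, start=1):
        ids.append(row[id_idx].strip() if id_idx is not None else str(n))
        cells = []
        for j in item_cols:
            v = row[j].strip().lower() if j < len(row) else ""
            if v in OMIT_TOKENS:
                cells.append(None)  # blank or "e" counts as omitted
            elif v in ("0", "1"):
                cells.append(int(v))
            else:
                raise ValueError(f"row {n}, column {header[j]!r}: unexpected value {v!r}")
        matrix.append(cells)
    return ids, items, matrix

sample = "id,Q1,Q2,Q3\ns01,1,0,e\ns02,,1,1\n"
print(parse_responses(sample))
# (['s01', 's02'], ['Q1', 'Q2', 'Q3'], [[1, 0, None], [None, 1, 1]])
```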

Generate Synthetic Demo Data

Number of Examinees

200 students

Number of Items

30 items

Random Seed

Simulates UTBK/SAT-style responses using true 3PL parameters.
About 5% of responses are randomly omitted (NA). Examinee abilities θ are drawn from N(0, 1).
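
A simulation of this kind could look like the following Python sketch. The distributions used for the true item parameters (a, b, c) and the default seed are assumptions chosen for illustration. Only the N(0, 1) abilities, the 3PL response model, and the roughly 5% omission rate come from the description above.

```python
import math
import random

def simulate(n_examinees=200, n_items=30, seed=42, omit_rate=0.05):
    """Return (items, data): true (a, b, c) per item and a 0/1/None response matrix."""
    rng = random.Random(seed)
    # Hypothetical generating distributions for the true item parameters.
    items = [(rng.uniform(0.6, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.1, 0.3))
             for _ in range(n_items)]
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # examinee ability from N(0, 1)
        row = []
        for a, b, c in items:
            if rng.random() < omit_rate:
                row.append(None)  # omitted response (NA)
                continue
            p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return items, data

items, data = simulate()
print(len(data), len(data[0]))  # 200 30
```

Reusing the same seed reproduces the same matrix, which is convenient when comparing analysis settings.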

Analysis Configuration

Scoring Parameters

Decimal Precision (0–4)

2 decimal places

Theta Lower Bound

Theta Upper Bound

Enable Near-Perfect Score Separation

Minimum Score for Near-Perfect (1 wrong)

900

Gap from Perfect Score (pts)

30 pts

Calculate Per-Question Scores & Weights

Parameter Descriptions

Decimal Precision: Controls decimal places in all output scores. 2 = SAT-style (e.g. 756.34).
Theta Bounds: Clamps the latent ability θ to [lower, upper]. The default ±3 covers about 99.7% of a standard normal distribution. Widen to ±4 for a more extreme spread.
Near-Perfect Separation: Examinees with exactly 1 wrong answer receive distinct scores in [minNP, 1000 − gap], ranked by θ. Prevents near-perfect bunching.
Per-Question Weights: IRT weight formula: w = a·(1+|b|/3)·(1−2c), normalized to sum to 1000. A correct answer earns the full item weight; a wrong answer earns partial credit proportional to the guessing parameter c.
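
To illustrate how theta bounds interact with ability estimation, here is a Python sketch of a grid-based EAP estimate. It assumes a standard normal prior and a quadrature grid spanning exactly [lower, upper], so the estimate cannot leave the bounds. The grid size and function name are illustrative, and the app's own estimator may differ:

```python
import math

def eap_theta(responses, items, lower=-3.0, upper=3.0, n_points=61):
    """EAP ability estimate on a grid over [lower, upper].

    responses: list of 0 / 1 / None (omitted); items: list of (a, b, c).
    """
    step = (upper - lower) / (n_points - 1)
    grid = [lower + k * step for k in range(n_points)]
    log_post = []
    for t in grid:
        lp = -0.5 * t * t  # log of the unnormalized N(0, 1) prior (assumption)
        for x, (a, b, c) in zip(responses, items):
            if x is None:
                continue  # omitted responses are ignored in the likelihood
            p = c + (1.0 - c) / (1.0 + math.exp(-a * (t - b)))
            lp += math.log(p if x == 1 else 1.0 - p)
        log_post.append(lp)
    peak = max(log_post)  # subtract the peak to avoid underflow on long tests
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

demo_items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (1.5, 1.0, 0.2)]
print(eap_theta([1, 1, 1], demo_items) > eap_theta([0, 0, 0], demo_items))  # True
```

Widening the bounds to ±4 extends the grid, which lets examinees with extreme response patterns move further from the center.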

Score Results

Mean Score

—

Perfect (1000)

—

Near-Perfect

—

Cronbach α

—

Student Score Table

Rank ↕	Examinee	Scaled Score ↕	Raw	%	θ Ability ↕	Percentile	Status

Visualizations

Score Distribution

Raw vs Scaled

θ Distribution

Test Info (TIF)

ICC Browser

Score Compare

Select Item:

IRT Scaled Score vs Question-Based Weighted Score (diagonal = perfect agreement)

Item Analysis (3PL Parameters)

Item Parameter Estimates (a, b, c)

Item	a (Discrimination) ↕	b (Difficulty) ↕	c (Guessing) ↕	p-value	N Answered	ICC

Difficulty (b) Distribution

Discrimination (a) Distribution

Question Weight Analysis

IRT-based weight: w = a·(1+|b|/3)·(1−2c), normalized so all weights sum to 1000. Items with higher discrimination or more extreme difficulty (larger |b|) are worth more; a higher guessing parameter lowers the weight.

Question Statistics & Weights

Item	Weight ↕	Weight %	p-value	a (disc.)	b (diff.)	c (guess.)	Mean Score

Weights by Item (sorted)

Weight vs Difficulty

Export Data

CSV Downloads

Methodology Reference

Scoring Method: 3PL IRT with Constrained Scaling
IRT Model: 3-Parameter Logistic (a, b, c)
Score Range: 0 – 1000
Perfect Score: Exactly 1000 (all correct)
Near-Perfect Handling: Separated in [minNP, 1000−gap] by θ rank
Theta Estimation: EAP / MLE approximation (browser JS)
Missing Data: Omitted (NA) — ignored in likelihood
Reliability: Cronbach's α (covariance formula)
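
For the reliability entry, Cronbach's α can be written in terms of the item covariance matrix C as α = k/(k − 1) · (1 − trace(C) / sum(C)), where k is the number of items. The Python sketch below assumes a complete numeric matrix. How omitted responses are treated before this step is not specified in the list above, so the caller would have to decide (for example, by scoring omits as 0).

```python
def cronbach_alpha(matrix):
    """matrix: rows = examinees, columns = items, no missing values."""
    n, k = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(k)]

    def cov(i, j):
        # Sample covariance between items i and j.
        return sum((row[i] - means[i]) * (row[j] - means[j]) for row in matrix) / (n - 1)

    trace = sum(cov(j, j) for j in range(k))                     # sum of item variances
    total = sum(cov(i, j) for i in range(k) for j in range(k))   # variance of total score
    return (k / (k - 1)) * (1.0 - trace / total)

# Three perfectly correlated items give alpha = 1 (up to rounding).
print(round(cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]), 6))  # 1.0
```

The sketch divides by the total-score variance, so it fails if every examinee has the same total score.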

Note: Because this runs in the browser without the R mirt package, item parameters are estimated using a simplified EM-based 3PL algorithm. Results match mirt closely for well-conditioned datasets. For very small samples (fewer than 50 examinees) or items with extreme p-values (proportion correct near 0 or 1), results may differ from full mirt output.