IRT Assessment System
3-Parameter Logistic Model · SAT / UTBK-SNBT Compatible · Score 0–1000
How it works: Upload your binary response matrix (examinees × items), configure the scoring options, and click Run Analysis. The system estimates the 3PL item parameters (simplified EM algorithm) and examinee abilities (EAP/MLE), scales scores to 0–1000, and produces charts and downloadable reports, entirely in the browser.
3PL IRT Model
$P(\theta) = c + \dfrac{1 - c}{1 + e^{-a(\theta - b)}}$
a = discrimination
b = difficulty
c = guessing
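A minimal JavaScript sketch of the ICC formula above; the function name `prob3PL` is hypothetical. Like the formula, it uses no 1.7 scaling constant, which some 3PL parameterizations include.

```javascript
// 3PL item characteristic curve: probability of a correct response at ability theta.
// a = discrimination, b = difficulty, c = guessing (lower asymptote).
function prob3PL(theta, a, b, c) {
  return c + (1 - c) / (1 + Math.exp(-a * (theta - b)));
}

// At theta = b the logistic term equals 1/2, so P = c + (1 - c) / 2.
console.log(prob3PL(0.5, 1.2, 0.5, 0.25)); // 0.625
```

As θ → −∞ the probability approaches c (pure guessing); as θ → +∞ it approaches 1.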
Question Weighting
w = a · (1 + |b|/3) · (1 − 2c)
Weights are normalized to sum to 1000.
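Higher discrimination (a) and more extreme difficulty (large |b|, whether very hard or very easy) raise the weight; a higher guessing parameter (c) lowers it.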
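A sketch of the weighting step, assuming items arrive as `{ a, b, c }` objects; the name `itemWeights` is hypothetical. Because the formula uses |b|, a very easy item (b < 0) gains weight just like a very hard one. Flooring the raw weight at zero when c ≥ 0.5 is an assumption; the formula itself would give a zero or negative weight there.

```javascript
// Raw IRT weight w = a * (1 + |b| / 3) * (1 - 2c), rescaled so all weights sum to `total`.
// Assumption: raw weights are floored at 0 (only matters when c >= 0.5).
function itemWeights(items, total = 1000) {
  const raw = items.map(({ a, b, c }) => Math.max(0, a * (1 + Math.abs(b) / 3) * (1 - 2 * c)));
  const sum = raw.reduce((s, w) => s + w, 0);
  if (sum === 0) throw new Error("all raw weights are zero");
  return raw.map((w) => (w / sum) * total);
}

const weights = itemWeights([
  { a: 1.0, b: 0.0, c: 0.0 },  // raw weight 1 * 1 * 1 = 1
  { a: 1.0, b: 3.0, c: 0.25 }, // raw weight 1 * 2 * 0.5 = 1
  { a: 2.0, b: -1.5, c: 0.0 }, // raw weight 2 * 1.5 * 1 = 3 (an easy item)
]);
console.log(weights); // [200, 200, 600]
```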
Score Scaling
- Perfect (all correct) → 1000
- Near-perfect (exactly 1 wrong) → separate band [minNP, 1000 − gap] below 1000
- Others → θ mapped linearly onto [0, minNP − 1]
- Zero raw score → lowest observed IRT score
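One way these four scaling rules could fit together, as a hypothetical sketch (`scaleScores` and its option names are illustrative). It assumes `raw` counts correct answers, that θ is clamped to the theta bounds and that interval is mapped linearly onto [0, minNP − 1], that the near-perfect band is spread evenly by θ rank, and that "lowest observed IRT score" means the lowest scaled score among the remaining examinees. The defaults minNP = 900 and gap = 30 are read off the configuration panel and are also assumptions.

```javascript
// Hypothetical sketch of the 0-1000 scaling rules.
// Assumptions: raw = number correct, so near-perfect means raw === nItems - 1;
// theta is clamped to [-bound, bound] and mapped linearly onto [0, minNP - 1].
function scaleScores(examinees, nItems, { minNP = 900, gap = 30, bound = 3, decimals = 2 } = {}) {
  const scores = new Array(examinees.length).fill(null);

  // Perfect scores get exactly 1000; "others" get a linear map of clamped theta.
  examinees.forEach(({ raw, theta }, i) => {
    if (raw === nItems) {
      scores[i] = 1000;
    } else if (raw > 0 && raw < nItems - 1) {
      const t = Math.min(bound, Math.max(-bound, theta));
      scores[i] = ((t + bound) / (2 * bound)) * (minNP - 1);
    }
  });

  // Near-perfect (exactly one wrong): distinct scores in [minNP, 1000 - gap], ordered by theta.
  const near = examinees
    .map((e, i) => ({ theta: e.theta, raw: e.raw, i }))
    .filter((e) => e.raw === nItems - 1)
    .sort((p, q) => p.theta - q.theta);
  const lo = minNP;
  const hi = 1000 - gap;
  near.forEach((e, r) => {
    // A lone near-perfect examinee is placed at the top of the band (an assumption).
    scores[e.i] = near.length === 1 ? hi : lo + ((hi - lo) * r) / (near.length - 1);
  });

  // Zero raw score: the lowest score observed among everyone else.
  const observed = scores.filter((s) => s !== null);
  const lowest = observed.length > 0 ? Math.min(...observed) : 0;
  examinees.forEach(({ raw }, i) => {
    if (raw === 0) scores[i] = lowest;
  });

  return scores.map((s) => Number(s.toFixed(decimals)));
}

const result = scaleScores(
  [
    { raw: 10, theta: 2.9 },  // perfect
    { raw: 9, theta: 2.1 },   // near-perfect, lower theta
    { raw: 9, theta: 2.4 },   // near-perfect, higher theta
    { raw: 5, theta: 0.0 },   // middle of the theta range
    { raw: 2, theta: -1.5 },  // low scorer
    { raw: 0, theta: -3.5 },  // zero raw score
  ],
  10,
);
console.log(result); // [1000, 900, 970, 449.5, 224.75, 224.75]
```

Because the zero-raw rule reuses the lowest observed score, zero-raw examinees tie with the weakest non-zero examinee in this sketch.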
Features
- CSV upload or synthetic demo data
- Item characteristic curve (ICC), test information function (TIF), and ability distribution charts
- Per-question score matrix
- Cronbach's alpha reliability
- CSV download of all tables
Data Input
Upload Response File
Rows = examinees · Columns = items (binary 0/1)
Expected format: the first row is a header of item names; the first column may optionally hold examinee IDs.
Values: 0 = wrong, 1 = correct; a blank cell or "e" marks an omitted response.
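A minimal parsing sketch for this format; the function names and the fallback IDs are hypothetical. It assumes plain comma separation without quoted fields and detects the optional ID column heuristically: the first column counts as IDs if any of its values is not a valid response code.

```javascript
// Valid response codes: 0, 1, blank (omitted), "e" (omitted).
const CODES = new Set(["0", "1", "", "e"]);

function parseCell(cell) {
  const v = cell.trim();
  if (v === "1") return 1;
  if (v === "0") return 0;
  if (v === "" || v === "e") return null; // omitted (NA)
  throw new Error(`unexpected response value: "${cell}"`);
}

function parseResponses(csvText) {
  const rows = csvText.trim().split(/\r?\n/).map((line) => line.split(","));
  const header = rows[0].map((h) => h.trim());
  const body = rows.slice(1);
  // Heuristic (assumption): first column is an ID column if any value is not a response code.
  const hasId = body.some((r) => !CODES.has(r[0].trim()));
  const start = hasId ? 1 : 0;
  const items = header.slice(start);
  return {
    items,
    // Fallback IDs ("S1", "S2", ...) are illustrative, not the tool's naming.
    ids: body.map((r, i) => (hasId ? r[0].trim() : `S${i + 1}`)),
    // Missing trailing cells are treated as omitted.
    matrix: body.map((r) => items.map((_, j) => parseCell(r[start + j] ?? ""))),
  };
}

const parsed = parseResponses("id,Q1,Q2,Q3\nA01,1,0,e\nA02,1,,1");
console.log(parsed.items);  // ["Q1", "Q2", "Q3"]
console.log(parsed.matrix); // [[1, 0, null], [1, null, 1]]
```

The heuristic misfires if every ID happens to be 0, 1, blank, or "e"; an explicit "has ID column" setting would be more robust.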
Generate Synthetic Demo Data
200 students
30 items
Simulates UTBK/SAT-style responses from known (true) 3PL item parameters.
About 5% of responses are randomly omitted (NA); examinee abilities are drawn from N(0, 1).
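A hypothetical simulation sketch. Only θ ~ N(0, 1), the 3PL response model, and the ~5% omission rate come from the description above; the item-parameter distributions (a ~ U(0.5, 2), b ~ N(0, 1), c ~ U(0.1, 0.25)), the mulberry32 PRNG, and the seed are assumptions chosen so runs are reproducible.

```javascript
// Small seeded PRNG (mulberry32) returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed | 0;
  return function () {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function normal(rand) {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function simulate3PL(nStudents = 200, nItems = 30, { omitRate = 0.05, seed = 42 } = {}) {
  const rand = mulberry32(seed);
  // Assumed "true" item parameters.
  const items = Array.from({ length: nItems }, () => ({
    a: 0.5 + 1.5 * rand(),
    b: normal(rand),
    c: 0.1 + 0.15 * rand(),
  }));
  const thetas = Array.from({ length: nStudents }, () => normal(rand));
  const responses = thetas.map((theta) =>
    items.map(({ a, b, c }) => {
      if (rand() < omitRate) return null; // omitted (NA)
      const p = c + (1 - c) / (1 + Math.exp(-a * (theta - b)));
      return rand() < p ? 1 : 0;
    }),
  );
  return { items, thetas, responses };
}

const demo = simulate3PL();
console.log(demo.responses.length, demo.responses[0].length); // 200 30
```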
Analysis Configuration
Scoring Parameters
- Decimal precision: 2 decimal places
- Enable Near-Perfect Score Separation
- Minimum near-perfect score (minNP): 900
- Gap below 1000 (gap): 30 pts
- Calculate Per-Question Scores & Weights
Parameter Descriptions
- Decimal Precision
- Controls the number of decimal places in all output scores; 2 gives scores such as 756.34.
- Theta Bounds
- Clamps the latent ability θ to [−bound, +bound]. The default ±3 covers 99.7% of a standard normal distribution; widen to ±4 for a more extreme spread.
- Near-Perfect Separation
- Examinees with exactly 1 wrong answer receive distinct scores in [minNP, 1000 − gap], ordered by θ, so near-perfect examinees do not bunch just below 1000.
- Per-Question Weights
- IRT weight formula: w = a·(1+|b|/3)·(1−2c), normalized so all weights sum to 1000. A correct answer earns the full weight; a wrong answer earns partial credit proportional to the item's guessing parameter c.
Score Results
Mean Score
Perfect (1000)
Near-Perfect
Cronbach α
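Cronbach's α can be written in terms of the item covariance matrix C of k items as α = k/(k − 1) · (1 − trace(C) / ΣC), where ΣC, the sum of all entries, equals the variance of the total score. A sketch of that covariance formula, assuming complete 0/1 data; how the tool handles omitted responses in α is not stated, and `cronbachAlpha` is a hypothetical name.

```javascript
// Cronbach's alpha from the sample covariance matrix of the item columns.
// Assumption: complete data (no null / omitted responses).
function cronbachAlpha(matrix) {
  const n = matrix.length;
  const k = matrix[0].length;
  const means = Array.from({ length: k }, (_, j) => matrix.reduce((s, row) => s + row[j], 0) / n);
  let trace = 0; // sum of item variances
  let total = 0; // sum of all covariance entries = variance of the total score
  for (let i = 0; i < k; i++) {
    for (let j = 0; j < k; j++) {
      let cov = 0;
      for (const row of matrix) cov += (row[i] - means[i]) * (row[j] - means[j]);
      cov /= n - 1; // sample covariance
      total += cov;
      if (i === j) trace += cov;
    }
  }
  return (k / (k - 1)) * (1 - trace / total);
}

const alpha = cronbachAlpha([
  [1, 1, 1],
  [1, 1, 0],
  [0, 1, 0],
  [0, 0, 0],
]);
console.log(alpha.toFixed(2)); // 0.75
```

The n − 1 divisor cancels in the ratio, so using population covariances would give the same α.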
Student Score Table
| Rank | Examinee | Scaled Score | Raw | % | θ Ability | Percentile | Status |
|---|---|---|---|---|---|---|---|
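The page does not define the Percentile column. A sketch using the common mid-rank convention, which is an assumption here: percentile = 100 · (count below + 0.5 · count tied) / n, where the tie count includes the examinee. The name `percentileRanks` is hypothetical.

```javascript
// Mid-rank percentile of each score within the group (assumed convention).
function percentileRanks(scores) {
  const n = scores.length;
  return scores.map((s) => {
    let below = 0;
    let tied = 0; // includes the examinee's own score
    for (const t of scores) {
      if (t < s) below++;
      else if (t === s) tied++;
    }
    return (100 * (below + 0.5 * tied)) / n;
  });
}

console.log(percentileRanks([500, 700, 700, 1000])); // [12.5, 50, 50, 87.5]
```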
Visualizations
Score Distribution
Raw vs Scaled
θ Distribution
Test Info (TIF)
ICC Browser
Score Compare
IRT Scaled Score vs Question-Based Weighted Score (diagonal = perfect agreement)
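The TIF chart conventionally plots the sum of item information functions. A sketch of the standard 3PL item information, I(θ) = a² · ((P − c)/(1 − c))² · (1 − P)/P, written without a scaling constant to match the ICC formula at the top; the names `itemInfo3PL` and `testInfo` are hypothetical.

```javascript
// 3PL item information at ability theta.
function itemInfo3PL(theta, { a, b, c }) {
  const p = c + (1 - c) / (1 + Math.exp(-a * (theta - b)));
  return a * a * ((p - c) / (1 - c)) ** 2 * ((1 - p) / p);
}

// Test information function: sum of item information over the test.
function testInfo(theta, items) {
  return items.reduce((sum, item) => sum + itemInfo3PL(theta, item), 0);
}

const bank = [
  { a: 2.0, b: 0.0, c: 0.0 },
  { a: 1.0, b: 1.0, c: 0.2 },
];
for (const theta of [-2, 0, 2]) {
  const info = testInfo(theta, bank);
  // Approximate standard error of theta: 1 / sqrt(I(theta)).
  console.log(theta, info.toFixed(3), (1 / Math.sqrt(info)).toFixed(3));
}
```

With c = 0 the formula reduces to the 2PL information a² · P(1 − P); a positive c always lowers information, most strongly at low θ.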
Item Analysis (3PL Parameters)
Item Parameter Estimates (a, b, c)
| Item | a (Discrimination) | b (Difficulty) | c (Guessing) | p-value | N Answered | ICC |
|---|---|---|---|---|---|---|
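A sketch of the two classical columns in this table, assuming the p-value is the proportion correct among examinees who answered the item, with omitted (null) responses excluded as in N Answered. The name `itemStats` is hypothetical.

```javascript
// Per-item N answered and p-value (proportion correct), ignoring omitted responses.
function itemStats(matrix, itemNames) {
  return itemNames.map((name, j) => {
    const answered = matrix.map((row) => row[j]).filter((v) => v !== null && v !== undefined);
    const correct = answered.reduce((s, v) => s + v, 0);
    return {
      item: name,
      nAnswered: answered.length,
      pValue: answered.length > 0 ? correct / answered.length : NaN,
    };
  });
}

const stats = itemStats(
  [
    [1, 0, null],
    [1, null, 1],
    [0, 1, 1],
    [1, 1, 0],
  ],
  ["Q1", "Q2", "Q3"],
);
console.log(stats[0]); // { item: "Q1", nAnswered: 4, pValue: 0.75 }
```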
Difficulty (b) Distribution
Discrimination (a) Distribution
Question Weight Analysis
IRT-based weight: w = a·(1+|b|/3)·(1−2c), normalized so all weights sum to 1000. Questions with higher discrimination or more extreme difficulty (large |b|) are worth more; a higher guessing parameter lowers the weight.
Question Statistics & Weights
| Item | Weight | Weight % | p-value | a (disc.) | b (diff.) | c (guess.) | Mean Score |
|---|---|---|---|---|---|---|---|
Weights by Item (sorted)
Weight vs Difficulty
Export Data
CSV Downloads
Methodology Reference
- Scoring Method
- 3PL IRT with Constrained Scaling
- IRT Model
- 3-Parameter Logistic (a, b, c)
- Score Range
- 0 – 1000
- Perfect Score
- Exactly 1000 (all correct)
- Near-Perfect Handling
- Separated in [minNP, 1000−gap] by θ rank
- Theta Estimation
- EAP / MLE approximation (browser JS)
- Missing Data
- Omitted (NA) — ignored in likelihood
- Reliability
- Cronbach's α (covariance formula)
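A sketch of EAP ability estimation consistent with omitted responses being ignored in the likelihood. The standard normal prior, the 81-point grid on [−4, 4], and the name `eapTheta` are assumptions; the MLE branch is not shown.

```javascript
// EAP estimate: posterior mean of theta on a fixed quadrature grid, N(0, 1) prior.
function eapTheta(responses, items, { lo = -4, hi = 4, points = 81 } = {}) {
  const grid = Array.from({ length: points }, (_, q) => lo + ((hi - lo) * q) / (points - 1));

  // Log posterior at each grid point, up to an additive constant.
  const logPost = grid.map((theta) => {
    let lp = -0.5 * theta * theta; // log N(0, 1) prior without its constant
    responses.forEach((u, j) => {
      if (u === null) return; // omitted response: contributes nothing
      const { a, b, c } = items[j];
      const p = c + (1 - c) / (1 + Math.exp(-a * (theta - b)));
      lp += u === 1 ? Math.log(p) : Math.log(1 - p);
    });
    return lp;
  });

  // Subtract the maximum before exponentiating to avoid underflow on long tests.
  const maxLp = Math.max(...logPost);
  let num = 0;
  let den = 0;
  grid.forEach((theta, q) => {
    const w = Math.exp(logPost[q] - maxLp);
    num += theta * w;
    den += w;
  });
  return num / den;
}

const exampleItems = [
  { a: 1.0, b: -1.0, c: 0.2 },
  { a: 1.5, b: 0.0, c: 0.2 },
  { a: 1.2, b: 1.0, c: 0.25 },
];
console.log(eapTheta([1, 1, null], exampleItems).toFixed(3));
```

Clamping to the configured theta bounds, if needed, would be applied to the returned value.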
Note: Because this runs in the browser without the R mirt package, item parameters are estimated with a simplified EM-based 3PL algorithm. Results match mirt closely for well-conditioned datasets; for very small samples (<50 examinees) or items with extreme p-values, they may differ from full mirt output.