
Practice IELTS, TOEFL iBT, GRE, GMAT, SAT and 9 more standardized tests with AI-generated content that is 96% indistinguishable from official exams. Instant AI scoring for Speaking and Writing — calibrated to official band descriptors.
From test selection to detailed AI feedback — the entire loop runs in one session.
Select from 12+ standardized test types or build a fully custom test. Set your section preferences, timing mode (timed or untimed), and difficulty target.
Work through sections with authentic timing, AI-generated audio for Listening, and a clean test-day interface. Section-by-section progression mirrors the real exam flow.
Receive official-scale band scores, section breakdowns, per-criterion AI feedback for Speaking and Writing, multi-perspective feedback from 3 AI personas (Examiner, Study Coach, Fellow Student), model answers, and concrete improvement suggestions — all immediately after submission.
Every test follows official timing, section structure, question formats, and scoring scales. Updated for 2023 international standards.
One-click IELTS / TOEFL / PTE / OET / CELPIP replicas — real-exam section counts and durations auto-fill on page load. Pick a result format that fits your use case: numeric points, CEFR placement (A1–C2), or native band (IELTS Band 6.5, PTE 70/90, OET Grade B, TOEFL, CLB). Or go Fully Custom with your own passages, transcripts, and prompts.
Classic numeric/percentage score (e.g. 72 / 100). Best for corporate assessments, internal practice, and ungraded check-ins.
Student gets a CEFR placement: "CEFR B2 — Upper Intermediate", per-skill CEFR badges, A1–C2 band axis, and a reference table. Recommended for placement testing at Intermediate difficulty.
Real-exam scale: Band 6.5 (IELTS), 70 / 90 (PTE), Grade B (OET), 87 / 120 (TOEFL), CLB 8 (CELPIP). Per-skill native scores. Ideal for exam-prep cohorts.
MCQ · True/False/Not Given · Fill-blank · Summary completion · Short-answer. Topics auto-pulled from the base style's topic pool (IELTS academic disciplines, OET healthcare, CELPIP everyday-Canada) for natural per-passage variety.
MCQ · Form completion · Matching · Fill-blank questions with AI-generated audio from your transcript using 23 TTS voices in the target language.
AI-evaluated tasks using your prompt or pool-pulled topics. Rubric calibrated to official scoring (IELTS, TOEFL, OET, PTE, CELPIP).
AI-evaluated spoken response to your topic. Scored on fluency, pronunciation, grammar, vocabulary, and (for OET) clinical communication.
Running a single-test cohort (e.g. CEFR placement on intake)? Set max_visible_tests=0 on the student's enrollment and they see only the test you assign them — your full Custom Test library stays hidden. Open it up later by removing the limit. No more "student sees five practice tests when you wanted them to take one placement."
Result page + PDF include an honest calibration disclosure: "internal placement — expect ~1 level lower on a real proctored sitting." No over-claiming.
Our 17B-parameter evaluation model uses official band descriptors for each test type. No generic rubrics — every score mirrors what a certified human examiner would give.
Scored per criterion on official scale. Example below shows an IELTS response.
Rubric calibrated per test type. Example below shows a TOEFL Integrated task.
Instant results — no waiting. Example IELTS full test result.
Result includes: AI feedback per section · Model answers · Multi-perspective feedback · Improvement suggestions · Downloadable report
Writing and Speaking results include feedback from 3 distinct AI personas — each referencing your actual text and full test context.
Strict, rubric-focused
Identifies exactly which band descriptor you missed and how it affects your overall score. Quotes your actual response.
Supportive, actionable
Connects strengths from other sections to this one. Gives one before/after rewrite using your actual text.
Relatable, practical
Shows how fixing one area could raise your overall band. Shares a concrete study technique that works.
Available for all test types: IELTS, TOEFL, PTE, OET, CELPIP, GRE, Duolingo, Adaptive Language, and Custom tests.
Not a generic language test with a medical coat of paint. Real OET 2025 format, real 2025 rubrics, real examiner-grade feedback.
Listening 24/6/12 split (2 consultations × 12 gaps + 6 workplace MCQs + 2 presentations × 6 MCQs). Reading 20/6/16 with matching, sentence-completion, and short-answer across Part A's 4 texts. Writing = 1 profession-specific letter. Speaking = warm-up + 2 role-plays on canonical three-section cards.
Every Part A gap answer must appear verbatim in the generated transcript — a post-parse filter drops any gap whose answer isn't actually spoken. The "question asked for a date but the audio never mentioned one" bug can't exist.
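A post-parse guard like that can be as simple as the sketch below. The data shapes and helper name are illustrative, not the platform's actual code; the point is that any gap whose answer string never occurs in the spoken transcript is discarded before the test is served.

```python
import re

def drop_unspoken_gaps(transcript: str, gaps: list[dict]) -> list[dict]:
    """Keep only gap items whose answer appears verbatim in the generated transcript.

    Illustrative sketch: each gap is {"prompt": ..., "answer": ...}; matching is
    case-insensitive on whole words, so "14 May" is not satisfied by "mayonnaise".
    """
    kept = []
    for gap in gaps:
        answer = gap["answer"].strip()
        pattern = r"\b" + re.escape(answer) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            kept.append(gap)
    return kept

# Example: the second gap is dropped because "14 May" is never actually spoken.
transcript = "The patient was admitted on the third of June with acute chest pain."
gaps = [
    {"prompt": "Date of admission: ____", "answer": "third of June"},
    {"prompt": "Follow-up booked for: ____", "answer": "14 May"},
]
print(len(drop_unspoken_gaps(transcript, gaps)))  # -> 1
```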
Writing scored on Purpose (0–3) + Content / Conciseness & Clarity / Genre & Style / Organisation & Layout / Language (0–7 each) — max 38, anchored at raw 27 = Grade B floor. Speaking scored on 4 Linguistic criteria (0–6) + 5 Clinical Communication criteria (0–3) with a 60/40 weighted formula.
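Read literally, those two rules reduce to a few lines of arithmetic. The sketch below assumes the 60/40 split weights the linguistic criteria at 60% and clinical communication at 40%, normalised to a 0–100 figure; that combination, and the helper names, are assumptions for illustration rather than the platform's exact formula.

```python
def oet_writing_total(purpose: int, criteria: list[int]) -> tuple[int, bool]:
    """Purpose is 0-3; the five remaining criteria are 0-7 each; max raw = 38.

    Returns (raw_total, reaches_grade_b) using the raw-27 Grade B floor described above.
    """
    assert 0 <= purpose <= 3 and len(criteria) == 5 and all(0 <= c <= 7 for c in criteria)
    raw = purpose + sum(criteria)          # maximum 3 + 5*7 = 38
    return raw, raw >= 27                  # 27 anchors the Grade B floor

def oet_speaking_weighted(linguistic: list[int], clinical: list[int]) -> float:
    """4 linguistic criteria (0-6) and 5 clinical communication criteria (0-3),
    combined with a 60/40 weighting. The 0-100 normalisation is illustrative."""
    assert len(linguistic) == 4 and len(clinical) == 5
    ling = sum(linguistic) / (4 * 6)       # 0..1
    clin = sum(clinical) / (5 * 3)         # 0..1
    return round(100 * (0.6 * ling + 0.4 * clin), 1)

print(oet_writing_total(3, [6, 5, 6, 5, 6]))                   # -> (31, True)
print(oet_speaking_weighted([5, 5, 4, 5], [2, 3, 2, 2, 3]))    # -> 79.5
```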
Every criterion score is backed by a verbatim quotation from the candidate's letter. The lowest-scoring paragraph gets a band-7 rewrite side-by-side with the original. 12-item examiner-gotcha checklist flags things like copy-paste from case notes, bullet points in the body, missing salutation, purpose buried past paragraph 1.
Every role-play card ships with a hidden patient brief (Ideas / Concerns / Expectations + verbal cues). The evaluator uses it to score whether the candidate actually elicited and incorporated the ICE framework — not just fluency. Moment-level feedback with timestamps shows exactly where empathy cues were missed or signposting broke down.
After a handful of sittings, students see their current grade, projected grade in 4 weeks at the current pace, weakest section with point-gap to the next grade, and an estimate of days to reach Grade B. Trend-based — no fake optimism when performance is flat or declining.
Intermediate matches real-OET rigor 1:1. Advanced is measurably harder (C1 medical vocabulary, near-miss distractors, co-morbid scenarios). Expert is two tiers above real (C2+ academic medical language, multi-system vignettes). Scoring stays real-exam calibrated — a Grade B at Advanced means a Grade A on the real test.
Medicine, nursing, pharmacy, dentistry, dietetics, occupational therapy, optometry, physiotherapy, podiatry, radiography, speech pathology, veterinary science — each with dedicated scenarios, workplace extracts, presentation topics, and role-play situations. Thousands of distinct mock tests before anything visibly repeats.
When institutes create a 30-test standardised OET pool, each test #N lands on a deterministic (profession × topic) pair. 12 professions × 30 themes = 360 distinct combinations before any repeat — students never see the same test twice.
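One way such a deterministic mapping can be built is a mixed-radix decomposition of the test index, sketched below. The actual assignment logic isn't specified in this section, so treat the code purely as an illustration of why 12 professions × 30 themes yields 360 unique pairs before any repeat.

```python
PROFESSIONS = ["medicine", "nursing", "pharmacy", "dentistry", "dietetics",
               "occupational therapy", "optometry", "physiotherapy",
               "podiatry", "radiography", "speech pathology", "veterinary science"]
NUM_THEMES = 30  # illustrative theme-pool size

def assign_pair(test_index: int) -> tuple[str, int]:
    """Deterministically map test #N to a (profession, theme) pair.

    With 12 professions x 30 themes, indices 0..359 all land on distinct pairs.
    """
    n = test_index % (len(PROFESSIONS) * NUM_THEMES)
    profession = PROFESSIONS[n % len(PROFESSIONS)]
    theme = n // len(PROFESSIONS)          # 0..29
    return profession, theme

# Consecutive tests rotate professions and only revisit a theme after all 12
# professions have been paired with it.
print([assign_pair(i) for i in (0, 1, 12, 359)])
```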
Give teachers a 24-hour window to review every result before scores publish — or let it auto-publish if they don't act. Per-student opt-in. Never blocks the student indefinitely.
Org admin flips a "Teacher Review" switch on the student's enrollment in the Institute Student Management page. Some students get the hold (e.g. exam-bound IELTS students whose final mock needs teacher sign-off); others don't (e.g. self-paced learners on daily reading drills). Works on any language test the student takes — org-scoped or pool tests like IELTS / TOEFL — because the gate keys off the student, not the test.
When a held student submits a test, the score is computed normally but visibility is gated. Teachers get an in-app notification + email. They can release early or let the result auto-publish at the 24-hour mark via an hourly background sweep. The student never waits longer than 24 hours — even if the institute goes silent for a week.
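A stripped-down sketch of an hourly release sweep like the one described, written over plain dataclasses rather than the platform's real session model. The field names echo the audit fields listed further down; the job structure and email step are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class HeldSession:
    student: str
    review_deadline: datetime              # submission time + 24h
    released_at: Optional[datetime] = None
    released_auto: bool = False

def hourly_sweep(sessions: list[HeldSession], now: Optional[datetime] = None) -> int:
    """Auto-publish every held result whose 24-hour review window has lapsed.

    Returns the number of sessions released; a real job would also send the
    student email and refresh the "Awaiting teacher review" page.
    """
    now = now or datetime.now(timezone.utc)
    released = 0
    for s in sessions:
        if s.released_at is None and now >= s.review_deadline:
            s.released_at = now
            s.released_auto = True         # distinguishes the sweep from a teacher release
            released += 1
    return released

submitted = datetime.now(timezone.utc) - timedelta(hours=25)
pending = [HeldSession("student@example.com", submitted + timedelta(hours=24))]
print(hourly_sweep(pending))               # -> 1: past the 24h ceiling, so it publishes
```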
The "Awaiting teacher review" page shows the test name, submission time, deadline, and a live ticking countdown ("Auto-release in 17h 28m 04s"). The page polls the deadline and reloads itself the moment auto-release fires — no manual refresh. An email lands the same second.
Teachers can save a free-text comment without releasing (mid-review pause), or attach it to the release action — either way the comment is shown to the student on the result page. Stored on the session itself, audited with reviewed_by and reviewed_at.
Teachers can highlight any text in a student response and attach a threaded comment ("Strong topic sentence — but the supporting example doesn't connect"). Once released, students can reply on the same thread — turning a one-shot score into an actual coaching dialogue. Teachers can mark threads resolved, audit-stamped with resolved_by / resolved_at.
A dedicated dashboard at /language-tests/reviews/ with two tabs: Pending (sorted by deadline, with per-row hours-left countdown) and Released (filterable by 30 / 90 / 365 days / all-time, showing who released and whether it was teacher-released or auto-released after 24h). Text search across student name, email, and test title.
When a held result lands, every active org admin and examiner gets two simultaneous nudges: an in-app bell-icon notification (with deep-link to the result page) and an email ("Result pending your review — Student Name / Test Title"). De-duplicated per session so retries never re-spam teachers.
Only active org admins and examiners on the test's organization (or the student's enrollment institute) can review. Site-wide super-admins are intentionally excluded from the teacher-review action — keeping reviews local to the institute that owns the student. Django staff/superuser retain inspection access via the standard authorization path.
Every reviewed session stamps review_status, review_deadline, teacher_comment, reviewed_by, reviewed_at, released_at, and released_auto (true when the 24h sweep released it, false when a teacher released early). Institutes can audit which sessions slipped past teacher review for staffing and SLA reporting.
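For illustration only, that audit trail could be shaped roughly as below. The field names are the ones documented in this section; the types and the reporting helper are assumptions, not the production schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ReviewAudit:
    """Per-session review audit record (illustrative shape, field names as documented)."""
    review_status: str                 # e.g. "pending", "released"
    review_deadline: datetime          # submission time + 24 hours
    teacher_comment: Optional[str]     # free-text note shown on the result page
    reviewed_by: Optional[str]         # teacher who commented or released
    reviewed_at: Optional[datetime]
    released_at: Optional[datetime]
    released_auto: bool                # True when the hourly sweep released it

def slipped_past_review(records: list[ReviewAudit]) -> list[ReviewAudit]:
    """Sessions auto-released with no teacher action, for staffing and SLA reporting."""
    return [r for r in records if r.released_auto and r.reviewed_by is None]
```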
Most AI test platforms either drop raw AI scores on the student instantly with no teacher in the loop, or they make the teacher a hard bottleneck for every result. This feature gives institutes the best of both: a real teacher review window with comments and inline annotations, but never at the cost of the student waiting indefinitely. The 24-hour ceiling is a hard guarantee — auto-release is unconditional and runs hourly. It's the difference between AI that replaces teachers and AI that gives teachers their time back.
Format-authentic UI, real partial credit, and the new SGD task — all 22 task types from the official Aug 2025 spec, end-to-end.
Reading (5): R&W FIB dropdown, Reading FIB drag-bank, Re-order Paragraphs, MC Single, MC Multi. Listening (8): SST, FIB inline, HCS, SMW, HIW, WFD, MC Single, MC Multi. Speaking (8 incl. Personal Intro): Read Aloud, Repeat Sentence, Describe Image, Re-tell Lecture, Answer Short Question, Summarize Group Discussion, Respond to a Situation. Writing (2): Summarize Written Text, Write Essay.
R&W FIB renders one passage with inline dropdowns per blank — not separate one-blank-per-question fragments. Reading FIB drag-bank uses one shared word bank (more words than blanks) to drag into inline drop-targets. Re-order Paragraphs is two-pane drag (source left → target right). The passage itself is the task for FIB — no separate read-only panel that would reveal the answers.
Reorder uses adjacent-pair scoring (1 point per correctly placed consecutive pair, max n−1). MC Multi and Highlight Incorrect Words apply +1 correct / −1 wrong, floored at 0. Multi-blank FIB and Listening FIB give per-blank credit. Write from Dictation scores per-word. None of this is binary all-or-nothing.
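Each of those partial-credit rules reduces to a few lines. The helpers below are illustrative sketches of the rules as described, not the platform's scoring code.

```python
def reorder_score(submitted: list[str], correct: list[str]) -> int:
    """Adjacent-pair scoring: 1 point per consecutive pair in the right order, max n-1."""
    good_pairs = {(correct[i], correct[i + 1]) for i in range(len(correct) - 1)}
    return sum((submitted[i], submitted[i + 1]) in good_pairs
               for i in range(len(submitted) - 1))

def plus_minus_score(chosen: set[str], correct: set[str]) -> int:
    """MC Multi / Highlight Incorrect Words: +1 per correct pick, -1 per wrong pick, floor 0."""
    return max(0, len(chosen & correct) - len(chosen - correct))

def per_blank_score(answers: list[str], keys: list[str]) -> int:
    """Multi-blank FIB / Listening FIB: one point per blank answered correctly."""
    return sum(a.strip().lower() == k.strip().lower() for a, k in zip(answers, keys))

print(reorder_score(["B", "C", "A", "D"], ["A", "B", "C", "D"]))   # -> 1 (only B-C adjacent)
print(plus_minus_score({"A", "C", "D"}, {"A", "B", "C"}))          # -> 1 (2 right - 1 wrong)
print(per_blank_score(["harbour", "vessel", "cargo"],
                      ["harbor", "vessel", "cargo"]))              # -> 2
```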
3-speaker discussion synthesised through our multi-voice TTS pipeline (10-voice US/UK pool, deterministic per-speaker assignment). Counts toward both Listening and Speaking. AI eval explicitly penalises memorised templates — content score depends on capturing each speaker's distinct viewpoint.
Per-session opt-in. When on: audio replay disabled, custom player with no scrubbing, per-task hard timers, no pause. Practice mode is forgiving (auto-submit on timer expiry, soft timing); real-exam mode mirrors Pearson's strict conditions for serious mock attempts. Toggle on the test-start screen.
Pre-save sanitiser strips leaked A) / B) letter prefixes and (correct)/(incorrect) tags from option text — both at write time and via a one-shot backfill on existing items. The validator also tolerates JSON-encoded full-text answers that legacy data may produce, so students aren't penalised for AI artifacts.
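A sanitiser of that kind typically amounts to a couple of regular expressions. The patterns below are an illustrative sketch of the two cleanups named above, not the platform's actual rules.

```python
import re

# Illustrative cleanup of AI-generation artifacts in multiple-choice option text:
#  - leaked letter prefixes such as "A) " or "B. " at the start of an option
#  - trailing "(correct)" / "(incorrect)" tags that would leak the answer
_PREFIX = re.compile(r"^\s*[A-E][\).]\s+")
_TAG = re.compile(r"\s*\((?:correct|incorrect)\)\s*$", re.IGNORECASE)

def sanitise_option(text: str) -> str:
    return _TAG.sub("", _PREFIX.sub("", text)).strip()

print(sanitise_option("B) Paris (correct)"))   # -> "Paris"
print(sanitise_option("the Seine estuary"))    # -> unchanged
```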
Each section converts net partial-credit points to the official 10–90 scale; overall = average of 4 sections. The result page shows per-question slot ratios (e.g. "2 / 3 blanks correct" plus the 0–1 normalised score) and a collapsible "How is PTE Academic scored?" explainer covering every task type's scoring rule. Honest about being indicative, not Pearson-calibrated.
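Assuming a straight linear mapping (the section only says the conversion targets the 10–90 scale, so the exact curve is an assumption), the arithmetic looks like this:

```python
def section_scaled(raw_points: float, max_points: float) -> float:
    """Map a section's net partial-credit points onto the 10-90 scale (assumed linear)."""
    ratio = max(0.0, min(1.0, raw_points / max_points))
    return round(10 + 80 * ratio, 1)

def overall_score(sections: dict[str, tuple[float, float]]) -> float:
    """Overall = plain average of the four section scores."""
    scaled = [section_scaled(raw, mx) for raw, mx in sections.values()]
    return round(sum(scaled) / len(scaled), 1)

print(overall_score({
    "speaking":  (41, 50),
    "writing":   (20, 26),
    "reading":   (30, 44),
    "listening": (35, 52),
}))  # average of the four scaled 10-90 section scores
```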
Every answer routes through a resilient save queue with localStorage persistence and exponential-backoff retry. Multi-blank dropdowns, drag-bank chips, two-pane reorder, inline FIB inputs, clickable HIW transcripts — all restore correctly on resume. Mid-test pauses preserve progress on every question type.
Every question type has automated coverage from the JS wire format → submit_response → validator → DB → complete_test → 10–90 conversion → result-page render. Includes regression tests for JSON-string MC submissions, leaked answer markers, partial-credit display, and per-blank scoring across 13 distinct question types.
An Item Response Theory (IRT) algorithm adjusts question difficulty in real time based on your performance — exactly as the official GRE, GMAT, and SAT do. SAT Math sections include the built-in Desmos graphing and scientific calculator, matching the real Bluebook testing experience.
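As a rough illustration of how adaptive selection works (not the platform's implementation), a minimal two-parameter-logistic loop re-estimates ability after each response and serves the unseen item whose difficulty sits closest to that estimate. All parameter values and update rules below are illustrative.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability of a correct answer given ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def update_theta(theta: float, a: float, b: float, correct: bool, lr: float = 0.5) -> float:
    """One gradient step on the log-likelihood: move ability toward the evidence."""
    return theta + lr * a * ((1.0 if correct else 0.0) - p_correct(theta, a, b))

def next_item(theta: float, items: list[dict], used: set[int]) -> dict:
    """Adaptive selection: pick the unseen item whose difficulty is nearest to theta."""
    pool = [it for it in items if it["id"] not in used]
    return min(pool, key=lambda it: abs(it["b"] - theta))

items = [{"id": i, "a": 1.0, "b": b} for i, b in enumerate([-1.5, -0.5, 0.0, 0.8, 1.6])]
theta, used = 0.0, set()
for answered_correctly in [True, True, False]:
    item = next_item(theta, items, used)
    used.add(item["id"])
    theta = update_theta(theta, item["a"], item["b"], answered_correctly)
print(round(theta, 2))   # ability drifts up after two correct answers, back down after the miss
```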
4-Check verdict verification (April 2026): Every math question on SAT, GRE, and GMAT runs through a five-verdict audit by Qwen 3 32B (81.4% AIME 2024). The verifier doesn't just check "is the marked letter correct" — it verifies (1) the answer respects every stem qualifier ("larger", "least positive", "no solution"), (2) the marked letter matches the solution, (3) no other option also satisfies the question, and (4) the stem is internally consistent and the explanation matches the answer field. Verdicts: OK, WRONG_ANSWER, MULTIPLE_CORRECT, NO_VALID_OPTION, INCONSISTENT. On SAT, any non-OK verdict deletes the question and a top-up loop regenerates it; a final post-top-up sweep catches residual non-determinism. Verbal items use cross-model consensus (gpt-oss-120b + llama-3.3-70b in parallel) — flip only when both agree.
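Structurally, that audit is a filter-and-top-up loop over the five verdicts. The sketch below keeps the verifier and generator abstract (the callables stand in for the model-backed checks and question generation, which are not reproduced here) and shows only the control flow: drop non-OK items, regenerate, and re-verify the replacements.

```python
from enum import Enum
from typing import Callable

class Verdict(Enum):
    OK = "ok"
    WRONG_ANSWER = "wrong_answer"
    MULTIPLE_CORRECT = "multiple_correct"
    NO_VALID_OPTION = "no_valid_option"
    INCONSISTENT = "inconsistent"

def audit_and_top_up(questions: list[dict],
                     verify: Callable[[dict], Verdict],
                     regenerate: Callable[[], dict],
                     target_count: int,
                     max_attempts: int = 10) -> list[dict]:
    """Keep only OK-verdict items, then regenerate and re-verify until the section is full.

    `verify` and `regenerate` are placeholders for the verifier model and the
    question generator; only the loop structure is illustrated here.
    """
    kept = [q for q in questions if verify(q) is Verdict.OK]
    attempts = 0
    while len(kept) < target_count and attempts < max_attempts:
        candidate = regenerate()
        if verify(candidate) is Verdict.OK:   # replacements get the same sweep
            kept.append(candidate)
        attempts += 1
    return kept
```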
Independently hand-verified (April 2026): 128 / 128 math items across one fresh Expert-difficulty test each for SAT, GMAT, and GRE had correct answer keys when manually re-solved. The two questions with quality issues (one multi-correct GMAT LP, one inconsistent GRE word problem) had already been flagged in the verifier logs.
Also explore: Custom Exam System
Preparing for an exam abroad? Practice with tests indistinguishable from the real thing and get targeted feedback to fix your weakest areas.
Run mock exams, track cohort progress, and deploy custom tests built from your own course materials — all from one platform.
Deploy standardized English proficiency screening or professional language certification at scale — self-hosted under your institution's domain.
Teachers review from the dedicated dashboard at /language-tests/reviews/. They can save comments without releasing, add inline annotations on any selection in the student's response (with threaded student replies once released), or release immediately with an optional final comment. If the teacher does nothing, results auto-publish at the 24-hour mark via an hourly background sweep — so students never wait longer than 24 hours, even if the teacher is silent. Every action is audited with reviewed_by, reviewed_at, released_at, and released_auto.
12+ test types · 60+ question formats · Instant AI speaking and writing scores
Book a Demo · Sign Up Free