t-Test Calculator
Calculate t-statistic, degrees of freedom, and p-value for one-sample, two-sample (Welch), and paired t-tests. Includes Cohen's d effect size and confidence intervals.
How to Use This Calculator
- Enter your sample mean, hypothesized mean (μ₀), sample standard deviation, and sample size.
- Read the t-statistic, degrees of freedom, and p-values instantly.
- Use the One-Sample tab to test a mean against a benchmark.
- Use Two-Sample for comparing two independent groups (Welch correction applied).
- Use Paired for before-after or matched-pair designs.
- The Professional tab adds the Cohen's d effect size and a 95% confidence interval.
Formula
One-sample t: t = (x̄ − μ₀) / (s / √n)
Welch two-sample t: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Paired t: t = d̄ / (s_d / √n)
Cohen's d: d = (x̄ − μ₀) / s
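As a minimal sketch, the formulas above translate directly into Python (the function names are my own, not part of the calculator):

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """One-sample t: t = (x̄ − μ₀) / (s / √n)."""
    return (xbar - mu0) / (s / math.sqrt(n))

def welch_t(xbar1, xbar2, s1, s2, n1, n2):
    """Welch two-sample t: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)."""
    return (xbar1 - xbar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def paired_t(diffs):
    """Paired t computed from the list of within-pair differences."""
    n = len(diffs)
    dbar = sum(diffs) / n
    sd = math.sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))
    return dbar / (sd / math.sqrt(n))

def cohens_d(xbar, mu0, s):
    """Cohen's d: d = (x̄ − μ₀) / s."""
    return (xbar - mu0) / s
```

Note that the paired test is just a one-sample test applied to the differences, which is why it shares the d̄ / (s_d / √n) form.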
Example
Sample mean = 52, μ₀ = 50, s = 5, n = 25: t = (52−50)/(5/√25) = 2/1 = 2.00, df = 24, p ≈ 0.0569 (two-tailed). Not significant at α=0.05.
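The worked example can be reproduced end to end without a statistics library: compute t, then integrate the Student's t density numerically to get the two-tailed p-value. This is a sketch, not the calculator's actual implementation:

```python
import math

def t_pdf(x, df):
    # Student's t density; lgamma avoids overflow for large df
    logc = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi))
    return math.exp(logc) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    # P(|T| >= |t|) = 2 * (0.5 - ∫₀^|t| f(x) dx), trapezoidal rule
    t = abs(t)
    h = t / steps
    area = 0.5 * (t_pdf(0, df) + t_pdf(t, df))
    area += sum(t_pdf(i * h, df) for i in range(1, steps))
    area *= h
    return 2 * (0.5 - area)

t = (52 - 50) / (5 / math.sqrt(25))   # = 2.00, df = 24
p = two_tailed_p(t, 24)
print(round(t, 2), round(p, 4))       # t = 2.0, p ≈ 0.0569
```

Since 0.0569 > 0.05, the null hypothesis is not rejected, matching the example above.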
Frequently Asked Questions
- What is a t-test, and when should you use one? A t-test is a parametric statistical hypothesis test used to determine whether a statistically significant difference exists between means. William Sealy Gosset published the method in 1908 under the pseudonym 'Student,' hence 'Student's t-distribution.' Use a t-test when your data is continuous (interval or ratio scale), the population standard deviation is unknown (almost always the case in practice), and the sample size is small enough that the normal approximation is imprecise. The t-distribution has heavier tails than the normal distribution, which accounts for the extra uncertainty introduced by estimating the standard deviation from the sample; as the sample size grows, it converges to the standard normal. Apply a t-test when comparing a sample mean to a known value, comparing the means of two independent groups, or analyzing before-after measurements on the same subjects. When the population variance is known, a z-test is technically more appropriate, though for larger samples (n > 30) the difference is negligible in practice.
- How do one-sample, two-sample, and paired t-tests differ? These three variants address different research questions. A one-sample t-test compares a single sample mean to a known or hypothesized population mean — for example, testing whether a factory's output of 52 units per hour differs from the industry benchmark of 50. A two-sample (independent samples) t-test compares means from two separate, unrelated groups — for example, comparing exam scores of students taught by two different methods. The samples must be independent: knowing one observation from Group A tells you nothing about Group B. A paired t-test (also called a repeated measures or matched pairs t-test) is used when each observation in one group is linked to a corresponding observation in the other — for example, blood pressure measured before and after a drug on the same patients. Pairing removes subject-to-subject variability, making the paired test more powerful than the two-sample test when pairing is appropriate. Choosing the wrong test inflates Type I or Type II error rates.
- How should I interpret the p-value? The p-value is the probability of observing a t-statistic at least as extreme as the one computed, assuming the null hypothesis is true. It does NOT tell you the probability that the null hypothesis is correct, nor the probability your result was due to chance. A common threshold is α = 0.05: if p < 0.05, you reject the null hypothesis and declare the result statistically significant. A two-tailed test asks whether the mean differs in either direction; a one-tailed test asks only whether it is greater (or only whether it is less), and its p-value is half the two-tailed value when the effect lies in the predicted direction. In either case, compare the appropriate p-value to the full α. Always report the exact p-value rather than just 'significant' or 'not significant' — p-values of 0.049 and 0.051 are nearly identical in practical terms. Combine p-values with effect sizes (Cohen's d) and confidence intervals for a complete picture: a statistically significant result with a tiny effect size (d = 0.05) may have no practical importance whatsoever.
- What if my data is not normally distributed? The t-test assumes the population from which the sample is drawn is normally distributed (or that the sample size is large enough for the Central Limit Theorem to apply). If your data is clearly non-normal — highly skewed, heavy-tailed, or multimodal — and your sample size is small (n < 30), the t-test's p-values and confidence intervals may be inaccurate. Options when normality fails: (1) Transform your data — a log transformation often normalizes right-skewed data such as income or reaction times. (2) Use a non-parametric alternative — the Mann-Whitney U test replaces the two-sample t-test, the Wilcoxon signed-rank test replaces the paired t-test, and the sign test makes the fewest distributional assumptions of all. (3) Use a permutation test or bootstrap confidence interval, which make no distributional assumptions. (4) With larger samples (n > 30–50), the t-test is quite robust to non-normality due to the Central Limit Theorem. Always check normality with a Shapiro-Wilk test or Q-Q plot before applying a t-test to small samples.
- Why is Welch's t-test the default for two samples? Welch's t-test is the preferred two-sample test in virtually all modern statistical practice. The standard (Student's) two-sample t-test assumes equal variances in both groups — an assumption called homogeneity of variance or homoscedasticity. Welch's correction relaxes this assumption by computing adjusted degrees of freedom using the Satterthwaite approximation: df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)]. This typically gives non-integer degrees of freedom. When variances ARE equal, Welch's test is only slightly less powerful than Student's. When variances are unequal, Student's test produces incorrect p-values (usually anti-conservative, meaning too many false positives). Statistical guidelines from journals including Nature and APA now recommend Welch's t-test as the default two-sample test, with an equal-variance test only as a sensitivity check. Use Levene's test to formally assess variance equality if needed.
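The Satterthwaite formula in the FAQ above can be sketched in a few lines of Python (the function name is my own):

```python
import math

def welch(xbar1, s1, n1, xbar2, s2, n2):
    """Welch t-statistic and Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2           # per-group variance of the mean
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    # df = (v1 + v2)^2 / [v1^2/(n1-1) + v2^2/(n2-1)] — typically non-integer
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Two groups with clearly unequal spread:
t, df = welch(10, 1, 10, 8, 4, 10)
print(round(t, 3), round(df, 2))   # df lands between min(n-1) = 9 and n1+n2-2 = 18
```

Note how df falls below the pooled-test value of n₁ + n₂ − 2 = 18: the unequal variances cost effective degrees of freedom, which is exactly the correction Welch's test applies.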
Sources & References
- Student (Gosset) 1908 — 'The Probable Error of a Mean' — Biometrika
- Fisher — Statistical Methods for Research Workers — Oliver and Boyd
- NIST/SEMATECH Engineering Statistics Handbook — t-Test — NIST
- Khan Academy — Significance Tests (t-statistic) — Khan Academy
- Stanford CS109 — Probability for Computer Scientists — Stanford University