The P-value Calculator computes the exact p-value for a hypothesis test using the four most common test-statistic distributions in classical statistics: the standard normal Z, the Student’s t, the chi-square χ², and the Fisher–Snedecor F. Whether you are interpreting a Z-test for a proportion, a t-test for a mean, a chi-square goodness-of-fit test, or an ANOVA F-test, this calculator gives you the p-value and the reject / fail-to-reject decision in one step.
Hypothesis testing follows a consistent five-step structure. First, state a null hypothesis (H₀) and an alternative hypothesis (H₁). The null hypothesis is the default position you want to test against — typically "no effect," "no difference," or "no relationship." The alternative is the claim you want to provide evidence for. Second, decide on a significance level α (commonly 0.05, 0.01, or 0.001) before looking at the data. Third, compute the test statistic from your data. Fourth, calculate the p-value — the probability of seeing a result at least as extreme as yours if H₀ were true. Fifth, compare: if p-value ≤ α, reject H₀; otherwise, fail to reject.
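To make the five steps concrete, here is a minimal Python/SciPy sketch of steps 3 through 5 for a two-tailed Z-test of a mean; the sample figures (n = 40, x̄ = 51.3, μ₀ = 50, σ = 4) are invented for illustration.

```python
from scipy.stats import norm

alpha = 0.05                        # step 2: fixed before seeing the data
z = (51.3 - 50) / (4 / 40 ** 0.5)   # step 3: z = (x̄ − μ₀) / (σ / √n)
p = 2 * norm.sf(abs(z))             # step 4: two-tailed p-value
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p <= alpha else "fail to reject H0")  # step 5
```

Here z ≈ 2.06 gives p ≈ 0.04, so H₀ is rejected at α = 0.05 but would not be at α = 0.01.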
Choosing the correct distribution depends on the test you ran. Use a Z-distribution for large-sample tests of population means or proportions when the population standard deviation is known (or the sample is large enough that the central limit theorem applies, typically n ≥ 30). Use the t-distribution when testing means with an unknown population standard deviation, especially in small samples; the t-distribution looks like the normal distribution but with heavier tails to account for the additional uncertainty in estimating σ from the sample. Use the chi-square distribution for goodness-of-fit tests, tests of independence in contingency tables, and tests of variance. Use the F-distribution for ANOVA, regression overall significance, and ratio-of-variances tests — it requires two separate degrees-of-freedom parameters (numerator d₁ and denominator d₂).
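The same choice plays out directly in code. This SciPy sketch (with made-up statistics and degrees of freedom) shows which distribution and parameters each test calls for; each statistic is deliberately placed near its α = 0.05 critical value.

```python
from scipy.stats import norm, t, chi2, f

print(2 * norm.sf(abs(1.96)))    # Z-test, two-tailed                   p ≈ 0.050
print(2 * t.sf(abs(2.10), 18))   # t-test, two-tailed, df = 18          p ≈ 0.050
print(chi2.sf(11.07, 5))         # chi-square GOF, right-tailed, df = 5 p ≈ 0.050
print(f.sf(3.35, 2, 27))         # ANOVA F, right-tailed, d₁=2, d₂=27   p ≈ 0.050
```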
The choice between one-tailed and two-tailed tests is one of the most consequential decisions in hypothesis testing. A two-tailed test asks "is the parameter different from the hypothesized value, in either direction?" A one-tailed test asks "is the parameter strictly greater (or strictly less) than the hypothesized value?" The two-tailed test is more conservative — it requires a more extreme statistic to achieve the same significance level. The convention in most fields is to use two-tailed tests by default and only use one-tailed tests when there is a strong a priori directional hypothesis. Critically, you must decide which to use BEFORE looking at the data; choosing the tail to match what you observed is a form of p-hacking that inflates Type I error rates.
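The practical difference is easy to see in code. In this sketch (hypothetical z = 1.80), the right-tailed p-value clears α = 0.05 while the two-tailed one does not, which is precisely why picking the tail after seeing the data inflates the false-positive rate.

```python
from scipy.stats import norm

z = 1.80                       # hypothetical test statistic
p_right = norm.sf(z)           # H₁: parameter > value   ≈ 0.0359
p_left = norm.cdf(z)           # H₁: parameter < value   ≈ 0.9641
p_two = 2 * norm.sf(abs(z))    # H₁: parameter ≠ value   ≈ 0.0719
print(p_right, p_left, p_two)
```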
The mathematical foundations are well-established. For a standard normal distribution N(0, 1) with cumulative distribution function Φ(z), the right-tailed p-value is 1 − Φ(z), the left-tailed is Φ(z), and the two-tailed is 2 × (1 − Φ(|z|)). For the t-distribution with df degrees of freedom and t ≥ 0, we use the relation cdfₜ(t, df) = 1 − 0.5 · Iₓ(df/2, 1/2) where x = df / (df + t²) and Iₓ is the regularized incomplete beta function; negative t follows from the distribution’s symmetry. For chi-square the CDF is the regularized lower incomplete gamma function P(df/2, x/2). For the F-distribution, F(x, d₁, d₂) = I_{d₁x/(d₁x + d₂)}(d₁/2, d₂/2). These special functions are computed numerically using continued-fraction expansions with error tolerances far below the displayed precision; results match major statistical software (R, Python’s SciPy, GraphPad Prism, SAS) to five or six significant digits across all reasonable inputs.
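These identities are easy to check numerically. The sketch below compares the closed forms against SciPy’s distribution CDFs at arbitrary check values; scipy.special.betainc and scipy.special.gammainc are the regularized incomplete beta and lower incomplete gamma functions.

```python
from scipy.special import betainc, gammainc
from scipy.stats import t, chi2, f

# t-distribution: cdf(t, df) = 1 − 0.5 · I_x(df/2, 1/2), x = df/(df + t²), t ≥ 0
df, tval = 7, 1.5
x = df / (df + tval ** 2)
assert abs(t.cdf(tval, df) - (1 - 0.5 * betainc(df / 2, 0.5, x))) < 1e-12

# chi-square: cdf(x, df) = P(df/2, x/2)
assert abs(chi2.cdf(4.2, 6) - gammainc(6 / 2, 4.2 / 2)) < 1e-12

# F: cdf(x, d1, d2) = I_y(d1/2, d2/2), y = d1·x / (d1·x + d2)
d1, d2, xf = 3, 12, 2.5
y = d1 * xf / (d1 * xf + d2)
assert abs(f.cdf(xf, d1, d2) - betainc(d1 / 2, d2 / 2, y)) < 1e-12
```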
Common pitfalls in p-value interpretation deserve special attention. First, a small p-value does not measure the size of an effect — it only measures the strength of evidence against the null hypothesis. With a large enough sample size, statistically significant p-values can correspond to effects so small they are practically meaningless. Always report effect sizes (Cohen’s d, correlation coefficients, odds ratios) and confidence intervals alongside p-values. Second, "p > 0.05" does not mean "the null hypothesis is true" — it just means you don’t have enough evidence to reject it. Absence of evidence is not evidence of absence. Third, p-hacking (running many tests, selecting the favorable ones, choosing the tail post-hoc, or stopping data collection when p drops below 0.05) destroys the meaning of the p-value entirely; pre-registration of analyses is the gold standard remedy. Fourth, the p-value depends on the assumed distribution — if your data violates the underlying assumptions (e.g., non-normal data analyzed with a t-test on a small sample), the p-value can be misleading.
The calculator displays your result against the three most common significance thresholds (α = 0.05, 0.01, and 0.001) at once. This gives you a quick sense of how robust your finding is. A result that is significant at α = 0.05 but not at α = 0.01 is borderline and warrants caution; a result significant at α = 0.001 is strong even by conservative standards. The exact p-value is also shown in scientific notation for extremely small values (where precision matters for meta-analyses or when comparing to literature thresholds).
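A few lines of code reproduce that readout; the p-value here is a made-up input chosen to land in the borderline zone.

```python
p = 0.0123                            # made-up p-value, borderline case
for alpha in (0.05, 0.01, 0.001):
    verdict = "significant" if p <= alpha else "not significant"
    print(f"α = {alpha}: {verdict}")
print(f"exact p = {p:.3e}")           # scientific notation for tiny p-values
```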
For practitioners new to hypothesis testing, the workflow is straightforward: identify the test you ran (Z, t, χ², or F), enter the test statistic, enter the degrees of freedom (if applicable), pick the tail based on your alternative hypothesis, set the significance level, and read the result. The "Try example" button preloads a classic example for each distribution so you can see how the inputs and outputs relate. Together with effect sizes and confidence intervals, the p-value remains a foundational tool in scientific inference — imperfect, often misused, but irreplaceable when applied thoughtfully.
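As a closing sketch, the hypothetical helper below (not the calculator’s actual code, just a SciPy-based illustration) mirrors those inputs: pick the distribution, supply the statistic and degrees of freedom, pick the tail, and read off p.

```python
from scipy.stats import norm, t, chi2, f

def p_value(dist, stat, tail="two", df=None, d1=None, d2=None):
    """dist: 'z', 't', 'chi2', or 'f'; tail: 'left', 'right', or 'two'."""
    if dist == "z":
        rv = norm
    elif dist == "t":
        rv = t(df)
    elif dist == "chi2":
        rv = chi2(df)
    else:
        rv = f(d1, d2)
    if tail == "right":
        return rv.sf(stat)
    if tail == "left":
        return rv.cdf(stat)
    # Doubling assumes a symmetric distribution (Z or t); chi-square
    # and F tests are conventionally right-tailed.
    return 2 * rv.sf(abs(stat))

print(p_value("t", 2.201, df=11))   # classic two-tailed t example, p ≈ 0.050
```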