Hash Collision Probability Calculator
Calculate the probability of hash collisions using the birthday problem formula. Compare MD5 (128-bit), SHA-1, SHA-256, and SHA-512 collision risk for your dataset size.
How to Use This Calculator
- Select the hash bit length (128=MD5, 160=SHA-1, 256=SHA-256, 512=SHA-512).
- Enter the number of items you plan to hash.
- Read the collision probability and safety assessment instantly.
- Use the Hash Comparison tab to compare all four algorithms at once.
- Use the Probability Threshold tab to find how many items produce a given collision risk.
- The Professional tab adds a GUID comparison and NIST security recommendations.
Formula
P(collision) ≈ 1 − e^(−n²/(2·2^bits))
50% threshold n ≈ √(2 · 2^bits · ln 2)
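The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function names `collision_probability` and `items_for_probability` are hypothetical, and the second function generalizes the 50% threshold to an arbitrary target probability by inverting the same approximation.

```python
import math


def collision_probability(n_items, bits):
    """Birthday approximation: P ≈ 1 - exp(-n² / (2 · 2^bits))."""
    exponent = n_items**2 / (2 * 2**bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny,
    # where the naive form would round to exactly 0.0.
    return -math.expm1(-exponent)


def items_for_probability(p, bits):
    """Invert the approximation: n ≈ sqrt(2 · 2^bits · ln(1 / (1 - p)))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # -log1p(-p) equals ln(1 / (1 - p)); for p = 0.5 this is ln 2.
    return math.sqrt(2 * 2**bits * -math.log1p(-p))


# Compare the four hash lengths listed above for a 1-billion-item dataset.
for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256), ("SHA-512", 512)]:
    p = collision_probability(10**9, bits)
    n50 = items_for_probability(0.5, bits)
    print(f"{name:8s} P(1e9 items) = {p:.3e}   50% threshold = {n50:.3e}")
```

Using `expm1` and `log1p` instead of `exp` and `log` is a design choice for this sketch: collision probabilities for long hashes are far below floating-point epsilon, so the naive `1 - exp(-x)` would simply print zero.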
Example
SHA-256 (256-bit), 1 billion items: P ≈ 4.3×10⁻⁶⁰ (negligible). MD5 (128-bit) 50% threshold: ~2.2×10¹⁹ items.
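As a check on the MD5 figure, the 50% threshold can be derived by setting the collision formula equal to one half and solving for n. The symbol b below is shorthand for the hash length in bits (an assumption of this sketch; the page itself writes "bits").

```latex
\begin{aligned}
1 - e^{-n^{2}/(2\cdot 2^{b})} &= \tfrac{1}{2} \\
\frac{n^{2}}{2\cdot 2^{b}} &= \ln 2 \\
n &= \sqrt{2\cdot 2^{b}\cdot \ln 2} \\
b = 128:\quad n &= 2^{64}\sqrt{2\ln 2}
  \approx 1.845\times 10^{19}\times 1.177
  \approx 2.17\times 10^{19}
\end{aligned}
```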
Frequently Asked Questions
- A hash collision occurs when two different inputs produce the same hash output. Hash functions like MD5 and SHA-256 map arbitrary data to a fixed-length fingerprint: MD5 produces 128 bits (a 32-character hex string), SHA-256 produces 256 bits (64 hex characters). Because infinitely many inputs map to finitely many outputs, collisions must mathematically exist (by the pigeonhole principle). In security contexts, collisions matter enormously: digital signatures rely on collision resistance, so if an attacker crafts a malicious document with the same hash as a legitimate one, a signature on one is equally valid for the other. Collisions can also degrade hash-based data structures and defeat checksums or content-addressed deduplication. The severity depends on whether collisions can be deliberately crafted (a cryptographic weakness) or only arise randomly (birthday probability), with the former being far more dangerous. A practical SHA-1 collision was demonstrated in 2017 by the SHAttered project from CWI Amsterdam and Google.
- The birthday problem asks: how many people must be in a room before there is a 50% chance that two share a birthday? The answer, just 23 people for 365 days, surprises most people because every pair is compared, not one person against all the others. Pairs grow as n(n−1)/2 ≈ n²/2, so 23 people yield 253 pairs, enough for slightly better than even odds of a match. Hash collisions follow the same math: with a hash space of H = 2^bits, roughly √(2H·ln 2) hashed items are needed for a 50% collision chance. For MD5 (2^128), that threshold is roughly 2.2×10¹⁹, which is enormous but exponentially smaller than the full 2^128 space. This "birthday bound" is why a hash needs approximately twice the bit length of the desired collision security level. A 128-bit hash provides only 64-bit birthday security, which is fine against accidental collisions in most applications, while SHA-256 (128-bit birthday security) is preferred for sensitive use cases.
- MD5 is cryptographically broken and must not be used for security purposes. In 2004, Xiaoyun Wang and co-authors demonstrated practical MD5 collisions that could be computed within hours. By 2008, researchers had created a rogue SSL certificate authority certificate by exploiting MD5 collisions. Today MD5 collision pairs can be generated in seconds on consumer hardware. RFC 6151 (2011) states that MD5 is no longer acceptable where collision resistance is required, such as digital signatures, and that new protocol designs should not use HMAC-MD5. MD5 is also not a NIST-approved hash function, so NIST guidance such as SP 800-131A does not permit it for digital signatures. However, MD5 remains acceptable for non-security purposes: detecting accidental file corruption, verifying download integrity where tampering is not part of the threat model, or as a non-cryptographic hash in hash tables and deduplication of trusted data. The key distinction is accidental corruption (MD5 adequate) versus adversarial attack (MD5 broken). For passwords, never use MD5 or SHA-256 directly; use bcrypt, Argon2id, or scrypt, which are designed to be slow and to resist GPU acceleration.
- The 50% collision threshold scales as √(2 · 2^bits · ln 2), so it doubles with every two added bits of hash length (each extra bit multiplies it by √2). For MD5 (128-bit): ~2.2×10¹⁹ items. For SHA-1 (160-bit): ~1.4×10²⁴ items. For SHA-256 (256-bit): ~4.0×10³⁸ items, far more than any system could ever hash. In practice, typical applications hash far fewer items. A database with 1 billion (10⁹) records using SHA-256 has a collision probability of approximately 4×10⁻⁶⁰, which is completely negligible. UUID/GUID v4 uses 122 random bits; you would need about 2.7×10¹⁸ GUIDs before reaching 50% collision probability, far beyond any realistic system. Only applications hashing astronomical numbers of items need to worry about birthday collisions with SHA-256 or stronger hash functions.
- The right choice depends heavily on the security requirement. For cryptographic integrity (digital signatures, TLS, Git objects): SHA-256, part of the NIST-standardized SHA-2 family (FIPS 180-4), is widely trusted. SHA-3 (Keccak) is built on a structurally different sponge construction, which makes it a useful hedge if SHA-2 weaknesses emerge. For password storage: never use raw SHA-256; use bcrypt, Argon2id, or scrypt, which are deliberately slow (and, for Argon2id and scrypt, memory-hard) to thwart GPU brute-force. For checksums and non-adversarial integrity: MD5 and CRC32 are fast and adequate. For high-performance applications (hash tables, deduplication of trusted data): xxHash, MurmurHash3, and CityHash achieve several GB/s throughput but offer no cryptographic security. For message authentication: HMAC-SHA-256 provides both integrity and authenticity. BLAKE3 is a strong modern choice combining cryptographic security with very high performance; it typically outperforms SHA-256 on modern CPUs while targeting a comparable 128-bit security level.
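Two of the recommendations in these answers, scrypt for password storage and HMAC-SHA-256 for message authentication, can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the scrypt cost parameters are illustrative values rather than a vetted recommendation, and `hashlib.scrypt` is only available when Python is built against a suitable OpenSSL.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative scrypt cost parameters (an assumption; tune for your hardware).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a deliberately slow, salted hash; returns (salt, digest)."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32,
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the scrypt digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag (32 bytes) over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign(key, message), tag)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))

key = os.urandom(32)
tag = sign(key, b"transfer 10 units")
print(verify(key, b"transfer 10 units", tag))
```

Both verification helpers use `hmac.compare_digest` rather than `==` so that comparison time does not depend on how many leading bytes match.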