Email addresses added to HIBP are extracted from data breaches via a regular expression published in the open-source Email Address Extractor app. The only requirements for an email address to be added to HIBP are that it appears in a breach and adheres to the standard email address format documented in that project. This means that any of the following kinds of address could appear against a domain in a data breach:
- Addresses belonging to accounts that currently exist on the domain
- Addresses belonging to accounts that previously existed on the domain
- Addresses that adhere to the correct email address pattern but never actually existed
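
As a rough illustration of why all three kinds of address end up against a domain, the sketch below extracts anything that looks like an email address from raw text and groups the results by domain. The pattern is a deliberately simplified stand-in and is not the expression used by the Email Address Extractor project; the function name `extract_by_domain` and the sample text are also assumptions made for this example.

```python
import re
from collections import defaultdict

# Simplified, illustrative pattern. This is NOT the regular expression
# published in the Email Address Extractor project.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_by_domain(text):
    """Return a dict mapping each domain to the set of matching addresses."""
    by_domain = defaultdict(set)
    for match in EMAIL_PATTERN.findall(text):
        address = match.lower()  # treat differently cased copies as one address
        domain = address.rpartition("@")[2]
        by_domain[domain].add(address)
    return dict(by_domain)


# Hypothetical breach text: a real address, a differently cased duplicate,
# a mistyped alias, an address on another domain and a malformed token.
sample = (
    "alice@example.com, ALICE@example.com; "
    "typo.alcie@example.com bob@other.org not-an-email@"
)
result = extract_by_domain(sample)
```

Nothing in the matching step checks whether a mailbox exists, so the mistyped alias is collected alongside the genuine address.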
Because there is no automated way to validate at scale whether a mailbox exists on a domain, some domains may show higher counts than expected. Addresses that don't currently exist may belong to previous employees, result from someone mistyping an alias or, in some cases, be aliases fabricated for spam purposes. To account for fabricated aliases, some breaches are flagged as spam lists and their counts are excluded from the calculations used for the domain search subscription size.
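
The exclusion of spam lists from the count can be sketched as follows. This is a minimal illustration under stated assumptions, not HIBP's implementation: the `Breach` class, its `is_spam_list` field, the `domain_size` function and the breach names are all hypothetical, and the sketch assumes an address seen in several breaches is counted once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breach:
    """Hypothetical record of one breach; not an HIBP data structure."""
    name: str
    is_spam_list: bool
    addresses: frozenset


def domain_size(breaches, domain):
    """Count distinct addresses on `domain`, skipping breaches flagged as spam lists."""
    suffix = "@" + domain.lower()
    counted = set()
    for breach in breaches:
        if breach.is_spam_list:
            continue  # spam lists do not contribute to the subscription size
        for address in breach.addresses:
            normalised = address.lower()
            if normalised.endswith(suffix):
                counted.add(normalised)  # assumption: duplicates count once
    return len(counted)


# Made-up data for illustration only.
breaches = [
    Breach("ExampleForum", False, frozenset({"alice@example.com", "bob@example.com"})),
    Breach("ExampleShop", False, frozenset({"alice@example.com", "carol@other.org"})),
    Breach("ExampleSpamList", True, frozenset({"x1@example.com", "x2@example.com"})),
]
size = domain_size(breaches, "example.com")
```

With this made-up data, the two fabricated aliases in the spam list are ignored and the repeated address is counted once, so `size` is 2.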
The total size of a domain across all data breaches is the only practical metric we have available for calculating the subscription tier it belongs to, which makes it the key input when assigning the domain to a commensurate product.