
Normalization and Encoding

This page explains how to normalize and encode personal data. When working with EUID, both steps must be performed correctly.

Introduction

When you take user information, such as an email address, and use it to create a raw EUID or an EUID advertising token, you must follow every required step exactly, whether or not you normalize or hash the information yourself. Doing so ensures that the EUID value you create can be securely and anonymously matched with other instances of online behavior by the same user.

important
  • Raw EUIDs, and their associated EUID tokens, are case sensitive. When working with EUID, it's important to pass all IDs and tokens without changing the case. Mismatched IDs can cause ID parsing or token decryption errors.
  • If you miss any of the required steps (for example, you hash without first normalizing), the result will not be the correct EUID value for the input data.
    For example, let's say a data provider wants to generate an EUID from JANESaoirse@gmail.com. This normalizes to janesaoirse@gmail.com, and the hashed and Base64-encoded value is ku4mBX7Z3qJTXWyLFB1INzkyR2WZGW4ANSJUiW21iI8=.
    A publisher with the same email address mistakenly skips normalization. The hashed and Base64-encoded value for the un-normalized email, JANESaoirse@gmail.com, is f8upG1hJazYKK8aEtAMq3j7loeAf5aA4lSq6qYOBR/w=. These two different values result in two different EUIDs. The first, processed correctly, matches other instances generated from the same original data. The second, incorrectly processed, does not.
    In this scenario, because the EUID does not match other instances for the same user, the publisher misses the opportunity to benefit from targeted advertising.
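The mismatch described above can be reproduced with a minimal Python sketch. The helper name `email_hash` is hypothetical and chosen only for illustration. For this particular address, lowercasing is the only normalization step that changes the input.

```python
import base64
import hashlib


def email_hash(email: str) -> str:
    """Base64-encode the raw SHA-256 digest of the UTF-8 bytes of `email`."""
    digest = hashlib.sha256(email.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


original = "JANESaoirse@gmail.com"

# Data provider: normalizes first (here, trimming and lowercasing suffice).
correct = email_hash(original.strip().lower())

# Publisher: hashes the address exactly as entered, skipping normalization.
skipped = email_hash(original)

print(correct)
print(skipped)
print(correct == skipped)  # False: the two parties end up with different values
```

Because SHA-256 is sensitive to every input byte, a single uppercase character is enough to produce an unrelated hash.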

Types of Personal Data

EUID supports the following type of personal data:

  • Email address

Email Address Normalization

If you send unhashed email addresses to the EUID Operator Service, the service normalizes the email addresses and then hashes them. If you want to hash the email addresses yourself before sending them, you must normalize them before you hash them.

important

Normalizing before hashing ensures that the generated EUID value will always be the same, so that the data can be matched. If you do not normalize before hashing, this might result in a different EUID, reducing the effectiveness of targeted advertising.

To normalize an email address, complete the following steps:

  1. Remove leading and trailing spaces.
  2. If there are uppercase characters, convert them to lowercase.
  3. In gmail.com addresses only:
    1. If there are periods (.) before the @ sign (ASCII decimal code 46/UTF-8 hexadecimal code 2E), remove them.

      For example, normalize jane.doe@gmail.com to janedoe@gmail.com.

    2. If there is a plus sign (+) with an additional string after it, before the @gmail.com, remove the plus sign (+) (ASCII decimal code 43/UTF-8 hexadecimal code 2B) and all subsequent characters.

      For example, normalize janedoe+home@gmail.com to janedoe@gmail.com.
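The steps above can be sketched in Python as follows. This is an illustrative sketch, not the EUID Operator Service implementation: the function name `normalize_email` is hypothetical, `str.strip()` removes all surrounding whitespace rather than spaces only, and a plus sign with nothing after it is also dropped, which the steps do not explicitly cover.

```python
def normalize_email(email: str) -> str:
    """Normalize an email address following the steps listed above."""
    # Steps 1 and 2: trim surrounding whitespace, then lowercase.
    email = email.strip().lower()

    # Split on the last "@" so only the part before the domain is changed.
    local, sep, domain = email.rpartition("@")

    # Step 3: gmail.com addresses only.
    if sep and domain == "gmail.com":
        # Drop the plus sign and everything after it, then remove periods.
        local = local.split("+", 1)[0].replace(".", "")

    return f"{local}{sep}{domain}"


print(normalize_email("  Jane.Doe@gmail.com "))         # janedoe@gmail.com
print(normalize_email("janedoe+home@gmail.com"))        # janedoe@gmail.com
print(normalize_email("JaneSaoirse+Work@example.com"))  # janesaoirse+work@example.com
```

Input without an @ sign is returned trimmed and lowercased; validating the address format is outside the scope of this sketch.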

warning

Make sure that the normalized email is encoded as UTF-8, not another encoding such as UTF-16.
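To illustrate why the encoding matters, the short Python sketch below hashes the same normalized address from its UTF-8 bytes and from its UTF-16 bytes. The variable names are illustrative only.

```python
import hashlib

email = "janesaoirse@gmail.com"

# Same characters, different byte sequences.
utf8_bytes = email.encode("utf-8")    # one byte per character for this address
utf16_bytes = email.encode("utf-16")  # byte order mark plus two bytes per character

utf8_hex = hashlib.sha256(utf8_bytes).hexdigest()
utf16_hex = hashlib.sha256(utf16_bytes).hexdigest()

print(len(utf8_bytes), len(utf16_bytes))
print(utf8_hex == utf16_hex)  # False: the hash depends on the bytes, not the characters
```

Languages whose strings are UTF-16 internally typically need an explicit conversion to UTF-8 bytes before hashing.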

For examples of various scenarios, see Normalization Examples for Email.

Email Address Hash Encoding

An email hash is a Base64-encoded SHA-256 hash of a normalized email address. The email address is first normalized, then hashed using the SHA-256 hashing algorithm, and then the resulting bytes of the hash value are encoded using Base64 encoding. Note that the Base64 encoding is applied to the bytes of the hash value, not the hex-encoded string representation.

| Type | Example | Comments and Usage |
| --- | --- | --- |
| Normalized email address | user@example.com | Normalization is always the first step. |
| SHA-256 hash of normalized email address | b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514 | This 64-character string is a hex-encoded representation of the 32-byte SHA-256. |
| Hex to Base64 SHA-256 encoding of normalized email address | tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ= | This 44-character string is a Base64-encoded representation of the 32-byte SHA-256.<br>WARNING: The SHA-256 hash string in the example above is a hex-encoded representation of the hash value. You must Base64-encode the raw bytes of the hash or use a Base64 encoder that takes a hex-encoded value as input.<br>Use this encoding for email_hash values sent in the request body. |

important

When applying Base64 encoding, be sure to Base64-encode the raw bytes of the hash or use a Base64 encoder that takes a hex-encoded value as input.
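The following Python sketch shows the hash encoding for an already normalized address, using only the standard library `hashlib` and `base64` modules. It also shows the common mistake of Base64-encoding the hex string itself. The variable names are illustrative.

```python
import base64
import hashlib

normalized = "user@example.com"  # assumed to be normalized already

digest = hashlib.sha256(normalized.encode("utf-8")).digest()  # 32 raw bytes
hex_string = digest.hex()  # 64-character hex representation of the same bytes

# Correct: Base64-encode the raw digest bytes.
email_hash_value = base64.b64encode(digest).decode("ascii")

# Also correct: convert the hex string back to bytes before encoding.
from_hex = base64.b64encode(bytes.fromhex(hex_string)).decode("ascii")

# Incorrect: Base64-encoding the hex text itself yields an 88-character value.
wrong = base64.b64encode(hex_string.encode("ascii")).decode("ascii")

print(hex_string)
print(email_hash_value)
print(email_hash_value == from_hex)       # True
print(len(email_hash_value), len(wrong))  # 44 88
```

A quick sanity check on any implementation is the output length: a correctly encoded value is always 44 characters, ending in a single `=` padding character.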

For additional examples, see Normalization Examples for Email.

Normalization Examples for Email

The following table shows examples of original email addresses and the normalized and hashed values.

Some of the examples show email addresses that include the plus sign (+), with different domains. For gmail.com addresses, the plus sign and the characters that follow it, up to the @ sign, are removed during normalization. For other domains, these characters are kept in the normalized value.

| Original Value | Normalized | Hashed and Base64-Encoded |
| --- | --- | --- |
| MyEmail@example.com<br>MYEMAIL@example.com | myemail@example.com | Hashed: 16c18d336f0b250f0e2d907452ceb9658a74ecdae8bc94864c23122a72cc27a5<br>Base64-Encoded: FsGNM28LJQ8OLZB0Us65ZYp07NrovJSGTCMSKnLMJ6U= |
| My.Email@example.com | my.email@example.com | Hashed: e22b53bc6f871274f3a62ab37a3caed7214fc14d676215a96a242fcfada1c81f<br>Base64-Encoded: 4itTvG+HEnTzpiqzejyu1yFPwU1nYhWpaiQvz62hyB8= |
| JANESAOIRSE@example.com<br>JaneSaoirse@example.com | janesaoirse@example.com | Hashed: d6670e7a92007f1b5ff785f1fc81e53aa6d3d7bd06bdf5c473cdc7286c284b6d<br>Base64-Encoded: 1mcOepIAfxtf94Xx/IHlOqbT170GvfXEc83HKGwoS20= |
| jane.saoirse@example.com<br>Jane.Saoirse@example.com | jane.saoirse@example.com | Hashed: b196432c7b989a2ca91c83799957c515da53e6c13abf20b78fea94f117e90bf8<br>Base64-Encoded: sZZDLHuYmiypHIN5mVfFFdpT5sE6vyC3j+qU8RfpC/g= |
| JaneSaoirse+Work@example.com | janesaoirse+work@example.com | Hashed: 28aaee4815230cd3b4ebd88c515226550666e91ac019929e3adac3f66c288180<br>Base64-Encoded: KKruSBUjDNO069iMUVImVQZm6RrAGZKeOtrD9mwogYA= |
| JANE.SAOIRSE@gmail.com<br>Jane.Saoirse@gmail.com<br>JaneSaoirse+Work@gmail.com | janesaoirse@gmail.com | Hashed: 92ee26057ed9dea2535d6c8b141d48373932476599196e00352254896db5888f<br>Base64-Encoded: ku4mBX7Z3qJTXWyLFB1INzkyR2WZGW4ANSJUiW21iI8= |