How To Validate Email Addresses in C#?

Numerous applications rely on email addresses to recognize users. Since email addresses are unique, they serve as a reliable way of verifying the authenticity of a user. However, sending emails to incorrect addresses can result in errors. This is why the format of an email address is typically checked after it is entered. While some developers utilize RegEx for this purpose, is it truly the best practice?

A Tour Around The World Wide Web

Obviously, a lot of programmers have reached out to the community for help with email address validation, and the web is filled with potential solutions.

The RegEx Approach

Many of these responses rely on RegEx. An old article from Wired lists four different expressions for this purpose.

Dirt-simple approach:

.+\@.+\..+

Slightly more strict (but still simple) approach:

[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}

Specify all the domain extensions approach:

([a-z0-9][-a-z0-9_\+\.]*[a-z0-9])@([a-z0-9][-a-z0-9\.]*[a-z0-9]\.(arpa|root|aero|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)|([0-9]{1,3}\.{3}[0-9]{1,3}))

The first two expressions are quite simple but lack strictness. The third option is stricter, though it requires updates every time a new domain extension is released. Since the article was written in 2008, many new domain extensions have been introduced. Additionally, the second expression needs to be updated, as domain extensions can now be longer than four characters.
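
The length limit in the second expression is easy to demonstrate. The following is a minimal sketch, assuming the expression is anchored with ^ and $ and matched case-insensitively (both are assumptions, since the list above only shows the bare pattern). The class and method names are illustrative only.

```csharp
using System.Text.RegularExpressions;

public static class TldLengthDemo
{
    // The "slightly more strict" pattern, anchored so that it has to match
    // the whole input (the anchors are an assumption for this sketch).
    private const string Pattern = @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$";

    // Returns true when the input matches the pattern, ignoring case.
    public static bool Matches(string input) =>
        Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase);
}
```

With this sketch, john@example.com matches, while john@example.museum is rejected because the extension has six letters.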

The article also mentions an expression from a Perl module with almost 6500 characters. Just for the fun of it, I tried to use the RegEx source generator in .NET to generate C# code for the expression:

[GeneratedRegex("(? ... )?;\\s*)", RegexOptions.Compiled)]
private static partial Regex IsEmailAddressRegex();

The final output was around 30000 lines of C# code, which included roughly 1300 lines of XML docs that described the process. Generation and execution were surprisingly fast. However, I would advise against using this, as the ServiceHub.RoslynCodeAnalysisService.exe task seems to have a memory leak, causing memory usage to grow by 100 MB per second with around 10% CPU usage, and it doesn’t seem to stop.

These regular expressions are quite old, and more effective alternatives may have been created since then. Nevertheless, the takeaway is that they require regular maintenance: just because an expression was effective in the past doesn’t mean it is still suitable today. A complex expression also carries the risk of a ReDoS attack (regular expression denial of service), in which a crafted input makes the matching engine backtrack excessively and consume CPU time.
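
If a regular expression is used anyway, one common mitigation in .NET is to pass a match timeout. The following is only a sketch: the helper name IsMatchWithTimeout is hypothetical, and treating a timeout as "no match" is a design assumption that may not fit every application.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegexDemo
{
    // Hypothetical helper: runs the match with a time limit and treats
    // a timeout as "no match" instead of letting the thread keep spinning.
    public static bool IsMatchWithTimeout(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up after the timeout elapsed.
            return false;
        }
    }
}
```

A call such as SafeRegexDemo.IsMatchWithTimeout(input, pattern, TimeSpan.FromMilliseconds(250)) then returns within a bounded time, even for an input that triggers heavy backtracking.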

Using the MailAddress Class

Who knows better whether an input is an email address than the System.Net.Mail.MailAddress class itself? This approach is mentioned in several of the answers. The class has a TryCreate method, which parses a given input and returns whether it is an email address. The method can even split a display name from an email address (the format you get when an address is copied from Outlook). The input "John Doe" <john@doe.com> would result in a MailAddress object with DisplayName “John Doe” and Address “john@doe.com”.

Unfortunately, this extended parsing logic leads to an unexpected result. While john@doe.com is obviously an email address, the actual input, "John Doe" <john@doe.com>, isn’t. MailAddress.TryCreate would return true anyway. By comparing the input with the Address property, we can solve this problem:

public static bool IsEmailAddress(string? input)
{
    return (MailAddress.TryCreate(input, out MailAddress? mailAddress) &&
            mailAddress.Address == input);
}
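
To make the display name behavior visible, here is a small sketch that exposes both parsed parts. The class and method names (MailAddressDemo, TrySplit) are made up for illustration; only MailAddress.TryCreate, DisplayName and Address come from the framework.

```csharp
using System.Net.Mail;

public static class MailAddressDemo
{
    // Splits an input into display name and address using MailAddress.TryCreate.
    // Returns false (and empty strings) when the input cannot be parsed.
    public static bool TrySplit(string input, out string displayName, out string address)
    {
        if (MailAddress.TryCreate(input, out MailAddress? parsed))
        {
            displayName = parsed.DisplayName;
            address = parsed.Address;
            return true;
        }

        displayName = string.Empty;
        address = string.Empty;
        return false;
    }
}
```

For the input "John Doe" <john@doe.com>, the parsed address is john@doe.com, which differs from the full input. That difference is exactly what the comparison with the Address property detects.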

Using the EmailAddressAttribute Class

A more creative solution is using System.ComponentModel.DataAnnotations.EmailAddressAttribute. This is the Attribute that is used to validate a property value in the model binding process.

public static bool IsEmailAddress(string? input)
{
    if (input == null)
    {
        return false;
    }

    EmailAddressAttribute attribute = new EmailAddressAttribute();
    return attribute.IsValid(input);
}

Although it works, an Attribute is not intended to be used that way.
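
For comparison, this is roughly how the attribute is meant to be used: it decorates a model property, and a validator evaluates it. The sketch below uses the Validator class from System.ComponentModel.DataAnnotations outside of model binding. SignUpModel and ModelValidationDemo are hypothetical names for this example.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical model with an annotated property.
public class SignUpModel
{
    [EmailAddress]
    public string? Email { get; set; }
}

public static class ModelValidationDemo
{
    // Validates all annotated properties of the model and
    // reports whether every attribute accepted its value.
    public static bool IsValid(SignUpModel model)
    {
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(
            model, new ValidationContext(model), results, validateAllProperties: true);
    }
}
```

Note that a null value passes this validation, because EmailAddressAttribute leaves the null check to RequiredAttribute. This is why the IsEmailAddress method above checks for null itself.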

This class has a long history. Initially, in the .NET Framework, it used RegEx for input validation. However, this changed in 2015 due to the vulnerability advisory CVE-2015-2526, which pointed out that the expression could trigger denial-of-service attacks, as I previously mentioned. The validation was then changed to a simpler logic: the input must contain exactly one @, and it cannot be at the start or end. This basic logic is still in use today, although it has received some performance improvements.

Not everyone is happy with this simplified solution, since it allows email addresses like x@y or x!x@x.x, but this brings us to the next chapter.

What is a Valid Email Address?

The @ symbol was introduced back in 1971 to separate the local part from the host when sending messages between two computers on the Arpanet. The format of email addresses as we know it today was, building on earlier RFCs, specified in Section 3.4 of RFC 2822 in 2001 and revised by RFC 5322 in 2008. Since then, an email address has been made up of a local part, the @ symbol, and a domain. But an email address can be more complicated than expected.

Valid Domain Parts

The domain part may be a domain like gmail.com or outlook.com. Surprisingly, it can also be an IP address, which must be surrounded by square brackets. Therefore, the following email addresses are possible:

  • info@[123.123.123.123]
  • info@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]

I tried to send an email to my outlook.com address but used only the IPv4 address. The Office Outlook app didn’t even let me send the email because the format of the email address is not supported. The Outlook online app in the browser on the other hand was ok with the IP address. However, the sending provider (also outlook.com) was not able to process the email. I didn’t try it with other providers, but I expect to get similar results.

Knowing that the domain could also be an IP address makes the validation more complicated. Of course, there are good reasons to disallow email addresses with IPs in the domain part, since most mail providers probably won’t support IP domains. But at least the domain part is case-insensitive, unlike the local part.
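
If you decide to disallow IP domains, they first have to be detected. The following sketch assumes the bracket notation shown above, including the "IPv6:" prefix, and relies on IPAddress.TryParse for the actual parsing. The names IpDomainDemo and HasIpLiteralDomain are hypothetical, and the parser is more lenient than the mail specification, so treat the result as a hint rather than a strict check.

```csharp
using System;
using System.Net;

public static class IpDomainDemo
{
    // Returns true when the part after the last @ is a bracketed IP literal,
    // for example [123.123.123.123] or [IPv6:2001:db8::1].
    public static bool HasIpLiteralDomain(string address)
    {
        int at = address.LastIndexOf('@');
        if (at < 0)
        {
            return false;
        }

        string domain = address.Substring(at + 1);
        if (domain.Length < 3 || domain[0] != '[' || domain[^1] != ']')
        {
            return false;
        }

        // Strip the brackets and the optional "IPv6:" tag before parsing.
        string literal = domain.Substring(1, domain.Length - 2);
        if (literal.StartsWith("IPv6:", StringComparison.OrdinalIgnoreCase))
        {
            literal = literal.Substring("IPv6:".Length);
        }

        return IPAddress.TryParse(literal, out _);
    }
}
```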

Valid Local Parts

The local part of the email address is everything before the last @ symbol. And yes, it can be case-sensitive. The local part can contain letters, digits and a set of special characters. The local part can also contain spaces, horizontal tabs, brackets and even @ symbols. However, these symbols must be in quotes. These are possible email addresses:

  • john/doe@example.com
  • " "@example.org
  • c#@example.org
  • "b@man & robin! ;-)"@example.com

Only ASCII characters are permitted for email addresses, but some providers also support SMTPUTF8 which allows UTF-8 characters (including emojis).

Just because the specification allows email addresses to contain spaces and special characters doesn’t mean that mail providers support them. Handling the local part is entirely up to the provider. Most providers don’t treat the local part as case-sensitive. Gmail even ignores dots, so the following email addresses all go to the same mailbox:

  • john.doe@gmail.com
  • JohnDoe@gmail.com
  • J.o.h.n.D.o.e@gmail.com

Gmail and other providers also support sub-addressing, which allows you to add a tag to the email address with a + symbol: john.doe+amazon@gmail.com. This can be used to identify where you entered an email address. The specification also allows comments in parentheses in the local part and the domain: john.doe(amazon)@(this)example.com. This is again something that most providers don’t support, and the Outlook app doesn’t even allow us to send emails to domains with comments.

The local part of email addresses can be rather complex, with each provider having its own unique set of rules. It’s unlikely you’ll encounter a genuine email address that includes spaces, tabs, or quotes, as most people would likely abandon such an address because of the problems they encounter. However, this article focuses on the validation of email addresses, and indeed, unusual email addresses can exist. This Wikipedia article describes more rules and has numerous examples as well.

Length

The length of an email address is easy to validate, and the check is required if the address is stored in a fixed-size SQL database column. There are several different specifications for the length. A StackOverflow answer references RFC 3696, which specifies a maximum length of 254 characters. Another possible maximum length is 320 characters, according to RFC 5321. The length of 320 is the sum of 64 for the local part, 1 for the @ symbol and 255 for the domain.

Checking the length is recommended, against a limit of either 254 or 320 characters. As a reference, the local part of outlook.com addresses is limited to 65 characters (total size 77), while gmail.com only allows 30 characters (total size 40).
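
The limits mentioned above can be sketched as a small helper. It assumes the local part is everything before the last @ and uses 254 as the default total limit; pass 320 if you prefer the other interpretation. The names LengthCheckDemo and HasAcceptableLength are hypothetical.

```csharp
public static class LengthCheckDemo
{
    // Checks the total length and the lengths of the local part (max. 64)
    // and the domain (max. 255). Returns false when there is no @ at all,
    // because the parts cannot be determined in that case.
    public static bool HasAcceptableLength(string address, int maxTotalLength = 254)
    {
        if (address.Length > maxTotalLength)
        {
            return false;
        }

        int at = address.LastIndexOf('@');
        if (at < 0)
        {
            return false;
        }

        int localLength = at;
        int domainLength = address.Length - at - 1;

        return localLength <= 64 && domainLength <= 255;
    }
}
```

With the default limit of 254, the domain check can never fail on its own. It only becomes relevant when the total limit is raised to 320.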

Applying the New Insights

Can we make sure that an email address is valid? No, but we can make good guesses. Depending on your use case, you can check for the widely used rules (no IP domain and limited character set for the local part) or you can just check for the existence of an @ symbol. To really validate an email address, you must be able to send an email to that address and the expected receiver must be able to receive the email. Even a simple email address could have a typo and therefore reach the wrong receiver. If you can send it and the correct receiver can receive it, the format shouldn’t matter.

With Postel’s Law in mind (“be conservative in what you do, be liberal in what you accept from others”), I would use something simple like the check in EmailAddressAttribute, but as a single method in a static class:

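// Requires: using System; using System.Diagnostics.CodeAnalysis;
// ContainsAny on spans is available in .NET 8 or later.
public static bool IsEmailAddress([NotNullWhen(true)] string? input)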
{
    if (input == null)
    {
        return false;
    }

    int inputLength = input.Length;

    if (inputLength < 3 || inputLength > 254)
    {
        return false;
    }

    ReadOnlySpan<char> inputAsSpan = input.AsSpan();

    if (inputAsSpan.ContainsAny('\r', '\n'))
    {
        return false;
    }

    int indexOfAtSymbol = inputAsSpan.IndexOf('@');

    return (indexOfAtSymbol > 0 && indexOfAtSymbol < inputLength - 1);
}

This method can be easily implemented in any programming language, runs efficiently, and is immune to RegEx attacks. Yes, it is trivial to enter an invalid email address, but it also won’t block an unexpected, valid address.

When an email address is stored, it is important not to alter its case to either lower or upper, as this may affect the intended recipient. However, when searching or comparing email addresses, make sure to handle them in a case-insensitive manner.
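
A minimal sketch of this advice: keep the stored string untouched and apply case-insensitivity only in the comparison. Using an ordinal, case-insensitive comparison is an assumption of this example, and the names EmailComparisonDemo, AreSameAddress and CreateAddressSet are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EmailComparisonDemo
{
    // Compares two addresses without changing either of them.
    public static bool AreSameAddress(string first, string second) =>
        string.Equals(first, second, StringComparison.OrdinalIgnoreCase);

    // Creates a set that treats addresses differing only in case as duplicates,
    // while keeping the casing of the first address that was added.
    public static HashSet<string> CreateAddressSet() =>
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
}
```

Adding John@doe.com and then john@DOE.com to such a set keeps only the first entry, with its original casing.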

Conclusion

In this article, I discussed the complexity of email addresses. Clearly, validating them can be difficult. The final validation method is not very sophisticated, but it accomplishes its goal. Depending on the use case, more checks can be implemented. However, users still can make typos or enter non-existent email addresses. If an email can be sent and received, the specific format shouldn’t be a concern.