What is Byte Size?
Byte size measures how much memory or storage space text occupies when encoded as bytes. A byte is 8 bits and is the fundamental unit of digital storage. The number of bytes a string uses depends on the character encoding: a single character might be 1, 2, 3, or 4 bytes depending on what it is and which encoding is used.
Understanding byte size matters for database column sizing, API payload limits, network bandwidth, file storage, and protocol constraints like SMS (140 bytes per message, or 160 GSM-7 characters), HTTP headers (typically an 8 KB limit), and URL length limits (about 2,000 characters in practice).
Character Encoding: UTF-8 vs UTF-16
UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character. ASCII characters (English letters, digits, basic punctuation) use just 1 byte, making UTF-8 very efficient for English text. Accented characters and many scripts use 2 bytes. CJK ideographs use 3 bytes. Emojis and rare characters use 4 bytes. UTF-8 is used by over 98% of websites.
UTF-16 uses 2 or 4 bytes per character. Most common characters (including CJK) use 2 bytes, while supplementary characters (emojis, rare scripts) use 4 bytes via surrogate pairs. UTF-16 is used internally by JavaScript, Java, and Windows. For CJK-heavy text, UTF-16 can be more space-efficient than UTF-8.
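A quick sketch of this tradeoff using Python's built-in `str.encode` (the `"utf-16-le"` codec is used rather than `"utf-16"` so the 2-byte byte-order mark is not counted):

```python
# Compare UTF-8 vs UTF-16 byte sizes for English and CJK text.
english = "Hello, world!"
japanese = "こんにちは"

for label, text in [("English", english), ("Japanese", japanese)]:
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))
    print(f"{label}: {len(text)} chars, {utf8} bytes UTF-8, {utf16} bytes UTF-16")
# English: 13 chars, 13 bytes UTF-8, 26 bytes UTF-16
# Japanese: 5 chars, 15 bytes UTF-8, 10 bytes UTF-16
```

The English string doubles in size under UTF-16, while the Japanese string shrinks from 15 bytes to 10, showing why the better encoding depends on the text.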
ASCII and Unicode
ASCII is the foundation of modern character encoding. Defined in 1963, it maps 128 characters (English letters, digits, punctuation, and control codes) to numbers 0-127, each fitting in a single byte. Unicode expanded this to over 140,000 characters covering every modern writing system, plus symbols, emojis, and historical scripts. UTF-8 and UTF-16 are ways of encoding Unicode code points as byte sequences.
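The distinction between a code point and its encoding can be seen directly in Python: `ord` returns the abstract Unicode number, while each encoding serializes it to different bytes.

```python
# A code point is an abstract number; UTF-8 and UTF-16 are two
# different ways to turn it into bytes.
euro = "€"  # U+20AC
print(hex(ord(euro)))                     # 0x20ac (the code point)
print(euro.encode("utf-8").hex(" "))      # e2 82 ac (3 bytes in UTF-8)
print(euro.encode("utf-16-le").hex(" "))  # ac 20 (2 bytes in UTF-16)
```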
How to Use This Calculator
- Type or paste any text into the input field.
- The calculator instantly shows character count, word count, line count, and byte sizes in both UTF-8 and UTF-16.
- The size comparison shows how your text compares to common reference sizes (SMS, tweet, email, floppy disk).
- Try adding emojis or non-ASCII characters to see how they affect the byte count.
Frequently Asked Questions
How many bytes is a character?
It depends on the character and encoding. In UTF-8, ASCII characters are 1 byte, accented Latin characters are 2 bytes, CJK characters are 3 bytes, and emojis are typically 4 bytes. In UTF-16, most characters are 2 bytes, and emojis are 4 bytes.
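These per-character counts can be verified in a few lines of Python:

```python
# Byte counts per character in UTF-8 and UTF-16 ("utf-16-le"
# avoids counting the byte-order mark).
samples = {
    "A": "ASCII letter",
    "é": "accented Latin",
    "中": "CJK ideograph",
    "\U0001F600": "emoji",
}
for ch, desc in samples.items():
    u8 = len(ch.encode("utf-8"))
    u16 = len(ch.encode("utf-16-le"))
    print(f"{desc}: UTF-8={u8} bytes, UTF-16={u16} bytes")
# ASCII letter: UTF-8=1 bytes, UTF-16=2 bytes
# accented Latin: UTF-8=2 bytes, UTF-16=2 bytes
# CJK ideograph: UTF-8=3 bytes, UTF-16=2 bytes
# emoji: UTF-8=4 bytes, UTF-16=4 bytes
```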
What is UTF-8?
UTF-8 is a variable-width character encoding that can represent every Unicode character. It uses 1 to 4 bytes per character, with ASCII characters taking just 1 byte. UTF-8 is the dominant encoding on the web (over 98% of pages) and is backwards compatible with ASCII.
Why do some characters take more bytes?
UTF-8 is a variable-width encoding that uses fewer bytes for common characters and more for rare ones. ASCII characters use 1 byte. Characters with higher Unicode code points need more bytes to represent their position in Unicode's code space. This is a tradeoff: UTF-8 stays compact for common text while still supporting more than 140,000 characters.
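The variable width is visible in the raw bytes: the leading byte's bit pattern (`0xxxxxxx`, `110xxxxx`, `1110xxxx`, or `11110xxx`) tells a decoder how many bytes follow.

```python
# Show how higher code points produce longer UTF-8 sequences.
for ch in ["A", "é", "中", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
# U+0041 -> 41 (1 bytes)
# U+00E9 -> c3 a9 (2 bytes)
# U+4E2D -> e4 b8 ad (3 bytes)
# U+1F600 -> f0 9f 98 80 (4 bytes)
```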
What's the difference between UTF-8 and ASCII?
ASCII is a 7-bit encoding defining 128 characters: English letters, digits, punctuation, and control characters. UTF-8 is a superset of ASCII. The first 128 characters are identical and use the same single byte. But UTF-8 extends to support all Unicode characters using multi-byte sequences (2-4 bytes per character).
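The backwards compatibility is easy to check: for pure-ASCII text, encoding as ASCII and as UTF-8 yields byte-for-byte identical output.

```python
# Pure-ASCII text produces the same bytes under both encodings.
text = "Hello, ASCII!"
assert text.encode("ascii") == text.encode("utf-8")
print(text.encode("utf-8").hex(" "))
# 48 65 6c 6c 6f 2c 20 41 53 43 49 49 21
```

Any character outside the 128 ASCII characters raises a `UnicodeEncodeError` under the `ascii` codec, which is exactly where UTF-8's multi-byte sequences take over.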
How do emojis affect byte size?
Emojis significantly increase byte size. A single emoji is typically 4 bytes in UTF-8. Compound emojis like family emojis use Zero Width Joiner (ZWJ) sequences that combine multiple code points into one visible character, while flag emojis pair two regional indicator symbols. These sequences can be 15-25+ bytes for a single displayed emoji, so a short message with emojis can be much larger than a longer plain-text message.
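A sketch showing both kinds of multi-code-point emoji (written with escape sequences so the individual code points are visible):

```python
# A family emoji is several emoji joined by U+200D (Zero Width Joiner);
# a flag is two regional indicator symbols. Each renders as one glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"  # man+woman+girl+boy
flag = "\U0001F1FA\U0001F1F8"  # regional indicators U + S

print(len(family), len(family.encode("utf-8")))  # 7 25 (7 code points, 25 UTF-8 bytes)
print(len(flag), len(flag.encode("utf-8")))      # 2 8 (2 code points, 8 UTF-8 bytes)
```

One displayed family emoji here costs 25 bytes, more than a 24-character plain-ASCII sentence.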