Unicode in Computer Network - GeeksforGeeks (2024)

Last Updated : 22 Mar, 2023


Unicode is a universal character encoding standard that provides a comprehensive character set. It was created by the Unicode Consortium (a group of multilingual software manufacturers). Unicode simplifies software localization and improves multilingual text processing, and it overcomes the limited character range inherent in ASCII and extended ASCII. Unicode standardizes script behavior, which allows any combination of characters, drawn from any combination of scripts and languages, to coexist in a single document. Unicode was originally a 2-byte (16-bit) character set. Since version 3 it is treated as a 4-byte code that remains compatible with ASCII: its first 128 code points are identical to ASCII, and its first 256 match ISO 8859-1 (Latin-1), one common form of extended ASCII. The single Unicode character set has multiple encodings, including UTF-7, UTF-8, UTF-16, and UTF-32. These all support encoding the same set of characters, so conversion of data among them is lossless.

  • UTF-8 uses 1 to 4 bytes per character, depending on the character: ASCII characters take only 1 byte, while unusual characters take 4.
  • UTF-16 uses 2 bytes for most characters, while very unusual characters (those outside the basic multilingual plane) take 4.
  • UTF-32 uses 4 bytes per character, so the number of characters in a UTF-32 string can be calculated from its byte count alone (bytes divided by 4).
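
The byte counts above can be checked with a short Python sketch. The sample characters and the helper names `encoded_sizes` and `utf32_length` are illustrative choices, not part of the original article. The "-le" codec variants are used so that no byte order mark is added to the count.

```python
# Sample characters: ASCII, Devanagari (in the BMP), and an emoji outside the BMP.
SAMPLES = ["A", "\u0915", "\U0001F600"]

def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    # The "-le" variants omit the byte order mark, so only the
    # character's own bytes are counted.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

def utf32_length(data):
    """Count code points in BOM-less UTF-32 data: 4 bytes per code point."""
    return len(data) // 4

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

text = "".join(SAMPLES)
# Converting from one encoding to another and back loses nothing.
utf16 = text.encode("utf-16-le")
assert utf16.decode("utf-16-le").encode("utf-8").decode("utf-8") == text
print(utf32_length(text.encode("utf-32-le")))
```

The ASCII letter takes 1 byte in UTF-8, the Devanagari letter takes 3, and the emoji takes 4 in every encoding, which matches the ranges given in the list.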

Code points are written in hexadecimal. In the 32-bit notation U-XXXXXXXX, the numbering goes from U-00000000 to U-FFFFFFFF. Unicode divides the available code space into planes. A plane is a contiguous group of 65,536 code points. The most significant 16 bits define the plane, and the least significant 16 bits select one of up to 65,536 characters or symbols within it. A 16-bit plane field allows 65,536 plane numbers, but the Unicode standard uses only 17 of them (planes 0000 to 0010), so valid code points run from U+0000 to U+10FFFF. Types of Plane –

  1. Basic multilingual plane (BMP) – Plane 0000, the basic multilingual plane, is designed to be compatible with the previous 16-bit Unicode. The most significant 16 bits in this plane are all zeroes. It mostly defines the character sets of different languages, along with some control and special characters. A code point in this plane is written as U+XXXX, where XXXX is the least significant 16 bits, e.g., U+0900 to U+097F are reserved for Devanagari, U+0980 to U+09FF for Bengali, and U+2200 to U+22FF for mathematical operators.
  2. Supplementary multilingual plane (SMP) – Plane 0001, the supplementary multilingual plane, is designed to provide more codes for those multilingual characters that are not included in the BMP. Example: 10140-1018F are reserved for Ancient Greek Numbers.
  3. Supplementary ideographic plane (SIP) – Plane 0002, the supplementary ideographic plane, is designed to provide codes for ideographic symbols, that is, symbols that represent an idea rather than a sound, e.g., 20000-2A6DF are reserved for CJK Unified Ideographs Extension B.
  4. Supplementary special-purpose plane (SSP) – Plane 000E, the supplementary special-purpose plane, is used for special characters, e.g., E0000-E007F are reserved for tags.
  5. Private use planes (PUPs) – Planes 000F and 0010, the private use planes, have no characters assigned by the standard. Fonts and applications use them internally, for example to refer to auxiliary glyphs.
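
As a minimal sketch of the plane layout described above, the plane number is the code point shifted right by 16 bits and the position within the plane is the low 16 bits. The function names `plane_of` and `describe` and the `PLANE_LABELS` table are hypothetical helpers written for this illustration. The labels reuse the abbreviations from the list.

```python
# Abbreviations follow the list of planes above; other planes are left unnamed.
PLANE_LABELS = {
    0x00: "BMP",
    0x01: "SMP",
    0x02: "SIP",
    0x0E: "SSP",
    0x0F: "PUP",
    0x10: "PUP",
}

def plane_of(code_point):
    """Split a code point into (plane number, offset within the plane)."""
    # High 16 bits select the plane, low 16 bits select the position in it.
    return code_point >> 16, code_point & 0xFFFF

def describe(code_point):
    """Format a code point with its plane, offset and plane label."""
    plane, offset = plane_of(code_point)
    label = PLANE_LABELS.get(plane, "not listed")
    return f"U+{code_point:04X}: plane {plane:04X}, offset {offset:04X}, {label}"

for cp in (0x0915, 0x10140, 0x20000, 0xE0001, 0xF0000):
    print(describe(cp))
```

For instance, `describe(0x10140)` places the first Ancient Greek Numbers code point in plane 0001 at offset 0140.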

Advantages:

Universal character set: Unicode supports almost all the characters and symbols used in the world’s writing systems, making it a universal character set that can be used to represent text in any language.

Interoperability: Unicode provides interoperability between different computing systems, platforms, and software applications. This means that text encoded in Unicode can be exchanged and displayed correctly across different systems, regardless of the language or script used.

Compatibility: Unicode is compatible with all the major computing platforms, including Windows, macOS, Linux, and mobile devices. This makes it easy to share and display text across different devices and platforms.

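Efficient storage: Unicode offers a choice of encodings, so storage can be matched to the data. UTF-8 stores ASCII text in 1 byte per character, UTF-16 stores most characters in 2 bytes, and UTF-32 uses a fixed 4 bytes per character, which simplifies indexing at the cost of space.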

Disadvantages:

Complexity: Unicode is a complex encoding standard that can be difficult to implement and use correctly. It requires a significant amount of knowledge and expertise to correctly encode, store, and display text in Unicode.

Compatibility issues with legacy systems: Some legacy systems and software applications may not support Unicode or may not display Unicode characters correctly. This can cause compatibility issues when exchanging text across different systems.

Large character set: Unicode’s large character set can be a disadvantage in some applications, where only a small subset of characters is needed. Compared with a single-byte legacy encoding, non-ASCII text can take more bytes per character, which can result in larger file sizes and increased memory usage.

Localization: While Unicode supports most of the world’s writing systems, it may not be sufficient for some localization requirements, such as the need for specialized symbols or characters that are unique to a particular language or culture.
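
The legacy-system issue listed among the disadvantages can be illustrated with a small Python sketch. It assumes a hypothetical receiver that treats incoming bytes as Latin-1 instead of UTF-8. The sample word and the helper name `fits_in` are chosen for illustration only.

```python
# Devanagari sample word, written with escapes so the source stays ASCII.
text = "\u0928\u092E\u0938\u094D\u0924\u0947"

utf8_bytes = text.encode("utf-8")

# Assumed legacy behavior: every byte is decoded as one Latin-1 character,
# so each 3-byte UTF-8 sequence turns into three unrelated characters.
garbled = utf8_bytes.decode("latin-1")

def fits_in(s, encoding):
    """Return True if every character of s can be stored in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(len(text), len(garbled))
print(fits_in(text, "ascii"), fits_in("hello", "ascii"))

# The damage is reversible only if the wrong decoding step is known.
assert garbled.encode("latin-1").decode("utf-8") == text
```

The garbled string has one character per byte, so it is three times as long as the original text here.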


