Unicode in Computer Network - GeeksforGeeks (2024)

Last Updated : 22 Mar, 2023


Unicode is a universal character encoding standard, created by the Unicode Consortium (a group of multilingual software manufacturers), that provides a single comprehensive character set. Unicode simplifies software localization and improves multilingual text processing. It overcomes the limitations of ASCII and extended ASCII, which can represent only 128 and 256 characters respectively. Unicode standardizes script behavior, which allows any combination of characters, drawn from any combination of scripts and languages, to coexist in a single document.

Unicode defines multiple encodings of its single character set. UTF-8, UTF-16, and UTF-32 are the standard encoding forms, and UTF-7 (defined separately in RFC 2152) is a rarely used variant for 7-bit channels. All of them encode the same set of characters, so conversion of data among these encodings is lossless. Unicode was originally a 2-byte (16-bit) character set. By version 3, however, its code space had been extended beyond 16 bits, so a character may need up to 4 bytes when encoded. Unicode is compatible with ASCII: its first 128 code points are identical to ASCII, and its first 256 match ISO 8859-1 (Latin-1), a common form of extended ASCII.
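The claims that conversion among the encodings is lossless and that ASCII text is unchanged by UTF-8 can be checked with a minimal Python sketch. It relies only on the built-in `str.encode` and `bytes.decode` codecs; the sample strings are arbitrary choices for illustration.

```python
# Sketch: round-tripping text through the standard Unicode encodings loses nothing,
# and pure-ASCII text has the same bytes in ASCII and UTF-8.

text = "Hello, Ελληνικά, 日本語, 😀"

# Encode the same string in each encoding form and decode it back.
for encoding in ("utf-8", "utf-16", "utf-32"):
    data = text.encode(encoding)
    assert data.decode(encoding) == text  # round trip is exact

# Convert UTF-8 bytes to UTF-16 bytes (via the decoded string) and back again.
utf8_bytes = text.encode("utf-8")
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16")
assert utf16_bytes.decode("utf-16").encode("utf-8") == utf8_bytes

# The first 128 code points are ASCII, so ASCII text encodes identically in UTF-8.
ascii_text = "Plain ASCII text"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("all conversions round-tripped")
```

Python's `"utf-16"` and `"utf-32"` codecs write and read a byte order mark, which is why the decode step uses the same codec name as the encode step.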

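UTF-8 uses 1 to 4 bytes per character: ASCII characters take 1 byte, other characters in the Basic Multilingual Plane (the first 65,536 code points) take 2 or 3 bytes, and rarer characters outside it take 4.
UTF-16 uses 2 bytes for characters in the Basic Multilingual Plane and 4 bytes (a surrogate pair) for characters outside it.
UTF-32 uses exactly 4 bytes per character, so the number of characters (code points) in a UTF-32 string can be calculated simply by dividing its length in bytes by 4.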
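The per-character sizes of the three encoding forms can be measured directly. In this minimal Python sketch, `encoded_sizes` is a hypothetical helper written for illustration; the little-endian codecs (`"utf-16-le"`, `"utf-32-le"`) are used so that Python does not add a byte order mark to the counts.

```python
# Sketch: bytes needed per character in UTF-8, UTF-16 and UTF-32.

def encoded_sizes(char: str) -> tuple:
    """Return the byte length of `char` in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return tuple(len(char.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

samples = {
    "A": "LATIN CAPITAL LETTER A (ASCII)",
    "\u00e9": "LATIN SMALL LETTER E WITH ACUTE",
    "\u20ac": "EURO SIGN",
    "\U0001F600": "GRINNING FACE (outside the Basic Multilingual Plane)",
}

for char, name in samples.items():
    utf8, utf16, utf32 = encoded_sizes(char)
    print(f"U+{ord(char):04X} {name}: UTF-8={utf8}, UTF-16={utf16}, UTF-32={utf32}")

# In UTF-32 every code point takes exactly 4 bytes, so counting
# code points only requires dividing the byte length by 4.
text = "Unicode \U0001F600"
assert len(text.encode("utf-32-le")) // 4 == len(text)  # len(str) counts code points
```

Here "character" means a code point; a user-perceived character built from several code points (for example, a letter followed by a combining accent) occupies several 4-byte UTF-32 units.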

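Code points are written in hexadecimal. In the notation U-XXXXXXXX, the numbering runs from U-00000000 to U-FFFFFFFF; the current Unicode Standard instead writes code points as U+ followed by four to six hexadecimal digits (for example, U+0041 for "A") and uses only the range U+0000 to U+10FFFF. Unicode divides the available code space into planes. A plane is a contiguous group of 65,536 (2^16) code points: the least significant 16 bits of a code point give its position within a plane, and the bits above them select the plane. With 32-bit codes, the most significant 16 bits could define 65,536 planes, but current Unicode uses only 17 planes, numbered 0 to 16. Types of plane:

Plane 0, the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), holds almost all characters in common modern use.
Plane 1, the Supplementary Multilingual Plane (SMP), holds historic scripts, musical and mathematical symbols, and most emoji.
Planes 2 and 3, the Supplementary and Tertiary Ideographic Planes (SIP and TIP), hold additional CJK ideographs.
Plane 14, the Supplementary Special-purpose Plane (SSP), holds format characters such as tags and variation selectors.
Planes 15 and 16 are Private Use planes, whose code points applications may assign for their own purposes.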
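The plane arithmetic is a pair of bit operations, and it also explains UTF-16's 4-byte case: code points above plane 0 are split into a surrogate pair. This is a minimal Python 3.9+ sketch; `plane_of`, `offset_in_plane`, and `utf16_surrogates` are hypothetical helpers written for illustration, and the manual surrogate calculation is cross-checked against Python's own UTF-16 encoder.

```python
# Sketch: finding a code point's plane and, for supplementary code points,
# computing the UTF-16 surrogate pair by hand.

def plane_of(code_point: int) -> int:
    """The bits above the low 16 select the plane (0 to 16 in current Unicode)."""
    return code_point >> 16

def offset_in_plane(code_point: int) -> int:
    """The low 16 bits give the position inside the 65,536-code-point plane."""
    return code_point & 0xFFFF

def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    v = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (v >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low

for cp in (0x0041, 0x20AC, 0x1F600, 0x20000):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, offset 0x{offset_in_plane(cp):04X}")

# Cross-check the manual pair against Python's big-endian UTF-16 encoder.
high, low = utf16_surrogates(0x1F600)
assert chr(0x1F600).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```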

Advantages:

Universal character set: Unicode supports almost all the characters and symbols used in the world’s writing systems, making it a universal character set that can be used to represent text in any language.

Interoperability: Unicode provides interoperability between different computing systems, platforms, and software applications. This means that text encoded in Unicode can be exchanged and displayed correctly across different systems, regardless of the language or script used.

Compatibility: Unicode is supported by all the major computing platforms, including Windows, macOS, Linux, and mobile operating systems such as Android and iOS. This makes it easy to share and display text across different devices and platforms.

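Efficient storage: Unicode lets applications choose an encoding form to suit their data. In UTF-8, ASCII characters take only 1 byte each, so English text is no larger than it would be in ASCII, while the fixed-length UTF-32 spends more space in exchange for simple counting and indexing of code points.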

Disadvantages:

Complexity: Unicode is a complex encoding standard that can be difficult to implement and use correctly. It requires a significant amount of knowledge and expertise to correctly encode, store, and display text in Unicode.
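One concrete source of this complexity is that the same visible text can be spelled with different code point sequences. This minimal Python sketch uses the standard `unicodedata.normalize` function; choosing NFC as the common comparison form is an illustrative assumption (NFD or the compatibility forms are other options).

```python
# Sketch: two strings that look identical but differ in code points
# until they are normalized to the same form.
import unicodedata

precomposed = "caf\u00e9"   # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT, U+0301

print(precomposed, decomposed)            # both display as "café"
print(precomposed == decomposed)          # False: the code points differ
print(len(precomposed), len(decomposed))  # 4 5

# Normalizing both strings to NFC makes them compare equal.
print(unicodedata.normalize("NFC", precomposed) == unicodedata.normalize("NFC", decomposed))  # True
```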

Compatibility issues with legacy systems: Some legacy systems and software applications may not support Unicode or may not display Unicode characters correctly. This can cause compatibility issues when exchanging text across different systems.
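A typical symptom of this problem is garbled text ("mojibake"): UTF-8 bytes read by software that assumes a single-byte encoding. This minimal Python sketch assumes the legacy receiver uses Windows-1252 (`cp1252`), a common extended-ASCII code page; the sample text is arbitrary.

```python
# Sketch: UTF-8 bytes misread by a receiver that assumes Windows-1252.

original = "caf\u00e9 \u20ac5"       # "café €5"
sent = original.encode("utf-8")      # bytes as transmitted

# The legacy receiver maps every byte to one Windows-1252 character.
garbled = sent.decode("cp1252")
print(garbled)                       # cafÃ© â‚¬5

# A receiver that knows the data is UTF-8 recovers the text exactly.
print(sent.decode("utf-8") == original)  # True
```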

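Large character set: Unicode's large character set can be a disadvantage in applications where only a small subset of characters is needed, because most characters outside ASCII need 2 or more bytes each. This can result in larger file sizes and increased memory usage than a single-byte legacy encoding would require.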
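The size cost can be measured directly. This minimal Python sketch assumes ISO 8859-7, a single-byte Greek encoding, as the legacy comparison point; other scripts and code pages give different ratios.

```python
# Sketch: Greek text in a single-byte legacy encoding versus UTF-8.

greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # "Ελληνικά", 8 letters

legacy = greek.encode("iso8859_7")  # 1 byte per Greek letter
utf8 = greek.encode("utf-8")        # 2 bytes per Greek letter

print(len(greek), len(legacy), len(utf8))  # 8 8 16
```

For ASCII-only text the comparison comes out even, since UTF-8 also uses 1 byte per ASCII character.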

Localization: While Unicode supports most of the world’s writing systems, it may not be sufficient for some localization requirements, such as the need for specialized symbols or characters that are unique to a particular language or culture.

