Character & Numeral Encoding Systems In Computing:
Character and numeral encoding in computing refers to the process of representing characters and numbers in a format that can be understood and processed by a computer. It involves assigning unique codes or binary representations to each character or numeral to facilitate storage, retrieval, and manipulation of textual or numerical data within a computer system.
Character encoding:
Character encoding is a system that defines how characters are represented and stored in computers. It is a way of assigning unique numeric codes to characters so they can be processed and displayed correctly. Computers use binary code (0s and 1s) to represent and manipulate data, and character encoding provides a mapping between these binary values and human-readable characters.

ASCII (American Standard Code for Information Interchange): ASCII is one of the earliest and most well-known standards. Its character encoding scheme represents basic Latin characters (the English alphabet, numbers, and punctuation) using 7-bit binary codes. Over time, as computing systems expanded globally, ASCII has been extended and largely replaced by more comprehensive character encoding schemes like Unicode.

Unicode: Unicode is a character standard that maps a unique code to each character from various writing systems. Unicode itself is not an encoding - it is a standard for defining the characters and their numerical representations. Unicode is the de facto standard for character representation in modern computing. Below are some of the encoding schemes that implement Unicode.

UTF-8 (Unicode Transformation Format - 8 bit): A widely used encoding that can represent any Unicode character in binary, mapping each character to a unique byte sequence. If a character is common, like "a" or "1", UTF-8 uses just one byte (8 bits) to represent it. If a character is less common, it may take more bytes (up to 4). UTF-8 allows conversion back and forth between the characters we recognize and the binary code computers use, and it is designed to be backwards compatible with ASCII.

UTF-16: While UTF-8 uses 1 to 4 bytes per character, UTF-16 uses 2 or 4 bytes. This can be more space efficient for languages where most characters cannot be represented within one byte in UTF-8 (many Asian languages, for example). Note that it is less efficient for languages like English, where most characters fit in one byte.

UTF-32: Every character in UTF-32 uses 4 bytes, which makes it more space-consuming than UTF-8 and UTF-16. The upside is that space usage is easy to calculate, since every character takes the same amount of space, and it is easy to find the Nth character in a string because every character has the same 4-byte size.
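As a quick illustration of the character-to-code mapping, here is a minimal Python sketch (the characters used are arbitrary examples):

```python
# ord() gives the numeric code assigned to a character;
# chr() maps a code back to its character.
print(ord("A"))  # 65
print(chr(65))   # A

# ASCII codes fit in 7 bits (values 0-127).
print(format(ord("A"), "07b"))  # 1000001
```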
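To see UTF-8's variable width and ASCII compatibility in practice, here is another small sketch (the sample characters are arbitrary):

```python
# Common ASCII characters take one byte in UTF-8;
# rarer characters take two to four bytes.
for ch in ["a", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())
# a 1 61
# é 2 c3a9
# € 3 e282ac
# 😀 4 f09f9880

# Backwards compatibility: ASCII text encodes to the exact same bytes.
assert "hello".encode("utf-8") == "hello".encode("ascii")
```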
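And to compare the space usage of the three UTF encodings, a final sketch (the little-endian variants are used so Python adds no byte-order mark, keeping the byte counts clean):

```python
# Bytes needed by each encoding for the same text.
for text in ["hello", "こんにちは"]:
    print(text,
          len(text.encode("utf-8")),
          len(text.encode("utf-16-le")),
          len(text.encode("utf-32-le")))
# hello 5 10 20
# こんにちは 15 10 20

# With UTF-32, the Nth character always starts at byte offset N * 4.
data = "こんにちは".encode("utf-32-le")
n = 2
print(data[n * 4:(n + 1) * 4].decode("utf-32-le"))  # に
```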
Numeral encoding:
These numeral systems are used to represent numbers in different ways. But they can also be used to represent text - or any data, really - because at the end of the day, all data in a computer are numbers.

Base2 (Binary): The fundamental numeral system that computers use. A computer can only understand 0s and 1s because of its design - it stores and processes information using electrical signals that are either on (1) or off (0).

Base8 (Octal): Uses eight symbols (0-7). It is less commonly used today, but you might still encounter it in certain contexts, such as file permissions in Unix-based systems.

Base16 (Hexadecimal): Uses sixteen symbols: 0-9 to represent the values 0-9, and A-F to represent 10-15. It is commonly used in memory addressing, color codes, and debugging because it can represent large binary values compactly. One digit in octal or hexadecimal corresponds to exactly 3 or 4 bits in binary, respectively, which makes conversion between these systems and binary very straightforward.

Base32: Uses 32 distinct characters, often A-Z and 2-7. It is commonly used in situations where data needs to be stored or transferred over systems that are designed to handle text. Encodings like Base32 and Base64 can encode any binary data into printable strings, and decode those strings back into the original binary data.

Base64: Uses 64 symbols, typically the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the symbols + and /.

In a nutshell, both numeral encodings and character encoding schemes deal with representing data in a specific way, but they are used for different purposes and work at different levels. Character encoding schemes are about converting characters to bytes and back; the way these bytes are represented in memory is irrelevant to the encoding scheme. For example, the byte 10000001 can be represented as '81' in hexadecimal. However, '81' in hexadecimal doesn't say anything about what character or data it represents, unless you know how to interpret it (using, for example, an encoding scheme like UTF-8). An encoding like UTF-8 works at a higher level than Base16, translating between human languages and machine data, while Base16 is a lower-level representation of the raw data. To be clear, you can convert hexadecimal data to readable text using UTF-8, but only if that data was originally text encoded with UTF-8. Remember that hexadecimal is just a representation of binary data, and UTF-8 is a mapping between binary data and characters. So, if you know that some hexadecimal data is the result of text encoded with UTF-8, you can convert it back into text.
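To make the base conversions above concrete, here is a minimal Python sketch (the value 181 is an arbitrary example):

```python
n = 181
print(bin(n))  # 0b10110101
print(oct(n))  # 0o265
print(hex(n))  # 0xb5

# Each hexadecimal digit maps to exactly 4 bits, each octal digit to 3:
# 1011 0101  -> b 5   (hexadecimal)
# 10 110 101 -> 2 6 5 (octal)

# Parsing the representations back into the same number:
print(int("b5", 16), int("265", 8), int("10110101", 2))  # 181 181 181
```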
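For the text-safe encodings, Python's standard base64 module implements both Base32 and Base64; a minimal sketch of the round trip (the sample bytes are arbitrary):

```python
import base64

data = b"hello"  # any binary data works, not just text

b32 = base64.b32encode(data)
b64 = base64.b64encode(data)
print(b32)  # b'NBSWY3DP'
print(b64)  # b'aGVsbG8='

# Decoding recovers the original bytes exactly.
assert base64.b32decode(b32) == data
assert base64.b64decode(b64) == data
```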
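Finally, the hexadecimal-to-text round trip described above looks like this (a minimal sketch; the hex string is just "hello" encoded with UTF-8):

```python
# Text -> UTF-8 bytes -> hexadecimal representation of those bytes
hex_repr = "hello".encode("utf-8").hex()
print(hex_repr)  # 68656c6c6f

# Hexadecimal -> raw bytes -> text. This only works because we know
# the bytes were originally UTF-8-encoded text.
print(bytes.fromhex(hex_repr).decode("utf-8"))  # hello
```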