I want to investigate how UTF-16 and UTF-32 work by looking at specific characters in binary.
I know we can use
"string".encode("(encoding name)")
to check a string's hex value in a specific encoding, and it works fine with UTF-8.
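A minimal sketch of that check for UTF-8. It calls bytes.hex() on the result of encode() so the bytes can be compared digit by digit with a code table; the variable names are just for illustration.

```python
# Encode one character as UTF-8 and show the bytes two ways.
char = "あ"
utf8_bytes = char.encode("utf-8")
print(utf8_bytes)        # b'\xe3\x81\x82'
print(utf8_bytes.hex())  # e38182, matching the table's E38182
```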
But when it comes to UTF-16 or UTF-32, the result I get is different from the encoding value it is supposed to be.
For example, take “あ”, the first letter of the Japanese syllabary. According to https://www.compart.com/en/unicode/U+3042,
its hex values in UTF-8, UTF-16 and UTF-32 are
E38182, 3042, 00003042
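The UTF-16 and UTF-32 values in that table can be derived from the code point itself. The sketch below assumes the character lies in the Basic Multilingual Plane (below U+10000 and not a surrogate), in which case UTF-16 uses a single 16-bit code unit equal to the code point. UTF-32 always stores the code point zero-padded to 32 bits.

```python
# Derive the expected UTF-16 / UTF-32 hex values for "あ" from its code point.
# Assumption: the character is in the BMP, so UTF-16 needs only one code unit.
char = "あ"
code_point = ord(char)
utf16_unit = f"{code_point:04X}"  # one 16-bit unit
utf32_unit = f"{code_point:08X}"  # code point padded to 32 bits
print(f"U+{code_point:04X}")  # U+3042
print(utf16_unit)             # 3042
print(utf32_unit)             # 00003042
```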
So if I execute the following code:
print("あ".encode('utf-8'))
print("あ".encode('utf-16BE'))
print("あ".encode('utf-32BE'))
I get:
b'\xe3\x81\x82'
b'0B'
b'\x00\x000B'
As you can see, the UTF-8 output is identical to the code table, but the UTF-16 and UTF-32 outputs look weird…
I have no idea how 000B can correspond to 3042. Am I misunderstanding something about the encode method?
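One way to compare these outputs with the code table, as a sketch: print the same bytes with bytes.hex() instead of relying on Python's default bytes repr. The default repr shows any byte in the printable ASCII range as its character, and 0x30 and 0x42 happen to be the ASCII codes of '0' and 'B'. Variable names here are illustrative only.

```python
# Compare Python's default bytes repr with an explicit hex dump.
char = "あ"
utf16 = char.encode("utf-16-be")
utf32 = char.encode("utf-32-be")
print(utf16)        # b'0B'
print(utf16.hex())  # 3042
print(utf32)        # b'\x00\x000B'
print(utf32.hex())  # 00003042
# 0x30 is ASCII '0' and 0x42 is ASCII 'B', so b'0B' and b'\x30\x42'
# are the same two bytes written differently.
print(utf16 == b"\x30\x42")  # True
print(list(utf16))           # [48, 66]
```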