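- Last Updated: [[2021-01-26]]
- [[Converting between byte code and unicode in Python]]
- [[Unicode]] characters and strings
    - [[ASCII]]: American Standard Code for Information Interchange
        - Assigned numbers to characters so that computers could store them.
        - `ord()` tells us the number corresponding to a character. This is its Unicode code point, which for ASCII characters is the same as the ASCII code (e.g. `ord('H')` is `72`).
    - Unicode is a universal code so that (for example) American and Japanese computers can talk to each other. Previously, they used different representations of characters (and different character sets as well).
    - UTF-8, UTF-16, UTF-32: different ways of encoding Unicode characters as bytes.
        - UTF-8 has become the standard choice. It is variable-length (1 byte for ASCII characters, 2 to 4 bytes for everything else), backward compatible with ASCII, and the most widely used encoding on the web.
    - In Python 3, all strings (`str`) are Unicode. This was different in Python 2.
    - In Python 3, byte strings are a separate class, `bytes` (in Python 2, byte strings and strings were the same type).
    - When we talk to a network socket, the response arrives as bytes, so we have to decode the bytes into a string before we can do anything with it.
    - ```python
      import socket

      # Open a TCP connection to the web server on port 80 (plain HTTP, not HTTPS)
      mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      mysock.connect(('nicolevanderhoeven.com', 80))

      # An HTTP request needs a version and must end with a blank line (\r\n\r\n)
      cmd = 'GET http://nicolevanderhoeven.com/ HTTP/1.0\r\n\r\n'.encode()
      mysock.sendall(cmd)
      ```
    - ```python
      # Read the response 512 bytes at a time until the server closes the connection
      while True:
          data = mysock.recv(512)
          if len(data) < 1:          # empty bytes object: no more data
              break
          mystring = data.decode()   # bytes -> str (UTF-8 by default)
          print(mystring, end='')
      mysock.close()
      ```
    - `decode()` converts bytes to a Unicode `str`; without arguments it assumes UTF-8.
    - Data exchanged with other machines is usually encoded as [[UTF-8]] bytes; within our own machines (in Python), strings are [[Unicode]].
    - We need to `encode()` requests we send and `decode()` responses we get back.
- `range()` generates a sequence of integers, starting at 0 by default and stopping *before* the given number (so `range(5)` has five values, 0 to 4).
    - ```python
      x = range(5)
      for n in x:
          print(n)
      ```
    - will print:
    - ```text
      0
      1
      2
      3
      4
      ```
- `split()` will, by default, split a string on runs of whitespace (spaces, tabs, newlines), unless a separator is specified (e.g. `split(',')`).
- `len()` returns how many items are in a sequence (list, string, tuple, ...). The count itself starts at 1, but indexing starts at 0, so the last item is at index `len(x) - 1`.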
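
A small sketch of the `str`/`bytes` round trip from the notes above. The sample string `"café"` is an arbitrary choice (not from the original notes), picked because `é` is outside ASCII. The `-le` codec variants are used only so the UTF-16 and UTF-32 byte counts don't include a byte-order mark.

```python
# Round trip between str (Unicode code points) and bytes (encoded form).
# "café" is an arbitrary example string; 'é' is outside ASCII.
text = "café"
print([ord(c) for c in text])   # code points: [99, 97, 102, 233]

raw = text.encode()             # str -> bytes, UTF-8 by default
print(raw)                      # b'caf\xc3\xa9' ('é' becomes two bytes)
print(len(text), len(raw))      # 4 characters, 5 bytes

back = raw.decode()             # bytes -> str, UTF-8 by default
print(back == text)             # True

# Same text, different encodings: the byte counts differ.
# The "-le" variants are used so no byte-order mark is added.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, len(text.encode(enc)))   # 5, 8 and 16 bytes respectively
```

Decoding with the wrong encoding does not always raise an error: `raw.decode('latin-1')` quietly returns `'cafÃ©'`, which is why both ends of a connection need to agree on the encoding.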
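
One caveat with the `recv()` loop in these notes: each 512-byte chunk is decoded on its own, so a multi-byte UTF-8 character that happens to be split across two chunks makes `data.decode()` raise `UnicodeDecodeError`. Below is a sketch of one way around this using the standard library's `codecs` incremental decoder. The helper name `decode_chunks` and the sample text are made up for illustration, and the chunks are simulated in memory rather than read from a real socket.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of UTF-8 byte chunks into one str.

    decode_chunks is a hypothetical helper. The incremental decoder keeps
    the leftover bytes of a character that was cut off at the end of one
    chunk and completes it with the start of the next chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder; it raises if the data ends mid-character
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Simulate two recv() results that split the two-byte 'ï' in half.
payload = "naïve café".encode()
chunks = [payload[:3], payload[3:]]   # b'na\xc3' and b'\xafve caf\xc3\xa9'

try:
    chunks[0].decode()                # decoding one chunk on its own...
except UnicodeDecodeError as err:
    print("chunk-by-chunk decode failed:", err)   # ...fails on the cut-off 'ï'

print(decode_chunks(chunks))          # naïve café
```

In the socket loop itself, the same idea would mean creating the decoder once before the loop and calling `decoder.decode(data)` on each chunk instead of `data.decode()`.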
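
A quick check of the `range()`, `split()` and `len()` behaviour described in the notes above. The sample line and the address in it are made up for illustration.

```python
# range(): starts at 0 by default and stops *before* the end value
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))    # [2, 5, 8] (start, stop, step)

# split(): no argument -> split on runs of whitespace, no empty strings
line = "  From   someone@example.com  Sat Jan  5 "
words = line.split()
print(words)                    # ['From', 'someone@example.com', 'Sat', 'Jan', '5']

# split(sep): split on exactly that separator, keeping empty fields
print("a,b,,c".split(","))      # ['a', 'b', '', 'c']

# len(): number of items; the last valid index is len() - 1
print(len(words))               # 5
print(words[len(words) - 1])    # 5  (same as words[-1])
print(len("café"))              # 4 (counts characters, not bytes)
```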