- Last Updated: [[2021-01-26]]
- [[Converting between byte code and unicode in Python]]
- [[Unicode]] characters and strings
- [[ASCII]]: American Standard Code for Information Interchange
- assigned numbers to characters so that computers can store them
- `ord()` tells us the number corresponding to a character.
- Unicode is a universal code so that (for example) American and Japanese computers can talk to each other. Previously, they had used different representations of characters (and different characters as well).
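- A small sketch of the character-to-number mapping described above. The sample characters and variable names are picked only for illustration; `chr()` is the built-in inverse of `ord()`.

```python
# ord() maps a one-character string to its integer code point;
# chr() goes the other way.
letter_code = ord("A")         # 65, inside the ASCII range (0-127)
kana_code = ord("\u3042")      # 12354, HIRAGANA LETTER A, outside ASCII
round_trip = chr(letter_code)  # back to "A"

print(letter_code, kana_code, round_trip)
```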
- UTF-8, -16, -32
- UTF-8 is now the most widely used encoding for storing and exchanging text. It is variable-width (1 to 4 bytes per character) and backward compatible with ASCII.
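- To make the difference between the three encodings concrete, this sketch counts how many bytes each one needs for a few sample characters. The samples are arbitrary, and the `-le` codec variants are used only so that no byte order mark is added to the counts.

```python
# Sample characters: ASCII letter, accented Latin letter, Japanese kana.
samples = ["A", "\u00e9", "\u3042"]

sizes = {}
for ch in samples:
    sizes[ch] = (
        len(ch.encode("utf-8")),      # 1 to 4 bytes per character
        len(ch.encode("utf-16-le")),  # 2 or 4 bytes per character
        len(ch.encode("utf-32-le")),  # always 4 bytes per character
    )

for ch, (u8, u16, u32) in sizes.items():
    print(repr(ch), u8, u16, u32)

# For plain ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")
```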
- In Python 3, all strings are Unicode. This was different in Python 2.
- In Python 3, byte strings (`bytes`) are a separate class from strings (`str`). In Python 2, byte strings were the same as regular strings.
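- A quick way to see the two classes side by side in Python 3. The sample word is arbitrary; the point is that `str` counts characters, `bytes` counts bytes, and the two do not mix implicitly.

```python
text = "caf\u00e9"          # str: a sequence of Unicode code points
raw = text.encode("utf-8")  # bytes: a sequence of integers from 0 to 255

print(type(text).__name__, len(text))  # str, 4 characters
print(type(raw).__name__, len(raw))    # bytes, 5 bytes (the accented e takes 2)

# Mixing the two types raises TypeError instead of converting silently.
mixing_error = None
try:
    text + raw
except TypeError as err:
    mixing_error = err
```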
- When we talk to a network socket, the response arrives as bytes, so we have to decode the bytes into a string before we can work with it.
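- ```python
# mysock is an already-connected socket
while True:
    data = mysock.recv(512)  # read up to 512 bytes
    if len(data) < 1:  # an empty result means the connection was closed
        break
    mystring = data.decode()  # bytes -> str (UTF-8 by default)
    print(mystring)
```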
- `decode()` converts `bytes` to a Unicode string (`str`). With no argument, it assumes UTF-8.
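- ```python
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('nicolevanderhoeven.com', 80))
# The request line needs the HTTP version, and a blank line (\r\n\r\n) ends the request
cmd = 'GET / HTTP/1.0\r\nHost: nicolevanderhoeven.com\r\n\r\n'.encode()
mysock.send(cmd)
```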
- [[UTF-8]] is what's used when talking to other machines; within our own machines (using Python) we use [[Unicode]].
- We need to `encode()` requests we send and `decode()` responses we get back.
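- The encode-on-send, decode-on-receive pattern can be tried without a network connection. This is only a sketch: `socket.socketpair()` creates two connected local sockets that stand in for the client and the remote server, and the request text is a placeholder.

```python
import socket

# Two connected local sockets play the roles of client and server.
client, server = socket.socketpair()

request = "GET / HTTP/1.0\r\n\r\n"
client.sendall(request.encode())  # str -> bytes before sending

received = server.recv(512)       # what arrives is bytes
decoded = received.decode()       # bytes -> str before using it

client.close()
server.close()

print(type(received).__name__, repr(decoded))
```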
- `range()` produces a sequence of numbers. It starts at 0 by default and stops just before the given end value.
- ```python
x = range(5)
for n in x:
    print(n)
```
- will yield:
- ```text
0
1
2
3
4
```
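- `range()` also accepts a start and a step. A short sketch with arbitrary numbers; `list()` is used only to show the values, because a `range` object does not build them until asked.

```python
up_to_five = list(range(5))          # start defaults to 0, stop is excluded
two_to_five = list(range(2, 6))      # explicit start
every_third = list(range(0, 10, 3))  # explicit step

print(up_to_five)
print(two_to_five)
print(every_third)
print(range(5))  # shows the range object itself, not the numbers
```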
- `split()` splits a string on runs of whitespace (spaces, tabs, newlines) by default. Pass a separator string to split on that instead.
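- A sketch of the default behaviour versus an explicit separator, using made-up sample strings. Note that an explicit single space does not collapse repeated spaces the way the default does.

```python
line = "one  two\tthree\nfour"

default_parts = line.split()           # any run of whitespace is one separator
space_parts = "a  b".split(" ")        # explicit " " keeps the empty piece
date_parts = "2021-01-26".split("-")   # split on a chosen character

print(default_parts)
print(space_parts)
print(date_parts)
```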
- `len()` returns how many elements are in a list (or how many characters are in a string). It is a count, so the last valid index is `len(x) - 1`.
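- A sketch of how the count from `len()` relates to zero-based indexes. The list contents are arbitrary.

```python
items = ["a", "b", "c"]

count = len(items)       # number of elements
last = items[count - 1]  # indexes run from 0 to count - 1

# Using the count itself as an index goes one past the end.
out_of_range = False
try:
    items[count]
except IndexError:
    out_of_range = True

print(count, last, out_of_range, len("hello"))
```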