Second Edition
Brandon Rhodes and John Goerzen
Contents at a Glance
Introduction to Client/Server Networking
UDP
TCP
Socket Names and DNS
Network Data and Network Errors
TLS and SSL
Server Architecture
Caches, Message Queues, and Map-Reduce
HTTP
Screen Scraping
Web Applications
E-mail Composition and Decoding
SMTP
POP
IMAP
Telnet and SSH
FTP
RPC
Book Details
| Price | 2.00 USD |
|---|---|
| Pages | 370 p |
| File Size | 5,083 KB |
| File Type | PDF format |
| ISBN-13 (pbk) | 978-1-4302-3003-8 |
| ISBN-13 (electronic) | 978-1-4302-3004-5 |
| Copyright | 2010 by Brandon Rhodes and John Goerzen |
About the Authors
Brandon Craig Rhodes has been an avid Python programmer since the 1990s, and a
professional Python developer for a decade. He released his PyEphem astronomy
library in the same year that Python 1.5 was released, and has maintained it ever since.
As a writer and speaker, Brandon enjoys teaching and touting Python, whether as
the volunteer organizer of Python Atlanta or on stage at conferences like PyCon. He
was editor of the monthly Python Magazine, was pleased to serve as technical
reviewer for the excellent Natural Language Processing with Python, and has helped
several open source projects by contributing documentation.
Today Brandon operates the Rhodes Mill Studios consultancy in Atlanta, Georgia,
which provides Python programming expertise and web development services to customers both local
and out-of-state. He believes that the future of programming is light, concise, agile, test-driven, and
enjoyable, and that Python will be a big part of it.
John Goerzen is an accomplished author, system administrator, and Python
programmer. He has been a Debian developer since 1996 and is currently president of
Software in the Public Interest, Inc. His previously published books include the Linux
Programming Bible, Debian Unleashed, and Linux Unleashed.
About the Technical Reviewer
Michael Bernstein is a web designer and developer living in Albuquerque, New Mexico, who
specializes in usable, simple, standards-based web applications.
Introduction
You have chosen an exciting moment in computing history to embark on a study of network
programming. Machine room networks can carry data at speeds comparable to those at which machines
access their own memory, and broadband now reaches hundreds of millions of homes worldwide. Many
casual computer users spend their entire digital lives speaking exclusively to network services; they are
only vaguely aware that their computer is even capable of running local applications.
This is also a moment when, after 20 solid years of growth and improvement, interest in Python
really seems to be taking off. This is different from the trajectory of other popular languages, many of
which experience their heyday and go into decline long before the threshold of their third decade. The
Python community is not only strong and growing, but its members seem to have a much better feel for
the language itself than they did a decade ago. The advice we can share with new Python programmers
about how to test, write, and structure applications is vastly more mature than what passed for Pythonic
design a mere decade ago.
Both networking and Python programming are large topics, and their intersection is a rich and
fertile domain. I wish you great success! Whether you just need to connect to a single network port, or
are setting out to architect a complex network service, I hope that you will remember that the Internet is
an ecosystem that remains healthy so long as individual programmers honor public protocols and
support interoperability so that solutions can grow, compete, and thrive.
Writing even the simplest network program inducts you into the grand tradition started by the
inventors of the Internet, and I hope you enjoy the tools and the power that they have placed in our
hands. I like the encouragement that John Goerzen, the author of the first edition of this book, gave his
readers in his own introduction: “I want this to be your lab manual—your guide for inventing things that make the Internet better.”
Assumptions
This book assumes that you know how to program in Python, but does not assume that you know
anything about networking. If you have used something like a web browser before, and are vaguely
aware that your computer talks to other computers in order to display web pages, then you should be
ready to start reading this book.
This book targets Python versions 2.5, 2.6, and 2.7, and in the text I have tried to note any differences
that you will encounter between these three versions of Python when writing network code.
As of this writing, the Python 2 series is still the workaday version of the language for programmers
who use Python in production. In fact, the pinnacle of that line of language development—Python 2.7—was released just a few months ago, and a second bugfix release is now in testing. Interest in the
futuristic Python 3 version of the language is still mostly limited to framework authors and library
maintainers, as they embark on the community's several-year effort to port our code over to the new
version of the language.
If you are entirely new to programming, then an Amazon search will suggest several highly rated
books that use Python itself to teach you the basics. A long list of online resources, some of which are
complete e-books, is maintained at this link: wiki.python.org/moin/BeginnersGuide/NonProgrammers.
If you do know something about Python and programming but run across unfamiliar syntax or
conventions in my program listings, then there are several sources of help. Re-reading the Python
Tutorial—the document from which I myself once learned the language—can be a great way to review
all of the language's basic features. Numerous books are, of course, available. And asking questions on
Stack Overflow, a mailing list, or a forum might help you answer questions that none of your printed
materials seem to answer directly.
The best source of knowledge, however, is often the community. I used Python more or less alone
for a full decade, thinking that blogs and documentation could keep me abreast of the latest
developments. Then a friend convinced me to try visiting a local Python users group, and I have never
been the same. My expertise started to grow by leaps and bounds. There is no substitute for a real, live,
knowledgeable person listening to your problem and helping you find the way to a solution.
Table of Contents
■Contents at a Glance
■About the Authors
■About the Technical Reviewer
■Acknowledgments
■Introduction
■Chapter 1: Introduction to Client/Server Networking
The Building Blocks: Stacks and Libraries
Application Layers
Speaking a Protocol
A Raw Network Conversation
Turtles All the Way Down
The Internet Protocol
IP Addresses
Routing
Packet Fragmentation
Learning More About IP
■Chapter 2: UDP
Should You Read This Chapter?
Addresses and Port Numbers
Port Number Ranges
Sockets
Unreliability, Backoff, Blocking, Timeouts
Connecting UDP Sockets
Request IDs: A Good Idea
Binding to Interfaces
UDP Fragmentation
Socket Options
Broadcast
When to Use UDP
Summary
■Chapter 3: TCP
How TCP Works
When to Use TCP
What TCP Sockets Mean
A Simple TCP Client and Server
One Socket per Conversation
Address Already in Use
Binding to Interfaces
Deadlock
Closed Connections, Half-Open Connections
Using TCP Streams like Files
Summary
■Chapter 4: Socket Names and DNS
Hostnames and Domain Names
Socket Names
Five Socket Coordinates
IPv6
Modern Address Resolution
Asking getaddrinfo() Where to Bind
Asking getaddrinfo() About Services
Asking getaddrinfo() for Pretty Hostnames
Other getaddrinfo() Flags
Primitive Name Service Routines
Using getsockaddr() in Your Own Code
Better Living Through Paranoia
A Sketch of How DNS Works
Why Not to Use DNS
Why to Use DNS
Resolving Mail Domains
Zeroconf and Dynamic DNS
Summary
■Chapter 5: Network Data and Network Errors
Text and Encodings
Network Byte Order
Framing and Quoting
Pickles and Self-Delimiting Formats
XML, JSON, Etc.
Compression
Network Exceptions
Handling Exceptions
Summary
■Chapter 6: TLS and SSL
Computer Security
IP Access Rules
Cleartext on the Network
TLS Encrypts Your Conversations
TLS Verifies Identities
Supporting TLS in Python
The Standard SSL Module
Loose Ends
Summary
■Chapter 7: Server Architecture
Daemons and Logging
Our Example: Sir Launcelot
An Elementary Client
The Waiting Game
Running a Benchmark
Event-Driven Servers
Poll vs. Select
The Semantics of Non-blocking
Event-Driven Servers Are Blocking and Synchronous
Twisted Python
Load Balancing and Proxies
Threading and Multi-processing
Threading and Multi-processing Frameworks
Process and Thread Coordination
Running Inside inetd
Summary
■Chapter 8: Caches, Message Queues, and Map-Reduce
Using Memcached
Memcached and Sharding
Message Queues
Using Message Queues from Python
How Message Queues Change Programming
Map-Reduce
Summary
■Chapter 9: HTTP
URL Anatomy
Relative URLs
Instrumenting urllib2
The GET Method
The Host Header
Codes, Errors, and Redirection
Payloads and Persistent Connections
POST And Forms
Successful Form POSTs Should Always Redirect
POST And APIs
REST And More HTTP Methods
Identifying User Agents and Web Servers
Content Type Negotiation
Compression
HTTP Caching
The HEAD Method
HTTPS Encryption
HTTP Authentication
Cookies
HTTP Session Hijacking
Cross-Site Scripting Attacks
WebOb
Summary
■Chapter 10: Screen Scraping
Fetching Web Pages
Downloading Pages Through Form Submission
The Structure of Web Pages
Three Axes
Diving into an HTML Document
Selectors
Summary
■Chapter 11: Web Applications
Web Servers and Python
Two Tiers
Choosing a Web Server
WSGI
WSGI Middleware
Python Web Frameworks
URL Dispatch Techniques
Templates
Final Considerations
Pure-Python Web Servers
CGI
mod_python
Summary
■Chapter 12: E-mail Composition and Decoding
E-mail Messages
Composing Traditional Messages
Parsing Traditional Messages
Parsing Dates
Understanding MIME
How MIME Works
Composing MIME Attachments
MIME Alternative Parts
Composing Non-English Headers
Composing Nested Multiparts
Parsing MIME Messages
Decoding Headers
Summary
■Chapter 13: SMTP
E-mail Clients, Webmail Services
In the Beginning Was the Command Line
The Rise of Clients
The Move to Webmail
How SMTP Is Used
Sending E-Mail
Headers and the Envelope Recipient
Multiple Hops
Introducing the SMTP Library
Error Handling and Conversation Debugging
Getting Information from EHLO
Using Secure Sockets Layer and Transport Layer Security
Authenticated SMTP
SMTP Tips
Summary
■Chapter 14: POP
Compatibility Between POP Servers
Connecting and Authenticating
Obtaining Mailbox Information
Downloading and Deleting Messages
Summary
■Chapter 15: IMAP
Understanding IMAP in Python
IMAPClient
Examining Folders
Message Numbers vs. UIDs
Message Ranges
Summary Information
Downloading an Entire Mailbox
Downloading Messages Individually
Flagging and Deleting Messages
Deleting Messages
Searching
Manipulating Folders and Messages
Asynchrony
Summary
■Chapter 16: Telnet and SSH
Command-Line Automation
Command-Line Expansion and Quoting
Unix Has No Special Characters
Quoting Characters for Protection
The Terrible Windows Command Line
Things Are Different in a Terminal
Terminals Do Buffering
Telnet
SSH: The Secure Shell
An Overview of SSH
SSH Host Keys
SSH Authentication
Shell Sessions and Individual Commands
SFTP: File Transfer Over SSH
Other Features
Summary
■Chapter 17: FTP ................................................................................................. 291
What to Use Instead of FTP ......................................................................................... 291
Communication Channels ........................................................................................... 292
Using FTP in Python .................................................................................................... 293
ASCII and Binary Files ................................................................................................. 294
Advanced Binary Downloading ................................................................................... 295
Uploading Data ............................................................................................................ 297
Advanced Binary Uploading ........................................................................................ 298
Handling Errors ........................................................................................................... 299
Detecting Directories and Recursive Download .......................................................... 301
Creating Directories, Deleting Things ......................................................................... 302
Doing FTP Securely ..................................................................................................... 303
Summary ..................................................................................................................... 303
■Chapter 18: RPC ................................................................................................ 305
Features of RPC .......................................................................................................... 306
XML-RPC ..................................................................................................................... 307
JSON-RPC ................................................................................................................... 313
Self-documenting Data ............................................................................................... 315
Talking About Objects: Pyro and RPyC ........................................................................ 316
An RPyC Example ........................................................................................................ 317
RPC, Web Frameworks, Message Queues .................................................................. 319
Recovering From Network Errors ................................................................................ 320
Binary Options: Thrift and Protocol Buffers ................................................................. 320
Summary ..................................................................................................................... 321
■Index ................................................................................................................. 323
Networking
This book teaches network programming by focusing on the Internet protocols—the kind of network in which most programmers are interested these days, and the protocols that are best supported by the
Python Standard Library. Their design and operation are a good introduction to networking in general, so
you might find this book useful even if you intend to target other networks from Python; but the code
listings will be directly useful only if you plan on speaking an Internet protocol.
The Internet protocols are not secret or closed conventions; you do not have to sign non-disclosure
agreements to learn the details of how they operate, nor pay license fees to test your programs against
them. Instead, they are open and public, in the best traditions of programming and of computing more
broadly. They are defined in documents that are each named, for historical reasons, a Request For
Comments (RFC), and many RFCs are referred to throughout this book.
When the text references an RFC, I will generally give the URL of the official copy
at the web site of the Internet Engineering Task Force (IETF). But some readers prefer to look up the
same RFCs on faqs.org, since that site adds highlighting and hyperlinks to the text of each RFC; here is a link to its archive, in case you find a richer presentation helpful: www.faqs.org/rfcs/.
Organization
The first of this book's four parts is the foundation for all of the rest: it explains the basic Internet protocols on which all higher forms of communication are built. If you are writing a network client, then you can probably read Chapters 1 through 6 and then jump ahead to the chapter on the protocol that interests you. Programmers interested in writing servers, however, should continue on through Chapter 7, and maybe even Chapter 8, before jumping into their specific protocol.
The middle parts of the book each cover a single big topic: the second part covers the Web, while the
third looks at all of the different protocols surrounding e-mail access and transmission. It is upon
reaching its fourth part that this book finally devolves into miscellany; the chapters bounce around
between protocols for interacting with command prompts, transferring files, and performing remote
procedure calls.
I want to draw particular attention to Chapter 6 and the issue of privacy online. For too many years,
encryption was viewed as an exotic and expensive proposition that could be justified only for
information of the very highest security. But with today's fast processors, SSL can be turned on for nearly any service without your users necessarily seeing any perceptible effect. And modern Python libraries make it easy to establish a secure connection! Become informed about SSL and security, and consider deploying it with all externally facing services that you write for public use.
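To give a rough sense of how little code a secure client connection can take, here is a minimal sketch. It is not taken from Chapter 6. It assumes a recent Python 3 interpreter and uses only the standard library's ssl and socket modules. The function names make_client_context and fetch_tls_version are hypothetical, chosen for this illustration, and the host in the closing comment is only an example.

```python
import socket
import ssl


def make_client_context():
    """Return a TLS context that verifies server certificates and hostnames."""
    # create_default_context() loads the system's trusted CA certificates
    # and turns on certificate and hostname verification by default.
    return ssl.create_default_context()


def fetch_tls_version(host, port=443, timeout=10.0):
    """Connect to host:port over TLS and return the negotiated protocol version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # server_hostname is used for SNI and for checking the certificate.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


# Example call (assumes network access; the host is illustrative only):
#     print(fetch_tls_version("www.python.org"))
```

The design choice worth noting is that the default context refuses connections whose certificates cannot be verified, so the secure behavior is what you get without extra configuration.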