Programming Spiders, Bots, and Aggregators in Java, Sybex


===========
Spiders, bots, and aggregators are all so-called intelligent agents, which execute tasks on
the Web without human intervention. Spiders go out on the Web, identify multiple sites with
information on a chosen topic, and retrieve that information. Bots find
information within one site by cataloging and retrieving it. Aggregators gather data from
multiple sites, such as credit card, bank account, and investment account data, and
consolidate it on one page. This book offers a complete toolkit for the Java programmer
who wants to build bots, spiders, and aggregators. It teaches the basic low-level
HTTP/network programming Java programmers need to get going and then dives into how to
create useful intelligent agent applications. It is aimed not just at Java programmers but at JSP
programmers as well. The CD-ROM includes all the source code for the author's intelligent
agent platform, which readers can use to build their own spiders, bots, and aggregators.
===========
Jeff Heaton
Associate Publisher: Richard Mills
Acquisitions and Developmental Editor: Diane Lowery
Editor: Rebecca C. Rider
Production Editor: Dennis Fitzgerald
Technical Editor: Marc Goldford
Graphic Illustrator: Tony Jonick
Electronic Publishing Specialists: Jill Niles, Judy Fung
Proofreaders: Emily Hsuan, Laurie O’Connell, Nancy Riddiough
Indexer: Ted Laux
CD Coordinator: Dan Mummert
CD Technician: Kevin Ly
Cover Designer: Carol Gorska, Gorska Design
Cover Illustrator/Photographer: Akira Kaede, PhotoDisc


Table of Contents

Introduction
Overview
What Is a Bot?
What Is a Spider?
What Are Agents and Intelligent Agents?
What Are Aggregators?
The Java Programming Language
Wrap Up 
Chapter 1: Java Socket Programming
Overview
The World of Sockets
Java I/O Programming
Proxy Issues
Socket Programming in Java
Client Sockets
Server Sockets
Summary 
Chapter 2: Examining the Hypertext Transfer Protocol
Overview
Address Formats
Using Sockets to Program HTTP
Bot Package Classes for HTTP
Under the Hood 
Summary
Chapter 3: Accessing Secure Sites with HTTPS
Overview
HTTP versus HTTPS
Using HTTPS with Java
HTTP User Authentication
Securing Access
Under the Hood
Summary
Chapter 4: HTML Parsing
Overview
Working with HTML
Tags a Bot Cares About 
HTML That Requires Special Handling
Using Bot Classes for HTML Parsing
Using Swing Classes for HTML Parsing
Bot Package HTML Parsing Examples
Under the Hood
Summary
Chapter 5: Posting Forms
Overview
Using Forms
Bot Classes for a Generic Post
Under the Hood 
Summary
Chapter 6: Interpreting Data
Overview
The Structure of the CSV File
The Structure of a QIF File
The XML File Format
Summary
Chapter 7: Exploring Cookies
Overview
Examining Cookies
Bot Classes for Cookie Processing
Under the Hood
Summary
Chapter 8: Building a Spider
Overview
Structure of Websites
Structure of a Spider
Constructing a Spider
Summary
Chapter 9: Building a High-Volume Spider
Overview
What Is Multithreading?
Multithreading with Java
Synchronizing Threads
Using a Database
The High-Performance Spider
Under the Hood 
Summary
Chapter 10: Building a Bot
Overview
Constructing a Typical Bot
Using the CatBot 
An Example CatBot 
Under the Hood
Summary
Chapter 11: Building an Aggregator
Overview
Online versus Offline Aggregation
Building the Underlying Bot
Building the Weather Aggregator
Summary
Chapter 12: Using Bots Conscientiously
Overview
Dealing with Websites
Webmaster Actions
A Conscientious Spider
Under the Hood
Summary 
Chapter 13: The Future of Bots
Overview
Internet Information Transfer
Understanding XML
Transferring XML Data
Bots and SOAP
Summary
Appendix A: The Bot Package
Utility Classes
HTTP Classes
The Parsing Classes
Spider Classes 
Appendix B: Various HTTP Related Charts
The ASCII Chart
HTTP Headers
HTTP Status Codes 
HTML Character Constants
Appendix C: Troubleshooting
WIN32 Errors
UNIX Errors
Cross-Platform Errors
How to Use the NOBOT Scripts
Appendix D: Installing Tomcat
Installing and Starting Tomcat
A JSP Example
Appendix E: How to Compile Examples Under Windows
Using the JDK
Using VisualCafé
Appendix F: How to Compile Examples Under UNIX
Using the JDK
Appendix G: Recompiling the Bot Package
Glossary

Introduction
Overview
A tremendous amount of information is available through the Internet: today’s news, the location of an expected package, the score of last night’s game, or the current stock price of your company. Open your favorite browser, and all of this information is only a mouse click away. Nearly any piece of current information can be found online; you have only to discover it.
Most of the information content of the Internet is both produced and consumed by human users. As a result, web pages are generally structured to be inviting to human visitors. But is this the only use for the Web? Are human users the only visitors a website is likely to accommodate?
Actually, a whole new class of web user is developing. These users are computer programs that can access the Web in much the same way as a human user with a browser does. There are many names for these kinds of programs, and these names reflect the specialized tasks assigned to them. Spiders, bots, aggregators, agents, and intelligent agents are all common terms for web-savvy computer programs. Throughout this book, we will examine how to create each of these Internet programs, the differences between them, and the benefits of each. Figure I.1 shows the hierarchy of these programs.

Product details
File Size: 3,024 KB
Pages: 485
File Type: PDF
ISBN: 0-78214-040-8
Copyright: 2002 SYBEX Inc.