Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran (O'Reilly Media, Inc.)
Release Date: 2007-08-16
List Price: $39.99
Price: $26.39 Eligible for FREE Super Saver Shipping on orders over $25.
Availability: Usually ships in 24 hours
Product Description
Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:
* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.
"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google
"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
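To make the first item in the description concrete, here is a minimal sketch of user-based collaborative filtering in Python, the language the book works in. It is not the book's code: the `ratings` data, the `similarity` and `recommend` names, and the choice of a Euclidean-distance similarity of 1/(1 + d) are illustrative assumptions. Each item a person hasn't rated is scored by averaging other people's scores for it, weighted by how similar each of those people is to the person.

```python
from math import sqrt

# Hypothetical toy data: person -> {item: score}. Values are made up for illustration.
ratings = {
    "Alice": {"Film A": 4.5, "Film B": 3.0, "Film C": 5.0},
    "Bob":   {"Film A": 4.0, "Film B": 2.5, "Film C": 4.5, "Film D": 4.0},
    "Cara":  {"Film A": 1.5, "Film B": 5.0, "Film D": 2.0},
}

def similarity(prefs, a, b):
    """Similarity in (0, 1] from the Euclidean distance over items both people rated."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0  # nothing in common, so no evidence of similarity
    dist = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def recommend(prefs, person):
    """Rank items `person` hasn't rated by a similarity-weighted average of others' scores."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, score in prefs[other].items():
            if item in prefs[person]:
                continue  # only recommend items the person hasn't seen
            totals[item] = totals.get(item, 0.0) + sim * score
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[item] / sim_sums[item], item) for item in totals]
    ranked.sort(reverse=True)  # highest predicted score first
    return ranked
```

Calling `recommend(ratings, "Alice")` returns a single suggestion, "Film D", with a predicted score between Bob's 4.0 and Cara's 2.0, pulled toward Bob's because his past ratings are much closer to Alice's.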
|
Not worth the money (wadcom)
In short: this book isn't worth its price.
Most of the book consists of code and the corresponding explanations. A decent programmer could figure it all out alone, given the algorithms. Otherwise, it makes more sense to get a book on data structures, Python, or general algorithm construction and learn the basics there.
The algorithms and methods presented in this book are not really specific to "collective intelligence" (with a couple of exceptions). The author gives a brief overview of each technique and then dives into great detail on how to implement it. In reality, unless you are working on a toy site, you don't really need that code, since it wouldn't scale or fit a production environment. What you need is the mathematical model or algorithm, so that you can come up with a reasonable implementation, and that is exactly what the book is missing. Well, it gives *some* information on that, but you'll need to read a more comprehensive source if you intend to really implement it.
I was quite disappointed with the book. I guess it might be OK for a junior developer to get a feel for what it's all about. If you've ever come up with an algorithm by working from a mathematical description of the approach, you don't need this book at all; you can write a similar one yourself.
|
An Eye-Opening, Inspiring Book
I got more from this book than from any other book I've read in the past couple of years!
It covers, in streamlined form, a huge array of algorithms powering the contemporary web: from recommendation engines, to a search engine that includes Google's PageRank algorithm as one of its features, to some quite recent AI innovations.
Just about the only area that was not covered was statistical machine translation. I wish he had done that, since that is my favourite subject.
It helps you see the world through the "Collective Intelligence opportunities" prism.
|
Wow (bitweiser)
If you are interested in this topic, you should read this book. Disclaimer: I am new to the topic, but I appreciate it when it is done well, and I need to understand how to implement it for my job. I was blown away by both the conceptual coverage and the implementation details. This book will allow you to cover the concepts on a first pass, then come back and actually build the approaches you are most interested in. Even if you ultimately use a vendor product for recommendations, you will understand the algorithms being used, their proper application, and where they are deficient.
|
Good but not great
Most people have shared their thoughts on the good points of this book. I'd like to point out some of the bad ones I noticed as I read through:
- First, too many typos. Both the author and O'Reilly should have done a better job of proofreading the material. There are so many typos that they can easily wreck otherwise good material.
- Second, an arcane solution and coding style. The first step in solving many machine learning problems is to represent the problem at hand well. The author's brain is apparently wired differently from mine, so this opinion is personal. For example, in chapter 5, on "optimization for preference", he chose to represent a solution in vector form, like [0,0,0,0,0,0,0,0,0,0]. There is no way I can relate this to its real meaning (allocating 10 students to 5 rooms, each with two slots); if there is an easy explanation, the book didn't give it.
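The vector representation this reviewer found opaque becomes easier to read once it is decoded. The sketch below assumes one common scheme for this kind of allocation problem: each entry is an index into the list of room slots still open when that student is placed, so entry i can range from 0 to 9 - i and every such vector is a valid allocation with no double-booking. The names `ROOMS`, `decode`, and `domain` are hypothetical, and this illustrates the scheme rather than reproducing the book's code.

```python
# Hypothetical setup matching the review: 5 rooms with 2 slots each, 10 students.
ROOMS = ["Room 0", "Room 1", "Room 2", "Room 3", "Room 4"]

def domain(n_students=10):
    """Valid (low, high) range for each vector entry: student i chooses among
    the n_students - i slots that are still open when it is placed."""
    return [(0, n_students - 1 - i) for i in range(n_students)]

def decode(vec, rooms=ROOMS, slots_per_room=2):
    """Turn a solution vector into a {student_index: room_name} assignment."""
    # One entry per slot, e.g. [0, 0, 1, 1, 2, 2, 3, 3, 4, 4] for 5 rooms x 2 slots.
    open_slots = [r for r in range(len(rooms)) for _ in range(slots_per_room)]
    assignment = {}
    for student, choice in enumerate(vec):
        room = open_slots.pop(choice)  # take the chosen open slot so no one else can
        assignment[student] = rooms[room]
    return assignment
```

Under this reading, [0,0,0,0,0,0,0,0,0,0] simply means "every student takes the first open slot", which puts students 0 and 1 in Room 0, students 2 and 3 in Room 1, and so on. The payoff of such an encoding is that an optimizer like hill climbing can change any entry within its range without ever producing an invalid allocation.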
Hence the 3 stars. I believe a second edition is warranted and should be much better.
Just my 2c.
|
Great, simple presentation of some powerful techniques
Programming Collective Intelligence is a book about applying data mining techniques to analyse collections of data. There is submerged information in eBay prices, in Facebook profile networks, in collections of movie reviews, in news sites, and in the stock market; this book by Toby Segaran shows ways to extract, visualise, understand, and predict that information.
Each chapter explains and explores a different data mining algorithm, and builds up a working example in Python, while presenting different methods and parameters of the implementation. I hadn't really worked with Python before, but found the code easy to follow, and picked up some interesting Python idioms that I haven't seen in other languages before. Chapters end with a set of exercises to follow that build your understanding.
As you follow the examples you build up a reasonably generic code base that allows you to swap in and out different implementations, and reuse previous code to add to new applications.
The examples use live data from sites like eBay, Facebook, and Yahoo Finance, which makes the book more interesting and the results more visceral than some other books on the subject that use more contrived or obscure examples. Even though the examples have a strong web (or Web 2.0) focus, the methods and the understanding they build are useful for a whole range of applications.
Some of the topics covered:
* Bayesian classifiers to detect spam, or to file news articles into site sections
* Hierarchical and k-means clustering to discover groups of similar items in massive sets
* Euclidean distance, Pearson Correlation Coefficient, Tanimoto Coefficient: ways to measure the distance (or difference) between items
* Neural networks to predict user behaviour and improve search result ordering
* Optimisation methods like hill climbing, simulated annealing, and genetic algorithms
* Non-negative matrix factorization
* Support vector machines and kernel methods to go where linear regression can't
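The three measures named in the distance bullet above can each be written in a few lines of Python. These are the standard textbook definitions written from scratch, not the book's own functions; the function names are illustrative, and the inputs are assumed to be equal-length lists of numbers (Euclidean, Pearson) or collections of items treated as sets (Tanimoto).

```python
from math import sqrt

def euclidean(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient in [-1, 1]; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)

def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |A & B| / (|A| + |B| - |A & B|).
    For set (binary) data this equals the Jaccard index."""
    a, b = set(a), set(b)
    shared = len(a & b)
    denom = len(a) + len(b) - shared
    if denom == 0:
        return 0.0  # both sets empty; defined as 0.0 here to avoid dividing by zero
    return shared / denom
```

One practical difference between the first two: Pearson correlation ignores a consistent offset, so a reviewer who always scores one point higher than another, such as [1, 2, 3] versus [2, 3, 4], correlates perfectly (1.0) even though their Euclidean distance is about 1.73.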
I found it exciting to read -- it's one of those books that give you a whole bunch of new ideas for things to build as you read it. The presentation is very good: no background is assumed, and it doesn't talk down to those more experienced.
Recommended.