Based on a discussion on PaulDotCom (episode 129) about creating custom word lists by spidering a target's website and collecting unique words, I decided to write CeWL, the Custom Word List generator. CeWL is a Ruby app that spiders a given URL to a specified depth, optionally following external links, and returns a list of words that can then be used with password crackers such as John the Ripper.
By default, CeWL stays on the site you have specified and follows links to a depth of 2; both behaviors can be changed by passing arguments. Be careful when setting a large depth and allowing it to go offsite, as the spider could drift onto many other domains. All words of three characters or more are written to stdout. The minimum length can be increased, and the words can be written to a file rather than to the screen so that the app can be automated.
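The behavior described above can be illustrated with a minimal Python sketch. It is not CeWL's actual Ruby implementation: the function names `extract_words` and `spider`, the `fetch` callable, the regex-based HTML handling, and the in-memory `PAGES` table are all assumptions made for illustration. The sketch also assumes that depth counts link hops from the starting URL.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

WORD_RE = re.compile(r"[A-Za-z]+")
TAG_RE = re.compile(r"<[^>]+>")
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def extract_words(html, min_length=3):
    """Return the unique words of at least min_length characters."""
    text = TAG_RE.sub(" ", html)  # crude tag stripping, enough for a sketch
    return {w for w in WORD_RE.findall(text) if len(w) >= min_length}


def spider(start_url, fetch, depth=2, offsite=False, min_length=3):
    """Breadth-first crawl from start_url, following links up to depth hops.

    fetch is a hypothetical callable that maps a URL to its HTML, or None.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    words = set()
    while queue:
        url, level = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        words |= extract_words(html, min_length)
        if level >= depth:
            continue  # depth limit reached: collect words, follow no links
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)
            if link in seen:
                continue
            if not offsite and urlparse(link).netloc != start_host:
                continue  # stay on the starting site unless offsite is set
            seen.add(link)
            queue.append((link, level + 1))
    return sorted(words)


# Hypothetical in-memory "website" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/about">About us</a> '
                            '<a href="http://other.test/">Partner</a>',
    "http://example.test/about": "<p>Founded by Alice in Springfield</p>",
    "http://other.test/": "<p>Offsite content</p>",
}

print(spider("http://example.test/", PAGES.get))
# prints ['About', 'Alice', 'Founded', 'Partner', 'Springfield']
```

Passing `offsite=True` lets the crawl reach `other.test` as well, which shows how quickly the word list can pick up material from other domains once offsite links are followed.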
CeWL also has an associated command line app, FAB (Files Already Bagged), which uses the same metadata extraction techniques to create author/creator lists from already downloaded files.
Source code and additional information can be found here: https://github.com/digininja/CeWL