NClassifier is a pretty cool library I found the other day. It's a .NET port of the Java library JClassifier, and it has some nice little functions, such as:
- BayesianClassifier - uses Bayes' theorem to rate how closely a piece of text matches a known input
- VectorClassifier - uses the vector space search algorithm
- Summariser - automatically summarises long text
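To give a feel for what the vector space approach does, here is a minimal Python sketch. It is not NClassifier's API: `VectorSpaceMatcher`, `teach_match`, `classify` and the helper functions are hypothetical names. It assumes a crude bag-of-words term vector per category and scores new text by the cosine similarity between the two vectors, so 1.0 means the same word distribution and 0.0 means no words in common.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Crude bag-of-words term vector: lower-case word -> count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 to 1.0 for counts)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorSpaceMatcher:
    """Hypothetical helper: keeps one accumulated term vector per category."""

    def __init__(self):
        self.vectors = {}

    def teach_match(self, category, text):
        # Add the words of a known example to the category's vector.
        self.vectors.setdefault(category, Counter()).update(term_vector(text))

    def classify(self, category, text):
        # Score 0.0 (no words in common) to 1.0 (same word distribution).
        return cosine_similarity(term_vector(text), self.vectors.get(category, Counter()))


matcher = VectorSpaceMatcher()
matcher.teach_match("sport", "football match goal striker score")
print(round(matcher.classify("sport", "late goal wins the football match"), 2))  # 0.55
```

Cosine similarity ignores document length, which is why a short query can still score well against a category built from many long examples.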
The most interesting part is the summariser, which is nice for auto-generating teasers for news items or events in Umbraco. It also has a GetMostFrequentWords function, which could possibly be used to auto-generate keywords.
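The summariser idea can be sketched in a few lines of Python. This is an assumption about how a frequency-based summariser works, not NClassifier's actual code: count the most frequent non-stop-words (roughly what the name GetMostFrequentWords suggests), take the first sentence that contains each of them, and output the chosen sentences in their original order. `most_frequent_words`, `summarise` and the tiny `STOP_WORDS` list are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"}


def most_frequent_words(text, n):
    """Return up to n of the most common non-stop-words (all of them if n is None)."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(n)]


def summarise(text, max_sentences):
    """Pick the first sentence containing each frequent word, keeping source order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chosen = set()
    for word in most_frequent_words(text, None):
        for i, sentence in enumerate(sentences):
            if word in re.findall(r"[a-z']+", sentence.lower()):
                chosen.add(i)
                break
        if len(chosen) >= max_sentences:
            break
    return " ".join(sentences[i] for i in sorted(chosen))


article = ("Umbraco is a CMS. The summariser picks sentences. "
           "Umbraco editors like teasers. Teasers help news pages. Weather was fine.")
print(summarise(article, 2))  # Umbraco is a CMS. Umbraco editors like teasers.
print(most_frequent_words(article, 2))  # ['umbraco', 'teasers']
```

The same word counts feed both the teaser and a keyword list, so one pass over a news item could fill both fields.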
Just wrap the classifier up as an XSLT extension and it's ready to use in Umbraco.
If you want to be really brave, you could create your own datatype, possibly based on myURL, that auto-generates abstracts for long pieces of content.
If you're feeling really, really adventurous, you could have a stab at building some kind of automatic classification control. You would need to create a training set first so that the Bayesian classifier has a reference set to work with, but you really need to know what you're doing to get this going. There's more information on this at the original JClassifier site.
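Here is a rough Python sketch of what training involves, assuming a simple Paul Graham-style spam-filter approach rather than whatever NClassifier does internally. `TinyBayes`, `teach` and `classify` are hypothetical names. Each word gets a probability from how many matching versus non-matching training examples contained it (raw counts, a simplification), and the word probabilities are combined with P = prod(p) / (prod(p) + prod(1 - p)).

```python
import math
import re
from collections import Counter


def words(text):
    """Distinct lower-case words in the text (each word counted once per example)."""
    return set(re.findall(r"[a-z']+", text.lower()))


class TinyBayes:
    """Hypothetical classifier trained on 'match' and 'non-match' example texts."""

    def __init__(self):
        self.match = Counter()     # word -> number of matching examples containing it
        self.nonmatch = Counter()  # word -> number of non-matching examples containing it

    def teach(self, text, is_match):
        (self.match if is_match else self.nonmatch).update(words(text))

    def word_probability(self, word):
        m, n = self.match[word], self.nonmatch[word]
        if m + n == 0:
            return 0.5  # never seen in training: neutral
        # Clamp so a single word can never force the result to exactly 0 or 1.
        return min(0.99, max(0.01, m / (m + n)))

    def classify(self, text):
        """Return P(match) = prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
        probs = [self.word_probability(w) for w in words(text)]
        if not probs:
            return 0.5
        diff = sum(math.log(1 - p) for p in probs) - sum(math.log(p) for p in probs)
        # Numerically stable form of 1 / (1 + exp(diff)).
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1 + e)
        return 1 / (1 + math.exp(diff))


# Usage: build a tiny training set, then score new text.
clf = TinyBayes()
clf.teach("cheap pills buy now", is_match=True)
clf.teach("buy cheap watches now", is_match=True)
clf.teach("meeting notes for the umbraco project", is_match=False)
clf.teach("project plan and notes", is_match=False)
print(clf.classify("buy cheap pills"))        # close to 1
print(clf.classify("project meeting notes"))  # close to 0
```

Working with sums of logs instead of raw products avoids underflow on long pieces of content, and unseen words stay neutral at 0.5, which is why a decent-sized training set matters so much.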