Ken Novak's Weblog
Purpose of this blog: to retain annotated bookmarks for my future reference, and to offer others my filter technology and other news. Note that this blog is categorized. Use the category links to find items that match your interests.
New: Read what I'm reading on Bloglines.
Friday, April 23, 2004
: "Classifier4J is a Java library designed to do text classification. It comes with an implementation of a Bayesian classifier, and now has some other features, including a text summary facility." Last updated at the end of 2003. It has been integrated with RSS and an NNTP & RSS reader. 11:13:55 PM
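For my own reference, here is a minimal sketch of the idea behind a Bayesian text classifier. It is written in Python rather than Java and is not Classifier4J's API; the class name, method names, and training sentences are all made up for illustration. It scores each label by log prior plus add-one-smoothed word likelihoods.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical toy classifier, not the Classifier4J implementation."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training documents seen
        self.vocab = set()                       # every word seen in training

    def train(self, label, text):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("tech", "java library for text classification and rss feeds")
nb.train("tech", "python application converts email to xml")
nb.train("cars", "hydrogen engine with hybrid electric transmission")
nb.train("cars", "concept car built with green materials")
result = nb.classify("a new xml library for java")
```

With this tiny training set, `result` comes out as the "tech" label. A real filter would need far more training text per category.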
What Are Topic Maps?: A generalized data structure standard to link info and support general purpose navigation. "The topic map takes the key concepts described in the databases and documents and relates them together independently of what is said about them in the information being indexed. So when a document says "The maintenance procedure for part X consists of the following steps..." the topic map may say "Part X is of type Q and is contained in parts Y and Z and its maintenance procedure resides in document W". .. The result is an information structure that breaks out of the traditional hierarchical straightjacket that we have gotten used to squeezing our information into. A topic map usually contains several overlapping hierarchies which are rich with semantic cross-links like "Part X is critical to procedure V." ..
The most common use for topic maps right now is to build web sites that are entirely driven by the topic map, in order to fully realize their information-finding benefits. The topic map provides the site structure, and the page content is taken partly from the topic map itself, and partly from the occurrences. This solution is perfect for all sorts of portals, catalogs, site indexes, and so on. Since a topic map can be said to represent knowledge about the things it describes, topic maps are also ideal as knowledge management tools. "
From The TAO of Topic Maps: "Topic maps started life as a way of representing the knowledge structures inherent in traditional back of book indexes, in order to solve the information management problems involved in creating, maintaining and processing indexes for complex documentation. As the model evolved, their scope was broadened to encompass other kinds of navigational aid, such as glossaries, thesauri and cross references. " 4:15:35 PM
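A rough sketch of the "Part X" example as a data structure, assuming a topic map can be reduced to topics, typed associations between topics, and occurrences that point at documents. The class and field names are my own and do not follow the topic map standard's syntax or any real topic map engine.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    topic_type: str = ""
    # Occurrences: role -> reference to the document holding the information.
    occurrences: dict = field(default_factory=dict)


@dataclass
class TopicMap:
    """Hypothetical in-memory topic map, for illustration only."""
    topics: dict = field(default_factory=dict)        # name -> Topic
    associations: list = field(default_factory=list)  # (source, relation, target)

    def add_topic(self, name, topic_type="", **occurrences):
        self.topics[name] = Topic(name, topic_type, dict(occurrences))

    def associate(self, source, relation, target):
        self.associations.append((source, relation, target))

    def related(self, name, relation):
        """Return the targets linked from `name` by `relation`, in insertion order."""
        return [t for s, r, t in self.associations if s == name and r == relation]


tm = TopicMap()
# "Part X is of type Q ... its maintenance procedure resides in document W"
tm.add_topic("Part X", "Q", maintenance_procedure="document W")
for other in ("Part Y", "Part Z", "Procedure V"):
    tm.add_topic(other)

# "... and is contained in parts Y and Z"
tm.associate("Part X", "contained in", "Part Y")
tm.associate("Part X", "contained in", "Part Z")
# A semantic cross-link outside the containment hierarchy.
tm.associate("Part X", "critical to", "Procedure V")

containers = tm.related("Part X", "contained in")
```

The point of the sketch is that the containment hierarchy and the "critical to" cross-link live in the same association list, separate from the documents themselves, which only appear as occurrences.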
xMail: E-mail as XML: "E-mail is a good example of a structured text format that can usefully be converted to XML for processing, archiving, and searching. In this chapter, we develop xMail, a Python application to convert e-mail to XML." 3:03:21 PM
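To get a feel for the conversion, here is a small sketch using only the Python standard library. It is not the xMail code from the chapter; the function name, the element names (`message`, `headers`, `header`, `body`), and the sample message are assumptions of mine, and it only handles a plain-text body.

```python
import email
import xml.etree.ElementTree as ET
from email import policy


def email_to_xml(raw_message):
    """Convert a raw RFC 822 style message string to a simple XML string.

    Hypothetical layout: <message><headers><header name="..."/>...</headers>
    <body>...</body></message>.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    root = ET.Element("message")

    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "header", name=name)
        header.text = str(value)

    # Only the plain-text body is kept; attachments and HTML parts are ignored.
    body_part = msg.get_body(preferencelist=("plain",))
    body = ET.SubElement(root, "body")
    body.text = body_part.get_content() if body_part is not None else ""

    return ET.tostring(root, encoding="unicode")


raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Lunch at noon?\n"
)
xml_text = email_to_xml(raw)
```

Once the message is XML, the headers can be searched or archived with ordinary XML tools, which is the benefit the chapter describes.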
Developer tools for web site designers:
RSS to JS demo
MODEL U CONCEPT CAR: "the Model T of the 21st century... Powered by the world's first supercharged hydrogen internal combustion engine, equipped with a hybrid electric transmission and pioneering green materials and processes, Model U is a vision for the future. It is Ford's model for change - exploring the benefits a vehicle provides to its users, the way it is manufactured and how it impacts the world." Contributions from many sources, including William McDonough, Sun Micro, MIT MediaLab. 12:27:07 PM
Cringely on search and digital archaeology: "MeaningMaster isn't a search engine, but a search technology.. [with a] lexicon -- a computer dictionary that is purported to understand the meanings of more than 200,000 English words IN CONTEXT.. MeaningMaster is hand-coded, a process that took 175 man- and woman-years."
I like Cringely's general observation: "What has changed is that, through the relentless passage of Moore's Law, computers are on average 16 times faster today than they were back in 1998. Today, MeaningMaster claims a server can process 50,000 queries per hour, though they are careful not to specify either the power of the server or the complexity of the query, though with modern brute force approaches like Google's swarm of PC servers, it probably doesn't matter. Where [the 1990's] Inquizit was interesting, but probably not competitive, MeaningMaster is now competitive. .. This makes me wonder, in fact, whether there aren't hundreds of promising technologies from the late 1990s that are worth another look today. It would probably be worthwhile to start a company just to specialize in this type of digital archaeology." 8:49:26 AM
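A quick check of the "16 times faster" figure, assuming the common reading of Moore's Law as a doubling every 18 months (the doubling period is my assumption; Cringely does not state it here):

```python
def speedup(years, doubling_period_years=1.5):
    """Growth factor after `years`, assuming one doubling per period."""
    return 2 ** (years / doubling_period_years)


# 1998 to 2004 is six years, i.e. four 18-month doublings.
factor = speedup(2004 - 1998)
```

Four doublings give a factor of 16, which matches the number in the quote.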