XML and software
XML, web and software in general, with notes on Radio Userland resources

Ken Novak's Weblog


daily link  Friday, April 30, 2004


Collaborative Filtering Resources  12:09:03 AM  permalink  


daily link  Thursday, April 29, 2004


The CPAN Search Site: Ever-growing Perl library, e.g., XML::RSS::TimingBot, for efficiently fetching RSS feeds  10:09:14 PM  permalink  

MySQL Gotchas: "It's not a bug - it's a gotcha. A "gotcha" is a feature or function which works as advertised - but not as expected.   MySQL has an abundance of gotchas. These will cause much head-scratching, grinding of teeth etc. - particularly for anyone coming from more fully-featured databases who is not used to implementing large portions of the RDBMS functionality in the application code. "  It points to another concise set of MySQL complaints.  1:02:32 AM  permalink  

css-discuss: "The css-discuss Wiki is a companion to the CssDiscussList mailing list. Among other things the wiki serves as a collective long term memory for the list participants."  Its table of contents serves as a fast directory/FAQ for CSS.  12:59:12 AM  permalink  

Google Answers: Spam Assassin Breakdown: More than you'd want to know about how SpamAssassin's rules work.  12:46:26 AM  permalink  
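
SpamAssassin's core idea is additive scoring: every rule that matches a message contributes a score, and the message is tagged as spam when the total reaches a threshold (5.0 by default). Below is a minimal Python sketch of that scheme. The three rules, their regular expressions and their scores are hypothetical and are not taken from SpamAssassin's rule set, which also folds in Bayesian and network tests.

```python
import re

# Hypothetical rules as (name, pattern, score); real SpamAssassin rules differ.
RULES = [
    ("SUBJ_ALL_CAPS", re.compile(r"^Subject: [A-Z0-9 !]{10,}$", re.MULTILINE), 1.5),
    ("FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.0),
]
THRESHOLD = 5.0  # SpamAssassin's default required score


def score_message(text):
    """Add up the scores of all matching rules; spam if the total reaches THRESHOLD."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(text)]
    total = sum(score for _, score in hits)
    return total, total >= THRESHOLD, [name for name, _ in hits]


spam = "Subject: WIN BIG TODAY!!!\n\nGet free money now, click here."
print(score_message(spam))  # (5.0, True, ['SUBJ_ALL_CAPS', 'FREE_MONEY', 'CLICK_HERE'])
```

Listing the rules that fired, as well as the total, mirrors what makes SpamAssassin's verdicts explainable.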

nexus: "Don't you hate all those RSS feeds with only title and links and no content? This little program will help you. It turns those RSS feeds into ones with full content. It does this by reading the original RSS file, following every link, downloading the content and putting it into the RSS files as HTML snippets. "  It uses very simple heuristics to pick the best part of a table layout.  It is written in Perl.  12:29:54 AM  permalink  
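
The quote does not spell out nexus's heuristics, so the following only guesses at the flavor of one: treat the table cell that contains the most text as the article body and ignore the navigation cells around it. It is a minimal Python sketch using the standard-library `html.parser`. The class and function names are hypothetical, and this is not nexus's Perl code.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collects the text found inside each <td>; nested cells keep their own text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # indices of currently open <td> cells, innermost last
        self.cells = []   # accumulated text for each cell, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append("")
            self.stack.append(len(self.cells) - 1)

    def handle_endtag(self, tag):
        if tag == "td" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            self.cells[self.stack[-1]] += data  # credit text to the innermost cell


def best_cell_text(html):
    """Hypothetical heuristic: return the cell with the most non-whitespace text."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    if not parser.cells:
        return ""
    best = max(parser.cells, key=lambda text: len("".join(text.split())))
    return " ".join(best.split())


page = ('<table><tr><td>Home | About | Archive</td>'
        '<td>The actual article text goes here, with several sentences.</td></tr></table>')
print(best_cell_text(page))  # The actual article text goes here, with several sentences.
```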

Open Clip Art Project: "This project has the goal of creating a free archive of clip art that can be used with free software, closed software, distributed with various software distributions, or be used in graphic design compositions. "  See also Pixel Perfect Digital 3.0 - Free Image Archive: "Here you'll find a growing collection of free high resolution photos and illustrations. They're free to use in both your personal and professional design projects. Registration is not required. "  12:14:24 AM  permalink  


daily link  Wednesday, April 28, 2004


Send Jobs to India? Some Find It's Not Always Best: Examples of programming work that was brought back to the US after being tried in India.  "Indian programmers required more detailed instructions to write the software code than would a programmer here, who would be more familiar with the customer's needs. This slowed the process, which was a major drawback because this technology is new and changing very fast. .. "Whenever the pace of innovation is very rapid," he said, "is when the work should be done closer to the client." .. [India's] Infosys announced that it would spend $20 million to set up a consulting company in the United States."  5:22:48 PM  permalink  

Nano-Hive: Nanospace Simulator: "Nano-Hive is a modular simulator used for modeling the physical world at a nanometer scale. The intended purpose of the simulator is to act as a tool for the study and development of nanotech entities." Version 1 is for a single-user desktop; version 2 is planned to be distributed, possibly using Globus.  8:59:02 AM  permalink  

Internet-based Distributed Computing Projects: Directories of active and completed projects. Among them is a designer for circuits with Built-In Self-Test (BIST), such as ones used in space or medical applications where reliability and fault detection are critical.  It uses Genetic Algorithms (GA) and Evolutionary Strategies (ES) to derive and simulate alternate designs.

  6:43:27 AM  permalink  

Rapid Web Application Deployment with Maypole: "Maypole enables Perl programmers to get web front-ends to databases, as well as complex web-based applications, up and running quickly."  Part 2 has more examples.  6:18:24 AM  permalink  


daily link  Tuesday, April 27, 2004


Infohound Color Schemer: "Matching colors will be automatically chosen. You can click on one to set it as the primary color."  8:51:45 AM  permalink  

HTML Tidy Online: "HTML Tidy is a tool for checking and cleaning up HTML source files." Paste the ugly HTML into the form and press Submit.  8:35:45 AM  permalink  

HTML Tabbed Dialog Widget: "One of the most common widgets in a GUI is the tabbed dialog, where related options are grouped into tabs, and the user can navigate between them. This powerful widget is not part of the HTML spec, so it is not seen on websites. However, with some simple JavaScript and a little CSS styling, it is straightforward to create a tabbed dialog in HTML."  8:32:08 AM  permalink  


daily link  Monday, April 26, 2004


sorttable: Make all your tables sortable: nifty, simple JavaScript functions.  10:53:37 PM  permalink  

Compiling hardware from C++ code: Maxeler is a New York City company that supplies a product called ASC: A Stream Compiler for Computing with FPGAs.  "ASC is fully embedded in standard C++, and as such, ASC programs are compiled by a conventional C++ compiler. The concepts of timing and architecture of the circuit are expressed by ASC hardware types and operators. The ASC system facilitates design space exploration by providing three levels of abstraction: architecture level, arithmetic level and gate level. Since each intermediate representation is human readable C++, it is easy to optimize implementations at each of these levels and explore such optimizations within the ASC framework.  Conceptually, ASC follows the philosophy of the C programming language. The objective is to offer the capability to optimize the program for maximal performance, and at the same time provide a language interface that increases productivity. "

They claim typical 30x improvements in performance. The key factor is fitting the data types to the bit representation the data actually needs, rather than using standard int and float.  Sizing the mantissa and exponent to fit the problem saves a lot.
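
To get a feel for that trade-off before committing to hardware, one can emulate a narrower floating-point format in software and measure the error. Below is a minimal Python sketch. `round_mantissa` is a hypothetical helper, not part of ASC, and it narrows only the mantissa. A real custom format would also bound the exponent range.

```python
import math


def round_mantissa(x, bits):
    """Round x to `bits` significant mantissa bits (round-half-to-even).

    Only precision is modeled: the exponent keeps the full range of a Python float.
    """
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scale = 2.0 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


# How much accuracy survives as the mantissa shrinks from float32's 24 bits?
for bits in (24, 16, 12, 8):
    approx = round_mantissa(math.pi, bits)
    print(bits, approx, abs(approx - math.pi) / math.pi)
```

Here the relative rounding error is at most 2^-bits, so each bit removed roughly doubles the worst case. That is the per-problem budget one would be tuning.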

"ASC provides a software-like interface to programming FPGAs and enables rapid exploration of the design space for FPGA implementations. This increase in productivity of up to 10x can result, for example, in 20-30 implementations of an algorithm in the same time it otherwise takes to develop 2-3 implementations. The advantages of ASC for an architecture that supports reconfiguration, or customizable architectures with a large number of (FPGA) nodes, have the potential to change the way we think about computing."

They also develop IP: "Maxeler Technologies utilizes its programming technology to develop state-of-the-art, flexible, parametrizable arithmetic modules and IP blocks implementing entire algorithms. Examples for our IP blocks are FFT (fixed point and floating point), Reed Solomon Code, IDEA encryption, and IDCT for video coding. "  It makes me think about linking this to genetic programming for IP generation.

  10:31:42 AM  permalink  

Findory Personalized News: "Findory uses a patent-pending method to order news articles gathered from a wide variety of sources. The algorithm combines statistical analysis of the article text and of users who viewed the articles with information about articles you previously viewed."  Uses RSS and Bayesian statistics.  Founded 4Q 2003 by a former Amazon manager who worked on personalization.  The search function seems useful.  12:19:31 AM  permalink  
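
Findory's patent-pending method is not described in detail. To illustrate only the "users who viewed the articles" half, here is a generic item-to-item co-view sketch in Python: it scores unseen articles by how often other readers viewed them alongside articles the user has already read. The function names and sample data are hypothetical, and this is not Findory's algorithm.

```python
from collections import defaultdict
from itertools import combinations


def co_view_counts(histories):
    """For each pair of articles, count how many users viewed both."""
    counts = defaultdict(lambda: defaultdict(int))
    for viewed in histories.values():
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, histories, limit=3):
    """Rank unseen articles by how often they were co-viewed with the user's articles."""
    counts = co_view_counts(histories)
    seen = set(histories[user])
    scores = defaultdict(int)
    for article in seen:
        for other, n in counts[article].items():
            if other not in seen:
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:limit]]


histories = {
    "ann": ["rss", "bayes"],
    "bob": ["rss", "bayes", "xml"],
    "cy": ["rss", "css"],
    "dee": ["bayes", "xml"],
}
print(recommend("ann", histories))  # ['xml', 'css']
```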


daily link  Saturday, April 24, 2004


Stomp*3 Bayesian RSS Aggregator: "The Growlmurrdurr RSS aggregator is a piece of software which reads RSS feeds and allows access to them through a CGI web interface. It presents a simple and clean layout of all the news articles from the RSS feeds in chronological order. Additionally, it allows you to use Bayesian filtering to group RSS news items into two categories, presumably the ones you're interested in and the ones you're not. It is geared towards aggregating many, many news feeds simultaneously and features the ability to specify individual reload times for each feed. It is written using Python. Python 2.3 or later is recommended as earlier versions may not work. Bayesian filtering software has been yanked out of SpamBayes. The RSS parsing module has been taken from pyblagg."  There is a sample site running the aggregator, plus comments.  6:09:39 PM  permalink  
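
The two-category filtering described here can be sketched with a textbook multinomial naive Bayes classifier. This minimal Python sketch makes several assumptions: a crude lower-case word tokenizer, add-one smoothing, and the hypothetical labels "interesting" and "boring". It is not the SpamBayes code that Growlmurrdurr reuses, which combines word probabilities differently.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Two-category multinomial naive Bayes with add-one (Laplace) smoothing.

    Train on at least one item before calling classify().
    """

    def __init__(self):
        self.word_counts = {"interesting": Counter(), "boring": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        vocabulary = set(self.word_counts["interesting"]) | set(self.word_counts["boring"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label), kept in log space
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denominator = sum(counts.values()) + len(vocabulary)
            for word in tokens(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("Python XML parser released", "interesting")
nb.train("New RSS aggregator written in Python", "interesting")
nb.train("Celebrity gossip roundup", "boring")
nb.train("Sports scores and gossip", "boring")
print(nb.classify("Python RSS library"))  # interesting
```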

Regular Expression Mastery: a 100-slide tutorial with many tricks.  Surprisingly quick to review.  Plus, The Regex Coach, an interactive tool.  8:59:42 AM  permalink  

Streaming media recording software: a comparison chart.  Plus: StationRipper for easily recording radio streams.  7:47:18 AM  permalink  


daily link  Friday, April 23, 2004


Classifier4J: "Classifier4J is a Java library designed to do text classification. It comes with an implementation of a Bayesian classifier, and now has some other features, including a text summary facility. "  Last updated at the end of 2003.  It has been integrated with RSS and an NNTP & RSS reader.   11:13:55 PM  permalink  

What Are Topic Maps?:  A generalized data structure standard to link info and support general purpose navigation.  "The topic map takes the key concepts described in the databases and documents and relates them together independently of what is said about them in the information being indexed. So when a document says "The maintenance procedure for part X consists of the following steps..." the topic map may say "Part X is of type Q and is contained in parts Y and Z and its maintenance procedure resides in document W". .. The result is an information structure that breaks out of the traditional hierarchical straitjacket that we have gotten used to squeezing our information into. A topic map usually contains several overlapping hierarchies which are rich with semantic cross-links like "Part X is critical to procedure V." ..

The most common use for topic maps right now is to build web sites that are entirely driven by the topic map, in order to fully realize their information-finding benefits. The topic map provides the site structure, and the page content is taken partly from the topic map itself, and partly from the occurrences. This solution is perfect for all sorts of portals, catalogs, site indexes, and so on. Since a topic map can be said to represent knowledge about the things it describes, topic maps are also ideal as knowledge management tools. "

From The TAO of Topic Maps: "Topic maps started life as a way of representing the knowledge structures inherent in traditional back of book indexes, in order to solve the information management problems involved in creating, maintaining and processing indexes for complex documentation. As the model evolved, their scope was broadened to encompass other kinds of navigational aid, such as glossaries, thesauri and cross references. "
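
The part X example maps onto three ingredients: topics (with types), associations between topics, and occurrences that point at the documents holding information about them. Below is a minimal Python sketch of that data structure. The class and method names are hypothetical and do not follow the XTM interchange syntax or any topic map engine's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Topic:
    name: str
    topic_type: Optional[str] = None
    # Occurrences: kind of information -> resource that holds it.
    occurrences: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    kind: str                  # e.g. "contained-in" or "critical-to"
    members: Tuple[str, str]   # (from topic, to topic)


class TopicMap:
    def __init__(self):
        self.topics: Dict[str, Topic] = {}
        self.associations: List[Association] = []

    def topic(self, name: str, topic_type: Optional[str] = None) -> Topic:
        """Return the named topic, creating it (and setting its type) as needed."""
        t = self.topics.setdefault(name, Topic(name))
        if topic_type is not None:
            t.topic_type = topic_type
        return t

    def associate(self, kind: str, source: str, target: str) -> None:
        self.topic(source)
        self.topic(target)
        self.associations.append(Association(kind, (source, target)))

    def related(self, name: str, kind: str) -> List[str]:
        """Topics that `name` points to through associations of the given kind."""
        return [a.members[1] for a in self.associations
                if a.kind == kind and a.members[0] == name]


tm = TopicMap()
tm.topic("Part X", "Q").occurrences["maintenance procedure"] = "Document W"
tm.associate("contained-in", "Part X", "Part Y")
tm.associate("contained-in", "Part X", "Part Z")
tm.associate("critical-to", "Part X", "Procedure V")
print(tm.related("Part X", "contained-in"))  # ['Part Y', 'Part Z']
```

Because associations are stored separately from the topics, the same topic can sit in several overlapping hierarchies at once, which is the point the quote makes about escaping a single tree.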

  4:15:35 PM  permalink  

Quickiwiki, Swiki, Twiki, Zwiki and the Plone Wars Wiki as a PIM and Collaborative Content Tool.  Review of the many varieties of wiki, and comparison to other collaboration tools.  Links to many sources, including Zwiki (for Zope) and comparisons to Plone.  4:04:25 PM  permalink  

css Zen Garden: Amazing demo of one page with many CSS designs, from raw HTML to comic book.  3:36:38 PM  permalink  

The Content Management Comparison Tool: "Use the form below to select up to 10 content management tools to compare at once."  3:11:16 PM  permalink  

xMail: E-mail as XML: "E-mail is a good example of a structured text format that can usefully be converted to XML for processing, archiving, and searching. In this chapter, we develop xMail–a Python application to convert e-mail to XML."  3:03:21 PM  permalink  
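
Python's standard library covers both halves of that conversion: the `email` package parses the message, and `xml.etree.ElementTree` builds the XML. Below is a minimal sketch. The `<message>`/`<headers>`/`<body>` element names are hypothetical rather than xMail's actual schema, and only the plain-text body is kept.

```python
import email
import xml.etree.ElementTree as ET
from email import policy


def email_to_xml(raw):
    """Convert an RFC 822 message string into a simple XML document string."""
    msg = email.message_from_string(raw, policy=policy.default)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        ET.SubElement(headers, "header", name=name).text = str(value)
    # Prefer the text/plain part; a non-multipart plain message is its own body.
    body_part = msg.get_body(preferencelist=("plain",))
    body = ET.SubElement(root, "body")
    body.text = body_part.get_content() if body_part is not None else ""
    return ET.tostring(root, encoding="unicode")


raw = ("From: ann@example.com\nTo: bob@example.com\n"
       "Subject: Meeting notes\n\nSee you at ten.\n")
print(email_to_xml(raw))
```

ElementTree escapes `<`, `>` and `&` in header values and body text, so the output stays well-formed even for messages that quote HTML.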

Developer tools for web site designers:

  3:02:13 PM  permalink  

Python, JS and CSS code: Nice collection, including a Python talk that can be browsed in HTML, and interesting alternatives for dropdown menus and explorer trees.  2:53:38 PM  permalink  

RSS to JS demo: How "to insert dynamically updated RSS into any web page, blog, or Course Management System. It makes use of a PHP script (demo version running on our server) that parses the XML feed, and returns a JavaScript set of write commands that insert the information into your page. All you need to do is to insert a simple JavaScript line of code in the part of your page where you want the feed. The only other thing to tidy it up is to link or insert a style sheet to format the output."  2:17:45 PM  permalink  
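
The demo does this with a PHP script. The same mechanics are sketched below in Python, assuming an RSS 2.0 feed (items under `<channel>`); the `rss-feed` class name is hypothetical. `json.dumps` is used because its output is also a valid JavaScript string literal, which keeps quotes in titles from breaking the generated script.

```python
import html
import json
import xml.etree.ElementTree as ET


def rss_to_js(rss_xml, max_items=5):
    """Turn an RSS 2.0 feed into JavaScript document.write() calls."""
    channel = ET.fromstring(rss_xml).find("channel")
    lines = ['<ul class="rss-feed">']
    for item in channel.findall("item")[:max_items]:
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        lines.append(f'<li><a href="{link}">{title}</a></li>')
    lines.append("</ul>")
    # Each HTML fragment becomes one safely quoted document.write() call.
    return "\n".join(f"document.write({json.dumps(line)});" for line in lines)


feed = ('<rss version="2.0"><channel><title>Demo</title>'
        '<item><title>A &amp; B</title><link>http://example.com/a</link></item>'
        '</channel></rss>')
print(rss_to_js(feed))
```

Served from a URL such as the hypothetical `feed.js`, the output would be pulled into a page with one `<script src="feed.js"></script>` line, and the `rss-feed` class gives the style sheet something to target.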


daily link  Thursday, April 22, 2004


POPFile - Automatic Email Classification.  "POPFile is an email classification tool with a Naive Bayes classifier, a POP3 proxy and a web interface. It runs on most platforms and with most email clients."  Perl-based, open source, updated March 2004.  10:28:43 PM  permalink  

Expand compression now available as software product: "The Accelerator Server is a Linux-based software solution that ports many of the Application Traffic Management features of the new Expand Accelerator appliances like application acceleration and bandwidth efficiency tools that reduce wide area network (WAN) costs and improve application response times."  4:30:33 PM  permalink  


daily link  Tuesday, April 20, 2004


What Is Zope? A revised intro to Zope, a (mostly) Python web service platform that includes content management and other facilities.  Interesting directory of Zope Products, including the SQL2Form Automatic Form Generator.  10:16:10 AM  permalink  

OpenOffice: Interesting endorsement, with info on how it was built, and how it interoperates with everything XML.  10:13:50 AM  permalink  

Describe RSS in 10 words or less:  My favorites: 

  • The Fastest Way To Waste An Enormous Amount Of Time
  • Freebasing for Web junkies.
  • Makes life easier, but not really.
  • Remember Pointcast? Kinda like that, only actually useful.
  • News sent to your computer. No spam. No browsing.
  10:02:17 AM  permalink  


daily link  Monday, April 19, 2004


Velocity: A Java template engine that runs in many environments; it is an Apache Jakarta project and is often used for generating web HTML.  From java.net: Velocity: Fast Track to Templating: "Velocity is a fast and easy-to-use Java-based templating engine. Velocity's speed, ease of use, and flexibility contribute to its use in a broad range of applications, including code generation, email templating, and web user-interface creation. A template is a parameterized, predesigned text format. A template engine processes a template and fills in the parameterized pieces with concrete data."  Another article: Client and server-side templating with Velocity: "Velocity is a versatile, open source templating solution that can be used standalone in report generation/data transformation applications, or as a view component in MVC model frameworks. In this article, Sing Li introduces Velocity and reveals how you can integrate its template-processing capabilities into your own client-side standalone application, server-side Web application, or Web services."  Plus, Velosurf: "Velosurf is a java database abstraction layer, for the Velocity template engine. It is meant for ease-of-use, genericity and efficiency."  6:06:37 PM  permalink  
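
Velocity itself is Java, but the template idea in the quote is language-neutral. As a rough analogy only, Python's standard `string.Template` uses the same `$name` reference style for placeholders, though it has none of Velocity's directives such as `#foreach` or `#if`. The template text and values below are made up.

```python
from string import Template

# A parameterized, predesigned text format with $-prefixed placeholders.
email_template = Template("Dear $name,\nYour order #$order_id ships on $date.\n")

# The engine fills in the parameterized pieces with concrete data.
message = email_template.substitute(name="Ada", order_id="1042", date="May 3")
print(message)

# safe_substitute() leaves unknown placeholders in place instead of raising KeyError.
print(email_template.safe_substitute(name="Ada"))
```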

SpamBayes: Bayesian anti-spam classifier written in Python. "The SpamBayes project is working on developing a Bayesian anti-spam filter, initially based on the work of Paul Graham."  Excellent background page.  It includes POP and IMAP filters and an Outlook plugin, but no sharing of filter info.  Work started in August 2002.  Reminds me of my 2001-2002 RDV fellowship, when I went looking for simple tools for doing Bayes filtering of RSS feeds, to rank-order and cluster articles.  Maybe now's the time?  9:49:22 AM  permalink  
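
The step at the heart of Paul Graham's approach combines per-word spam probabilities into one score for the message. It takes the most "interesting" words (probabilities farthest from 0.5) and computes prod(p) / (prod(p) + prod(1 - p)). Below is a minimal Python sketch of that combining rule. It assumes the per-word probabilities have already been estimated and clamped strictly between 0 and 1. It is not SpamBayes' own code, which uses a different combining scheme.

```python
import math


def combine(probabilities, interesting=15):
    """Combine per-word spam probabilities as in Graham's "A Plan for Spam".

    Uses the `interesting` probabilities farthest from 0.5; each must lie
    strictly between 0 and 1 (Graham clamps them to [0.01, 0.99]).
    """
    chosen = sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)[:interesting]
    # Work in log space so long products do not underflow to zero.
    log_spam = sum(math.log(p) for p in chosen)
    log_ham = sum(math.log(1.0 - p) for p in chosen)
    # prod(p) / (prod(p) + prod(1 - p)) rewritten as 1 / (1 + prod(1 - p) / prod(p))
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


print(combine([0.99, 0.95, 0.2, 0.5, 0.4]))
```

Instead of a yes/no cut-off, the same score could rank-order RSS items, which is the use the post has in mind.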


daily link  Friday, April 16, 2004


Internet Arcade Games: browser versions of Tetris, Pong, and many oldies.  10:51:15 PM  permalink  

Apache URL Rewriting Guide: "It describes how one can use Apache's mod_rewrite to solve typical URL-based problems webmasters are usually confronted with in practice. I give detailed descriptions on how to solve each problem by configuring URL rewriting rulesets. "  11:36:14 AM  permalink  


daily link  Wednesday, April 14, 2004


Google AdSense Test Page: A nice way to find out what sorts of ads Google would put on your page, based on its content.   12:38:07 AM  permalink  


daily link  Monday, April 12, 2004


Oddpost: Interesting web mail client with a news aggregator, Bayesian spam filtering, no advertising, and a DHTML/JavaScript interface that resembles a desktop app.  Cheap for end users ($30/year), and it can be licensed (a Java version is available; Windows version pricing falls from $30/user for 5 users to $2/user for 1,000 users, or $5,000 for unlimited users).  Hilarious weblog, too.   5:22:07 PM  permalink  


daily link  Thursday, April 08, 2004


loaf: Nifty introduction to Bloom filters, which seem potentially broadly useful: "LOAF is a simple extension to email that lets you append your entire address book to outgoing mail message without compromising your privacy. Correspondents can use this information to prioritize their mail, and learn more about their social networks."  12:08:02 AM  permalink  
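
A Bloom filter stores set membership in a fixed-size bit array. Each item sets k bit positions, and a lookup reports "present" only if all k bits are set. It can therefore give false positives but never false negatives, and the stored items cannot be listed back out. Below is a minimal Python sketch. Deriving the k positions from salted SHA-256 digests, and the default sizes, are assumptions for illustration, not LOAF's actual parameters.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


address_book = BloomFilter()
for address in ("ann@example.com", "bob@example.org"):
    address_book.add(address)
print("ann@example.com" in address_book)  # True
```

In LOAF's setting, a recipient can test whether a given correspondent's address is in the sender's filter without being able to enumerate the address book.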


daily link  Sunday, April 04, 2004


Recording contact info in XML: "I am trying to develop an address book kind of application. The contact information will be maintained in XML format. Is there any standard DTD for contacts?"  There are many; here are some starting points.  12:38:49 PM  permalink  

Using libferris with XML: "This article presents the benefits of using libferris with your XML applications. libferris presents a uniform interface to hierarchical data. This data can be persisted using many providers including the filesystem, an RDBMS, or even XML. All the data providers in libferris are made available using a filesystem metaphor: MySQL tables can be seen using ferrisls on a "mysql://host/database/table" URL."  12:37:47 PM  permalink  


daily link  Friday, April 02, 2004


Gnews2RSS: "An experimental convertor that takes a Google News search and turns it into RSS" from programmer Julian Bond. Google filed a court order to shut down one service running this code.  10:50:28 PM  permalink  

Copyright 2005 © Ken Novak.
Last update: 11/25/2005; 12:03:10 AM.