Comedy: making reality acceptable

May 11, 2006 on 12:29 am | In wikicompany, media, features, spider | No Comments

There's some great stand-up comedy being streamed onto the web by Cringe Humor NYC. Grab streamtuner and streamripper and have a laugh.

I wish streamtuner could also handle video, podcasts, skypecasts, and other 'live' streams from various sites, really decoupling the content from the web/RSS site interface with a simple and consistent browse/search/bookmark interface.

For Wikicompany I've been hacking at an intelligence-augmented web crawler which will create company profiles from URLs. It's pretty useful already.

The spider collects data from various sources, parses and mangles the data (including geocoding, auto tagging, logo handling) and creates a link for the Wikicompany publish form. From the publish form any manual changes can be made, before including the profile in Wikicompany. Parsing an existing profile on Wikicompany back to the form is on the todo list.
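The last step of that pipeline, turning gathered data into a link that pre-fills the publish form, can be sketched roughly as follows. This is a minimal illustration, not the spider's actual code: the form URL, the field names, and the naive `auto_tags` helper are all assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical publish-form endpoint; the real Wikicompany form URL is not given here.
PUBLISH_FORM = "http://example.org/publish"


def auto_tags(text, vocabulary):
    """Naive auto-tagging: return the vocabulary terms that occur in the text."""
    words = set(text.lower().split())
    return sorted(term for term in vocabulary if term in words)


def publish_link(profile):
    """Encode a profile dict as a query string that pre-fills the form."""
    fields = dict(profile)
    # Flatten the tag list into one comma-separated form field (assumed format).
    fields["tags"] = ",".join(profile.get("tags", []))
    return PUBLISH_FORM + "?" + urlencode(fields)


profile = {
    "name": "Example Biotech",
    "url": "http://www.example.com",
    "tags": auto_tags(
        "We develop biotech enzymes for food",
        ["biotech", "food", "software"],
    ),
}
link = publish_link(profile)
print(link)
```

Passing the data through a link keeps the human in the loop: the form opens pre-filled, and nothing is published until someone has reviewed it.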

The spider is not perfect yet, but the results are very promising. I'm currently testing the algorithms on about 4000 company domains (mainly biotech companies).

I want to automate as much work as possible, but some things only a human can (currently) do, although some good NLP software might be able to automate even more. Some statistical approaches could also be interesting once more correct context data is known.

I also wrote a tool which can generate a company URL list from a list of company names, which really helps to collect large lists of company URLs from various sources on the web.
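How that tool works is not described here, so the following is only one plausible approach, sketched under assumptions: strip common legal suffixes from the name, squash the rest into a single word, and propose candidate domains. The suffix list and the default TLDs are made up for the example, and every candidate would still need to be checked (DNS lookup, fetching the page) before it is trusted.

```python
import re

# Assumed list of legal suffixes to drop from the end of a company name.
LEGAL_SUFFIXES = {"inc", "corp", "co", "ltd", "llc", "company"}


def candidate_urls(name, tlds=("com", "net")):
    """Guess likely domain names for a company name (unverified candidates)."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    # Drop trailing legal suffixes such as "Inc." or "Ltd".
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    stem = "".join(words)
    if not stem:
        return []
    return [f"www.{stem}.{tld}" for tld in tlds]


print(candidate_urls("Whole Foods Market, Inc."))
```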

Here's a small list of profiles which were gathered completely automatically:

  1. www.wholesoyco.com
  2. www.wholesomesweeteners.com
  3. www.wholefoodsmarket.com
  4. www.wholefoods.com
  5. www.whittakersearch.com
  6. www.whitlockpkg.com
  7. www.whitleyspeanut.com
  8. www.whitfieldfoods.com
  9. www.whiteysicecream.com
  10. www.whitewave.com
  11. www.whiterose.com
  12. www.whiterockdistilleries.net

Webforms, semantic tagging, REST API

March 16, 2006 on 10:53 pm | In semweb, wikicompany, features, tagging, rest | No Comments

I’ve been busy with several Wikicompany developments.

There is now an “add a company profile” web form. The old way to create a new article was a bit too complex for most users. So, if you have a company profile to share, please do so. There are still some usability issues (image uploads, AJAX suggestions for tag inputs) which I need to fix.

I’ve been pondering the use of tags instead of categories for some time. Wikicompany is now migrating to the Web2.0 world of tagging, but with a twist: the tags get a semantic annotation (and are then also RDF compliant!).

Reasons for this ‘switch’?

  • The category system is too strict and maintenance heavy.
    Listen to this great presentation “What Time Does to Categories” by Clay Shirky to understand the underlying reasons for this statement.
  • Easier input system for users.
  • Better search and browse possibilities.
  • Better REST web service integration, needed for internal and external content syndication.

For more details look here.
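To make the idea of semantically annotated tags a bit more concrete, here is a minimal sketch that turns tags into subject/predicate/object triples, the basic shape of RDF statements. The `predicate:value` tag syntax, the fallback predicate `tag`, and the names `Triple` and `parse_tags` are assumptions for illustration only, not the actual Wikicompany format.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_tags(company, tags, default_predicate="tag"):
    """Turn tags into triples; 'sector:food' becomes (company, sector, food)."""
    triples = []
    for tag in tags:
        predicate, sep, value = tag.partition(":")
        if not sep:
            # A plain tag without annotation gets the generic predicate.
            predicate, value = default_predicate, tag
        triples.append(Triple(company, predicate.strip(), value.strip()))
    return triples


triples = parse_tags("WhiteWave", ["sector:food", "organic"])
for triple in triples:
    print(triple)
```

With the predicate attached, a query such as "all companies whose sector is food" becomes a simple filter on the triples instead of a walk through a category tree.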

Finally, I implemented a true REST web service API for Wikicompany. The API supports RSS2.0/ATOM/JSON/HTML output, with the RSS output now supporting all profile fields. At some point an “HTTP PUT” interface may also be set up if there is a need. Some VNU examples:

Once the semantic tagging system is working, more complex queries can be answered.

Feedback is welcome via this blog, the mailing list, or by email to: infoATwikicompany.org

Mainpage cleaning & AJAX

February 20, 2006 on 2:01 am | In wikicompany, features, usability, ajax | No Comments

I’ve been re-styling the mainpage of Wikicompany to put more focus on important and interesting elements (see screenshot below).

Next up will be an AJAX-style interface for searching, publishing and browsing.

The search interface will be a simple company / sector / region input form. Later, an additional advanced search form will be created, with optional RSS/Atom output, to start using the semantic annotations.

The publishing interface will be written in the form of a Specialpage, with pseudo multi-page forms, auto-suggests, and more web2.0 ingredients.

The browsing interface is pretty much there already, but some things could probably be improved (at the cost of performance). A dynamic filtering interface for browsing the category tree would be cool.

Wikicompany Mainpage

Tagclouds

February 13, 2006 on 11:08 pm | In wikicompany, features, tagging | No Comments

I hacked up a tagcloud extension for Wikicompany … just to be more web2.0 compliant :-P. See the main page for an example.

The frequency of article visits is automatically extracted from the Apache webserver logs, after doing some log cleaning first. I was surprised to see the Asian company profiles being relatively popular.
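Counting visits from the logs and scaling them into tag sizes can be sketched like this. It is a rough illustration rather than the extension's code: the `/wiki/` article prefix, the font-size range, and the cleaning rule (keep only successful GET requests for article pages) are assumptions.

```python
import re
from collections import Counter

# Matches the request and status part of an Apache access log line.
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')


def count_visits(lines, prefix="/wiki/"):
    """Count successful article requests; other lines are dropped as 'cleaning'."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue
        path, status = match.group(1), match.group(2)
        if status == "200" and path.startswith(prefix):
            counts[path[len(prefix):]] += 1
    return counts


def font_sizes(counts, smallest=10, largest=24):
    """Scale visit counts linearly to font sizes for the tag cloud."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {
        name: smallest + (largest - smallest) * (n - low) // span
        for name, n in counts.items()
    }


log = [
    '1.2.3.4 - - [13/Feb/2006:23:08:00 +0100] "GET /wiki/Acme HTTP/1.1" 200 2326',
    '1.2.3.4 - - [13/Feb/2006:23:08:05 +0100] "GET /wiki/Acme HTTP/1.1" 200 2326',
    '5.6.7.8 - - [13/Feb/2006:23:09:00 +0100] "GET /wiki/Acme HTTP/1.0" 200 2326',
    '5.6.7.8 - - [13/Feb/2006:23:09:10 +0100] "GET /wiki/Beta HTTP/1.1" 200 1400',
    '5.6.7.8 - - [13/Feb/2006:23:09:20 +0100] "GET /wiki/Gone HTTP/1.1" 404 210',
    '5.6.7.8 - - [13/Feb/2006:23:09:30 +0100] "GET /images/logo.png HTTP/1.1" 200 900',
]
counts = count_visits(log)
sizes = font_sizes(counts)
print(sizes)
```

Real log cleaning would likely also filter out crawlers and repeated hits from the same visitor; that is left out here to keep the sketch short.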

There are probably some other good uses for tagclouds on Wikicompany, but, just to be clear, I also think those tag-everything-and-more web2.0-wannabe sites are really lame.

- Jama Poulsen
