May 16, 2006 on 2:56 pm | In sparql, semweb, wikicompany, mediawiki | No Comments

PS: I'm currently looking for interesting work in the semweb/web service/PHP/Perl/Linux/MediaWiki field. So if any employer out there is interested, I can send my resume. I currently live in the Netherlands, but would be willing to relocate if I really like the job and location. My ideal would be NZ ;-) , I much enjoyed my two-month adventure there a couple of years ago. Email: [email protected]

I've now managed to set up the SPARQL-powered REST interface for Wikicompany.


Booleans, as in "show biotech or healthcare companies operating in the US and Europe", are not parsed correctly yet.

Multi-value filters are also not working correctly yet (I suspect the ARC SPARQL parser does not handle '&&' in filters correctly). I've emailed the author of ARC about this, but perhaps I'll be able to fix it myself.
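Until that's fixed, there's a workaround: in SPARQL, multiple FILTER clauses in the same group pattern are implicitly conjunctive, so a '&&' FILTER can be rewritten as one FILTER per condition. A rough Python sketch of that rewrite (the function and query are my own illustration, not Wikicompany's actual code):

```python
def split_filter(conditions):
    """Rewrite a conjunctive '&&' FILTER into one FILTER clause per
    condition -- semantically equivalent in SPARQL, and a way to
    sidestep a parser that chokes on '&&'."""
    return "\n".join("  FILTER (%s)" % c for c in conditions)

# Hypothetical query: biotech companies in the US or Europe.
query = """SELECT ?company WHERE {
  ?company ?p ?tag .
%s
}""" % split_filter(['?tag = "biotech"', 'regex(?country, "US|Europe")'])

print(query)
```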

Relations (such as customers, competitors, etc.) will also soon be findable through the REST interface. Sorting is not fully implemented yet either.
Now I need to set up a nice form interface for querying Wikicompany (and perhaps other relevant RDF repos too!).

I like the current design of the REST interface: it's simple and clean, and allows for expressive queries (without any SPARQL knowledge).
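The appeal of such a layer is that a query becomes a plain URL path that gets translated into SPARQL behind the scenes. A minimal sketch of that translation, assuming a key/value path scheme and made-up predicate names (this is not Wikicompany's actual API):

```python
def rest_to_sparql(path):
    """Turn a REST-style path like 'tag/biotech/country/nl' into a
    SPARQL query. The 'ex:' predicates are placeholders for
    illustration only."""
    parts = path.strip("/").split("/")
    pairs = zip(parts[::2], parts[1::2])  # (key, value) pairs
    patterns = ['  ?company ex:%s "%s" .' % (k, v) for k, v in pairs]
    return "SELECT ?company WHERE {\n%s\n}" % "\n".join(patterns)

print(rest_to_sparql("tag/biotech/country/nl"))
```

The point is that the user only ever sees the URL on the left; the SPARQL on the right is generated.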

I see the semantic tags and relations as the first noise filter. At some point in the future, the results of these queries should also be filterable by full-text search.

An additional smart method would be to optionally do a full-text search for each unknown semantic tag.
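That fallback could look roughly like this: check each requested tag against the known vocabulary, and route the unknown ones to a full-text regex filter instead (a sketch; the vocabulary, predicate, and variable names are placeholders):

```python
KNOWN_TAGS = {"biotech", "healthcare", "software"}  # placeholder vocabulary

def build_filters(tags):
    """Known tags become exact semantic-tag matches; unknown tags
    fall back to a case-insensitive full-text regex on ?text."""
    clauses = []
    for tag in tags:
        if tag in KNOWN_TAGS:
            clauses.append('?company ex:tag "%s" .' % tag)
        else:
            clauses.append('FILTER regex(?text, "%s", "i")' % tag)
    return clauses

print(build_filters(["biotech", "nanofoo"]))
```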

MediaWiki is hot

May 14, 2006 on 2:25 am | In wikicompany, trends, wiki | No Comments

I've started implementing the second part of the REST interface for Wikicompany. When done, more complex semantic tag queries can be answered.


The returned output will be a list of URLs in RSS/Atom/JSON/RDF format, pointing to the company profiles.
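For the JSON flavour, the result list can be serialized directly; a minimal sketch (the field names are my own assumptions, not the actual output schema):

```python
import json

def results_to_json(urls):
    """Serialize a list of company-profile URLs as a JSON document."""
    return json.dumps({"results": [{"url": u} for u in urls]}, indent=2)

print(results_to_json(["http://wikicompany.org/wiki/Example_Inc"]))
```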

Comedy: making reality acceptable

May 11, 2006 on 12:29 am | In wikicompany, media, features, spider | No Comments

There's some great stand-up comedy being streamed onto the web by Cringe Humor NYC. Grab streamtuner and streamripper and have a laugh.

I wish streamtuner could also handle video, podcasts, skypecasts, and other 'live' streams from various sites, really decoupling the content from the web/RSS site interface with a simple and consistent browse/search/bookmark interface.

For Wikicompany I've been hacking at an intelligence-augmented web crawler which will create company profiles from URLs. It's pretty useful already.

The spider collects data from various sources, parses and mangles the data (including geocoding, auto-tagging, and logo handling), and creates a link to the Wikicompany publish form. From the publish form any manual changes can be made before the profile is included in Wikicompany. Parsing an existing Wikicompany profile back into the form is on the todo list.
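In outline the pipeline is fetch → parse → enrich → prefill the publish form. A rough sketch of that last step, building a prefilled form link from a scraped profile (the form URL and parameter names are guesses, not the real Wikicompany endpoint):

```python
from urllib.parse import urlencode  # Python 3 stdlib

def publish_form_link(profile):
    """Turn a scraped profile dict into a prefilled publish-form URL,
    so a human can review and correct the data before saving.
    The endpoint below is hypothetical."""
    base = "http://wikicompany.org/publish"
    return base + "?" + urlencode(sorted(profile.items()))

link = publish_form_link({"name": "Example Inc", "country": "NL"})
print(link)
```

Keeping the human review step in the middle is the design choice here: the spider proposes, the editor disposes.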

The spider is not perfect yet, but the results are very promising. I'm currently testing the algorithms on about 4000 company domains (mainly biotech companies).

I want to automate as much work as possible, but some things only a human can (currently) do, although good NLP software might be able to automate even more. Statistical approaches could also be interesting once more correct context data is known.

I also wrote a tool which can generate a company URL list from a list of company names, which really helps with collecting large lists of company URLs from various sources on the web.
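Such a name-to-URL tool can only be heuristic; a simple version strips legal suffixes, slugifies the name, and tries common TLDs (this is my own simplification of the idea, and candidates would still need verifying, e.g. with an HTTP request):

```python
import re

def guess_domains(name):
    """Guess candidate homepage URLs from a company name by stripping
    common legal suffixes and punctuation. A heuristic only -- the
    candidates are unverified guesses."""
    slug = re.sub(r"\b(inc|ltd|corp|bv|gmbh)\b", "", name.lower())
    slug = re.sub(r"[^a-z0-9]", "", slug)
    return ["http://www.%s.%s" % (slug, tld) for tld in ("com", "net")]

print(guess_domains("Example Inc."))
```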

Here's a small list of profiles which were gathered completely automatically:

