Free Software :: Free Culture & Archiving Planet
Free Culture projects:
Research links:
ToC
- EFF : Righthaven's Brand of Copyright Trolling
- EFF : EFF Asks Court to Protect Craigslist from Defamation Suit
- Linux Foundation : X Census (for 1.9)
- Public Knowledge : Public Knowledge Applauds FCC Proposals On Broadcast “White Spaces,” “E-Rate” Reform.
- BioMed OA : Join the data debate: draft position statement on open data
- Public Library of Science : PLoS Currents is expanding
- BioMed OA : sMRI – the most powerful Alzheimer’s disease biomarker?
- BioMed OA : BMC Research Notes – adding value to your data
- BioMed OA : Journal of Molecular Signaling welcomes new co-Editor-in-Chief
- Inside Google Book Search : The Armchair Traveler
- Linux Foundation : More GPL enforcement work again.. and a very surreal but important case
- Linux Foundation : The People Who Support Linux: At Work and at Home
- Information Aesthetics : US Open Tennis Real-Time Data Visualization
- Public Knowledge : Copps Displays FCC Leadership
- Public Library of Science : Announcing PLoS Blogs
- Wikimedia : September WMF Engineering Update
- Grassroots mapping : Oil contamination… from the Exxon Valdez
- Public Knowledge : Public Knowledge Expects ‘Prompt’ FCC Action To Protect Broadband Consumers
- Open Video Conference : VIDEO: a peek at our interview with Susan Crawford
- Google Research : Towards Energy-Proportional Datacenters
- Koha : 8 Weeks ’til KohaCon – Are you registered?
- Open Knowledge Foundation : The Power of Open Data
- Research Remix : Dear publisher, is the data open?
- Planet Linked Data : A New Methodology for Building Lightweight, Domain Ontologies
- Elphel : Initial OpenLayers mockup to display images
- Linux Foundation : Torvalds Causes Mob Scene at LinuxCon Brazil
- OStatus : OStatus 1.0 Draft 2 Available under OWFa
- Linux Foundation : Best practices in Open Source Governance at Open World Forum
- NLP : Online Learning Algorithms that Work Harder
- Open Video Conference : This Is Not a Hoax: The Yes Men at Open Video Conference
- Linux Foundation : Rise in use of EUPL for publishing open source software
- Linux Foundation : Free Open Source Academia Conference (fOSSa), November 8-10, Grenoble
- Linux Foundation : Novell Disappoints as Ownership Concerns Continue
- OpenSocial API : Eureka! Lockheed Martin contributes OpenSocial platform to open source
- Dublin Core Metadata : New Task Groups for revising the User Guide and reviewing the DCMI Abstract Model
- Dublin Core Metadata : NISO/DCMI Webinar slides published
- Tesseract : OCR of Screenshots
- EFF : Reading, Writing, and RFID Chips: A Scary Back-to-School Future in California
- Public Knowledge : The Intellectual Property Breakfast Club
- Wikimedia : Google Summer of Code conclusion
- Open Knowledge Foundation : Slides and notes from Data Driven Journalism event
- Public Knowledge : PK In the Know Podcast: Interview with WFMU's Ken Freedman
- Public Knowledge : Open Hardware Summit 2010
- Public Knowledge : The Digital Broadband Migration: The Dynamics of Disruptive Innovation
- Elphel : Elphel-Eyesis, assembled
- Science Commons : University Public Access Policy Whitepaper Part 2
- Open Video Conference : Announcing the Shared Film Festival at OVC
- Linux Foundation : Open Contact with Open Compliance Officers
- Linux Foundation : Software Freedom Law Center to Announce Opening of Branch in India
- Linux Foundation : Should Open Source Communities Avoid Contributor Agreements?
- Planet Linked Data : A Brief Survey of Ontology Development Methodologies
- Linux Foundation : Let a thousand flowers bloom...or be trampled under foot?
- EFF : Good News: Security Researcher Released on Bail
- Tesseract : Init() returning -1
- Koha : Book chapters proposals about Koha
- Tesseract : Tesseract Training Problem (under Mac)
- Mozilla Drumbeat : The reviews are in: Drumbeat’s “Popcorn” is tasty
- Tesseract : i want to add my own language
- NLP : Calibrating Reviews and Ratings
- Linux Foundation : Sun RPC is finally free software
- Mozilla Drumbeat : Mark Surman: 10 days of freedom in Barcelona
- Calibre : calibre 0.7.16
- EFF : Colbert's Word: Control-Self-Delete
- EFF : Facebook Should Stop Censoring Marijuana Legalization Campaign Ads
- OpenStreetView : OpenTrailView: Route making
- Mozilla Drumbeat : Drumbeat Festival: registration is now open!
- BioMed OA : Facilitating standardized genome annotations
- Public Knowledge : Public Knowledge Statement on GAO Cellular Industry Report
- Tesseract : math formulas
- BioMed OA : Arthritis Research & Therapy – published online only from 2011
- Global Text Project : Dr. Jim Feher, GTP textbook author, talks about working with Global Text Project to create and publish an open textbook
- EFF : Musopen Wants to Give Classical Music to the Public Domain
- EFF : EFF's Cindy Cohn Wins IP Vanguard Award from State Bar of California
- EFF : EFF Seeks to Help Righthaven Defendants
- if:book : open peer review
- WebM project : HTML5Rocks <video> tag tutorial
- Music Brainz : MusixMatch becomes our customer!
- Music Brainz : Track level Advanced Relationships for NGS
- Public Knowledge : Verizon Defense of Veroogle Plan Falls Short
- BioMed OA : Melatonin therapy effective in treating primary insomnia
- EFF : Jury Invalidates One of EFF's 'Most Wanted' Patents
- EFF : Steve Jobs Is Watching You: Apple Seeking to Patent Spyware
- EFF : UPDATED: Security Researcher Arrested for Refusing to Disclose Anonymous Source
- BioMed OA : Does genetic test allow prediction of patients’ response to tamoxifen?
- NLP : Finite State NLP with Unlabeled Data on Both Sides
- Planet Linked Data : Listing of 185 Ontology Building Tools
- Tesseract : Tess4J - a Java wrapper for Tesseract OCR DLL
- Public Knowledge : Appreciation: W. Adam Thomas, Public Knowledge Staff Attorney
- Open Knowledge Foundation : Beginnings of an Object Description Mapper
- Open Knowledge Foundation : Data.gov.uk releases CKAN Drupal Module
- NLP : Readers kill blogs?
- Tesseract : recognition languages sets? with hierarchy?
- Tesseract : Any idea of Tesseract 3.0 release date
- Wikimedia : Usability Improvements: Final Phase of Rollout
- Public Knowledge : ISPs Want to Have Their First Amendment Cake and Eat it Too
- Public Knowledge : The Incredible Shrinking FCC
- Open Knowledge Foundation : Data Journalism Meetup, Berlin, 1st September 2010
- Mozilla Drumbeat : Education for the open web fellowship: new deadline
- Public Knowledge : Why I'm Amused Rather Than Outraged Over New "Industry Negotiations" -- And What The Democrats Need To Understand
- Tesseract : Line of equals symbols not recognized
- Calibre : calibre 0.7.15
- if:book : hospice for publishers
- WebM project : WebM Semantic Video Demo
- WebM project : FFmpeg VP8 Decoder Implementation
- NLP : Multi-task learning: should our hypothesis classes be the same?
- Open Knowledge Foundation : Vote Raw Data Now at SXSW panelpicker - ends 27 August
- Tesseract : Which revision of tesseract 3.0 for win7 64bit
- Global Text Project : Frank W. Spencer PHD on GTP text Educational Psychology, a review
- OCRopus : Appending new ground truth to the default language model
- Mozilla Drumbeat : Mark Surman: Brett Gaylor joins Drumbeat team
- BioMed OA : Hereditary Angioedema: Management Consensus 2010 – a thematic series
- W3C Semantic Web : Public W3C Questionnaire on RDF Evolution
- Open Knowledge Foundation : Workshop on Open Bibliographic Data and the Public Domain
- VuFind : VuFind featured in Converge magazine
- Zotero : Zotero Basics: Getting Stuff Into Zotero
- Inside Google Book Search : Chocolate... in a nutshell!
- Inside Google Book Search : Happy Birthday, Emily Brontë!
- BioMed OA : BioMed Central to take on Nature in 10K charity run
- Dublin Core Metadata : Further details added to DC-2010 program
- Dublin Core Metadata : Presentation opportunities for DCMI Partners at DC-2010
- OStatus : OStatus interview with Tyler Gillies
- Open Knowledge Foundation : Gathering, Preserving and Reusing our Cultural Heritage - the OKFN Cultural Heritage Working Group.
- Open Text Book : P2PU
- Open Video Conference : Ethan Zuckerman of Berkman and Global Voices at OVC
- Mozilla Drumbeat : Open Video Alliance: Ethan Zuckerman of Berkman and Global Voices at OVC
- Open Knowledge Foundation : B-Open: Open Data from Bristol City Council
- Planet Linked Data : I Have Yet to Metadata I Didn’t Like
- Open Video Conference : Remixer Jonathan McIntosh at OVC
- Koha : Koha Newsletter: Volume 1/Issue 8: August 2010
- OpenStreetView : Minor update: panorama viewing now using canvas tag
- Elphel : Eyesis-in-Car GUI mockup
- Music Brainz : Please welcome our new Style Leader: Nikki
- VuFind : VuFind 1.0.1 Released
- Open Book Alliance : Betrayal Makes Strange Bedfellows
- Open Medicine : Open medicine is approved for MEDLINE indexing
- Open Knowledge Foundation : Open Government Data Camp 2010, 18-19th November 2010
- BioMed OA : Approaching technical hurdles to iPS technology
- Calibre : calibre 0.7.14
- Open Video Conference : Vincent Moon of La Blogotheque at Open Video Conference
- Mozilla Drumbeat : What can Mozilla Drumbeat learn from the Awesome Foundation?
- Open Medicine : What's new at Open Medicine? August 2010
- OpenStreetView : OpenTrailView: Improved photo upload and management
- Mozilla Drumbeat : Mark Surman: MozFdn July 2010 status update
- Mozilla Drumbeat : Mark Surman: Experiment: badges, identity and you
- Musopen : Free Textbook Project – Looking for Volunteers
- BioMed OA : Computer gamers solve medical problems
- BioMed OA : Sequencing of a tumor and its metastases
- OCRopus : How to use some features of Ocropus ?
- Open Book Alliance : Google and the Backroom
- Information Aesthetics : Infographic and Data Interface Videos: the Latest of the Greatest
- Planet Linked Data : DBpedia 3.5.1 available on Amazon EC2
- Open Video Conference : Jamie Wilkinson and Graffiti Markup Language at OVC
- Music Brainz : Care to take the Android app for a spin?
- Open Knowledge Foundation : Cataloguing Bibliographic Data with Natural Language and RDF
- WebM project : Easy Tricks for Finding WebM Videos in YouTube
- Planet Linked Data : An Executive Intro to Ontologies
- Elphel : SCINI Takes Elphel Under Antarctic Ice
- Wikimedia : Database errors on most Wikipedias
- AKSW Semantic Web : DL-Learner Build 2010-08-07 released
- OCRopus : OCRopus on CentOS 5
- BioMed OA : Eastern moles evolve different haemoglobin to facilitate fast tunnelling
- Calibre : calibre 0.7.13
- Open Video Conference : OVC volunteers meeting in New York
- OStatus : OStatus interview with Pablo Martin
- W3C Semantic Web : Drafts of RDFa Core 1.1 and XHTML+RDFa 1.1 Published
- Inside Google Book Search : Books of the world, stand up and be counted! All 129,864,880 of you.
- Information Aesthetics : Fata Morgana: The World without a Map
- Open Medicine : Female doctors & students from "Across the World Unite"
- Music Brainz : Downtime: We need to tidy house and vacuum our database
- Open Video Conference : Amelia Andersdotter, Piratpartiet MEP at OVC
September 03, 2010
Copyright trolls are nothing new, and Righthaven is just the latest group of lawyers to try to turn copyright litigation into a business model. What these lawyers have in common is that they seek to take advantage of copyright's draconian damages in order to bully Internet users into forking over money. To anyone who has watched the file-sharing lawsuits of the last few years or the current BitTorrent cases brought by a DC law firm, the Righthaven saga is developing into a familiar, unfortunate story. It also has some especially troubling twists.
The basic pattern: Righthaven has brought over a hundred lawsuits in Nevada federal court claiming copyright infringement. They find cases by (a) scouring the Internet for parts of newspaper stories posted online by individuals, nonprofits, and others, (b) buying the copyright to that particular newspaper story, and then (c) proceeding to sue the poster for copyright infringement. Like the RIAA and USCG before them, Righthaven is relying on the fact that their victims may face huge legal bills through crippling statutory damages and the prospect of paying Righthaven's legal fees if they lose the case. Consequently, many victims will settle with Righthaven for a few thousand dollars regardless of their innocence, their right to fair use, or other potential legal defenses.
However, Righthaven is unlike other copyright trolls in some key ways:
- Righthaven is going after bloggers using text news stories for comment or discussion. Many lawsuit targets are using the newspaper articles to augment discussions about current events. Reposting all or part of news stories is part and parcel of digital commentary and discussion, and usually the goal of the reposting is to share the uncopyrightable facts included in the article, not the copyrighted expression, such as the specific turns of phrase used by the author. By targeting news, Righthaven's lawsuits could have a chilling effect on individuals' attempts to engage their communities in free and open discussion.
- Righthaven is fighting the basic mode of Internet debate. Other copyright trolls have involved controversy over file-sharing programs and encoded digital media, like music and movies. But Righthaven is taking aim at folks who are using elementary "copy & paste" functionalities. Online discussion survives and thrives on showing others the original text before adding a commentary or response. Accurate quoting is a virtue of Internet discussion that can minimize mischaracterization and support progress in a debate.
- Righthaven lawsuits are demanding that courts freeze and transfer the defendants' domain names. Imagine if a single copyright infringement on Huffingtonpost.com or Redstate.com could result in forfeiture of the entire domain. Effectively asking for control of all of a website's existing and future content -- instead of only targeting the allegedly infringing material -- is an overreaching remedy for a single copyright infringement not validated by copyright law or any legal precedent. This also indicates that the attorneys are willing to make overreaching claims in order to scare defendants into a fast settlement.
- Righthaven goes straight for litigation. Righthaven isn't sending cease and desist letters or DMCA takedown notices that would allow the targeted bloggers or website operators to remove or amend only the news articles owned by Righthaven. Instead, Righthaven starts with a full-fledged lawsuit in federal court with no warning. It's sue first and ask questions later, which smacks of a strategy designed to churn up legal costs and intimidate defendants into paying up immediately, rather than a strategy aimed at remedying specific copyright infringements.
Righthaven is claiming that its activities are intended to have a "deterrent effect" on the reposting of news stories online, but it's hard to resist viewing Righthaven's actions as purely business-related. In addition to the sharp legal tactics discussed above, Righthaven appears to only buy copyrights that it believes can be used for lawsuits and otherwise has no involvement in the practice of journalism.
Righthaven also appears to be soliciting other newspapers to sign on with it. But newspaper publishers who think that suing bloggers a story at a time will save journalism are sorely mistaken. Newspaper publishers have actually been having meaningful discussions about innovative business models to support real journalism. Sadly, Righthaven -- if it continues to attract clients -- threatens to derail those conversations with a sideshow proven to distract from progress.
But no matter where a newspaper may stand on the debate about journalism's future, we think it is abundantly clear that a "sue the audience" tactic is nowhere near worth considering. Newspapers should resist the temptation to put themselves into the same position as the music industry circa 2004, where futile lawsuits distracted them from incorporating new technology and creating new ways to market product to fans.
EFF is watching Righthaven and other copyright trolls closely for overbroad tactics that hurt free speech and fair use, and abuse the legal system. We're looking for good cases to defend and will deliver more news and analysis as the issue develops.
September 03, 2010 12:38 AM
September 02, 2010
San Francisco - The Electronic Frontier Foundation (EFF) and a coalition of public interest groups and law professors have asked a California appeals court to protect craigslist from a lawsuit that could spur websites to be less helpful in responding to complaints about user behavior.
In Scott P. v. craigslist, Inc., the plaintiff complained about a series of craigslist ads he said were written by impersonators. While craigslist removed the ads within minutes of his phone calls, the plaintiff sued, contending that craigslist broke a promise to "take care of it" when the impersonators posted additional ads. In cases like these, federal law -- specifically Section 230 of the Communications Decency Act -- shields Internet forums like craigslist from liability. Section 230 was designed to encourage parties to pursue action against those who created the questionable content instead of the platform that hosted it. But the California Superior Court has ruled that this case can continue because of the plaintiff's allegations that craigslist said it would help.
Craigslist filed a writ petition with the Court of Appeal for the State of California Wednesday, arguing that the trial court should have dismissed the case because of Section 230's protections for forum hosts. In an amicus letter filed today in support of craigslist, EFF argues that the lower court reasoning could create a hole in Section 230, discouraging forum owners from helping users.
"Section 230 was a deliberate effort by Congress to encourage service providers to find innovative ways to self-regulate," said EFF Senior Staff Attorney Kurt Opsahl. "Yet craigslist is facing the prospect of extended litigation because it tried to do just that. Allowing this litigation to continue could result in websites being less helpful to users with complaints."
Additionally troublesome is the specter of further lawsuits, which could convince other Internet innovators not to host user content at all.
"Congress created Section 230 to allow for online interactivity without a flood of lawsuits. But this case could undermine the immunity that the law created," said Opsahl. "If litigation can survive merely because a plaintiff asserts that the site made a vague promise, sites may decide that allowing comments or user generated content is not worth the legal exposure. Then we'll lose the vibrant online environment that Section 230 helped create in the first place."
Joining EFF in the letter to court were the Center for Democracy and Technology, the Citizen Media Law Project, and law professors Eric Goldman, David S. Levine, David G. Post, and Jason Schultz. Separately, a group of Internet companies, including Yahoo!, Amazon, Facebook, Twitter, Google, and LinkedIn, filed another amicus brief in support of craigslist.
For the full amicus letter:
http://www.eff.org/files/filenode/craigslist_v_sup/EFFletter9210.pdf
For more on this case:
http://www.eff.org/cases/craigslist-v-superior-court-california
Contact:
Kurt Opsahl
Senior Staff Attorney
Electronic Frontier Foundation
kurt@eff.org
September 02, 2010 08:29 PM
Tiago Vignatti has published some numbers showing who contributes to X.org: "Of course lines of code and changeset are far from being a good metric to see actually how the development happened. But still, it does represents something. For sure, there’s also a lot of other inaccurate information that I’m missing from this all. For instance, companies like Collabora does X development but sometimes get the merits for Nokia. Is that fair? I don’t know."
September 02, 2010 08:23 PM
For Immediate Release:
September 2, 2010
The Federal Communications Commission (FCC) issued a Public Notice today detailing the proposed agenda for its next Public Meeting, scheduled for September 23, 2010. The following statement is attributed to Harold Feld, Legal Director, Public Knowledge:
“Today’s proposals represent real, concrete steps in fulfilling the promise of the National Broadband Plan. Voting final rules for the use of the broadcast white spaces will make much needed spectrum available for broadband. At a time when cell phone providers like AT&T are building wifi hot spots in places like Times Square to meet the demand created by the iPhone and other “smart” wireless devices, making use of empty television channels for ‘wifi on steroids’ will improve broadband access from the most crowded cities to rural America.
September 02, 2010 08:14 PM
BioMed Central supports the goals of the Panton Principles for Open Data in Science but putting them into practice needs to be done in careful consultation with the scientific community to ensure that researchers still receive appropriate credit for their contributions.
Rather than restricting access to data through restrictive licensing terms, cultural norms need to be defined for the assignment of credit, priority with respect to initial publication and the determination of reasonable embargo periods. Fields such as astronomy, economics and genomics have already made significant progress in this direction.
BioMed Central has drafted a position statement on data sharing, open data and licensing, and we invite the wider scientific community to join the discussion to help us define an explicit open data licensing policy going forwards.
The statement discusses what we see as “the Five Ws” for open data, which includes a proposal that, from a specific date, any author submitting to a BioMed Central journal would agree to dedicate the data elements of their article and supplementary material to the public domain and apply an open data conformant licence, such as Creative Commons CC0.
We invite the scientific and publishing community to join us in defining the optimum way to put the Panton Principles into practice. Comment publicly on the draft statement by using the comment function on this blog. Alternatively, contact us to get involved.
BioMed Central will also be discussing these issues as part of the panel discussion on Publishing primary research data at Science Online London on 3rd September 2010.
September 02, 2010 07:03 PM
Apart from the formation of neurofibrillary tangles and deposition of amyloid plaques, other hallmarks of Alzheimer’s disease (AD) include the loss of both neurones and synapses in the human brain. There is evidence to suggest that this neurodegeneration is closely associated with cognitive decline, which is why structural magnetic resonance imaging (sMRI), which measures brain morphometry, is considered to be a powerful AD biomarker.
In an important review published in Alzheimer’s Research & Therapy earlier this week, Vemuri and Jack neatly summarise the role of sMRI in AD. They compare sMRI to the other major AD biomarkers typically studied, discuss the ways in which information can be extracted from sMRI images to condense atrophy information from patients’ scans and highlight the different roles of sMRI as an AD biomarker, including its use in predicting the progression of mild cognitive impairment to Alzheimer’s disease, measuring the efficacy of therapeutics and screening in clinical trials.
sMRI is a stable biomarker of AD progression and is useful in measuring disease intensity; however, the authors stress that we should not rest on our laurels but continue to build on it, by developing automated techniques for extracting disease-specific information from images and by integrating it with other existing biomarkers for clinical use.
September 02, 2010 11:26 AM
Scientific data sharing is gathering more and more support in 2010, so rather than “why share data?” the question now is “how?”. Making data available in readily interpretable formats is vital to realising its value in driving new knowledge discovery, and BMC Research Notes today launches a new initiative aimed at promoting best practice in sharing and publishing data, with a focus on standardized, re-useable formats.
Across biology and medicine new data standards are emerging or are already in use, but many may not be enforced by journals or funding agencies, or benefit from established, structured databases for data deposition, such as ArrayExpress for microarray data. Adding value to data has always been at the core of BMC Research Notes’ strategy and the journal aims to produce guidance for authors on domain-specific data standards, to complement our figure preparation guidelines. But as the scientific community itself is best placed to advise on the most appropriate formats for data, the journal has opened this project up to the scientific community and is asking researchers and data managers for their contributions.
Integral to these educational Data Notes will be the inclusion of an example dataset as an additional file, or link to a permanently-available dataset, which can serve as a reference example. Readily re-usable data from a cancer cohort is also published in BMC Research Notes today in the article by Vickers and Cronin, which accompanies the editorial that outlines the goals of this data-driven collection.
Indeed, the future of scholarly communication and research increasingly depends on a commitment to data. Just yesterday in JAMA a commentary on the US Department of Health and Human Services' Open Government strategy discussed the benefits to science – and the economy – of public-use health data sets that maintain privacy. It further called for data to “be released in standardized formats, without intellectual property constraints.”
“Data is the underlying foundation of our science and it is crucial for both replicating results as well as building on them that we work harder at making data more effectively available and useable. It is great to see a pioneer of the Open Access literature like BMC providing leadership on the issue of making data openly available and providing the tools that will enable researchers to improve on current practice,” said Dr Cameron Neylon, co-author of the Panton Principles for Open Data in Science.
BioMed Central is waiving the article processing charge for contributions to this special collection of articles, which also extends to contributions on broader aspects of scientific data sharing, archiving, and open data. Contact the BMC Research Notes editorial team for more information or, if you are at tomorrow’s Science Online London, come and talk to us at the session on ‘Publishing primary research data’.
September 02, 2010 10:15 AM
Yung Hou Wong, Head of the Section of Biochemistry and Cell Biology, Division of Life Science, at the Hong Kong University of Science and Technology, has recently joined Journal of Molecular Signaling as co-Editor-in-Chief alongside Danny Dhanasekaran. Professor Wong is a leading expert in the molecular pharmacology of G protein-coupled receptors, signal transduction and integration.
Journal of Molecular Signaling was launched in 2006 and encompasses different molecular aspects of cell signaling underlying normal and pathological conditions. The focus of the journal is on the normal or aberrant molecular mechanisms involving receptors, G-proteins, kinases, phosphatases, and transcription factors in regulating cell proliferation, differentiation, apoptosis, and oncogenesis in mammalian cells. This area also covers the genetic and epigenetic changes that modulate the signaling properties of cells and the resultant physiological conditions. One of the journal's most highly accessed recent articles determines the molecular effect of sulforaphane (SFN, found in cruciferous vegetables) in growth arrest of pancreatic cancer cells.
We would like to welcome Yung Hou Wong to his new role with this growing journal. He says that “Journal of Molecular Signaling is a significant avenue for researchers in the area of cell signaling to share their discoveries and innovations, and contribute towards the advancement of the field. I am excited to be a part of the team and look forward to working with the editorial board to increase its impact as well as its value to the growing readership across the scientific community and around the world.”
September 02, 2010 10:05 AM
Posted by Cheryl Pon, Google Books Online Team
[Please note, some images in this post may not be available in full view to users outside of the United States.]
Now that it’s early September and we’re officially in the dog days of summer, what better way to spend this hot, sultry period than to take a refresher and travel to exotic lands afar? Even if you’re working through the summer or are more of a staycationer, you can take a trip around the world by exploring different countries through Google Books!
Courtesy of books scanned via our library project, anyone can stroll through China, experience ninety days' worth of Europe or get to know South America. And if you’re feeling a little fantastical, you can leave Kansas behind and head off with Dorothy to explore the land of Oz.
With the plethora of travel-related books available in full view on Google Books, you can explore the world and be visually enlightened with sights from afar from the comfort of your couch and a frosty glass of lemonade!
Check out the beautiful Flower Pagoda in Canton, China:

Swing by the Uffizi Gallery in Florence to admire the Birth of Venus in Italy and the Italians by Edward Hutton:

See London through Herbert Fry’s eighteen bird's-eye views of the principal streets, or be a Wanderer in Paris experiencing the lovely cafés, museums and walks down rue de l'hôtel de ville:

And while you're there, why not visit the Arc de Triomphe de l’Étoile?

If you’re more of a nature-lover, hitch up your wagon of books via My Library on Google Books and set off on the Oregon Trail and imagine wildflowers, horseback riding, and gorgeous sunsets on plains via first-hand experiences penned by Francis Parkman, or if you’re feeling really adventurous, literally "book" yourself an around-the-world experience by traveling alongside Jules Verne for Five Weeks in a Balloon. For an intellectual dose of scientific observations, you can travel from Chile to Argentina and back again with Charles Darwin's Voyage of the Beagle.
After you return from your incredible journeys, you can easily show other readers your virtual trip by sharing images you found interesting. Blog interesting images using our Share This Clip feature in Google Books, and share your bookshelf with family, friends, or the world!

September 02, 2010 09:49 AM
Harald Welte writes that he's doing more work on the gpl-violations.org project again: "Right now I'm facing what I'd consider the most outrageous case that I've been involved so far: A manufacturer of Linux-based embedded devices (no, I will not name the company) really has the guts to go in front of court and sue another company for modifying the firmware on those devices. More specifically, the only modifications to program code are on the GPL licensed parts of the software."
September 02, 2010 08:29 AM
Chase Crum is a U.S. Army veteran, a Shriner, an IT infrastructure manager, and a member of The Linux Foundation. This certainly does not capture all that defines Chase, but it begins to illustrate where he derives his ideas about Linux, community and giving back. Chase also represents a growing majority of systems administrators and IT managers who are using Linux both at work and at home.
September 02, 2010 08:00 AM

On the heels of the many real-time sports visualizations that appeared alongside the recent FIFA World Cup, the US Open Pointstream [usopen.org] presents an original 3D-like way of exploring the statistical data generated during all the live tennis matches of one of the most famous sports events in the world.
Users are able to select individual matches which occurred in the past or are still in progress. A "Momentum Meter" shows who is on top of the match, while a series of filters at the bottom (e.g. ace, double foult, netpoint, breakpoint, ...) allow for deeper analysis of the data. Visually, each player is distinguished by the color green or blue. Each ring represents a set, going from the inside to the outside. Each bar represents a point, with its height according to the serving speed.
Beautiful or useful?
September 02, 2010 12:56 AM
September 01, 2010
Federal Communications Commissioner Michael Copps has managed the art of saying much in a few words. His latest salvo came in a 245-word letter to the editor in the Washington Post, in which he not only savaged yet another misbegotten Washington Post editorial about Internet policy, but also took on the Verizon-Google joint policy “recommendation” and then noted the cruel reality of the agency to which he has devoted almost nine years of his professional career.
He, and others, recognize that this is a unique time in the history of the FCC, and perhaps of regulation and politics. It happens from time to time in Congress that a legislator will vote against a bill that he or she has introduced, usually after an amendment has been added that drastically changes the bill, or in the case of some shift in the political dynamic.
September 01, 2010 08:45 PM
The Wikimedia Foundation Engineering staff has grown quite a bit over the past year, which has made it a lot harder for everyone to keep track of what we’re all working on. In an effort to make things a little clearer, we plan to report monthly on all of our active efforts, and maintain information pages on all of our active projects. Note that this isn’t (yet) a complete list of everything that the Wikimedia Foundation engineering team is up to, but we plan to make this increasingly comprehensive and more organized as we get better at putting together these reports. Here is a full list of projects.
You’ll see that each of these areas has a program manager assigned to the area. That’s the person who is responsible for coordinating the activity in that area, and someone from whom you can expect to get more detailed updates. More below the fold…
Operations
Virginia Data Center – Setting up a world-class primary data center for Wikimedia Foundation properties.
- Status: We’re in the final selection phase for which facility will house our new primary data center in the Ashburn, Virginia area.
- Program manager: Danese
Media Storage – Re-vamping our media storage architecture to accommodate an expected increase in media uploads.
- Status: We currently use Solaris/ZFS as the file system for media storage. Due to the rollout of our media-related projects (see “Multimedia tools” below) which have the potential to increase the load on our media storage infrastructure, we’re currently evaluating whether we are going to stay on ZFS, as well as what sort of infrastructure we need to implement in concert with whatever file system implementation we choose. This project will try to look at the whole strategy to design / implement a solution that will scale sufficiently for the next couple of years of projected growth.
- Program manager: Danese
Monitoring – Enhancing both ops and public monitoring to a) notice potential outages sooner, b) increase transparency to the community, c) support progress tracking required in the 5-year plan.
- Status: We use Nagios for systems/load monitoring, but we haven’t taken the time to tune its alert throwing to be really useful to us. We need to increase tooling to better monitor performance metrics such as page load time in target markets (such as India).
- Program manager: Danese
Content Quality Tools
Article assessment – Working on a feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
- Status: We’re in the beginning phase of this project, figuring out requirements and generally determining the scope of our near-term and long-term efforts in this area. We are currently working on a pilot rating system which will be available as part of the Public Policy pilot program in late September.
- Program manager: Alolita
Pending changes enwiki trial – Pending Changes is a new review feature recently deployed to en.wikipedia.org, which allows changes made by anonymous and new users to be reviewed before they appear as the primary version of an article.
- Status: The official trial period has ended, with a straw poll now underway. Nimish Gautam and Devin Finzer put together some helpful statistics that we hope help everyone understand how the feature performs on a per-article basis. Howie has provided some additional analysis to help interpret the numbers. Chad Horohoe has done some work on diagnosing and fixing some lingering performance issues with the feature as his schedule allows. Aaron Schulz is helping out when and where he can as he tackles other obligations, generally advising the rest of us on many aspects of the system.
- Program manager: RobLa
Threaded discussions
Liquid Threads – LiquidThreads is an extension that brings threaded discussions capabilities to Wikimedia projects and MediaWiki.
- Status: Most of the back-end work is now complete. We are currently focusing on user experience improvements necessary for much wider deployment. Some of our latest design work can be found here: http://www.mediawiki.org/wiki/LiquidThreads/Redesign
- Program manager: Alolita
Multimedia tools
Upload wizard – The upload wizard is an extension for MediaWiki providing an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
Add media wizard – The Add-media wizard is a gadget to facilitate the insertion of media files into wiki pages. Its development is supported by Kaltura.
- Status: This tool was originally released as a gadget on a test server. It’s currently being adapted to run as an officially supported extension and be better integrated into MediaWiki. This effort will be assisted by the deployment of the Resource Loader (see below).
- Program manager: Alolita
MediaWiki Infrastructure
Resource loader – The resource loader aims to improve the load times for JavaScript and CSS components on any wiki page.
- Status: Trevor and Roan are busy implementing this feature, with hopeful completion sometime in the next month or so.
- Program manager: Alolita
Central Notice – CentralNotice is a banner system used for global messaging across Wikimedia projects.
- Status: We’re revamping the CentralNotice extension to make it easier to add, manage, and test new banners and campaigns. We’re also looking into including new functionality like geo-location and tightly coupling in analytics to improve our decision making. We’re not only looking to make this tool more usable for fundraising but also simple and broad enough to benefit the Wikimedia community as a whole.
Ryan Kaldari recently finished up our first phase of making input simpler. The interface has gotten a huge face lift and is quickly approaching the discussed mockups at http://meta.wikimedia.org/wiki/CentralNotice_upgrades . We’ve tried to focus on not overwhelming our users and have chosen to collapse, hide and/or remove certain components so that banner input is simpler.
In our second phase, we’ll be looking for volunteers to help test the new geo-location functionality. We’re also working on a better testing infrastructure for our CentralNotice banners.
- Program manager: Tomasz
Analytics Revamp – Incorporate an analytics solution that can grow and answer the questions that the Wikimedia movement has.
- Status: We are evaluating several possible analytics frameworks such as Open Web Analytics as a supplement or even replacement for our homegrown system(s), based on the recommendations from the Strategy task force. We plan to make a decision soon about the system we will use for (at least) this year’s fundraiser, but with an eye toward deploying a more generally useful system.
- Program managers: RobLa & Tomasz
Software Quality Infrastructure
Selenium deployment – Building an automated browser testing environment for MediaWiki.
- Status: Markus Glaser wrote the original set of tests using the Selenium web application testing framework. Ryan Lane has set up a cluster of machines dedicated to Selenium testing. Priyanka Dhanda and Ryan have been working to refine the requirements and generally get us to the point where we can build out a large suite of automated tests for MediaWiki, and Mark Hershberger is starting to figure out how to drive automated runs of Selenium and PHPUnit tests using CruiseControl. A small group has started to meet regularly to plan a more coordinated push in this area. Ping any one of us on IRC or the mailing list if you’re interested in chipping in!
- Program manager: RobLa
Fundraising
Fraud Prevention – This project will focus on integrating new fraud prevention schemes within our credit card donation pipeline.
- Status: We’ve wrapped up our pilot phase of this project, developing in-house solutions along with adopting industry standard practices to safeguard our donors, payment processors, and local systems. We’ve now incorporated many improvements to our credit processing pipeline.
- Program manager: Tomasz
CiviCRM Upgrade – Upgrading from our heavily customized CiviCRMv2 install to a mostly stock CiviCRMv3 install
- Status: Upgrade complete!
- Program manager: Tomasz
Misc
Google Summer of Code – Several projects from students funded by Google.
- Projects:
- Extension management platform (Student: Jeroen De Dauw, Mentor: Brion Vibber)
- Improve metadata support (Student: Brian Wolff, Mentor: Chad Horohoe)
- General RDF export/import in Semantic MediaWiki (Student: Samuel Lampa, Mentor: Denny Vrandecic)
- Javascript overhaul of Semantic MediaWiki (Student: Sanyam Goyal, Mentor: Yaron Koren)
- Wikisource Legal Tool (Student: Stephen LaPorte, Mentor: Ariel Glenn)
- Reasonably efficient interwiki template transclusion (Student: Peter Potrowl, Mentor: Roan Kattouw)
- Status: This has turned out to be a very successful year for us. Though not all projects were finished completely as specified, all were completed to a sufficient degree that we felt very comfortable passing all of the students. While there’s no guarantee that everything here will get beyond the proof-of-concept stage (though at least a couple already are), there’s a lot of promising work to look forward to.
- Program manager: RobLa
Process improvement – Increase transparency and generally organize Wikimedia Foundation’s engineering efforts more efficiently
- Status: We’re currently figuring out the general practices that don’t involve new tools (such as this blog post, and the wiki pages), as well as figuring out what tools will help us work best together with each other and the larger community. We’re also working out how our existing tools (such as Bugzilla) can be configured to make the order in which we plan to tackle tasks clear and obvious to anyone who wants to find out.
- Program manager: RobLa
If you read this far, thanks for sticking with us! We hope you found this useful. Please let us know what we can do to make this more useful for you.
September 01, 2010 06:35 PM
These saddening photos — taken in 2010 — show oil contamination in beach sediments around Prince William Sound, left over from the 1989 Exxon Valdez oil spill, over 20 years ago. Read more at Prince William Soundkeeper.
September 01, 2010 06:17 PM
For Immediate Release:
September 1, 2010
The Federal Communications Commission issued a public notice, putting out for public comment two elements in the policy suggestion from Verizon and Google. The following statement is attributed to Gigi B. Sohn, president and co-founder of Public Knowledge:
“Nothing in this public notice prevents the FCC from taking prompt action on its ‘Third Way’ proceeding, which would make certain all Americans have affordable access to broadband, and to make sure it can deal with public safety and other crucial issues that are broader than the narrow issues on which the Commission seeks comment.
“We expect the Commission will move quickly to set the legal framework for the FCC to oversee broadband Internet access services, with specific rules to protect the open Internet to follow soon after.
September 01, 2010 05:58 PM
Preview of our interview with Susan Crawford
Download link: [OGG] [MP4]
This week, a Wall Street Journal story on the proposed Comcast/NBCU merger brought concerns about media consolidation back to the fore. The U.S. Department of Justice is reportedly studying how the merger would affect the emerging internet video market.
Critics of the merger—including former Obama adviser and law professor Susan Crawford, a keynote speaker at this year’s Open Video Conference—say that the merger would hurt competition in the online video space.
Combined with anxieties about a shifting landscape for net neutrality, many are convinced that big changes are in store for the Internet as we know it—and by extension, the development of a rich online video medium that encourages user participation, creativity, and innovation.
We sat down with Ms. Crawford this week to hear her thoughts on the proposed merger, the FCC’s role in protecting net neutrality, and much more.
We’ll be releasing the 20-minute interview in three parts starting this week. It really captures the urgency that many are feeling about this critical time for the internet—a sense that we’re deciding new rules for the network and the web, and writing the next few years of media history.
If you are passionate about the future of the open web and open video, we invite you to join us this October 1 & 2 at the Open Video Conference in New York City. Please register today.
September 01, 2010 02:00 PM
Posted by Dennis Abts, Michael R. Marty, Philip M. Wells, Peter Klausler, and Hong Liu
This is part of the series highlighting some notable publications by Googlers.
At Google, we operate large datacenters containing clusters of servers, networking switches, and more. While this gear costs a lot of money, an increasingly important cost -- both in terms of dollars and environmental impact -- is the electricity that drives the computing clusters and the cooling infrastructure. Since our clusters often do not run at full utilization, Google recently put forth a call to industry and researchers to develop energy proportional computer systems. With such systems, the power consumed by our clusters would be directly proportional to utilization. Servers consume the most electricity, and therefore researchers have responded to Google’s call by focusing their attention towards servers. As the servers become increasingly energy proportional, however, the “always on” network fabric that connects servers together will consume an increasing fraction of datacenter power unless it too becomes energy proportional.
In a paper recently published at the International Symposium on Computer Architecture (ISCA), we push further towards the goal of energy-proportional computing by focusing on the energy usage of high-bandwidth, highly-scalable cluster networking fabrics. This research considers a broad set of architectural and technological solutions to optimize energy usage without sacrificing performance. First, we show how the Flattened Butterfly network topology uses less power since it uses fewer switching chips and fewer links than a comparable-performance network built using the more conventional Fat Tree topology. Second, our approach takes advantage of the observation that when network demand is low, we can reduce the speed at which links transmit data. We show via simulation that by tuning the speeds of the links very rapidly, we can reduce power consumption with little impact on performance. Finally, our research is a further call to action for the academic and industry research communities to make energy efficiency, and energy proportionality in particular, a first-class citizen in networking research. Put together, our proposed techniques can reduce energy cost for typical Google workloads seen in our production datacenters by millions of dollars!
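To make the idea of energy proportionality concrete, here is a minimal sketch that compares an "always on" device with an ideally proportional one. The linear power model, the function name `power_draw`, and the 100 W peak figure are all assumptions chosen for illustration; none of them come from the paper.

```python
def power_draw(utilization, peak_watts, idle_fraction):
    """Assumed linear power model: an idle floor plus a load-dependent part.

    idle_fraction=1.0 models an "always on" device that draws peak power
    regardless of load; idle_fraction=0.0 models an ideally
    energy-proportional device.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_watts = peak_watts * idle_fraction
    return idle_watts + (peak_watts - idle_watts) * utilization


PEAK_WATTS = 100.0  # hypothetical peak power of one switch

for utilization in (0.1, 0.5, 1.0):
    always_on = power_draw(utilization, PEAK_WATTS, idle_fraction=1.0)
    proportional = power_draw(utilization, PEAK_WATTS, idle_fraction=0.0)
    print(f"utilization={utilization:.0%} "
          f"always_on={always_on:.0f}W proportional={proportional:.0f}W")
```

Under this assumed model, the gap between the two curves is largest at low utilization, which is why clusters that rarely run at full load stand to gain the most.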

September 01, 2010 01:41 PM
This message came across the mailing list today and so I’m sharing it with all of you who might not be on our list:
———————-
Please forward this to lists or people who will be interested.
KohaCon10 starts on October 25th in Wellington, New Zealand. We have an exciting line up of speakers on a range of topics related to Koha and Open Source and Open Standards in libraries. See our programme for details.
http://www.kohacon10.org.nz/2010/program/
KohaCon10 is a free conference (that is right it will cost nothing for you to attend), but you still need to register to reserve your place.
Registrations from the international Koha community have been very strong. Over half of all available spaces are already taken.
If you have been holding off on the premise that you will have plenty of time to do this later, then please register now. Please do not rely on there being free spaces on the day.
Registration is quick and easy via the website. http://www.kohacon10.org.nz/2010/registration/
We look forward to seeing you in Wellington,
Russel Garlick
on behalf of the KohaCon10 Organising Committee
—
What is KohaCon10?
KohaCon is an opportunity for the entire Koha community, librarians and developers alike, to come together, meet each other, swap ideas and learn something new.
The conference is split into 2 parts.
The community conference will be held over 3 days – 25-27th of October. This is not just a developer’s conference. There will be presentations from librarians and developers alike.
The second part of the conference is the Hackfest for Koha developers that will be held from 29th-31st of October.
For more information see our website http://www.kohacon10.org.nz.
September 01, 2010 12:16 PM
The following guest post is from David Bollier, independent policy strategist, journalist, and author of Viral Spiral. It was originally posted at the On the Commons blog.
Science has always recognized the power of sharing in developing new knowledge. But in the search for treatments and cures for diseases like Alzheimer’s and Parkinson’s, the sprawling [...]
Related posts:
- The Medical Innovation Convention: A New Global Framework for Healthcare Research and Development
- Articles in CTWatch Quarterly
- On Getting Raw Data for Cancer Research
September 01, 2010 10:11 AM
Publishers make article text available under a variety of copyright terms. Data, however, are not copyrightable. So what are we allowed to do with them, these datums and datasets within and beside article text? It isn’t clear. Few publisher sites say. It matters. So let’s ask. On behalf of the Open Knowledge Foundation and benefitting [...]
September 01, 2010 05:13 AM
Bringing Ontology Development and Maintenance to the Mainstream
Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Ontologies provide a similar role for the organization of data that is provided by relational data schema. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data [1].
There are many ways to categorize ontologies. One dimension is between upper level and mid- and lower- (or domain-) level. Another is between reference or subject (domain) ontologies. Upper-level ontologies [2] tend to be encompassing, abstract and inclusive ways to split or organize all “things”. Reference ontologies tend to be cross-cutting such as ones that describe people and their interests (e.g., FOAF), reference subject concepts (e.g., UMBEL), bibliographies and citations (e.g., BIBO), projects (e.g., DOAP), simple knowledge structures (e.g., SKOS), social networks and activities (e.g., SIOC), and so forth.
The focus here is on domain ontologies, which are descriptions of particular subject or domain areas. Domain ontologies are the “world views” by which organizations, communities or enterprises describe the concepts in their domain, the relationships between those concepts, and the instances or individuals that are the actual things that populate that structure. Thus, domain ontologies are the basic bread-and-butter descriptive structures for real-world applications of ontologies.
According to Corcho et al. [3] “a domain ontology can be extracted from special purpose encyclopedias, dictionaries, nomenclatures, taxonomies, handbooks, scientific special languages (say, chemical formulas), specialized KBs, and from experts.” Another way of stating this is to say that a domain ontology — properly constructed — should also be a faithful representation of the language and relationships for those who interact with that domain. The form of the interaction can range from work to play to intellectual understanding or knowledge.
“… ontology engineering research should strive for a unified, lightweight and component-based methodological framework, principally targeted at domain experts ….” Simperl et al. [4]
Another focus here is on lightweight ontologies. These are typically defined as more hierarchical or classificatory in nature. Like their better-known cousins, taxonomies, but with greater connectedness, lightweight ontologies are often designed to represent subsumption or other relationships between concepts. They have relatively few predicates (relationships), and those they have are not too complicated. As relationships are added and the complexities of the world get further captured, ontologies migrate from the lightweight to the “heavyweight” end of the spectrum.
The development of ontologies goes by the names of ontology engineering or ontology building, and can also be investigated under the rubric of ontology learning. For reasons as stated below, we prefer not to use the term ontology engineering, since it tends to convey a priesthood or specialized expertise in order to define or use them. As indicated, we see ontologies as being (largely) developed and maintained by the users or practitioners within a given domain. The tools and methodologies to be employed need to be geared to these same democratic (small “d”) objectives.
A Review of Prior Methodologies
For the last twenty years there have been many methods put forward for how to develop ontologies. These methodological activities have diminished somewhat in recent years. Yet the research as separately discussed in Ontology Development Methodologies [1] seems to indicate this state of methodology development in the field:
- Very few uniquely different methods exist, and those that do are relatively older in nature
- The methods tend to either cluster into incremental, iterative ones or those more oriented to comprehensive approaches
- There is a general logical sharing of steps across most methodologies from assessment to deployment and testing and refinement
- Actual specifics and flowcharts are quite limited; with the exception of the UML-based systems, most appear not to meet enterprise standards
- The supporting toolsets are not discussed much, and most of the examples if at all are based solely on a single or governing tool. Tool integration and interoperability is almost non-existent in terms of the narratives, and
- Development methodologies do not appear to be an active area of recent research.
While there is by no means unanimity in this community, some general consensus can be seen from these prior reviews, especially those that concentrate on practical or enterprise ontologies. In terms of design objectives, this general consensus suggests that ontologies should be [4]:
- Collaborative
- Lightweight
- Domain-oriented (subject matter and expertise)
- Integrated, and
- Incremental.
These objectives are laudable, and we adhere to them ourselves, but current ontology development methods do not meet these criteria. Furthermore, as we will discuss in our next installment, there is also an inadequate slate of tools ready to support these objectives.
A Call for a New Methodology
If you ask most knowledgeable enterprise IT executives what they understand ontologies to mean and how they are to be built, you would likely hear that ontologies are expensive, complicated and difficult to build. Reactions such as these (and not trying to set up strawmen) are a reflection of both the lack of methods to achieve the consensual objectives above and the lack of tools to do so.
The use of ontology design patterns is one helpful approach [5]. Such patterns help indicate best design practice for particular use cases and relationship patterns. However, while such patterns should be part of a general methodology, they do not themselves constitute a methodology.
Also, as Structured Dynamics has argued for some time, the future of the semantic enterprise resides in ontology-driven apps [6]. Yet, for that vision to be realized, clearly both methods and tools to build ontologies must improve. In part this series is a reflection of our commitment to plug these gaps.
What we see at present for ontology development is a highly technical, overly engineered environment. Methodologies are only sparsely or generally documented. They are not lightweight nor collaborative nor really incremental. While many tools exist, they do not interoperate and are pitched mostly at the professional ontologist, not the domain user. In order to achieve the vision of ontology-driven apps the methods to develop the fulcrum of that vision — namely, the ontologies themselves — need much additional attention. An adaptive methodology for ontology development is well past due.
Design Criteria for an Adaptive Methodology
We can thus combine the results of prior surveys and recommendations with our own unique approach to adaptive ontologies in order to derive design criteria. We believe this adaptive approach should be:
- Lightweight and domain-oriented
- Contextual
- Coherent
- Incremental
- Re-use structure
- Separate the ABox and TBox (separate work), and
- Simpler, with interoperable tools designs.
We discuss each of these design criteria below.
While we agree with the advisability of collaboration as a design condition — and therefore also believe that tools to support this methodology must also accommodate group involvement — collaboration per se is not a design requirement. It is an implementation best practice.
Effective ontology development is as much as anything a matter of mindset. This mindset is grounded in leveraging what already exists, “paying as one benefits” through an incremental approach, and starting simple and adding complexity as understanding and experience are gained. Inherently this approach requires domain users to be the driving force in ongoing development with appropriate tools to support that emphasis. Ontologists and ontology engineering are important backstops, but not in the lead design or development roles. The net result of this mindset is to develop pragmatic ontologies that are understood — and used by — actual domain practitioners.
Lightweight and Domain-oriented
By definition the methodology should be lightweight and oriented to particular domains. Ontologies built for the pragmatic purposes of setting context and aiding interoperability tend to be lightweight with only a few predicates, such as isAbout, narrowerThan or broaderThan. But, if done properly, these lighter weight ontologies can be surprisingly powerful in discovering connections and relationships. Moreover, they are a logical and doable intermediate step on the path to more demanding semantic analysis.
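As a rough illustration of how far a few predicates can go, the sketch below stores a tiny ontology as subject-predicate-object triples and follows broaderThan links to find content about a concept or any of its narrower concepts. The concept names, the document identifier and both function names are hypothetical; only the predicate names isAbout and broaderThan come from the text above.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("animal", "broaderThan", "mammal"),
    ("mammal", "broaderThan", "dog"),
    ("dog", "broaderThan", "terrier"),
    ("doc-17", "isAbout", "terrier"),
]


def narrower_closure(triples, concept):
    """Return every concept reachable from `concept` via broaderThan links."""
    children = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "broaderThan":
            children[subject].add(obj)
    seen, stack = set(), [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def documents_about(triples, concept):
    """Find content tagged with `concept` or with any narrower concept."""
    topics = narrower_closure(triples, concept) | {concept}
    return sorted(s for s, p, o in triples if p == "isAbout" and o in topics)


print(documents_about(TRIPLES, "animal"))
```

Here a document tagged only with the narrow concept is still found when querying the broad one, which is the kind of discovered connection a single hierarchical predicate makes possible.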
Contextual
Context simply means there is a reference structure for guiding the assignment of what content ‘is about’ [7]. An ontology with proper context has a balanced and complete scope of the domain at hand. It generally uses fairly simple predicates; Structured Dynamics tends to use the UMBEL vocabulary for its predicates and class definitions, and to link to existing UMBEL concepts to help ensure interoperability [8]. A good gauge for whether the context is adequate is whether there are sufficient concept definitions to disambiguate common concepts in the domain.
Coherent
The essence of coherence is that it is a state of consistent connections, a logical framework for integrating diverse elements in an intelligent way. So while context supplies a reference structure, coherence means that the structure makes sense. With relation to a content graph, this means that the right connections (edges or predicates) have been drawn between the object nodes (or content) in the graph [9].
Relating content coherently itself demands a coherent framework. At the upper reference layer this begins with UMBEL, which itself is an extraction from the vetted and coherent Cyc common sense knowledge base. However, as domain specifics get added, these details, too, must be testable against a unified framework. Logic and coherence testing are thus an essential part of the ontology development methodology.
Incremental
Much value can be realized by starting small, being simple, and emphasizing the pragmatic. It is OK to make those connections that are doable and defensible today, while delaying until later the full scope of semantic complexities associated with complete data alignment.
An open world approach [10] provides the logical basis for incremental growth and adoption of ontologies. This is also in keeping with the continuous and incremental deployment model that Structured Dynamics has adopted from MIKE2.0 [11]. When this model is applied to the process of ontology development, the basic implementation increments appear as follows:
Figure 1. A Phased, Incremental Approach to Ontology Development
The first two phases are devoted to scoping and prototyping. Then, the remaining phases of creating a working ontology, testing it, maintaining it, and then revising and extending it are repeated over multiple increments. In this manner the deployment proceeds incrementally and only as learning occurs. Importantly, too, this approach also means that complexity, sophistication and scope grow only in step with demonstrable benefits.
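The phase structure just described, two one-time phases followed by a repeated cycle, can be sketched as a simple plan generator. The phase labels paraphrase the text, and the function name `plan` is a hypothetical choice for illustration only.

```python
# One-time phases, followed by phases repeated in every increment.
SETUP_PHASES = ["scope", "prototype"]
REPEATED_PHASES = [
    "create working ontology",
    "test",
    "maintain",
    "revise and extend",
]


def plan(increments):
    """Return the ordered list of phases for a given number of increments."""
    steps = list(SETUP_PHASES)
    for n in range(1, increments + 1):
        steps.extend(f"increment {n}: {phase}" for phase in REPEATED_PHASES)
    return steps


for step in plan(2):
    print(step)
```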
Re-use of Structure
Fundamental to the whole concept of coherence is the fact that domain experts and practitioners have been looking at the questions of relationships, structure, language and meaning for decades. Though perhaps today we now finally have a broadly useful data and logic model in RDF, the fact remains that massive time and effort have already been expended to codify some of these understandings in various ways and at various levels of completeness and scope.
These are prior investments in structure that would be silly to ignore. Yet, today, most methodologies do ignore these resources. This ignorance of prior investments in information relationships is perplexing. Though unquestioned adoption of legacy structure is inappropriate to modern interoperable systems, that fact is no excuse for re-inventing prior effort and discoveries, many of which are the result of laborious consensus building or negotiations.
The most productive methodologies for modern ontology building are therefore those that re-use and reconcile prior investments in structural knowledge, not ignore them. These existing assets take the form of already proven external ontologies and internal and industry structures and vocabularies.
Separation of the ABox and TBox
Nearly a year ago we undertook a major series on description logics [12], a key underpinning of the conceptual and logical foundation for Structured Dynamics’ ontology development. While we cannot always adhere to strict and conforming description logics designs, our four-part series helped provide guidance for the separation of concerns and work that can also lead to more effective ontology designs [13].
Conscious separation of the so-called ABox (assertions or instance records) and TBox (conceptual structure) in ontology design provides some compelling benefits:
- Easier ingest and incorporation of external instance data, including conversion from multiple formats and serializations
- Faster and more efficient inferencing and analysis and use of the conceptual structure (TBox)
- Easier federation and incorporation of distributed data stores (instance records), and
- Better segregation of specialized work to the ABox, TBox and specialty work modules, as this figure shows [14]:
Figure 2. Separation of the TBox and ABox [14]
Maintaining identity relations and disambiguation as separate components also has the advantage of enabling different methodologies or algorithms to be determined or swapped out as better methods become available. A low-fidelity service, for example, could be applied for quick or free uses, with more rigorous methods reserved for paid or batch mode analysis. Similarly, maintaining full-text search as a separate component means that work can be done by optimized search engines with built-in faceting.
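As a rough illustration of the ABox/TBox split described above, the following sketch keeps the conceptual structure and the instance records in two separate plain-Python structures and only joins them at query time. The class names, the records and the helper functions are all hypothetical; a real deployment would use an RDF store and a reasoner rather than dictionaries.

```python
# TBox: conceptual structure only (subclass axioms). All names are made up.
TBOX_SUBCLASS_OF = {
    "City": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "Person": "Agent",
}

# ABox: instance records with an asserted class and descriptive attributes.
ABOX = [
    {"id": "paris", "type": "City", "label": "Paris"},
    {"id": "ada", "type": "Person", "label": "Ada Lovelace"},
]


def superclasses(cls, tbox):
    """Walk the TBox upward and return every ancestor of cls, nearest first."""
    ancestors = []
    seen = {cls}
    while cls in tbox:
        cls = tbox[cls]
        if cls in seen:  # guard against accidental cycles in the hierarchy
            break
        seen.add(cls)
        ancestors.append(cls)
    return ancestors


def instances_of(cls, abox, tbox):
    """Return ids of instances typed as cls or as any subclass of cls."""
    return [
        record["id"]
        for record in abox
        if record["type"] == cls or cls in superclasses(record["type"], tbox)
    ]
```

Because the two structures never reference each other directly, new instance records can be loaded from other formats without touching the hierarchy, and the hierarchy can be revised without rewriting the records.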
Simple, Interoperable Tools Support
An essential design criterion is to have a methodology and work flow that explicitly accounts for simple and interoperable tools. By “simple” we mean targeted, task-specific tools and functionality that are also geared to domain users and practitioners.
Of all design areas, this one is perhaps the weakest in terms of current offerings. The next installment in this series [1] will address this topic directly.
The New Methodology
Armed with these criteria, we are now ready to present the new methodology. In summary terms, we can describe the steps in the methodology as:
- Scope, analyze, then leverage existing assets
- Prototype structure
- Pivot on the working ontology
- Test
- Use and maintain
- Extend working ontology and repeat.
Two Parallel Tracks
After the scoping and analysis phase, the effort is split into two tracks:
- Instances, and their descriptive characteristics, and
- Conceptual relationships, or ontologies.
This split conforms to the separation of ABox and TBox noted above [15]. There are conceptual and workflow parallels between entities and data versus ontologies. However, the specific methodologies differ, and we only focus on the conceptual ontology side in the discussion below, shown as the upper part (blue) of Figure 3:
Figure 3. Flowchart of Ontology Development Methodology [16] (click to expand)
Two key aspects of the initial effort are to properly scope the size and purpose of the starting prototype and to inventory the existing assets (structure and data; internal and external) available to the project.
Re-Use Structure
Most current ontology methodologies do not emphasize re-use of existing structure. Yet these resources are rich in content and meaning, and often represent years to decades of effort and expenditure in creation, assembly and consensus. Just a short list of these potential sources demonstrates the treasure trove of structure and vocabularies available for re-use: Web portals; databases; legacy schema; metadata; taxonomies; controlled vocabularies; ontologies; master data catalogs; industry standards; exchange formats, etc.
Metadata and available structure may have value no matter where or how it exists, and a fundamental aspect of the build methodology is to bring such candidate structure into a common tools environment for inspection and testing. Besides assembling and reviewing existing sources, those selected for re-use must be migrated and converted to proper ontological form (OWL in the case of those developed by Structured Dynamics). Some of these techniques have been demonstrated for prior patterns and schema [17]; in other instances various converters, RDFizers or scripts may need to be employed to effect the migration.
Many tools and options exist at this stage, even though as a formal step this conversion is often neglected.
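To make the conversion step concrete, here is a minimal sketch of one such script: it turns a flat parent/child taxonomy into SKOS concepts serialized as Turtle. The choice of SKOS as the target (rather than full OWL), the `http://example.org/vocab/` namespace and the sample rows are assumptions for illustration only; the thesaurus-to-SKOS methods cited in [17] are considerably more thorough.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # hypothetical namespace for the new terms

# (term id, label, parent id or None): a made-up legacy taxonomy
LEGACY_TAXONOMY = [
    ("vehicle", "Vehicle", None),
    ("car", "Car", "vehicle"),
    ("truck", "Truck", "vehicle"),
]


def taxonomy_to_turtle(rows):
    """Serialize taxonomy rows as SKOS concepts in Turtle syntax.

    Assumes ids are safe Turtle local names and labels contain no quotes.
    """
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix ex: <{BASE}> .",
        "",
    ]
    for term_id, label, parent in rows:
        lines.append(f"ex:{term_id} a skos:Concept ;")
        if parent is None:
            # Top concept: the label is the last statement for this subject.
            lines.append(f'    skos:prefLabel "{label}"@en .')
        else:
            lines.append(f'    skos:prefLabel "{label}"@en ;')
            lines.append(f"    skos:broader ex:{parent} .")
    return "\n".join(lines)
```

Calling `print(taxonomy_to_turtle(LEGACY_TAXONOMY))` produces a small file that can be loaded into a common tools environment for inspection alongside other candidate structures.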
Prototype Structure
The prototype structure is the first operating instance of the ontology. The creation of this initial structure follows quite closely the approach recommended in Ontology Development 101 [18], with some modifications to reflect current terminology:
- Determine the domain and scope of the ontology
- Consider reusing existing ontologies
- Enumerate important terms in the ontology
- Define the classes and the class hierarchy
- Define the properties of classes
- Create instances
The prototype structure is important since it communicates to the project sponsors the scope and basic operation of the starting structure. This stage often represents a decision point for proceeding; it may also trigger the next budgeting phase.
Link Reference Ontologies
An essential aspect of a build methodology is to re-use “standard” ontologies as much as possible. Core ontologies are Dublin Core, DC Terms, Event, FOAF, GeoNames, SKOS, Timeline, and UMBEL. These core ontologies have been chosen because of universality, quality, community support and other factors [19]. Though less universal, there are also a number of secondary ontologies, namely BIBO, DOAP, and SIOC that may fit within the current scope.
These are then supplemented with quality domain-specific ontologies, if such exist. Only then are new name spaces assigned for any newly generated ontology(ies).
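The ordering described here (reference ontologies first, new name spaces only as a last resort) can be sketched as a simple lookup. The namespace URIs and term names below are the published ones for DC Terms, FOAF and SKOS; the `http://example.org/domain#` namespace, the short term lists and the `resolve_term` helper are hypothetical.

```python
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# A tiny, hand-picked subset of terms from each reference vocabulary,
# listed in the order in which they should be preferred.
KNOWN_TERMS = [
    ("dcterms", {"title", "creator", "subject"}),
    ("foaf", {"name", "Person", "Organization"}),
    ("skos", {"prefLabel", "altLabel", "broader"}),
]

NEW_NAMESPACE = "http://example.org/domain#"  # hypothetical, minted last


def resolve_term(local_name):
    """Return a full URI, re-using a reference vocabulary when one fits."""
    for prefix, terms in KNOWN_TERMS:
        if local_name in terms:
            return NAMESPACES[prefix] + local_name
    # Nothing suitable exists, so the term goes into the new name space.
    return NEW_NAMESPACE + local_name
```

For example, `resolve_term("creator")` re-uses the DC Terms property, while a term with no counterpart in the listed vocabularies falls through to the project's own namespace.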
Working Ontology
The working ontology is the first production-grade (deployable) version of the ontology. It conforms to all of the ontology building best practices and needs to be complete enough such that it can be loaded and managed in a fully conforming ontology editor or IDE [20].
By using the OWL API, this working structure can also be the source for specialty tools and user maintenance functions, short of requiring a full-blown OWL editor. Many of these aspects are among the most poorly represented in the current tools inventory; we return to this topic in the next installment.
The working ontology is the complete, canonical form of the domain ontology(ies) [21]. These are the central structures that are the focus for ongoing maintenance and extension efforts over the ensuing phases. As such, the ontologies need to be managed by a version control system with comprehensive ontology and vocabulary management support and tools.
Testing and Mapping
As new ontologies are generated, they should be tested for coherence against various reasoning, inference and other natural language processing tools. Gap testing is also used to discover key holes or missing links within the resulting ontology graph structure. Coherence testing may result in discovering missing or incorrect axioms. Gap testing helps identify internal graph nodes needed to establish the integrity or connectivity of the concept graph.
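One simple form of gap testing can be sketched as a connectivity check: treat the concept relationships as an undirected graph and report any group of concepts that cannot be reached from the root. This is only an assumed, minimal interpretation of gap testing, with made-up concept names; it says nothing about coherence testing, which needs a reasoner.

```python
from collections import deque

# (concept, related concept) pairs; "Widget"/"Gadget" are deliberately orphaned.
EDGES = [
    ("Thing", "Place"),
    ("Place", "City"),
    ("Thing", "Agent"),
    ("Agent", "Person"),
    ("Widget", "Gadget"),
]


def connected_components(edges):
    """Group concepts into connected components using breadth-first search."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def find_gaps(edges, root):
    """Return the components that have no path to the root concept."""
    return [c for c in connected_components(edges) if root not in c]
```

Each component returned by `find_gaps(EDGES, "Thing")` points to a place where an internal linking node or relationship may be missing.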
Though used for different purposes, mapping and alignment tools may also work to identify logical and other inconsistencies in definitions or labels within the graph structure. Mapping and alignment is also important in its own right in order to establish the links that help promote ontology and information interoperability.
External knowledge bases can also play essential roles in testing and mapping. Two prominent knowledge base examples are Cyc and Wikipedia, but many others exist for any specific domain.
Use and Maintenance
Of course, the whole purpose of the development methodology is to create practical, working ontologies. Such uses include search, discovery, information federation, data interoperability, analysis and reasoning. The general purposes to which ontologies may be put are described in the Executive Intro to Ontologies [22].
However, it is also in day-to-day use of the ontology that many enhancements and improvements may be discovered. Examples include improved definitions of concepts; expansions of synonyms, aliases and jargon for concepts; better, more intuitive preferred labels; better means to disambiguate between competing meanings; missing connections or excessive connections; and splitting or consolidating of the underlying structure.
Today, such maintenance enhancements are most often not pursued because existing tools do not support such actions. IDEs and tools geared to ontology engineering are not well suited to letting users and practitioners note or effect such changes. Yet ongoing ontology use and adaptation clearly suggest that users should be encouraged to do so. They are the ones on the front lines of identifying and potentially recording such improvements.
Extend
Ontology development is a process, not a static destination or event. This observation makes intuitive sense since we understand ontologies to be a means to capture our understanding of our domains, which is itself constantly changing due to new observations and insights. This factor alone suggests that ontology development methodologies must therefore give explicit attention to extension.
But there is another reason for this attention. Incremental, adaptive ontologies are also explicitly designed to expand their scope and coverage, bite by bite as benefits prove themselves and justify that expansion. A start small and expand strategy is of course lower risk and more affordable. But, for it to be effective, it also must be designed explicitly for extension and expansion. Ontology growth thus occurs both from learning and discovery and from expanding scope.
Versioning, version control and documentation (see below) thus assume more central importance than a more static view would suggest. The use of feedback and the continuous improvement design based on MIKE2.0 are therefore also central tenets of our ontology development methodology.
Documentation
This perspective of the ontology as a way to capture the structure and relationships of a domain — which is also constantly changing and growing — carries over to the need to document the institutional memory and use of it. Both better tools — such as vocabulary management and versioning — and better work processes need to be instituted to properly capture and record use and applications of ontologies.
Some of these aspects are now handled with utilities such as OWLdoc or the TechWiki that Structured Dynamics has innovated to capture ontology knowledge bases on an ongoing basis. But these are still rudimentary steps that need to be enforced with management commitment and oversight.
One need merely begin to probe the ontology development literature to observe how sparse the pickings are. Very little information on methodologies, best practices, use cases, recipes, how-to manuals, conversion and use steps and other documentation really exists at present. It is unfortunately the case that documentation lags even the inadequate state of tools development in the ontology space.
Content Processing
Once formalized, these constructs — the structured ontologies or the named entity dictionaries as shown in Figure 3 — are then used for processing input content. That processing can range from conversion to direct information extraction. Once extracted, the structure may be injected (via RDFa or other means) back into raw Web pages. The concepts and entities that occur within these structures help inform various tagging systems [23]. The information can also be converted and exported in various forms for direct use or for incorporation in third-party systems.
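As a toy example of using these constructs to process input content, the sketch below tags a text against two separate dictionaries, one of concept labels and one of named entities. The dictionaries, the `ex:` identifiers and the `tag` function are hypothetical, and this is not the scones implementation mentioned in [23]; it only shows that the two kinds of structure can be applied independently to the same text.

```python
import re

# Hypothetical dictionaries derived from the ontology (concepts) and the
# named entity dictionary (instances). Keys are lowercase surface forms.
CONCEPT_LABELS = {"ontology": "ex:Ontology", "taxonomy": "ex:Taxonomy"}
ENTITY_NAMES = {"wikipedia": "ex:Wikipedia", "dublin core": "ex:DublinCore"}


def tag(text, dictionary):
    """Return (offset, matched text, identifier) for each dictionary hit.

    Matching is case-insensitive and on whole words only. Offsets assume
    that lowercasing does not change string length (true for ASCII text).
    """
    lowered = text.lower()
    hits = []
    for label, identifier in dictionary.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, lowered):
            surface = text[match.start():match.end()]
            hits.append((match.start(), surface, identifier))
    return sorted(hits)
```

Running `tag` once with `CONCEPT_LABELS` and once with `ENTITY_NAMES` yields two independent sets of matches that a later step could compare when disambiguating.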
Visualization systems and specialized widgets (see next) can be driven by the structure and results sets obtained from querying the ontology structure and retrieving its related instance data. While these purposes are somewhat beyond the direct needs of the ontology development methodology, the ontology structures themselves must be designed to support these functions.
Semantic Component Ontology
In our methodology we also provide for administrative ontologies whose purpose is to relate structural understandings of the underlying data and data types with applicable end-use and visualization tools (“widgets”). Thus the structural knowledge of the domain gets combined with an understanding of data types and what kinds of visualization or presentation widgets might be invoked. The phrase ontology-driven apps results from this design.
Amongst other utility ontologies, Structured Dynamics names its major tool-driver ontology the SCO (Semantic Component Ontology). The SCO works in intimate tandem with the domain ontologies, but is constructed and designed with quite different purposes. A description of the build methodology for the SCO (or its other complementary utility ontologies) is beyond the scope of this current document.
Tooling and Best Practices
As sprinkled throughout the above commentary, this methodology is also intimately related to tools and best practices. The next chapter in this series is devoted to tools, and this methodology will be archived on the TechWiki as the lightweight domain ontology methodology. Best practices will be handled in a similar way in the chapter after that one and in an ontology best practices document on the TechWiki.
Time for a Leap Forward in Methodology
Earlier reviews and the information in this document suggest a real need for ontology building methodologies that are integrated, easier to use, interoperate with a richer tools set and are geared to practitioners versus priests. The good news is that there are architectures and building blocks to achieve this vision. The bad news is that the first steps on this path are only now beginning.
The next two installments in this series add further detail for why it is time — and how — we can make a leap forward in methodology. Those critical remaining pieces are in tools and best practices.
[1] This posting is part of a current series on ontology development and tools. The series began with an update of my prior Ontology Tools listing, which now contains 185 tools. It continued with a survey of ontology development methodologies. The next part in this series will address a new architecture for tooling development. The last installment in the series is planned to cover ontology best practices. This same posting is permanently archived and updated on the OpenStructs TechWiki as Lightweight, Domain Ontologies Development Methodology.
[2] Examples of upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc and BFO (Basic Formal Ontology). Most of the content in their upper levels is more akin to broad, abstract relations or concepts (similar to the primary classes, for example, in a Roget’s Thesaurus; that is, real ontos stuff) than to “generic common knowledge.” Most all of them have both a hierarchical and networked structure, though their actual subject structure relating to concrete things is generally pretty weak. For a more detailed treatment of ontology classifications, see M. K. Bergman, 2007. “An Intrepid Guide to Ontologies,” AI3:::Adaptive Information blog, May 16, 2007.
[3] O. Corcho, M. Fernandez and A. Gomez-Perez, 2003. “Methodologies, Tools and Languages for Building Ontologies: Where is the Meeting Point?,” in Data & Knowledge Engineering 46, 2003. See http://www.dia.fi.upm.es/~ocorcho/documents/DKE2003_CorchoEtAl.pdf.
[4] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. “Ontology Engineering: A Reality Check,” in Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics ODBASE 2006, 2006. See http://ontocom.ag-nbi.de/docs/odbase2006.pdf.
[5] OntologyDesignPatterns.org is a semantic Web portal dedicated to ontology design patterns (ODPs). The portal was started under the NeOn project, which still partly supports its development.
[6] See M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, November 23, 2009.
[7] See M.K. Bergman, 2008. “The Semantics of Context,” AI3:::Adaptive Information blog, May 6, 2008.
[8] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.
[9] See M.K. Bergman, 2008. “When is Content Coherent?,” AI3:::Adaptive Information blog, July 25, 2008.
[10] See M.K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, December 21, 2009.
[11] MIKE2.0 (Method for Integrated Knowledge Environments) is an open source information development methodology championed by Bearing Point and Deloitte. Structured Dynamics has adopted the approach and has helped formulate MIKE2.0’s semantic enterprise offering. For a general intro to the approach, see further M.K. Bergman, 2010. “MIKE2.0: Open Source Information Development in the Enterprise,” AI3:::Adaptive Information blog, February 23, 2010.
[12] This is our working definition for description logics: “Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
[13] See the four-part description logics series from M. K. Bergman, 2009. “Making Linked Data Reasonable using Description Logics, Part 1,” AI3:::Adaptive Information blog, Feb. 11, 2009; “Making Linked Data Reasonable using Description Logics, Part 2,” AI3:::Adaptive Information blog, Feb. 15, 2009; “Making Linked Data Reasonable using Description Logics, Part 3,” AI3:::Adaptive Information blog, Feb. 18, 2009; and “Making Linked Data Reasonable using Description Logics, Part 4,” AI3:::Adaptive Information blog, Feb. 23, 2009.
[14] See Part 2 in [13].
[15] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) of which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure.
[16] The original version, now slightly modified, was first published in M. K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, Nov. 23, 2009.
[17] As some examples, see for instance: SKOS: Mark van Assem, Veronique Malais, Alistair Miles and Guus Schreiber, 2006. “A Method to Convert Thesauri to SKOS,” in The Semantic Web: Research and Applications (2006), pp. 95-109. See http://www.cs.vu.nl/~mark/papers/Assem06b.pdf for paper, also http://thesauri.cs.vu.nl/eswc06/ and http://thesauri.cs.vu.nl/; taxonomies: Fausto Giunchiglia, Maurizio Marchese and Ilya Zaihrayeu, 2006. “Encoding Classifications into Lightweight Ontologies,” presented at Proceedings of the 3rd European Semantic Web Conference (ESWC 2006), Budva. See http://www.science.unitn.it/~marchese/pdf/encoding%20classifications%20into%20lightweight%20ontologies_JoDS8.pdf; metadata: Mikael Nilsson, 2007. See http://mikaelnilsson.blogspot.com/2007/11/semanticizing-metadata-specifications.html; relational schema: see the W3C workgroup on RDB2RDF; and, of course, there are many others.
[18] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[19] The criteria considered in nominating an existing ontology to “core” status are that it should be general; highly used; universal; backed by broad committee or community support; well done and documented; and easily understood.
[20] Example and comprehensive ontology editing toolkits or IDEs (integrated development environments) include NeOn toolkit, Protégé, and TopBraid Composer. A complement to these larger toolkits is the OWL API, which when used can also provide a canonical management framework for specific ontology tools and tasks. This topic is covered more in the next installment regarding the tools landscape.
[21] Good ontology design, especially for larger projects, does require a degree of modularity. In an architecture of multiple ontologies, the ontologies often work together to isolate different work tasks so as to aid better ontology management. Ontology architecture and modularization is a separate topic in its own right.
[22] Originally published as M.K. Bergman, 2010. “An Executive Intro to Ontologies,” AI3:::Adaptive Information blog, August 9, 2010. This popular document has now been permanently archived on the OpenStructs TechWiki as Intro to Ontologies.
[23] Another reason for the clear distinction between ABox and TBox is their use to aid one another in disambiguation. Structured Dynamics’ scones approach (subject concepts or named entities) is designed expressly for this purpose. It is also possible to integrate these approaches with third-party tools (e.g., Calais, Expert System (Cogito), etc.) to improve unstructured content characterization. Via this approach we now can assess concept matches in addition to entity matches. This means we can triangulate between the two assessments to aid disambiguation. Because of logical segmentation, we have increased the informational power of our concept graph.
September 01, 2010 05:10 AM
Today I created a tiny bit of OpenLayers code for the Eyesis display page. It is basically a demo of what you can do by placing points of interest on a map, displaying the panorama and a smaller map.

I have not added a panorama player yet, but since it is only a matter of changing divs, that could be done easily. Personally I would like to go for an HTML5 kind of player, since for most browsers that would be the least resource-intensive way of displaying. The code is available at http://eyesis.openstreetphoto.org/ and there are some images there, but only low-res ones from the initial stitching tryouts.
September 01, 2010 12:03 AM
August 31, 2010
The Linux Foundation today kicked off its two-day debut of LinuxCon Brazil. Attendees got a rare opportunity to see both Linus Torvalds and Andrew Morton on stage, together, and in person.
August 31, 2010 08:40 PM
One of the important things for developing a specification is providing an explicit license. This gives third-party implementers the security to know that they're not walking into a patent or copyright minefield by implementing the specification. It's why the IETF and W3C require explicit descriptions of rights and licenses for all specs made by those bodies.
To make sure that implementers are aware that this spec is open to use and develop with, we've used the great Open Web Foundation Agreement 0.9 (OWFa) made available by the Open Web Foundation. It's an explicit copyright license and patent promise that's been carefully reviewed for use by open web specs like OStatus.
Not all of the technology that's collected in OStatus is currently under the OWFa, but some parts are: PubSubHubbub and Salmon were two of the specs specifically listed when the OWFa was introduced. Our discussions with other upstream spec developers on PoCo, Activity Streams and WebFinger suggest that they too will use the OWFa or something similar. So putting our application profile into the mix makes a lot of sense.
The new draft of the specification includes the license notification, and copies of the signed agreements will be made available on the OStatus site at http://ostatus.org/owfa/. (There's also one addition -- there was an identifier URI left out of draft 1 that we've now got added in!)
Thanks to the people at the OWF who made this great agreement. It's made it easy for us to give the right signal to the OStatus community.
August 31, 2010 08:04 PM
We'll have a one-day session about the governance of open source at Open World Forum to which everyone is invited. Open World Forum will take place in Paris on September 30th and October 1st. The governance session will be on the second day. The talks in the morning address topics related to the adoption of open source whereas the afternoon session focuses on governance issues and best practices.
August 31, 2010 06:19 PM
It seems to be a general goal in practical online learning algorithm development to have the updates be very, very simple. Perceptron is probably the simplest, and involves just a few adds. Winnow takes a few multiplies. MIRA takes a bit more, but still nothing hugely complicated. Same with stochastic gradient descent algorithms for, e.g., hinge loss.
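To see just how little work a perceptron update does, here is a minimal sketch of the standard mistake-driven update over sparse feature dictionaries, run as a single pass over a stream. The function names and the dictionary representation are my own choices for illustration.

```python
def perceptron_update(weights, features, label):
    """Apply one perceptron step; label is +1 or -1.

    Mutates weights in place and returns True if a mistake was made.
    """
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    if label * score <= 0:  # wrong side of (or exactly on) the boundary
        for name, value in features.items():
            # The whole update: one add per active feature.
            weights[name] = weights.get(name, 0.0) + label * value
        return True
    return False


def train_one_pass(stream):
    """Consume (features, label) pairs once; return weights and mistake count."""
    weights = {}
    mistakes = 0
    for features, label in stream:
        mistakes += perceptron_update(weights, features, label)
    return weights, mistakes
```

The per-example cost is one dot product plus, on a mistake, one add per active feature, which is why reading and featurizing the data can easily dominate the running time.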
I think this maybe used to make sense. I'm not sure that it makes sense any more. In particular, I would be happier with online algorithms that do more work per data point, but require only one pass over the data. There are really only two examples I know of: the StreamSVM work that my student Piyush did with me and Suresh, and the confidence-weighted work by Mark Dredze, Koby Crammer and Fernando Pereira (note that they maybe weren't trying to make a one-pass algorithm, but it does seem to work well in that setting).
Why do I feel this way?
Well, if you look even at standard classification tasks, you'll find that if you have a highly optimized, dual-threaded implementation of stochastic gradient descent, then your bottleneck becomes I/O, not learning. This is what John Langford observed in his Vowpal Wabbit implementation. He has to do multiple passes. He deals with the I/O bottleneck by creating an I/O friendly, proprietary version of the input file during the first pass, and then careening through it on subsequent passes.
In this case, basically what John is seeing is that I/O is too slow. Or, phrased differently, learning is too fast :). I never thought I'd say that, but I think it's true. Especially when you consider that just having two threads is a pretty low requirement these days, it would be nice to put 8 or 16 threads to good use.
But I think the problem is actually quite a bit more severe. You can tell this by realizing that the idealized world in which binary classifier algorithms usually get developed is, well, idealized. In particular, someone has already gone through the effort of computing all your features for you. Even running something simple like a tokenizer, stemmer and stop word remover over documents takes a non-negligible amount of time (to convince yourself: run it over Gigaword and see how long it takes!), easily much longer than a silly perceptron update.
So in the real world, you're probably going to be computing your features and learning on the fly. (Or at least that's what I always do.) In which case, if you have a few threads computing features and one thread learning, your learning thread is always going to be stalling, waiting for features.
One way to partially circumvent this is to do a variant of what John does: create a big scratch file as you go and write everything to this file on the first pass, so you can just read from it on subsequent passes. In fact, I believe this is what Ryan McDonald does in MSTParser (he can correct me in the comments if I'm wrong :P). I've never tried this myself because I am lazy. Plus, it adds unnecessary complexity to your code, requires you to chew up disk, and of course adds its own delays since you now have to be writing to disk (which gives you tons of seeks to go back to where you were reading from initially).
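For what it's worth, the scratch-file trick can be sketched in a few lines: compute features on the first pass while appending them to a cache, then just read the cache on later passes. This is my own toy version using pickle, not what Vowpal Wabbit or MSTParser actually do, and `extract_features` is a stand-in for whatever expensive preprocessing you really run.

```python
import os
import pickle


def extract_features(document):
    """Stand-in for expensive tokenizing/stemming: lowercase token counts."""
    counts = {}
    for token in document.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts


def examples(documents, labels, cache_path):
    """Yield (features, label) pairs, caching them on the first pass.

    If cache_path exists, the raw documents are ignored and examples are
    read back from disk. The first pass must be consumed to the end,
    otherwise the cache will be incomplete.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache:
            while True:
                try:
                    yield pickle.load(cache)
                except EOFError:
                    return
    else:
        with open(cache_path, "wb") as cache:
            for document, label in zip(documents, labels):
                example = (extract_features(document), label)
                pickle.dump(example, cache)
                yield example
```

Even in this form you can see the costs mentioned above: extra code, extra disk, and writes competing with reads during the first pass.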
A similar problem crops up in structured problems. Since you usually have to run inference to get a gradient, you end up spending way more time on your inference than your gradients. (This is similar to the problems you run into when trying to parallelize the structured perceptron.)
Anyway, at the end of the day, I would probably be happier with an online algorithm that spent a little more energy per-example and required fewer passes; I hope someone will invent one for me!
August 31, 2010 07:09 PM

The Yes Men Fix The World, the second film from the culture-jamming activist duo, will be the marquee feature in the Shared Film Festival at the Open Video Conference. After the screening, we’ll sit down for a panel including The Yes Men and their defense counsel, EFF’s Corynne McSherry.
The Yes Men raise awareness about social issues by tactically intervening in the mass media. Posing as executives of giant corporations, they lie their way into big conferences and TV appearances to expose—with surreal humor—the dark underbelly of multinational business. “It takes some nerve, not to mention diabolical intelligence… to pull off [these] pranks,” the New York Times wrote in its review of the film.
The film chronicles, among other episodes, the time Yes Man Andy Bichlbaum appeared on BBC World as a faux Dow Chemical spokesman to apologize for the Bhopal chemical disaster. After tricking a BBC producer into granting an interview, Bichlbaum read a lengthy “official statement” on live broadcast, offering reparations for the 120,000 affected victims. By the time the hoax was uncovered, Dow’s market cap had taken a $2 billion hit.
Because it is such a hot potato, The Yes Men have had a hard time securing traditional distribution deals for the movie. Though it has earned heaps of awards and critical accolades, it also chronicles costly and elaborate pranks against Halliburton, the WTO, Dow Chemical, and others, giving most distributors heartburn over the potential liability risks.
As a result, The Yes Men decided to freely distribute the film using P2P systems like BitTorrent. They’ve reached a massive audience, cost-free, and have even received tens of thousands of dollars in donations from fans and supporters.
The P2P edition of the film features special scenes of The Yes Men’s prank at the National Press Club, which resulted in a lawsuit being filed against them by the U.S. Chamber of Commerce.
Don’t miss the Yes Men, the Shared Film Festival, and the rest of the activities at this year’s Open Video Conference. Register today, and join us October 1 & 2 in New York City!
August 31, 2010 03:25 PM
OSOR has published an update on the adoption of the European Union Public Licence (EUPL): "A third of the projects available on the European Commission's software development site, the OSOR Forge, 47 out of 147 projects, are published using the EUPL. On Sourceforge, a commercial venture for open source software development based in the US, the licence is now selected by 49 projects."
August 31, 2010 01:36 PM
The goal of fOSSa (Free Open Source Academia Conference) is to reaffirm the underlying values of open source software: innovation and research in software development.
The second edition will focus on specific aspects we feel are key to a renovation of FOSS: Development, innovation & research, Community management and promotion, Public sector, Education.
November 8-10 2010
Grenoble, France
Web site: http://fossa2010.inrialpes.fr/
August 31, 2010 11:27 AM
Datamation reports that Novell fell short of its guidance for the third fiscal quarter of 2010: "For the quarter, Novell reported revenue of $199 million, a decline of 8 percent from the third quarter of 2009. The company reported net income of $16 million, or $0.04 per share, dipping from the $17 million Novell posted in the third quarter of 2009."
August 31, 2010 09:03 AM
Lockheed Martin Corporation recently announced the release of its first open source software initiative around social media called Eureka Streams. Eureka Streams is a social media platform that integrates activity streams and OpenSocial apps. Lockheed Martin has spent the past several years growing a strategy of Social Software within the Enterprise to bring widely distributed employees together. Eureka Streams takes that vision further by incorporating what works well on the internet and builds a platform based on open standards to expand social media even further.
Eureka Streams, initially built internally, is now being made available
under the Apache License as open source. Shindig version 1.1 (beta)
integration provides the framework to offer the OpenSocial 0.9
features, creating a user focused OpenSocial gadget container that can
access the user profiles and activity data created within the tool.
The UI has been developed using Google Web Toolkit to provide a
flexible JavaScript front end developed in Java.
Eureka Streams is currently released to open source at version
0.9. The team has placed a heavy focus on user interaction,
performance, and scalability to this point, but is shifting their
focus to the developer and looking for support from the open source
community.
For more information and to learn how to get started, please visit the Eureka Streams web site.
Posted on behalf of Steve Terlecki, Lockheed Martin Corp, by Mark Weitzel, President, OpenSocial Foundation
August 31, 2010 06:52 AM
August 30, 2010
2010-08-30, Two new DCMI Task Groups have been formed: the DCMI User Guide Task Group that will work on a revision of the popular but outdated document "Using Dublin Core" and the DCMI Abstract Model Review Task Group that will prepare a review of the DCMI Abstract Model, both for discussion at DC-2010 in October 2010. Discussion will take place on the DC-Glossary and DC-Architecture mailing lists, respectively. Participation by interested members of the Dublin Core community is welcomed and encouraged; please contact Tom Baker for further information.
August 30, 2010 11:59 PM
2010-08-30, The slides from the Joint NISO/DCMI Webinar "Dublin Core: The Road from Metadata Formats to Linked Data" held on 25 August 2010 are now available at the Metadata Training Resources page.
August 30, 2010 11:59 PM
I understand the resolutions of screenshots are typically inadequate
for OCR, but besides rescaling to a higher resolution, say, 300 DPI,
what other preprocessing operations may be needed on the images to
yield optimal OCR results?
Thanks.
August 30, 2010 10:46 PM
Scary news from California's Contra Costa County — school officials there have reportedly decided to track some preschoolers with RFID chips, thanks to a federal grant supplying the funding.
According to a story from the Associated Press, the students will wear a jersey at school that has the RFID tag attached. The tag will track the children's movements and collect other data, such as whether the child has eaten. According to a Contra Costa County official, this is a cost-saving move, as teachers used to have to keep track of each child's attendance and meal schedule manually.
But of course, an RFID chip allows for far more than that minimal record-keeping. Instead, it provides the potential for nearly constant monitoring of a child's physical location. If readings are taken often enough, you could create an extraordinarily detailed portrait of a child's school day — one that's easy to imagine being misused, particularly as the chips substitute for direct adult monitoring and judgment. If RFID records show a child moving around a lot, could she be tagged as hyper-active? If he doesn't move around a lot, could he get a reputation for laziness? How long will this data and the conclusions rightly or wrongly drawn from it be stored in these children's school records? Can parents opt-out of this invasive tracking? How many other federal grants are underwriting programs like these?
These are questions that desperately need answers. California is in the middle of a terrible budget crunch, but the solution is not federally funded surveillance of children who are too young to understand the implications.
August 30, 2010 07:27 PM
September 14, 2010 - 8:00am - 10:00am
The Role of the Obama Administration’s IP Enforcement Program
For the first time, a presidential administration has prioritized enforcement of intellectual property rights by appointing a high-level administration official charged with coordinating policy and enforcement. Join a wide-ranging discussion on how the Obama Administration is approaching international and domestic controversies surrounding intellectual property.
Click here for more information.
August 30, 2010 07:02 PM
This past week marked this year’s conclusion of Google Summer of Code. This has turned out to be a very successful year for us and we hope for the students as well. Here are this year’s projects:
- Extension management platform - Creating an awesome extension management platform for MediaWiki, facilitating the installation, updating, removal and configuration of extensions. Student: Jeroen De Dauw, Mentor: Brion Vibber
- Improve metadata support - Improve metadata support for uploaded media in MediaWiki by displaying embedded IPTC and XMP metadata. Student: Brian Wolff, Mentor: Chad Horohoe
- General RDF export/import in Semantic MediaWiki - Extend the import/export functionality of Semantic MediaWiki (SMW) to allow also full, general RDF import. Student: Samuel Lampa, Mentor: Denny Vrandecic.
- Javascript overhaul of Semantic MediaWiki – Improve and extend the Javascript for Semantic MediaWiki and some of its spinoff extensions, most notably Semantic Forms. Student: Sanyam Goyal, Mentor: Yaron Koren
- Wikisource Legal Tool - Creating a tool to format judicial decisions, legal scholarship, and statutes for Wikisource. Student: Stephen LaPorte Mentor: Ariel Glenn
- Reasonably efficient interwiki template transclusion – allow MediaWiki users to insert (transclude) templates from a wiki to another on Wikimedia Foundation (WMF) wikis (Wikipedia, Wikimedia Commons, etc.). Student: Peter Potrowl, Mentor: Roan Kattouw
More detailed information on all of these projects can be found on our GSoC 2010 projects page. Also, Wikipedia Signpost is highlighting this work over the coming weeks, starting with a summary of Brian Wolff’s XMP metadata project.
Though not all projects were finished completely as specified, all were completed to a sufficient degree that we felt very comfortable passing all of the students, and all of the students produced code we’re very happy to have. Note that there is no guarantee that anything here will get beyond the proof-of-concept stage. However, we’re hopeful that much of this work will find broader adoption, and we’re looking forward to that.
We hope that all of the students stick around as MediaWiki contributors long after the summer is over. Please join us in thanking them for their participation this year!
August 30, 2010 06:30 PM
Last week I attended the Data-driven journalism event in Amsterdam (which we blogged about here) run by the European Journalism Centre (who interviewed me here).
My slides from the event are now up here:
Open Data and Data Driven Journalism
Below are some lovely lofi graphical notes from Anna Lena Schiller:
It was [...]
Related posts:
- Data Driven Journalism, Amsterdam, 24th August 2010
- Data Journalism Meetup, Berlin, 1st September 2010
- Interview with European Journalism Centre on Data Driven Journalism
August 30, 2010 05:32 PM
A transcript is available here. You can download and listen to the audio by clicking here (MP3) or stream it using the player below:
Want to subscribe to our podcast? Click here for the MP3 feed and here for the mixed audio/video feed.
August 30, 2010 04:39 PM
Welcome
8:30 am Breakfast
9:30 am Welcome and Opening Notes
WHY DO OPEN HARDWARE?
10:00 am: Limor Fried, Adafruit
10:30 am: Gerald Coley, Texas Instruments & Beagle Board
11:00 am: Bruce Perens, founder: OSI
11:30 am: John Wilbanks, Creative Commons
12:00 pm: Institutional Sprint talks
• Amanda McDonald Crowley, EYEBEAM
• Jim Barkley & Sam Sayer, MITRE: “ARx: Almost-Ready-to-Anything”
• Rich Gibson, NASA
LUNCH
12:30 – 1:30 pm: Lunch (will be provided)
August 30, 2010 03:59 PM
February 13, 2011 (All day) - February 14, 2011 (All day)
The conference will begin with a tutorial overview of the evolution of the Internet, including recent disruptive developments. The first panel will put these developments in perspective by addressing such questions as: (1) what creates the necessary conditions for innovation in networked industries; (2) how those conditions can be cultivated; and (3) what conditions tend to smother rather than encourage innovation?
University of Colorado-Boulder
February 13, 2011 - February 14, 2011
Click here for more information.
August 30, 2010 03:55 PM

Elphel-Eyesis 1
On July 8, we had the first panoramic camera completely assembled and ready for its test ride. The total height is 1300 mm [4' 3"]; it weighs 10 kg, or about 22 lbs. Power consumption is 36 W when the camera is in operation, measured at the AC (110/220 VAC) input. The camera head has eight 5 Mpix color sensors around its circumference and one pointing up, for a full resolution of ~38 Mpix (45 Mpix before stitching). The data storage box (also waterproof) at the bottom of the leg contains three swappable 2.5″ hard drives of 500 GB each, which is enough to record up to 12 hours of images taken at 5 fps (the maximum frame rate) at full resolution. Each image is geotagged via an external GPS unit attached through the sealed USB connector.
The 8 high-resolution lenses are arranged very compactly (the distance between entrance pupils is 29.5 mm), which keeps parallax very small. The high-resolution fisheye lens points at the sky.
The camera head is 210 mm [8.3"] in diameter, is waterproof, and contains 3 Elphel 10353 processor boards and 3 Elphel 10369 extension boards, which provide IDE, SATA, USB, RS232, and other interfaces (only SATA, USB and sync I/Os are used in the Eyesis configuration). Nine sensor boards (10338D) are connected through three 10359A multiplexer boards that provide temporary storage for the images: all 3 sensors attached to the same 10359A board are triggered simultaneously, but data is transferred to the system boards one at a time.
The camera data storage box also contains the power supply for the camera and hard drives, a Gigabit Ethernet switch and a USB connector (IP68) for the GPS receiver. Dimensions of the box are 280 mm x 120 mm x 170 mm [11" x 4.7" x 6.7"].
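As a rough sanity check on the storage figures above, the per-frame data budget can be worked out from the stated drive capacity, recording time and frame rate. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes, that all drive space goes to image data, and that one "frame" means one capture from all nine sensors. None of these assumptions are stated in the post.

```python
# Back-of-the-envelope check of the quoted storage figures.
# Assumptions (not from the original post): 1 GB = 10**9 bytes, the drives
# hold nothing but images, and a frame is one capture from all nine sensors.

DRIVES = 3
DRIVE_BYTES = 500 * 10**9   # 500 GB per drive (stated)
HOURS = 12                  # stated maximum recording time
FPS = 5                     # stated maximum frame rate
SENSORS = 9                 # eight around the head plus one pointing up

total_bytes = DRIVES * DRIVE_BYTES
total_frames = HOURS * 3600 * FPS
bytes_per_frame = total_bytes / total_frames
bytes_per_sensor_image = bytes_per_frame / SENSORS
write_rate = bytes_per_frame * FPS  # sustained bytes per second

print(f"frames recorded:      {total_frames}")
print(f"per composite frame:  {bytes_per_frame / 1e6:.2f} MB")
print(f"per sensor image:     {bytes_per_sensor_image / 1e6:.2f} MB")
print(f"sustained write rate: {write_rate / 1e6:.1f} MB/s")
```

Under these assumptions the budget comes to well under 1 MB per 5 Mpix sensor image, which suggests the images are stored compressed, although the post does not say which format is used.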
Test ride images are coming soon.

August 30, 2010 03:17 PM
When we published Open Doors and Open Minds, we promised a companion piece that discusses in detail some of the legal considerations that university administrators and university general counsels may wish to consider in adopting a public access policy. I’m happy to say that this is now available. This excellent companion piece, providing a thorough [...]
August 30, 2010 02:10 PM
The Open Video Conference is already chock full of panels, talks, and workshops—exploring open technology, the future of mass media, and everything in between. Today we’re pleased to announce that on both days of the Open Video Conference, the discussion around shared culture and peer-to-peer distribution will continue into the evening with the Shared Film Festival.
The Shared Film Festival at OVC is a showcase for the emerging world of free-to-share films. We’re teaming with our friends at BitTorrent, hand-picking notable films from creators who are experimenting with alternative business models and distribution methods.
Each night following OVC, we’ll screen a short film, a feature length production, and then sit down to a discussion with the filmmakers, learning about the stories behind the films, their production experiences and business strategies. Can you make a living by giving it away?
The marquee feature at the Shared Film Festival is definitely something you won’t want to miss. Check back tomorrow to get a peek at the feature lineup!
The Shared Film Festival is for both creators and audiences, and it’s free to all attendees of the Open Video Conference.
August 30, 2010 01:34 PM
There’s nothing quite like having an urgent issue to pursue with a company – a real thorn in your side – and lacking a name or phone number to contact for follow-up. (Once upon a time, I reserved a domain name, customerfeedbackplace.com, intending to aggregate all the world’s corporate customer feedback sites in one place for consumer convenience. But that’s a story for another day.)
August 30, 2010 01:00 PM
The Software Freedom Law Center (SFLC) will announce the opening of its new international organization in India at the upcoming Software Patents and the Commons conference in New Delhi: "Under the direction of founder Mishi Choudhary, the SFLC's India organization will provide reliable advice to FLOSS developers about how to organize, license and protect the freedom of the software they make and distribute."
August 30, 2010 09:41 AM
Simon Phipps asks whether open source communities should avoid contributor agreements: "What are "contributor agreements", why do they exist, and are they a good thing? The need often arises from the interaction with open source of certain approaches to business. They serve a need of those approaches, but they can come at a significant cost to the health of the project."
August 30, 2010 09:37 AM
The Recent Pace of Ontology Development Appears to Have Waned
The development of ontologies goes by the names of ontology engineering or ontology building, and can also be investigated under the rubric of ontology learning. This paper summarizes key papers and links to this topic [18].
For the last twenty years there have been many methods put forward for how to develop ontologies. These methodological activities have actually diminished somewhat in recent years.
The main thrust of the papers listed herein is on domain ontologies, which model particular domains or topic areas. (As opposed to reference, upper or theoretical ontologies, which are more general or encompassing.) Also, little commentary is offered on any of the individual methodologies; please see the referenced papers for more details.
General Surveys
One of the first comprehensive surveys was done by Jones et al. in 1998 [1]. This study began to elucidate common stages and noted there are typically separate stages to produce first an informal description of the ontology and then its formal embodiment in an ontology language. The existence of these two descriptions is an important characteristic of many ontologies, with the informal description often carrying through to the formal description.
The next major survey was done by Corcho et al. in 2003 [2]. This built on the earlier Jones survey and added more recent methods. The survey also characterized the methods by tools and tool readiness.
More recently the work of Simperl and her colleagues has focused on empirical results of ontology costing and related topics. This series has been the richest source of methodology insight in recent years [3, 4, 5, 6]. More on this work is described below.
Though not a survey of methods, one of the more attainable descriptions of ontology building is Noy and McGuinness’ well-known Ontology Development 101 [7]. Also really helpful are Alan Rector’s various lecture slides on ontology building [8].
However, one general observation is that the pace of new methodology development seems to have waned in the past five years or so. This does not appear to be the result of an accepted methodology having emerged.
Some Specific Methodologies
Some of the leading methodologies, presented in rough order from the oldest to newest, are as follows:
- Cyc – this oldest of knowledge bases and ontologies has been mapped to many separate ontologies. See the separate document on the Cyc mapping methodology for an overview of this approach [9]
- TOVE (Toronto Virtual Enterprise) – a first-order logic approach to representing activities, states, time, resources, and cost in an enterprise integration architecture [10]
- IDEF5 (Integrated Definition for Ontology Description Capture Method) – is part of a broader set of methodologies developed by Knowledge Based Systems, Inc. [11]
- ONIONS (ONtologic Integration Of Naive Sources) – a set of methods especially geared to integrating multiple information sources [12], with a particular emphasis on domain ontologies
- COINS (COntext INterchange System) – a long-running series of efforts from MIT’s Sloan School of Management [13]
- METHONTOLOGY – one of the better known ontology building methodologies; however, not many known uses [14]
- OTK (On-To-Knowledge) – a methodology that came from the major EU effort at the beginning of last decade; it is a common sense approach reflected in many ways in other methodologies [15]
- UPON (Unified Process for ONtologies) – a UML-based approach that is based on use cases, and is incremental and iterative [16].
Please note that many individual projects also describe their specific methodologies; these are purposefully not included. In addition, Ensan and Du look at some specific ontology frameworks (e.g., PROMPT, OntoLearn, etc.) from a domain-specific perspective [17].
Some Flowcharts
Here is the general methodology as presented in the various Simperl et al. papers [c.f., Fig. 1 in 3]:

The Corcho et al. survey also presented a general view of the tools plus framework necessary for a complete ontology engineering environment [Fig. 4 from 2]:
There are more examples that show ontology development workflows. Here is one again from the Simperl et al. efforts [Fig. 2 in 5]:
However, what is most striking about the review of the literature is the paucity of methodology figures and the generality of those that do exist. From this basis, it is unclear what the degree of use is for real, actionable methods.
Best Practices Observations
The Simperl and Tempich paper [3], besides being a rich source of references, also provides some recommended best practices based on their comparative survey. These are:
General Recommendations
- Enforce dissemination, e.g., publish more best practices
- Define selection criteria for methodologies
- Define a unified methodology following a method engineering approach
- Support decision for the appropriate formality level given a specific use case
Process Recommendations
- Define selection criteria for different knowledge acquisition (KA) techniques
- Introduce process description for the application of different KA techniques
- Improve documentation of existing ontologies
- Improve ontology location facilities
- Build robust translators between formalisms
- Build modular ontologies
- Define metrics for ontology evaluation
- Offer user oriented process descriptions for ontology evaluation
Organizational Recommendations
- Provide ontology engineering activity descriptions using domain-specific terminology
- Improve consensus making process support
Technological Recommendations
- Provide tools to extract ontologies from structured data sources
- Build lightweight ontology engineering environments
- Improve the quality of tools for domain analysis, ontology evaluation, documentation
- Include methodological support in ontology editors
- Build tools supporting collaborative ontology engineering.
Summary of Observations
This review has not set out to characterize specific methodologies, nor their strengths and weaknesses. Yet the research seems to indicate this state of methodology development in the field:
- Very few discrete methods exist, and those that do are relatively older in nature
- The methods tend to cluster into either incremental, iterative approaches or more comprehensive ones
- There is a general logical sharing of steps across most methodologies from assessment to deployment and testing and refinement
- Actual specifics and flowcharts are quite limited; with the exception of the UML-based systems, most appear not to meet enterprise standards
- The supporting toolsets are not discussed much, and most of the examples are based solely on a governing tool. Tool integration and interoperability is almost non-existent in terms of the narratives
- This does not appear to be a very active area of current research.
[1] D.M. Jones, T.J.M. Bench-Capon and P.R.S. Visser, 1998.
“Methodologies for Ontology Development,” in
Proceedings of the IT and KNOWS Conference of the 15th IFIP World Computer Congress, 1998. See
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.2437&rep=rep1&type=pdf.
[2] O. Corcho, M. Fernandez and A. Gomez-Perez, 2003. “Methodologies, Tools and Languages for Building Ontologies: Where is the Meeting Point?,” in
Data & Knowledge Engineering 46, 2003. See
http://www.dia.fi.upm.es/~ocorcho/documents/DKE2003_CorchoEtAl.pdf.
[3] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. Ontology Engineering: A Reality Check, in
Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics ODBASE2006, 2006. See
http://citeseerx.ist.psu.edu/icons/pdf.gif;jsessionid=DE3414C0282C76F0EA787A06039941D2.
[4] Elena Paslaru Bontas Simperl, Christoph Tempich, and York Sure, 2006. “ONTOCOM: A Cost Estimation Model for Ontology Engineering,” presented at
ISWC 2006; see
http://ontocom.ag-nbi.de/docs/iswc2006.pdf.
[5] Elena Simperl, Christoph Tempich and Denny Vrandečić, 2008. “A Methodology for Ontology Learning,” in
Frontiers in Artificial Intelligence and Applications 167 from the
Proceedings of the 2008 Conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge, pp. 225-249, 2008. See
http://wtlab.um.ac.ir/parameters/wtlab/filemanager/resources/Ontology%20Learning/ONTOLOGY%20LEARNING%20AND%20POPULATION%20BRIDGING%20THE%20GAP%20BETWEEN%20TEXT%20AND%20KNOWLEDGE.pdf#page=241.
[6] Elena Simperl, Malgorzata Mochol and Tobias Burger, 2010. “Achieving Maturity: the State of Practice in Ontology Engineering in 2009,” in
International Journal of Computer Science and Applications, 7(1), pp. 45 – 65, 2010. See
http://www.tmrfindia.org/ijcsa/v7i13.pdf.
[7] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University
Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[8] See
http://www.cs.man.ac.uk/~rector/modules/CS646/Lecture-Handouts/Lect-2-Ontology-building-2007.pdf;
http://www.cs.man.ac.uk/~rector/modules/CS646/Lecture-Handouts/Lect-2-Ontology-building-2007.ppt; or
http://www.cs.man.ac.uk/~rector/modules/CS646/Lecture-Handouts/Ontology-bulding-2005-Lect-5.ppt.
[9] Stephen L. Reed and Douglas B. Lenat, 2002. Mapping Ontologies into Cyc, paper presented at
AAAI 2002 Conference Workshop on Ontologies For The Semantic Web, Edmonton, Canada, July 2002. See
http://www.cyc.com/doc/white_papers/mapping-ontologies-into-cyc_v31.pdf . Also, as presented by Doug Foxvog, Ontology Mapping with Cyc, at
WSMO, June 14, 2004; see
www.wsmo.org/wsml/papers/presentations/Ontology%20Mapping%20at%20Cycorp.ppt. Also, see Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock, 2007. “Autonomous Classification of Knowledge into an Ontology,” in
The 20th International FLAIRS Conference (FLAIRS), Key West, Florida, May 2007. See
http://www.cyc.com/doc/white_papers/FLAIRS07-AutoClassificationIntoAnOntology.pdf.
[10] M. Gruninger and M.S. Fox, 1994. “The Design and Evaluation of Ontologies for Enterprise Engineering”,
Workshop on Implemented Ontologies, European Conference on Artificial Intelligence 1994, Amsterdam, NL. See
http://stl.mie.utoronto.ca/publications/gruninger-onto-ecai94.pdf.
[11] KBSI, 1994. “The IDEF5 Ontology Description Capture Method Overview”,
Knowledge Based Systems, Inc. (KBSI) Report, Texas. The report describes the stages of: 1) organizing and scoping; 2) data collection; 3) data analysis; 4) initial ontology development; and 5) ontology refinement and validation. See
http://en.wikipedia.org/wiki/IDEF5.
[12] A. Gangemi, G. Steve and F. Giacomelli, 1996. “ONIONS: An Ontological Methodology for Taxonomic Knowledge Integration”,
ECAI-96 Workshop on Ontological Engineering, Budapest, August 13th. See
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3972&rep=rep1&type=pdf.
[13] The COINS approach was developed by Madnick
et al. over the past two decades or so at the MIT Sloan School of Management. See further
http://web.mit.edu/smadnick/www/wp/CISL-Sloan%20WP%20spreadsheet.htm for a listing of papers from this program; some are use cases, and some are architecture-related. For the most detailed treatment, see Aykut Firat, 2003.
Information Integration Using Contextual Knowledge and Ontology Merging, Ph.D. Thesis for the Sloan School of Management, MIT, 151 pp. See
http://www.mit.edu/~bgrosof/paps/phd-thesis-aykut-firat.pdf.
[14] M. Fernandez, A. Gomez-Perez and N. Juristo, 1997. “METHONTOLOGY: From Ontological Art Towards Ontological Engineering”,
AAAI-97 Spring Symposium on Ontological Engineering, Stanford University, March 24-26th, 1997.
[15] York Sure, Christoph Tempich and Denny Vrandecic , 2006. “Ontology Engineering Methodologies,” in
Semantic Web Technologies: Trends and Research in Ontology-based Systems, pp. 171-187, Wiley. The general phases of the approach are: 1) feasibility study; 2) kickoff; 3) refinement; 4) evaluation; and 5) application and evolution.
[16] A. De Nicola, M. Missikoff, R. Navigli, 2009.
“A Software Engineering Approach to Ontology Building”.
Information Systems, 34(2), Elsevier, 2009, pp. 258-275.
[17] Faezeh Ensan and Weichang Du, 2007. Towards Domain-Centric Ontology Development and Maintenance Frameworks; see
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.8915&rep=rep1&type=pdf.
[18] This document is permanently archived on the OpenStructs TechWiki. This document is part of a current series on ontology development and tools to be completed over the coming weeks.
August 30, 2010 05:53 AM
August 29, 2010
It's 4 a.m., dark outside, the phone rings, your mobile goes off, you're in a convention hotel an ocean away from home in a different time zone. The server's fallen over, you need to bounce it remotely from a thousand miles away. You have to take the server down and bring it back up then restart the application. Good job the hotel has a connection and you have a signal.
August 29, 2010 08:32 PM
Hari Prasad, the Indian security researcher arrested for allegedly stealing an electronic voting machine, has been released on bail.
Earlier this year, an anonymous source gave the machine to Prasad and a team of researchers, who discovered critical security flaws. Under questioning by authorities last weekend, Prasad refused to divulge the identity of the source who gave them the machine. He was then arrested and reportedly charged with theft and trespass on the theory that he stole the machine himself.
According to the Indian news agency PTI, the magistrate who released Prasad on bail noted that "no offence was disclosed with Hari Prasad's arrest and even if it was assumed that [the electronic voting machine] was stolen it appears that there was no dishonest intention on his part...he was trying to show how [electronic voting] machines can be tampered with."
The court reportedly also asked the Election Commission of India to confirm or disprove Prasad's claim that the country's electronic voting machines can be compromised. If Prasad's claims are false, action could be taken against him, the magistrate said.
August 29, 2010 01:00 AM
August 28, 2010
When using the following code:
tesseract::TessBaseAPI tess;
int result = tess.Init(argv[0], lang);
Init will return -1, indicating that something went wrong. I know the
tessdata is in the right location (if I move it I get an actual error
message), but I can't seem to figure out why Init() is not working
August 28, 2010 08:54 PM
Dear friends:
We are working on the development of a reference book on library automation and OPAC 2.0, entitled “Library Automation and OPAC 2.0: Information Access and Services in the 2.0 Landscape.” This book will be published by IGI Global in 2011. We believe that you may be interested in participating, so we encourage you to submit proposals about Koha developments and case studies in accordance with the requirements and the thematic areas set out in
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=10166&copyownerid=12444
or
http://igi-global.com/AuthorsEditors/AuthorEditorResources/CallForBookChapters/CallForChapterDetails.aspx?CallForContentId=4ae6e1c4-904b-4d83-8a4a-9ece2171fccb
Thanks for your attention,
Jesús
August 28, 2010 08:47 PM
Hi All,
Currently I am trying to use Tesseract(2.04) to recognize my own data,
with Mac OS X Snow Leopard.
I found this [link]
and I am trying to follow this tutorial.
My questions are:
1. I already have my train.tif ready, but I am not sure where I should
August 28, 2010 06:45 AM
August 27, 2010

Brett Gaylor (left) and the WebMadeMovies community released their first public demo of "popcorn" last week -- and the reviews are pretty sweet.
August 27, 2010 07:28 PM
how can i contribute to tesseract-ocr ? i wish to add my Bengali
language to the OCR? or does it already exist? If so then plz tell me
how to use that.
If you show me some way to train tesseract for Bengali then it would
be great
August 27, 2010 03:47 PM
NIPS decisions are going out soon, and then we're done with submitting and reviewing for a blessed few months. Except for journals, of course.
If you're not interested in paper reviews, but are interested in sentiment analysis, please skip the first two paragraphs :).
One thing that anyone who has ever area chaired, or probably even ever reviewed, has noticed is that different people have different "baseline" ratings. Conferences try to adjust for this, for instance NIPS defines their 1-10 rating scale as something like "8 = Top 50% of papers accepted to NIPS" or something like that. Even so, some people are just harsher than others in scoring, and it seems like the area chair's job to calibrate for this. (For instance, I know I tend to be fairly harsh -- I probably only give one 5 (out of 5) for every ten papers I review, and I probably give two or three 1s in the same size batch. I have friends who never give a one -- except in the case of something just being wrong -- and often give 5s. Perhaps I should be nicer; I know CS tends to be harder on itself than other fields.) As an aside, this is one reason why I'm generally in favor of fewer reviewers and more reviews per reviewer: it allows easier calibration.
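One simple way to picture this kind of calibration is to standardize each reviewer's scores against that reviewer's own mean and spread. The sketch below is only an illustration of that idea, not a description of what NIPS or any area chair actually does; the reviewer names and scores are invented.

```python
from statistics import mean, pstdev


def calibrate(scores_by_reviewer):
    """Turn each reviewer's raw scores into z-scores relative to that
    reviewer's own mean and standard deviation."""
    calibrated = {}
    for reviewer, scores in scores_by_reviewer.items():
        mu = mean(scores.values())
        sigma = pstdev(scores.values())
        calibrated[reviewer] = {
            # A reviewer who gives every paper the same score carries no signal.
            paper: 0.0 if sigma == 0 else (score - mu) / sigma
            for paper, score in scores.items()
        }
    return calibrated


# Hypothetical scores on a 1-5 scale (invented for illustration).
raw = {
    "harsh":    {"A": 1, "B": 2, "C": 3},
    "generous": {"A": 3, "B": 4, "C": 5},
}

z = calibrate(raw)
print(z["harsh"]["C"], z["generous"]["C"])
```

After standardizing, the harsh reviewer's 3 and the generous reviewer's 5 land on the same value. The estimate of each reviewer's baseline gets more reliable as that reviewer scores more papers, which is the calibration argument for more reviews per reviewer.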
There's also the issue of areas. Some areas simply seem to be harder to get papers into than others (which can lead to some gaming of the system). For instance, if I have a "new machine learning technique applied to parsing," do I want it reviewed by parsing people or machine learning people? How do you calibrate across areas, other than by some form of affirmative action for less-represented areas?
A similar phenomenon occurs in sentiment analysis, as was pointed out to me at ACL this year by Franz Och. The example he gives is very nice. If you go to TripAdvisor and look up The French Laundry, which is definitely one of the best restaurants in the U.S. (some people say the best), you'll see that it got 4.0/5.0 stars, and a 79% recommendation. On the other hand, if you look up In'N'Out Burger, an LA-based burger chain (which, having grown up in LA, was admittedly one of my favorite places to eat in high school, back when I ate stuff like that) you see another 4.0/5.0 stars and a 95% recommendation.
So now, we train a machine learning system to predict that the rating for The French Laundry is 79% and In'N'Out Burger is 95%. And we expect this to work?!
Probably the main issue here is calibrating for expectations. As a teacher, I've figured out quickly that managing student expectations is a big part of getting good teaching reviews. If you go to In'N'Out, and have expectations for a Big Mac, you'll be pleasantly surprised. If you go to The French Laundry with expectations of having a meal worth selling your soul, your children's souls, etc., for, then you'll probably be disappointed (though I can't really say: I've never been).
One way that a similar problem has been dealt with on Hotels.com is that they'll show you ratings for the hotel you're looking at, and statistics of ratings for other hotels within a 10 mile radius (or something). You could do something similar for restaurants, though distance probably isn't the right categorization: maybe price. For "$", In'N'Out is probably near the top, and for "$$$$" The French Laundry probably is.
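A minimal sketch of the price-tier idea, assuming restaurants are grouped by "$" category and compared only against their peers. The two named recommendation percentages are the ones quoted above; every other row, and the `percentile_within_tier` helper itself, is invented purely for illustration.

```python
from collections import defaultdict

def percentile_within_tier(restaurants):
    """For each restaurant, return the fraction of restaurants in the same
    price tier whose score is less than or equal to its own."""
    by_tier = defaultdict(list)
    for _name, tier, score in restaurants:
        by_tier[tier].append(score)
    result = {}
    for name, tier, score in restaurants:
        peers = by_tier[tier]
        result[name] = sum(1 for s in peers if s <= score) / len(peers)
    return result

# Only the two named scores come from the post; all other rows are hypothetical.
data = [
    ("In'N'Out Burger", "$", 95),
    ("hypothetical burger stand", "$", 70),
    ("hypothetical taco truck", "$", 82),
    ("The French Laundry", "$$$$", 79),
    ("hypothetical bistro", "$$$$", 60),
    ("hypothetical steakhouse", "$$$$", 72),
]
ranks = percentile_within_tier(data)
```

With this made-up peer data, both restaurants come out at the top of their own tier even though their raw percentages differ by 16 points. A within-tier rank like this is arguably a more sensible prediction target than the raw number, though it depends entirely on choosing the right grouping.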
(Anticipating comments, I don't think this is just an "aspect" issue. I don't care how bad your palate is, even just considering the "quality of food" aspect, Laundry has to trump In'N'Out by a large margin.)
I think the problem is that in all of these cases -- papers, restaurants, hotels -- and others (movies, books, etc.) there simply isn't a total order on the "quality" of the objects you're looking at. (For instance, as soon as a book becomes a best seller, or is advocated by Oprah, I am probably less likely to read it.) There is maybe a situation-dependent order, and the distance to hotel, or "$" rating, or area classes are heuristics for describing this "situation." But without knowing the situation, or having a way to approximate it, I worry that we might be entering a garbage-in-garbage-out scenario here.
August 27, 2010 12:14 PM
Tom Callaway reports that all of the Sun RPC code (which is part of glibc) has finally been relicensed under a free software license: "So, we restarted the effort with Oracle, and on August 18, 2010, Wim Coekaerts, on behalf of Oracle America, gave permission for the remaining files that we knew about under the Sun RPC license (netkit-rusers, krb5, and glibc) to be relicensed under the 3 clause BSD license."
August 27, 2010 09:24 AM
I just had a fun breakfast with Simona Levi from ExGAE / oXcars. What I learned: Learning, Freedom and the Web isn’t the only interesting thing happening in Barcelona two months from now. There are at least seven open internet / open education / free culture events happening over the span of 10 days.

Between October 28 and November 6, Barcelona will host: the 2000 person oXcars free culture festival; the Free Culture Forum; the P2PU summit; Open Education 2010; Drumbeat Learning, Freedom and the Web; an open ed play day in the Raval; and possibly a Communia meeting. Phew.

We should find a way to shout and promote all of this. Barcelona will be the global epicentre of free culture / open education / open web stuff for 10 days this fall! We need a phrase or a name for it. ‘10 days of freedom’? ‘Barcelona abierto’? Not sure, but Simona and I agreed to call out for suggestions. If you have ideas, post them below.
PS. goes w/o saying -> book an extended trip to Barcelona if you can.
Filed under: drumbeat, education, mozilla, open, openeverything

August 27, 2010 09:04 AM
New Features
- Driver for the Kindle 3
- Users can now customize what actions appear in the toolbar and context menus via Preferences->Interface->Toolbars
- Draw a thin border around the cover in the edit metadata dialog.
- Create (almost) all temporary files in a subdirectory so as not to clutter up temp directory
- FB2 Output: Add option to try to generate FB2 sections from the TOC. This may or may not work, depending on the file, so use with care.
- Add an option to remove all tags from selected books in the bulk metadata editor.
- Add a tweak to control how the dates in the Date column are formatted.
Bug Fixes
- Fix regression in 0.7.15 that broke the Similar books action and the add books to library from device action
- Add ZIP and RAR to the input format order preferences.
- Update podofo in all binary builds to 0.8.2. Should fix bug where setting metadata in some PDF files would cause file truncation.
- Add/remove header wizard: When running on PDF input, replace non breaking spaces with normal spaces, since it is hard to write regexps to match non breaking spaces with the regex builder wizard.
- Fix crash if user tries to switch libraries while a device is being detected
- Title sort now ignores leading quote character. Only applies to newly added books.
- Conversion pipeline: Don't fail if parsing extra css raises an exception. Instead just ignore it.
- SONY driver: Use the tz field (available in newer readers) to set timestamps correctly, when available.
- Shortening file paths: Handle the case of very long filenames with periods in them.
August 27, 2010 06:00 AM
Just a few weeks after his interview with EFF Legal Director Cindy Cohn, American hero Stephen Colbert has returned to the subject of digital rights. And in his show on Tuesday, he came up with a great solution to the problem of privacy and online social networks: Control-Self-Delete.
As Colbert suggests, the CEOs of Google and Facebook can be astonishingly tone deaf when it comes to the question of the privacy of their customers. As these experts in social media ought to know, the fact that a person chooses to share some information about themselves online is no indication that they prefer to share everything, nor does it indicate that control of personal data is not something they care deeply about.
Study after study has shown the opposite to be true: users care about privacy, and demand control of their own data.
We like Colbert's basic point, saved for the end of this clip: if anyone should change their behavior to address the problem of online privacy, it isn't young people who have uploaded some racy pics — it's the companies that have made themselves the guardians of our personal data.
August 27, 2010 12:47 AM
August 26, 2010
Facebook is facing down another embarrassing episode of censorship this week after refusing to show ads submitted by the Just Say Now marijuana legalization campaign. The gag is an important reminder that social networks like Facebook — while useful, interesting, and pretty — are "walled gardens" with overseers whose interests can overwrite free speech, open communication, and in this case, essential political debate. (In this they have something in common with Apple.)
Most recently, Facebook was caught censoring mentions of Power.com, an online tool designed to help users collect their information from Facebook to facilitate migration to other social networks. To this day, users are still blocked from sending messages or posting status updates containing the word "Power.com," preventing users from spreading the word about a convenient way to "make the move" to Orkut, or LinkedIn, or any other social networking service that may crop up to compete. The block even stopped law professor Eric Goldman from commenting on Facebook’s lawsuit against Power.com (Disclosure: EFF filed an amicus brief in support of Power in that case).
Facebook's censorship for anticompetitive reasons is petty and lame to be sure, but silencing Just Say Now's marijuana legalization ad campaign is even worse. Voters in various districts nationwide will have to make important political decisions about marijuana this year (California's Proposition 19 is one example). Facebook's decision, reportedly an attempt to be consistent with its ad policies restricting smoking and/or marijuana-related content, is instead primarily silencing an important, motivated voice in a politically significant debate.
Facebook should lift the ban and show Just Say Now's political ads. For better or worse, Facebook has become an important means of communication and organization for candidates and political campaigns. In this role, Facebook functions best as a neutral platform, hosting the debate without entering it. Whether or not Facebook wants to restrict depictions of smoking in commercial ads, it should not prohibit the open and robust political debate central to the value and promise of the Internet.
August 26, 2010 11:08 PM
A significant update since my last post: you can now create and modify routes on OTV’s main page. As you’re probably aware, the key thing about OTV is to allow contributors to connect photos together, to make a route of interlinked photos which end-users will be able to walk along to create a StreetView like experience. The essentials of this are now done – you can create a new route (select “New route” on the main map page) by connecting together existing photos, and you can also add photos to an existing route by selecting “Move” on the main page and dragging the chosen photo onto a route. More information on the Howto page.
So do have a go contributing some of your own photos and making routes. Being in development, the odd bug could well come up so do let me know if you’re having problems.
As already suggested, next thing will be to start working on a prototype viewer for end-users though a range of other things like work commitments, moving house and a holiday are going to be occupying most of my time for the next three weeks or so, so it *may* be some time before the next update. But don’t go away, in the autumn and winter months I’ll hopefully be doing a fair bit of OTV development!
August 26, 2010 10:22 PM
Registration for the 2010 Mozilla Drumbeat Festival is now open! Join teachers, learners and technologists from around the world November 3 – 5 in Barcelona to teach, hack, shape and invent the future of education and the web.
read more
August 26, 2010 06:06 PM

Faster and more reliable genome sequencing has meant that the number of personal genome sequences available is increasing rapidly, yet the analysis of personal human genome sequences has been hampered by the lack of a standard file format to facilitate comparative analyses. In this month’s issue of Genome Biology, Karen Eilbeck and colleagues present GVF, the Genome Variation Format. GVF is an extension of the already widely-used GFF3 standard for describing genome annotations. The utility of GVF is demonstrated by the analysis of the first 10 publicly-available personal human genomes. The authors term this dataset "10Gen" and hope that this will become the standard reference set to facilitate the analysis of future personal genomes.
GVF and the 10Gen dataset are available at http://www.sequenceontology.org/gvf.html and are also included with the article published on the Genome Biology website here.
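Because GVF extends GFF3, a GVF feature line can be read with the same nine tab-separated columns that GFF3 defines, with the variant details carried in the final attributes column. The sketch below is a minimal reader under that assumption. The sample line is invented for illustration, the attribute names (`Variant_seq`, `Reference_seq`, `Genotype`) are written from memory of the format and should be checked against the specification at the URL above, and `parse_feature_line` is a hypothetical helper, not part of any published GVF tooling.

```python
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")

def parse_feature_line(line):
    """Parse one GFF3-style feature line into a dict.

    The attributes column is split into key=value pairs on ';', and each
    value is split on ',' because GFF3 allows multiple values per key.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value.split(",")
    record["attributes"] = attributes
    return record

# Invented example line; coordinates and values are not from the 10Gen dataset.
sample = "\t".join([
    "chr16", "example_caller", "SNV", "49291141", "49291141", ".", "+", ".",
    "ID=var_1;Variant_seq=A,G;Reference_seq=G;Genotype=heterozygous",
])
rec = parse_feature_line(sample)
```

A real reader would also need to handle the `##` pragma lines and `#` comments that GFF3-family files place ahead of the feature lines; this sketch covers feature lines only.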
August 26, 2010 04:27 PM
For Immediate Release:
August 26, 2010
Earlier today, the Government Accountability Office released a report, “Enhanced Data Collection Could Help FCC Better Monitor Competition in the Wireless Industry.” A copy of the report is here.
The following statement is attributed to Gigi B. Sohn, president and co-founder of Public Knowledge:
“Today’s GAO report adds more evidence to the argument that any rules governing an open Internet should apply to the wireless sector as well as to the wired. The report paints a disturbing picture of an industry in which the top four carriers control 90 percent of the market, and industry consolidation is strangling smaller, regional carriers.
read more
August 26, 2010 03:46 PM
Hi,
I need an open OCR library which is able to scan complex printed math
formulas (for example some formulas which were generated via LaTeX). I
want to get some LaTeX-like output (or just some AST-like data).
Can Tesseract do this? Is there something like this already? Or are
current OCR techniques just able to parse line-oriented text?
August 26, 2010 03:27 PM

The contents of the last regular print edition of Arthritis Research & Therapy will be finalized at the end of 2010, which marks the latest evolution of the journal and reflects the undeniable shift to electronic communication of science in the past decade. The Editors-in-Chief, Prof Peter Lipsky and Prof Sir Ravinder Maini, discuss in an editorial the reasons behind, and opportunities presented by, the journal’s decision to become an exclusively online publication.
Although BioMed Central was the first commercial open access publisher – and the Internet is fundamental to open access – BioMed Central has continued to publish a small but decreasing number of print journals, until now.
Arthritis Research & Therapy, first published by Current Science Ltd in 1999, was conceived with a strategy to take full advantage of the benefits of online publishing. It has previously made innovative decisions in the rheumatology community, such as making all research open access and, latterly, publishing only the abstracts of research articles in print to help remove limitations to article length and to reduce publication times. This move to online-only publication will benefit readers, as they will see more cutting-edge review articles, and authors, who will no longer be faced with the choice of paying for color figures in non-research articles, as well as further limiting the environmental impact of the journal.
We expect that more innovations in rheumatology research publishing will be facilitated by the journal’s transfer to BioMed Central’s newly-designed journal platform in the coming months, and we will be communicating with the journal’s registered users via an online survey to establish what other online features would most benefit this rapidly-changing field.
By innovation and investment in new services for our readers, authors and reviewers we hope the journal will continue to readily drive and adapt to the change (or is that disruption?) the Internet has caused to publishing arthritis and rheumatic autoimmune disease research.
August 26, 2010 02:12 PM
Associate Editor: Tell us your perspective on creating your textbook, Digital Logic with Laboratory Exercises, with Global Text Project.
Dr. Jim Feher: What can I say, I may be biased, but I think the Global Text Project (GTP) is just a fantastic organization. I've always been a huge proponent of open source software and the free exchange of ideas that is made possible by using the Creative
August 26, 2010 12:26 PM
August 25, 2010
Music lovers take note: the classical music archive Musopen needs your help to liberate some classic symphonies from copyright entanglement. Musopen is looking to solve a difficult problem: while symphonies written by Beethoven, Brahms, Sibelius, and Tchaikovsky are in the public domain, many modern arrangements and sound recordings of those works are copyrighted. That means that even after purchasing a CD or collection of MP3s of this music, you may not be able to freely exercise all the rights you'd associate with works in the public domain, like sharing the music using a peer-to-peer network or using the music in a film project.
To fix this, Musopen is asking backers to join an effort to hire a world-class orchestra to record sublime digital performances of the symphonies by the composers mentioned above. Musopen will then relinquish all rights to the recordings, giving the public the freedom to experience these works in full: to download, share, derive, and remix without limit. The fundraising campaign is taking place on Kickstarter, a site where users can pledge money to various creative projects. (Users pledge an amount towards a project, but the money doesn't actually go to the project unless the specified funding goal is reached. Kickstarter has a great explanation for their "all-or-nothing funding" design on their FAQ.)
It’s too bad such seminal, cultural works have been effectively buried by copyright interests, despite their age, ubiquity, and importance. (Note that problems like this are exacerbated by discrepancies in international laws that create different "public domains" that copyright owners can exploit to stop online archives.) The Musopen campaign presents a creative solution that could help ensure that such essential music is preserved and shared for generations to come. Music lovers and copyfighters: vote with your wallet and support Musopen's work!
August 25, 2010 08:33 PM
We're pleased to announce that EFF's Legal Director, Cindy Cohn, has won a 2010 Intellectual Property Institute Vanguard Award from the State Bar of California.
Cindy was one of four legal professionals honored for spearheading new developments in the world of intellectual property. We're proud to see the work that we do to preserve balance in copyright, trademark, and patent law recognized, and we'll continue to fight for the fans, the tinkerers, independent journalists and bloggers, and consumers.
The 2nd Annual IP Vanguard Award will be presented to Cindy during an awards Luncheon on Friday, October 29, at the 2010 Annual IP Institute meeting in Napa, California.
August 25, 2010 08:33 PM
The Electronic Frontier Foundation is seeking to assist defendants in the Righthaven copyright troll lawsuits. Righthaven, founded in March of 2010, files hundreds of copyright infringement lawsuits on behalf of newspaper publishers against bloggers who make use of news content without permission. To that end, Righthaven searches the internet for stories and parts of stories from the newspapers that they represent. Once they find content that has been re-published, Righthaven purchases the copyright to the article and sues the owner of the blog.
Just like the US Copyright Group shakedowns, and the RIAA shakedowns of the recent past, Righthaven relies on the threat of enormous statutory damages associated with the Copyright Act to scare defendants, often individual bloggers operating non-commercial websites, into a quick settlement, reportedly ranging from two to five thousand dollars. The Righthaven lawsuits are of particular concern because they sometimes target the operators of political websites who re-publish newspaper stories, chilling political speech. Righthaven has also targeted the newspaper's source for the very articles allegedly infringed.
If you are the target of a Righthaven lawsuit and in need of representation, please contact Eva Galperin at eva@eff.org. Please understand that we have a relatively small number of very hard-working attorneys, so we do not have the resources to defend everyone who asks, no matter how deserving. However, if we cannot represent you directly, we will make every effort to put you in touch with attorneys who can.
August 25, 2010 06:04 PM
The New York Times ran a front-page story yesterday about open peer review, featuring an experiment conducted by MediaCommons for The Shakespeare Quarterly using CommentPress. The article is here and the experiment itself is here. Both MediaCommons and CommentPress were born at the institute; it's exciting to see our efforts get such prominent notice.
August 25, 2010 04:05 PM
The HTML5Rocks team has published a tutorial on the HTML5 <video> tag. It includes clear explanations of the video formats supported by the various browsers and code snippets for supporting each in your pages. Check it out.
August 25, 2010 03:38 PM
August 24, 2010
MusixMatch, a new lyrics start-up company in Bologna, Italy just signed up to be our latest customer!
MusixMatch aims to license lyrics from all over the world (and not just the usual US/western Europe suspects) and aims to make accessing and licensing lyrics much easier than it currently is. I spent three days in June with the whole MusixMatch team to figure out how MusicBrainz and MusixMatch can work together, and we found a number of interesting ways in which we can help each other.
MusixMatch needs to match lyric publisher data to music metadata like the data in MusicBrainz. This matching will enable MusixMatch to instantly license lyrics to anyone who speaks MBIDs or anyone who can match their data to our metadata. And MusicBrainz will benefit from this relationship by being able to show lyrics on MusicBrainz pages, which enriches MusicBrainz and takes us one step further on our road to being a comprehensive music encyclopedia.
However, it should be noted that MusicBrainz is not getting into the lyrics business. We will never store Lyrics in our database since those are copyrighted! We plan to fetch lyrics from the MusixMatch servers to display them on our site. MusixMatch, however, plans to offer our music metadata and lyrics in a package deal once we’ve matched our data and have lyric support on musicbrainz.org.
All of this lyrics work will come after we’ve shipped NGS; until NGS we will not adopt any new features! We are really keeping our focus on delivering NGS as soon as we’re happy with its stability.
August 24, 2010 10:21 PM
As you may know, in our Next Generation Schema release we are including support for musical Works. Our definition of a Work is a musical composition that will at some point be performed and possibly recorded, in which case it will become a Recording. In the current MusicBrainz implementation we do not have the concept of a Work and a lot of the Advanced Relationships (ARs) we have are muddled between the concept of a Work or a Recording.
This left us with the tricky task of reviewing all track level ARs and prying apart which ARs should be moved to Works and which ones to Recording. Or both! To accomplish this task, Brianfreud had compiled a list of open issues, which Ian Corvidae has adopted and nurtured. Today we convened an IRC meeting with Nikki, Pete Marsh from the BBC, Ian and myself. If you’re interested in how we reached the decisions we did, please take a look at the chatlog.
Our decisions have been captured in this wiki page — please take a look at it and see if we’ve missed anything or if there is anything you disagree with. If we do not hear any feedback on this topic, we will change our NGS data conversion script to convert the data as decided in this page.
Thanks to Ian, Pete and Nikki for your help in this meeting! And big thanks also go to Murdos for all of your help in steering me towards getting all Works related issues on to the table!
August 24, 2010 10:20 PM
Tom Tauke, Verizon’s erudite executive vice president for public affairs, made a valiant attempt the other day to try to salvage the policy deal his company made with Google. In a speech at the Technology Policy Institute’s telecom forum in Aspen, he brought out arguments old and new to argue why it was that an agreement forged between two big companies to their benefit should be accepted.
read more
August 24, 2010 08:56 PM
Insomnia is a highly prevalent condition, with up to a third of the general adult populace thought to suffer from insomnia at some time. Insomnia is generally associated with a negative impact on day-to-day functioning and has been noted to have co-morbid associations with a variety of psychiatric conditions.
Melatonin, an endogenous sleep regulating hormone, has been mooted as a potential therapy for this debilitating condition. Endogenous melatonin production is known to decrease as a person ages, therefore it has been hypothesised that treatment with this hormone may be efficacious in treating insomnia in the elderly population. However results from studies have often proved contentious, with a lack of consistency in the results seen in differing age groups exposed to melatonin therapy.
Results from a recently published randomized controlled trial in BMC Medicine have now shed new light on this controversial subject. Wade et al examined the use of prolonged release melatonin (PRM) in sufferers of primary insomnia across a wide range of ages. Their results showed that PRM is particularly effective and well tolerated in patients aged 65 years and over, with the treatment response increasing and being sustained over a 6 month period.
If you wish to learn more about this fascinating result and an array of other high impact articles visit the BMC Medicine website.
August 24, 2010 09:53 AM
Good news in the fight against bad software patents: a jury in the Eastern District of Texas recently found the Firepond/Polaris patent (U.S. Patent No. 6,411,947) invalid. This patent was on EFF's "Most Wanted" list, targeted because it claimed nothing more than a system using natural language processing to respond to customers' online inquiries by email.
EFF was not involved in this case, in which Bright Response, LLC (the technical owner of the patent) sued Google, Inc., Yahoo!, Inc. and eight other companies, alleging that Google's AdWords and Yahoo!'s Sponsored Search infringe the Firepond/Polaris patent. The jury found three of the patent's claims invalid based on the public use bar, obviousness, and for lacking written description. The jury also found that neither Google nor Yahoo! infringed those claims. Finally, the jury found the entire patent invalid due to improper inventorship.
In addition to the jury's findings, the Patent and Trademark Office is nearing completion of a reexamination of the patent, instituted by Google, that narrows the scope of that patent's claims.
"This is a great outcome and good news for people and developers who create new products related to customer service or email," said Patrick King, one of the attorneys assisting EFF on this matter.
Because the court has not yet entered a final judgment, Bright Response could still, in theory, attempt to prohibit others from using the basic natural language processing technology in its patent. EFF is on the lookout for this threatening behavior, so please make sure to let us know if you hear of any. EFF will continue to monitor this case, and the corresponding reexam, and will take action as necessary to fight any additional efforts to use the Firepond/Polaris patent to quash competition and hurt innovation.
"We are still waiting for the court case to finish up and to see if Bright Response will appeal the decision. If any of the patent is still alive after that, we will do whatever we can to invalidate it, and allow competitors to use this simple technology, which was well known prior to the patent filing," said Gina M. Steele, another attorney assisting EFF with this matter.
The Firepond/Polaris patent was one of the ten original Top Ten Patents targeted by EFF’s Patent Busting Project, which combats the chilling effects of bad patents on the public and consumer interests. So far nine patents targeted by EFF have been busted, invalidated, narrowed, or had a reexamination granted by the Patent Office.
August 24, 2010 12:36 AM
August 23, 2010
It looks like Apple, Inc., is exploring a new business opportunity: spyware and what we're calling "traitorware." While users were celebrating the new jailbreaking and unlocking exemptions, Apple was quietly preparing to apply for a patent on technology that, among other things, would allow Apple to identify and punish users who take advantage of those exemptions or otherwise tinker with their devices. This patent application does nothing short of providing a roadmap for how Apple can — and presumably will — spy on its customers and control the way its customers use Apple products. As Sony-BMG learned, spying on your customers is bad for business. And the kind of spying enabled here is especially creepy — it's not just spyware, it's "traitorware," since it is designed to allow Apple to retaliate against you if you do something Apple doesn't like.
Essentially, Apple's patent provides for a device to investigate a user's identity, ostensibly to determine if and when that user is "unauthorized," or, in other words, stolen. More specifically, the technology would allow Apple to record the voice of the device's user, take a photo of the device's user's current location or even detect and record the heartbeat of the device's user. Once an unauthorized user is identified, Apple could wipe the device and remotely store the user's "sensitive data." Apple's patent application suggests it may use the technology not just to limit "unauthorized" uses of its phones but also shut down the phone if and when it has been stolen.
However, Apple's new technology would do much more. This patented device enables Apple to secretly collect, store and potentially use sensitive biometric information about you. This is dangerous in two ways: First, it is far more than what is needed just to protect you against a lost or stolen phone. It's extremely privacy-invasive and it puts you at great risk if Apple's data on you are compromised. But it's not only the biometric data that are a concern. Second, Apple's technology includes various types of usage monitoring — also very privacy-invasive. This patented process could be used to retaliate against you if you jailbreak or tinker with your device in ways that Apple views as "unauthorized" even if it is perfectly legal under copyright law.
Here's a sample of the kinds of information Apple plans to collect:
- The system can take a picture of the user's face, "without a flash, any noise, or any indication that a picture is being taken to prevent the current user from knowing he is being photographed";
- The system can record the user's voice, whether or not a phone call is even being made;
- The system can determine the user's unique individual heartbeat "signature";
- To determine if the device has been hacked, the device can watch for "a sudden increase in memory usage of the electronic device";
- The user's "Internet activity can be monitored or any communication packets that are served to the electronic device can be recorded"; and
- The device can take a photograph of the surrounding location to determine where it is being used.
In other words, Apple will know who you are, where you are, and what you are doing and saying and even how fast your heart is beating. In some embodiments of Apple's "invention," this information "can be gathered every time the electronic device is turned on, unlocked, or used." When an "unauthorized use" is detected, Apple can contact a "responsible party." A "responsible party" may be the device's owner, it may also be "proper authorities or the police."
Apple does not explain what it will do with all of this collected information on its users, how long it will maintain this information, how it will use this information, or if it will share this information with other third parties. We know based on long experience that if Apple collects this information, law enforcement will come for it, and may even order Apple to turn it on for reasons other than simply returning a lost phone to its owner.
This patent is downright creepy and invasive — certainly far more than would be needed to respond to the possible loss of a phone. Spyware, and its new cousin traitorware, will hurt customers and companies alike — Apple should shelve this idea before it backfires on both it and its customers.
August 23, 2010 11:55 PM
An Indian computer scientist was arrested this weekend when he refused to disclose an anonymous source who provided an electronic voting machine to a team of security researchers.
Hari Prasad is the managing director of Netindia Ltd., an Indian research and development firm. He and other researchers have long questioned the security of India's paperless electronic voting machines. Despite repeated reports of election irregularities and concerns about fraud, the Election Commission of India insists that the machines are tamper-proof.
In 2009, the commission publicly challenged Prasad to show that India's voting machines could be compromised, but refused to give him access to the machines to perform a review. Earlier this year, an anonymous source provided an Indian voting machine to a research team led by Prasad, Alex Halderman, and Rop Gonggrijp. The team exposed security flaws that could allow an attacker to change election results and compromise ballot secrecy. They published a paper detailing their findings, which you can read here.
According to Halderman, Prasad was questioned Saturday morning at his home in Hyderabad by authorities who wanted to know the identity of the source who gave the voting machine to the research team. Prasad was ultimately arrested and taken to Mumbai, though he reportedly hadn't been charged with a crime.
This turn of events is deeply troubling. Prasad is a respected researcher who helped to discover a critical flaw in India's voting system. He and his fellow researchers would never have been able to document the weaknesses in India's voting machines without the help of their anonymous source. This is precisely why anonymity is important: it allows people to make important contributions to the public dialogue without fear of retribution.
The Election Commission of India should have given researchers access to the voting machines in the first place. Rather than attempting to persecute Prasad and the anonymous source, the government should be focusing its attention and resources on the real problem: electronic voting machines with no mechanism for accountability.
UPDATE: According to the Times of India and Reuters, Prasad has been charged in connection with the alleged theft of the voting machine studied by the research team. He has been remanded to police custody until Thursday, August 26.
August 23, 2010 10:50 PM
Various studies have suggested that a genetic test for the efficacy of the commonly used breast cancer drug, tamoxifen, is an effective predictor of how patients will respond to the drug. Tamoxifen undergoes metabolism upon oral administration, and it is widely accepted that the majority of the anti-proliferative effects of tamoxifen occur via its active metabolites. The CYP2D6 gene plays an important role in these metabolic pathways, and a genetic test is available which establishes which variant of the CYP2D6 gene the patient has. Some experts recommend that this test should be used in clinical practice, particularly in the case of postmenopausal women.
Research published in Breast Cancer Research sheds new light on the matter. The study looked at 6640 breast cancer patients from the United Kingdom and evaluated the association between genotype and breast cancer specific survival (BCSS), finding weak evidence that the poor-metaboliser variant, CYP2D6*6, is associated with decreased BCSS. This suggests that the use of this test in a clinical setting should be avoided until larger studies confirming any associations are available.
There are currently 500,000 women in the U.S.A. taking tamoxifen, so this outcome has the potential to affect hundreds of thousands of people. This fresh evidence reflects recent doubts about the test, as an editorial published recently in the Journal of Clinical Oncology stated that "routine use should await more reliable evidence from well-designed studies."
Anita Bock
Assistant Editor - Breast Cancer Research
August 23, 2010 10:56 AM
(Can you tell, by the recent frequency of posts, that I'm trying not to work on getting ready for classes next week?)
[This post is based partially on some conversations with Kevin Duh, though not in the finite state models formalism.]
The finite state machine approach to NLP is very appealing (I mean both string and tree automata) because you get to build little things in isolation and then chain them together in cool ways. Kevin Knight has a great slide about how to put these things together that I can't seem to find right now, but trust me that it's awesome, especially when he explains it to you :).
The other thing that's cool about them is that because you get to build them in isolation, you can use different data sets, which means data sets with different assumptions about the existence of "labels", to build each part. For instance, to do speech to speech transliteration from English to Japanese, you might build a component system like:
English speech --A--> English phonemes --B--> Japanese phonemes --C--> Japanese speech --D--> Japanese speech LM
You'll need a language model (D) for Japanese speech, that can be trained just on acoustic Japanese signals, then parallel Japanese speech/phonemes (for C), parallel English speech/phonemes (for A) and parallel English phonemes/Japanese phonemes (for B). [Plus, of course, if you're missing any of these, EM comes to your rescue!]
Let's take a simpler example, though the point I want to make applies to long chains, too.
Suppose I want to just do translation from French to English. I build an English language model (off of monolingual English text) and then an English-to-French transducer (remember that in the noisy channel, things flip direction). For the E2F transducer, I'll need parallel English/French text, of course. The English LM gives me p(e) and the transducer gives me p(f|e), which I can put together via Bayes' rule to get something proportional to p(e|f), which will let me translate new sentences.
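To make the noisy-channel combination concrete, here is a minimal sketch in Python. It assumes tiny hand-written probability tables in place of a real language model and transducer; every number and the helper names `posterior` and `translate` are invented for illustration.

```python
# Toy noisy-channel decoder: pick the English sentence e maximizing p(e) * p(f|e).
# All probabilities below are invented for illustration.

# English language model p(e); in practice trained on monolingual English text.
p_e = {
    "the house": 0.6,
    "the home": 0.3,
    "house the": 0.1,
}

# English-to-French channel model p(f|e); in practice trained on parallel text.
p_f_given_e = {
    "the house": {"la maison": 0.9, "le foyer": 0.1},
    "the home": {"la maison": 0.5, "le foyer": 0.5},
    "house the": {"la maison": 0.9, "le foyer": 0.1},
}


def posterior(f):
    """Return p(e|f) for every e via Bayes' rule: p(e|f) is proportional to p(e) p(f|e)."""
    scores = {e: p_e[e] * p_f_given_e[e].get(f, 0.0) for e in p_e}
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError(f"no English sentence explains {f!r}")
    return {e: s / total for e, s in scores.items()}


def translate(f):
    """Return the most probable English sentence for the French input f."""
    post = posterior(f)
    return max(post, key=post.get)


print(translate("la maison"))
print(translate("le foyer"))
```

Note that the language model is what rules out "house the" here: the channel model alone scores it as highly as "the house".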
But, presumably, I also have lots of monolingual French text. Forgetting math for a moment, which seems to suggest that this can't help me, we can ask: why should this help?
Well, it probably won't help with my English language model, but it should be able to help with my transducer. Why? Because my transducer is supposed to give me p(f|e). If I have some French sentence in my GigaFrench corpus to which my transducer assigns zero probability (for instance, max_e p(f|e) = 0), then this is probably a sign that something bad is happening.
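The zero-probability check can be sketched in a few lines of Python. This assumes the transducer is small enough to write as a table; the table, the stand-in corpus, and the helper name `unexplained` are all hypothetical.

```python
# Flag monolingual French sentences that the transducer cannot generate from
# any English sentence, i.e. those with max_e p(f|e) = 0.
# The table and corpus are invented for illustration.

p_f_given_e = {
    "the house": {"la maison": 0.9, "le foyer": 0.1},
    "the home": {"la maison": 0.5, "le foyer": 0.5},
}

# Hypothetical stand-in for a large monolingual French corpus.
giga_french = ["la maison", "la voiture", "le foyer"]


def unexplained(corpus, channel):
    """Return the sentences f in corpus for which max_e p(f|e) is zero."""
    return [
        f
        for f in corpus
        if max(table.get(f, 0.0) for table in channel.values()) == 0.0
    ]


print(unexplained(giga_french, p_f_given_e))
```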
More generally, I feel like the following two operations should probably give roughly the same probabilities:
- Drawing an English sentence from the language model p(e).
- Picking a French sentence at random from GigaFrench, and drawing an English sentence from p(e|f), where p(e|f) is the composition of the English LM and the transducer.
If you buy this, then perhaps one thing you could do is to try to learn a transducer q(f|e) that has low KL divergence between 1 and 2, above. If you work through the (short) math, and throw away terms that are independent of the transducer, then you end up wanting to minimize
[ sum_e p(e) log sum_f q(f|e) ]. Here, the sum over f is a finite sum over GigaFrench, and the sum over e is an infinite sum over positive probability English sentences given by the English LM p(e).
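As a rough sketch of what evaluating this quantity looks like, the Python below computes sum_e p(e) log sum_f q(f|e) for two candidate transducers. It assumes a toy language model with finite support (the real sum over e is infinite) and a two-sentence stand-in for GigaFrench; all numbers are invented, and the function name `objective` is hypothetical.

```python
import math

# Toy English LM p(e) over a finite support; the real sum over e is infinite.
p_e = {"the house": 0.7, "the home": 0.3}

# Hypothetical stand-in for GigaFrench: a finite list of French sentences.
giga_french = ["la maison", "le foyer"]


def objective(q):
    """Compute sum_e p(e) * log(sum_f q(f|e)), with f ranging over giga_french."""
    total = 0.0
    for e, pe in p_e.items():
        mass = sum(q[e].get(f, 0.0) for f in giga_french)
        # mass == 0 means e cannot produce any observed French sentence,
        # so the log term is minus infinity.
        total += pe * (math.log(mass) if mass > 0.0 else -math.inf)
    return total


# Two candidate transducers q(f|e); the numbers are invented.
q_a = {
    "the house": {"la maison": 0.8, "le foyer": 0.1, "la voiture": 0.1},
    "the home": {"la maison": 0.4, "le foyer": 0.5, "la voiture": 0.1},
}
q_b = {
    "the house": {"la maison": 0.3, "la voiture": 0.7},
    "the home": {"le foyer": 0.2, "la voiture": 0.8},
}

print(objective(q_a))
print(objective(q_b))
```

The inner sum is just the total probability that q assigns to French sentences actually present in the corpus, so q_a, which puts most of its mass on observed sentences, gets the larger value. In a real system neither sum can be enumerated like this, which is where the finite-state machinery comes in.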
One could then apply something like posterior regularization (Kuzman Ganchev, Graça and Taskar) to do the learning. There's the nasty bit about how to compute these things, but that's why you get to be friends with Jason Eisner so he can tell you how to do anything you could ever want to do with finite state models.
Anyway, it seems like an interesting idea. I'm not aware of anyone having tried it.
August 23, 2010 11:11 AM

Earlier Listing is Expanded by More than 30%
At the beginning of this year Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, again because of some client and internal work, we have researched the space again and updated the listing [1].
All new tools are marked with <New> (new only means newly discovered; some had yet to be discovered in the prior listing). There are now a total of 185 tools in the listing, 31 of which are recently new, and 45 added at various times since the first release. <Newest> reflects updates — most from the developers themselves — since the original publication of this post.
Comprehensive Ontology Tools
- Altova SemanticWorks is a visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design. No open source version available
- Amine is a rather comprehensive, open source platform for the development of intelligent and multi-agent systems written in Java. As one of its components, it has an ontology GUI with text- and tree-based editing modes, with some graph visualization
- The Apelon DTS (Distributed Terminology System) is an integrated set of open source components that provides comprehensive terminology services in distributed application environments. DTS supports national and international data standards, which are a necessary foundation for comparable and interoperable health information, as well as local vocabularies. Typical applications for DTS include clinical data entry, administrative review, problem-list and code-set management, guideline creation, decision support and information retrieval. Though not strictly an ontology management system, Apelon DTS has plug-ins that provide visualization of concept graphs and related functionality that make it close to a complete solution
- DOME is a programmable XML editor which is being used in a knowledge extraction role to transform Web pages into RDF, and available as Eclipse plug-ins. DOME stands for DERI Ontology Management Environment
- FlexViz is a Flex-based, Protégé-like client-side ontology creation, management and viewing tool; very impressive. The code is distributed from Sourceforge; there is a nice online demo available; there is a nice explanatory paper on the system, and the developer, Chris Callendar, has a useful blog with Flex development tips
- <Newest> ITM supports the management of complex knowledge structures (metadata repositories, terminologies, thesauri, taxonomies, ontologies, and knowledge bases) throughout their lifecycle, from authoring to delivery. ITM can also manage alignments between multiple knowledge structures, such as thesauri or ontologies, via the integration of INRIA’s Alignment API. Commercial; from Mondeca
- Knoodl facilitates community-oriented development of OWL based ontologies and RDF knowledge bases. It also serves as a semantic technology platform, offering a Java service-based interface or a SPARQL-based interface so that communities can build their own semantic applications using their ontologies and knowledgebases. It is hosted in the Amazon EC2 cloud and is available for free; private versions may also be obtained. See especially the screencast for a quick introduction
- The NeOn toolkit is a state-of-the-art, open source multi-platform ontology engineering environment, which provides comprehensive support for the ontology engineering life-cycle. The v2.3.0 toolkit is based on the Eclipse platform, a leading development environment, and provides an extensive set of plug-ins covering a variety of ontology engineering activities. You can add these plug-ins or get a current listing from the built-in updating mechanism
- ontopia is a relatively complete suite of tools for building, maintaining, and deploying Topic Maps-based applications; open source, and written in Java. Could not find online demos, but there are screenshots and there is visualization of topic relationships
- Protégé is a free, open source visual ontology editor and knowledge-base framework. The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema. There are a large number of third-party plugins that extend the platform’s functionality
- Protégé Plugin Library – frequently consult this page to review new additions to the Protégé editor; presently there are dozens of specific plugins, most related to the semantic Web and most open source
- Collaborative Protégé is a plug-in extension of the existing Protégé system that supports collaborative ontology editing. In addition to the common ontology editing operations, it enables annotation of both ontology components and ontology changes. It supports the searching and filtering of user annotations, also known as notes, based on different criteria. There is also an online demo
- <New>Web Protégé is an online version of Protégé attempting to capture all of the native functionality; still under development
- <New>Sigma is an open source knowledge engineering environment that includes ontology mapping, theorem proving, language generation in multiple languages, browsing, OWL read/write, and analysis. It includes the Suggested Upper Merged Ontology (SUMO), a comprehensive formal ontology. It’s under active development and use
- TopBraid Composer is an enterprise-class modeling environment for developing Semantic Web ontologies and building semantic applications. Fully compliant with W3C standards, Composer offers comprehensive support for developing, managing and testing configurations of knowledge models and their instance knowledge bases. It is based on the Eclipse IDE. There is a free version (after registration) for small ontologies
- <New>TwoUse Toolkit is an implementation of current OMG and W3C standards for developing ontology-based software models and model-based OWL2 ontologies, largely based around UML. There are a variety of tools, including graphics editors, with more to come
- <New>Wandora is a topic maps engine written in Java with support for both in-memory topic maps and persisting topic maps in MySQL and SQL Server. It also contains an editor and a publishing system, and has support for automatic classification. It can read OBO, RDF(S), and many other formats, and can export topic maps to various graph formats. There is also a web-based topic maps browser, and graphical visualization.
Not Apparently in Active Use
- Adaptiva is a user-centred ontology building environment, based on using multiple strategies to construct an ontology, minimising user input by using adaptive information extraction
- Exteca is an ontology-based technology written in Java for high-quality knowledge management and document categorisation, including entity extraction. Though code is still available, no updates have been provided since 2006. It can be used in conjunction with search engines
- IODT is IBM’s toolkit for ontology-driven development. The toolkit includes EMF Ontology Definition Metamodel (EODM), EODM workbench, and an OWL Ontology Repository (named Minerva)
- KAON is an open-source ontology management infrastructure targeted for business applications. It includes a comprehensive tool suite allowing easy ontology creation and management and provides a framework for building ontology-based applications. An important focus of KAON is scalable and efficient reasoning with ontologies
- Ontolingua provides a distributed collaborative environment to browse, create, edit, modify, and use ontologies. The server supports over 150 active users, some of whom have provided us with descriptions of their projects. Provided as an online service; software availability not known.
Vocabulary Prompting Tools
- AlchemyAPI from Orchestr8 provides an API based application that uses statistical and natural language processing methods. Applicable to webpages, text files and any input text in several languages
- BooWa is a set expander for any language (formerly known as SEALS); developed by RC Wang of Carnegie Mellon
- Google Keywords allows you to enter a few descriptive words or phrases or a site URL to generate keyword ideas
- Google Sets for automatically creating sets of items from a few examples
- Open Calais is a free, limited API web service to automatically attach semantic metadata to content, based on either entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), or events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’). The metadata results are stored centrally and returned to you as industry-standard RDF constructs accompanied by a Globally Unique Identifier (GUID)
- Query-by-document from BlogScope has a nice phrase extraction service, with a choice of ranking methods. Can also be used in a Firefox plug-in (not tested with 3.5+)
- SemanticHacker (from Textwise) is an API that does a number of different things, including categorization, search, etc. By using ‘concept tags’, the API can be leveraged to generate metadata or tags for content
- TagFinder is a Web service that automatically extracts tags from a piece of text. The tags are chosen based on both statistical and linguistic analysis of the original text
- Tagthe.net has a demo and an API for automatic tagging of web documents and texts. Tags can be single words only. The tool also recognizes named entities such as people names and locations
- TermExtractor extracts terminology consensually referred to in a specific application domain. The software takes as input a corpus of domain documents, parses the documents, and extracts a list of “syntactically plausible” terms (e.g. compounds, adjective-nouns, etc.)
- TermFinder uses Poisson statistics, maximum likelihood estimation and inverse document frequency to compare the frequency of words in a given document against a generic corpus of 100 million words per language; available for English, French and Italian
- TerMine is an online and batch term extractor that emphasizes part of speech (POS) and n-gram (phrase extraction). TerMine is the terminological management system with the C-Value term extraction and AcroMine acronym recognition integrated
- Topia term extractor is a part-of-speech and frequency based term extraction tool implemented in Python. Here is a term extraction demo based on this tool
- Topicalizer is a service which automatically analyses a document specified by a URL or a plain text regarding its word, phrase and text structure. It provides a variety of useful information on a given text including the following: Word, sentence and paragraph count, collocations, syllable structure, lexical density, keywords, readability and a short abstract on what the given text is about
- TrMExtractor does glossary extraction on pure text files for either English or Hungarian
- Wikify! is a system to automatically “wikify” a text by adding Wikipedia-like tags throughout the document. The system extracts keywords and then disambiguates and matches them to their corresponding Wikipedia definition
- Yahoo! Placemaker is a freely available geoparsing Web service. It helps developers make their applications location-aware by identifying places in unstructured and atomic content – feeds, web pages, news, status updates – and returning geographic metadata for geographic indexing and markup
- Yahoo! Term Extraction Service is an API to Yahoo’s term extraction service, as well as many other APIs and services in a variety of languages and for a variety of tasks; good general resource. The service has been reported to be shut down numerous times, but apparently is kept alive due to popular demand.
Initial Ontology Development
- COE (CmapTools Ontology Editor) is a specialized version of the CmapTools from IHMC. COE, like its CmapTools parent, is based on the idea of concept maps. A concept map is a graph diagram that shows the relationships among concepts. Concepts are connected with labeled arrows, with the relations manifesting in a downward-branching hierarchical structure. COE is an integrated suite of software tools for constructing, sharing and viewing OWL encoded ontologies based on these constructs
- Conzilla2 is a second generation concept browser and knowledge management tool with many purposes. It can be used as a visual designer and manager of RDF classes and ontologies, since its native storage is in RDF. It also has an online collaboration server [apparently last updated in 2008]
- http://diagramic.com/ has an online Flex network graph demo, which also has a neat facility for quick entry and visualization of relationships; mostly small scale; pretty cool. There does not appear to be code available anywhere
- <New>DL-Learner is a tool for learning OWL class expressions from examples and background knowledge. It extends Inductive Logic Programming (ILP) to Description Logics and the Semantic Web. DL-Learner now has a flexible component-based design, which makes it easy to extend with new learning algorithms, learning problems, reasoners, and supported background knowledge sources. SPARQL endpoints are a newly supported type of knowledge source: DL-Learner can extract knowledge fragments from them, which enables learning classes even on large knowledge sources like DBpedia. It also includes an OWL API reasoner interface and a Web service interface
- DogmaModeler is a free and open source ontology modeling tool based on ORM. The philosophy of DogmaModeler is to enable non-IT experts to model ontologies with little or no involvement of an ontology engineer; project is quite old, but the software is still available and it may provide some insight into naive ontology development
- Erca is a framework that eases the use of Formal and Relational Concept Analysis, a neat clustering technique. Though not strictly an ontology tool, Erca could be implemented in a work flow that allows easy import of formal contexts from CSV files, then algorithms that compute the concept lattice of the formal contexts, which can be exported as dot graphs (or in JPG, PNG, EPS and SVG formats). Erca is provided as an Eclipse plug-in
- GraphMind is a mindmap editor for Drupal. It has the basic mindmap features and some Drupal specific enhancements. There is a quick screencast about what GraphMind looks like and what it does. The Flex source is also available from Github
- <New>H-Maps is a commercial suite of tools for building topic maps applications, consisting of a topic maps engine and server, a mapping framework for converting from legacy data, and a navigator for visualizing data. It is typically used in bioinformatics (drug discovery and research, toxicological studies, etc), engineering (support and expert systems), and for integration of heterogeneous data. It supports the XTM 1.0 and TMAPI 1.0 specifications
- irON using spreadsheets, via its notation and specification. Spreadsheets can be used for initial authoring, esp if the irON guidelines are followed. See further this case study of Sweet Tools in a spreadsheet using irON (commON)
- <New>JXML2OWL API is a library for mapping XML schemas to OWL Ontologies on the Java platform. It creates an XSLT which transforms instances of the XML schema into instances of the OWL ontology. JXML2OWL Mapper is a GUI application using the JXML2OWL API
- MindRaider is a Semantic Web outliner. It aims to connect the tradition of outline editors with emerging technologies. MindRaider's mission is to organize not only the content of your hard drive but also your cognitive base and social relationships in a way that enables quick navigation, concise representation and inferencing
- <New>Neologism is a simple web-based RDF Schema vocabulary editor and publishing system. Use it to create RDF classes and properties, which are needed to publish data on the Semantic Web. Its main goal is to dramatically reduce the time required to create, publish and modify vocabularies for the Semantic Web. It is written in PHP and built on the Drupal platform. Neologism is currently in alpha
- <New>OCS – Ontology Creation System is software to develop ontologies in a cooperative way with a graphical interface
- RDF123 is an application and web service for converting data in simple spreadsheets to an RDF graph. Users control how the spreadsheet’s data is converted to RDF by constructing a graphical RDF123 template that specifies how each row in the spreadsheet is converted as well as metadata for the spreadsheet and its RDF translation
- <New>ROC (Rapid Ontology Construction) is a tool that allows domain experts to quickly build a basic vocabulary for their domain, re-using existing terminology whenever possible. How this works is that the ROC tool asks the domain expert for a set of keywords that are ‘core’ terms of the domain, and then queries remote sources for concepts matching those terms. These are then presented to the user, who can select terms from the list, find relations to other terms, and expand the set of terms and relations, iteratively. The resulting vocabulary (or ‘proto-ontology’, basically a SKOS-like thesaurus) can be used as is, or can be used as input for a knowledge engineer to base a more comprehensive domain ontology on. Interface “triples-oriented,” not graphical.
- Topincs is a Topic Map authoring software that allows groups to share their knowledge over the web. It makes use of a variety of modern technologies. The most important are Topic Maps, REST and Ajax. It consists of three components: the Wiki, the Editor, and the Server. The server requires AMP; the Editor and Wiki are based on browser plug-ins.
Ontology Editing
- First, see all of the Comprehensive Tools and Ontology Development listings above
- Anzo for Excel includes an (RDFS and OWL-based) ontology editor that can be used directly within Excel. In addition to that, Anzo for Excel includes the capability to automatically generate an ontology from existing spreadsheet data, which is very useful for quick bootstrapping of an ontology
- <New>ATop is a topic map browser and editor written in Java and supports the XTM 1.0 specification; project has not been updated since 2008
- Hozo is an ontology visualization and development tool that brings version control constructs to group ontology development; limited to a prototype, with no online demo
- Lexaurus Editor is for off-line creation and editing of vocabularies, taxonomies and thesauri. It supports import and export in Zthes and SKOS XML formats, and allows hierarchical / poly-hierarchical structures to be loaded for editing, or even multiple vocabularies to be loaded simultaneously, so that terms from one taxonomy can be re-used in another, using drag and drop. Not available in open source
- Model Futures OWL Editor combines simple OWL tools, featuring UML (XMI), ErWin, thesaurus and imports. The editor is tree-based and has a “navigator” tool for traversing property and class-instance relationships. It can import XMI (the interchange format for UML) and Thesaurus Descriptor (BT-NT XML), and EXPRESS XML files. It can export to MS Word.
- <New>OBO-Edit is an open source ontology editor written in Java. OBO-Edit is optimized for the OBO biological ontology file format. It features an easy to use editing interface, a simple but fast reasoner, and powerful search capabilities
- <New>Onotoa is an Eclipse-based ontology editor for topic maps. It has a graphical UML-like interface, an export function for the current TMCL-draft and an XTM export
- OntoTrack is a browsing and editing ontology authoring tool for OWL Lite. It combines a sophisticated graphical layout with mouse enabled editing features optimized for efficient navigation and manipulation of large ontologies
- OWLViz is an attractive visual editor for OWL and is available as a Protégé plug-in
- PoolParty is a triple store-based thesaurus management environment which uses SKOS and text extraction for tag recommendations. See further this manual, which describes more fully the system’s functionality. Also, there is a PoolParty Web service that enables a Zthes thesaurus in XML format to be uploaded and converted to SKOS (via skos:Concepts)
- SKOSEd is a plugin for Protege 4 that allows you to create and edit thesauri (or similar artefacts) represented in the Simple Knowledge Organisation System (SKOS).
- TemaTres is a Web application to manage controlled vocabularies, taxonomies and thesauri. The vocabularies may be exported in Zthes, Skos, TopicMap, etc.
- ThManager is a tool for creating and visualizing SKOS RDF vocabularies. ThManager facilitates the management of thesauri and other types of controlled vocabularies, such as taxonomies or classification schemes
- Vitro is a general-purpose web-based ontology and instance editor with customizable public browsing. Vitro is a Java web application that runs in a Tomcat servlet container. With Vitro, you can: 1) create or load ontologies in OWL format; 2) edit instances and relationships; 3) build a public web site to display your data; and 4) search your data with Lucene. Still in somewhat early phases, with no online demos and with minimal interfaces.
- <New>Vocab Editor is an RDF/OWL/SKOS vocabulary-diagram editor. It has both client- (Javascript) and server-side (Python) implementations. It is open source with a demo. There is a blog (Spanish) and online sample vocabulary app editor.
Not Apparently in Active Use
- The Omnigator is a form-based manipulation tool centered on Topic Maps, though it enables the loading and navigation of any conforming topic map in XTM, HyTM, LTM or RDF formats. There is a free evaluation version.
- OntoGen is a semi-automatic and data-driven ontology editor focusing on editing of topic ontologies (a set of topics connected with different types of relations). The system combines text-mining techniques with an efficient user interface. It requires .Net.
- OntoLight is a set of software modules for: transforming raw ontology data for several ontologies from their specific formats into a unifying light-weight ontology format, grounding the ontology and storing it into grounded ontology format, populating grounded ontologies with new instance data, and creating mappings between grounded ontologies; includes Cyc. Download no longer available. See http://analytics.ijs.si/~blazf/papers/Context_SiKDD07.pdf and http://www.neon-project.org/web-content/index.php?option=com_weblinks&task=view&catid=17&id=52 or http://www.neon-project.org/web-content/index.php?option=com_weblinks&catid=21&Itemid=73
- OWL-S-editor is an editor for the development of services in OWL-S, with graphical, WSDL and import/export support
- ReTAX+ is an aide to help a taxonomist create a consistent taxonomy and in particular provides suggestions as to where a new entity could be placed in the taxonomy whilst retaining the integrity of the revised taxonomy (c.f., problems in ontology modelling)
- SWOOP is a lightweight ontology editor. (Swoop is no longer under active development at mindswap. Continuing development can be found on SWOOP’s Google Code homepage at http://code.google.com/p/swoop/)
- WebOnto supports the browsing, creation and editing of ontologies through coarse grained and fine grained visualizations and direct manipulation.
Ontology Mapping
- <New>The Alignment API is an API and implementation for expressing and sharing ontology alignments. The set of correspondences between entities (e.g., classes, objects, properties) in ontologies is called an alignment. The API provides a format for expressing alignments in a uniform way. The goal of this format is to be able to share on the web the available alignments. The format is expressed in RDF, so it is freely extensible. The Alignment API itself is a Java description of tools for accessing the common format. It defines four main interfaces (Alignment, Cell, Relation and Evaluator).
- COMA++ is a schema and ontology matching tool with a comprehensive infrastructure. Its graphical interface supports a variety of interaction
- ConcepTool is a system to model, analyse, verify, validate, share, combine, and reuse domain knowledge bases and ontologies, reasoning about their implication
- <New>MapOnto is a research project aiming at discovering semantic mappings between different data models, e.g, database schemas, conceptual schemas, and ontologies. So far, it has developed tools for discovering semantic mappings between database schemas and ontologies as well as between different database schemas. The Protege plug-in is still available, but appears to be for older versions
- MatchIT automates and facilitates schema matching and semantic mapping between different Web vocabularies. MatchIT runs as a stand-alone or plug-in Eclipse application and can be integrated with popular third party applications. MatchIT uses Adaptive Lexicon™ as an ontology-driven dictionary and thesaurus of English language terminology to quantify and rank the semantic similarity of concepts. It apparently is not available in open source
- myOntology is used to produce the theoretical foundations, and deployable technology for the Wiki-based, collaborative and community-driven development and maintenance of ontologies, instance data and mappings
- OLA/OLA2 (OWL-Lite Alignment) matches ontologies written in OWL. It relies on a similarity combining all the knowledge used in entity descriptions. It also deals with one-to-many relationships and circularity in entity descriptions through a fixpoint algorithm
- Potluck is a Web-based user interface that lets casual users—those without programming skills and data modeling expertise—mash up data themselves. Potluck is novel in its use of drag and drop for merging fields, its integration and extension of the faceted browsing paradigm for focusing on subsets of data to align, and its application of simultaneous editing for cleaning up data syntactically. Potluck also lets the user construct rich visualizations of data in-place as the user aligns and cleans up the data.
- PRIOR+ is a generic and automatic ontology mapping tool, based on propagation theory, information retrieval technique and artificial intelligence model. The approach utilizes both linguistic and structural information of ontologies, and measures the profile similarity and structure similarity of different elements of ontologies in a vector space model (VSM).
- <New>S-Match takes any two tree like structures (such as database schemas, classifications, lightweight ontologies) and returns a set of correspondences between those tree nodes which semantically correspond to one another.
- Vine is a tool that allows users to perform fast mappings of terms across ontologies. It performs smart searches, can search using regular expressions, requires a minimum number of clicks to perform mappings, can be plugged into arbitrary mapping framework, is non-intrusive with mappings stored in an external file, has export to text files, and adds metadata to any mapping. See also http://sourceforge.net/projects/vine/.
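As a rough illustration of what an alignment is (a set of correspondences between entities of two ontologies), here is a minimal Python sketch that pairs entity labels by string similarity. It is not the Alignment API or any tool listed above: the `Correspondence` tuple, the `align` function, the `=` relation marker and the 0.8 threshold are all assumptions made for this example, and real matchers also use structural and background knowledge.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Correspondence(NamedTuple):
    """One hypothetical alignment cell: two entities, a relation, a confidence."""
    entity1: str
    entity2: str
    relation: str
    confidence: float


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(labels1, labels2, threshold=0.8):
    """Pair each label in labels1 with its best match in labels2, if good enough.

    Assumes labels2 is non-empty. The threshold is an arbitrary illustrative value.
    """
    alignment = []
    for a in labels1:
        best = max(labels2, key=lambda b: label_similarity(a, b))
        score = label_similarity(a, best)
        if score >= threshold:
            alignment.append(Correspondence(a, best, "=", round(score, 2)))
    return alignment


if __name__ == "__main__":
    onto1 = ["Person", "Organisation", "Publication"]
    onto2 = ["person", "Organization", "Event"]
    for cell in align(onto1, onto2):
        print(cell)
```

With these toy labels, "Person" and "Organisation" find a counterpart while "Publication" stays unmatched, which is the kind of partial result a purely lexical matcher tends to give.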
Not Apparently in Active Use
- ASMOV (Automated Semantic Mapping of Ontologies with Validation) is an automatic ontology matching tool which has been designed in order to facilitate the integration of heterogeneous systems, using their data source ontologies
- Chimaera is a software system that supports users in creating and maintaining distributed ontologies on the web. Two major functions it supports are merging multiple ontologies together and diagnosing individual or multiple ontologies
- CMS (CROSI Mapping System) is a structure matching system that capitalizes on the rich semantics of the OWL constructs found in source ontologies and on its modular architecture that allows the system to consult external linguistic resources
- ConRef is a service discovery system which uses ontology mapping techniques to support different user vocabularies
- DRAGO reasons across multiple distributed ontologies interrelated by pairwise semantic mappings, with a vision of peer-to-peer mapping of many distributed ontologies on the Web. It is implemented as an extension to an open source Pellet OWL Reasoner
- Falcon-AO (Finding, aligning and learning ontologies) is an automatic ontology matching tool that includes the three elementary matchers of String, V-Doc and GMO. In addition, it integrates a partitioner PBM to cope with large-scale ontologies
- FOAM is the Framework for ontology alignment and mapping. It is based on heuristics (similarity) of the individual entities (concepts, relations, and instances)
- hMAFRA (Harmonize Mapping Framework) is a set of tools supporting semantic mapping definition and data reconciliation between ontologies. The targeted formats are XSD, RDFS and KAON
- IF-Map is an Information Flow based ontology mapping method. It is based on the theoretical grounds of logic of distributed systems and provides an automated streamlined process for generating mappings between ontologies of the same domain
- LILY is a system matching heterogeneous ontologies. LILY extracts a semantic subgraph for each entity, then it uses both linguistic and structural information in semantic subgraphs to generate initial alignments. The system is presently in a demo version only
- MAFRA Toolkit – the Ontology MApping FRAmework Toolkit allows users to create semantic relations between two (source and target) ontologies, and apply such relations in translating source ontology instances into target ontology instances
- OntoEngine is a step toward allowing agents to communicate even though they use different formal languages (i.e., different ontologies). It translates data from a “source” ontology to a “target”
- OWLS-MX is a hybrid semantic Web service matchmaker. OWLS-MX 1.0 utilizes both description logic reasoning, and token based IR similarity measures. It applies different filters to retrieve OWL-S services that are most relevant to a given query
- RiMOM (Risk Minimization based Ontology Mapping) integrates different alignment strategies: edit-distance based strategy, vector-similarity based strategy, path-similarity based strategy, background-knowledge based strategy, and three similarity-propagation based strategies
- semMF is a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties
- Snoggle is a graphical, SWRL-based ontology mapper. Snoggle attempts to solve the ontology mapping problem by providing a graphical user interface (similar to that of Microsoft Visio) to guide the process of ontology vocabulary alignment. In Snoggle, user-defined mappings can be serialized into rules, which are expressed using SWRL
- Terminator is a tool for creating term to ontology resource mappings (documentation in Finnish).
Ontology Visualization/Analysis
Though not all are relevant, see my post from a couple of years back on large-scale RDF graph software.
- Social network graphing tools (many covered elsewhere)
- Cytoscape is a bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data; I have also written specifically about Cytoscape’s use in UMBEL
- RDFScape is a project that brings Semantic Web “features” to the popular Systems Biology software Cytoscape
- NetworkAnalyzer performs analysis of biological networks and calculates network topology parameters including the diameter of a network, the average number of neighbors, and the number of connected pairs of nodes. It also computes the distributions of more complex network parameters such as node degrees, average clustering coefficients, topological coefficients, and shortest path lengths. It displays the results in diagrams, which can be saved as images or text files; used by SD
- Graphl is a tool for collaborative editing and visualisation of graphs, representing relationships between resources or concepts of the real world. Graphl may be thought of as a visual wiki, a place where everybody can contribute to a shared repository of knowledge
- <New>Graphviz is open source graph visualization software. It has several main graph layout programs. It also has web and interactive graphical interfaces, and auxiliary tools, libraries, and language bindings.
- <New>GrOWL is an ontology visualizer and editor. The layout of the GrOWL graph can be defined automatically or loaded from a separate style sheet. GrOWL implements configurable filters that can transform the display by simplifying it, hiding concepts and relationships that have no descriptions associated, or performing more complex translations. Concepts can be stored in ontologies with extensive annotations to provide documentation. GrOWL shows these annotations as tooltips, and supports complex HTML and links within them. The GrOWL browser can be used inside a web browser or as a stand-alone application. When used inside a browser, it supports Javascript interaction so that it can be used as a concept chooser with implementation-defined operations.
- igraph is a free software package for creating and manipulating undirected and directed graphs
- Network Workbench is very complex and comprehensive; a Swiss Army Knife
- NetworkX – Python; very clean
- <New>OntoGraf, a Protege 4 plug-in, gives support for interactively navigating the relationships of your OWL ontologies. Various layouts are supported for automatically organizing the structure of your ontology. Different relationships are supported: subclass, individual, domain/range object properties, and equivalence. Relationships and node types can be filtered.
- <New>OWL2Prefuse is a Java package which creates Prefuse graphs and trees from OWL files (and Jena OntModels). It takes care of converting the OWL data structure to the Prefuse data structure. This makes it easy for developers to use Prefuse graphs and trees in their Semantic Web applications.
- <New>RDF Gravity is a tool for visualising RDF/OWL Graphs/ ontologies. RDF Gravity is implemented by using the JUNG Graph API and Jena semantic web toolkit. Its main features are:
- Graph Visualization
- Global and Local Filters (enabling specific views on a graph)
- Full text Search
- Generating views from RDQL Queries
- Visualising multiple RDF files
- <Newest> SKOS Reader is a SKOS browser and an HTML renderer of SKOS thesauri and terminologies that can display a SKOS file hierarchically, alphabetically, or permuted. Commercial; from Mondeca
- Stanford Network Analysis Package (SNAP) is a general purpose network analysis and graph mining library. It is written in C++ and easily scales to massive networks with hundreds of millions of nodes
- Social Networks Visualizer (SocNetV) is a flexible and user-friendly tool for the analysis and visualization of Social Networks. It lets you construct networks (mathematical graphs) with a few clicks on a virtual canvas or load networks of various formats (GraphViz, GraphML, Adjacency, Pajek, UCINET, etc) and modify them to suit your needs. SocNetV also offers a built-in web crawler, allowing you to automatically create networks from all links found in a given initial URL
- Tulip may be incredibly strong
- Springgraph component for Flex
- VizierFX is a Flex library for drawing network graphs. The graphs are laid out using GraphViz on the server side, then passed to VizierFX to perform the rendering. The library also provides the ability to run ActionScript code in response to events on the graph, such as mousing over a node or clicking on it.
- <New>VUE (Visual Understanding Environment) is an open source project focused on creating flexible tools for managing and integrating digital resources in support of teaching, learning and research. VUE provides a flexible visual environment for structuring, presenting, and sharing digital information.
- <New>yEd is a diagram editor that can be used to quickly and effectively generate high-quality drawings of diagrams. It can support OWL imports.
- <New>ZGRViewer is a graph visualizer implemented in Java and based upon the Zoomable Visual Transformation Machine. It is specifically aimed at displaying graphs expressed using the DOT language from AT&T GraphViz and processed by programs dot, neato or others such as twopi. ZGRViewer is designed to handle large graphs, and offers a zoomable user interface (ZUI), which enables smooth zooming and easy navigation in the visualized structure.
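Several of the network topology parameters mentioned in the NetworkAnalyzer entry above (diameter, average number of neighbors, number of connected pairs of nodes) can be computed with a plain breadth-first search. The sketch below uses only the Python standard library; it is not how NetworkAnalyzer, NetworkX or any other listed tool implements them, and the function names and the toy graph are assumptions made for this example. It runs one search per node, so it suits small graphs only.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_neighbors(graph):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def diameter(graph):
    """Longest shortest path between any two connected nodes."""
    return max(max(bfs_distances(graph, n).values()) for n in graph)


def connected_pairs(graph):
    """Number of unordered node pairs joined by some path."""
    ordered = sum(len(bfs_distances(graph, n)) - 1 for n in graph)
    return ordered // 2


# Toy undirected graph as adjacency sets: a path a-b-c-d plus a separate edge e-f.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}

if __name__ == "__main__":
    print("average neighbors:", average_neighbors(toy))
    print("diameter:", diameter(toy))
    print("connected pairs:", connected_pairs(toy))
```

Here the diameter is taken over connected pairs only, so a disconnected graph still gets a finite value; other definitions treat the diameter of a disconnected graph as infinite.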
Miscellaneous Ontology Tools
- Apolda (Automated Processing of Ontologies with Lexical Denotations for Annotation) is a plugin (processing resource) for GATE (http://gate.ac.uk/). The Apolda processing resource (PR) annotates a document like a gazetteer, but takes the terms from an (OWL) ontology rather than from a list
- <Newest>CA Manager supports customized workflows for semantic annotation of content. Commercial; from Mondeca
- <New>Gloze is a XML to RDF, RDF to XML, and XSD to OWL mapping tool based on Jena; see also http://jena.hpl.hp.com/juc2006/proceedings/battle/paper.pdf . See also http://jena.sourceforge.net/contrib/contributions.html
- <New>Hoolet is an implementation of an OWL-DL reasoner that uses a first order prover. The ontology is translated to collection of axioms (in an obvious way based on the OWL semantics) and this collection of axioms is then given to a first order prover for consistency checking.
- LexiLink is a tool for building, curating and managing multiple lexicons and ontologies in one enterprise-wide Web-based application. The core of the technology is based on RDF and OWL
- mopy is the Music Ontology Python library, designed to provide easy to use python bindings for ontology terms for the creation and manipulation of music ontology data. mopy can handle information from several ontologies, including the Music Ontology, full FOAF vocab, and the timeline and chord ontologies
- OBDA (Ontology Based Data Access) is a plugin for Protégé aimed to be a full-fledged OBDA ontology and component editor. It provides data source and mapping editors, as well as querying facilities that, in sum, allow you to design and test every aspect of an OBDA system. It supports relational data sources (RDBMS) and GLAV-like mappings. In its current beta form, it requires Protege 3.3.1, a reasoner implementing the OBDA extensions to DIG 1.1 (e.g., the DIG server for QuOnto) and Jena 2.5.5
- <New>oBrowse is a web based ontology browser developed in Java. oBrowse parses the OWL files of an ontology and displays the ontology in a tree view. The Protege API and JSF are used in its development
- OntoComP is a Protégé 4 plugin for completing OWL ontologies. It enables the user to check whether an OWL ontology contains “all relevant information” about the application domain, and extend the ontology appropriately if this is not the case
- Ontology Browser is a browser created as part of the CO-ODE (http://www.co-ode.org/) project; rather simple interface and use
- Ontology Metrics is a web-based tool that displays statistics about a given ontology, including the expressivity of the language it is written in
- <New>OntoLT aims at a more direct connection between ontology engineering and linguistic analysis. OntoLT is a Protégé plug-in, with which concepts (Protégé classes) and relations (Protégé slots) can be extracted automatically from linguistically annotated text collections. It provides mapping rules, defined by use of a precondition language that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé. Only available for older Protégé versions
- OntoSpec is a SWI-Prolog module, aiming at automatically generating XHTML specification from RDF-Schema or OWL ontologies
- OWL API is a Java interface and implementation for the W3C Web Ontology Language (OWL), used to represent Semantic Web ontologies. The API is focused towards OWL Lite and OWL DL and offers an interface to inference engines and validation functionality
- OWL Module Extractor is a Web service that extracts a module for a given set of terms from an ontology. It is based on an implementation of locality-based modules that is part of the OWL API.
- OWL Syntax Converter is an online tool for converting ontologies between different formats, including several OWL syntaxes, RDF/XML, KRSS
- OWL Verbalizer is an on-line tool that verbalizes OWL ontologies in (controlled) English
- OwlSight is an OWL ontology browser that runs in any modern web browser; it’s developed with Google Web Toolkit and uses Gwt-Ext, as well as OWL-API. OwlSight is the client component and uses Pellet as its OWL reasoner
- Pellint is an open source lint tool for Pellet which flags and (optionally) repairs modeling constructs that are known to cause performance problems. Pellint recognizes several patterns at both the axiom and ontology level.
- PROMPT is a tab plug-in for Protégé for managing multiple ontologies by comparing versions of the same ontology, moving frames between included and including project, merging two ontologies into one, or extracting a part of an ontology
- <New>ReDeFer is a compendium of RDF-aware utilities organised in a set of packages: RDF2HTML+RDFa: render a piece of RDF/XML as HTML+RDFa; XSD2OWL: transform an XML Schema into an OWL Ontology; CS2OWL: transform a MPEG-7 Classification Scheme into an OWL Ontology; XML2RDF: transform a piece of XML into RDF; and RDF2SVG: render a piece of RDF/XML as a SVG showing the corresponding graph
- SegmentationApp is a Java application that segments a given ontology according to the approach described in “Web Ontology Segmentation: Analysis, Classification and Use” (http://www.co-ode.org/resources/papers/seidenberg-www2006.pdf)
- SETH is a software effort to deeply integrate Python with Web Ontology Language (OWL-DL dialect). The idea is to import ontologies directly into the programming context so that its classes are usable alongside standard Python classes
- SKOS2GenTax is an online tool that converts hierarchical classifications available in the W3C SKOS (Simple Knowledge Organization Systems) format into RDF-S or OWL ontologies
- SpecGen (v5) is an ontology specification generator tool. It’s written in Python using Redland RDF library and licensed under the MIT license
- Text2Onto is a framework for ontology learning from textual resources that extends and re-engineers an earlier framework developed by the same group (TextToOnto). Text2Onto offers three main features: it represents the learned knowledge at a metalevel by instantiating the modelling primitives of a Probabilistic Ontology Model (POM), thus remaining independent from a specific target language while allowing the translation of the instantiated primitives
- Thea is a Prolog library for generating and manipulating OWL (Web Ontology Language) content. Thea OWL parser uses SWI-Prolog’s Semantic Web library for parsing RDF/XML serialisations of OWL documents into RDF triples and then it builds a representation of the OWL ontology
- TONES Ontology Repository is primarily designed to be a central location for ontologies that might be of use to tools developers for testing purposes; it is part of the TONES project
- Visual Ontology Manager (VOM) is a family of tools that enables UML-based visual construction of component-based ontologies for use in collaborative applications and interoperability solutions.
- Web Ontology Manager is a lightweight, Web-based tool using J2EE for managing ontologies expressed in Web Ontology Language (OWL). It enables developers to browse or search the ontologies registered with the system by class or property names. In addition, they can submit a new ontology file
- RDF evoc (external vocabulary importer) is a module for Drupal that caches any external RDF vocabulary and provides properties to be mapped to CCK fields, node title and body. This module requires the RDF and the SPARQL modules.
Not Apparently in Active Use
- ActiveOntology is a library, written in Ruby, for easy manipulation of RDF and RDF-Schema models, thru a dynamic DSL based on Ruby idiom
- Almo is an ontology-based workflow engine in Java supporting the ARTEMIS project; part of the OntoWare initiative
- ClassAKT is a text classification web service for classifying documents according to the ACM Computing Classification System
- Elmo provides a simple API to access ontology oriented data inside a Sesame RDF repository. The domain model is simplified into independent concerns that are composed together for multi-dimensional, inter-operating, or integrated applications
- ExtrAKT is a tool for extracting ontologies from Prolog knowledge bases.
- F-Life is a tool for analysing and maintaining life-cycle patterns in ontology development.
- Foxtrot is a recommender system which represents user profiles in ontological terms, allowing inference, bootstrapping and profile visualization.
- HyperDAML creates an HTML representation of OWL content to enable hyperlinking to specific objects, properties, etc.
- LinKFactory is an ontology management tool that provides an effective and user-friendly way to create, maintain and extend extensive multilingual terminology systems and ontologies (English, Spanish, French, etc.). It is designed to build, manage and maintain large, complex, language independent ontologies.
- LSW – the Lisp semantic Web toolkit enables OWL ontologies to be visualized. It was written by Alan Ruttenberg
- OntoClassify is a system for scalable classification of text into large topic ontologies currently including DMoz and Inspec. The system is available as Web service. The software runs under Windows platform.
- Ontodella is a Prolog HTTP server for category projection and semantic linking
- OntoWeaver is an ontology-based approach to Web sites, which provides high level support for web site design and development
- OWLLib is a PHP library for accessing OWL files. OWL is a W3C standard for storing semantic information
- pOWL is a Semantic Web development platform for ontologies in PHP. pOWL consists of a number of components, including RAP
- ROWL is the Rule Extension of OWL; it is from the Mobile Commerce Lab in the School of Computer Science at Carnegie Mellon University
- Semantic Net Generator is a utility for generating Topic Maps automatically from different data sources by using rules definitions specified with Jelly XML syntax. This Java library provides Jelly tags to access and modify data sources (also RDF) to create a semantic network
- SMORE is OWL markup for HTML pages. SMORE integrates the SWOOP ontology browser, providing a clear and consistent way to find and view Classes and Properties, complete with search functionality
- SOBOLEO is a system for Web-based collaboration to create SKOS taxonomies and ontologies and to annotate various Web resources using them
- SOFA is a Java API for modeling ontologies and Knowledge Bases in ontology and Semantic Web applications. It provides a simple, abstract and language neutral ontology object model, inferencing mechanism and representation of the model with OWL, DAML+OIL and RDFS languages; from java.dev
- WebScripter is a tool that enables ordinary users to easily and quickly assemble reports extracting and fusing information from multiple, heterogeneous DAMLized Web sources.
[1] This listing is maintained on a permanent basis on the OpenStructs TechWiki.
August 23, 2010 05:28 AM
A JNA-based wrapper for Tesseract OCR DLL, the library provides
optical character recognition (OCR) support for:
* TIFF, JPEG, GIF, PNG, and BMP image formats
* Multi-page TIFF images
* PDF document format
[link]
August 23, 2010 02:35 AM
Our hearts are heavy today, having learned of the passing yesterday morning of our beloved colleague, Public Knowledge Staff Attorney Adam Thomas. Adam was a rare individual in this town - willing to take on any task no matter how small, always upbeat, eager for feedback be it positive or negative. But what really set Adam apart was his courage. Just 30 years old and thrice afflicted with Medulloblastoma - a rare and highly malignant form of brain cancer - he fought and beat it each time, until it returned a fourth time just a few weeks ago with a force too strong to overcome.
read more
August 23, 2010 12:43 AM
August 21, 2010
The analogue to an Object-Relational Mapper for RDF. Helping to make OWL Description Logic accessible from Python in a way that will seem familiar to people who are accustomed to things like SQLAlchemy and Django. http://packages.python.org/ordf/odm.html
Related posts:
- CKAN 0.7 Released
- ORDF - the OKFN RDF Library
- KForge v0.16 Released
August 21, 2010 02:42 PM
We’re delighted to see that the data.gov.uk folks have released the code for their CKAN Drupal module. As many will know, the OKF’s CKAN powers data.gov.uk as well as over a dozen other data catalogues around the world.
From the blog post:
As part of the government’s ongoing work around transparency, today we are releasing some of [...]
Related posts:
- Canadian citizen-driven data catalogue datadotgc.ca is powered by CKAN
- Data.gov.uk goes public - and its using CKAN!
- Data.gov.uk Launched - and it’s Using CKAN
August 21, 2010 12:13 PM
I try to avoid making meta-posts, but the timing here was just too impeccable for me to avoid a short post on something that's been bothering me for a year or so.
I actually completely agree with both points. The problem is that I worry that they are actually fairly opposed. I comment much less on other people's blogs now that I use reader, because the 10 second overhead of clicking on the blog, being redirected, entering a comment, blah blah blah, is just too high. Plus, I worry that no one (except the blog author) will see my comment, since most readers don't (by default) show comments in with posts.
Hopefully the architects behind readers will pick up on this and make these things (adding and viewing comments, within the reader -- yes, I realize that it's then not such a "reader") easier. That is, unless they want to lose out to tweets!
Until then, I'd like to encourage people to continue commenting here.
August 21, 2010 12:49 PM
Is it possible for Tesseract to perform OCR with an ordered set of
languages? I have lots of text to OCR consisting primarily of lang1, with
small portions in lang2 and lang3 (quotes and refs). It would be ideal
for Tesseract to recognise "what it can" in lang1 (e.g., to a 90%
match), then switch to lang2 for the unmatched text, then to lang3.
August 21, 2010 09:12 AM
Dear all, I am working on a project that badly needs a 3.0 release to
support the image conversion to Chinese. I am wondering if anyone knows
the release date of 3.0? Will it be released before the end of the year?
Any information is greatly appreciated.
Maggie.
August 21, 2010 03:51 AM
Hi, I’m Alolita Sharma, and I’ve recently started working at the Wikimedia Foundation to help program-manage usability and feature-related software development.
I wanted to send everyone an update on Phase V of the Usability Initiative Rollout. This is the final phase of the rollout and we are planning to deploy the usability features (the new “Vector” skin and enhanced editing features) to all remaining projects that have not yet been switched. The release date has been set for Sep 1, 2010 at 10am PDT / 5pm UTC.
In preparation for the release, we’re doing (among other things) a push to identify and fix critical blockers. We’re running a Central Notice on all remaining projects asking for your help to facilitate the effort by testing gadgets, extensions, and custom scripts on Vector. We’d also like to ask readers of this blog to contribute. If you’re working on one of the Phase V projects (that is, if your project is still showing the “Monobook” skin by default), please help us identify blockers by trying the beta and posting bugs either in Bugzilla (file under “Usability Initiative”) or our bug report page.
We’ve also created an Ambassadors mailing list (Wikitech-ambassadors) for anyone interested in helping coordinate or follow-up on release activities. We will also be available on the newly created #wikimedia-dev IRC channel to respond to any questions or feedback.
To give feedback on the rollout process, please leave a comment here.
– Alolita Sharma, Features Engineering Program Manager, Wikimedia Foundation
August 21, 2010 12:03 AM
August 20, 2010
While some ISPs are busy arguing to the FCC that the First Amendment makes net neutrality rules illegal, Congress is considering a bill (HR 3817) that would exempt ISPs from liability for providing fraudulent information to their customers. ISPs, of course, love this. Limitations from liability are great!
read more
August 20, 2010 08:07 PM
When Federal Communications Commission (FCC) Commissioner Michael Copps issued a brief, two-sentence reaction to the news of a policy agreement between Verizon and Google over Net Neutrality, he deliberately emphasized one word. In bold face and italics, Copps said that a “decision” had to be made, to guarantee an open Internet.
“Some will claim this announcement moves the discussion forward. That’s one of its many problems. It is time to move a decision forward—a decision to reassert FCC authority over broadband telecommunications, to guarantee an open Internet now and forever, and to put the interests of consumers in front of the interests of giant corporations.”
read more
August 20, 2010 07:50 PM
We’re delighted to announce a meetup on Data Journalism in Berlin in September organised by the Open Knowledge Foundation and Georgi Kobilarov at Uberblic Labs. Details are as follows:
When? 1st September 2010
Where? Fjord Office, Friedrichstrasse 210, Berlin
Register? You can register here!
Speakers will include:
Martin Belam, The Guardian
Jonathan Gray, The Open Knowledge Foundation
Christian Heise, ZEIT Online
Gerd [...]
Related posts:
- Data Driven Journalism, Amsterdam, 24th August 2010
- Open Everything Berlin + CC Salon Berlin
- Slides and notes from Data Driven Journalism event
August 20, 2010 05:58 PM
In May, Mozilla and the Shuttleworth Foundation announced a new Education for the Open Web Fellowship. The aim is to support practical ideas that help people learn about, improve and promote the open nature of the internet, as part of our commitment to supporting leaders working at the intersection of open education and the open web.
read more
August 20, 2010 03:45 PM
I occasionally suspect my colleagues in the Public Interest community lack a sense of humor -- although perhaps it is simply that I am in a more relaxed frame of mind after my annual vacation from the 21st Century. I am neither surprised nor outraged at the recent news that members of the Information Technology Industry Council (ITIC) are picking up where the FCC "secret meetings" left off and trying to come up with a net neutrality consensus framework. To me, it seems rather sad and funny. My only surprise is that even in Washington, the notion of an industry trade association working with its members is anything unusual or significant. I mean, that's what industry trade associations do after all.
read more
August 20, 2010 02:36 PM
Using tesseract 3.00 on Opensuse 11.2. From CLI as in
tesseract file.tif file
In an image that contains a line of '=' signs the recognition is much
worse than if these lines are removed, eg:
line 1 and stuff
=======================
line 3 and stuff
line 1 will be recognized, but the second and third lines will be
August 20, 2010 11:53 AM
New Features
- Multiple library support: Various improvements to make using multiple calibre libraries easier.
- Content server: Allow setting a restriction so that the server shares only some of the books in the library.
- Speed up metadata editing. Small speed up for single book editing and major speedup for bulk editing.
- Drivers for the Kogan and Spectra e-book readers and the Samsung Captivate
- Allow calibredb to manage saved searches stored in the library.
- Add a tweak to automatically connect to a folder on startup. Accessible via Preferences->Advanced->Tweaks
- You can specify a restriction based on a saved search to be applied on calibre startup
- All actions in toolbar/context menus have been refactored to become plugins
Bug Fixes
- Content server: Fix Saved Search and User Category handling in the OPDS feeds.
- Fix regression that broke reading covers from CBR files
- Fix regression in 0.7.13 that broke Comic Input when image output format was set to JPEG
- Fix Comic Input default settings not being used when bulk converting comics
- SONY driver: Fix series order being lost when metadata management is set to manual
- Fix behavior of Tag Browser and search restrictions when switching libraries
- Do not allow the user to override the default tweaks or the hyphenate javascript. Also if a file is not found, do not use the user location as the default base.
- Catalog generation: Changed default regex for genre tags to allow punctuation within genre tags.
- Linux environment: Use a temporary dir as the config directory if write access to the normal config directory is unavailable. Can be overridden by using the CALIBRE_CONFIG_DIRECTORY environment variable
- Jobs window now remembers its size and can be launched by a keyboard shortcut (Alt+Shift+J)
- Fix regression that broke clicking on links in the Book Details window
- Parallel job management: Do not allow new jobs to start when all cores are used.
- Fix a bug that could cause the jobs window to show details for the wrong job
- Workaround for PyQt4/util-linux conflict on Gentoo
August 20, 2010 06:00 AM
August 19, 2010
One of my best friends' parents both became very ill this year. Her mother, 87, elected to have a feeding tube inserted permanently. She is confined to her bed, alone much of the time, and in constant pain, waiting for the inevitable end, which thanks to the feeding tube may be many miserable months ahead. Her father, 90, elected to enter a hospice facility where he spent his last three weeks eating yogurt, sipping the occasional last whiskey, and having long wonderful visits with his three children, their spouses and his beloved grown grandchildren. By all accounts it was a very good death.
Thinking about my friend's parents makes me wonder why there couldn't be a "hospice" option for publishers, many of whom -- my low-end guess is at least 50% -- won't survive the transition from print to networked screens. If a publisher doesn't have the requisite vision, desire and resources to embrace digital, what's wrong with saying, "Gee, it's been a great 25, 50, 100-year run. Instead of beating our heads against a wall and dying an ugly death, why don't we go out in style?" Once this difficult decision is arrived at, it would be a matter of selling the assets that can be sold, providing staff with generous severance and really helping them to find new jobs, and then at the very end giving some wonderful parties, celebrating the end of an era. A death with integrity and dignity intact.
Please understand that I make this suggestion with huge love and respect for publishers. At their best they have played a crucial role in the complex discourse that moves society forward. Like a beloved parent, there's no reason why they should suffer more than necessary at the end of a full and productive life.
August 19, 2010 11:25 PM
Brett Gaylor at WebMadeMovies has posted an HTML5 demo of popcorn.js, “a javascript library for manipulating open video on the web.” The demo plays a video while using semantic data in the video to trigger machine-translated subtitles, map lookups, Twitter feeds and other elements on the page. If you’re using a WebM-enabled browser the page serves a WebM video, otherwise it serves an Ogg or MP4 video depending on the browser's capabilities.
See Brett’s post or the popcorn.js wiki page for more info. You can also download the source from the Mozilla github repo.
August 19, 2010 05:19 PM
When we started the WebM project, one of our goals was to promote rapid innovation in video technology through open development. Just two months after WebM debuted, Jason Garrett-Glaser, Ronald Bultje and David Conrad created a VP8 video decoder implementation for FFmpeg called ffvp8.
The ffvp8 implementation decodes even faster than the WebM Project reference implementation (libvpx), and we congratulate the FFmpeg team on their achievement. It illustrates why we open-sourced VP8, and why we believe the pace of innovation in open web video technology will accelerate.
August 19, 2010 05:17 PM
It is almost an unspoken assumption in multitask learning (and domain adaptation) that you use the same type of classifier (or, more formally, the same hypothesis class) for all tasks. In NLP-land, this usually means that everything is a linear classifier, and the feature sets are the same for all tasks; in ML-land, this usually means that the same kernel is used for every task. In neural-networks land (à la Rich Caruana), this is enforced by the symmetric structure of the networks used.
I probably would have gone on not even considering this unspoken assumption, until a few years ago I saw a couple papers that challenged it, albeit indirectly. One was Factorizing Complex Models: A Case Study in Mention Detection by Radu (Hans) Florian, Hongyan Jing, Nanda Kambhatla and Imed Zitouni, all from IBM. They're actually considering solving tasks separately rather than jointly, but joint learning and multi-task learning are very closely related. What they see is that different features are useful for spotting entity spans, and for labeling entity types.
That year, or the next, I saw another paper (can't remember who or what -- if someone knows what I'm talking about, please comment!) that basically showed a similar thing, where a linear kernel was doing best for spotting entity spans, and a polynomial kernel was doing best for labeling the entity types (with the same feature sets, if I recall correctly).
Now, to some degree this is not surprising. If I put on my feature engineering hat, then I probably would design slightly different features for these two tasks. On the other hand, coming from a multitask learning perspective, this is surprising: if I believe that these tasks are related, shouldn't I also believe that I can do well solving them in the same hypothesis space?
This raises an important (IMO) question: if I want to allow my hypothesis classes to be different, what can I do?
One way is to punt: you can just concatenate your feature vectors and cross your fingers. Or, more nuanced, you can have some set of shared features and some set of features unique to each task. This is similar (the nuanced version, not the punting version) to what Jenny Finkel and Chris Manning did in their ACL paper this year, Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data.
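The "nuanced" option can be sketched in a few lines. This is only an illustration of the shared-plus-task-specific idea, not the hierarchical model from the Finkel and Manning paper; the function name, the task labels, and the feature names are all hypothetical.

```python
def joint_features(features, task, shared_names):
    """Map one task's feature dict into a joint feature space.

    Features listed in shared_names keep a single name across tasks,
    so both classifiers learn one weight for them. Every other feature
    is prefixed with the task name and stays private to that task.
    """
    out = {}
    for name, value in features.items():
        if name in shared_names:
            out["shared:" + name] = value
        else:
            out[task + ":" + name] = value
    return out


# Hypothetical features for the two tasks discussed above.
shared = {"is_capitalized"}
span_x = joint_features({"is_capitalized": 1.0, "prev_word=the": 1.0}, "span", shared)
type_x = joint_features({"is_capitalized": 1.0, "in_gazetteer": 1.0}, "type", shared)
```

Here the two tasks overlap only on the shared feature, which is the sole channel through which one task can inform the other.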
An alternative approach is to let the two classifiers "talk" via unlabeled data. Although motivated differently, this was something of the idea behind my EMNLP 2008 paper on Cross-Task Knowledge-Constrained Self Training, where we run two models on unlabeled data and look for where they "agree."
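A toy version of the agreement idea, assuming each model is just a callable and that "agreement" is a user-supplied compatibility check. The two stand-in models and the compatibility rule below are invented for illustration and are not the models or constraints from the paper.

```python
def agreeing_examples(unlabeled, model_a, model_b, compatible):
    """Run both models on unlabeled inputs and keep the inputs on which
    their predictions are compatible, together with those predictions."""
    kept = []
    for x in unlabeled:
        ya, yb = model_a(x), model_b(x)
        if compatible(ya, yb):
            kept.append((x, ya, yb))
    return kept


# Hypothetical stand-ins: one model spots entity spans, the other labels types.
def is_entity(token):
    return token[0].isupper()

def entity_type(token):
    return "LOC" if token in {"Paris"} else "O"

# The two outputs agree when "is an entity" matches "has a non-O type".
def spans_match_types(is_ent, etype):
    return is_ent == (etype != "O")

kept = agreeing_examples(["Paris", "the", "Monday", "walks"],
                         is_entity, entity_type, spans_match_types)
```

The kept examples could then be added to the training data of either model, which is the usual self-training step; the models never need to share a hypothesis class.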
A final idea that comes to mind, though I don't know if anyone has tried anything like this, would be to try to do some feature extraction over the two data sets. That is, basically think of it as a combination of multi-view learning (since we have two different hypothesis classes) and multi-task learning. Under the assumption that we have access to examples labeled for both tasks simultaneously (i.e., not the settings for either Jenny's paper or my paper), then one could do a 4-way kernel CCA, where data points are represented in terms of their task-1 kernel, task-2 kernel, task-1 label and task-2 label. This would be sort of a blending of CCA-for-multiview-learning and CCA-for-multi-task learning.
I'm not sure what the right way to go about this is, but I think it's something important to consider, especially since it's an assumption that usually goes unstated, even though empirical evidence seems to suggest it's not (always) the right assumption.
August 19, 2010 02:09 PM
Announcement below — voting ends 27 August
Raw Data Now: Building an Open Data Ecosystem
Rufus Pollock and Jordan Hatcher of the Open Knowledge Foundation have submitted a proposal for a workshop highlighting the great work of the Open Knowledge Foundation, including Where Does My Money Go?, Open Shakespeare, CKAN, the Open Definition, and Open Data Commons [...]
Related posts:
- Opening Up Government Data: Give it to Us Raw, Give it to Us Now
- Data Driven Journalism, Amsterdam, 24th August 2010
- Vote for ‘Where Does My Money Go?’ at the Show Us A Better Way poll!
August 19, 2010 10:59 AM
Dear Sir or Madam,
I would like to know which revision of tesseract 3.0 is recommendable
to use under win7 64bit for OCR purposes at the moment? I have
recently tried several revisions: I compiled them with VS2008 in
release mode and tested the OCR functionality by running tesseract.exe
with the tif images attached to the source code. Without more ado
August 19, 2010 10:23 AM
Posted August 13 on his blog (http://www.frankwspencer.com/): "As part of the Global Text Project, Kelvin Seifert and Rosemary Sutton have written Educational Psychology: Second Edition. It is a textbook, covering such topics as student development, diversity, special needs, classroom management, instructional methods, assessment, and teaching thinking skills. It is written for teachers. I'll
August 19, 2010 08:44 AM
I'm trying to build my own language model by extending the default one
at /usr/local/share/ocropus/models/default.fst. Following the example
of ocropus-linefst and fstutils, I'm doing the following:
fst = openfst.StdVectorFst.Read("/usr/local/share/ocropus/models/default.fst")
filenames = glob.glob("training/*.gt.txt")
August 19, 2010 07:06 AM
August 18, 2010
I’m very happy to announce that Brett Gaylor officially joined the Mozilla Drumbeat team earlier this month. He’ll be playing the role of project producer — leading his own Web Made Movies project and helping to find new Drumbeat projects over time. Brett will also be directing a documentary series about Mozilla and the future of the web.
Photo: CC-BY, Joi Ito
Brett’s already made great strides setting up the Web Made Movies lab initiative with Seneca College. The idea is to get filmmakers and web developers collaborating on new tech tools that shape what cinema will look like on the open web. The first project coming out of this lab is popcorn.js, which was demo’ed in early alpha at Whistler. A polished version of that demo is here:

Also, Brett has started work on a documentary where Mozillians will paint a picture of the open web that we’re building. He interviewed about a dozen people at Whistler and has a number of other shoots set up. Footage and a call for participation will start leaking out through the fall, with first episodes or edited clips coming by the end of the year.
For those who haven’t heard of Brett before: he is the director of RIP: A Remix Manifesto, an awesome film on copyright and culture that has been broadcast in over 20 countries and seen by millions. He also founded OpenSourceCinema.org, an experiment in applying open source principles to filmmaking which was used to get thousands of people to contribute to the making of RIP. In many ways, Web Made Movies is a continuation of the Open Source Cinema experiment.
Filed under: drumbeat, mozilla, webmademovie

August 18, 2010 09:29 PM
Allergy, Asthma and Clinical Immunology has published its first thematic series, reviewing the current consensus on the treatment of the potentially fatal condition, hereditary angioedema.
Hereditary angioedema is a rare genetic disease that causes the rapid swelling of the limbs, face, intestinal tract, larynx or trachea. The disease, which affects 1 in 50,000 people globally, is caused when a protein called C1 inhibitor is either deficient or non-functional. The symptoms of the disease cannot be controlled by conventional treatment with antihistamines or corticosteroids, and can lead to sudden death. The thematic series reviews the current international approach to the diagnosis, treatment and management of the disease. This includes investigating the management of the disease in children, who represent 50% of clinical cases, and in women, who are more susceptible to the symptoms because of hormonal factors. The series also incorporates a comprehensive review of past, current and potential therapies for the disease.
The articles in the series were presented at the Toronto Consensus meeting organized by the Canadian Hereditary Angioedema Network, the Canadian Society of Allergy and Clinical Immunology and the University of Calgary. A final consensus document outlining the current global guidelines for the management of hereditary angioedema was agreed and authored by scientists who attended the meeting in Toronto.
August 18, 2010 08:17 AM
As has been reported earlier, W3C held an "RDF Next Steps" workshop in June 2010 and has published the Report of the Workshop in early July. That workshop discussed the possibility of an RDF Working Group. The overall goal would be to extend RDF to include some of the features that the community has identified as both desirable and important for interoperability based on experience with the 2004 version of the standard, but without having a negative effect on existing deployment efforts.
The Workshop has listed a number of work items that might be of interest for such a Working Group, and has also conducted an informal poll on the relative priority of those items (with links to the detailed description of the items themselves). As a next step, a public questionnaire has been created listing, essentially, those items (although some of them have been regrouped for better readability). The goal of the questionnaire is to poll the Web community at large so that the upcoming charter would reflect the real needs for the years to come.
So… if you are interested in the evolution of RDF, here is the chance to make your opinion heard. All the results of the questionnaire will be public. The questionnaire will stay open until the 13th of September.
August 18, 2010 04:57 AM
August 17, 2010
We are pleased to announce a one day workshop on Open Bibliographic Data and the Public Domain. Details are as follows:
Where? Rooms 108/108a, FU Berlin, Garystr. 21, 14195 Berlin
When? 7th October 2010
Registration? http://publicdomain.eventbrite.com/
Hashtag? #pdobd
Notes? http://okfnpad.org/pdobd
Here’s the blurb:
This one day workshop will focus on open bibliographic data and the public domain. In particular it [...]
Related posts:
- Open bibliographic data promotes knowledge of the public domain
- Which works fall into the public domain in 2010?
- Public Domain Calculators at Europeana
August 17, 2010 05:45 PM
Converge, an online magazine focusing on technology in education, has published a feature article about VuFind. Past, present and future of the project are discussed, and several key players are quoted. Take a look here: http://bit.ly/aDhRXe .
August 17, 2010 04:42 PM
There are tons of ways to get books, articles, web pages, and any other kind of item into Zotero. So many, in fact, that we thought we needed to make this short screencast. It covers six ways to get things into Zotero. You might just be surprised at how many ways there are to [...]
August 17, 2010 03:22 PM
Posted by Archi Sarkar, Google Books Online Team
If you thought you knew everything there is to know about chocolate, think again! This world famous decadent dessert certainly has some dark secrets of its own - a treasure that has been enriched over the past three centuries. Try the following trivia and sharpen your knowledge of the indulgent, yet exquisite confection. Check out the links and learn more about your favorite sweet on Google Books.

(Photo by Suat Eman)
Q. Which ancient civilizations were the first to discover chocolate?
A. The Aztecs and the Mayans of Central America - (The taste of chocolate has only been perfected ever since.)
Q. Where is the world’s largest chocolate museum?
A. Cologne Chocolate Museum in Germany - (Here’s where the flavours are immortalized.)
Q. In which city was the world’s largest chocolate sculpture made?
A. Milan, Italy. In May 2010, Italian chocolatier Mirco Della Vecchia sculpted a 1.5-meter-tall Dome of Milan to bag the Guinness World Record for the largest-ever chocolate sculpture. (Beat that!)
Q. Where is the world’s largest chocolate factory?
A. No, it’s neither Willy Wonka’s nor Charlie’s chocolate factory. It's Hershey's, in Pennsylvania.

(In 1940, an emergency ration: a Hershey’s chocolate bar, served at Fort Myers. Photo: LIFE Magazine)
Q. In which city is Ghirardelli headquartered?
A. San Leandro, California (Did I hear San Francisco? If yes, give yourself half a point, as it was first incorporated and formerly headquartered in San Francisco.)
Q. Which country is the largest consumer of chocolate?
A. Switzerland... Swiss Chocolate, anyone?
Q. Which country is the largest cocoa bean producer?
A. Côte d'Ivoire (44% of all the cocoa beans exported in the world come from this West African nation.)
Q. What is the scientific name for chocolate?
A. Theobroma cacao (Try saying that five times fast!)
Q. Name a beneficial health effect of chocolate.
A. Chocolate enhances the circulatory system. (Flavonoids in chocolate increase antioxidants in the blood, protecting against heart damage.)
Q. Name the author of the best-selling book, Chocolat, which was later made into a Hollywood blockbuster starring Juliette Binoche and Johnny Depp?
A. Joanne Harris (Why is it that the book is always better than the movie?)
Scores:
- 0-2: You’re a choco-novice!
- 3-5: You’re a choco-connoisseur!
- 6-8: You’re a choco-guru!
- 9-10: You’re a choco-holic!
August 17, 2010 01:34 PM
Posted by Archi Sarkar, Google Books Online Team
Portrait of Emily Jane Brontë (Source: LIFE Magazine)
No coward soul is mine,
No trembler in the world's storm-troubled sphere:
I see Heaven's glories shine,
And faith shines equal, arming me from fear.
-- Emily Brontë
The indomitable spirit that defined the Yorkshire poet and novelist Emily Brontë also formed the very essence of the classic Wuthering Heights -- her only novel.
In an age when contemporary English society refused to take women’s contributions to literature seriously, Emily and her sisters, Charlotte and Anne, adopted ambiguous pen names to have their works published and accepted. In 1846, the Brontë sisters collaboratively published Poems by Currer, Ellis, and Acton Bell.
The Brontë sisters--Anne, Emily and Charlotte--painted by their brother Branwell (Source: LIFE Magazine)
While Charlotte Brontë assumed the pseudonym Currer Bell and went on to write Jane Eyre, Anne Brontë settled for Acton Bell and produced Agnes Grey. Emily preferred to be called Ellis Bell in the first edition of Wuthering Heights, which was published in 1847.
And ever since, her creations of Heathcliff and Catherine have captivated audiences worldwide, making Emily Brontë not just a household name, but also a stalwart of romantic fiction. In combination, the courage and passion of her characters, the unusually innovative Gothic structure of her novel and the brilliance of her prose, enabled her to create one of the finest Romantic works.
Actors Merle Oberon and Laurence Olivier during filming of Wuthering Heights in 1939 (Source: LIFE Magazine)
Although Emily unfortunately succumbed to tuberculosis at the young age of 30, her spirit continues to live on through her works -- a tribute to her genius.
Here’s remembering you, Emily Brontë! Happy Birthday!
August 17, 2010 01:13 PM
On September 14th, a team of runners from BioMed Central will be taking part in a 10K race against our friends (and rivals) at Nature Publishing Group. BioMed Central’s team will be raising money for our partner charity Computer Aid International, which works to recycle computer equipment for use in developing countries.
You can support BioMed Central’s open access David as we take on the traditional publishing Goliath by sponsoring us via the BioMed Central team’s fundraising page.
Our plucky open access mascot turtle Gulliver is already in training, and he will be joined by around 15 BioMed Central staff, all of whom are aiming to complete the course in under an hour. For the latest updates on Gulliver’s progress, or to sponsor him, see his blog and/or Facebook page.
About Computer Aid and BioMed Central
Computer Aid International provides professionally refurbished computers for reuse in education, health and not-for-profit organizations in developing countries.
Computer Aid has provided over 170,000 PCs to where they are most needed in more than 100 countries across Africa and South America, and is the world's largest and most experienced ICT for Development provider.
BioMed Central has supported Computer Aid for some time, and the funds we have raised will be used to send a container-load of reconditioned computer equipment to Kenyatta University in Nairobi later this summer. You can also support Computer Aid by buying a BioMed Central journal T-shirt.
Read more about Computer Aid’s activities in this recent guest blog post by Computer Aid’s Stephen Campbell.
August 17, 2010 09:47 AM
August 16, 2010
2010-08-16, Further details have been added to the program and the description of the sessions at DC-2010, the tenth International Conference on Dublin Core and Metadata Applications, to be held in Pittsburgh, PA, USA, 20-22 October 2010. Additional details of the sessions of DCMI Communities and Task Groups will be posted on the DCMI mailing lists and Wikis. Online registration is open; early-bird discount is available until 10 September 2010.
August 16, 2010 11:59 PM
2010-08-16, This year, we will be offering presentation opportunities at DC-2010 for DCMI Partners. If your organization is interested in becoming a DCMI Partner and presenting a product or service that is built on Dublin Core metadata, please contact DCMI at info@dublincore.org with "Partnership" in the subject line.
August 16, 2010 11:59 PM
Today's interviewee for the OStatus interview is Tyler Gillies, the resident hacker at ReadWriteWeb.com.

Give us an overview of your software. What is it, and what does it do?
Tyler: I've been using StatusNet since the first day identi.ca was released. You can find me at Tyler or tjgillies.
My first implementation of OStatus was robin. You can find it at robin. It is a rails based app that implements all the main features of status.net (webfinger/salmon/pubsubhubbub, etc), however, it is not currently being actively maintained. Please feel free to fork and commit patches. I have a plan to re-implement robin using "upgraded" technology, probably nodejs and redis.

Why did you decide to implement social web federation?
Tyler: I chose to pick social web federation because I only had two choices. Either federate, or don't. I didn't want to live in a walled garden. (I own http://opengard.in)
What problems did you have?
Tyler: Honestly, the specs on salmon are a little weird, and it was frustrating back then, because status.net, me and cliqset.com were the only ones who actually had a working implementation of salmon, so there wasn't a big support community. Also the documentation for the ruby ssl library is almost nonexistent.
How can users try out OStatus in your software?
Tyler: I don't currently have a website up running robin, but if they are familiar with rails, they can download it and try to get it running themselves. I am currently working on a location based app that will probably end up using OStatus to federate messages.
Check out geoloqi, I am developing the nodejs server.
- Tyler
Over the next couple of weeks there will be more OStatus interviews posted right here, so stay tuned!
August 16, 2010 09:03 PM
P2PU is an initiative designed to promote direct teaching/learning opportunities. You can participate as a student by signing up for a course or as a teacher by designing and running a course.
Una Daly, Associate Director of the College Open Textbooks Collaborative, has proposed a course that should interest anyone reading this blog: Adopting Open Textbooks.
http://wiki.p2pu.org/Adopting-Open-Textbooks
P2PU is certainly in the spirit of things “open.” Sign up for the course and learn more.
August 16, 2010 05:47 PM
Ethan Zuckerman is a senior researcher at the Berkman Center for Internet and Society at Harvard University. His research focuses on the distribution of attention in mainstream and new media, the use of technology for international development, and the use of new media technologies by activists.
With Rebecca MacKinnon, Ethan co-founded international blogging community Global Voices. Global Voices showcases news and opinions from citizen media in over 150 nations and thirty languages, publishing editions in twenty languages. Through Global Voices, Ethan is active in efforts to promote freedom of expression and fight censorship in online spaces.
In 2000, Ethan founded Geekcorps, a technology volunteer corps that sends IT specialists to work on projects in developing nations, with a focus on West Africa. Previously Ethan helped found Tripod.com, one of the web’s first “personal publishing” sites. He blogs at http://ethanzuckerman.com/blog.
Register today for the Open Video Conference, October 1-2 in New York City!
Photo: dweinberger
August 16, 2010 02:27 PM
The following guest post is from Stephen Hilton, Programme Lead of the Connecting Bristol initiative.
Unusually perhaps, for a city council, we recognise and relish the fact that our city is a quirky, unorthodox, hot-bed of creative digital activity and activism. Bristol City Council has been promoting local e-democracy for the last decade. And it [...]
Related posts:
- How to open up local data: notes from Warwickshire council
- Open Definition Advisory Council launched
- Talking at Open Up the City in Helsinki
August 16, 2010 10:58 AM

Contrasted with Some Observations on Linked Data
At the SemTech conference earlier this summer there was a kind of vuvuzela-like buzzing in the background. And, like the World Cup games on television, in play at the same time as the conference, I found the droning to be just as irritating.
That droning was a combination of the sense of righteousness in the superiority of linked data matched with a reprise of the “chicken-and-egg” argument that plagued the early years of semantic Web advocacy [1]. I think both of these premises are misplaced. So, while I have been a fan and explicator of linked data for some time, I do not worship at its altar [2]. And, for those that do, this post argues for a greater sense of ecumenism.
My main points are not against linked data. I think it a very useful technique and good (if not best) practice in many circumstances. But my main points get at whether linked data is an objective in itself. By making it such, I argue our eye misses the ball. And, in so doing, we miss making the connection with meaningful, interoperable information, which should be our true objective. We need to look elsewhere than linked data for root causes.
Observation #1: What Problem Are We Solving?
When I began this blog more than five years ago — and when I left my career in population genetics nearly three decades before that — I did so because of my belief in the value of information to confer adaptive advantage. My perspective then, and my perspective now, was that adaptive information through genetics and evolution was being uniquely supplanted within the human species. This change has occurred because humanity is able to record and carry forward all information gained in its experiences.
Adaptive innovations from writing to bulk printing to now electronic form uniquely position the human species to both record its past and anticipate its future. We no longer are limited to evolution and genetic information encoded in surviving offspring to determine what information is retained and moves forward. Now, all information can be retained. Further, we can combine and connect that information in ways that break to smithereens the biological limits of other species.
Yet, despite the electronic volumes and the potentials,