Free Software :: Free Culture & Archiving Planet

ToC

  1. EFF : Righthaven's Brand of Copyright Trolling
  2. EFF : EFF Asks Court to Protect Craigslist from Defamation Suit
  3. Linux Foundation : X Census (for 1.9)
  4. Public Knowledge : Public Knowledge Applauds FCC Proposals On Broadcast “White Spaces,” “E-Rate” Reform.
  5. BioMed OA : Join the data debate: draft position statement on open data
  6. Public Library of Science : PLoS Currents is expanding
  7. BioMed OA : sMRI – the most powerful Alzheimer’s disease biomarker?
  8. BioMed OA : BMC Research Notes – adding value to your data
  9. BioMed OA : Journal of Molecular Signaling welcomes new co-Editor-in-Chief
  10. Inside Google Book Search : The Armchair Traveler
  11. Linux Foundation : More GPL enforcement work again.. and a very surreal but important case
  12. Linux Foundation : The People Who Support Linux: At Work and at Home
  13. Information Aesthetics : US Open Tennis Real-Time Data Visualization
  14. Public Knowledge : Copps Displays FCC Leadership
  15. Public Library of Science : Announcing PLoS Blogs
  16. Wikimedia : September WMF Engineering Update
  17. Grassroots mapping : Oil contamination… from the Exxon Valdez
  18. Public Knowledge : Public Knowledge Expects ‘Prompt’ FCC Action To Protect Broadband Consumers
  19. Open Video Conference : VIDEO: a peek at our interview with Susan Crawford
  20. Google Research : Towards Energy-Proportional Datacenters
  21. Koha : 8 Weeks ’til KohaCon – Are you registered?
  22. Open Knowledge Foundation : The Power of Open Data
  23. Research Remix : Dear publisher, is the data open?
  24. Planet Linked Data : A New Methodology for Building Lightweight, Domain Ontologies
  25. Elphel : Initial OpenLayers mockup to display images
  26. Linux Foundation : Torvalds Causes Mob Scene at LinuxCon Brazil
  27. OStatus : OStatus 1.0 Draft 2 Available under OWFa
  28. Linux Foundation : Best practices in Open Source Governance at Open World Forum
  29. NLP : Online Learning Algorithms that Work Harder
  30. Open Video Conference : This Is Not a Hoax: The Yes Men at Open Video Conference
  31. Linux Foundation : Rise in use of EUPL for publishing open source software
  32. Linux Foundation : Free Open Source Academia Conference (fOSSa), November 8-10, Grenoble
  33. Linux Foundation : Novell Disappoints as Ownership Concerns Continue
  34. OpenSocial API : Eureka! Lockheed Martin contributes OpenSocial platform to open source
  35. Dublin Core Metadata : New Task Groups for revising the User Guide and reviewing the DCMI Abstract Model
  36. Dublin Core Metadata : NISO/DCMI Webinar slides published
  37. Tesseract : OCR of Screenshots
  38. EFF : Reading, Writing, and RFID Chips: A Scary Back-to-School Future in California
  39. Public Knowledge : The Intellectual Property Breakfast Club
  40. Wikimedia : Google Summer of Code conclusion
  41. Open Knowledge Foundation : Slides and notes from Data Driven Journalism event
  42. Public Knowledge : PK In the Know Podcast: Interview with WFMU's Ken Freedman
  43. Public Knowledge : Open Hardware Summit 2010
  44. Public Knowledge : The Digital Broadband Migration: The Dynamics of Disruptive Innovation
  45. Elphel : Elphel-Eyesis, assembled
  46. Science Commons : University Public Access Policy Whitepaper Part 2
  47. Open Video Conference : Announcing the Shared Film Festival at OVC
  48. Linux Foundation : Open Contact with Open Compliance Officers
  49. Linux Foundation : Software Freedom Law Center to Announce Opening of Branch in India
  50. Linux Foundation : Should Open Source Communities Avoid Contributor Agreements?
  51. Planet Linked Data : A Brief Survey of Ontology Development Methodologies
  52. Linux Foundation : Let a thousand flowers bloom...or be trampled under foot?
  53. EFF : Good News: Security Researcher Released on Bail
  54. Tesseract : Init() returning -1
  55. Koha : Book chapters proposals about Koha
  56. Tesseract : Tesseract Training Problem (under Mac)
  57. Mozilla Drumbeat : The reviews are in: Drumbeat’s “Popcorn” is tasty
  58. Tesseract : i want to add my own language
  59. NLP : Calibrating Reviews and Ratings
  60. Linux Foundation : Sun RPC is finally free software
  61. Mozilla Drumbeat : Mark Surman: 10 days of freedom in Barcelona
  62. Calibre : calibre 0.7.16
  63. EFF : Colbert's Word: Control-Self-Delete
  64. EFF : Facebook Should Stop Censoring Marijuana Legalization Campaign Ads
  65. OpenStreetView : OpenTrailView: Route making
  66. Mozilla Drumbeat : Drumbeat Festival: registration is now open!
  67. BioMed OA : Facilitating standardized genome annotations
  68. Public Knowledge : Public Knowledge Statement on GAO Cellular Industry Report
  69. Tesseract : math formulas
  70. BioMed OA : Arthritis Research & Therapy – published online only from 2011
  71. Global Text Project : Dr. Jim Feher, GTP textbook author, talks about working with Global Text Project to create and publish an open textbook
  72. EFF : Musopen Wants to Give Classical Music to the Public Domain
  73. EFF : EFF's Cindy Cohn Wins IP Vanguard Award from State Bar of California
  74. EFF : EFF Seeks to Help Righthaven Defendants
  75. if:book : open peer review
  76. WebM project : HTML5Rocks <video> tag tutorial
  77. Music Brainz : MusixMatch becomes our customer!
  78. Music Brainz : Track level Advanced Relationships for NGS
  79. Public Knowledge : Verizon Defense of Veroogle Plan Falls Short
  80. BioMed OA : Melatonin therapy effective in treating primary insomnia
  81. EFF : Jury Invalidates One of EFF's 'Most Wanted' Patents
  82. EFF : Steve Jobs Is Watching You: Apple Seeking to Patent Spyware
  83. EFF : UPDATED: Security Researcher Arrested for Refusing to Disclose Anonymous Source
  84. BioMed OA : Does genetic test allow prediction of patients’ response to tamoxifen?
  85. NLP : Finite State NLP with Unlabeled Data on Both Sides
  86. Planet Linked Data : Listing of 185 Ontology Building Tools
  87. Tesseract : Tess4J - a Java wrapper for Tesseract OCR DLL
  88. Public Knowledge : Appreciation: W. Adam Thomas, Public Knowledge Staff Attorney
  89. Open Knowledge Foundation : Beginnings of an Object Description Mapper
  90. Open Knowledge Foundation : Data.gov.uk releases CKAN Drupal Module
  91. NLP : Readers kill blogs?
  92. Tesseract : recognition languages sets? with hierarchy?
  93. Tesseract : Any idea of Tesseract 3.0 release date
  94. Wikimedia : Usability Improvements: Final Phase of Rollout
  95. Public Knowledge : ISPs Want to Have Their First Amendment Cake and Eat it Too
  96. Public Knowledge : The Incredible Shrinking FCC
  97. Open Knowledge Foundation : Data Journalism Meetup, Berlin, 1st September 2010
  98. Mozilla Drumbeat : Education for the open web fellowship: new deadline
  99. Public Knowledge : Why I'm Amused Rather Than Outraged Over New "Industry Negotiations" -- And What The Democrats Need To Understand
  100. Tesseract : Line of equals symbols not recognized
  101. Calibre : calibre 0.7.15
  102. if:book : hospice for publishers
  103. WebM project : WebM Semantic Video Demo
  104. WebM project : FFmpeg VP8 Decoder Implementation
  105. NLP : Multi-task learning: should our hypothesis classes be the same?
  106. Open Knowledge Foundation : Vote Raw Data Now at SXSW panelpicker - ends 27 August
  107. Tesseract : Which revision of tesseract 3.0 for win7 64bit
  108. Global Text Project : Frank W. Spencer PHD on GTP text Educational Psychology, a review
  109. OCRopus : Appending new ground truth to the default language model
  110. Mozilla Drumbeat : Mark Surman: Brett Gaylor joins Drumbeat team
  111. BioMed OA : Hereditary Angioedema: Management Consensus 2010 – a thematic series
  112. W3C Semantic Web : Public W3C Questionnaire on RDF Evolution
  113. Open Knowledge Foundation : Workshop on Open Bibliographic Data and the Public Domain
  114. VuFind : VuFind featured in Converge magazine
  115. Zotero : Zotero Basics: Getting Stuff Into Zotero
  116. Inside Google Book Search : Chocolate... in a nutshell!
  117. Inside Google Book Search : Happy Birthday, Emily Brontë!
  118. BioMed OA : BioMed Central to take on Nature in 10K charity run
  119. Dublin Core Metadata : Further details added to DC-2010 program
  120. Dublin Core Metadata : Presentation opportunities for DCMI Partners at DC-2010
  121. OStatus : OStatus interview with Tyler Gillies
  122. Open Knowledge Foundation : Gathering, Preserving and Reusing our Cultural Heritage - the OKFN Cultural Heritage Working Group.
  123. Open Text Book : P2PU
  124. Open Video Conference : Ethan Zuckerman of Berkman and Global Voices at OVC
  125. Mozilla Drumbeat : Open Video Alliance: Ethan Zuckerman of Berkman and Global Voices at OVC
  126. Open Knowledge Foundation : B-Open: Open Data from Bristol City Council
  127. Planet Linked Data : I Have Yet to Metadata I Didn’t Like
  128. Open Video Conference : Remixer Jonathan McIntosh at OVC
  129. Koha : Koha Newsletter: Volume 1/Issue 8: August 2010
  130. OpenStreetView : Minor update: panorama viewing now using canvas tag
  131. Elphel : Eyesis-in-Car GUI mockup
  132. Music Brainz : Please welcome our new Style Leader: Nikki
  133. VuFind : VuFind 1.0.1 Released
  134. Open Book Alliance : Betrayal Makes Strange Bedfellows
  135. Open Medicine : Open medicine is approved for MEDLINE indexing
  136. Open Knowledge Foundation : Open Government Data Camp 2010, 18-19th November 2010
  137. BioMed OA : Approaching technical hurdles to iPS technology
  138. Calibre : calibre 0.7.14
  139. Open Video Conference : Vincent Moon of La Blogotheque at Open Video Conference
  140. Mozilla Drumbeat : What can Mozilla Drumbeat learn from the Awesome Foundation?
  141. Open Medicine : What's new at Open Medicine? August 2010
  142. OpenStreetView : OpenTrailView: Improved photo upload and management
  143. Mozilla Drumbeat : Mark Surman: MozFdn July 2010 status update
  144. Mozilla Drumbeat : Mark Surman: Experiment: badges, identity and you
  145. Musopen : Free Textbook Project – Looking for Volunteers
  146. BioMed OA : Computer gamers solve medical problems
  147. BioMed OA : Sequencing of a tumor and its metastases
  148. OCRopus : How to use some features of Ocropus ?
  149. Open Book Alliance : Google and the Backroom
  150. Information Aesthetics : Infographic and Data Interface Videos: the Latest of the Greatest
  151. Planet Linked Data : DBpedia 3.5.1 available on Amazon EC2
  152. Open Video Conference : Jamie Wilkinson and Graffiti Markup Language at OVC
  153. Music Brainz : Care to take the Android app for a spin?
  154. Open Knowledge Foundation : Cataloguing Bibliographic Data with Natural Language and RDF
  155. WebM project : Easy Tricks for Finding WebM Videos in YouTube
  156. Planet Linked Data : An Executive Intro to Ontologies
  157. Elphel : SCINI Takes Elphel Under Antarctic Ice
  158. Wikimedia : Database errors on most Wikipedias
  159. AKSW Semantic Web : DL-Learner Build 2010-08-07 released
  160. OCRopus : OCRopus on CentOS 5
  161. BioMed OA : Eastern moles evolve different haemoglobin to facilitate fast tunnelling
  162. Calibre : calibre 0.7.13
  163. Open Video Conference : OVC volunteers meeting in New York
  164. OStatus : OStatus interview with Pablo Martin
  165. W3C Semantic Web : Drafts of RDFa Core 1.1 and XHTML+RDFa 1.1 Published
  166. Inside Google Book Search : Books of the world, stand up and be counted! All 129,864,880 of you.
  167. Information Aesthetics : Fata Morgana: The World without a Map
  168. Open Medicine : Female doctors & students from "Across the World Unite"
  169. Music Brainz : Downtime: We need to tidy house and vacuum our database
  170. Open Video Conference : Amelia Andersdotter, Piratpartiet MEP at OVC

September 03, 2010

EFF.org Updates

Righthaven's Brand of Copyright Trolling

Copyright trolls are nothing new, and Righthaven is just the latest group of lawyers to try to turn copyright litigation into a business model. What these lawyers have in common is that they seek to take advantage of copyright's draconian damages in order to bully Internet users into forking over money. To anyone who has watched the file-sharing lawsuits of the last few years or the current BitTorrent cases brought by a DC law firm, the Righthaven saga is developing into a familiar, unfortunate story. It also has some especially troubling twists.

The basic pattern: Righthaven has brought over a hundred lawsuits in Nevada federal court claiming copyright infringement. They find cases by (a) scouring the Internet for parts of newspaper stories posted online by individuals, nonprofits, and others, (b) buying the copyright to that particular newspaper story, and then (c) proceeding to sue the poster for copyright infringement. Like the RIAA and USCG before them, Righthaven is relying on the fact that their victims may face huge legal bills through crippling statutory damages and the prospect of paying Righthaven's legal fees if they lose the case. Consequently, many victims will settle with Righthaven for a few thousand dollars regardless of their innocence, their right to fair use, or other potential legal defenses.

However, Righthaven is unlike other copyright trolls in some key ways:

Righthaven is claiming that its activities are intended to have a "deterrent effect" on the reposting of news stories online, but it's hard to resist viewing Righthaven's actions as purely business-related. In addition to the sharp legal tactics discussed above, Righthaven appears to only buy copyrights that it believes can be used for lawsuits and otherwise has no involvement in the practice of journalism.

Righthaven also appears to be soliciting other newspapers to sign on with it. But newspaper publishers who think that suing bloggers a story at a time will save journalism are sorely mistaken. Newspaper publishers have actually been having meaningful discussions about innovative business models to support real journalism. Sadly, Righthaven -- if it continues to attract clients -- threatens to derail those conversations with a sideshow proven to distract from progress.

But no matter where a newspaper may stand on the debate about journalism's future, we think it is abundantly clear that a "sue the audience" tactic is nowhere near worth considering. Newspapers should resist the temptation to put themselves into the same position as the music industry circa 2004, where futile lawsuits distracted them from incorporating new technology and creating new ways to market product to fans.

EFF is watching Righthaven and other copyright trolls closely for overbroad tactics that hurt free speech and fair use, and abuse the legal system. We're looking for good cases to defend and will deliver more news and analysis as the issue develops.

September 03, 2010 12:38 AM

September 02, 2010

EFF.org Updates

EFF Asks Court to Protect Craigslist from Defamation Suit

San Francisco - The Electronic Frontier Foundation (EFF) and a coalition of public interest groups and law professors have asked a California appeals court to protect craigslist from a lawsuit that could spur websites to be less helpful in responding to complaints about user behavior.

In Scott P. v. craigslist, Inc., the plaintiff complained about a series of craigslist ads he said were written by impersonators. While craigslist removed the ads within minutes of his phone calls, the plaintiff sued, contending that craigslist broke a promise to "take care of it" when the impersonators posted additional ads. In cases like these, federal law -- specifically Section 230 of the Communications Decency Act -- shields Internet forums like craigslist from liability. Section 230 was designed to encourage parties to pursue action against those who created the questionable content instead of the platform that hosted it. But the California Superior Court has ruled that this case can continue because of the plaintiff's allegations that craigslist said it would help.

Craigslist filed a writ petition with the Court of Appeal for the State of California Wednesday, arguing that the trial court should have dismissed the case because of Section 230's protections for forum hosts. In an amicus letter filed today in support of craigslist, EFF argues that the lower court reasoning could create a hole in Section 230, discouraging forum owners from helping users.

"Section 230 was a deliberate effort by Congress to encourage service providers to find innovative ways to self-regulate," said EFF Senior Staff Attorney Kurt Opsahl. "Yet craigslist is facing the prospect of extended litigation because it tried to do just that. Allowing this litigation to continue could result in websites being less helpful to users with complaints."

Additionally troublesome is the specter of further lawsuits, which could convince other Internet innovators not to host user content at all.

"Congress created Section 230 to allow for online interactivity without a flood of lawsuits. But this case could undermine the immunity that the law created," said Opsahl. "If litigation can survive merely because a plaintiff asserts that the site made a vague promise, sites may decide that allowing comments or user generated content is not worth the legal exposure. Then we'll lose the vibrant online environment that Section 230 helped create in the first place."

Joining EFF in the letter to court were the Center for Democracy and Technology, the Citizen Media Law Project, and law professors Eric Goldman, David S. Levine, David G. Post, and Jason Schultz. Separately, a group of Internet companies, including Yahoo!, Amazon, Facebook, Twitter, Google, and LinkedIn, filed another amicus brief in support of craigslist.

For the full amicus letter:
http://www.eff.org/files/filenode/craigslist_v_sup/EFFletter9210.pdf

For more on this case:
http://www.eff.org/cases/craigslist-v-superior-court-california

Contact:

Kurt Opsahl
Senior Staff Attorney
Electronic Frontier Foundation
kurt@eff.org

September 02, 2010 08:29 PM

Browse Blogs

X Census (for 1.9)

Tiago Vignatti has published some numbers showing who contributes to X.org: "Of course lines of code and changeset are far from being a good metric to see actually how the development happened. But still, it does represents something. For sure, there’s also a lot of other inaccurate information that I’m missing from this all. For instance, companies like Collabora does X development but sometimes get the merits for Nokia. Is that fair? I don’t know."

September 02, 2010 08:23 PM

Public Knowledge - Blogging, Events, and Action Alerts

Public Knowledge Applauds FCC Proposals On Broadcast “White Spaces,” “E-Rate” Reform.

For Immediate Release:  September 2, 2010

The Federal Communications Commission (FCC) issued a Public Notice today detailing the proposed agenda for its next Public Meeting, scheduled for September 23, 2010.  The following statement is attributed to Harold Feld, Legal Director, Public Knowledge:
 
“Today’s proposals represent real, concrete steps in fulfilling the promise of the National Broadband Plan. Voting final rules for the use of the broadcast white spaces will make much needed spectrum available for broadband. At a time when cell phone providers like AT&T are building wifi hot spots in places like Times Square to meet the demand created by the iPhone and other “smart” wireless devices, making use of empty television channels for ‘wifi on steroids’ will improve broadband access from the most crowded cities to rural America.
 


September 02, 2010 08:14 PM

BioMed Central

Join the data debate: draft position statement on open data

BioMed Central supports the goals of the Panton Principles for Open Data in Science but putting them into practice needs to be done in careful consultation with the scientific community to ensure that researchers still receive appropriate credit for their contributions.

Rather than restricting access to data through restrictive licensing terms, cultural norms need to be defined for the assignment of credit, priority with respect to initial publication and the determination of reasonable embargo periods. Fields such as astronomy, economics and genomics have already made significant progress in this direction.

BioMed Central has drafted a position statement on data sharing, open data and licensing, and we invite the wider scientific community to join the discussion to help us define an explicit open data licensing policy going forwards.

The statement discusses what we see as “the Five Ws” for open data, which includes a proposal that, from a specific date, any author submitting to a BioMed Central journal would agree to dedicate the data elements of their article and supplementary material to the public domain and apply an open data conformant licence, such as Creative Commons CC0.

We invite the scientific and publishing community to join us in defining the optimum way to put the Panton Principles into practice. Comment publicly on the draft statement by using the comment function on this blog. Alternatively, contact us to get involved.

BioMed Central will also be discussing these issues as part of  the panel discussion on Publishing primary research data at Science Online London on 3rd September 2010.

September 02, 2010 07:03 PM

Public Library of Science

PLoS Currents is expanding

September 02, 2010 05:00 PM

BioMed Central

sMRI – the most powerful Alzheimer’s disease biomarker?

Apart from the formation of neurofibrillary tangles and deposition of amyloid plaques, other hallmarks of Alzheimer’s disease (AD) include the loss of both neurones and synapses in the human brain. There is evidence to suggest that this neurodegeneration is closely associated with cognitive decline, which is why structural magnetic resonance imaging (sMRI), which measures brain morphometry, is considered to be a powerful AD biomarker.

In an important review published in Alzheimer’s Research & Therapy earlier this week, Vemuri and Jack neatly summarise the role of sMRI in AD. They compare sMRI to the other major AD biomarkers typically studied, discuss the ways in which information can be extracted from sMRI images to condense atrophy information from patients’ scans and highlight the different roles of sMRI as an AD biomarker, including its use in predicting the progression of mild cognitive impairment to Alzheimer’s disease, measuring the efficacy of therapeutics and screening in clinical trials.

sMRI is a stable biomarker of AD progression and is useful in measuring disease intensity. However, the authors stress that we should not rest on our laurels, but continue to build on it by developing automated techniques for extracting disease-specific information from images and by integrating it with other existing biomarkers for clinical use.

September 02, 2010 11:26 AM

BMC Research Notes – adding value to your data

Scientific data sharing is gathering more and more support in 2010, so rather than “why share data?” the question now is “how?”. Making data available in readily interpretable formats is vital to realising its value in driving new knowledge discovery, and BMC Research Notes today launches a new initiative aimed at promoting best practice in sharing and publishing data, with a focus on standardized, re-useable formats.

Across biology and medicine new data standards are emerging or are already in use, but many may not be enforced by journals or funding agencies, or benefit from established, structured databases for data deposition, such as ArrayExpress for microarray data. Adding value to data has always been at the core of  BMC Research Notes’ strategy and the journal aims to produce guidance for authors on domain-specific data standards, to complement our figure preparation guidelines. But as the scientific community itself is best placed to advise on the most appropriate formats for data, the journal has opened this project up to the scientific community and is asking researchers and data managers for their contributions.

Integral to these educational Data Notes will be the inclusion of an example dataset as an additional file, or link to a permanently-available dataset, which can serve as a reference example. Readily re-usable data from a cancer cohort is also published in BMC Research Notes today in the article by Vickers and Cronin, which accompanies the editorial that outlines the goals of this data-driven collection.

Indeed, the future of scholarly communication and research increasingly depends on a commitment to data. Just yesterday in JAMA a commentary on the US Department of Health and Human Services' Open Government strategy discussed the benefits to science  – and the economy – of public-use health data sets that maintain privacy. It further called for data to “be released in standardized formats, without intellectual property constraints.”

“Data is the underlying foundation of our science and it is crucial for both replicating results as well as building on them that we work harder at making data more effectively available and useable. It is great to see a pioneer of the Open Access literature like BMC providing leadership on the issue of making data openly available and providing the tools that will enable researchers to improve on current practice,” said Dr Cameron Neylon, co-author of the Panton Principles for Open Data in Science.

BioMed Central is waiving the article processing charge for contributions to this special collection of articles, which also extends to contributions on broader aspects of scientific data sharing, archiving, and open data. Contact the BMC Research Notes editorial team for more information or, if you are at tomorrow’s Science Online London, come and talk to us at the session on ‘Publishing primary research data’. 

September 02, 2010 10:15 AM

Journal of Molecular Signaling welcomes new co-Editor-in-Chief

Yung Hou Wong, Head of the Section of Biochemistry and Cell Biology, Division of Life Science, at the Hong Kong University of Science and Technology, has recently joined Journal of Molecular Signaling as co-Editor-in-Chief alongside Danny Dhanasekaran. Professor Wong is a leading expert in the molecular pharmacology of G protein-coupled receptors, signal transduction and integration.


Journal of Molecular Signaling was launched in 2006 and encompasses different molecular aspects of cell signaling underlying normal and pathological conditions. The focus of the journal is on the normal or aberrant molecular mechanisms involving receptors, G-proteins, kinases, phosphatases, and transcription factors in regulating cell proliferation, differentiation, apoptosis, and oncogenesis in mammalian cells. This area also covers the genetic and epigenetic changes that modulate the signaling properties of cells and the resultant physiological conditions. One of the most highly accessed recent articles in the journal determines the molecular effect of sulforaphane (SFN, found in cruciferous vegetables) in growth arrest of pancreatic cancer cells.


We would like to welcome Yung Hou Wong to his new role with this growing journal. He says that “Journal of Molecular Signaling is a significant avenue for researchers in the area of cell signaling to share their discoveries and innovations, and contribute towards the advancement of the field. I am excited to be a part of the team and look forward to working with the editorial board to increase its impact as well as its value to the growing readership across the scientific community and around the world.”

September 02, 2010 10:05 AM

Google Book Search Blog

The Armchair Traveler



[Please note, some images in this post may not be available in full view to users outside of the United States.]

Now that it’s early September and we’re officially in the dog days of summer, what better way to spend this hot, sultry period than to take a refresher and travel to exotic lands afar? Even if you’re working through the summer or are more of a staycationer, you can take a trip around the world by exploring different countries through Google Books!

Courtesy of books scanned via our library project, anyone can stroll through China, experience ninety days' worth of Europe or get to know South America. And if you’re feeling a little fantastical, you can leave Kansas behind and head off with Dorothy to explore the land of Oz.

With the plethora of travel-related books available in full view on Google Books, you can explore the world and be visually enlightened with sights from afar, all from the comfort of your couch with a frosty glass of lemonade!

Check out the beautiful Flower Pagoda in Canton, China:


Swing by the Uffizi Gallery in Florence to admire the Birth of Venus in Italy and the Italians by Edward Hutton:


See London through Herbert Fry’s eighteen bird's-eye views of the principal streets, or be a Wanderer in Paris experiencing the lovely cafés, museums and walks down rue de l'hôtel de ville:


And while you're there, why not visit the Arc De Triomphe De l’Etoile?


If you’re more of a nature-lover, hitch up your wagon of books via My Library on Google Books and set off on the Oregon Trail and imagine wildflowers, horseback riding, and gorgeous sunsets on plains via first-hand experiences penned by Francis Parkman, or if you’re feeling really adventurous, literally "book" yourself an around-the-world experience by traveling alongside Jules Verne for Five Weeks in a Balloon. For an intellectual dose of scientific observations, you can travel from Chile to Argentina and back again with Charles Darwin's Voyage of the Beagle.

After you return from your incredible journeys, you can easily show other readers your virtual trip by sharing images you found interesting. Blog interesting images using our Share This Clip feature in Google Books, and share your bookshelf with family, friends, or the world!

September 02, 2010 09:49 AM

Browse Blogs

More GPL enforcement work again.. and a very surreal but important case

Harald Welte writes that he's doing more work on the gpl-violations.org project again: "Right now I'm facing what I'd consider the most outrageous case that I've been involved so far: A manufacturer of Linux-based embedded devices (no, I will not name the company) really has the guts to go in front of court and sue another company for modifying the firmware on those devices. More specifically, the only modifications to program code are on the GPL licensed parts of the software.

September 02, 2010 08:29 AM

The People Who Support Linux: At Work and at Home

Chase Crum is a U.S. Army veteran, a Shriner, an IT infrastructure manager, and a member of The Linux Foundation. This certainly does not capture all that defines Chase, but it begins to illustrate where he derives his ideas about Linux, community and giving back. Chase also represents a growing majority of systems administrators and IT managers who are using Linux both at work and at home.

September 02, 2010 08:00 AM

information aesthetics

US Open Tennis Real-Time Data Visualization

On the heels of the many real-time sports visualizations that appeared alongside the recent FIFA soccer World Cup, the US Open Pointstream [usopen.org] presents an original 3D-like way of exploring the statistical data generated during all the live tennis matches of one of the most famous sports events in the world.

Users are able to select individual matches which occurred in the past or are still in progress. A "Momentum Meter" shows who is on top of the match, while a series of filters at the bottom (e.g. ace, double fault, netpoint, breakpoint, ...) allow for deeper analysis of the data. Visually, each player is distinguished by the color green or blue. Each ring represents a set, going from the inside to the outside. Each bar represents a point, with its height according to the serving speed.
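
For readers curious how such an encoding might be computed, here is a minimal Python sketch of the ring-and-bar mapping described above. It is not Pointstream's actual code: the Point and Bar types, the "A"/"B" player labels, the choice to color each bar by the serving player, and the even angular spacing of points within a ring are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Point:
    """One rally. All field names here are hypothetical."""
    set_number: int         # 1-based set index; set 1 is the innermost ring
    server: str             # hypothetical player label, "A" or "B"
    serve_speed_kmh: float  # serving speed, which drives the bar height


@dataclass
class Bar:
    ring: int         # ring index, counted from the inside out
    angle_deg: float  # angular position of the bar on its ring
    height: float     # bar height, scaled to the fastest serve
    color: str        # "green" or "blue", one color per player


def layout(points: List[Point], max_bar_height: float = 1.0) -> List[Bar]:
    """Map points to bars: one ring per set, one bar per point."""
    colors = {"A": "green", "B": "blue"}  # assumed assignment of colors
    top_speed = max(p.serve_speed_kmh for p in points)

    # Group points by set so that each set gets its own ring.
    by_set: Dict[int, List[Point]] = {}
    for p in points:
        by_set.setdefault(p.set_number, []).append(p)

    bars: List[Bar] = []
    for set_number in sorted(by_set):
        set_points = by_set[set_number]
        step = 360.0 / len(set_points)  # assumed: points spread evenly
        for i, p in enumerate(set_points):
            bars.append(Bar(
                ring=set_number,
                angle_deg=i * step,
                height=max_bar_height * p.serve_speed_kmh / top_speed,
                color=colors[p.server],
            ))
    return bars


demo_points = [Point(1, "A", 200.0), Point(1, "B", 100.0), Point(2, "A", 150.0)]
demo_bars = layout(demo_points)
```

The resulting list of bars could then be handed to any polar plotting routine. Filters such as "ace" or "breakpoint" would simply select a subset of points before the layout step.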

Beautiful or useful?

September 02, 2010 12:56 AM

September 01, 2010

Public Knowledge - Blogging, Events, and Action Alerts

Copps Displays FCC Leadership

Federal Communications Commissioner Michael Copps has managed the art of saying much in a few words.  His latest salvo came in a 245-word letter to the editor in the Washington Post, in which he not only savaged yet another misbegotten Washington Post editorial about Internet policy, but also took on the Verizon-Google joint policy “recommendation” and then noted the cruel reality of the agency to which he has devoted almost nine years of his professional career.

He, and others, recognize that this is a unique time in the history of the FCC, and perhaps of regulation and politics.  It happens from time to time in Congress that a legislator will vote against a bill that he or she has introduced, usually after an amendment has been added that drastically changes the bill, or in the case of some shift in the political dynamic.


September 01, 2010 08:45 PM

Public Library of Science

Announcing PLoS Blogs

September 01, 2010 07:08 PM

Wikimedia Technical Blog

September WMF Engineering Update

The Wikimedia Foundation Engineering staff has grown quite a bit over the past year, which has made it a lot harder for everyone to keep track of what we’re all working on. In an effort to make things a little clearer, we plan to report monthly on all of our active efforts, and maintain information pages on all of our active projects. Note that this isn’t (yet) a complete list of everything that the Wikimedia Foundation engineering team is up to, but we plan to make this increasingly comprehensive and more organized as we get better at putting together these reports. Here is a full list of projects.

You’ll see that each of these areas has a program manager assigned to the area. That’s the person who is responsible for coordinating the activity in that area, and someone from whom you can expect to get more detailed updates.  More below the fold…

Operations

Virginia Data Center – Setting up a world-class primary data center for Wikimedia Foundation properties.

Media Storage – Re-vamping our media storage architecture to accommodate the expected increase in media uploads.

Monitoring – Enhancing both ops and public monitoring to a) notice potential outages sooner, b) increase transparency to the community, c) support progress tracking required in the 5-year plan.

Content Quality Tools

Article assessment – Working on a feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.

Pending changes enwiki trial – Pending Changes is a new review feature recently deployed to en.wikipedia.org, which allows changes made by anonymous and new users to be reviewed before they appear as the primary version of an article.

Threaded discussions

Liquid Threads – LiquidThreads is an extension that brings threaded discussion capabilities to Wikimedia projects and MediaWiki.

Multimedia tools

Upload wizard – The upload wizard is an extension for MediaWiki providing an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.

Add media wizard – The Add-media wizard is a gadget to facilitate the insertion of media files into wiki pages. Its development is supported by Kaltura.

MediaWiki Infrastructure

Resource loader – The resource loader aims to improve the load times for JavaScript and CSS components on any wiki page.

Central Notice – CentralNotice is a banner system used for global messaging across Wikimedia projects.

Analytics Revamp – Incorporate an analytics solution that can grow and answer the questions that the Wikimedia movement has.

Software Quality Infrastructure

Selenium deployment – Building an automated browser testing environment for MediaWiki.

Fundraising

Fraud Prevention – This project will focus on integrating new fraud prevention schemes within our credit card donation pipeline.

CiviCRM Upgrade – Upgrading from our heavily customized CiviCRMv2 install to a mostly stock CiviCRMv3 install.

Misc

Google Summer of Code – Several projects from students funded by Google.

Process improvement – Increase transparency and generally organize Wikimedia Foundation’s engineering efforts more efficiently.

If you read this far, thanks for sticking with us! We hope you found this useful. Please let us know what we can do to make this more useful for you.

September 01, 2010 06:35 PM

Grassroots Mapping

Oil contamination… from the Exxon Valdez

These saddening photos — taken in 2010 — show oil contamination in beach sediments around Prince William Sound, left over from the 1989 Exxon Valdez oil spill, over 20 years ago. Read more at Prince William Soundkeeper.

September 01, 2010 06:17 PM

Public Knowledge - Blogging, Events, and Action Alerts

Public Knowledge Expects ‘Prompt’ FCC Action To Protect Broadband Consumers

For Immediate Release:  September 1, 2010

The Federal Communications Commission issued a public notice, putting out for public comment two elements in the policy suggestion from Verizon and Google.  The following statement is attributed to Gigi B. Sohn, president and co-founder of Public Knowledge:

 “Nothing in this public notice prevents the FCC from taking prompt action on its ‘Third Way’ proceeding, which would make certain all Americans have affordable access to broadband, and to make sure it can deal with public safety and other crucial issues that are broader than the narrow issues on which the Commission seeks comment.

“We expect the Commission will move quickly to set the legal framework for the FCC to oversee broadband Internet access services, with specific rules to protect the open Internet to follow soon after.


September 01, 2010 05:58 PM

Open Video Conference

VIDEO: a peek at our interview with Susan Crawford





Preview of our interview with Susan Crawford
Download link: [OGG] [MP4]

This week, a Wall Street Journal story on the proposed Comcast/NBCU merger brought concerns about media consolidation back to the fore. The U.S. Department of Justice is reportedly studying how the merger would affect the emerging internet video market.

Critics of the merger—including former Obama adviser and law professor Susan Crawford, a keynote speaker at this year’s Open Video Conference—say that the merger would hurt competition in the online video space.

Combined with anxieties about a shifting landscape for net neutrality, many are convinced that big changes are in store for the Internet as we know it—and by extension, the development of a rich online video medium that encourages user participation, creativity, and innovation.

We sat down with Ms. Crawford this week to hear her thoughts on the proposed merger, the FCC’s role in protecting net neutrality, and much more.

We’ll be releasing the 20-minute interview in three parts starting this week. It really captures the urgency that many are feeling about this critical time for the internet: a sense that we’re deciding new rules for the network and the web, and writing the next few years of media history.

If you are passionate about the future of the open web and open video, we invite you to join us this October 1 & 2 at the Open Video Conference in New York City. Please register today.

September 01, 2010 02:00 PM

Google Research Blog

Towards Energy-Proportional Datacenters



This is part of the series highlighting some notable publications by Googlers.

At Google, we operate large datacenters containing clusters of servers, networking switches, and more. While this gear costs a lot of money, an increasingly important cost -- both in terms of dollars and environmental impact -- is the electricity that drives the computing clusters and the cooling infrastructure. Since our clusters often do not run at full utilization, Google recently put forth a call to industry and researchers to develop energy-proportional computer systems. With such systems, the power consumed by our clusters would be directly proportional to utilization. Servers consume the most electricity, and therefore researchers have responded to Google’s call by focusing their attention on servers. As the servers become increasingly energy proportional, however, the “always on” network fabric that connects servers together will consume an increasing fraction of datacenter power unless it too becomes energy proportional.

In a paper recently published at the International Symposium on Computer Architecture (ISCA), we push further towards the goal of energy-proportional computing by focusing on the energy usage of high-bandwidth, highly scalable cluster networking fabrics. This research considers a broad set of architectural and technological solutions to optimize energy usage without sacrificing performance. First, we show how the Flattened Butterfly network topology uses less power, since it uses fewer switching chips and fewer links than a comparable-performance network built using the more conventional Fat Tree topology. Second, our approach takes advantage of the observation that when network demand is low, we can reduce the speed at which links transmit data. We show via simulation that, by tuning the speeds of the links very rapidly, we can reduce power consumption with little impact on performance. Finally, our research is a further call to action for the academic and industry research communities to make energy efficiency, and energy proportionality in particular, a first-class citizen in networking research. Put together, our proposed techniques can reduce energy cost for typical Google workloads seen in our production datacenters by millions of dollars!
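As a rough illustration of the link-speed idea only (not the simulation from the paper), the sketch below picks the slowest link rate that still keeps utilization under a target. The list of rates, the 50% utilization target, and the linear power model are all assumptions made up for this example.

```python
# Illustrative sketch: the rates, the 50% target and the power model are
# assumptions for this example, not values taken from the paper.
LINK_RATES_GBPS = [2.5, 5.0, 10.0, 20.0, 40.0]


def choose_link_rate(demand_gbps, target_utilization=0.5):
    """Return the slowest rate that keeps utilization at or below the target."""
    for rate in LINK_RATES_GBPS:
        if demand_gbps <= rate * target_utilization:
            return rate
    # Demand exceeds what any rate can carry at the target: run flat out.
    return LINK_RATES_GBPS[-1]


def relative_power(rate_gbps):
    """Toy model: link power scales linearly with the configured rate."""
    return rate_gbps / LINK_RATES_GBPS[-1]


if __name__ == "__main__":
    for demand in (1.0, 4.0, 100.0):
        rate = choose_link_rate(demand)
        print(demand, rate, relative_power(rate))
```

Under these toy assumptions a lightly loaded link drops to the lowest rate, and power tracks demand instead of staying pinned at the maximum.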

September 01, 2010 01:41 PM

Koha Library Software Community

8 Weeks ’til KohaCon – Are you registered?

This message came across the mailing list today and so I’m sharing it with all of you who might not be on our list:

———————-

Please forward this to lists or people who will be interested.

KohaCon10 starts on October 25th in Wellington, New Zealand. We have an exciting line up of speakers on a range of topics related to Koha and Open Source and Open Standards in libraries. See our programme for details.

http://www.kohacon10.org.nz/2010/program/

KohaCon10 is a free conference (that's right, it will cost nothing for you to attend), but you still need to register to reserve your place.

Registrations from the international Koha community have been very strong. Over half of all available spaces are already taken.

If you have been holding off on the premise that you will have plenty of time to do this later, then please register now. Please do not rely on there being free spaces on the day.

Registration is quick and easy via the website. http://www.kohacon10.org.nz/2010/registration/

We look forward to seeing you in Wellington,

Russel Garlick
on behalf of the KohaCon10 Organising Committee

What is KohaCon10?

KohaCon is an opportunity for the entire Koha community, librarians and developers alike, to come together, meet each other, swap ideas and learn something new.

The conference is split into 2 parts.

The community conference will be held over 3 days – 25-27th of October. This is not just a developer’s conference. There will be presentations from librarians and developers alike.

The second part of the conference is the Hackfest for Koha developers that will be held from 29th-31st of October.

For more information see our website http://www.kohacon10.org.nz.

September 01, 2010 12:16 PM

Open Knowledge Foundation Blog

The Power of Open Data

The following guest post is from David Bollier, independent policy strategist, journalist, and author of Viral Spiral. It was originally posted at the On the Commons blog. Science has always recognized the power of sharing in developing new knowledge. But in the search for treatments and cures for diseases like Alzheimer’s and Parkinson’s, the sprawling [...]

Related posts:

  1. The Medical Innovation Convention: A New Global Framework for Healthcare Research and Development
  2. Articles in CTWatch Quarterly
  3. On Getting Raw Data for Cancer Research

September 01, 2010 10:11 AM

Research Remix

Dear publisher, is the data open?

Publishers make article text available under a variety of copyright terms. Data, however, are not copyrightable. So what are we allowed to do with them, these datums and datasets within and beside article text? It isn’t clear. Few publisher sites say. It matters. So let’s ask. On behalf of the Open Knowledge Foundation and benefitting [...]

September 01, 2010 05:13 AM

Linked Data Blog Aggregator

A New Methodology for Building Lightweight, Domain Ontologies

Bringing Ontology Development and Maintenance to the Mainstream

Ontologies supply the structure for relating information to other information in the semantic Web or the linked data realm. Ontologies play a role in the organization of data similar to the one played by relational data schemas. Because of this structural role, ontologies are pivotal to the coherence and interoperability of interconnected data [1].

There are many ways to categorize ontologies. One dimension is between upper-level and mid- and lower- (or domain-) level. Another is between reference and subject (domain) ontologies. Upper-level ontologies [2] tend to be encompassing, abstract and inclusive ways to split or organize all “things”. Reference ontologies tend to be cross-cutting, such as ones that describe people and their interests (e.g., FOAF), reference subject concepts (e.g., UMBEL), bibliographies and citations (e.g., BIBO), projects (e.g., DOAP), simple knowledge structures (e.g., SKOS), social networks and activities (e.g., SIOC), and so forth.

The focus here is on domain ontologies, which are descriptions of particular subject or domain areas. Domain ontologies are the “world views” by which organizations, communities or enterprises describe the concepts in their domain, the relationships between those concepts, and the instances or individuals that are the actual things that populate that structure. Thus, domain ontologies are the basic bread-and-butter descriptive structures for real-world applications of ontologies.

According to Corcho et al. [3] “a domain ontology can be extracted from special purpose encyclopedias, dictionaries, nomenclatures, taxonomies, handbooks, scientific special languages (say, chemical formulas), specialized KBs, and from experts.” Another way of stating this is to say that a domain ontology — properly constructed — should also be a faithful representation of the language and relationships for those who interact with that domain. The form of the interaction can range from work to play to intellectual understanding or knowledge.

“… ontology engineering research should strive for a unified, lightweight and component-based methodological framework, principally targeted at domain experts …”

Simperl et al. [4]

Another focus here is on lightweight ontologies. These are typically defined as more hierarchical or classificatory in nature. Like their better-known cousins, taxonomies, but with greater connectedness, lightweight ontologies are often designed to represent subsumption or other relationships between concepts. Their predicates (relationships) are neither too numerous nor too complicated. As relationships are added and the complexities of the world get further captured, ontologies migrate from the lightweight to the “heavyweight” end of the spectrum.

The development of ontologies goes by the names of ontology engineering or ontology building, and can also be investigated under the rubric of ontology learning. For reasons stated below, we prefer not to use the term ontology engineering, since it tends to imply that a priesthood or specialized expertise is needed to define or use ontologies. As indicated, we see ontologies as being (largely) developed and maintained by the users or practitioners within a given domain. The tools and methodologies to be employed need to be geared to these same democratic (small “d”) objectives.

A Review of Prior Methodologies

For the last twenty years there have been many methods put forward for how to develop ontologies. These methodological activities have diminished somewhat in recent years. Yet the research as separately discussed in Ontology Development Methodologies [1] seems to indicate this state of methodology development in the field:

While there is by no means unanimity in this community, some general consensus can be seen from these prior reviews, especially those that concentrate on practical or enterprise ontologies. In terms of design objectives, this general consensus suggests that ontologies should be [4]:

While these are laudable design objectives, and ones to which we adhere, current ontology development methods do not meet them. Furthermore, as we will discuss in our next installment, there is also an inadequate slate of tools ready to support these objectives.

A Call for a New Methodology

If you ask most knowledgeable enterprise IT executives what they understand ontologies to mean and how they are to be built, you would likely hear that ontologies are expensive, complicated and difficult to build. Reactions such as these (and not trying to set up strawmen) are a reflection of both the lack of methods to achieve the consensual objectives above and the lack of tools to do so.

The use of ontology design patterns is one helpful approach [5]. Such patterns help indicate best design practice for particular use cases and relationship patterns. However, while such patterns should be part of a general methodology, they do not themselves constitute a methodology.

Also, as Structured Dynamics has argued for some time, the future of the semantic enterprise resides in ontology-driven apps [6]. Yet, for that vision to be realized, clearly both methods and tools to build ontologies must improve. In part this series is a reflection of our commitment to plug these gaps.

What we see at present for ontology development is a highly technical, overly engineered environment. Methodologies are only sparsely or generally documented. They are not lightweight nor collaborative nor really incremental. While many tools exist, they do not interoperate and are pitched mostly at the professional ontologist, not the domain user. In order to achieve the vision of ontology-driven apps the methods to develop the fulcrum of that vision — namely, the ontologies themselves — need much additional attention. An adaptive methodology for ontology development is well past due.

Design Criteria for an Adaptive Methodology

We can thus combine the results of prior surveys and recommendations with our own unique approach to adaptive ontologies in order to derive design criteria. We believe this adaptive approach should be:

  1. Lightweight and domain-oriented
  2. Contextual
  3. Coherent
  4. Incremental
  5. Re-using structure
  6. Separating the ABox and TBox
  7. Supported by simple, interoperable tools

We discuss each of these design criteria below.

While we agree with the advisability of collaboration as a design condition — and therefore also believe that tools to support this methodology must also accommodate group involvement — collaboration per se is not a design requirement. It is an implementation best practice.

Effective ontology development is as much as anything a matter of mindset. This mindset is grounded in leveraging what already exists, “paying as one benefits” through an incremental approach, and starting simple and adding complexity as understanding and experience are gained. Inherently this approach requires domain users to be the driving force in ongoing development with appropriate tools to support that emphasis. Ontologists and ontology engineering are important backstops, but not in the lead design or development roles. The net result of this mindset is to develop pragmatic ontologies that are understood — and used by — actual domain practitioners.

Lightweight and Domain-oriented

By definition the methodology should be lightweight and oriented to particular domains. Ontologies built for the pragmatic purposes of setting context and aiding interoperability tend to be lightweight with only a few predicates, such as isAbout, narrowerThan or broaderThan. But, if done properly, these lighter weight ontologies can be surprisingly powerful in discovering connections and relationships. Moreover, they are a logical and doable intermediate step on the path to more demanding semantic analysis.
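To make the point about a handful of predicates concrete, here is a minimal sketch of a lightweight ontology held as plain triples. The concepts, the document identifier and the helper names are hypothetical and invented for illustration; only the predicate names isAbout, narrowerThan and broaderThan come from the text above.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), invented for illustration.
TRIPLES = [
    ("Espresso", "narrowerThan", "Coffee"),
    ("Coffee", "narrowerThan", "Beverage"),
    ("Tea", "narrowerThan", "Beverage"),
    ("doc:42", "isAbout", "Espresso"),
]


def broader_concepts(concept, triples):
    """Return every concept reachable by walking up the hierarchy."""
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "narrowerThan":
            parents[subject].add(obj)
        elif predicate == "broaderThan":
            # "A broaderThan B" is the inverse statement: A is a parent of B.
            parents[obj].add(subject)
    found, stack = set(), [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def documents_about(concept, triples):
    """Return items tagged with the concept or with any narrower concept."""
    return {
        subject
        for subject, predicate, obj in triples
        if predicate == "isAbout"
        and (obj == concept or concept in broader_concepts(obj, triples))
    }


if __name__ == "__main__":
    print(broader_concepts("Espresso", TRIPLES))
    print(documents_about("Beverage", TRIPLES))
```

Even with three predicates, a document tagged only with "Espresso" is discoverable under "Beverage", which is the kind of connection-finding the paragraph describes.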

Contextual

Context simply means there is a reference structure for guiding the assignment of what content ‘is about’ [7]. An ontology with proper context has a balanced and complete scope of the domain at hand. It generally uses fairly simple predicates; Structured Dynamics tends to use the UMBEL vocabulary for its predicates and class definitions, and to link to existing UMBEL concepts to help ensure interoperability [8]. A good gauge for whether the context is adequate is whether there are sufficient concept definitions to disambiguate common concepts in the domain.

Coherent

The essence of coherence is that it is a state of consistent connections, a logical framework for integrating diverse elements in an intelligent way. So while context supplies a reference structure, coherence means that the structure makes sense. With relation to a content graph, this means that the right connections (edges or predicates) have been drawn between the object nodes (or content) in the graph [9].

Relating content coherently itself demands a coherent framework. At the upper reference layer this begins with UMBEL, which itself is an extraction from the vetted and coherent Cyc common sense knowledge base. However, as domain specifics get added, these details, too, must be testable against a unified framework. Logic and coherence testing are thus an essential part of the ontology development methodology.

Incremental

Much value can be realized by starting small, being simple, and emphasizing the pragmatic. It is OK to make those connections that are doable and defensible today, while delaying until later the full scope of semantic complexities associated with complete data alignment.

An open world approach [10] provides the logical basis for incremental growth and adoption of ontologies. This is also in keeping with the continuous and incremental deployment model that Structured Dynamics has adopted from MIKE2.0 [11]. When this model is applied to the process of ontology development, the basic implementation increments appear as follows:

Figure 1. A Phased, Incremental Approach to Ontology Development

The first two phases are devoted to scoping and prototyping. Then, the remaining phases of creating a working ontology, testing it, maintaining it, and then revising and extending it are repeated over multiple increments. In this manner the deployment proceeds incrementally and only as learning occurs. Importantly, too, this approach means that complexity, sophistication and scope grow only in step with demonstrable benefits.

Re-use of Structure

Fundamental to the whole concept of coherence is the fact that domain experts and practitioners have been looking at the questions of relationships, structure, language and meaning for decades. Though perhaps today we finally have a broadly useful data and logic model in RDF, the fact remains that massive time and effort has already been expended to codify some of these understandings in various ways and at various levels of completeness and scope.

These are prior investments in structure that it would be silly to ignore. Yet, today, most methodologies do ignore these resources. This ignorance of prior investments in information relationships is perplexing. Though unquestioned adoption of legacy structure is inappropriate to modern interoperable systems, that fact is no excuse for re-inventing prior effort and discoveries, many of which are the result of laborious consensus building or negotiations.

The most productive methodologies for modern ontology building are therefore those that re-use and reconcile prior investments in structural knowledge, not ignore them. These existing assets take the form of already proven external ontologies and internal and industry structures and vocabularies.

Separation of the ABox and TBox

Nearly a year ago we undertook a major series on description logics [12], a key underpinning of the conceptual and logical foundation of Structured Dynamics’ ontology development. While we cannot always adhere to strict and conforming description logics designs, our four-part series helped provide guidance for the separation of concerns and work that can also lead to more effective ontology designs [13].

Conscious separation of the so-called ABox (assertions or instance records) and TBox (conceptual structure) in ontology design provides some compelling benefits:

Figure 2. Separation of the TBox and ABox [14]

Maintaining identity relations and disambiguation as separate components also has the advantage of enabling different methodologies or algorithms to be determined or swapped out as better methods become available. A low-fidelity service, for example, could be applied for quick or free uses, with more rigorous methods reserved for paid or batch mode analysis. Similarly, maintaining full-text search as a separate component means that work can be done by optimized search engines with built-in faceting.
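A minimal sketch of what this separation might look like in code, assuming a toy in-memory model: the conceptual structure (TBox) and the instance records (ABox) live in separate objects, and the disambiguation step is a plain function that can be swapped out. All class, function and instance names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TBox:
    """Conceptual structure: class name -> parent class name (None at the top)."""
    parents: Dict[str, Optional[str]] = field(default_factory=dict)

    def is_a(self, cls: Optional[str], ancestor: str) -> bool:
        # Walk up the hierarchy until we hit the ancestor or run out of parents.
        while cls is not None:
            if cls == ancestor:
                return True
            cls = self.parents.get(cls)
        return False


@dataclass
class ABox:
    """Instance records: instance name -> asserted class name."""
    types: Dict[str, str] = field(default_factory=dict)


def exact_match(label: str, candidates: List[str]) -> Optional[str]:
    """Low-fidelity resolver: case-insensitive exact match on the label."""
    for candidate in candidates:
        if candidate.lower() == label.lower():
            return candidate
    return None


def find_instance(
    label: str,
    abox: ABox,
    resolve: Callable[[str, List[str]], Optional[str]] = exact_match,
) -> Optional[str]:
    """Resolve a label to an instance using whichever resolver is plugged in."""
    return resolve(label, sorted(abox.types))


def instances_of(tbox: TBox, abox: ABox, cls: str) -> List[str]:
    """Combine both boxes: instances whose asserted class falls under cls."""
    return sorted(name for name, t in abox.types.items() if tbox.is_a(t, cls))
```

Because `find_instance` only depends on the resolver's signature, a quick matcher like `exact_match` could later be replaced by a more rigorous one without touching either box.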

Simple, Interoperable Tools Support

An essential design criterion is to have a methodology and work flow that explicitly accounts for simple and interoperable tools. By “simple” we mean targeted, task-specific tools and functionality that is also geared to domain users and practitioners.

Of all design areas, this one is perhaps the weakest in terms of current offerings. The next installment in this series [1] will address this topic directly.

The New Methodology

Armed with these criteria, we are now ready to present the new methodology. In summary terms, we can describe the steps in the methodology as:

  1. Scope, analyze, then leverage existing assets
  2. Prototype structure
  3. Pivot on the working ontology
  4. Test
  5. Use and maintain
  6. Extend working ontology and repeat.

Two Parallel Tracks

After the scoping and analysis phase, the effort is split into two tracks:

This split conforms to the separation of ABox and TBox noted above [15]. There are conceptual and workflow parallels between entities and data versus ontologies. However, the specific methodologies differ, and we only focus on the conceptual ontology side in the discussion below, shown as the upper part (blue) of Figure 3:

Figure 3. Flowchart of Ontology Development Methodology [16]

Two key aspects of the initial effort are to properly scope the size and purpose of the starting prototype and to inventory the existing assets (structure and data; internal and external) available to the project.

Re-Use Structure

Most current ontology methodologies do not emphasize re-use of existing structure. Yet these resources are rich in content and meaning, and often represent years to decades of effort and expenditure in creation, assembly and consensus. Just a short list of these potential sources demonstrates the treasure trove of structure and vocabularies available for re-use: Web portals; databases; legacy schema; metadata; taxonomies; controlled vocabularies; ontologies; master data catalogs; industry standards; exchange formats, etc.

Metadata and available structure may have value no matter where or how it exists, and a fundamental aspect of the build methodology is to bring such candidate structure into a common tools environment for inspection and testing. Besides assembling and reviewing existing sources, those selected for re-use must be migrated and converted to proper ontological form (OWL in the case of those developed by Structured Dynamics). Some of these techniques have been demonstrated for prior patterns and schema [17]; in other instances various converters, RDFizers or scripts may need to be employed to effect the migration.

Many tools and options exist at this stage, even though as a formal step this conversion is often neglected.
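As one small example of the kind of script this step can involve, the sketch below turns a legacy parent/child taxonomy held as CSV into OWL class declarations in Turtle syntax. The CSV column names (`term`, `parent`), the `ex:` namespace and the naming rule are assumptions for illustration. It does not escape quotes or strip punctuation in labels, so it is a starting point and not a general converter.

```python
import csv
import io

# The ex: namespace is a placeholder; owl: and rdfs: are the standard W3C ones.
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/onto#> .\n"
)


def to_local_name(label):
    """Turn a free-text label such as 'hot coffee' into 'HotCoffee'."""
    return "".join(word.capitalize() for word in label.split())


def taxonomy_csv_to_turtle(csv_text):
    """Convert rows of (term, parent) into owl:Class statements in Turtle."""
    lines = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["term"]
        parent = row["parent"]
        lines.append("ex:" + to_local_name(label) + " a owl:Class ;")
        if parent:
            lines.append('    rdfs:label "' + label + '" ;')
            lines.append("    rdfs:subClassOf ex:" + to_local_name(parent) + " .")
        else:
            # Top-level term: no parent, so the label closes the statement.
            lines.append('    rdfs:label "' + label + '" .')
    return "\n".join(lines)


if __name__ == "__main__":
    legacy = "term,parent\nbeverage,\nhot coffee,beverage\n"
    print(taxonomy_csv_to_turtle(legacy))
```

The resulting text can then be loaded into a common tools environment for the inspection and testing described above.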

Prototype Structure

The prototype structure is the first operating instance of the ontology. The creation of this initial structure follows quite closely the approach recommended in Ontology Development 101 [18], with some modifications to reflect current terminology:

  1. Determine the domain and scope of the ontology
  2. Consider reusing existing ontologies
  3. Enumerate important terms in the ontology
  4. Define the classes and the class hierarchy
  5. Define the properties of classes
  6. Create instances

The prototype structure is important since it communicates to the project sponsors the scope and basic operation of the starting structure. This stage often represents a decision point for proceeding; it may also trigger the next budgeting phase.

Link Reference Ontologies

An essential aspect of a build methodology is to re-use “standard” ontologies as much as possible. Core ontologies are Dublin Core, DC Terms, Event, FOAF, GeoNames, SKOS, Timeline, and UMBEL. These core ontologies have been chosen because of universality, quality, community support and other factors [19]. Though less universal, there are also a number of secondary ontologies, namely BIBO, DOAP, and SIOC that may fit within the current scope.

These are then supplemented with quality domain-specific ontologies, if such exist. Only then are new name spaces assigned for any newly generated ontology(ies).
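The lookup order described here (core ontologies first, then secondary ones, then domain-specific ones, and a new namespace only as a last resort) can be sketched as follows. The term sets are tiny hand-picked subsets of each vocabulary, and the new namespace is a placeholder; both are assumptions for illustration.

```python
# Small, hand-picked subsets of real vocabularies, listed for illustration only.
CORE = {
    "foaf": ("http://xmlns.com/foaf/0.1/", {"Person", "name"}),
    "skos": ("http://www.w3.org/2004/02/skos/core#", {"Concept", "prefLabel"}),
    "dcterms": ("http://purl.org/dc/terms/", {"title", "creator"}),
}
SECONDARY = {
    "bibo": ("http://purl.org/ontology/bibo/", {"Book", "Article"}),
    "doap": ("http://usefulinc.com/ns/doap#", {"Project"}),
}
NEW_NAMESPACE = "http://example.org/onto#"  # hypothetical placeholder


def resolve_term(term, domain=None):
    """Return a URI for the term, preferring existing ontologies over new ones."""
    for tier in (CORE, SECONDARY, domain or {}):
        for prefix in sorted(tier):
            namespace, terms = tier[prefix]
            if term in terms:
                return namespace + term
    # Nothing suitable exists yet: only now mint a name in the new namespace.
    return NEW_NAMESPACE + term


if __name__ == "__main__":
    print(resolve_term("Person"))
    print(resolve_term("Widget"))
```

In practice the matching would be done on meaning, not on identical local names, so this only illustrates the order of preference.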

Working Ontology

The working ontology is the first production-grade (deployable) version of the ontology. It conforms to all of the ontology building best practices and needs to be complete enough that it can be loaded and managed in a fully conforming ontology editor or IDE [20].

By using the OWL API, this working structure can also be the source for specialty tools and user maintenance functions, short of requiring a full-blown OWL editor. These aspects are among the most poorly represented in the current tools inventory; we return to this topic in the next installment.

The working ontology is the complete, canonical form of the domain ontology(ies) [21]. These are the central structures that are the focus for ongoing maintenance and extension efforts over the ensuing phases. As such, the ontologies need to be managed by a version control system with comprehensive ontology and vocabulary management support and tools.

Testing and Mapping

As new ontologies are generated, they should be tested for coherence using various reasoning, inference and other natural language processing tools. Gap testing is also used to discover key holes or missing links within the resulting ontology graph structure. Coherence testing may result in discovering missing or incorrect axioms. Gap testing helps identify internal graph nodes needed to establish the integrity or connectivity of the concept graph.
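Real coherence testing relies on reasoners, but two purely structural checks give a feel for the idea. In this sketch (an assumption-laden stand-in, not the tooling the methodology uses), concepts that fall outside the main connected component hint at gaps, and a cycle in the narrower-to-broader hierarchy hints at an incorrect axiom. Edges are hypothetical `(narrower, broader)` pairs.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group concepts into connected components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for child, parent in edges:
        neighbors[child].add(parent)
        neighbors[parent].add(child)
    seen, components = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        components.append(component)
    return components


def has_cycle(edges):
    """Return True if following narrower -> broader links ever loops back."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    state = {}  # node -> "visiting" while on the current path, then "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        found = any(visit(parent) for parent in parents[node])
        state[node] = "done"
        return found

    return any(visit(node) for node in list(parents))
```

More than one component returned by `connected_components` suggests concepts that still need linking into the graph, while `has_cycle` returning `True` flags a hierarchy that cannot be a valid subsumption structure.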

Though used for different purposes, mapping and alignment tools may also work to identify logical and other inconsistencies in definitions or labels within the graph structure. Mapping and alignment is also important in its own right in order to establish the links that help promote ontology and information interoperability.

External knowledge bases can also play essential roles in testing and mapping. Two prominent knowledge base examples are Cyc and Wikipedia, but many additional ones exist for any specific domain.

Use and Maintenance

Of course, the whole purpose of the development methodology is to create practical, working ontologies. Such uses include search, discovery, information federation, data interoperability, analysis and reasoning. The general purposes to which ontologies may be put are described in the Executive Intro to Ontologies [22].

However, it is also in day-to-day use of the ontology that many enhancements and improvements may be discovered. Examples include improved definitions of concepts; expansions of synonyms, aliases and jargon for concepts; better, more intuitive preferred labels; better means to disambiguate between competing meanings; missing connections or excessive connections; and splitting or consolidating of the underlying structure.

Today, such maintenance enhancements are most often not pursued because existing tools do not support such actions. IDEs and tools geared to ontology engineering are not well suited to letting users and practitioners note or effect such changes. Yet ongoing ontology use and adaptation clearly suggest that users should be encouraged to do so. They are the ones on the front lines of identifying and potentially recording such improvements.

Extend

Ontology development is a process, not a static destination or event. This observation makes intuitive sense since we understand ontologies to be a means to capture our understanding of our domains, which is itself constantly changing due to new observations and insights. This factor alone suggests that ontology development methodologies must therefore give explicit attention to extension.

But there is another reason for this attention. Incremental, adaptive ontologies are also explicitly designed to expand their scope and coverage, bite by bite as benefits prove themselves and justify that expansion. A start small and expand strategy is of course lower risk and more affordable. But, for it to be effective, it also must be designed explicitly for extension and expansion. Ontology growth thus occurs both from learning and discovery and from expanding scope.

Versioning, version control and documentation (see below) thus assume more central importance than a more static view would suggest. The use of feedback and the continuous improvement design based on MIKE2.0 are therefore also central tenets of our ontology development methodology.

Documentation

This perspective of the ontology as a way to capture the structure and relationships of a domain — which is also constantly changing and growing — carries over to the need to document the institutional memory and use of it. Both better tools — such as vocabulary management and versioning — and better work processes need to be instituted to properly capture and record use and applications of ontologies.

Some of these aspects are now handled with utilities such as OWLdoc or the TechWiki that Structured Dynamics has innovated to capture ontology knowledge bases on an ongoing basis. But these are still rudimentary steps that need to be enforced with management commitment and oversight.

One need merely begin to probe the ontology development literature to observe how sparse the pickings are. Very little information on methodologies, best practices, use cases, recipes, how-to manuals, conversion and use steps and other documentation really exists at present. Unfortunately, documentation lags even the inadequate state of tools development in the ontology space.

Content Processing

Once formalized, these constructs — the structured ontologies or the named entity dictionaries as shown in Figure 3 — are then used for processing input content. That processing can range from conversion to direct information extraction. Once extracted, the structure may be injected (via RDFa or other means) back into raw Web pages. The concepts and entities that occur within these structures help inform various tagging systems [23]. The information can also be converted and exported in various forms for direct use or for incorporation in third-party systems.

Visualization systems and specialized widgets (see next) can be driven by the structure and results sets obtained from querying the ontology structure and retrieving its related instance data. While these purposes are somewhat beyond the direct needs of the ontology development methodology, the ontology structures themselves must be designed to support these functions.

Semantic Component Ontology

In our methodology we also provide for administrative ontologies whose purpose is to relate structural understandings of the underlying data and data types with applicable end-use and visualization tools (“widgets”). Thus the structural knowledge of the domain gets combined with an understanding of data types and what kinds of visualization or presentation widgets might be invoked. The phrase ontology-driven apps results from this design.

Amongst other utility ontologies, Structured Dynamics names its major tool-driver ontology the SCO (Semantic Component Ontology). The SCO works in intimate tandem with the domain ontologies, but is constructed and designed with quite different purposes. A description of the build methodology for the SCO (or its other complementary utility ontologies) is beyond the scope of this current document.

Tooling and Best Practices

As sprinkled throughout the above commentary, this methodology is also intimately related to tools and best practices. The next chapter in this series is devoted to and will be archived on the TechWiki as the lightweight domain ontology methodology. Best practices will be handled in a similar way for the chapter after that one and in its ontology best practices document on the TechWiki.

Time for a Leap Forward in Methodology

Earlier reviews and the information in this document suggest a real need for ontology building methodologies that are integrated, easier to use, interoperate with a richer tools set and are geared to practitioners versus priests. The good news is that there are architectures and building blocks to achieve this vision. The bad news is that the first steps on this path are only now beginning.

The next two installments in this series add further detail for why it is time — and how — we can make a leap forward in methodology. Those critical remaining pieces are in tools and best practices.


[1] This posting is part of a current series on ontology development and tools. The series began with an update of my prior Ontology Tools listing, which now contains 185 tools. It continued with a survey of ontology development methodologies. The next part in this series will address a new architecture for tooling development. The last installment in the series is planned to cover ontology best practices. This same posting is permanently archived and updated on the OpenStructs TechWiki as Lightweight, Domain Ontologies Development Methodology.

[2] Examples of upper-level ontologies include the Suggested Upper Merged Ontology (SUMO), the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), PROTON, Cyc and BFO (Basic Formal Ontology). Most of the content in their upper levels is more akin to broad, abstract relations or concepts (similar to the primary classes, for example, in a Roget’s Thesaurus; that is, real ontos stuff) than to “generic common knowledge.” Most all of them have both a hierarchical and networked structure, though their actual subject structure relating to concrete things is generally pretty weak. For a more detailed treatment of ontology classifications, see M. K. Bergman, 2007. “An Intrepid Guide to Ontologies,” AI3:::Adaptive Information blog, May 16, 2007.

[3] O. Corcho, M. Fernandez and A. Gomez-Perez, 2003. “Methodologies, Tools and Languages for Building Ontologies: Where is the Meeting Point?,” in Data & Knowledge Engineering 46, 2003. See http://www.dia.fi.upm.es/~ocorcho/documents/DKE2003_CorchoEtAl.pdf.

[4] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. “Ontology Engineering: A Reality Check,” in Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics ODBASE 2006, 2006. See http://ontocom.ag-nbi.de/docs/odbase2006.pdf.

[5] OntologyDesignPatterns.org is a semantic Web portal dedicated to ontology design patterns (ODPs). The portal was started under the NeOn project, which still partly supports its development.

[6] See M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, November 23, 2009.

[7] See M.K. Bergman, 2008. “The Semantics of Context,” AI3:::Adaptive Information blog, May 6, 2008.

[8] UMBEL (Upper Mapping and Binding Exchange Layer) is an ontology of about 20,000 subject concepts that acts as a reference structure for inter-relating disparate datasets. It is also a general vocabulary of classes and predicates designed for the creation of domain-specific ontologies.

[9] See M.K. Bergman, 2008. “When is Content Coherent?,” AI3:::Adaptive Information blog, July 25, 2008.

[10] See M.K. Bergman, 2009. “The Open World Assumption: Elephant in the Room,” AI3:::Adaptive Information blog, December 21, 2009.

[11] MIKE2.0 (Method for Integrated Knowledge Environments) is an open source information development methodology championed by Bearing Point and Deloitte. Structured Dynamics has adopted the approach and has helped formulate MIKE2.0’s semantic enterprise offering. For a general intro to the approach, see further M.K. Bergman, 2010. “MIKE2.0: Open Source Information Development in the Enterprise,” AI3:::Adaptive Information blog, February 23, 2010.

[12] This is our working definition for description logics:

“Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”

[13] See the four-part description logics series from M. K. Bergman, 2009. “Making Linked Data Reasonable using Description Logics, Part 1,” AI3:::Adaptive Information blog, Feb. 11, 2009; “Making Linked Data Reasonable using Description Logics, Part 2,” AI3:::Adaptive Information blog, Feb. 15, 2009; “Making Linked Data Reasonable using Description Logics, Part 3,” AI3:::Adaptive Information blog, Feb. 18, 2009; and “Making Linked Data Reasonable using Description Logics, Part 4,” AI3:::Adaptive Information blog, Feb. 23, 2009.

[14] See Part 2 in [13].

[15] The TBox portion, or classes (concepts), is the basis of the ontologies. The ontologies establish the structure used for governing the conceptual relationships for that domain and in reference to external (Web) ontologies. The ABox portion, or instances (named entities), represents the specific, individual things that are the members of those classes. Named entities are the notable objects, persons, places, events, organizations and things of the world. Each named entity is related to one or more classes (concepts) of which it is a member. Named entities do not set the structure of the domain, but populate that structure. The ABox and TBox play different roles in the use and organization of the information and structure.

[16] The original version, now slightly modified, was first published in M. K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, Nov. 23, 2009.

[17] As some examples, see for instance: SKOS: Mark van Assem, Veronique Malais, Alistair Miles and Guus Schreiber, 2006. “A Method to Convert Thesauri to SKOS,” in The Semantic Web: Research and Applications (2006), pp. 95-109. See http://www.cs.vu.nl/~mark/papers/Assem06b.pdf for paper, also http://thesauri.cs.vu.nl/eswc06/ and http://thesauri.cs.vu.nl/; taxonomies: Fausto Giunchiglia, Maurizio Marchese and Ilya Zaihrayeu, 2006. “Encoding Classifications into Lightweight Ontologies,” presented at Proceedings of the 3rd European Semantic Web Conference (ESWC 2006), Budva. See http://www.science.unitn.it/~marchese/pdf/encoding%20classifications%20into%20lightweight%20ontologies_JoDS8.pdf; metadata: Mikael Nilsson, 2007. See http://mikaelnilsson.blogspot.com/2007/11/semanticizing-metadata-specifications.html; relational schema: see the W3C workgroup on RDB2RDF; and, of course, there are many others.

[18] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.

[19] The criteria considered in nominating an existing ontology to “core” status are that it should be general; highly used; universal; have broad committee or community support; be well done and documented; and be easily understood.

[20] Examples of comprehensive ontology editing toolkits or IDEs (integrated development environments) include NeOn toolkit, Protégé, and TopBraid Composer. A complement to these larger toolkits is the OWL API, which when used can also provide a canonical management framework for specific ontology tools and tasks. This topic is covered more in the next installment regarding the tools landscape.

[21] Good ontology design, especially for larger projects, does require a degree of modularity. An architecture of multiple ontologies often works together to isolate different work tasks so as to aid better ontology management. Ontology architecture and modularization is a separate topic in its own right.

[22] Originally published as M.K. Bergman, 2010. “An Executive Intro to Ontologies,” AI3:::Adaptive Information blog, August 9, 2010. This popular document has now been permanently archived on the OpenStructs TechWiki as Intro to Ontologies.

[23] Another reason for the clear distinction between ABox and TBox is their use to aid one another in disambiguation. Structured Dynamics’ scones approach (subject concepts or named entities) is designed expressly for this purpose. It is also possible to integrate these approaches with third-party tools (e.g., Calais, Expert System (Cogito), etc.) to improve unstructured content characterization. Via this approach we now can assess concept matches in addition to entity matches. This means we can triangulate between the two assessments to aid disambiguation. Because of logical segmentation, we have increased the informational power of our concept graph.

September 01, 2010 05:10 AM

Elphel Development Blog

Initial OpenLayers mockup to display images

Today I created a tiny bit of OpenLayers code for the Eyesis display page. It is basically a demo of what you can do by placing points of interest on a map: displaying the panorama and a smaller map.

OpenLayers

I have not added a panorama player yet, but since it is only a matter of changing divs, that could be done easily. Personally I would like to go for an HTML5 kind of player, since for most browsers that would be the least resource-intensive way of displaying. The code is available at http://eyesis.openstreetphoto.org/; there are some images there, but only low-res ones from the initial stitching tryouts.

September 01, 2010 12:03 AM

August 31, 2010

Browse Blogs

Torvalds Causes Mob Scene at LinuxCon Brazil

The Linux Foundation today kicked off its two-day debut of LinuxCon Brazil. Attendees got a rare opportunity to see both Linus Torvalds and Andrew Morton on stage, together, and in person.

August 31, 2010 08:40 PM

OStatus

OStatus 1.0 Draft 2 Available under OWFa

One of the important things for developing a specification is providing an explicit license. This gives third-party implementers the security to know that they're not walking into a patent or copyright minefield by implementing the specification. It's why the IETF and W3C require explicit descriptions of rights and licenses for all specs made by those bodies.

To make sure that implementers are aware that this spec is open to use and develop with, we've used the great Open Web Foundation Agreement 0.9 (OWFa) made available by the Open Web Foundation. It's an explicit copyright license and patent promise that's been carefully reviewed for use by open web specs like OStatus.

Not all of the technology that's collected in OStatus is currently under the OWFa, but some parts are: PubSubHubbub and Salmon were two of the specs specifically listed when the OWFa was introduced. Our discussions with other upstream spec developers on PoCo, Activity Streams and WebFinger suggest that they too will use the OWFa or something similar. So putting our application profile into the mix makes a lot of sense.

The new draft of the specification includes the license notification, and copies of the signed agreements will be made available on the OStatus site at http://ostatus.org/owfa/. (There's also one addition -- there was an identifier URI left out of draft 1 that we've now got added in!)

Thanks to the people at the OWF who made this great agreement. It's made it easy for us to give the right signal to the OStatus community.

August 31, 2010 08:04 PM

Browse Blogs

Best practices in Open Source Governance at Open World Forum

We'll have a one-day session about the governance of open source at Open World Forum to which everyone is invited. Open World Forum will take place in Paris on September 30th and October 1st. The governance session will be on the second day. The talks in the morning address topics related to the adoption of open source whereas the afternoon session focuses on governance issues and best practices.

August 31, 2010 06:19 PM

natural language processing blog

Online Learning Algorithms that Work Harder

It seems to be a general goal in practical online learning algorithm development to have the updates be very, very simple.  Perceptron is probably the simplest, and involves just a few adds.  Winnow takes a few multiplies.  MIRA takes a bit more, but still nothing hugely complicated.  Same with stochastic gradient descent algorithms for, e.g., hinge loss.
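To make "just a few adds" concrete, here is a minimal sketch of a single perceptron update over sparse features. The function name, the dict-based feature representation and the toy stream are assumptions for illustration, not code from any of the systems mentioned in this post.

```python
def perceptron_update(weights, features, label):
    """One online perceptron step; label is +1 or -1, features is a sparse dict."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    if label * score <= 0:
        # Mistake (or zero margin): add the signed example to the weights.
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) + label * value
        return True
    return False

# Hypothetical toy stream of (features, label) pairs, seen once each.
weights = {}
stream = [
    ({"good": 1.0, "movie": 1.0}, +1),
    ({"bad": 1.0, "movie": 1.0}, -1),
    ({"good": 1.0, "plot": 1.0}, +1),
]
mistakes = sum(perceptron_update(weights, x, y) for x, y in stream)
print(mistakes, weights)
```

The work per example is one sparse dot product plus, on a mistake, one addition per active feature, which is why feature extraction and I/O can easily dominate the running time.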

I think this maybe used to make sense.  I'm not sure that it makes sense any more.  In particular, I would be happier with online algorithms that do more work per data point, but require only one pass over the data.  There are really only two examples I know of: the StreamSVM work that my student Piyush did with me and Suresh, and the confidence-weighted work by Mark Dredze, Koby Crammer and Fernando Pereira (note that they maybe weren't trying to make a one-pass algorithm, but it does seem to work well in that setting).

Why do I feel this way?

Well, if you look even at standard classification tasks, you'll find that if you have a highly optimized, dual threaded implementation of stochastic gradient descent, then your bottleneck becomes I/O, not learning.  This is what John Langford observed in his Vowpal Wabbit implementation.  He has to do multiple passes.  He deals with the I/O bottleneck by creating an I/O friendly, proprietary version of the input file during the first pass, and then careening through it on subsequent passes.

In this case, basically what John is seeing is that I/O is too slow.  Or, phrased differently, learning is too fast :).  I never thought I'd say that, but I think it's true.  Especially when you consider that just having two threads is a pretty low requirement these days, it would be nice to put 8 or 16 threads to good use.

But I think the problem is actually quite a bit more severe.  You can tell this by realizing that the idealized world in which binary classifier algorithms usually get developed is, well, idealized.  In particular, someone has already gone through the effort of computing all your features for you.  Even running something simple like a tokenizer, stemmer and stop word remover over documents takes a non-negligible amount of time (to convince yourself: run it over Gigaword and see how long it takes!), easily much longer than a silly perceptron update.

So in the real world, you're probably going to be computing your features and learning on the fly.  (Or at least that's what I always do.)  In which case, if you have a few threads computing features and one thread learning, your learning thread is always going to be stalling, waiting for features.

One way to partially circumvent this is to do a variant of what John does: create a big scratch file as you go and write everything to this file on the first pass, so you can just read from it on subsequent passes.  In fact, I believe this is what Ryan McDonald does in MSTParser (he can correct me in the comments if I'm wrong :P).  I've never tried this myself because I am lazy.  Plus, it adds unnecessary complexity to your code, requires you to chew up disk, and of course adds its own delays since you now have to be writing to disk (which gives you tons of seeks to go back to where you were reading from initially).
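The scratch-file idea can be sketched roughly as follows: compute features once on the first pass, write them out as you go, and replay them from disk on later passes. This is only an illustration under assumed names (`extract_features`, `feature_passes`, a JSON-lines cache); it is not how Vowpal Wabbit or MSTParser actually lay out their cache files.

```python
import json
import os
import tempfile

def extract_features(text):
    """Stand-in for an expensive pipeline (tokenizer, stemmer, stop-word removal)."""
    stop_words = {"the", "a", "of"}
    counts = {}
    for token in text.lower().split():
        if token not in stop_words:
            counts[token] = counts.get(token, 0) + 1
    return counts

def feature_passes(documents, cache_path, num_passes):
    """Yield (pass_index, features); compute on pass 0, replay from disk afterwards."""
    with open(cache_path, "w") as cache:
        for text in documents:
            features = extract_features(text)
            cache.write(json.dumps(features) + "\n")
            yield 0, features
    for pass_index in range(1, num_passes):
        with open(cache_path) as cache:
            for line in cache:
                yield pass_index, json.loads(line)

# Hypothetical two-document corpus, three passes over it.
documents = ["The cat sat", "A dog of the house"]
cache_path = os.path.join(tempfile.mkdtemp(), "features.jsonl")
seen = list(feature_passes(documents, cache_path, num_passes=3))
print(len(seen), seen[0])
```

The learner consumes the same stream on every pass, but only the first pass pays for feature extraction; the costs named above (extra code, disk space, write traffic) are all visible even in this small version.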

A similar problem crops up in structured problems.  Since you usually have to run inference to get a gradient, you end up spending way more time on your inference than your gradients.  (This is similar to the problems you run into when trying to parallelize the structured perceptron.)

Anyway, at the end of the day, I would probably be happier with an online algorithm that spent a little more energy per-example and required fewer passes; I hope someone will invent one for me!

August 31, 2010 07:09 PM

Open Video Conference

This Is Not a Hoax: The Yes Men at Open Video Conference

The Yes Men Fix The World, the second film from the culture-jamming activist duo, will be the marquee feature in the Shared Film Festival at the Open Video Conference. After the screening, we’ll sit down for a panel including The Yes Men and their defense counsel, EFF’s Corynne McSherry.

The Yes Men raise awareness about social issues by tactically intervening in the mass media. Posing as executives of giant corporations, they lie their way into big conferences and TV appearances to expose—with surreal humor—the dark underbelly of multinational business. “It takes some nerve, not to mention diabolical intelligence… to pull off [these] pranks,” the New York Times wrote in its review of the film.

The film chronicles, among other episodes, the time Yes Man Andy Bichlbaum appeared on BBC World as a faux Dow Chemical spokesman to apologize for the Bhopal chemical disaster. After tricking a BBC producer into granting an interview, Bichlbaum read a lengthy “official statement” on live broadcast, offering reparations for the 120,000 affected victims. By the time the hoax was uncovered, Dow’s market cap had taken a $2 billion hit.

Because it is such a hot potato, The Yes Men have a hard time securing traditional distribution deals for the movie.  Though it’s earned heaps of awards and critical accolades, it also chronicles costly and elaborate pranks against Halliburton, WTO, Dow Chemical, and others, giving most distributors heartburn for the potential liability risks.

As a result, The Yes Men decided to freely distribute the film using P2P systems like BitTorrent. They’ve reached a massive audience, cost-free, and have even received tens of thousands of dollars in donations from fans and supporters.

The P2P edition of the film features special scenes of The Yes Men’s prank at the National Press Club, which resulted in a lawsuit being filed against them by the U.S. Chamber of Commerce.

Don’t miss the Yes Men, the Shared Film Festival, and the rest of the activities at this year’s Open Video Conference. Register today, and join us October 1 & 2 in New York City!



August 31, 2010 03:25 PM

Browse Blogs

Rise in use of EUPL for publishing open source software

OSOR has published an update on the adoption of the European Union Public Licence (EUPL): "A third of the projects available on the European Commission's software development site, the OSOR Forge, 47 out of 147 projects, are published using the EUPL. On Sourceforge, a commercial venture for open source software development based in the US, the licence is now selected by 49 projects."

August 31, 2010 01:36 PM

Free Open Source Academia Conference (fOSSa), November 8-10, Grenoble

The goal of fOSSa (Free Open Source Academia Conference) is to reaffirm the underlying values of open source software: innovation and research in software development.

The second edition will focus on specific aspects we feel are key in a renovation of FOSS: Development, innovation & research, Community management and promotion, Public sector, Education.

November 8-10 2010
Grenoble, France
Web site: http://fossa2010.inrialpes.fr/

August 31, 2010 11:27 AM

Novell Disappoints as Ownership Concerns Continue

Datamation reports that Novell fell short of its guidance for the third fiscal quarter of 2010: "For the quarter, Novell reported revenue of $199 million, a decline of 8 percent from the third quarter of 2009. The company reported net income of $16 million, or $0.04 per share, dipping from the $17 million Novell posted in the third quarter of 2009."

August 31, 2010 09:03 AM

OpenSocial API Blog

Eureka! Lockheed Martin contributes OpenSocial platform to open source

Lockheed Martin Corporation recently announced the release of its first open source software initiative around social media called Eureka Streams. Eureka Streams is a social media platform that integrates activity streams and OpenSocial apps. Lockheed Martin has spent the past several years growing a strategy of Social Software within the Enterprise to bring widely distributed employees together. Eureka Streams takes that vision further by incorporating what works well on the internet and builds a platform based on open standards to expand social media even further.

Eureka Streams, initially built internally, is now being made available under the Apache License as open source. Shindig version 1.1 (beta) integration provides the framework to offer the OpenSocial 0.9 features, creating a user focused OpenSocial gadget container that can access the user profiles and activity data created within the tool. The UI has been developed using Google Web Toolkit to provide a flexible JavaScript front end developed in Java.

Eureka Streams is currently released to open source at version 0.9. The team has placed a heavy focus on user interaction, performance, and scalability to this point, but is shifting their focus to the developer and looking for support from the open source community.

For more information and to learn how to get started, please visit the Eureka Streams web site.

Posted on behalf of Steve Terlecki, Lockheed Martin Corp, by Mark Weitzel, President, OpenSocial Foundation

August 31, 2010 06:52 AM

August 30, 2010

Dublin Core Metadata Initiative

New Task Groups for revising the User Guide and reviewing the DCMI Abstract Model

2010-08-30, Two new DCMI Task Groups have been formed: the DCMI User Guide Task Group that will work on a revision of the popular but outdated document "Using Dublin Core" and the DCMI Abstract Model Review Task Group that will prepare a review of the DCMI Abstract Model, both for discussion at DC-2010 in October 2010. Discussion will take place on the DC-Glossary and DC-Architecture mailing lists, respectively. Participation by interested members of the Dublin Core community is welcomed and encouraged; please contact Tom Baker for further information.

August 30, 2010 11:59 PM

NISO/DCMI Webinar slides published

2010-08-30, The slides from the Joint NISO/DCMI Webinar "Dublin Core: The Road from Metadata Formats to Linked Data" held on 25 August 2010 are now available at the Metadata Training Resources page.

August 30, 2010 11:59 PM

tesseract-ocr Google Group

OCR of Screenshots

I understand the resolutions of screenshots are typically inadequate
for OCR, but besides rescaling to a higher resolution, say, 300 DPI,
what other preprocessing operations may be needed on the images to
yield optimal OCR results?

Thanks.

August 30, 2010 10:46 PM

EFF.org Updates

Reading, Writing, and RFID Chips: A Scary Back-to-School Future in California

Scary news from California's Contra Costa County — school officials there have reportedly decided to track some preschoolers with RFID chips, thanks to a federal grant supplying the funding.

According to a story from the Associated Press, the students will wear a jersey at school that has the RFID tag attached. The tag will track the children's movements and collect other data, like if the child has eaten or not. According to a Contra Costa County official, this is a cost-savings move, as teachers used to have to manually keep track of a child's attendance and meal schedule.

But of course, an RFID chip allows for far more than that minimal record-keeping. Instead, it provides the potential for nearly constant monitoring of a child's physical location. If readings are taken often enough, you could create an extraordinarily detailed portrait of a child's school day — one that's easy to imagine being misused, particularly as the chips substitute for direct adult monitoring and judgment. If RFID records show a child moving around a lot, could she be tagged as hyper-active? If he doesn't move around a lot, could he get a reputation for laziness? How long will this data and the conclusions rightly or wrongly drawn from it be stored in these children's school records? Can parents opt-out of this invasive tracking? How many other federal grants are underwriting programs like these?

These are questions that desperately need answers. California is in the middle of a terrible budget crunch, but the solution is not federally funded surveillance of children who are too young to understand the implications.

August 30, 2010 07:27 PM

Public Knowledge - Blogging, Events, and Action Alerts

The Intellectual Property Breakfast Club

September 14, 2010 - 8:00am - 10:00am

The Role of the Obama Administration’s IP Enforcement Program

For the first time, a presidential administration has prioritized enforcement of intellectual property rights by appointing a high-level administration official charged with coordinating policy and enforcement. Join a wide-ranging discussion on how the Obama Administration is approaching international and domestic controversies surrounding intellectual property.


Click here for more information.

August 30, 2010 07:02 PM

Wikimedia Technical Blog

Google Summer of Code conclusion

This past week marked this year’s conclusion of Google Summer of Code.  This has turned out to be a very successful year for us and we hope for the students as well.  Here are this year’s projects:

More detailed information on all of these projects can be found on our GSoC 2010 projects page.  Also, Wikipedia Signpost is highlighting this work over the coming weeks, starting with a summary of Brian Wolff’s XMP metadata project.

Though not all projects were finished completely as specified, all were completed to a sufficient degree that we felt very comfortable passing all of the students, and all of the students produced code we’re very happy to have.  Note that there is no guarantee that anything here will get beyond the proof-of-concept stage.  However, we’re hopeful that much of this work will find broader adoption, and we’re looking forward to that.

We hope that all of the students stick around as MediaWiki contributors long after the summer is over.  Please join us in thanking them for their participation this year!

August 30, 2010 06:30 PM

Open Knowledge Foundation Blog

Slides and notes from Data Driven Journalism event

Last week I attended the Data-driven journalism event in Amsterdam (which we blogged about here) run by the European Journalism Centre (who interviewed me here). My slides from the event, Open Data and Data Driven Journalism, are now up here. Below are some lovely lofi graphical notes from Anna Lena Schiller: It was [...]

Related posts:

  1. Data Driven Journalism, Amsterdam, 24th August 2010
  2. Data Journalism Meetup, Berlin, 1st September 2010
  3. Interview with European Journalism Centre on Data Driven Journalism

August 30, 2010 05:32 PM

Public Knowledge - Blogging, Events, and Action Alerts

PK In the Know Podcast: Interview with WFMU's Ken Freedman

A transcript is available here. You can download and listen to the audio by clicking here (MP3) or stream it using the player below:

Want to subscribe to our podcast? Click here for the MP3 feed and here for the mixed audio/video feed.

read more

August 30, 2010 04:39 PM

Open Hardware Summit 2010

Welcome

8:30 am Breakfast
9:30 am Welcome and Opening Notes

WHY DO OPEN HARDWARE?

10:00 am: Limor Fried, Adafruit
10:30 am: Gerald Coley, Texas Instruments & Beagle Board
11:00 am: Bruce Perens, founder: OSI
11:30 am: John Wilbanks, Creative Commons

12:00 pm: Institutional Sprint talks

•    Amanda McDonald Crowley, EYEBEAM
•    Jim Barkley & Sam Sayer, MITRE: “ARx: Almost-Ready-to-Anything”
•    Rich Gibson, NASA

LUNCH

12:30 – 1:30 pm: Lunch (will be provided)

read more

August 30, 2010 03:59 PM

The Digital Broadband Migration: The Dynamics of Disruptive Innovation

February 13, 2011 (All day) - February 14, 2011 (All day)

The conference will begin with a tutorial overview of the evolution of the Internet, including recent disruptive developments. The first panel will put these developments in perspective by addressing such questions as: (1) what creates the necessary conditions for innovation in networked industries; (2) how those conditions can be cultivated; and (3) what conditions tend to smother rather than encourage innovation?

University of Colorado-Boulder
February 13, 2011 - February 14, 2011

Click here for more information.

August 30, 2010 03:55 PM

Elphel Development Blog

Elphel-Eyesis, assembled

Elphel-Eyesis 1

As of July 8, we have the first panoramic camera completely assembled and ready for the test ride. The total height is 1300 mm [4' 3"]; it weighs 10 kg, or about 22 lbs. The power consumption is 36 W when the camera is in operation, measured at the AC (110/220 VAC) input. The camera head has eight 5 MPix color sensors around and one pointing up, with a full resolution of ~38 MPix (45 MPix before stitching). The data storage box (also waterproof) at the bottom of the leg contains 3 swappable 2.5″ hard drives of 500 GB each, which is enough to record up to 12 hours of images taken at 5 fps (max frame rate) at full resolution. Each image is geotagged via an external GPS unit attached through the sealed USB connector.
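The storage figures above can be sanity-checked with a little arithmetic. This is only a back-of-envelope sketch: it assumes "500 GB" means decimal gigabytes and that the full capacity of all three drives is usable for images, neither of which the post states.

```python
# Back-of-envelope storage budget for the figures quoted in the post.
DRIVES = 3
DRIVE_BYTES = 500 * 10**9   # assumption: "500 GB" means decimal gigabytes
HOURS = 12                  # quoted maximum recording time
FPS = 5                     # quoted maximum frame rate
RAW_PIXELS = 45 * 10**6     # ~45 MPix per frame before stitching

total_bytes = DRIVES * DRIVE_BYTES
frames = HOURS * 3600 * FPS
bytes_per_frame = total_bytes / frames
bytes_per_pixel = bytes_per_frame / RAW_PIXELS

print(f"frames recorded:  {frames}")
print(f"budget per frame: {bytes_per_frame / 10**6:.2f} MB")
print(f"budget per pixel: {bytes_per_pixel:.3f} bytes")
```

Under these assumptions the budget works out to roughly 7 MB per nine-sensor frame, well under one byte per pixel, which suggests the images are stored compressed (the post does not name the format).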

The 8 high-resolution lenses are arranged very compactly (the distance between entrance pupils is 29.5mm), which allows for very small parallax. The high-res fish-eye lens is pointed to the sky.

The camera head is 210mm [8.3"] in diameter, is waterproof, and contains 3 Elphel 10353 processor boards and 3 Elphel 10369 extension boards, which provide IDE, SATA, USB, RS232, and other interfaces (only SATA, USB and sync I/Os are used in the Eyesis configuration). Nine sensor boards (10338D) are connected through the three 10359A multiplexer boards that provide temporary storage for the images – all 3 sensors attached to the same 10359A board are triggered simultaneously, but data is transferred to the system boards one at a time.

The camera data storage box also contains the power supply for the camera and hard drives, a Gigabit Ethernet switch and a USB connector (IP68) for the GPS receiver. Dimensions of the box are 280mm x 120mm x 170mm [11" x 4.7" x 6.7"].

Test ride images are coming soon.

August 30, 2010 03:17 PM

Science Commons

University Public Access Policy Whitepaper Part 2

When we published Open Doors and Open Minds, we promised a companion piece that discusses in detail some of the legal considerations that university administrators and university general counsels may wish to consider in adopting a public access policy. I’m happy to say that this is now available. This excellent companion piece, providing a thorough [...]

August 30, 2010 02:10 PM

Open Video Conference

Announcing the Shared Film Festival at OVC

The Open Video Conference is already chock full of panels, talks, and workshops—exploring open technology, the future of mass media, and everything in between.  Today we’re pleased to announce that on both days of the Open Video Conference, the discussion around shared culture and peer-to-peer distribution will continue into the evening with the Shared Film Festival.

The Shared Film Festival at OVC is a showcase for the emerging world of free-to-share films. We’re teaming with our friends at BitTorrent, hand-picking notable films from creators who are experimenting with alternative business models and distribution methods.

Each night following OVC, we’ll screen a short film, a feature length production, and then sit down to a discussion with the filmmakers, learning about the stories behind the films, their production experiences and business strategies. Can you make a living by giving it away?

The marquee feature at the Shared Film Festival is definitely something you won’t want to miss. Check back tomorrow to get a peek at the feature lineup!

The Shared Film Festival is for both creators and audiences, and it’s free to all attendees of the Open Video Conference.

August 30, 2010 01:34 PM

Browse Blogs

Open Contact with Open Compliance Officers

There’s nothing quite like having an urgent issue to pursue with a company – a real thorn in your side – and lacking a name or phone number to contact for follow-up.   (Once upon a time, I reserved a domain name, customerfeedbackplace.com, intending to aggregate all the world’s corporate customer feedback sites in one place for consumer convenience.  But that’s a story for another day.)

August 30, 2010 01:00 PM

Software Freedom Law Center to Announce Opening of Branch in India

The Software Freedom Law Center (SFLC) will announce the opening of its new international organization in India at the upcoming Software Patents and the Commons conference in New Delhi: "Under the direction of founder Mishi Choudhary, the SFLC's India organization will provide reliable advice to FLOSS developers about how to organize, license and protect the freedom of the software they make and distribute."

August 30, 2010 09:41 AM

Should Open Source Communities Avoid Contributor Agreements?

Simon Phipps asks whether open source communities should avoid contributor agreements: "What are "contributor agreements", why do they exist, and are they a good thing? The need often arises from the interaction with open source of certain approaches to business. They serve a need of those approaches, but they can come at a significant cost to the health of the project."

August 30, 2010 09:37 AM

Linked Data Blog Aggregator

A Brief Survey of Ontology Development Methodologies

The Recent Pace of Ontology Development Appears to Have Waned

The development of ontologies goes by the names of ontology engineering or ontology building, and can also be investigated under the rubric of ontology learning. This paper summarizes key papers and links to this topic [18].

For the last twenty years there have been many methods put forward for how to develop ontologies. These methodological activities have actually diminished somewhat in recent years.

The main thrust of the papers listed herein is on domain ontologies, which model particular domains or topic areas. (As opposed to reference, upper or theoretical ontologies, which are more general or encompassing.) Also, little commentary is offered on any of the individual methodologies; please see the referenced papers for more details.

General Surveys

One of the first comprehensive surveys was done by Jones et al. in 1998 [1]. This study began to elucidate common stages and noted there are typically separate stages to produce first an informal description of the ontology and then its formal embodiment in an ontology language. The existence of these two descriptions is an important characteristic of many ontologies, with the informal description often carrying through to the formal description.

The next major survey was done by Corcho et al. in 2003 [2]. This built on the earlier Jones survey and added more recent methods. The survey also characterized the methods by tools and tool readiness.

More recently the work of Simperl and her colleagues has focused on empirical results of ontology costing and related topics. This series has been the richest source of methodology insight in recent years [3, 4, 5, 6]. More on this work is described below.

Though not a survey of methods, one of the more accessible descriptions of ontology building is Noy and McGuinness’ well-known Ontology Development 101 [7]. Also really helpful are Alan Rector’s various lecture slides on ontology building [8].

However, one general observation is that the pace of new methodology development seems to have waned in the past five years or so. This does not appear to be the result of an accepted methodology having emerged.

Some Specific Methodologies

Some of the leading methodologies, presented in rough order from the oldest to newest, are as follows:

Please note that many individual projects also describe their specific methodologies; these are purposefully not included. In addition, Ensan and Du look at some specific ontology frameworks (e.g., PROMPT, OntoLearn, etc.) from a domain-specific perspective [17].

Some Flowcharts

Here is the general methodology as presented in the various Simperl et al. papers [cf. Fig. 1 in 3]:

Ontology Engineering from Simperl et al.

The Corcho et al. survey also presented a general view of the tools plus framework necessary for a complete ontology engineering environment [Fig. 4 from 2]:

Ontology Tools and Framework from Corcho et al.

There are more examples that show ontology development workflows. Here is one again from the Simperl et al. efforts [Fig. 2 in 5]:

Ontology Learning Flowchart from Simperl et al.

However, what is most striking about the review of the literature is the paucity of methodology figures and the generality of those that do exist. From this basis, it is unclear what the degree of use is for real, actionable methods.

Best Practices Observations

The Simperl and Tempich paper [3], besides being a rich source of references, also provides some recommended best practices based on their comparative survey. These are:

General Recommendations

Process Recommendations

Organizational Recommendations

Technological Recommendations

Summary of Observations

This review has not set out to characterize specific methodologies, nor their strengths and weaknesses. Yet the research seems to indicate this state of methodology development in the field:


[1] D.M. Jones, T.J.M. Bench-Capon and P.R.S. Visser, 1998. “Methodologies for Ontology Development,” in Proceedings of the IT and KNOWS Conference of the 15th IFIP World Computer Congress, 1998. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.2437&rep=rep1&type=pdf.
[2] O. Corcho, M. Fernandez and A. Gomez-Perez, 2003. “Methodologies, Tools and Languages for Building Ontologies: Where is the Meeting Point?,” in Data & Knowledge Engineering 46, 2003. See http://www.dia.fi.upm.es/~ocorcho/documents/DKE2003_CorchoEtAl.pdf.
[3] Elena Paslaru Bontas Simperl and Christoph Tempich, 2006. “Ontology Engineering: A Reality Check,” in Proceedings of the 5th International Conference on Ontologies, Databases, and Applications of Semantics ODBASE2006, 2006. See http://citeseerx.ist.psu.edu/icons/pdf.gif;jsessionid=DE3414C0282C76F0EA787A06039941D2.
[4] Elena Paslaru Bontas Simperl, Christoph Tempich, and York Sure, 2006. “ONTOCOM: A Cost Estimation Model for Ontology Engineering,” presented at ISWC 2006; see http://ontocom.ag-nbi.de/docs/iswc2006.pdf.
[5] Elena Simperl, Christoph Tempich and Denny Vrandečić, 2008. “A Methodology for Ontology Learning,” in Frontiers in Artificial Intelligence and Applications 167 from the Proceedings of the 2008 Conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge, pp. 225-249, 2008. See http://wtlab.um.ac.ir/parameters/wtlab/filemanager/resources/Ontology%20Learning/ONTOLOGY%20LEARNING%20AND%20POPULATION%20BRIDGING%20THE%20GAP%20BETWEEN%20TEXT%20AND%20KNOWLEDGE.pdf#page=241.
[6] Elena Simperl, Malgorzata Mochol and Tobias Burger, 2010. “Achieving Maturity: the State of Practice in Ontology Engineering in 2009,” in International Journal of Computer Science and Applications, 7(1), pp. 45 – 65, 2010. See http://www.tmrfindia.org/ijcsa/v7i13.pdf.
[7] Natalya F. Noy and Deborah L. McGuinness, 2001. “Ontology Development 101: A Guide to Creating Your First Ontology,” Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001. See http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[8] See http://www.cs.man.ac.uk/~rector/modules/CS646/Lecture-Handouts/Lect-2-Ontology-building-2007.pdf; http://www.cs.man.ac.uk/~rector/modules/CS646/Lecture-Handouts/Lect-2-Ontology-building-2007.ppt; or http://www.cs.man.ac.uk/~rector/modules/CS646/Lecture-Handouts/Ontology-bulding-2005-Lect-5.ppt.
[9] Stephen L. Reed and Douglas B. Lenat, 2002. “Mapping Ontologies into Cyc,” paper presented at AAAI 2002 Conference Workshop on Ontologies For The Semantic Web, Edmonton, Canada, July 2002. See http://www.cyc.com/doc/white_papers/mapping-ontologies-into-cyc_v31.pdf. Also, as presented by Doug Foxvog, “Ontology Mapping with Cyc,” at WSMO, June 14, 2004; see www.wsmo.org/wsml/papers/presentations/Ontology%20Mapping%20at%20Cycorp.ppt. Also, see Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock, 2007. “Autonomous Classification of Knowledge into an Ontology,” in The 20th International FLAIRS Conference (FLAIRS), Key West, Florida, May 2007. See http://www.cyc.com/doc/white_papers/FLAIRS07-AutoClassificationIntoAnOntology.pdf.
[10] M. Gruninger and M.S. Fox, 1994. “The Design and Evaluation of Ontologies for Enterprise Engineering,” Workshop on Implemented Ontologies, European Conference on Artificial Intelligence 1994, Amsterdam, NL. See http://stl.mie.utoronto.ca/publications/gruninger-onto-ecai94.pdf.
[11] KBSI, 1994. “The IDEF5 Ontology Description Capture Method Overview,” Knowledge Based Systems, Inc. (KBSI) Report, Texas. The report describes the stages of: 1) organizing and scoping; 2) data collection; 3) data analysis; 4) initial ontology development; and 5) ontology refinement and validation. See http://en.wikipedia.org/wiki/IDEF5.
[12] A. Gangemi, G. Steve and F. Giacomelli, 1996. “ONIONS: An Ontological Methodology for Taxonomic Knowledge Integration,” ECAI-96 Workshop on Ontological Engineering, Budapest, August 13th. See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3972&rep=rep1&type=pdf.
[13] The COINS approach was developed by Madnick et al. over the past two decades or so at the MIT Sloan School of Management. See further http://web.mit.edu/smadnick/www/wp/CISL-Sloan%20WP%20spreadsheet.htm for a listing of papers from this program; some are use cases, and some are architecture-related. For the most detailed treatment, see Aykut Firat, 2003. “Information Integration Using Contextual Knowledge and Ontology Merging,” Ph.D. Thesis for the Sloan School of Management, MIT, 151 pp. See http://www.mit.edu/~bgrosof/paps/phd-thesis-aykut-firat.pdf.
[14] M. Fernandez, A. Gomez-Perez and N. Juristo, 1997. “METHONTOLOGY: From Ontological Art Towards Ontological Engineering,” AAAI-97 Spring Symposium on Ontological Engineering, Stanford University, March 24-26th, 1997.
[15] York Sure, Christoph Tempich and Denny Vrandecic, 2006. “Ontology Engineering Methodologies,” in Semantic Web Technologies: Trends and Research in Ontology-based Systems, pp. 171-187, Wiley. The general phases of the approach are: 1) feasibility study; 2) kickoff; 3) refinement; 4) evaluation; and 5) application and evolution.
[16] A. De Nicola, M. Missikoff, R. Navigli, 2009. “A Software Engineering Approach to Ontology Building,” Information Systems, 34(2), Elsevier, 2009, pp. 258-275.
[17] Faezeh Ensan and Weichang Du, 2007. “Towards Domain-Centric Ontology Development and Maintenance Frameworks”; see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.8915&rep=rep1&type=pdf.
[18] This document is permanently archived on the OpenStructs TechWiki. This document is part of a current series on ontology development and tools to be completed over the coming weeks.

August 30, 2010 05:53 AM

August 29, 2010

Browse Blogs

Let a thousand flowers bloom...or be trampled under foot?

It's 4 a.m., dark outside, the phone rings, your mobile goes off, you're in a convention hotel an ocean away from home in a different time zone. The server's fallen over, you need to bounce it remotely from a thousand miles away. You have to take the server down and bring it back up then restart the application. Good job the hotel has a connection and you have a signal.

August 29, 2010 08:32 PM

EFF.org Updates

Good News: Security Researcher Released on Bail

Hari Prasad, the Indian security researcher arrested for allegedly stealing an electronic voting machine, has been released on bail.

Earlier this year, an anonymous source gave the machine to Prasad and a team of researchers, who discovered critical security flaws. Under questioning by authorities last weekend, Prasad refused to divulge the identity of the source who gave them the machine. He was then arrested and reportedly charged with theft and trespass on the theory that he stole the machine himself.

According to the Indian news agency PTI, the magistrate who released Prasad on bail noted that "no offence was disclosed with Hari Prasad's arrest and even if it was assumed that [the electronic voting machine] was stolen it appears that there was no dishonest intention on his part...he was trying to show how [electronic voting] machines can be tampered with."

The court reportedly also asked the Election Commission of India to confirm or disprove Prasad's claim that the country's electronic voting machines can be compromised. If Prasad's claims are false, action could be taken against him, the magistrate said.

August 29, 2010 01:00 AM

August 28, 2010

tesseract-ocr Google Group

Init() returning -1

When using the following code:

tesseract::TessBaseAPI tess;
int result = tess.Init(argv[0], lang);

Init will return -1, indicating that something went wrong. I know the
tessdata is in the right location (if I move it I get an actual error
message), but I can't seem to figure out why Init() is not working

August 28, 2010 08:54 PM

Koha Library Software Community

Book chapters proposals about Koha

Dear friends:

We are working on the development of a reference book on library automation and OPAC 2.0, entitled “Library Automation and OPAC 2.0: Information Access and Services in the 2.0 Landscape.” This book will be published by IGI Global in 2011. We believe that you may be interested in participating, so we encourage you to submit proposals about Koha developments and case studies in accordance with the requirements and the thematic areas set out in
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=10166&copyownerid=12444
or
http://igi-global.com/AuthorsEditors/AuthorEditorResources/CallForBookChapters/CallForChapterDetails.aspx?CallForContentId=4ae6e1c4-904b-4d83-8a4a-9ece2171fccb

Thanks for your attention,

Jesús

August 28, 2010 08:47 PM

tesseract-ocr Google Group

Tesseract Training Problem (under Mac)

Hi All,

Currently I am trying to use Tesseract(2.04) to recognize my own data,
with Mac OS X Snow Leopard.
I find this [link]
and I am trying to follow this tutorial.
My questions are:
1. I already have my train.tif ready, but I am not sure where I should

August 28, 2010 06:45 AM

August 27, 2010

Planet Drumbeat

The reviews are in: Drumbeat’s “Popcorn” is tasty


Brett Gaylor (left) and the WebMadeMovies community released their first public demo of "popcorn" last week -- and the reviews are pretty sweet.

read more

August 27, 2010 07:28 PM

tesseract-ocr Google Group

i want to add my own language

how can i contribute to tesseract-ocr ? i wish to add my Bengali
language to the OCR? or does it already exist? If so then plz tell me
how to use that.

If you show me some way to train tesseract for Bengali then it would
be great

August 27, 2010 03:47 PM

natural language processing blog

Calibrating Reviews and Ratings

NIPS decisions are going out soon, and then we're done with submitting and reviewing for a blessed few months. Except for journals, of course.

If you're not interested in paper reviews, but are interested in sentiment analysis, please skip the first two paragraphs :).

One thing that anyone who has ever area chaired, or probably even ever reviewed, has noticed is that different people have different "baseline" ratings. Conferences try to adjust for this; for instance, NIPS defines its 1-10 rating scale as something like "8 = Top 50% of papers accepted to NIPS." Even so, some people are just harsher than others in scoring, and it seems like the area chair's job to calibrate for this. (For instance, I know I tend to be fairly harsh -- I probably only give one 5 (out of 5) for every ten papers I review, and I probably give two or three 1s in the same size batch. I have friends who never give a one -- except in the case of something just being wrong -- and often give 5s. Perhaps I should be nicer; I know CS tends to be harder on itself than other fields.) As an aside, this is one reason why I'm generally in favor of fewer reviewers and more reviews per reviewer: it allows easier calibration.
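One simple way to picture this kind of calibration is to standardize each reviewer's scores against that reviewer's own mean and spread. The sketch below is only an illustration, not how NIPS or any area chair actually works; the reviewer names, paper IDs and scores are invented.

```python
from statistics import mean, pstdev

def calibrate(scores_by_reviewer):
    """Return per-reviewer z-scores: (score - reviewer mean) / reviewer std."""
    calibrated = {}
    for reviewer, scores in scores_by_reviewer.items():
        mu = mean(scores.values())
        sigma = pstdev(scores.values())
        calibrated[reviewer] = {
            # A reviewer who gives every paper the same score carries no signal.
            paper: 0.0 if sigma == 0 else (s - mu) / sigma
            for paper, s in scores.items()
        }
    return calibrated

# Invented scores on a 1-5 scale: one harsh reviewer, one generous reviewer.
raw = {
    "harsh":    {"p1": 1, "p2": 2, "p3": 3},
    "generous": {"p1": 3, "p2": 4, "p3": 5},
}
z = calibrate(raw)
print(z["harsh"]["p3"], z["generous"]["p3"])
```

After standardizing, the harsh reviewer's 3 and the generous reviewer's 5 land on the same value. The estimates of each reviewer's mean and spread only become trustworthy with enough reviews per reviewer, which is the point made in the aside above.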

There's also the issue of areas. Some areas simply seem to be harder to get papers into than others (which can lead to some gaming of the system). For instance, if I have a "new machine learning technique applied to parsing," do I want it reviewed by parsing people or machine learning people? How do you calibrate across areas, other than by some form of affirmative action for less-represented areas?

A similar phenomenon occurs in sentiment analysis, as was pointed out to me at ACL this year by Franz Och. The example he gives is very nice. If you go to TripAdvisor and look up The French Laundry, which is definitely one of the best restaurants in the U.S. (some people say the best), you'll see that it got 4.0/5.0 stars, and a 79% recommendation. On the other hand, if you look up In'N'Out Burger, an LA-based burger chain (which, having grown up in LA, was admittedly one of my favorite places to eat in high school, back when I ate stuff like that) you see another 4.0/5.0 stars and a 95% recommendation.

So now, we train a machine learning system to predict that the rating for The French Laundry is 79% and In'N'Out Burger is 95%. And we expect this to work?!

Probably the main issue here is calibrating for expectations. As a teacher, I've figured out quickly that managing student expectations is a big part of getting good teaching reviews. If you go to In'N'Out, and have expectations for a Big Mac, you'll be pleasantly surprised. If you go to The French Laundry with expectations of having a meal worth selling your soul, your children's souls, etc., for, then you'll probably be disappointed (though I can't really say: I've never been).

One way that a similar problem has been dealt with on Hotels.com is that they'll show you ratings for the hotel you're looking at, and statistics of ratings for other hotels within a 10 mile radius (or something). You could do something similar for restaurants, though distance probably isn't the right categorization: maybe price. For "$", In'N'Out is probably near the top, and for "$$$$" The French Laundry probably is.
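The "compare within a category" idea can be sketched as a rank within each price tier. This is a rough illustration under assumptions: the two recommendation percentages are the ones quoted above, while the other restaurants and their scores are made up purely to fill out each tier.

```python
from collections import defaultdict

def percentile_within_group(items):
    """items: list of (name, group, score) tuples.

    Returns name -> fraction of same-group items scoring at or below it.
    """
    by_group = defaultdict(list)
    for _name, group, score in items:
        by_group[group].append(score)
    return {
        name: sum(s <= score for s in by_group[group]) / len(by_group[group])
        for name, group, score in items
    }

ratings = [
    ("In'N'Out Burger", "$", 95),
    ("Burger A", "$", 80),            # invented peer
    ("Burger B", "$", 70),            # invented peer
    ("The French Laundry", "$$$$", 79),
    ("Bistro C", "$$$$", 60),         # invented peer
    ("Bistro D", "$$$$", 75),         # invented peer
]
ranks = percentile_within_group(ratings)
print(ranks["In'N'Out Burger"], ranks["The French Laundry"])
```

With these invented peers both restaurants sit at the top of their own tier, even though their raw recommendation percentages differ by 16 points. Price is only one possible grouping; the same function works with any other "situation" label.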

(Anticipating comments, I don't think this is just an "aspect" issue. I don't care how bad your palate is, even just considering the "quality of food" aspect, Laundry has to trump In'N'Out by a large margin.)

I think the problem is that in all of these cases -- papers, restaurants, hotels -- and others (movies, books, etc.) there simply isn't a total order on the "quality" of the objects you're looking at. (For instance, as soon as a book becomes a best seller, or is advocated by Oprah, I am probably less likely to read it.) There is maybe a situation-dependent order, and the distance to hotel, or "$" rating, or area classes are heuristics for describing this "situation." But without knowing the situation, or having a way to approximate it, I worry that we might be entering a garbage-in-garbage-out scenario here.

August 27, 2010 12:14 PM

Browse Blogs

Sun RPC is finally free software

Tom Callaway reports that all of the Sun RPC code (which is part of glibc) has finally been relicensed under a free software license: "So, we restarted the effort with Oracle, and on August 18, 2010, Wim Coekaerts, on behalf of Oracle America, gave permission for the remaining files that we knew about under the Sun RPC license (netkit-rusers, krb5, and glibc) to be relicensed under the 3 clause BSD license."

August 27, 2010 09:24 AM

Planet Drumbeat

Mark Surman: 10 days of freedom in Barcelona

I just had a fun breakfast with Simona Levi from ExGAE / oXcars. What I learned: Learning, Freedom and the Web isn’t the only interesting thing happening in Barcelona two months from now. There are at least seven open internet / open education / free culture events happening over the span of 10 days.

Between October 28 and November 6, Barcelona will host: the 2000 person oXcars free culture festival; the Free Culture Forum; the P2PU summit; Open Education 2010; Drumbeat Learning, Freedom and the Web; an open ed play day in the Raval; and possibly a Communia meeting. Phew.

We should find a way to shout and promote all of this. Barcelona will be the global epicentre of free culture / open education / open web stuff for 10 days this fall! We need a phrase or a name for it. ‘10 days of freedom‘? ‘Barcelona abierto‘? Not sure, but Simona and I agreed to call out for suggestions. If you have ideas, post them below.

PS. goes w/o saying -> book an extended trip to Barcelona if you can.

Filed under: drumbeat, education, mozilla, open, openeverything

August 27, 2010 09:04 AM

calibre Changelog

calibre 0.7.16

New Features

Bug Fixes

August 27, 2010 06:00 AM

EFF.org Updates

Colbert's Word: Control-Self-Delete

Just a few weeks after his interview with EFF Legal Director Cindy Cohn, American hero Stephen Colbert has returned to the subject of digital rights. And in his show on Tuesday, he came up with a great solution to the problem of privacy and online social networks: Control-Self-Delete.

[Video: The Colbert Report, The Word - Control-Self-Delete (www.colbertnation.com)]

As Colbert suggests, the CEOs of Google and Facebook can be astonishingly tone deaf when it comes to the question of the privacy of their customers. As these experts in social media ought to know, the fact that a person chooses to share some information about themselves online is no indication that they prefer to share everything, nor does it indicate that control of personal data is not something they care deeply about. Study after study has shown the opposite to be true: users care about privacy, and demand control of their own data.

We like Colbert's basic point, saved for the end of this clip: if anyone should change their behavior to address the problem of online privacy, it isn't young people who have uploaded some racy pics — it's the companies that have made themselves the guardians of our personal data.

August 27, 2010 12:47 AM

August 26, 2010

EFF.org Updates

Facebook Should Stop Censoring Marijuana Legalization Campaign Ads

Facebook is facing down another embarrassing episode of censorship this week after refusing to show ads submitted by the Just Say Now marijuana legalization campaign. The gag is an important reminder that social networks like Facebook, while useful, interesting, and pretty, are "walled gardens" with overseers whose interests can override free speech, open communication, and in this case, essential political debate. (In this they have something in common with Apple.)

Most recently, Facebook was caught censoring mentions of Power.com, an online tool designed to help users collect their information from Facebook to facilitate migration to other social networks. To this day, users are still blocked from sending messages or posting status updates containing the word "Power.com," preventing users from spreading the word about a convenient way to "make the move" to Orkut, or LinkedIn, or any other social networking service that may crop up to compete. The block even stopped law professor Eric Goldman from commenting on Facebook’s lawsuit against Power.com (Disclosure: EFF filed an amicus brief in support of Power in that case).

Facebook's censorship for anticompetitive reasons is petty and lame to be sure, but silencing Just Say Now's marijuana legalization ad campaign is even worse. Voters in various districts nationwide will have to make important political decisions about marijuana this year (California's Proposition 19 is one example). Facebook's decision, reportedly an attempt to be consistent with its ad policies restricting smoking and/or marijuana-related content, is instead primarily silencing an important, motivated voice in a politically significant debate.

Facebook should lift the ban and show Just Say Now's political ads. For better or worse, Facebook has become an important means of communication and organization for candidates and political campaigns. In this role, Facebook functions best as a neutral platform, hosting the debate without entering it. Whether or not Facebook wants to restrict depictions of smoking in commercial ads, it should not prohibit the open and robust political debate central to the value and promise of the Internet.

August 26, 2010 11:08 PM

The Freemap blog, revisited

OpenTrailView: Route making

A significant update since my last post: you can now create and modify routes on OTV’s main page. As you’re probably aware, the key thing about OTV is to allow contributors to connect photos together, to make a route of interlinked photos which end-users will be able to walk along to create a StreetView-like experience. The essentials of this are now done – you can create a new route (select “New route” on the main map page) by connecting together existing photos, and you can also add photos to an existing route by selecting “Move” on the main page and dragging the chosen photo onto a route. More information is on the Howto page.

So do have a go contributing some of your own photos and making routes. Being in development, the odd bug could well come up so do let me know if you’re having problems.

As already suggested, the next thing will be to start working on a prototype viewer for end-users. However, a range of other things (work commitments, moving house and a holiday) are going to occupy most of my time for the next three weeks or so, so it *may* be some time before the next update. But don’t go away: in the autumn and winter months I’ll hopefully be doing a fair bit of OTV development!

August 26, 2010 10:22 PM

Planet Drumbeat

Drumbeat Festival: registration is now open!

Registration for the 2010 Mozilla Drumbeat Festival is now open! Join teachers, learners and technologists from around the world November 3 – 5 in Barcelona to teach, hack, shape and invent the future of education and the web.

read more

August 26, 2010 06:06 PM

BioMed Central

Facilitating standardized genome annotations

Faster and more reliable genome sequencing has meant that the number of personal genome sequences available is increasing rapidly, yet the analysis of personal human genome sequences has been hampered by the lack of a standard file format to facilitate comparative analyses. In this month’s issue of Genome Biology, Karen Eilbeck and colleagues present GVF, the Genome Variation Format. GVF is an extension of the already widely-used GFF3 standard for describing genome annotations.  The utility of GVF is demonstrated by the analysis of the first 10 publicly-available personal human genomes. The authors term this dataset "10Gen" and hope that this will become the standard reference set to facilitate the analysis of future personal genomes.


GVF and the 10Gen dataset are available at http://www.sequenceontology.org/gvf.html and are also included with the article published on the Genome Biology website here.
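
Since GVF extends GFF3, a GVF record can be read with the same logic as a GFF3 line: nine tab-separated columns, with the last column holding semicolon-separated key=value attributes. The sketch below is only an illustration under that assumption. The sample record is made up, and the attribute names Variant_seq and Reference_seq reflect my reading of the GVF description, so check them against the specification before relying on them.

```python
from urllib.parse import unquote

# The nine GFF3 columns, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_record(line):
    """Parse one GFF3-style feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes are key=value pairs; a value may be a comma-separated list,
    # and reserved characters are percent-encoded.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


# Made-up example record (not taken from the 10Gen dataset).
example = (
    "chr16\tsamtools\tSNV\t49291141\t49291141\t.\t+\t.\t"
    "ID=ID_1;Variant_seq=A,G;Reference_seq=G;"
)
record = parse_record(example)
print(record["type"], record["start"], record["attributes"]["Variant_seq"])
```

Keeping every attribute value as a list avoids special-casing attributes that may carry more than one value, such as a heterozygous variant with two observed sequences.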

August 26, 2010 04:27 PM

Public Knowledge - Blogging, Events, and Action Alerts

Public Knowledge Statement on GAO Cellular Industry Report

For Immediate Release:  August 26, 2010

Earlier today, the Government Accountability Office released a report, “Enhanced Data Collection Could Help FCC Better Monitor Competition in the Wireless Industry.”  A copy of the report is here.

The following statement is attributed to Gigi B. Sohn, president and co-founder of Public Knowledge:

“Today’s GAO report adds more evidence to the argument that any rules governing an open Internet should apply to the wireless sector as well as to the wired.  The report paints a disturbing picture of an industry in which the top four carriers control 90 percent of the market, and industry consolidation is strangling smaller, regional carriers.

read more

August 26, 2010 03:46 PM

tesseract-ocr Google Group

math formulas

Hi,

I need an open OCR library which is able to scan complex printed math
formulas (for example some formulas which were generated via LaTeX). I
want to get some LaTeX-like output (or just some AST-like data).

Can Tesseract do this? Is there something like this already? Or are
current OCR techniques only able to parse line-oriented text?

August 26, 2010 03:27 PM

BioMed Central

Arthritis Research & Therapy – published online only from 2011

The contents of the last regular print edition of Arthritis Research & Therapy will be finalized at the end of 2010, which marks the latest evolution of the journal and reflects the undeniable shift to electronic communication of science in the past decade. The Editors-in-Chief, Prof Peter Lipsky and Prof Sir Ravinder Maini, discuss in an editorial the reasons behind, and opportunities presented by, the journal’s decision to become an exclusively online publication.

Although BioMed Central was the first commercial open access publisher – and the Internet is fundamental to open access  – BioMed Central has continued to publish a small but decreasing number of print journals, until now.

Arthritis Research & Therapy, first published by Current Science Ltd in 1999, was conceived with a strategy to take full advantage of the benefits of online publishing. It has previously made innovative decisions in the rheumatology community, such as making all research open access and, latterly, publishing only the abstracts of research articles in print to help remove limitations to article length and to reduce publication times. This move to online-only publication will benefit readers, as they will see more cutting-edge review articles, and authors, who will no longer be faced with the choice of paying for color figures in non-research articles, as well as further limiting the environmental impact of the journal.

We expect that more innovations in rheumatology research publishing will be facilitated by the journal’s transfer to BioMed Central’s newly-designed journal platform in the coming months, and we will be communicating with the journal’s registered users via an online survey to establish what other online features would most benefit this rapidly-changing field.

By innovation and investment in new services for our readers, authors and reviewers we hope the journal will continue to readily drive and adapt to the change (or is that disruption?) the Internet has caused to publishing arthritis and rheumatic autoimmune disease research.

August 26, 2010 02:12 PM

Global Text Project

Dr. Jim Feher, GTP textbook author, talks about working with Global Text Project to create and publish an open textbook

Associate Editor: Tell us your perspective on creating your textbook, Digital Logic with Laboratory Exercises, with Global Text Project.

Dr. Jim Feher: What can I say, I may be biased, but I think the Global Text Project (GTP) is just a fantastic organization. I've always been a huge proponent of open source software and the free exchange of ideas that is made possible by using the Creative

August 26, 2010 12:26 PM

August 25, 2010

EFF.org Updates

Musopen Wants to Give Classical Music to the Public Domain

Music lovers take note: the classical music archive Musopen needs your help to liberate some classic symphonies from copyright entanglement. Musopen is looking to solve a difficult problem: while symphonies written by Beethoven, Brahms, Sibelius, and Tchaikovsky are in the public domain, many modern arrangements and sound recordings of those works are copyrighted. That means that even after purchasing a CD or collection of MP3s of this music, you may not be able to freely exercise all the rights you'd associate with works in the public domain, like sharing the music using a peer-to-peer network or using the music in a film project.

To fix this, Musopen is asking backers to join an effort to hire a world-class orchestra to record sublime digital performances of the symphonies by the composers mentioned above. Musopen will then relinquish all rights to the recordings, giving the public the freedom to experience these works in full: to download, share, derive, and remix without limit. The fundraising campaign is taking place on Kickstarter, a site where users can pledge money to various creative projects. (Users pledge an amount towards a project, but the money doesn't actually go to the project unless the specified funding goal is reached. Kickstarter has a great explanation for their "all-or-nothing funding" design on their FAQ.)

It’s too bad such seminal, cultural works have been effectively buried by copyright interests, despite their age, ubiquity, and importance. (Note that problems like this are exacerbated by discrepancies in international laws that create different "public domains" that copyright owners can exploit to stop online archives.) The Musopen campaign presents a creative solution that could help ensure that such essential music is preserved and shared for generations to come. Music lovers and copyfighters: vote with your wallet and support Musopen's work!

August 25, 2010 08:33 PM

EFF's Cindy Cohn Wins IP Vanguard Award from State Bar of California

We're pleased to announce that EFF's Legal Director, Cindy Cohn, has won a 2010 Intellectual Property Institute Vanguard Award from the State Bar of California.

Cindy was one of four legal professionals honored for spearheading new developments in the world of intellectual property. We're proud to see the work that we do to preserve balance in copyright, trademark, and patent law recognized, and we'll continue to fight for the fans, the tinkerers, independent journalists and bloggers, and consumers.

The 2nd Annual IP Vanguard Award will be presented to Cindy during an awards luncheon on Friday, October 29, at the 2010 Annual IP Institute meeting in Napa, California.

August 25, 2010 08:33 PM

EFF Seeks to Help Righthaven Defendants

The Electronic Frontier Foundation is seeking to assist defendants in the Righthaven copyright troll lawsuits. Righthaven, founded in March of 2010, files hundreds of copyright infringement lawsuits on behalf of newspaper publishers against bloggers who make use of news content without permission. To that end, Righthaven searches the internet for stories and parts of stories from the newspapers that they represent. Once they find content that has been re-published, Righthaven purchases the copyright to the article and sues the owner of the blog.

Just like the US Copyright Group shakedowns, and the RIAA shakedowns of the recent past, Righthaven relies on the threat of enormous statutory damages associated with the Copyright Act to scare defendants, often individual bloggers operating non-commercial websites, into a quick settlement, reportedly ranging from two to five thousand dollars. The Righthaven lawsuits are of particular concern because they sometimes target the operators of political websites who re-publish newspaper stories, chilling political speech. Righthaven has also targeted the newspaper's source for the very articles allegedly infringed.

If you are the target of a Righthaven lawsuit and need representation, please contact Eva Galperin at eva@eff.org. Please understand that we have a relatively small number of very hard-working attorneys, so we do not have the resources to defend everyone who asks, no matter how deserving. However, if we cannot represent you directly, we will make every effort to put you in touch with attorneys who can.

August 25, 2010 06:04 PM

if:book

open peer review

The New York Times ran a front-page story yesterday about open peer review, featuring an experiment conducted by MediaCommons for The Shakespeare Quarterly using CommentPress. The article is here and the experiment itself is here. Both MediaCommons and CommentPress were born at the institute; it's exciting to see our efforts get such prominent notice.

August 25, 2010 04:05 PM

The WebM Open Media Project Blog

HTML5Rocks <video> tag tutorial

The HTML5Rocks team has published a tutorial on the HTML5 <video> tag. It includes clear explanations of the video formats supported by the various browsers and code snippets for supporting each in your pages. Check it out.

August 25, 2010 03:38 PM

August 24, 2010

MusicBrainz Blog

MusixMatch becomes our customer!

MusixMatch, a new lyrics start-up company in Bologna, Italy just signed up to be our latest customer!

MusixMatch aims to license lyrics from all over the world (and not just the usual US/western Europe suspects) and aims to make accessing and licensing lyrics much easier than it currently is. I spent three days in June with the whole MusixMatch team to figure out how MusicBrainz and MusixMatch can work together, and we found a number of interesting ways in which we can help each other.

MusixMatch needs to match lyric publisher data to music metadata like the data in MusicBrainz. This matching will enable MusixMatch to instantly license lyrics to anyone who speaks MBIDs or anyone who can match their data to our metadata. And MusicBrainz will benefit from this relationship by being able to show lyrics on MusicBrainz pages, which enriches MusicBrainz and takes us one step further on our road to being a comprehensive music encyclopedia.

However, it should be noted that MusicBrainz is not getting into the lyrics business. We will never store lyrics in our database since those are copyrighted! We plan to fetch lyrics from the MusixMatch servers to display them on our site. MusixMatch, however, plans to offer our music metadata and lyrics in a package deal once we’ve matched our data and have lyric support on musicbrainz.org.

All of this lyrics work will come after we’ve shipped NGS; until NGS we will not add any new features! We are really keeping our focus on delivering NGS as soon as we’re happy with its stability.

August 24, 2010 10:21 PM

Track level Advanced Relationships for NGS

As you may know, in our Next Generation Schema release we are including support for musical Works. Our definition of a Work is a musical composition that will at some point be performed and possibly recorded, in which case it will become a Recording. In the current MusicBrainz implementation we do not have the concept of a Work and a lot of the Advanced Relationships (ARs) we have are muddled between the concept of a Work or a Recording.

This left us with the tricky task of reviewing all track level ARs and prying apart which ARs should be moved to Works and which ones to Recordings. Or both! To accomplish this task, Brianfreud had compiled a list of open issues, which Ian Corvidae has adopted and nurtured. Today we convened an IRC meeting with Nikki, Pete Marsh from the BBC, Ian and myself. If you’re interested in how we reached the decisions we did, please take a look at the chatlog.

Our decisions have been captured in this wiki page — please take a look at it and see if we’ve missed anything or if there is anything you disagree with. If we do not hear any feedback on this topic, we will change our NGS data conversion script to convert the data as decided in this page.

Thanks to Ian, Pete and Nikki for your help in this meeting! And big thanks also go to Murdos for all of your help in steering me towards getting all Works related issues on to the table!

August 24, 2010 10:20 PM

Public Knowledge - Blogging, Events, and Action Alerts

Verizon Defense of Veroogle Plan Falls Short

Tom Tauke, Verizon’s erudite executive vice president for public affairs, made a valiant attempt the other day to try to salvage the policy deal his company made with Google.  In a speech at the Technology Policy Institute’s telecom forum in Aspen, he brought out arguments old and new to argue why it was that an agreement forged between two big companies to their benefit should be accepted.

read more

August 24, 2010 08:56 PM

BioMed Central

Melatonin therapy effective in treating primary insomnia

Insomnia is a highly prevalent condition, with up to a third of the general adult populace thought to suffer from insomnia at some time. Insomnia is generally associated with a negative impact on day-to-day functioning and has been noted to have co-morbid associations with a variety of psychiatric conditions.

Melatonin, an endogenous sleep-regulating hormone, has been mooted as a potential therapy for this debilitating condition. Endogenous melatonin production is known to decrease as a person ages, so it has been hypothesised that treatment with this hormone may be efficacious in treating insomnia in the elderly population. However, results from studies have often proved contentious, with a lack of consistency in the results seen in differing age groups exposed to melatonin therapy.

Results from a recently published randomized controlled trial in BMC Medicine have now shed new light on this controversial subject. Wade et al. examined the use of prolonged-release melatonin (PRM) in sufferers of primary insomnia across a wide range of ages. Their results showed that PRM is particularly effective and well tolerated in patients aged 65 years and over, with the treatment response increasing and being sustained over a 6-month period.

If you wish to learn more about this fascinating result and an array of other high impact articles visit the BMC Medicine website.

August 24, 2010 09:53 AM

EFF.org Updates

Jury Invalidates One of EFF's 'Most Wanted' Patents

Good news in the fight against bad software patents: a jury in the Eastern District of Texas recently found the Firepond/Polaris patent (U.S. Patent No. 6,411,947) invalid. This patent was on EFF's "Most Wanted" list, targeted because it claimed nothing more than a system using natural language processing to respond to customers' online inquiries by email.

EFF was not involved in this case, in which Bright Response, LLC (the technical owner of the patent) sued Google, Inc., Yahoo!, Inc. and eight other companies, alleging that Google's AdWords and Yahoo!'s Sponsored Search infringe the Firepond/Polaris patent. The jury found three of the patent's claims invalid based on the public use bar, obviousness, and lack of written description. The jury also found that neither Google nor Yahoo! infringed those claims. Finally, the jury found the entire patent invalid due to improper inventorship.

In addition to the jury's findings, the Patent and Trademark Office is nearing completion of a reexamination of the patent, instituted by Google, that narrows the scope of that patent's claims.

"This is a great outcome and good news for people and developers who create new products related to customer service or email," said Patrick King, one of the attorneys assisting EFF on this matter.

Because the court has not yet entered a final judgment, Bright Response could still, in theory, attempt to prohibit others from using the basic natural language processing technology in its patent. EFF is on the lookout for this threatening behavior, so please make sure to let us know if you hear of any. EFF will continue to monitor this case, and the corresponding reexam, and will take action as necessary to fight any additional efforts to use the Firepond/Polaris patent to quash competition and hurt innovation.

"We are still waiting for the court case to finish up and to see if Bright Response will appeal the decision. If any of the patent is still alive after that, we will do whatever we can to invalidate it, and allow competitors to use this simple technology, which was well known prior to the patent filing," said Gina M. Steele, another attorney assisting EFF with this matter.

The Firepond/Polaris patent was one of the ten original Top Ten Patents targeted by EFF’s Patent Busting Project, which combats the chilling effects of bad patents on the public and consumer interests. So far nine patents targeted by EFF have been busted, invalidated, narrowed, or had a reexamination granted by the Patent Office.

August 24, 2010 12:36 AM

August 23, 2010

EFF.org Updates

Steve Jobs Is Watching You: Apple Seeking to Patent Spyware

It looks like Apple, Inc., is exploring a new business opportunity: spyware and what we're calling "traitorware." While users were celebrating the new jailbreaking and unlocking exemptions, Apple was quietly preparing to apply for a patent on technology that, among other things, would allow Apple to identify and punish users who take advantage of those exemptions or otherwise tinker with their devices. This patent application does nothing short of providing a roadmap for how Apple can — and presumably will — spy on its customers and control the way its customers use Apple products. As Sony-BMG learned, spying on your customers is bad for business. And the kind of spying enabled here is especially creepy — it's not just spyware, it's "traitorware," since it is designed to allow Apple to retaliate against you if you do something Apple doesn't like.

Essentially, Apple's patent provides for a device to investigate a user's identity, ostensibly to determine if and when that user is "unauthorized," or, in other words, whether the device has been stolen. More specifically, the technology would allow Apple to record the voice of the device's user, take a photo of the device user's current location or even detect and record the heartbeat of the device's user. Once an unauthorized user is identified, Apple could wipe the device and remotely store the user's "sensitive data." Apple's patent application suggests it may use the technology not just to limit "unauthorized" uses of its phones but also to shut down the phone if and when it has been stolen.

However, Apple's new technology would do much more. This patented device enables Apple to secretly collect, store and potentially use sensitive biometric information about you. This is dangerous in two ways: First, it is far more than what is needed just to protect you against a lost or stolen phone. It's extremely privacy-invasive and it puts you at great risk if Apple's data on you are compromised. But it's not only the biometric data that are a concern. Second, Apple's technology includes various types of usage monitoring — also very privacy-invasive. This patented process could be used to retaliate against you if you jailbreak or tinker with your device in ways that Apple views as "unauthorized" even if it is perfectly legal under copyright law.

Here's a sample of the kinds of information Apple plans to collect:

In other words, Apple will know who you are, where you are, and what you are doing and saying and even how fast your heart is beating. In some embodiments of Apple's "invention," this information "can be gathered every time the electronic device is turned on, unlocked, or used." When an "unauthorized use" is detected, Apple can contact a "responsible party." A "responsible party" may be the device's owner; it may also be "proper authorities or the police."

Apple does not explain what it will do with all of this collected information on its users, how long it will maintain this information, how it will use this information, or if it will share this information with other third parties. We know based on long experience that if Apple collects this information, law enforcement will come for it, and may even order Apple to turn it on for reasons other than simply returning a lost phone to its owner.

This patent is downright creepy and invasive — certainly far more than would be needed to respond to the possible loss of a phone. Spyware, and its new cousin traitorware, will hurt customers and companies alike — Apple should shelve this idea before it backfires on both it and its customers.

August 23, 2010 11:55 PM

UPDATED: Security Researcher Arrested for Refusing to Disclose Anonymous Source

An Indian computer scientist was arrested this weekend when he refused to disclose an anonymous source who provided an electronic voting machine to a team of security researchers.

Hari Prasad is the managing director of Netindia Ltd., an Indian research and development firm. He and other researchers have long questioned the security of India's paperless electronic voting machines. Despite repeated reports of election irregularities and concerns about fraud, the Election Commission of India insists that the machines are tamper-proof.

In 2009, the commission publicly challenged Prasad to show that India's voting machines could be compromised, but refused to give him access to the machines to perform a review. Earlier this year, an anonymous source provided an Indian voting machine to a research team led by Prasad, Alex Halderman, and Rop Gonggrijp. The team exposed security flaws that could allow an attacker to change election results and compromise ballot secrecy. They published a paper detailing their findings, which you can read here.

According to Halderman, Prasad was questioned Saturday morning at his home in Hyderabad by authorities who wanted to know the identity of the source who gave the voting machine to the research team. Prasad was ultimately arrested and taken to Mumbai, though he reportedly hadn't been charged with a crime.

This turn of events is deeply troubling. Prasad is a respected researcher who helped to discover a critical flaw in India's voting system. He and his fellow researchers would never have been able to document the weaknesses in India's voting machines without the help of their anonymous source. This is precisely why anonymity is important: it allows people to make important contributions to the public dialogue without fear of retribution.

The Election Commission of India should have given researchers access to the voting machines in the first place. Rather than attempting to persecute Prasad and the anonymous source, the government should be focusing its attention and resources on the real problem: electronic voting machines with no mechanism for accountability.

UPDATE: According to the Times of India and Reuters, Prasad has been charged in connection with the alleged theft of the voting machine studied by the research team. He has been remanded to police custody until Thursday, August 26.

August 23, 2010 10:50 PM

BioMed Central

Does genetic test allow prediction of patients’ response to tamoxifen?

Various studies have suggested that a genetic test for the efficacy of the commonly used breast cancer drug, tamoxifen, is an effective predictor of how patients will respond to the drug. Tamoxifen undergoes metabolism upon oral administration, and it is widely accepted that the majority of the anti-proliferative effects of tamoxifen occur via its active metabolites. The CYP2D6 gene plays an important role in these metabolic pathways, and a genetic test is available which establishes which variant of the CYP2D6 gene the patient has. Some experts recommend that this test should be used in clinical practice, particularly in the case of postmenopausal women.

Research published in Breast Cancer Research sheds new light on the matter. The study looked at 6640 breast cancer patients from the United Kingdom and evaluated the association between genotype and breast cancer specific survival (BCSS), finding weak evidence that the poor-metaboliser variant, CYP2D6*6, is associated with decreased BCSS. This suggests that the use of this test in a clinical setting should be avoided until larger studies confirming any associations are available.

There are currently 500,000 women in the U.S.A. taking tamoxifen, so this outcome has the potential to affect hundreds of thousands of people. This fresh evidence reflects recent doubts about the test, as an editorial published recently in the Journal of Clinical Oncology stated that "routine use should await more reliable evidence from well-designed studies."

Anita Bock
Assistant Editor - Breast Cancer Research

August 23, 2010 10:56 AM

natural language processing blog

Finite State NLP with Unlabeled Data on Both Sides

(Can you tell, by the recent frequency of posts, that I'm trying not to work on getting ready for classes next week?)

[This post is based partially on some conversations with Kevin Duh, though not in the finite state models formalism.]

The finite state machine approach to NLP is very appealing (I mean both string and tree automata) because you get to build little things in isolation and then chain them together in cool ways. Kevin Knight has a great slide about how to put these things together that I can't seem to find right now, but trust me that it's awesome, especially when he explains it to you :).

The other thing that's cool about them is that because you get to build them in isolation, you can use different data sets, which means data sets with different assumptions about the existence of "labels", to build each part. For instance, to do speech to speech transliteration from English to Japanese, you might build a component system like:

English speech --A--> English phonemes --B--> Japanese phonemes --C--> Japanese speech --D--> Japanese speech LM

You'll need a language model (D) for Japanese speech, that can be trained just on acoustic Japanese signals, then parallel Japanese speech/phonemes (for C), parallel English speech/phonemes (for A) and parallel English phonemes/Japanese phonemes (for B). [Plus, of course, if you're missing any of these, EM comes to your rescue!]

Let's take a simpler example, though the point I want to make applies to long chains, too.

Suppose I want to just do translation from French to English. I build an English language model (off of monolingual English text) and then an English-to-French transducer (remember that in the noisy channel, things flip direction). For the E2F transducer, I'll need parallel English/French text, of course. The English LM gives me p(e) and the transducer gives me p(f|e), which I can put together via Bayes' rule to get something proportional to p(e|f), which will let me translate new sentences.
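
The Bayes' rule step can be sketched on a toy example. Everything below is illustrative: the three "English sentences", the two "French sentences" and all probabilities are made-up numbers standing in for a real language model and a real finite state transducer.

```python
# Toy language model p(e) over a tiny set of English sentences.
lm = {"the cat": 0.6, "cat the": 0.1, "the dog": 0.3}

# Toy channel model p(f | e): for each English sentence, a distribution
# over French sentences.
channel = {
    "the cat": {"le chat": 0.9, "le chien": 0.1},
    "cat the": {"le chat": 0.9, "le chien": 0.1},
    "the dog": {"le chat": 0.2, "le chien": 0.8},
}


def posterior(f):
    """Return p(e | f) = p(e) p(f | e) / sum_e' p(e') p(f | e')."""
    scores = {e: lm[e] * channel[e].get(f, 0.0) for e in lm}
    z = sum(scores.values())
    if z == 0.0:
        raise ValueError(f"the channel assigns zero probability to {f!r}")
    return {e: s / z for e, s in scores.items()}


def translate(f):
    """Pick the English sentence with the highest posterior probability."""
    post = posterior(f)
    return max(post, key=post.get)


print(translate("le chat"))
print(translate("le chien"))
```

Note how the language model breaks the tie between "the cat" and "cat the": the channel scores them identically for "le chat", and only p(e) prefers the fluent word order.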

But, presumably, I also have lots of monolingual French text. Forgetting math for a moment, which seems to suggest that this can't help me, we can ask: why should this help?

Well, it probably won't help with my English language model, but it should be able to help with my transducer. Why? Because my transducer is supposed to give me p(f|e). If I have some French sentence in my GigaFrench corpus to which my transducer assigns zero probability (for instance, max_e p(f|e) = 0), then this is probably a sign that something bad is happening.

More generally, I feel like the following two operations should probably give roughly the same probabilities:

  1. Drawing an English sentence from the language model p(e).
  2. Picking a French sentence at random from GigaFrench, and drawing an English sentence from p(e|f), where p(e|f) is the composition of the English LM and the transducer.

If you buy this, then perhaps one thing you could do is to try to learn a transducer q(f|e) that has low KL divergence between 1 and 2, above. If you work through the (short) math, and throw away terms that are independent of the transducer, then you end up wanting to minimize [ sum_e p(e) log sum_f q(f|e) ]. Here, the sum over f is a finite sum over GigaFrench, and the sum over e is an infinite sum over positive probability English sentences given by the English LM p(e).
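
To make the comparison of operations 1 and 2 concrete, here is a small numerical sketch. It rests on several assumptions that are mine rather than the post's: the English sentence space is a tiny finite set, French sentences are drawn uniformly from the corpus, the divergence is taken in the direction KL(1 || 2), and all probabilities are made up. It computes the divergence directly instead of using the simplified objective.

```python
import math


def posterior(p_e, q, f):
    """p(e | f) obtained by composing the LM p(e) with the transducer q(f | e)."""
    scores = {e: p_e[e] * q[e].get(f, 0.0) for e in p_e}
    z = sum(scores.values())
    if z == 0.0:
        raise ValueError(f"transducer assigns zero probability to {f!r}")
    return {e: s / z for e, s in scores.items()}


def round_trip(p_e, q, corpus):
    """Operation 2: pick f uniformly from the corpus, then draw e from p(e | f)."""
    r = {e: 0.0 for e in p_e}
    for f in corpus:
        post = posterior(p_e, q, f)
        for e in p_e:
            r[e] += post[e] / len(corpus)
    return r


def kl(p, r):
    """KL(p || r) over a finite support."""
    return sum(p[e] * math.log(p[e] / r[e]) for e in p if p[e] > 0.0)


# Toy data: two English sentences and a two-sentence monolingual French corpus.
p_e = {"hello": 0.5, "goodbye": 0.5}
corpus = ["bonjour", "au revoir"]

# A transducer that matches the corpus well, and one that is too vague
# about how "goodbye" is rendered in French.
q_good = {
    "hello": {"bonjour": 0.9, "au revoir": 0.1},
    "goodbye": {"bonjour": 0.1, "au revoir": 0.9},
}
q_vague = {
    "hello": {"bonjour": 0.9, "au revoir": 0.1},
    "goodbye": {"bonjour": 0.5, "au revoir": 0.5},
}

kl_good = kl(p_e, round_trip(p_e, q_good, corpus))
kl_vague = kl(p_e, round_trip(p_e, q_vague, corpus))
print(kl_good, kl_vague)
```

On this toy data the first transducer reproduces the language model after the round trip, so its divergence is essentially zero, while the vaguer one pulls the round-trip distribution toward "goodbye" and gets a strictly positive divergence.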

One could then apply something like posterior regularization (Kuzman Ganchev, Graça and Taskar) to do the learning. There's the nasty bit about how to compute these things, but that's why you get to be friends with Jason Eisner so he can tell you how to do anything you could ever want to do with finite state models.

Anyway, it seems like an interesting idea. I'm not aware of anyone having tried it.

August 23, 2010 11:11 AM

Linked Data Blog Aggregator

Listing of 185 Ontology Building Tools

AI3's Ontologies category

Earlier Listing is Expanded by More than 30%

At the beginning of this year Structured Dynamics assembled a listing of ontology building tools at the request of a client. That listing was presented as The Sweet Compendium of Ontology Building Tools. Now, again because of some client and internal work, we have researched the space again and updated the listing [1].

All new tools are marked with <New> (new only means newly discovered; some had yet to be discovered in the prior listing). There are now a total of 185 tools in the listing, 31 of which are recently new, and 45 added at various times since the first release. <Newest> reflects updates — most from the developers themselves — since the original publication of this post.

Comprehensive Ontology Tools

Not Apparently in Active Use

Vocabulary Prompting Tools

Initial Ontology Development

Ontology Editing

Not Apparently in Active Use

Ontology Mapping

Not Apparently in Active Use

Ontology Visualization/Analysis

Though not all are relevant, see my post from a couple of years back on large-scale RDF graph software.

Miscellaneous Ontology Tools

Not Apparently in Active Use

[1] This listing is maintained on a permanent basis on the OpenStructsTechWiki.

August 23, 2010 05:28 AM

tesseract-ocr Google Group

Tess4J - a Java wrapper for Tesseract OCR DLL

A JNA-based wrapper for the Tesseract OCR DLL. The library provides
optical character recognition (OCR) support for:

* TIFF, JPEG, GIF, PNG, and BMP image formats
* Multi-page TIFF images
* PDF document format

[link]

August 23, 2010 02:35 AM

Public Knowledge - Blogging, Events, and Action Alerts

Appreciation: W. Adam Thomas, Public Knowledge Staff Attorney

Our hearts are heavy today, having learned of the passing yesterday morning of our beloved colleague, Public Knowledge Staff Attorney Adam Thomas.  Adam was a rare individual in this town - willing to take on any task no matter how small, always upbeat, eager for feedback be it positive or negative.  But what really set Adam apart was his courage.  Just 30 years old and thrice afflicted with Medulloblastoma - a rare and highly malignant form of brain cancer - he fought and beat it each time, until it returned a fourth time just a few weeks ago with a force too strong to overcome.  

read more

August 23, 2010 12:43 AM

August 21, 2010

Open Knowledge Foundation Blog

Beginnings of an Object Description Mapper

The analogue to an Object-Relational Mapper for RDF. Helping to make OWL Description Logic accessible from Python in a way that will seem familiar to people who are accustomed to things like SQLAlchemy and Django. http://packages.python.org/ordf/odm.html

Related posts:

  1. CKAN 0.7 Released
  2. ORDF - the OKFN RDF Library
  3. KForge v0.16 Released

August 21, 2010 02:42 PM

Data.gov.uk releases CKAN Drupal Module

We’re delighted to see that the data.gov.uk folks have released the code for their CKAN Drupal module. As many will know, the OKF’s CKAN powers data.gov.uk as well as over a dozen other data catalogues around the world. From the blog post: As part of the government’s ongoing work around transparency, today we are releasing some of [...]

Related posts:

  1. Canadian citizen-driven data catalogue datadotgc.ca is powered by CKAN
  2. Data.gov.uk goes public - and its using CKAN!
  3. Data.gov.uk Launched - and it’s Using CKAN

August 21, 2010 12:13 PM

natural language processing blog

Readers kill blogs?

I try to avoid making meta-posts, but the timing here was just too impeccable for me to avoid a short post on something that's been bothering me for a year or so.

I actually completely agree with both points. The problem is that I worry that they are actually fairly opposed. I comment much less on other people's blogs now that I use reader, because the 10 second overhead of clicking on the blog, being redirected, entering a comment, blah blah blah, is just too high. Plus, I worry that no one (except the blog author) will see my comment, since most readers don't (by default) show comments in with posts.

Hopefully the architects behind readers will pick up on this and make these things (adding and viewing comments, within the reader -- yes, I realize that it's then not such a "reader") easier. That is, unless they want to lose out to tweets!

Until then, I'd like to encourage people to continue commenting here.

August 21, 2010 12:49 PM

tesseract-ocr Google Group

recognition languages sets? with hierarchy?

Is it possible for Tesseract to run OCR with the languages given as an
ordered set? I have lots of text to OCR consisting primarily of lang1, with
small portions in lang2 and lang3 (quotes and refs). It would be ideal
for Tesseract to recognise "what it can" in lang1 (e.g., to a 90%
match), then switch to lang2 for the unmatched parts, then to lang3.
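
The behaviour asked for above can be sketched outside of Tesseract as a simple fallback loop. This is only an illustration of the logic, not a Tesseract feature: `recognize_with_fallback`, the stub recognizers, and the `(text, confidence)` return shape are all assumptions, standing in for whatever per-language OCR call is actually available.

```python
def recognize_with_fallback(regions, recognizers, threshold=0.9):
    """Try each language in order and keep the first confident result.

    recognizers is an ordered list of (lang, fn) pairs, where the
    hypothetical fn(region) returns (text, confidence in [0, 1]).
    If no language reaches the threshold, the most confident guess wins.
    """
    results = []
    for region in regions:
        best = None
        for lang, recognize in recognizers:
            text, conf = recognize(region)
            if best is None or conf > best[2]:
                best = (lang, text, conf)
            if conf >= threshold:
                break  # good enough, do not try lower-priority languages
        results.append(best)
    return results


def make_stub(vocabulary):
    """Stand-in recognizer: confident only on words it 'knows'."""
    def recognize(region):
        return (region, 0.95) if region in vocabulary else ("?", 0.30)
    return recognize


recognizers = [
    ("lang1", make_stub({"hello"})),
    ("lang2", make_stub({"bonjour"})),
    ("lang3", make_stub({"hallo"})),
]
results = recognize_with_fallback(
    ["hello", "bonjour", "hallo", "xyz"], recognizers
)
```

Here the regions are plain strings for simplicity; in practice they would be image regions such as lines or words, and the granularity chosen decides how well short quotes in lang2 or lang3 are picked up.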

August 21, 2010 09:12 AM

Any idea of Tesseract 3.0 release date

Dear all, I am working on a project that badly needs a 3.0 release to
support image conversion to Chinese text. I am wondering if anyone knows
the release date of 3.0? Will it be released before the end of the year?
Any information is greatly appreciated.

Maggie.

August 21, 2010 03:51 AM

Wikimedia Technical Blog

Usability Improvements: Final Phase of Rollout

Hi, I’m Alolita Sharma, and I’ve recently started working at the Wikimedia Foundation to help program-manage usability and feature-related software development.

I wanted to send everyone an update on Phase V of the Usability Initiative Rollout.  This is the final phase of the rollout and we are planning to deploy the usability features (the new “Vector” skin and enhanced editing features) to all remaining projects that have not yet been switched.  The release date has been set for Sep 1, 2010 at 10am PDT / 5pm UTC.

In preparation for the release, we’re doing (among other things) a push to identify and fix critical blockers.  We’re running a Central Notice on all remaining projects asking for your help to facilitate the effort by testing gadgets, extensions, and custom scripts on Vector.  We’d also like to ask readers of this blog to contribute as well.  If you’re working on one of the Phase V projects (that is, if your project is still showing the “Monobook” skin by default), please help us identify blockers by trying the beta and posting bugs either in Bugzilla (file under “Usability Initiative”) or our bug report page.

We’ve also created an Ambassadors mailing list (Wikitech-ambassadors) for anyone interested in helping coordinate or follow-up on release activities.  We will also be available on the newly created #wikimedia-dev IRC channel to respond to any questions or feedback.

To give feedback on the rollout process, please leave a comment here.

– Alolita Sharma, Features Engineering Program Manager, Wikimedia Foundation

August 21, 2010 12:03 AM

August 20, 2010

Public Knowledge - Blogging, Events, and Action Alerts

ISPs Want to Have Their First Amendment Cake and Eat it Too

While some ISPs are busy arguing to the FCC that the First Amendment makes net neutrality rules illegal, Congress is considering a bill (HR 3817) that would exempt ISPs from liability for providing fraudulent information to their customers. ISPs, of course, love this. Limitations from liability are great!

read more

August 20, 2010 08:07 PM

The Incredible Shrinking FCC

When Federal Communications Commission (FCC) Commissioner Michael Copps issued a brief, two-sentence reaction to the news of a policy agreement between Verizon and Google over Net Neutrality, he deliberately emphasized one word.  In bold face and italics, Copps said that a “decision” had to be made, to guarantee an open Internet.

“Some will claim this announcement moves the discussion forward.  That’s one of its many problems.  It is time to move a decision forward—a decision to reassert FCC authority over broadband telecommunications, to guarantee an open Internet now and forever, and to put the interests of consumers in front of the interests of giant corporations.”

read more

August 20, 2010 07:50 PM

Open Knowledge Foundation Blog

Data Journalism Meetup, Berlin, 1st September 2010

We’re delighted to announce a meetup on Data Journalism in Berlin in September organised by the Open Knowledge Foundation and Georgi Kobilarov at Uberblic Labs. Details are as follows: When? 1st September 2010 Where? Fjord Office, Friedrichstrasse 210, Berlin Register? You can register here! Speakers will include: Martin Belam, The Guardian Jonathan Gray, The Open Knowledge Foundation Christian Heise, ZEIT Online Gerd [...] Related posts:

  1. Data Driven Journalism, Amsterdam, 24th August 2010
  2. Open Everything Berlin + CC Salon Berlin
  3. Slides and notes from Data Driven Journalism event

August 20, 2010 05:58 PM

Planet Drumbeat

Education for the open web fellowship: new deadline

In May, Mozilla and the Shuttleworth Foundation announced a new Education for the Open Web Fellowship. The aim is to support practical ideas that help people learn about, improve and promote the open nature of the internet, as part of our commitment to supporting leaders working at the intersection of open education and the open web.

read more

August 20, 2010 03:45 PM

Public Knowledge - Blogging, Events, and Action Alerts

Why I'm Amused Rather Than Outraged Over New "Industry Negotiations" -- And What The Democrats Need To Understand

I occasionally suspect my colleagues in the Public Interest community lack a sense of humor -- although perhaps it is simply that I am in a more relaxed frame of mind after my annual vacation from the 21st Century. I am neither surprised nor outraged at the recent news that members of the Information Technology Industry Council (ITIC) are picking up where the FCC "secret meetings" left off and trying to come up with a net neutrality consensus framework. To me, it seems rather sad and funny. My only surprise is that even in Washington, the notion of an industry trade association working with its members is anything unusual or significant. I mean, that's what industry trade associations do after all.

read more

August 20, 2010 02:36 PM

tesseract-ocr Google Group

Line of equals symbols not recognized

Using tesseract 3.00 on Opensuse 11.2. From CLI as in
tesseract file.tif file

In an image that contains a line of '=' signs the recognition is much
worse than if these lines are removed, eg:

line 1 and stuff
=======================
line 3 and stuff

line 1 will be recognized, but the second and third lines will be

August 20, 2010 11:53 AM

calibre Changelog

calibre 0.7.15

New Features

Bug Fixes

August 20, 2010 06:00 AM

August 19, 2010

if:book

hospice for publishers

One of my best friends' parents both became very ill this year. Her mother, 87, elected to have a feeding tube inserted permanently. She is confined to her bed, alone much of the time, and in constant pain, waiting for the inevitable end, which thanks to the feeding tube may be many miserable months ahead. Her father, 90, elected to enter a hospice facility where he spent his last three weeks eating yogurt, sipping the occasional last whiskey, and having long wonderful visits with his three children, their spouses and his beloved grown grandchildren. By all accounts it was a very good death.

Thinking about my friend's parents makes me wonder why there couldn't be a "hospice" option for publishers, many of whom -- my low-end guess is at least 50% -- won't survive the transition from print to networked screens. If a publisher doesn't have the requisite vision, desire and resources to embrace digital, what's wrong with saying, "Gee, it's been a great 25, 50, 100-year run. Instead of beating our heads against a wall and dying an ugly death, why don't we go out in style." Once this difficult decision is arrived at, it would be a matter of selling the assets that can be sold, providing staff with generous severance and really helping them to find new jobs, and then at the very end giving some wonderful parties, celebrating the end of an era. A death with integrity and dignity intact.

Please understand that I make this suggestion with huge love and respect for publishers. At their best they have played a crucial role in the complex discourse that moves society forward. Like a beloved parent, there's no reason why they should suffer more than necessary at the end of a full and productive life.

August 19, 2010 11:25 PM

The WebM Open Media Project Blog

WebM Semantic Video Demo

Brett Gaylor at WebMadeMovies has posted an HTML5 demo of popcorn.js, “a javascript library for manipulating open video on the web.” The demo plays a video while using semantic data in the video to trigger machine-translated subtitles, map lookups, Twitter feeds and other elements on the page. If you’re using a WebM-enabled browser the page serves a WebM video, otherwise it serves an Ogg or MP4 video depending on the browser's capabilities.

See Brett’s post or the popcorn.js wiki page for more info. You can also download the source from the Mozilla github repo.

August 19, 2010 05:19 PM

FFmpeg VP8 Decoder Implementation

When we started the WebM project, one of our goals was to promote rapid innovation in video technology through open development. Just two months after WebM debuted, Jason Garrett-Glaser, Ronald Bultje and David Conrad created a VP8 video decoder implementation for FFmpeg called ffvp8.

The ffvp8 implementation decodes even faster than the WebM Project reference implementation (libvpx), and we congratulate the FFmpeg team on their achievement. It illustrates why we open-sourced VP8, and why we believe the pace of innovation in open web video technology will accelerate.

August 19, 2010 05:17 PM

natural language processing blog

Multi-task learning: should our hypothesis classes be the same?

It is almost an unspoken assumption in multitask learning (and domain adaptation) that you use the same type of classifier (or, more formally, the same hypothesis class) for all tasks. In NLP-land, this usually means that everything is a linear classifier, and the feature sets are the same for all tasks; in ML-land, this usually means that the same kernel is used for every task. In neural-networks land (à la Rich Caruana), this is enforced by the symmetric structure of the networks used.

I probably would have gone on not even considering this unspoken assumption, until a few years ago I saw a couple papers that challenged it, albeit indirectly. One was Factorizing Complex Models: A Case Study in Mention Detection by Radu (Hans) Florian, Hongyan Jing, Nanda Kambhatla and Imed Zitouni, all from IBM. They're actually considering solving tasks separately rather than jointly, but joint learning and multi-task learning are very closely related. What they see is that different features are useful for spotting entity spans, and for labeling entity types.

That year, or the next, I saw another paper (can't remember who or what -- if someone knows what I'm talking about, please comment!) that basically showed a similar thing, where a linear kernel was doing best for spotting entity spans, and a polynomial kernel was doing best for labeling the entity types (with the same feature sets, if I recall correctly).

Now, to some degree this is not surprising. If I put on my feature engineering hat, then I probably would design slightly different features for these two tasks. On the other hand, coming from a multitask learning perspective, this is surprising: if I believe that these tasks are related, shouldn't I also believe that I can do well solving them in the same hypothesis space?

This raises an important (IMO) question: if I want to allow my hypothesis classes to be different, what can I do?

One way is to punt: you can just concatenate your feature vectors and cross your fingers. Or, more nuanced, you can have some set of shared features and some set of features unique to each task. This is similar (the nuanced version, not the punting version) to what Jenny Finkel and Chris Manning did in their ACL paper this year, Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data.
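
As a rough illustration of the "shared features plus task-specific features" option, here is a minimal sketch using plain dictionaries and a linear scorer. It is not the method from the paper mentioned above; the function names, feature names, and weights are all made up for the example, with the two tasks loosely modelled on span detection and entity typing.

```python
def combine_features(shared, task_specific, task):
    """Build one feature dict: a shared block plus a block private to `task`."""
    combined = {f"shared:{name}": value for name, value in shared.items()}
    combined.update(
        {f"{task}:{name}": value for name, value in task_specific.items()}
    )
    return combined


def score(weights, features):
    """Linear score; features without a learned weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


# Hypothetical weights: one shared feature, one private feature per task.
weights = {
    "shared:is_capitalized": 1.0,
    "span:prev_is_determiner": -0.5,
    "type:suffix_ville": 2.0,
}

token_shared = {"is_capitalized": 1.0}
span_x = combine_features(token_shared, {"prev_is_determiner": 1.0}, "span")
type_x = combine_features(token_shared, {"suffix_ville": 1.0}, "type")

span_score = score(weights, span_x)  # shared weight plus span-only weight
type_score = score(weights, type_x)  # shared weight plus type-only weight
```

The weight on `shared:is_capitalized` is the only place the two tasks interact, so it is what transfers information between them, while each task keeps its own private feature set.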

An alternative approach is to let the two classifiers "talk" via unlabeled data. Although motivated differently, this was something of the idea behind my EMNLP 2008 paper on Cross-Task Knowledge-Constrained Self Training, where we run two models on unlabeled data and look for where they "agree."

A final idea that comes to mind, though I don't know if anyone has tried anything like this, would be to try to do some feature extraction over the two data sets. That is, basically think of it as a combination of multi-view learning (since we have two different hypothesis classes) and multi-task learning. Under the assumption that we have access to examples labeled for both tasks simultaneously (i.e., not the settings for either Jenny's paper or my paper), then one could do a 4-way kernel CCA, where data points are represented in terms of their task-1 kernel, task-2 kernel, task-1 label and task-2 label. This would be sort of a blending of CCA-for-multiview-learning and CCA-for-multi-task learning.

I'm not sure what the right way to go about this is, but I think it's something important to consider, especially since it's an assumption that usually goes unstated, even though empirical evidence seems to suggest it's not (always) the right assumption.

August 19, 2010 02:09 PM

Open Knowledge Foundation Blog

Vote Raw Data Now at SXSW panelpicker - ends 27 August

Announcement below — voting ends 27 August Raw Data Now: Building an Open Data Ecosystem Rufus Pollock and Jordan Hatcher of the Open Knowledge Foundation have submitted a proposal for a workshop highlighting the great work of the Open Knowledge Foundation, including Where Does My Money Go?, Open Shakespeare, CKAN, the Open Definition, and Open Data Commons [...] Related posts:

  1. Opening Up Government Data: Give it to Us Raw, Give it to Us Now
  2. Data Driven Journalism, Amsterdam, 24th August 2010
  3. Vote for ‘Where Does My Money Go?’ at the Show Us A Better Way poll!

August 19, 2010 10:59 AM

tesseract-ocr Google Group

Which revision of tesseract 3.0 for win7 64bit

Dear Sir or Madam,

I would like to know which revision of tesseract 3.0 is recommendable
to use under win7 64bit for OCR purposes at the moment? I have
recently tried several revisions: I compiled them with VS2008 in
release mode and tested the OCR functionality by running tesseract.exe
with the tif images attached to the source code. Without more ado

August 19, 2010 10:23 AM

Global Text Project

Frank W. Spencer PHD on GTP text Educational Psychology, a review

Posted August 13 on his blog (http://www.frankwspencer.com/): "As part of the Global Text Project, Kelvin Seifert and Rosemary Sutton have written Educational Psychology: Second Edition. It is a textbook, covering such topics as student development, diversity, special needs, classroom management, instructional methods, assessment, and teaching thinking skills. It is written for teachers. I'll

August 19, 2010 08:44 AM

ocropus Google Group

Appending new ground truth to the default language model

I'm trying to build my own language model by extending the default one
at /usr/local/share/ocropus/models/default.fst. Following the example
of ocropus-linefst and fstutils, I'm doing the following:

fst = openfst.StdVectorFst.Read("/usr/local/share/ocropus/models/default.fst")
filenames = glob.glob("training/*.gt.txt")

August 19, 2010 07:06 AM

August 18, 2010

Planet Drumbeat

Mark Surman: Brett Gaylor joins Drumbeat team

I’m very happy to announce that Brett Gaylor officially joined the Mozilla Drumbeat team earlier this month. He’ll be playing the role of project producer — leading his own Web Made Movies project and helping to find new Drumbeat projects over time. Brett will also be directing a documentary series about Mozilla and the future of the web.


Photo: CC-BY, Joi Ito

Brett’s already made great strides setting up the Web Made Movies lab initiative with Seneca College. The idea is to get filmmakers and web developers collaborating on new tech tools that shape what cinema will look like on the open web. The first project coming out of this lab is popcorn.js, which was demo’ed in early alpha at Whistler. A polished version of that demo is here:

Also, Brett has started work on a documentary where Mozillians will paint a picture of the open web that we’re building. He interviewed about a dozen people at Whistler and has a number of other shoots set up. Footage and a call for participation will start leaking out through the fall, with first episodes or edited clips coming by the end of the year.

For those who haven’t heard of Brett before: he is the director of RIP: A Remix Manifesto, an awesome film on copyright and culture that has been broadcast in over 20 countries and seen by millions. He also founded OpenSourceCinema.org, an experiment in applying open source principles to filmmaking which was used to get thousands of people to contribute to the making of RIP. In many ways, Web Made Movies is a continuation of the Open Source Cinema experiment.

Filed under: drumbeat, mozilla, webmademovie

August 18, 2010 09:29 PM

BioMed Central

Hereditary Angioedema: Management Consensus 2010 – a thematic series

Allergy, Asthma and Clinical Immunology has published its first thematic series, reviewing the current  consensus on the treatment of the potentially fatal condition, hereditary angioedema.

Hereditary angioedema is a rare genetic disease that causes the rapid swelling of the limbs, face, intestinal tract, larynx or trachea. The disease, which affects 1 in 50,000 people globally, is caused when a protein called C1 inhibitor is either deficient or non-functional. The symptoms of the disease cannot be controlled by conventional treatment with antihistamines or corticosteroids, and can lead to sudden death. The thematic series reviews the current international approach to the diagnosis, treatment and management of the disease. This includes investigating the management of the disease in children, who represent 50% of clinical cases, and in women, who are more susceptible to the symptoms because of hormonal factors. The series also incorporates a comprehensive review of past, current and potential therapies for the disease.

The articles in the series were presented at the Toronto Consensus meeting organized by the Canadian Hereditary Angioedema Network, the Canadian Society of Allergy and Clinical Immunology and the University of Calgary. A final consensus document outlining the current global guidelines for the management of hereditary angioedema was agreed and authored by scientists who attended the meeting in Toronto. 

August 18, 2010 08:17 AM

W3C Semantic Web Activity News

Public W3C Questionnaire on RDF Evolution

As has been reported earlier, W3C held an "RDF Next Steps" workshop in June 2010 and published the Report of the Workshop in early July. That workshop discussed the possibility of an RDF Working Group. The overall goal would be to extend RDF to include some of the features that the community has identified as both desirable and important for interoperability based on experience with the 2004 version of the standard, but without having a negative effect on existing deployment efforts. The Workshop has listed a number of work items that might be of interest for such a Working Group, and has also conducted an informal poll on the relative priority of those items (with links to the detailed description of the items themselves). As a next step, a public questionnaire has been created listing, essentially, those items (although some of them have been regrouped for better readability). The goal of the questionnaire is to poll the Web community at large so that the upcoming charter reflects the real needs for the years to come. So… if you are interested in the evolution of RDF, here is the chance to make your opinion heard. All the results of the questionnaire will be public. The questionnaire will stay open until the 13th of September.

August 18, 2010 04:57 AM

August 17, 2010

Open Knowledge Foundation Blog

Workshop on Open Bibliographic Data and the Public Domain

We are pleased to announce a one day workshop on Open Bibliographic Data and the Public Domain. Details are as follows: Where? Rooms 108/108a, FU Berlin, Garystr. 21, 14195 Berlin When? 7th October 2010 Registration? http://publicdomain.eventbrite.com/ Hashtag? #pdobd Notes? http://okfnpad.org/pdobd Here’s the blurb: This one day workshop will focus on open bibliographic data and the public domain. In particular it [...] Related posts:

  1. Open bibliographic data promotes knowledge of the public domain
  2. Which works fall into the public domain in 2010?
  3. Public Domain Calculators at Europeana

August 17, 2010 05:45 PM

SourceForge.net: SF.net Project News: VuFind (including full news text)

VuFind featured in Converge magazine

Converge, an online magazine focusing on technology in education, has published a feature article about VuFind. Past, present and future of the project are discussed, and several key players are quoted. Take a look here: http://bit.ly/aDhRXe .

August 17, 2010 04:42 PM

Zotero: The Next-Generation Research Tool

Zotero Basics: Getting Stuff Into Zotero

There are tons of ways to get books, articles, web pages, and any other kind of item into Zotero. So many, in fact, that we thought we needed to make this short screencast.  It covers six ways to get things into Zotero. You might just be surprised at how many ways there are to [...]

August 17, 2010 03:22 PM

Google Book Search Blog

Chocolate... in a nutshell!



If you thought you knew everything there is to know about chocolate, think again! This world famous decadent dessert certainly has some dark secrets of its own - a treasure that has been enriched over the past three centuries. Try the following trivia and sharpen your knowledge of the indulgent, yet exquisite confection. Check out the links and learn more about your favorite sweet on Google Books.


(Photo by Suat Eman)

Q. Which ancient civilizations were the first to discover chocolate?
A. The Aztecs and the Mayans of Central America - (The taste of chocolate has only been perfected ever since.)

Q. Where is the world’s largest chocolate museum?
A. Cologne Chocolate Museum in Germany - (Here’s where the flavours are immortalized.)

Q. In which city was the world’s largest chocolate sculpted?
A. Milan, Italy. In May 2010, Italian chocolatier Mirco Della Vecchia sculpted a 1.5-meter-tall Dome of Milan to bag the Guinness World Record for the largest ever chocolate art. (Beat that!)

Q. Where is the world’s largest chocolate factory?
A. No, it’s neither Willy Wonka’s nor Charlie’s chocolate factory. It's Hershey's, in Pennsylvania.


(In 1940, an emergency ration: a Hershey’s chocolate bar, served at Fort Myers. Photo: LIFE Magazine)

Q. In which city is Ghirardelli headquartered?
A. San Leandro, California (Did I hear San Francisco? If yes, give yourself half a point, as it was first incorporated and formerly headquartered in San Francisco.)

Q. Which country is the largest consumer of chocolate?
A. Switzerland... Swiss Chocolate, anyone?

Q. Which country is the largest cocoa bean producer?
A. Côte d'Ivoire (44% of all the cocoa beans exported in the world come from this West African nation.)

Q. What is the scientific name for chocolate?
A. Theobroma cacao (Try saying that five times fast!)

Q. Name a beneficial health effect of chocolate?
A. Chocolate enhances the circulatory system. (Flavonoids in chocolate increase antioxidants in the blood, protecting against heart damage.)

Q. Name the author of the best-selling book, Chocolat, which was later made into a Hollywood blockbuster starring Juliette Binoche and Johnny Depp?
A. Joanne Harris (Why is it that the book is always better than the movie?)

Scores:

August 17, 2010 01:34 PM

Happy Birthday, Emily Brontë!

Portrait of Emily Jane Brontë (Source: LIFE Magazine)

No coward soul is mine,
No trembler in the world's storm-troubled sphere:
I see Heaven's glories shine,
And faith shines equal, arming me from fear.
-- Emily Brontë

The indomitable spirit that defined the Yorkshire poet and novelist Emily Brontë also formed the very essence of the classic Wuthering Heights -- her only novel.

In an age when contemporary English society refused to take women’s contributions to literature seriously, Emily and her sisters, Charlotte and Anne, adopted ambiguous pen names to have their works published and accepted. In 1846, the Brontë sisters collaboratively published Poems by Currer, Ellis, and Acton Bell.

The Brontë sisters--Anne, Emily and Charlotte--painted by their brother Branwell (Source: LIFE Magazine)

While Charlotte Brontë assumed the pseudonym Currer Bell and went on to write Jane Eyre, Anne Brontë settled for Acton Bell and produced Agnes Grey. Emily preferred to be called Ellis Bell in the first edition of Wuthering Heights, which was published in 1847.

And ever since, her creations of Heathcliff and Catherine have captivated audiences worldwide, making Emily Brontë not just a household name, but also a stalwart of romantic fiction. In combination, the courage and passion of her characters, the unusually innovative Gothic structure of her novel and the brilliance of her prose, enabled her to create one of the finest Romantic works.

Actors Merle Oberon and Laurence Olivier during filming of Wuthering Heights in 1939 (Source: LIFE Magazine)

Although Emily unfortunately succumbed to tuberculosis at the young age of 30, her spirit continues to live on through her works -- a tribute to her genius.

Here’s remembering you, Emily Brontë! Happy Birthday!

August 17, 2010 01:13 PM

BioMed Central

BioMed Central to take on Nature in 10K charity run

Gulliver in training

On September 14th, a team of runners from BioMed Central will be taking part in a 10K race against our friends (and rivals) at Nature Publishing Group. BioMed Central’s team will be raising money for our partner charity Computer Aid International, which works to recycle computer equipment for use in developing countries.

You can support BioMed Central’s open access David as we take on the traditional publishing Goliath by sponsoring us via the BioMed Central team’s fundraising page.

Our plucky open access mascot turtle Gulliver is already in training, and he will be joined by  around 15 BioMed Central staff, all of whom are aiming to complete the course in under an hour. For the latest updates on Gulliver’s progress, or to sponsor him, see his blog and/or Facebook page.

About Computer Aid and BioMed Central
Computer Aid International provides professionally refurbished computers for reuse in education, health and not-for-profit organizations in developing countries.

Computer Aid has provided over 170,000 PCs to where they are most needed in more than 100 countries across Africa and South America, and is the world's largest and most experienced ICT for Development provider.

BioMed Central has supported Computer Aid for some time, and the funds we have raised will be used to send a container-load of reconditioned computer equipment to Kenyatta University in Nairobi later this summer. You can also support Computer Aid by buying a BioMed Central journal T-shirt.

Read more about Computer Aid’s activities in this recent guest blog post by Computer Aid’s Stephen Campbell.

August 17, 2010 09:47 AM

August 16, 2010

Dublin Core Metadata Initiative

Further details added to DC-2010 program

2010-08-16, Further details have been added to the program and the description of the sessions at DC-2010, the tenth International Conference on Dublin Core and Metadata Applications, to be held in Pittsburgh, PA, USA, 20-22 October 2010. Additional details of the sessions of DCMI Communities and Task Groups will be posted on the DCMI mailing lists and Wikis. Online registration is open; early-bird discount is available until 10 September 2010.

August 16, 2010 11:59 PM

Presentation opportunities for DCMI Partners at DC-2010

2010-08-16, This year, we will be offering presentation opportunities at DC-2010 for DCMI Partners. If your organization is interested in becoming a DCMI Partner and presenting your product or service that is built on Dublin Core metadata, please contact DCMI at info@dublincore.org with "Partnership" in the subject line.

August 16, 2010 11:59 PM

OStatus

OStatus interview with Tyler Gillies

Today's interviewee for the OStatus interview is Tyler Gillies, the resident hacker at ReadWriteWeb.com.

Give us an overview of your software. What is it, and what does it do?

Tyler: I've been using StatusNet since the first day identi.ca was released. You can find me at Tyler or tjgillies.

My first implementation of OStatus was robin. You can find it at robin. It is a rails based app that implements all the main features of status.net (webfinger/salmon/pubsubhubbub, etc), however, it is not currently being actively maintained. Please feel free to fork and commit patches. I have a plan to re-implement robin using "upgraded" technology, probably nodejs and redis.

Why did you decide to implement social web federation?

Tyler: I chose to pick social web federation because I only had two choices. Either federate, or don't. I didn't want to live in a walled garden. (I own http://opengard.in)

What problems did you have?

Tyler: Honestly, the specs on salmon are a little weird, and it was frustrating back then, because status.net, me and cliqset.com were the only ones who actually had a working implementation of salmon, so there wasn't a big support community. Also the documentation for the ruby ssl library is almost nonexistent.

How can users try out OStatus in your software?

Tyler: I don't currently have a website up running robin, but if they are familiar with rails, they can download it and try to get it running themselves. I am currently working on a location based app that will probably end up using OStatus to federate messages.

Check out geoloqi, I am developing the nodejs server.

- Tyler

Over the next couple of weeks there will be more OStatus interviews posted right here, so stay tuned!

August 16, 2010 09:03 PM

Open Knowledge Foundation Blog

Gathering, Preserving and Reusing our Cultural Heritage - the OKFN Cultural Heritage Working Group.

An announcement about the newly formed OKFN open heritage working group - Supporting open access to our cultural heritage. Related posts:

  1. Study on use of open licenses by UK cultural heritage organisations
  2. Eduserv study on open content licensing in cultural heritage sector published
  3. New working group on open bibliographic data!

August 16, 2010 08:28 PM

Open Text Book

P2PU

P2PU is an initiative designed to promote direct teaching/learning opportunities. You can participate as a student by signing up for a course or as a teacher by designing and running a course.

Una Daly, Associate Director of the College Open Textbooks Collaborative, has proposed a course that should interest anyone reading this blog: Adopting Open Textbooks.

http://wiki.p2pu.org/Adopting-Open-Textbooks

P2PU is certainly in the spirit of things “open.” Sign up for the course and learn more.

August 16, 2010 05:47 PM

Open Video Conference

Ethan Zuckerman of Berkman and Global Voices at OVC

Ethan Zuckerman is a senior researcher at the Berkman Center for Internet and Society at Harvard University. His research focuses on the distribution of attention in mainstream and new media, the use of technology for international development, and the use of new media technologies by activists.

With Rebecca MacKinnon, Ethan co-founded international blogging community Global Voices. Global Voices showcases news and opinions from citizen media in over 150 nations and thirty languages, publishing editions in twenty languages. Through Global Voices, Ethan is active in efforts to promote freedom of expression and fight censorship in online spaces.

In 2000, Ethan founded Geekcorps, a technology volunteer corps that sends IT specialists to work on projects in developing nations, with a focus on West Africa. Previously Ethan helped found Tripod.com, one of the web’s first “personal publishing” sites. He blogs at http://ethanzuckerman.com/blog.

Register today for the Open Video Conference, October 1-2 in New York City!

Photo: dweinberger

August 16, 2010 02:27 PM

Open Knowledge Foundation Blog

B-Open: Open Data from Bristol City Council

The following guest post is from Stephen Hilton, Programme Lead of the Connecting Bristol initiative. Unusually perhaps, for a city council, we recognise and relish the fact that our city is a quirky, unorthodox, hot-bed of creative digital activity and activism. Bristol City Council has been promoting local e-democracy for the last decade. And it [...] Related posts:

  1. How to open up local data: notes from Warwickshire council
  2. Open Definition Advisory Council launched
  3. Talking at Open Up the City in Helsinki

August 16, 2010 10:58 AM

Linked Data Blog Aggregator

I Have Yet to Metadata I Didn’t Like

Contrasted with Some Observations on Linked Data

At the SemTech conference earlier this summer there was a kind of vuvuzela-like buzzing in the background. And, like the World Cup games on television, in play at the same time as the conference, I found the droning to be just as irritating.

That droning was a combination of the sense of righteousness in the superiority of linked data matched with a reprise of the “chicken-and-egg” argument that plagued the early years of semantic Web advocacy [1]. I think both of these premises are misplaced. So, while I have been a fan and explicator of linked data for some time, I do not worship at its altar [2]. And, for those that do, this post argues for a greater sense of ecumenism.

My main points are not against linked data. I think it a very useful technique and good (if not best) practice in many circumstances. But my main points get at whether linked data is an objective in itself. By making it such, I argue our eye misses the ball. And, in so doing, we miss making the connection with meaningful, interoperable information, which should be our true objective. We need to look elsewhere than linked data for root causes.

Observation #1: What Problem Are We Solving?

When I began this blog more than five years ago — and when I left my career in population genetics nearly three decades before that — I did so because of my belief in the value of information to confer adaptive advantage. My perspective then, and my perspective now, was that adaptive information through genetics and evolution was being uniquely supplanted within the human species. This change has occurred because humanity is able to record and carry forward all information gained in its experiences.

Adaptive innovations from writing to bulk printing to now electronic form uniquely position the human species to both record its past and anticipate its future. We no longer are limited to evolution and genetic information encoded in surviving offspring to determine what information is retained and moves forward. Now, all information can be retained. Further, we can combine and connect that information in ways that break to smithereens the biological limits of other species.

Yet, despite the electronic volumes and the potentials,