Free Software :: Free Culture & Archiving Planet

ToC

  1. NLP News : Exploring the Ability of Natural Language Processing to Extract Data From Nursing Narratives.
  2. Semantic MediaWiki Forms : FCKeditor always on in SF?
  3. Semantic MediaWiki Forms : size=50 and uploadable garbles values
  4. Tesseract : Detecting simple phone number
  5. Open Access : Housekeeping
  6. W3C Semantic Web : Last Call for Six Rule Interchange Format (RIF) Drafts
  7. NLP News : Take a tour through Panasonic's CRT & flat panel TV recycling center
  8. BioMed OA : Authenticity of some published trials in question
  9. BioMed OA : Reporting of treatment heterogeneity proves challenging
  10. Planet Linked Data : DBpedia 3.3 released
  11. OCRopus : Build problems with OCRopus (rev. 4211561f28) and IUlib (rev. 2040eecf79)
  12. OpenGeoData : ODbL 1.0 launched
  13. BioMed OA : BioPsychoSocial Medicine announces the Winner of the 2008 Ikemi Award
  14. Open Video Conference : Jonathan McIntosh’s “Buffy vs. Edward” Video Goes Viral
  15. W3C Semantic Web : First Draft of SPARQL New Features and Rationale
  16. NLP News : Security beyond guns, guards and gates
  17. Information Aesthetics : USAspending.gov: Where Americans Can See where their Money Goes
  18. NLP News : Federal focus on healthcare IT: a bounty for KM vendors?
  19. Ubiquity : Pushing back the release?
  20. Wikimedia : Power outage in Wikimedia’s European servers
  21. Open Access : Young people and OA
  22. EFF : Judge Overturns Lori Drew Misdemeanor Convictions
  23. Open Access : More on the theory of research sharing
  24. Planet Linked Data : Release of structWSF, conStruct and the Community Web Site
  25. Open Access : More on EOS; OA in Belgium
  26. NLP News : Lance Armstrong: “I Am Ready”
  27. Open Access : Cancellations and OA, the flip side
  28. Open Access : Court orders release of Elsevier license terms
  29. Open Access : Why do publishers participate in developing country access initiatives?
  30. Open Access : Skeptical review of Anderson's Free
  31. Wikimedia : Current events and traffic spikes
  32. Open Access : Presentations from European OA meeting
  33. Open Access : Comparative study says benefits of OA outweigh costs
  34. Ubiquity : Create Bookmarklet Command not matching
  35. Ubiquity : How to control another tab and access its html
  36. Inside Google Book Search : New ways to search within a book
  37. Music Brainz : The Wall Street Journal reviews TuneUp and Picard
  38. Wikimedia : Improving Wikimedia’s Discussion System
  39. Google Research : International Conference on Machine Learning (ICML 2009) in Montreal
  40. NLP News : Systran Sponsors Machine Translation Summit XII
  41. NLP News : Systran Sponsors Machine Translation Summit XII
  42. OpenGeoData : Help me make your map better
  43. Open Access : July SOAN
  44. Open Knowledge Foundation : Open Knowledge Foundation Newsletter No. 11
  45. Ubiquity : Danish translation
  46. OpenGeoData : SOTM now just over 7 days away
  47. Information Aesthetics : Sputnik Observatory for the Study of Contemporary Culture
  48. OpenSocial API : Apache Shindig 1.0-incubating released
  49. Ubiquity : Is it possible to change the default language?
  50. EFF : ASCAP Makes Outlandish Copyright Claims on Cell Phone Ringtones
  51. Planet Linked Data : structWSF: A Framework for Collaboration Networks
  52. Tesseract : Return codes and their descriptions
  53. Wikimedia : First usability release, Acai, is now available.
  54. NLP News : Naughty Feeds
  55. NLP News : Arabic software firm buys US-based Dial Directions
  56. Open Access : Feedback sought on citation sharing service
  57. Science Commons : WisconsinView dedicates 6+ terabytes of data to the public domain
  58. Open Access : Victoria committee recommends encouraging, not requiring, OA
  59. Semantic MediaWiki Forms : Version 1.7.3: fixes for HTML-escaping of characters, SMWSQLStore support removed, etc.
  60. Open Access : BMC adds 'Post to Twitter' button
  61. Open Access : Forthcoming libre OA journal on stem cells
  62. Open Access : Most BMC journal impact factors increase
  63. Open Access : UNESCO releases its first openly licensed publication
  64. Wikimedia : Open Translation Tools 2009 report
  65. Linux Foundation : The Far-reaching Implications of Licence Violation
  66. Open Access : Forthcoming libre OA journal on water
  67. Open Access : OCLC scraps WorldCat data policy, will write new one
  68. Open Access : Milestone for IR at U. Liège
  69. Open Access : WorldWideScience adds new discovery, sharing features
  70. Open Access : Pharmacy Education journal converts to OA
  71. Open Access : Obama Ed. department drafting plan to fund OERs
  72. NLP News : PASW Text Analytics for Surveys (spss) reviewed
  73. BioMed OA : Robotic lower limb exoskeletons – a new thematic series published in Journal of NeuroEngineering and Rehabilitation
  74. Ubiquity : [ubiquity] weekly meeting in 8 hours!
  75. OCRopus : beam search failed?!
  76. OCRopus : ocr-layout directory
  77. NLP News : Informatics in Radiology: Render: An Online Searchable Radiology Study Repository.
  78. Ubiquity : Problem with "weather"
  79. OpenGeoData : Vote Steve
  80. Information Aesthetics : DD4D Conference Best-Of Coverage (Guest Post)
  81. Open Medicine : Top 25 Medical Applications - iPhone 3GS 2009
  82. Inside Google Book Search : Explore a book in 10 seconds
  83. Information Aesthetics : Typographic Reinterpretation of Cunningham's Dancing Hands
  84. Semantic MediaWiki Forms : Mini query inside a form
  85. Journal of Machine Learning : Multi-task Reinforcement Learning in Partially Observable Stochastic Environments; Hui Li, Xuejun Liao, Lawrence Carin; 10(May):1131--1186, 2009.
  86. Journal of Machine Learning : Universal Kernel-Based Learning with Applications to Regular Languages; Leonid (Aryeh) Kontorovich, Boaz Nadler; 10(May):1095--1129, 2009.
  87. Journal of Machine Learning : An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity; Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér; 10(May):1071--1094, 2009.
  88. Journal of Machine Learning : Fourier Theoretic Probabilistic Inference over Permutations; Jonathan Huang, Carlos Guestrin, Leonidas Guibas; 10(May):997--1070, 2009.
  89. Journal of Machine Learning : On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality; Wenxin Jiang; 10(Apr):977--996, 2009.
  90. Journal of Machine Learning : Nonextensive Information Theoretic Kernels on Measures; André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, Mário A. T. Figueiredo; 10(Apr):935--975, 2009.
  91. Journal of Machine Learning : Java-ML: A Machine Learning Library; Thomas Abeel, Yves Van de Peer, Yvan Saeys; 10(Apr):931--934, 2009.
  92. Journal of Machine Learning : Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods; Holger Höfling, Robert Tibshirani; 10(Apr):883--906, 2009.
  93. Journal of Machine Learning : Stable and Efficient Gaussian Process Calculations; Leslie Foster, Alex Waagen, Nabeela Aijaz, Michael Hurley, Apolonio Luis, Joel Rinsky, Chandrika Satyavolu, Michael J. Way, Paul Gazis, Ashok Srivastava; 10(Apr):857--882, 2009.
  94. Journal of Machine Learning : Consistency and Localizability; Alon Zakai, Ya'acov Ritov; 10(Apr):827--856, 2009.
  95. NLP News : Arabic software firm buys US-based Dial Directions
  96. NLP News : Will Computers Replace Humans?
  97. NLP News : Husserl
  98. NLP News : ICT Tools and Systems Supporting Innovation in Product/Process Development
  99. NLP News : Exploring Concepts’ Semantic Relations for Clustering-Based Query Senses Disambiguation
  100. NLP News : Automated Grammar Checking of Tenses for ESL Writing
  101. NLP News : Web Self-Service Will Make You Great
  102. NLP News : Say What? ‘Dial Directions’ Acquired By Arabic Language Specialist Sakhr Software
  103. Wikimedia : Downtime on en.wikipedia.org resolved
  104. Tesseract : Tesseract 2.04 available as download
  105. Tesseract : Icons
  106. Open Video Conference : OVC Interview with Pirate Bay’s Peter Sunde on Boing Boing
  107. NLP News : It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates
  108. NLP News : Distant supervision for relation extraction without labeled data
  109. Tesseract : Need some recommendations
  110. OCRopus : Help needed to install ocropus on ubuntu
  111. if:book : run, don't walk
  112. Music Brainz : Looking for a new maintainer for pymb2 and libdiscid
  113. Open Archaeology : Archaeology and Computing meetings: the "epic fail" year
  114. Zotero : Follow Libraries and Collections with Feeds
  115. Linux Foundation : LinuxCon Program and Event Details Take Shape
  116. Planet Linked Data : structWSF: A Framework for Data Mixing
  117. Wikimedia : Wikimedia Mobile is Officially Launched
  118. Open Archaeology : SCCH09 -- Scientific Computing & Cultural Heritage
  119. Open Archaeology : International Congress "Cultural Heritage and New Technologies" (Workshop "Archäologie & Computer")
  120. Wikimedia : Firefox 3.5 brings native open video support
  121. Wikimedia : On templates and programming languages
  122. Ubiquity : Problem trying to install from source with manage.py
  123. Semantic MediaWiki Forms : BibTeX import
  124. NLP News : ICML/COLT/UAI 2009 retrospective
  125. OpenGeoData : Iran maps
  126. NLP News : Enterprise Search Expert Joins EveryZing Executive Team
  127. Ubiquity : Unable to get command hint after 0.1.5
  128. Inside Google Book Search : New Features on Google Books
  129. NLP News : Analyzing online content with OpenAmplify
  130. OpenGeoData : The future of mapping
  131. Open Access : Reading the ground tremors
  132. OpenGeoData : Ubiquitous Geocontext
  133. Open Access : Harvesting ProQuest metadata for an ETD repository
  134. Open Access : Another new OA publisher
  135. Information Aesthetics : Cykelbarometer: Public Copenhagen Urban Bicycle Counter
  136. Information Aesthetics : Communicating the Noise Levels Caused by Heathrow Airport
  137. NLP : ICML/COLT/UAI 2009 retrospective
  138. NLP News : Sakhr Software Acquires Dial Directions
  139. NLP News : Sakhr Software Acquires Dial Directions
  140. Open Access : More on the U. Kansas OA policy
  141. Open Access : Updates on FRPAA
  142. NLP News : Silicon Valley should step up, help Iranians
  143. Wikimedia : Blog Downtime
  144. EFF : Help Protesters in Iran: Run a Tor Bridge or a Tor Relay
  145. Ubiquity : Ubiquity Herd
  146. FRBR : Tillett, Sharing Standards for Bibliographic Data Worldwide
  147. Ubiquity : undo command
  148. Ubiquity : Updated release plans
  149. NLP News : PyGirl: Generating Whole-System VMs from High-Level Prototypes Using PyPy
  150. Open Access : OA to government statistics
  151. Open Access : New OA publisher
  152. AKSW Semantic Web : The Road to OntoWiki 1.0
  153. Planet Linked Data : Virtuoso loads 110,500 triples-per-second on LUBM 8000
  154. Tesseract : Compressing a sequence of spaces
  155. Open Access : Swords and plowshares: harvesting online knowledge
  156. if:book : please discuss
  157. Open Access : No OA impact advantage seen in ophthalmology
  158. Open Knowledge Foundation : Open Database License (ODbL) v1.0 Released
  159. Open Access : New OA journal on virology
  160. BioMed OA : Data publication and openness in the scientific community
  161. Open Access : OA mandate at the Canadian Breast Cancer Research Alliance
  162. Open Access : Version 1.0 of the Open Database License
  163. Open Access : First funding pledge for ELIXIR
  164. Open Access : A career in OA publishing
  165. OpenSocial API : Why Enterprise Software Provider Atlassian Chose OpenSocial
  166. OpenSocial API : A new addition to the OpenSocial family - the ActionScript3 client library!
  167. Open Video Conference : Columbia’s Educational Video Environment Released at OVC
  168. Ubiquity : Display More Results
  169. OCRopus : Patch for genAM.py (ocropus and iulib) for Python 2.3.4
  170. OCRopus : cleanup of wikis and documentation
  171. OCRopus : Confidence value
  172. Tesseract : Confidence value for each character
  173. Open Access : A new model for OA repositories
  174. Ubiquity : 0.5 Conversion: Defining argument prepositions
  175. Open Access : Finland joins SCOAP3
  176. Open Access : More on the history of OA and the preprint culture in physics
  177. Ubiquity : Ubiquity 0.5Pre2 does not work for me
  178. Semantic MediaWiki Forms : internal link to "search by property"
  179. OCRopus : Newbie question: removing artifacts from mobile phone picture
  180. FRBR : FRSAD draft available, FRAD book published
  181. Tesseract : tessnet2.dll signing
  182. Planet Linked Data : 3sat TV magazine features Linked Data and DBpedia
  183. OCRopus : Moderation Required ...
  184. Ubiquity : Case sensitive commands
  185. Information Aesthetics : NYTimes Michael Jackson's Billboard Rankings Over Time
  186. Free Our Data : Michael Cross: setting data free is an easy promise when in opposition – so would a Tory government do it?
  187. EFF : miniLinks for 2009-06-26
  188. Open Access : U. Kansas adopts an OA policy
  189. Open Access : How to build free knowledge
  190. Open Access : More on publishing data
  191. Open Access : Video of Boyle on The Public Domain
  192. Open Access : More on student support for OA
  193. Open Access : Impact factors of Hindawi journals rise
  194. Ubiquity : How can I have Firefox remember all open windows after shutting down?
  195. OCRopus : Training
  196. Open Access : More on FRPAA
  197. Planet Linked Data : Linked Data Rules Simplified
  198. OCRopus : allheaders.h accepted by the compiler, rejected by the preprocessor!
  199. Linux Foundation : TiddlyGuv: An Open-Source Governance System

July 04, 2009

NLP News

Exploring the Ability of Natural Language Processing to Extract Data From Nursing Narratives.

Comput Inform Nurs. 2009 July/August;27(4):224-225. PMID: 19574747 [PubMed - as supplied by publisher]

July 04, 2009 01:53 PM

Semantic Forms Google Group

FCKeditor always on in SF?

Hi,
I'm working on adding SMW to [link], where I have
also installed the FCKeditor. I have a form called 'Publisher' that
includes a free text field [1], that was created using the 'create
template' special page.
When I click 'edit' to edit a particular 'Publisher', I get the
standard FCKeditor links above the text box. i.e. "[Rich Editor] [Open

July 04, 2009 12:09 PM

size=50 and uploadable garbles values

Given: Form:Test, Template:Test, Category:Test according to basic
setup.

Everything works fine when Form:Test has:

{{{for template|Test}}}
{| class="formtable"
|-
!Has author(s):
| {{{field|Has author}}}
|}
{{{end template}}}

I.e. when I use the form to create a page called Test, this page gets

July 04, 2009 10:34 AM

tesseract-ocr Google Group

Detecting simple phone number

Hi,

I installed tesseract to do some simple OCR on a very basic image - a
phone number in Arial 11pt in a png format ([link]
phone.png) which I convert to an uncompressed tiff using:

convert -monochrome -normalize ./phone.png ./phone.tif

When I run it through tesseract I get an empty file from the following
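
For reference, the conversion step quoted in this post can be scripted. The sketch below is illustrative only: it rebuilds the `convert` call exactly as the poster wrote it and pairs it with the basic `tesseract IMAGE OUTPUTBASE` form, which is an assumption and not the command the poster actually ran (that part of the message is cut off). The helper names and default paths are hypothetical.

```python
import shutil
import subprocess


def convert_command(src, dst):
    """Build the ImageMagick call quoted in the post."""
    return ["convert", "-monochrome", "-normalize", src, dst]


def tesseract_command(image, output_base):
    """Build a basic tesseract call (assumed form: tesseract IMAGE OUTPUTBASE)."""
    return ["tesseract", image, output_base]


def run_pipeline(src="./phone.png", tif="./phone.tif", output_base="./phone"):
    """Run both steps, but only when both tools are installed."""
    commands = [convert_command(src, tif), tesseract_command(tif, output_base)]
    if not all(shutil.which(cmd[0]) for cmd in commands):
        return None  # a tool is missing, so nothing was run
    for cmd in commands:
        subprocess.run(cmd, check=True)
    # tesseract writes its text result to OUTPUTBASE.txt
    return output_base + ".txt"
```

Keeping the two commands as separate argument lists makes it easy to print and compare them with what was typed by hand before running anything.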

July 04, 2009 09:52 AM

Open Access News

Housekeeping

Today I step back from systematic daily blogging in order to free up time for my new position at Harvard's Berkman Center and Office for Scholarly Communication.

The blog itself will continue and Gavin will continue at something like his current pace.  I will continue my daily crawl for OA-related news.  I'll continue to tag what I find for the OA tracking project (OATP).  I'll continue to write the monthly SPARC Open Access Newsletter (SOAN).  I'll continue to work full-time for OA. 

I'll even continue to blog, though only sporadically.  Open Access News (OAN) will be smaller and more selective than in the past.  I cannot assure you that the news it covers will be the most important subset.  (That presupposes that Gavin and I will be on top of all new developments and in a position to pick the most important.)  I'll blog what I notice, what moves me, and what I have time for, with the accent on the third criterion.  It should be an eclectic bunch.  I know that I'll notice a lot of important news, thanks to OATP, and I know that I'll be moved to blog a lot of it.  But because of my new projects, even the most important news will be important news that I only have time to tag, not to blog.

For a comprehensive source of OA news, subscribe to the OATP feed, which is available by RSS, email, and a blog-like web page with the most recent items displayed first.  The OATP feed has been more comprehensive than this blog since April and it grows more comprehensive and useful every day.  To help the cause, please join OATP as a tagger and help select new items for inclusion in the feed.  For more details, see the OATP home page or my SOAN article about it from May 2009.

In the same May SOAN, I reflect on the losses and gains from this transition.  I'm acutely aware of them both. 

July 04, 2009 09:35 AM

W3C Semantic Web Activity News

Last Call for Six Rule Interchange Format (RIF) Drafts

The W3C Rule Interchange Format (RIF) Working Group has published six Last Call Working Drafts. Together, they allow systems using a variety of rule languages and rule-based technologies to interoperate with each other and with other Semantic Web technologies.

Three of the drafts define XML formats with formal semantics for storing and transmitting rules:

1. The RIF Production Rule Dialect (PRD) is designed for the kinds of rules used in modern Business Rule Management systems.
2. The RIF Basic Logic Dialect (BLD) is a foundation for Logic Programming, classical logic, and related formalisms.
3. The RIF Core Dialect is the common subset of PRD and BLD, useful when having a ubiquitous platform is paramount.

The other drafts:

1. RIF Datatypes and Builtins (DTB) specifies the datatypes and standard operations (modeled on XPath Functions) available in all RIF dialects.
2. RIF RDF and OWL Compatibility specifies how RIF works with RDF, RDFS, OWL 1, and OWL 2.
3. RIF Framework for Logic Dialects (FLD) provides a mechanism for specifying extended dialects, beyond BLD, when more expressive power is required.

The Working Group requests comments be sent to public-rif-comments@w3.org by 31 July 2009.

July 04, 2009 06:54 AM

tesseract-ocr Google Group

It says the following on the FAQ, and it seems Ray's been saying on the
bug tracker since November that these things are fixed in 2.04, but they
don't seem to be - am I missing something?
"Without libtiff, Tesseract only reads uncompressed tiff files. Even
then it won't read 32 bit tiff files correctly. Will be fixed in 2.04.

July 04, 2009 06:38 AM

July 03, 2009

NLP News

Take a tour through Panasonic's CRT & flat panel TV recycling center

Engadget HD: If there's anything better than machine translation, old TVs headed for certain doom, and lasers, we have no idea what it is. Please keep your comments ...

July 03, 2009 08:57 PM

BioMed Central

Authenticity of some published trials in question

More than 90% of a sample of randomised controlled trials (RCTs) published in Chinese journals between 1994 and 2005 did not adhere to recognised methodology for randomisation, according to a study published yesterday in Trials, casting doubt on the reliability of research that has the potential to influence medical decision-makers.

Wu and colleagues (Chinese Cochrane Centre at Sichuan University, China and Ottawa Hospital Research Institute) searched the China National Knowledge Infrastructure electronic database for reports published in the Chinese literature between January 1994 and June 2005, that were described by the authors as RCTs or claimed to have used random sequence generation or allocation concealment.

Telephone interviews with the first or co-authors of 2235 reports about randomisation methods and quality-control features of the trial indicated that only 6.8% of the studies could be considered “authentic” RCTs. Although only 51.6% of trials supported by government or other official organizations were found to be authentic, all trials of pre-market drugs were identified as such. Wu et al. report that of the first authors erroneously identifying their studies as RCTs, 85.6% did not fully understand the principles of randomisation, whilst 5.1% mislabelled their trials despite an understanding of the relevant methodology.

Methodology    
Randomized trials published in some Chinese journals: how many are randomized?
Taixiang Wu, Youping Li, Zhaoxiang Bian, Guanjian Liu, David Moher
Trials 2009, 10:46 (2 July 2009)

The misleading reporting of RCTs is likely a worldwide problem, but the investigators suggest a link between their results and the high proportion of positive trial results published in Chinese journals, noting that inadequate randomisation has been previously shown to result in more favourable estimates of treatment effects. They also highlight the potential for falsely reported RCTs to mislead healthcare providers and policy makers, and impact upon the findings of systematic reviews.

Wu et al. advocate improvements to the education of researchers in the principles of randomisation methodology and scientific reporting. In addition, they suggest that the development of peer review guidelines is needed to help identify poorly randomised studies before publication.

Victoria Thompson
Assistant Journal Development Editor - Trials

July 03, 2009 04:16 PM

Reporting of treatment heterogeneity proves challenging

A review of randomised controlled trials (RCTs) that had been published in five prominent medical journals has revealed that heterogeneity of treatment effects (HTE) is frequently ignored or incorrectly analysed. The results of this study were published last week in Trials.

Some patients will experience more or less benefit from treatment than the averages reported from clinical trials; the magnitude of such variation in therapeutic outcome across a population is termed HTE. Highly variable treatment response rates are known to exist for many common conditions, including ischemic stroke and diabetes. Identifying HTE is therefore necessary to individualise treatment.

Gabler et al. conducted a review of the prevalence of HTE analyses in 319 RCTs published in Annals of Internal Medicine, BMJ, Journal of the American Medical Association, The Lancet, and New England Journal of Medicine. They found that just 29% of studies reported HTE analysis, and that reporting was only marginally better in 2004 than in 1994. Another 28% reported subgroup-only analyses, without the formal statistical tests of heterogeneity that are recommended by the CONSORT guidelines.

The authors conclude that HTE reporting in the general medical literature is neither rigorous nor routine and suggest it may be time to develop new standards for reporting.

Dealing with heterogeneity of treatment effects: is the literature up to the challenge?
Nicole B Gabler, Naihua Duan, Diana Liao, Joann G Elmore, Theodore G Ganiats, Richard L Kravitz

These results follow those of another study published last year in Trials, which revealed that only 31% of RCTs published in the same leading medical journals reliably accounted for missing data when analysing quality of life outcomes.

In addition to original research relating to RCTs, Trials also encourages the publication of study protocols, recognizing that this reduces risk of non-publication of trial results and facilitates methodological discussion. Such published study protocols, while important to the scientific record, are unlikely to be heavily cited. It is therefore all the more impressive that Trials has increased its Impact Factor in the latest 2008 Journal Citation Reports to 1.74 (up from 1.44 last year). For the first time the journal is ranked above competitors such as the official journal of the Society for Clinical Trials, Clinical Trials (2008 Impact Factor 1.69)  and the Elsevier title Contemporary Clinical Trials (2008 Impact Factor 1.42).

For more information about the journal Trials, please contact the editorial office.

Abigail Jones
Senior Assistant Editor – Trials

July 03, 2009 04:15 PM

Linked Data Blog Aggregator

DBpedia 3.3 released

We are pleased to announce the release of DBpedia 3.3. This release is based on Wikipedia dumps of May 2009.

The new release includes the following improvements over DBpedia 3.2:

1. more accurate abstract extraction
2. labels and abstracts in 80 languages
3. several infobox extraction bugfixes
4. new links to Dailymed, Diseasome, Drugbank, Sider, TCM
5. updated OpenCyc links

You can find the datasets here, and the RDF files here. The dataset is available to be queried at our SPARQL endpoint.
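
As a rough illustration of querying the endpoint mentioned above, the sketch below builds a SPARQL request URL that asks for the label of one resource in one language, which touches the multilingual labels listed in this release. The endpoint address `http://dbpedia.org/sparql`, the resource name `Berlin`, and the helper names are assumptions for the example and are not given in the announcement.

```python
from urllib.parse import urlencode

# Assumed endpoint address; the announcement only links to it.
ENDPOINT = "http://dbpedia.org/sparql"


def label_query(resource, lang):
    """Return a SPARQL query for the rdfs:label of a DBpedia resource in one language."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def query_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL only; fetching it requires network access.
    print(query_url(label_query("Berlin", "de")))
```

The script only prints the URL, so it can be checked by eye or pasted into a browser before any HTTP client is wired in.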

After eight long months without a DBpedia release (due to a lack of Wikipedia dumps), today’s release will bring us up to speed again, and we will release DBpedia datasets much more often in the future.

July 03, 2009 10:59 AM

ocropus Google Group

Build problems with OCRopus (rev. 4211561f28) and IUlib (rev. 2040eecf79)

Hi,
I've already submitted an issue with the new revision of IUlib -
errors arise when I try to build it. This is why you can't build
OCRopus either.

Would appreciate if any of the developers could have a look.

Thanks,
Regards,
Blazej

July 03, 2009 09:27 AM

OpenGeoData

ODbL 1.0 launched

The Open Database License 1.0 has been launched. Check out OSM’s implementation plan. The ODC announcement is here: The Open Database License (ODbL) is an open license for data and databases which includes explicit attribution and share-alike requirements. This license, the first of its kind, is a major step forward for open data. There are currently very few [...]

July 03, 2009 08:56 AM

BioMed Central

BioPsychoSocial Medicine announces the Winner of the 2008 Ikemi Award

The winner of this year’s Ikemi Award was Hiroki Nishimura, MA (National Institute of Mental Health, NCNP, Tokyo) for his article published in BioPsychoSocial Medicine.

Psychological and weight-related characteristics of patients with anorexia nervosa-restricting type who later develop bulimia nervosa
Nishimura H, Komaki G, Ando T, Nakahara T, Oka T, Kawai K, Nagata T, Nishizono A, Okamoto Y, Okabe K, Koide M, Yamaguchi C, Saito S, Ohkuma K, Nagata K, Naruo T, Takii M, Kiriike N, Ishikawa T, Japanese Genetic Research Group for Eating Disorders
BioPsychoSocial Medicine 2008, 2:5 (12 February 2008)

The 2008 Ikemi Award was presented to Mr Nishimura at the 50th Annual Meeting of the Japanese Society of Psychosomatic Medicine.

The Ikemi Award is presented to the first author of the best article (as decided by the selection committee) published in BioPsychoSocial Medicine during the previous year. To be considered for the 2009 Ikemi Award, submit your next manuscript to BioPsychoSocial Medicine. Please see the Award page of the journal website for further details.

BioPsychoSocial Medicine is the official journal of the Japanese Society of Psychosomatic Medicine and publishes research on psychosomatic disorders and diseases. For more information, please see the ‘About’ page or contact the Editorial office.

July 03, 2009 08:50 AM

Open Video Conference

Jonathan McIntosh’s “Buffy vs. Edward” Video Goes Viral

Jonathan McIntosh’s “Buffy vs. Edward (Twilight Remixed)” video has gone viral, claiming over one million views, appearing on blogs and news sites around the web, and being tweeted constantly. Jonathan, of Rebellious Pixels, premiered his short film at the Open Video Conference during his featured talk: “How to Make a Political Remix Video.” Here’s his blurb:

In this remixed narrative, Edward Cullen from the Twilight Series meets Buffy the Vampire Slayer. It’s an example of transformative storytelling serving as a pro-feminist visual critique of Edward’s character and generally creepy behavior. Seen through Buffy’s eyes some of the more sexist gender roles and patriarchal Hollywood themes embedded in the Twilight saga are exposed - in hilarious ways. This transformative remix work constitutes a fair-use of any copyrighted material as provided for in section 107 of the US copyright law. “Buffy vs Edward (Twilight Remixed)” by Jonathan McIntosh is licensed under a Creative Commons BY-NC-3.0 License - permitting non-commercial sharing with attribution.

The video ends by explaining to the viewer that “this transformative work constitutes a ‘fair use’ of any copyrighted material as provided for in section 107 of the US Copyright Law.” Not only has Jonathan made an awesome video and commentary, but he’s doing his part to teach the world about fair use! These remix videos also serve as a great boon to the original works: the number of times “omg i have to watch buffy again” has appeared on message boards, comments, and tweets is tremendous. There have also been quite a few comments out there from fans who, after watching the video, realize that Edward is pretty stalkerish.

His video has already been talked about in the New York Post, Entertainment Weekly, Jezebel, Pepsi’s POPTUB, and—perhaps the biggest testament to the film’s virality—Perez Hilton. He was also recently interviewed by the Los Angeles Times.

You can also help translate the video using dotsub. It has already been translated into eight languages.

UPDATE: The video has recently been written about in the LA Times.

UPDATE 2: Check out why Jonathan made the video.

July 03, 2009 08:38 AM

W3C Semantic Web Activity News

First Draft of SPARQL New Features and Rationale

The W3C SPARQL Working Group has published the First Public Working Draft of SPARQL New Features and Rationale. This document provides an overview of the main new features of SPARQL and their rationale. This is an update to SPARQL adding several new features that have been agreed by the SPARQL WG. These language features were determined based on real applications and user and tool-developer experience.

July 03, 2009 07:52 AM

NLP News

Security beyond guns, guards and gates

Express Computers: For broadcast video news, TALES performs video capture, key frame extraction, automatic speech-to-text conversion, machine translation of the foreign text ...

July 03, 2009 05:18 AM

information aesthetics

USAspending.gov: Where Americans Can See where their Money Goes

USAspending.gov is a new US governmental website designed in accordance with the Federal Funding Accountability and Transparency Act of 2006 (Transparency Act): it is a single searchable website, accessible by the public for free, that includes for each Federal award:

1. the name of the entity receiving the award;
2. the amount of the award;
3. information on the award including transaction type, funding agency, etc;
4. the location of the entity receiving the award;
5. a unique identifier of the entity receiving the award.

The data is largely gathered from the Federal Procurement Data System, which contains information about federal contracts, and the Federal Assistance Award Data System, which contains information about federal financial assistance such as grants, loans, insurance, and direct subsidies like Social Security. The underlying technology for USAspending.gov was developed by OMB Watch with the support of The Sunlight Foundation and is used on OMB Watch's website located at FedSpending.org.

Most of the visualizations are displayed in the Federal IT Dashboard. For instance, the current dashboard illustrates government spending in the form of charts and lists ranking the largest government contractors (e.g. Lockheed, Boeing, Northrop Grumman, etc.) and assistance recipients (e.g. Department of Healthcare Services, New York State Dept. of Health, Texas Health & Human Services Commission, etc.).

Thanks, Nick. Via TechCrunch.


July 03, 2009 04:25 AM

NLP News

Federal focus on healthcare IT: a bounty for KM vendors?

KMWorld Magazine: Natural language processing is a popular approach. In April 2009, the Mayo Clinic and IBM announced an open source initiative called the Open Health Natural ...

July 03, 2009 04:04 AM

July 02, 2009

ubiquity-firefox Google Group

Pushing back the release?

Hey everybody,
Software is hard.
I feel bad to even suggest this after how hard all the Ubiquity
contributors have been working towards this release over the past few
weeks, but I am thinking about pushing the 0.5 release back a couple
more weeks, until I have returned from my trip. I just got done

July 02, 2009 11:52 PM

Wikimedia Technical Blog

Power outage in Wikimedia’s European servers

This seems to be a power outage at our European proxy caching cluster; we’ll see if we can give more details later.

[Image: deadeuro-reqstats-hourly]

European traffic has been rerouted to our US servers, but the extra load may cause the sites to be a little sluggish for now. (If your DNS is still seeing the old entries, you can manually configure your browser to use the US proxy: rr.pmtpa.wikimedia.org port 80. You should only do this temporarily, as you won’t be able to access anything *but* Wikipedia and our sister projects. :)

Update 21:13 UTC:

European servers are coming back online, we should have this cleaned up pretty soon.

Update 21:26 UTC:

We’re starting to switch traffic back to Europe. Should be better in a few minutes… In the meantime, amuse yourself reading the Twitter panic. :)

Update 21:40 UTC:

You can also use the SSL interface to Wikipedia, which doesn’t have the proxy overload.

July 02, 2009 10:32 PM

Open Access News

Young people and OA

Lynn Silipigni Connaway, Expectations of the Screenager Generation, presented at RLG Annual Partnership Symposium (Boston, June 3, 2009). (Thanks to Fabrizio Tinti.) Report on a study of 12-18 year olds and their expectations of libraries and information resources.

July 02, 2009 09:52 PM

EFF.org Updates

Judge Overturns Lori Drew Misdemeanor Convictions

A federal district court judge today threw out the misdemeanor convictions of Lori Drew after the judge determined that the federal anti-hacking statute under which Drew was prosecuted was inapplicable to the allegation that she violated MySpace's terms of service. Drew was convicted by a jury in November of 2008 of violating the Computer Fraud and Abuse Act (CFAA), which bars "unauthorized access" to a computer. Prosecutors argued that Drew had violated the CFAA by harassing 13-year-old neighbor Megan Meier through the use of a fake MySpace profile, harassment that prosecutors say directly led to Meier's suicide.

EFF, along with the Center for Democracy and Technology, Public Citizen, and 14 law professors and faculty members, filed an amicus brief in August arguing that the court should dismiss the CFAA claims against Drew because terms of service violations do not constitute crimes under the Act. Regardless of whether Drew could be held criminally liable under a different theory, EFF argued that the theory pursued by prosecutors was inappropriate.

U.S. District Judge George H. Wu stated that his opinion would become final when his written opinion was filed, likely next week.

July 02, 2009 08:30 PM

Open Access News

More on the theory of research sharing

David Wojick, Sharing Results is the Engine of Scientific Progress, OSTIblog, June 17, 2009. (Thanks to Fabrizio Tinti.)

[The Office of Scientific and Technical Information]'s mission is to help scientists share their results, but what role do results play in science? Here we present a simple model of one of the most basic uses of results, namely as the engine of scientific progress. Research results are more than just accumulated knowledge. Research results make possible new questions, which in turn lead to even more knowledge. The resulting pattern of exponential growth in knowledge is called an issue tree. It shows how individual results can have a value far beyond themselves, because they are shared and lead to research by others.

The reader is referred to the Sharing Results Issue Tree. [Note: omitting diagram.] This is an abstract example of a fundamental pattern that occurs throughout science. It begins with Result 1, which is an important finding by a researcher named Smith. Given this result there are three important new questions that can be formulated -- Questions A, B & C. It is important to realize that these questions could not have been asked until Result 1 occurred. Result 1 does much more than simply add to our knowledge, it raises important new questions.

Each of the three questions now becomes the object of new research. It is important to realize that in many cases this new research will be undertaken by researchers other than the one who got Result 1. This could not happen unless these new researchers know about Result 1, which requires sharing of results in some way or other. Thus sharing is essential for scientific progress.

The new questions that grow out of Result 1 yield Results 2 through 9. These new results are obtained mostly by researchers other than Smith, such as Brown, Gupta, Kim, etc. This is a large increase in knowledge, which is only made possible by the sharing of Result 1. Thus Result 1's value extends far beyond its contribution to knowledge. ...

Progress is not just the cumulative product of individual efforts, it requires sharing for its very being. We take this sharing for granted but it is by no means assured, and it is far from being efficient. The Internet promises to greatly improve the process of sharing scientific results, which should speed up progress. But this promise is still largely unmet. This is the challenge that OSTI is working on, how to speed up scientific progress by making sharing efficient.
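As a rough illustration of the issue-tree pattern described above, the following Python sketch encodes Result 1, Questions A to C and Results 2 to 9 as a small tree and counts the nodes at each level. How the eight follow-on results divide among the three questions is an assumption made only for this example, since the original diagram is omitted here.

```python
# Hypothetical issue tree: each result raises questions, each question yields results.
# The split of Results 2-9 across Questions A-C is assumed, not taken from the diagram.
issue_tree = {
    "Result 1": ["Question A", "Question B", "Question C"],
    "Question A": ["Result 2", "Result 3", "Result 4"],
    "Question B": ["Result 5", "Result 6", "Result 7"],
    "Question C": ["Result 8", "Result 9"],
}


def nodes_by_level(tree, root):
    """Return the tree's nodes grouped by depth, starting from the root."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        # The next level is every child of every node in the current level.
        frontier = [child for node in frontier for child in tree.get(node, [])]
    return levels


levels = nodes_by_level(issue_tree, "Result 1")
# Results that exist only because Result 1 was shared with other researchers.
downstream_results = [
    node for level in levels[1:] for node in level if node.startswith("Result")
]

for depth, level in enumerate(levels):
    print(depth, len(level), level)
print("downstream results:", len(downstream_results))
```

If Result 1 were never shared, every node below the root would be missing, which is the sense in which its value extends beyond its own contribution.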

July 02, 2009 09:08 PM

Linked Data Blog Aggregator

Release of structWSF, conStruct and the Community Web Site

The last few months have been challenging in terms of the amount of work to get done, in focusing on deliverables and in getting ready for the release of the conStruct and structWSF source code, documentation, tutorials, web sites and demos.

I am now really happy to finally announce the release of the source code for both projects, along with a new development community website where users and developers can exchange ideas about these two new projects.

The biggest milestone of the last months is now behind us. However, this is just the beginning of everything!

I think that many things have been written about these two projects already. I don’t want to write any tutorial at this point. So the only thing I will do right now is to point you to the most relevant documentation, web sites, blog posts and demos about each project. The next step will be to write about specific use cases, features, etc.

Community Web Site

The community Web site is a place where developers and users of structWSF and conStruct can meet to talk about both projects, to report bugs and issues, to submit new enhancements, to find tips and tricks, etc.

I suggest you create a new user profile on the community Web site if you are interested in communicating with other members.

structWSF

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies).

The structWSF middleware framework is fully RESTful in design and is based on HTTP and Web protocols and open standards. The initial structWSF framework comes packaged with a baseline set of about a dozen Web services for CRUD, browse, search, export and import. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and optionally a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML.

conStruct

conStruct is a distro of the Drupal framework that aims to set a new standard in data integration and as a structured content system (SCS). With conStruct, you can let your data and its structure drive your applications. You can easily interoperate your diverse internal information with public content on the Web. And you can leverage a platform designed from the ground up for knowledge management and collaboration.

July 02, 2009 07:59 PM

Open Access News

More on EOS; OA in Belgium

Bernard Rentier, Faux départ !, Bernard Rentier, Recteur, June 22, 2009. Read it in the original French or Google's English.

I certainly rushed some things in announcing last week the launch of the [Enabling Open Scholarship] website. The next day we had an important meeting of the founders of the EOS group that I chair and we have decided that the site still requires some work, some improvements, a more recent update and a finalization of the Advisory Board. ... Embarrassing, especially since ... applications for membership in EOS abounded on every side from the first day! Hopefully this incident will not adversely affect the participation of many universities at the final launch ...

The role of the EOS site, in fact, will be mainly to rally the leaders of universities worldwide, to convince them to set up institutional repositories and help them. Its second goal is to persuade funders of the importance of free access to the publications of research they have funded and the need to develop systems to harvest from institutional repositories. For us, [Fonds de la Recherche Scientifique], signatory of the Berlin Declaration on open access, is expected to speak out soon in this regard. ... French-speaking Belgium thus could become the first "country" to adopt this system in its entirety, which should serve the cause of our researchers and their reputation.

See also our past posts on Enabling Open Scholarship and its predecessor, EurOpenScholar.

July 02, 2009 08:58 PM

NLP News

Lance Armstrong: “I Am Ready”

Bike World News: I've done my best to adjust and tweak the machine translation. I put my changes in parentheses. In an exclusive interview granted to Eurosport, ...

July 02, 2009 07:54 PM

Open Access News

Cancellations and OA, the flip side

Jonathan Eisen, Another reason to publish as Open Access - libraries hurting big time financially and they will be cancelling many subscriptions, The Tree of Life, June 27, 2009.

If you need any more incentive to publish a paper in an Open Access manner if you have a choice - here is one. If you publish in a closed access journal of some kind, it is likely fewer and fewer colleagues will be able to get your paper as libraries are hurting big time and will be canceling a lot of subscriptions. ...

July 02, 2009 08:35 PM

Court orders release of Elsevier license terms

Association of Research Libraries, Elsevier Motion to Block License Release Denied in Open-Records Decision, press release, June 23, 2009.

An injunction filed by Elsevier to block release of information included in a licensing contract between the publisher and Washington State University (WSU) was denied by a court in the state of Washington last week. A public-records request for contract terms had been submitted to the university by researchers gathering data on the terms of large-publisher bundled contracts.

Whitman County Superior Court, State of Washington, ruled Friday, June 19, 2009, in favor of full disclosure for a public-records request submitted to Washington State University by Ted Bergstrom, Paul Courant, and Preston McAfee for license information regarding the WSU-Elsevier contract. On June 9, Elsevier had filed a Motion for Injunction against release of the data. According to court papers, the plaintiff argued that disclosure of the Elsevier-WSU contracts would “disclose aspects of Elsevier’s pricing methods and formula so as to produce private gain and public loss. Such disclosure would violate Elsevier’s rights under Washington statutes…to preserve the confidentiality of its proprietary pricing methods and formulae.” ...

Researchers Ted Bergstrom, Professor of Economics, University of California, Santa Barbara, and Paul Courant, University Librarian, Dean of Libraries, and Professor of Public Policy, Economics, and Information, University of Michigan, said, “We believe that state open-access laws serve the public interest by requiring full transparency of contracts that involve millions of taxpayer dollars. We will continue to collect and analyze the terms of ‘Big Deal’ contracts signed by a large number of universities and to share this information with the library community. We appreciate the efforts of university librarians who have helped us to collect contract information and we are grateful for ARL’s support and encouragement.”

It is not enough for institutions to assume that public-records requests will ensure that information about contracts and licenses can be made publicly accessible. Last month, the Association of Research Libraries (ARL) Board of Directors supported a resolution to encourage its members to refrain from signing nondisclosure agreements with publishers and to share information about their agreements, insofar as possible, with each other. Tom Leonard, President of ARL and University Librarian, University of California, Berkeley, said, “By responding to an open-records case in this manner, Elsevier has only increased our resolve to push for both open contracts and public disclosure of terms in our negotiations. This case is a telling example of why we should not be signing these nondisclosure agreements.”

July 02, 2009 08:27 PM

Why do publishers participate in developing country access initiatives?

Neil Pakenham-Walsh, Why are publishers participating in developing country access initiatives?, post to the Healthcare Information For All by 2015 mailing list, June 30, 2009.

INASP (International Network for the Availability of Scientific Publications) and ACU (Association of Commonwealth Universities), through their Publishers for Development initiative, recently hosted an online discussion on the question, 'Why are publishers participating in developing country access initiatives?'

All participants were learned society and commercial scholarly publishers (publishing in all sectors, including health). The results of the discussion are provided below. ...

There are a number of major access initiatives - and many smaller schemes focused on specific disciplines or even individual titles - which enable developing country researchers and students to access scholarly information freely at point of use. Commonly these are focused on supplying free or proportionately priced access to academic journals and databases, but there are also several support programmes which aim to strengthen the capacity of libraries to access and use these resources more effectively.

Publishers already provide considerable support to these schemes, offering proportionately discounted access to their principal titles - or in some cases free access - most often in electronic form, but occasionally also for print subscriptions where libraries still struggle to make good use of online information. ...

Some key motivations for publishers’ participation:

A moral argument: For many there is an important moral or philanthropic argument. Publishers, committed to advancing scholarly and scientific investigation, wish to extend access as widely as they can, and to ensure as many people as possible can reap the benefits of research. Developing countries are unable to pay ‘market rates’ but publishers can help by making subscriptions more affordable, thereby ensuring the digital and academic divide is narrowed.

The business case: This moral argument is also underpinned by a business case. Publishers’ key objective is to serve their authors as well as they can. Making sure that their publications - and thus their authors’ research - are disseminated as widely as possible is central to this. ... Discussion also noted that as well as serving the authors some publishers serve society partners who have this dissemination as part of their articles of existence. ...

Authors are not so much interested in the quantity of readers, but that the "right" people are reading - that can be those that have influence over their careers or those that could advance their research by putting it into direct practice. ...

See also our past posts on INASP and on developing country access initiatives, such as HINARI.

July 02, 2009 08:15 PM

Skeptical review of Anderson's Free

Malcolm Gladwell, Priced to Sell, New Yorker, July 6, 2009. A review of Chris Anderson's Free: The Past and Future of a Radical Price.

... Anderson’s ... point is that when prices hit zero extraordinary things happen. Anderson describes an experiment conducted by the M.I.T. behavioral economist Dan Ariely, the author of “Predictably Irrational.” Ariely offered a group of subjects a choice between two kinds of chocolate—Hershey’s Kisses, for one cent, and Lindt truffles, for fifteen cents. Three-quarters of the subjects chose the truffles. Then he redid the experiment, reducing the price of both chocolates by one cent. The Kisses were now free. What happened? The order of preference was reversed. Sixty-nine per cent of the subjects chose the Kisses. The price difference between the two chocolates was exactly the same, but that magic word “free” has the power to create a consumer stampede. Amazon has had the same experience with its offer of free shipping for orders over twenty-five dollars. The idea is to induce you to buy a second book, if your first book comes in at less than the twenty-five-dollar threshold. And that’s exactly what it does. In France, however, the offer was mistakenly set at the equivalent of twenty cents—and consumers didn’t buy the second book. “From the consumer’s perspective, there is a huge difference between cheap and free,” Anderson writes. “Give a product away, and it can go viral. Charge a single cent for it and you’re in an entirely different business. . . . The truth is that zero is one market and any other price is another.”

Since the falling costs of digital technology let you make as much stuff as you want, Anderson argues, and the magic of the word “free” creates instant demand among consumers, then Free (Anderson honors it with a capital) represents an enormous business opportunity. Companies ought to be able to make huge amounts of money “around” the thing being given away—as Google gives away its search and e-mail and makes its money on advertising.

... Look at YouTube, he says, the free video archive owned by Google. YouTube lets anyone post a video to its site free, and lets anyone watch a video on its site free ...

The only problem is that in the middle of laying out what he sees as the new business model of the digital age Anderson is forced to admit that one of his main case studies, YouTube, “has so far failed to make any money for Google.” ...

[T]here’s plenty of other information out there that has chosen to run in the opposite direction from Free. The [New York] Times gives away its content on its Web site. But the Wall Street Journal has found that more than a million subscribers are quite happy to pay for the privilege of reading online. Broadcast television—the original practitioner of Free—is struggling. But premium cable, with its stiff monthly charges for specialty content, is doing just fine. ... The only iron law here is the one too obvious to write a book about, which is that the digital age has so transformed the ways in which things are made and sold that there are no iron laws.

See also our past posts on Anderson's Free.

July 02, 2009 07:48 PM

Wikimedia Technical Blog

Current events and traffic spikes

News agencies today are reporting that pop star Michael Jackson has been hospitalized, and perhaps died. We can all think back on how the King of Pop has touched our lives, but today we can also see how high-profile news events can affect a web site… See also past events such as the Popedotting and the 2008 US election.

Here at the office we first noticed something was going on when IM services such as AOL Instant Messenger started logging people out — we quickly noticed that our own servers were hitting load spikes, and suspected there was something going on…

Server CPU load spike (likely several more to come):

[Image: load-spike]

The actual traffic load spike is subtler; server effects can be disproportionate to the actual traffic:

[Image: traffic-spike]

Update 22:53 UTC:

The traffic is pretty much holding steady but we’ve still been seeing intermittent load spikes:

[Image: load-spike2]

These are at least in part due to one of our memcached internal data cache servers going wonky and swapping due to overuse of memory from text storage running on the same node. We’ve reduced traffic on the node and restarted it to even out its memory usage. (Thanks Domas!)

Update 23:00 UTC:

You may see intermittent messages like “(Cannot contact the database server: Unknown error (10.0.6.24))” as temporary database overloads cascade around the system. Sorry for the inconvenience while we work the kinks out; just wait a few minutes and try again…

Update 23:43 UTC:

We believe a large chunk of the CPU overload is due to cache swarming — many visitors simultaneously causing a re-render of the page due to an expired cache version. I’ve put in a temporary hack which will reduce the amount of rendering, but may cause some people to see out of date copies of the page.

Update 2009-07-02:

Here’s a link to Domas’s blog post with technical details on the cache swarming problem.
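Cache swarming of the kind described above is commonly mitigated by letting a single request re-render an expired page while everyone else is served the stale copy. The Python sketch below illustrates that general idea only. It is not the hack deployed on Wikimedia's servers, and the class name, the `render` callback and the `ttl_seconds` parameter are all hypothetical.

```python
import threading
import time


class StaleTolerantCache:
    """Lets one caller re-render an expired entry while others get the stale copy."""

    def __init__(self, render, ttl_seconds, clock=time.monotonic):
        self._render = render        # expensive function: key -> rendered page
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (value, expiry time)
        self._rendering = set()      # keys with a re-render in progress
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and self._clock() < entry[1]:
                return entry[0]      # fresh hit
            if entry is not None and key in self._rendering:
                return entry[0]      # expired, but someone is already re-rendering
            # Either nobody is re-rendering yet, or there is no stale copy to serve.
            # (A cold miss can still be rendered by several callers at once.)
            self._rendering.add(key)
        try:
            value = self._render(key)  # done outside the lock so readers are not blocked
            with self._lock:
                self._entries[key] = (value, self._clock() + self._ttl)
        finally:
            with self._lock:
                self._rendering.discard(key)
        return value


cache = StaleTolerantCache(lambda title: "<html>" + title + "</html>", ttl_seconds=60)
print(cache.get("Michael_Jackson"))
```

The trade-off matches the one noted in the update: fewer simultaneous renders, at the cost of some visitors briefly seeing an out-of-date copy.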

July 02, 2009 06:13 PM

Open Access News

Presentations from European OA meeting

The presentations from Open Access - What are the Economic Benefits? (Brussels, June 22, 2009) are now online.

July 02, 2009 05:42 PM

Comparative study says benefits of OA outweigh costs

Knowledge Exchange, Benefits of Open Access clearly outweigh costs in three European Countries, press release, July 1, 2009.

For Denmark, the United Kingdom and the Netherlands free access to scholarly materials could offer significant benefits not only to research and higher education but also to society as a whole. This has been calculated by Australian economist Professor John Houghton in studies which have taken place in these three countries on the costs and benefits of scholarly communication. He has now summarised these findings in a report commissioned by Knowledge Exchange, which is a partnership of the IT bodies from Denmark (DEFF), the United Kingdom (JISC), the Netherlands (SURFfoundation) and Germany (DFG). ...

Adopting this model could lead to annual savings of around EUR 70 million in Denmark, EUR 133 million in the Netherlands and EUR 480 million in the UK. The report concludes that the advantages would not just be in the long term; in the transitional phase too, more open access to research results would have positive effects. In this case the benefits would also outweigh the costs. ...

See also our past posts on Houghton's research.

July 02, 2009 05:37 PM

ubiquity-firefox Google Group

Create Bookmarklet Command not matching

Hi,

I'm using 0.5pre3 and trying to use "create bookmarklet command". It
fails to match as soon as I start to enter the name of the
bookmarklet.

Bookmarklet name: RSS+

Input: create bookmarklet command
Suggestions: create bookmarklet command, help, disable command, tag,
tinyurl

Input: create bookmarklet command R

July 02, 2009 04:22 PM

How to control another tab and access its html

I use bigstring email and I want to make a Ubiquity function that
refreshes the tab that has my bigstring email in it, and I want it to
display the part of the page which shows the number of unread emails.

July 02, 2009 03:58 PM

Google Book Search Blog

New ways to search within a book



At Google we want to make it easy for you to find the information you need. As such, we've made searching for passages within a book part of the core experience of Google Books.
Earlier this month we revamped the search experience to make searching inside a book easier. You can now view the context of a search result, sort results by relevancy or page order, and flip through results quickly while viewing the book.
Today I'm excited to announce one more addition to the experience of searching a book: search results in your scrollbar. Now when you search in a book, little hints will appear in the margin to indicate where your results are located. When you hover over one of these annotations, you'll get a quick preview of the search results and the option of jumping directly to the associated page. Here I searched Aunt Mary's New England Cook Book for pie recipes:


Previously, it was difficult to get a feel for where results were located in a book. You could count the page numbers and make a guess, but that's hardly efficient. Now there is a strong visual display of result locations, and often clusters will form around particular chapters or passages. This will help you navigate more easily between pages which contain your search term.
These annotations will both make navigation between results quicker and help users jump to the correct result.
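One plausible way to place such hints is to map each result's page number to a proportional offset along the scrollbar track. The Python sketch below shows that idea under stated assumptions: the function name, its parameters and the pixel-based track are hypothetical, and nothing here is taken from Google's actual implementation.

```python
def scrollbar_marks(result_pages, total_pages, track_height_px):
    """Map page numbers of search hits to pixel offsets along a scrollbar track."""
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    marks = []
    for page in sorted(set(result_pages)):
        if not 1 <= page <= total_pages:
            continue  # ignore hits that fall outside the book
        # First page maps to the top of the track, last page to the bottom.
        fraction = (page - 1) / max(total_pages - 1, 1)
        marks.append((page, round(fraction * track_height_px)))
    return marks


# Hits that cluster in one chapter end up as marks that sit close together.
print(scrollbar_marks([12, 14, 15, 180], total_pages=201, track_height_px=400))
```

Because the mapping is linear, clusters of results around a chapter show up as visibly dense groups of marks, which is the effect the post describes.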
As always, feel free to provide feedback. Happy searching!

July 02, 2009 04:48 PM

MusicBrainz Blog

The Wall Street Journal reviews TuneUp and Picard

The WSJ just posted a well-balanced review of TuneUp and Picard.

Thanks for the nice write-up, Geoff!

July 02, 2009 03:48 PM

Wikimedia Technical Blog

Improving Wikimedia’s Discussion System

Hi all,

Some of you might have already seen my blog posts about LiquidThreads, Wikimedia’s in-development discussion system.

For those who haven’t, this is a quick primer on what LiquidThreads is, and what it’s going to do for Wikimedia’s communities.

Currently, Wikimedia’s discussion system sucks. Here’s why:

Imagine being a new user and trying to figure out how to add your comment to this.

Enter LiquidThreads. LiquidThreads is a system that makes MediaWiki’s discussion system behave like a forum or comments thread, while still maintaining the unique refinements that make wikis work. It was originally designed by a Google Summer of Code student, David McCabe, and I’ve been making incremental improvements to make it work for Wikimedia.

Overview of the new LiquidThreads interface

So, what’s changed?

If you’re interested, I’ve put together a test setup for you to play with it.

As always, questions, comments and suggestions are more than welcome, in the comments or elsewhere.

July 02, 2009 03:27 PM

Google Research Blog

International Conference on Machine Learning (ICML 2009) in Montreal



The 26th International Conference on Machine Learning (ICML 2009) was recently held in Montreal in conjunction with the 22nd Conference On Learning Theory (COLT 2009) and the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009). This is one of the major forums for researchers from both industry and academia to share the recent developments in the area of machine learning and artificial intelligence. Machine learning is a central area for Google as it has many applications in extracting useful information from a vast amount of data available on the web. In addition to sponsoring this scientific event, Google contributed intellectually to several scientific forums. Here's a short report of those activities:

Google's main mission is "to organize the world's information and make it universally accessible and useful," and machine learning plays a fundamental role in both of these aspects. As a result, Google has invested significant resources in this area of research, and we look forward to continued participation and collaboration at these conferences for many more years.

July 02, 2009 04:08 PM

NLP News

Systran Sponsors Machine Translation Summit XII

WELT ONLINE: SYSTRAN, the leading provider of language translation technologies, today announced it is a primary sponsor for the Twelfth Machine Translation Summit, ...

July 02, 2009 02:58 PM

Systran Sponsors Machine Translation Summit XII

SYS-CON Media (press release): SYSTRAN, the leading provider of language translation technologies, today announced it is a primary sponsor for the Twelfth Machine Translation Summit, ...

July 02, 2009 02:49 PM

OpenGeoData

Help me make your map better

I’m trying an experiment with walking-papers. Get all my non-mapping friends to print out a map of their area, write on the print out the errors, house numbers etc and then I will do the rest. I’ve tweeted here: “Help me make your map better http://bit.ly/B8F5X – what you think?” You can too. Get your friends, family… [...]

July 02, 2009 01:11 PM

Open Access News

July SOAN

I just mailed the July issue of the SPARC Open Access Newsletter.  This issue takes a close look at OA and the variety of digitization projects.  How far can we defend the principle that the results of publicly-funded digitization projects should be OA?  What if the public funds are supplemented by private funds?  What if the works to be digitized are under copyright?  What if the project wants to provide gratis rather than libre OA?

The round-up section briefly notes 166 OA developments from June.

July 02, 2009 02:00 PM

Open Knowledge Foundation Blog

Open Knowledge Foundation Newsletter No. 11

Open Knowledge Foundation Newsletter No. 11 has just been sent out:

Welcome to the eleventh Open Knowledge Foundation newsletter! Contents:

1. The OKF turns five and we need your support!
2. Open Database License (ODbL) goes 1.0
3. European Open Data Inventory + Summit
4. Launch of the Open Data Grid
5. New developments on Public Domain Works
6. Other news in brief

Thanks to [...]

July 02, 2009 12:22 PM

ubiquity-firefox Google Group

Danish translation

Hi guys.
Greetings from Roskilde Festival!
I am sorry to report that I have not been able to find time (and
facilities) to complete the Danish translation - I did some of it by
ssh'ing from my phone->firewall->laptop, but this is a tiresome process
on a 15x41 char display... I suggest someone apply the msgid changes

July 02, 2009 11:59 AM

OpenGeoData

SOTM now just over 7 days away

Have you registered? Check out State of the Map. And, there are still many cheap travel options with easyJet and others.

July 02, 2009 11:53 AM

information aesthetics

Sputnik Observatory for the Study of Contemporary Culture

Sputnik Observatory for the Study of Contemporary Culture [sptnk.org] is the latest project from interactive information design hero Jonathan Harris, well known from other info-aesthetics pieces such as I Want You to Want Me, Whale Hunt, Universe, Love Lines, We Feel Fine and Ten by Ten.

If you want to know more about some perplexing themes like Interspecies Communication, Urban Metabolism or 21 Senses, then this site is for you.

According to Jonathan himself: "The project is the result of a 2-year collaboration with New York-based Sputnik, Inc., an organization that documents contemporary culture through intimate video interviews with hundreds of leading thinkers in the arts, sciences and technology, covering a wide range of topics. The central premise of the Sputnik project is that everything is connected to everything else, and that topics and ideas that may seem fringe and even heretical to the mainstream world are in fact being investigated by leading thinkers working in fields as diverse as quantum physics, mathematics, neuroscience, biology, economics, architecture, digital art, video games, computer science and music. Sputnik is dedicated to bringing these crucial ideas from the fringes of thought out into the limelight, so that the world can begin to understand them.

Conducted over more than ten years and previously unavailable to the public, the interviews within the site chronicle some of the most provocative human ideas to have emerged in the last few decades. The site itself aims to highlight the interconnections between seemingly disparate thinkers and ideas, using a simple navigational system with no dead ends, where every thought leads to another thought, akin to swimming the stream of consciousness.

There are about 200 videos on the site today, and there will be thousands more added over the coming weeks, months, and years."

More information also at the sptnk blog. Via TEDChris.


July 02, 2009 10:09 AM

OpenSocial API Blog

Apache Shindig 1.0-incubating released

Apache Shindig aims to make it simple to create your own OpenSocial container by providing an open source implementation (in both Java and PHP) of the OpenSocial APIs. The Shindig team recently made creating and maintaining an OpenSocial container even easier, by publishing a release that supports OpenSocial v0.8.1.

Now, instead of checking out a specific revision or trying to keep up with the ever-changing trunk, OpenSocial container developers can use stable releases in their own websites. As issues come up, the Shindig community will fix them and roll them into the stable release, so developers will just need to grab the new version.

The Apache Shindig 1.0-incubating release is available on the downloads page of the Shindig website. If you've been running an older revision or branch, now's the time to update to known-good state. Of course, the Shindig folks have been busy, so if you're interested in new features, like templates and the streamlined JavaScript API, you can get all the OpenSocial v0.9 features by checking out the source -- and a stable release supporting OpenSocial v0.9 is already in the works!


July 02, 2009 11:01 AM

ubiquity-firefox Google Group

Is it possible to change the default language?

Hi all,

It's a great tool and I like it very much.
I have a question: if I select a sentence and want to
translate it,
it seems that it can only translate it to English by default.
If I want to
translate it to another language, I have to type "tr some-sentence to
my-favorite-language",

July 02, 2009 09:47 AM

EFF.org Updates

ASCAP Makes Outlandish Copyright Claims on Cell Phone Ringtones

New York - The Electronic Frontier Foundation (EFF) urged a federal court Wednesday to reject bogus copyright claims in a ringtone royalty battle that could raise costs for consumers, jeopardize consumer rights, and curtail new technological innovation.

Millions of Americans have bought musical ringtones, often clips from favorite popular songs, for their mobile phones. Mobile phone carriers pay royalties to song owners for the right to sell these snippets to their customers. But as part of a ploy to squeeze more money out of the mobile phone companies, the American Society of Composers, Authors, and Publishers (ASCAP) has told a federal court that each time a phone rings in a public place, the phone user has violated copyright law. Therefore, ASCAP argues, phone carriers must pay additional royalties or face legal liability for contributing to what they claim is cell phone users' copyright infringement. In an amicus brief filed Wednesday, EFF points out that copyright law does not reach public performances "without any purpose of direct or indirect commercial advantage" -- clearly the case with cell phone ringtones. If phone users are not infringing copyright law, then mobile phone service providers are not contributing to any infringement.

"This is an outlandish argument from ASCAP," said EFF Senior Intellectual Property Attorney Fred von Lohmann. "Are the millions of people who have bought ringtones breaking the law if they forget to silence their phones in a restaurant? Under this reasoning from ASCAP, it would be a copyright violation for you to play your car radio with the window down!"

ASCAP has responded by saying that it does not plan to charge mobile phone users, just mobile phone service providers. But if ASCAP prevails, consumers could find themselves targeted by other copyright owners for "public performances." Worse, these wrongheaded legal claims cast a shadow over innovators who are building gadgets that help consumers get the most from their copyright privileges.

"Because it is legal for consumers to play music in public, it's also legal for my mobile phone carrier to sell me a ringtone and a phone to do it," said von Lohmann. "Otherwise it would be illegal to sell all kinds of technologies that help us enjoy our fair use, first sale, and other copyright privileges."

The Center for Democracy and Technology and Public Knowledge also joined the EFF brief.

For the full amicus brief:
http://www.eff.org/files/filenode/US_v_ASCAP/US%20v%20ASCAP%20EFF%20ATT%...

For more on this case:
http://www.eff.org/cases/us-v-ascap

Contact:

Rebecca Jeschke
Media Relations Director
Electronic Frontier Foundation
press@eff.org

July 02, 2009 07:32 AM

Linked Data Blog Aggregator

structWSF: A Framework for Collaboration Networks


An Innovative, Distributed, Scalable Design with Dataset Access Rights

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via separate ontologies (schema with accompanying vocabularies).

The structWSF middleware framework is fully RESTful in design and is based on HTTP and Web protocols and open standards, conforming to what is known as a Web-oriented architecture. The initial structWSF framework comes packaged with a baseline set of about a dozen Web services in CRUD, browse, search and export and import. All Web services are exposed via APIs and SPARQL endpoints. It also has direct interfaces to the Virtuoso RDF triple store and the Solr faceted, full-text search engine.

This post follows the release of the alpha version of the open source structWSF code on the OpenStructs Web site. It is available for download under Apache 2 license.

But, Wait! There’s More!

These baseline capabilities are useful enough. But there is another foundation to structWSF that is quite innovative and exciting: Its explicit design to support collaboration networks. It is this aspect that is the focus of this current article.

The collaboration design is a result of the needs of the Bibliographic Knowledge Network (BibKN or BKN) [1]. BibKN has as one of its express purposes creating a network of collaborators in math and statistics, ranging from the individual researcher to departments and universities and various virtual organizations (VOs) representing different communities of interest. Moreover, this nucleus of researchers also has external collaborators ranging from major publishers to software and service providers of various sizes from around the globe.

Thus, one key requirement of the BKN project was to design an infrastructure responsive to this broad spectrum of interests, locations and organizations. And, besides questions of varying scale, locale and distribution, there was also the need to combine public and private data. In some cases, initial work products need to be kept within its sponsoring groups before being made public. Sometimes external publishers want to segregate network members by whether they are already paid subscribers or not. And, most importantly, the project had a mandate to create an easy and open framework for encouraging incipient collaborators and curators to add and take ownership of new datasets.

Boiled down, these requirements represent a completely fluid spectrum of scales, access rights, virtual groups and distributed locations. Establishing a workable and responsive framework that meets them was daunting indeed. But what has resulted from this mandate, structWSF, is a generalized solution that has applicability to collaboration within any knowledge network.

Four Exemplar Deployment Modes

BibKN anticipates, and is to include, four exemplar types of participants on the network (or “nodes”, which are not to be confused with the different meaning of node in Drupal):

Each of these nodes exposes its data to the rest of the network via a structWSF Web services framework. Each structWSF installation provides an access point and endpoint to the network. Through these installations, data is converted to “canonical” form for use by other nodes on the network with common tools and services provided.

In conceptual form, then, the network can be represented as follows:

structWSF Data Model Relationships

Each node has a structWSF instance, the common network denominator, shown in blue.

A key aspect of each structWSF installation is dataset registration and access authorization. Only users with proper authorization may access a given dataset or exercise certain privileges on it, such as write or update.

The other core Web services provided with structWSF are the CRUD functional services (create - read - update - delete), import and export, browse and search, and a basic templating system [see (3) in the next figure]. These are viewed as core services for any structured dataset. The current alpha release supports CSV, TSV, RDF/XML, RDF/N3 and XML, with JSON forthcoming shortly.

Rights: The Intersection of Web Service, Dataset, Group, Role and CRUD

The controlling Web service in structWSF is the Authentication/Registration WS [see (2) in the figure below]. The current alpha version of structWSF uses registered IP addresses as the basis to grant access and privileges to datasets and functional Web services. Later versions will be expanded to include other authentication methods such as OpenID, keys (à la Amazon EC2), foaf+ssl or oauth. A secure channel (HTTPS, SSH) could also be included.

A simple but elegant system guides access and use rights. First, every Web service is characterized as to whether it supports one or more of the CRUD actions. Second, each user is characterized as to whether they first have access rights to a dataset and, if they do, which of the CRUD permissions they have [see (4, 5)]. We can thus characterize the access and use protocol simply as A + CRUD.

structWSF Data/WS Access

Thereafter, a mapping of dataset access and CRUD rights (see below) determines whether users see a given dataset, which Web services (“tools”) are presented to them, and how they might manipulate that data. When expressed in standard user interfaces this leads to a simple contextual display of datasets and tools. For example, under standard search or browse activities the user would only see result sets drawn from the datasets to which they have access. Similarly, users only see the tools that their CRUD rights allow.
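The A + CRUD rule described above can be illustrated with a small Python sketch. This is not structWSF code: the `Registry` and `WebService` names, the example IP addresses and the dataset name are all hypothetical, and the rule that a tool is shown only when the user holds every CRUD action the tool needs is an assumption made for illustration.

```python
from dataclasses import dataclass, field

CRUD = frozenset({"create", "read", "update", "delete"})


@dataclass(frozen=True)
class WebService:
    """A tool, characterized by the CRUD actions it needs (hypothetical model)."""
    name: str
    actions: frozenset


@dataclass
class Registry:
    """Maps (user, dataset) to CRUD permissions; a missing key means no access."""
    grants: dict = field(default_factory=dict)

    def grant(self, user, dataset, perms):
        perms = frozenset(perms)
        if not perms <= CRUD:
            raise ValueError(f"unknown permissions: {sorted(perms - CRUD)}")
        self.grants[(user, dataset)] = perms

    def visible_datasets(self, user):
        # "A": a dataset is visible only if the user was granted access to it.
        return sorted(d for (u, d) in self.grants if u == user)

    def available_tools(self, user, dataset, services):
        # "CRUD": assumed rule, show a tool only if every action it needs is held.
        perms = self.grants.get((user, dataset))
        if perms is None:
            return []
        return [s.name for s in services if s.actions <= perms]


SERVICES = [
    WebService("search", frozenset({"read"})),
    WebService("browse", frozenset({"read"})),
    WebService("import", frozenset({"create", "update"})),
    WebService("delete-record", frozenset({"delete"})),
]
```

With a read-only grant such as `registry.grant("192.0.2.10", "math-bib", {"read"})`, `available_tools` returns only `["search", "browse"]`, and a user with no grant sees no datasets at all. The user key is a plain string here, so a registered IP address (as in the alpha release) or any later identity scheme could be used in its place.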

At the Web service layer, these access values are part of the GET request. The system, however, is designed to more often be driven by user and group management at the CMS level via a lightweight plug-in or module layer.

Because a CMS may employ its own access system and protocols, the potential combinations can become quite large. Let’s take for an example a VO node in the BibKN scenario which layers Drupal (via the conStruct modules) over the structWSF framework. By including the additional third-party contributed Drupal module of Organic Groups, we also now add an entire dimension of group access to the standard roles access in the base Drupal [5]. So, in this scenario, we theoretically have these potential access and rights combinations:

Since the group and user role categories can be quite extensive, the combinatorial result of these options can also be quite large.

Nonetheless, as a general proposition, these access and rights dimensions can capture most any reasonable use case.

Patterned Profiles Aid Management

One way to ease the management of these choices at the UI level is to create a series of access patterns or templates, called profiles, to which a newly registered dataset can be assigned. While the Drupal site owner could go in and change or tweak any of the individual assignments, the use of such profiles simplifies the steps needed for the majority of newly registered datasets (Pareto assumption).

For instance, consider these possible profile patterns:

We can now expand this concept for a given dataset by adding the dimension of user type or category. Four categories of users can illustrate this user dimension:

(Of course, with a multitude of groups, there are potentially many more than four categories of users.)

To illustrate how we can collapse this combinatorial space into something more manageable, let’s look at how one of the profile cases noted above, the Public profile, can be expressed as a pattern or template. In this example, the Public profile means that owners and some groups may curate the data, but everyone can see and access the data. Also note that export is a special case, which could warrant a sub-profile.

We also need to relate this Public profile to a specific dataset. For this dataset, we can characterize our “possible” assignments as described above as to whether a specific user category (O, G, R and P as noted above) has available a given function (), gets permission rights to that function by virtue of the assigned profile (), or whether that function may also be limited to a specific group or groups () or not.

Thus, we can now see this example profile matrix for the Public profile for an example dataset with respect to the available structWSF Web services:

Data Access Matrix

Note, of course, that these options and categories and assignments are purely arbitrary for our illustrative discussion. Your own needs and circumstances may vary wildly from this example.

Matrices such as this seem complex, but that is exactly why profiles help: they collapse the potential assignments into a manageable number of discrete options. The relevant task, which need not take long, is to assemble profiles responsive to your own specific circumstances.

And, of course, if your pre-packaged profiles need to be tweaked or adjusted for a particular circumstance, the CMS enables all assignments to be accessed in individual detail.
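As a rough illustration of the profile idea, the Python sketch below treats a profile as a template of rights per user category that is copied onto a newly registered dataset and can then be tweaked. It is a sketch under stated assumptions, not the conStruct implementation: the meanings given to the O, G, R and P categories, the exact rights in the "public" template and the "private" profile are all hypothetical.

```python
# Category letters follow the post; their meanings here are assumed.
CATEGORIES = {
    "O": "owner",
    "G": "group member",
    "R": "registered user",
    "P": "public",
}

# Hypothetical profile templates: category -> allowed CRUD actions.
PROFILES = {
    # Owners and some groups may curate; everyone can see the data.
    "public": {
        "O": {"create", "read", "update", "delete"},
        "G": {"create", "read", "update"},
        "R": {"read"},
        "P": {"read"},
    },
    # Invented for contrast: only the owner has any rights.
    "private": {
        "O": {"create", "read", "update", "delete"},
        "G": set(),
        "R": set(),
        "P": set(),
    },
}


def assign_profile(name, overrides=None):
    """Copy a profile template for one dataset, then apply per-dataset tweaks."""
    rights = {cat: set(perms) for cat, perms in PROFILES[name].items()}
    for cat, perms in (overrides or {}).items():
        if cat not in CATEGORIES:
            raise KeyError(f"unknown user category: {cat!r}")
        rights[cat] = set(perms)
    return rights


def can(rights, category, action):
    """True if the given user category may perform the action on the dataset."""
    return action in rights.get(category, set())
```

Most datasets would simply call `assign_profile("public")`. A dataset that needs something different passes overrides, for example `assign_profile("public", {"G": {"read"}})` to stop group members from curating, and the shared template itself is left unchanged because each dataset gets its own copy.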

A Powerful Vision

Via this design, knowledge and collaboration networks can be deployed that support an unlimited number of configurations and options, all in a scalable, Web-accessible manner. The data that is accessed is automatically expressed as linked data. This same framework can be layered over in situ existing data assets to provide data federation and interoperable functionality, all responsive to standard enterprise concerns regarding data access, rights and permissions.

This is not science fiction, and this is not complex. When combined with its data mixing and conversion potentials [3], we can now see emerging a general framework that enables access and interoperability to virtually any data source and for virtually any purpose, with permissions and rights built in, anywhere and everywhere across the Web.

These are exciting prospects that were not possible until Web-oriented architectures with structured RDF data came to the fore. There are no longer any barriers to the powerful vision of complete data access and interoperability without disrupting existing assets.

And the mere thought of that is disruptive, indeed.

Note: The alpha version of structWSF and its related conStruct modules are somewhat raw or incomplete in some ways. A few of the functions expressed in this posting have not yet been released in these code bases.
[1] BibKN is a project to develop a suite of tools and services to encourage formation of virtual organizations in scientific communities of various types. The project started in September 2008 with funding by the NSF Cyber-enabled Discovery and Innovation (CDI) Program. The major participating organizations are the American Institute of Mathematics (AIM), Harvard University, Stanford University and the University of California, Berkeley. Research support to BibKN has come in part from NSF Award 0835851.
[2] structWSF is actually combined with the conStruct structured content system and Drupal for the delivery of the VO nodes.
[3] See the earlier posting, structWSF: A Framework for Data Mixing, for discussion about structWSF data formats.
[4] BibJSON is the standard, human-readable and editable data exchange format used within the BKN project. It has a standard attribute vocabulary geared to bibliographic material and is based on the JSON (JavaScript Object Notation) data notation.
[5] Though the specifics may differ, including the modules and add-ins, other leading CMS systems provide similar functionality.

July 02, 2009 06:13 AM

tesseract-ocr Google Group

Return codes and their descriptions

Does anyone know where I can find all the possible return (or error)
codes that Tesseract may output, and their corresponding descriptions?
For instance, 0 is returned for successful recognition. What about 1,
29, 31, or others? What do they mean? I want to display meaningful
error messages when Tesseract fails.
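
The post does not list what the individual codes mean, and this sketch does not try to guess them. A minimal Python approach to showing a meaningful message anyway is to report the exit status together with whatever the program wrote to standard error. The helper name `run_and_report` and the file names in the usage note are hypothetical, and it is an assumption that the failing program writes its diagnostics to stderr.

```python
import subprocess


def run_and_report(cmd):
    """Run a command and return (ok, message) suitable for showing to a user."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return False, f"could not start {cmd[0]!r}: executable not found"
    if proc.returncode == 0:
        return True, "ok"
    # Do not interpret the code; pass it along with the program's own diagnostics.
    detail = proc.stderr.strip() or "no diagnostic output"
    return False, f"{cmd[0]} exited with status {proc.returncode}: {detail}"
```

For example, `ok, message = run_and_report(["tesseract", "page.tif", "page"])` would run the recognizer on a hypothetical `page.tif` and, on failure, give a message that includes both the numeric status and any text the program printed about the error.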

July 02, 2009 03:27 AM

Wikimedia Technical Blog

First usability release, Acai, is now available.


The first usability release, Acai, hit Wikipedia and sister projects this afternoon. The new skin, Vector, and the enhanced toolbar can be turned on from the user preferences under “Appearance” and “Editing”. The search results page now has a new layout with less daunting information. Vector is only available for left-to-right languages at the moment due to IE6 incompatibility. However, the enhanced toolbar can be selected in all languages, and the new search results page is enabled globally. We could not roll out two features we had planned. First, warning messages for unsaved changes when a user switches away from the edit tab did not work properly, so they are disabled. Please be careful when you switch away from the edit tab. Second, importing language-specific configuration for special characters was not graceful, so we disabled the special character function in the toolbar. We are working on the fixes and plan to roll them out as soon as we have stable solutions. The usability project wiki has Vector and the new toolbar as defaults, so if you prefer to check them out without changing your preferences, it is a good place to visit first. Let us know what you think. We would love to hear from you.

Best,

Naoko

July 02, 2009 02:55 AM

NLP News

Naughty Feeds

Do you have a naughty feed? Come on, admit it. You deliberately left out the title, or did you put in an empty summary? Maybe you’re the one who doesn’t put in any dates, or perhaps you set the permalink...

July 02, 2009 02:26 AM

Arabic software firm buys US-based Dial Directions

TMCnet: Sakhr, which has a strong presence both globally and in the Middle East, is known for its rich Arabic natural language processing (NLP) knowledge base with ...

July 02, 2009 02:20 AM

Open Access News

Feedback sought on citation sharing service

A Citation Services draft project proposal, drafted at a recent workshop in Amsterdam, is now soliciting feedback. For background, see posts by the JISC Information Environment Team and Alma Swan.

July 02, 2009 12:18 AM

July 01, 2009

Science Commons

WisconsinView dedicates 6+ terabytes of data to the public domain

As of July 1, WisconsinView, an effort to make available a variety of types of imagery for the state of Wisconsin, will make their data available in the public domain via CC0. This news was brought to us by Puneet Kishor, a Science Commons fellow. From the press release: “Since 2004, WisconsinView  has made aerial photography and [...]

July 01, 2009 11:12 PM

July 02, 2009

Open Access News

Victoria committee recommends encouraging, not requiring, OA

The Economic Development and Infrastructure Committee of the Parliament of Victoria, Australia on June 24 released the final report of its Inquiry into Improving Access to Victorian Public Sector Information and Data. (Thanks to Dave Bath.)

See especially Recommendation 8:

That the Victorian Government encourage as part of its funding agreements with research agencies and higher education institutions that research results be deposited in open access journals or repositories. The Government should consider providing additional funds to these agencies to allow them to publish in open access journals that charge a fee for publication.

From the report:

In its report Public sector support for science and innovation, the [Australian Government] Productivity Commission argued that mandatory requirements would better meet the aim of free and public access to publicly-funded research results. This is despite claims that requiring publicly funded research to be made available via open access could have a detrimental impact on the journal publishing industry. According to the Australian Publishers Association, the increasing availability of peer-reviewed manuscripts in repositories “will lead to cancellations and the eventual demise of the journal upon which their peer-reviewed process depends.” A possible solution, as noted by the Productivity Commission, is the “author pays” approach whereby authors are responsible for paying publishers or repositories a fee on the basis that the publication is publicly and freely accessible. ...

While it would be difficult for the Victorian Government to require research agencies and higher education institutions to completely comply with an open access policy, it does have a role in encouraging this practice. The Government should encourage, as part of its funding agreements with these organisations, that research results be deposited in open access journals or repositories. The Committee believes this is an important step to maximise the value of the Government’s research and development investment, and further contribute to scientific research and innovation.

July 02, 2009 12:00 AM

July 01, 2009

Semantic Forms Google Group

Version 1.7.3: fixes for HTML-escaping of characters, SMWSQLStore support removed, etc.

Hi all,
Version 1.7.3 of Semantic Forms has been released. In this version:
- there were further fixes for HTML-escaping of characters, which I thought
I had fixed in the last version, but I had actually made the problem
somewhat worse. The issue is that the handling of HTML-escaped characters
(like "&") has to be different for different kinds of inputs: regular

July 01, 2009 10:22 PM

Open Access News

BMC adds 'Post to Twitter' button

Matthew Cockerill, BioMed Central and Twitter, BioMed Central Blog, June 24, 2009.

Recently we have noticed more and more researchers using Twitter as an informal channel to share thoughts on the latest open access research published in our journals. We're always keen to facilitate such discussions, and with that in mind we have recently added 'Post to Twitter' as a convenient option in the right hand toolbar of each BioMed Central journal article.

We're also in the early stages of using Twitter ourselves - you can follow us as BioMedCentral.

So far, our Twitter feed includes blog posts and hot article notifications, along with various short updates and links relating to BioMed Central and open access publishing. ...

July 01, 2009 11:10 PM

Forthcoming libre OA journal on stem cells

Stem Cell Research & Therapy is a forthcoming peer-reviewed OA journal published by BioMed Central. See the June 26 announcement. Authors retain copyright and articles are published under the Creative Commons Attribution License. The article-processing charge is $1690, subject to discounts or waiver.

July 01, 2009 10:59 PM

Most BMC journal impact factors increase

Matthew Cockerill, New and improved impact factors for BioMed Central journals in the 2008 JCR, BioMed Central Blog, June 24, 2009.

The latest edition of Thomson Reuters' Journal Citation Reports has just been released, with official Impact Factors for a total of 58 BioMed Central journals [Note: 59 to my count]. Impact factors are by no means a perfect quality metric, but these journal citation data provide strong evidence of the growing success of BioMed Central's open access journal portfolio.

Highlights include:

Of the 59 IFs for BMC journals listed in the post, 12 are new, 29 are improved, and 18 are not improved.

July 01, 2009 10:52 PM

UNESCO releases its first openly licensed publication

UNESCO releases new publication on open educational resources, press release, June 26, 2009. (Thanks to Mike Linksvayer.)

UNESCO has released its first openly licensed publication. Open Educational Resources: Conversations in Cyberspace brings together the background papers and reports from the first three years of activities in the UNESCO OER Community. Access the online edition – or buy the book! ...

In particular, the license is Creative Commons Attribution-Noncommercial-Share Alike.

July 01, 2009 10:26 PM

Wikimedia Technical Blog

Open Translation Tools 2009 report

View of the towers of De Waag, Amsterdam

With six projects in over 250 languages, multilingual communication and content translation are big priorities for us. That’s one reason I was excited to go to the Open Translation Tools 2009 conference and be in the same room with 80 other translators, content providers and developers all working in the open translation space. Another reason is that the conference was held in Amsterdam in the old city center, in a beautiful venue right by one of the canals.

We have some amazing opportunities to collaborate with folks on other projects, from translation memory based systems like that in use by the World Wide Lexicon to source code string repository interfaces like Transifex. As one person put it, the perfect testbed for crowd-sourced translation is Wikipedia; if we can’t make it work there, where can it work? I also had a chance to talk with Gerard Meijssen and Siebrand Mazeland about new ways to facilitate tighter integration with translatewiki.net and to encourage more projects to make use of the translatewiki facilities. It should be a really productive year.

Folks told me to go visit the Van Gogh Museum, so I was dismayed to find that they don’t allow photography. However, the Wiki Loves Art NL project, organized by the NL Wikimedia chapter, had reached an agreement with the museum to allow two small groups in for photographs, during the week I happened to be there! So, come Tuesday morning, I was one of 20 lucky Wikimedia community members and photojournalists to be given private access to the Van Gogh collection. Some photos from the group are already available on the flickr group from which they will be uploaded to the Commons.

Right after the conference I went to the first two days of the OTT book sprint, which had as its goal the production of a comprehensive manual for beginner volunteer translators of open content with open tools. Once again we were in an awesome venue (see the picture; we were in one of the turrets!) and under the expert guidance of Adam Hyde we got a huge amount of content generated in just a few days.

On the last day I skipped town to go visit a colleague on one of the Wikimedia projects; we’ve worked closely together for over two years and had never met face to face. Perhaps that was the most important part of the whole trip: bringing our virtual community into the real world one person at a time.

July 01, 2009 09:19 PM

Browse Blogs

The Far-reaching Implications of Licence Violation

When we think about the implications of non-compliance with F/OSS licensing the considerations tend to be around legal exposure and concern that the conditions of a reciprocal license, e.g. the GPL, may propagate into proprietary code. When risk is assessed it is usually in terms of the possibility of litigation and the associated costs, and/or the weakening of business models that are based upon exercising exclusive rights in connection with associated proprietary IP.
 


July 01, 2009 08:00 PM

Open Access News

Forthcoming libre OA journal on water

Water is a forthcoming peer-reviewed OA journal on "the ecology and management of water resources" published by Molecular Diversity Preservation International. Authors retain copyright and articles are published under the Creative Commons Attribution license. There are no article-processing charges in 2009; I can't tell if there will be later.

July 01, 2009 08:50 PM

OCLC scraps WorldCat data policy, will write new one

OCLC, Review Board on Principles of Shared Data Creation and Stewardship releases final report, press release, June 26, 2009.

The Review Board on Principles of Shared Data Creation and Stewardship, convened jointly by the OCLC Board of Trustees and Members Council to represent the membership and inform OCLC on matters concerning shared data, has issued its final report recommending that the proposed Policy on Use and Transfer of WorldCat Records be withdrawn and a new policy drafted.

After review of the recommendations, OCLC has formally withdrawn the proposed policy. A new group will soon be assembled to begin work to draft a new policy with more input and participation from the OCLC membership. ...

In May, Jennifer Younger, Review Board Chair, and Edward H. Arnold Director of Hesburgh Libraries, University of Notre Dame, presented a report to OCLC Members Council recommending that the proposed policy be formally withdrawn and a new policy should be drafted. "We affirm that a policy is needed, but not this policy," said Dr. Younger. ...

[S]aid Jay Jordan, OCLC President and CEO: "Soon we will announce a new initiative to develop a record use policy that reflects both the rights of individual libraries and the needs of the cooperative to sustain and grow WorldCat for future generations. ..."

A new group will be named to begin work to draft a new policy. Until a new policy is in place, OCLC has reaffirmed the existence and applicability of the “Guidelines for the Use and Transfer of OCLC-Derived Records,” which have been in place since 1987, as recommended by the Review Board. ...

See also our past posts on WorldCat or OCLC.

July 01, 2009 08:37 PM

Milestone for IR at U. Liège

Myriam Bastin, 12,000 references in ORBi, the institutional repository of the University of Liège, announcement, June 26, 2009.

Just six months after its official launch (November 2008), ORBi, the institutional repository of the University of Liège (ULg), has reached 12,000 deposits and gives access to the full texts of almost 9,000 publications! These impressive figures are the results of a voluntary Open Access policy at the University of Liège, which has defined the "mandate ULg", i.e. the obligation for all researchers to deposit in ORBi the references of all scientific publications since 2002 and the full texts of all scientific articles since the same year. Free access to them is conditioned by respect for copyright. This success also reflects the very positive reaction of researchers to this policy and this new means of visibility. ...
See also our past posts on ORBi and the University of Liège.

July 01, 2009 08:30 PM

WorldWideScience adds new discovery, sharing features

U.S. Department of Energy Office of Scientific and Technical Information, Find and share global research with new tools at WorldWideScience.org, press release, June 26, 2009.

You can now quickly hone your research results list to the documents you need and then share them via social networking sites using the new features at WorldWideScience.org. This free online science gateway to global databases now offers clustering of results by publication and author, as well as by topic and date. This enhancement allows you to quickly narrow a results list from the databases ...

Using a quick share tool, you can add your results to social networking sites to discuss and share with friends and colleagues. In addition, you can easily bookmark your search topic as well as set up weekly alerts.

WorldWideScience.org has been upgraded for increased speed and improved relevance ranking. WorldWideScience.org searches more than 375 million pages of research information in real time via a single query. ...

See also our past posts on WorldWideScience.

July 01, 2009 08:29 PM

Pharmacy Education journal converts to OA

International Pharmaceutical Federation, FIP Re-Launches Pharmacy Education, An International Journal for Pharmaceutical Education, announcement, June 30, 2009.

The International Pharmaceutical Federation (FIP) is pleased to announce the online re-release of Pharmacy Education, an International Journal for Pharmaceutical Education. Previously published in hard copy circulation by Informa Publishing, Pharmacy Education is now an official FIP Electronic Publication, available online free of charge. The online publishing and re-release has been made possible by the support of the World Health Organization (WHO) and in collaboration with the European Association of Faculties of Pharmacy (EAFP).

Pharmacy Education will continue to be an independent, peer-reviewed academic publication ...

A new online format provides a comprehensive and interactive environment which encourages increased feedback and communication on published articles (including all previously published archives since 2000), related international events and relevant global issues in the field of pharmacy and pharmaceutical sciences education.

"The journal has always aimed to disseminate the latest research and information in pharmacy education," said Professor Ian Bates, Editor-in-Chief of the journal. "The new, open access format will allow for a broader reach to all audiences, especially to researchers from low income countries seeking engagement with the wider global community." ...

Note that access to the full text requires free registration.

July 01, 2009 08:09 PM

Obama Ed. department drafting plan to fund OERs

Scott Jaschik, U.S. Push for Free Online Courses, Inside Higher Ed, June 29, 2009. (Thanks to Kevin Donovan.)

Community colleges and high schools would receive federal funds to create free, online courses in a program that is in the final stages of being drafted by the Obama administration.

The program is part of a series of efforts to help community colleges reach more students and to link basic skills education to job training. The proposals are outlined in administration discussion drafts obtained by Inside Higher Ed. A formal announcement could come in the next few weeks. ...

John White, press secretary for the Education Department, said Sunday that the department would discuss the plans "when the time is right." He said that there is a lot of "high level discussion and excitement" around these ideas related to community colleges.

The funds envisioned for open courses -- $50 million a year -- may be small in comparison to the other ideas being discussed. But in proposing that the federal government pay for (and own) courses that would be free for all, as well as setting up a system to assess learning in those courses, and creating a "National Skills College" to coordinate these efforts, the plan could be significant far beyond its dollars.

The draft language suggests that the administration is throwing its weight behind the movement to put more courses online -- and offer them free -- and is also pushing that movement in the direction of community colleges. ...

According to the draft materials from the administration, the program would support the development of 20-25 "high quality" courses a year, with a mix of high school and community college courses. Initial preference would go to "career oriented" courses. The courses would be owned by the government and would be free for anyone to take. ...

While the program is described as one that emphasizes community colleges and high schools, it would be open to public agencies and to private for-profit or nonprofit groups.

Advocates for open courses guess that the proposal reflects the ideas of Martha J. Kanter, the under secretary of education. Kanter was previously chancellor of the Foothill-De Anza Community College District. In that position, she helped to create the Community College Consortium for Open Education Resources, which has pioneered the idea of making textbooks and other course materials for community college students available free and online. ...

July 01, 2009 08:05 PM

NLP News

PASW Text Analytics for Surveys (spss) reviewed

Research Magazine: SPSS designed TAfS around the natural language processing method of text analysis. This is based on recognising words or word stems, and uses their ...

July 01, 2009 04:21 PM

BioMed Central

Robotic lower limb exoskeletons – a new thematic series published in Journal of NeuroEngineering and Rehabilitation

Recent advances in materials and technology mean that the field of robotic exoskeletons is full of new and exciting potential. The purposes of and uses for exoskeletons are continually expanding, as is demonstrated in the series Robotic lower limb exoskeletons, edited by Dr Daniel Ferris and published in Journal of NeuroEngineering and Rehabilitation.
 
Introduced by Dr Ferris’ commentary ‘The exoskeletons are here’, the nine articles in this series cover diverse topics ranging from robotic movement training after neurological injury and gait training after stroke to energy harvesting exoskeletons that function by converting mechanical work at the knee into electrical energy.

Journal of NeuroEngineering and Rehabilitation is overseen by Editor-in-Chief Paolo Bonato and a prestigious Editorial Board. For more information, please see the journal’s ‘About’ page.

July 01, 2009 04:08 PM

ubiquity-firefox Google Group

[ubiquity] weekly meeting in 8 hours!

Our next Ubiquity weekly meeting is 8 hours from now. Join us to discuss
Ubiquity 0.5 official release and more!
WHEN: 5pm pacific - or, in your timezone:
[link]
WHERE/HOW:
• IRC channel: #ubiquity
• Dial in:

July 01, 2009 04:01 PM

ocropus Google Group

beam search failed?!

Hi guys,

After no luck trying to install Ocropus on my CentOS box, I now
followed closely and successfully installed it on my Ubuntu 9.04. I
tested about 10 images and none of them seems to work. I copied some
of output and hope someone can help me out.

I got

test@ubuntu:~/ocrtest$ ocropus page 32047.png

July 01, 2009 03:51 PM

ocr-layout directory

Hi,
In my adventure with OCRopus I stumbled upon the contents of the ocr-
layout directory. I wanted to try out some page segmentation algorithm
implementations present there (besides RAST). I wanted to compile the
ocr-pageseg-wcuts.cc file, but got a lot of errors of missing var
types (probably because of wrong headers included)...

July 01, 2009 03:16 PM

NLP News

Informatics in Radiology: Render: An Online Searchable Radiology Study Repository.

Radiographics. 2009 Jun 29. Authors: Dang PA, Kalra MK, Schultz TJ, Graham SA, Dreyer KJ.

Radiology departments are a rich source of information in the form of digital radiology reports and images obtained in patients with a wide spectrum of clinical conditions. A free text radiology report and image search application known as Render was created to allow users to find pertinent cases for a variety of purposes. Render is a radiology report and image repository that pools researchable information derived from multiple systems in near real time with use of (a) Health Level 7 links for radiology information system data, (b) periodic file transfers from the picture archiving and communication system, and (c) the results of natural language processing (NLP) analysis. Users can perform more structured and detailed searches with this application by combining different imaging and patient characteristics such as examination number; patient age, gender, and medical record number; and imaging modality. Use of NLP analysis allows a more effective search for reports with positive findings, resulting in the retrieval of more cases and terms having greater relevance. From the retrieved results, users can save images, bookmark examinations, and navigate to an external search engine such as Google. Render has applications in the fields of radiology education, research, and clinical decision support. (c) RSNA, 2009. PMID: 19564253 [PubMed - as supplied by publisher]

July 01, 2009 01:36 PM

ubiquity-firefox Google Group

Problem with "weather"

Typing "weather london, uk" gets me a top choice of "tag weather
london,uk", which is clearly a bug. The command "weather near london,
uk" works fine, but I don't want to have to type "near".

Also, I'm pretty sure that in the stable version just typing "weather"
gives you your local weather in the preview pane; but on my current

July 01, 2009 11:09 AM

OpenGeoData

Vote Steve

Directions Magazine has opened up a vote for the “Most Influential” in Geospatial for the next 5 years. Others up include Jack and Ed

July 01, 2009 09:31 AM

information aesthetics

DD4D Conference Best-Of Coverage (Guest Post)

From June 18-20 we attended the DD4D (= Data Designed for Decisions) [dd4d.net] conference at the OECD Conference center in Paris. The conference was organized both by the IIID and the OECD with the goal to bring together statisticians, information designers, visualization researchers, and practitioners (or as the conference stated: "intermediaries between data, knowledge and empowerment"). The conference organizers had invited a number of amazing speakers, including such celebrities as Hans Rosling and Robert Horn.

The overarching question raised by most speakers was how to go from data and information to decisions and actions with a focus on both traditional and emerging deciders such as politicians and executives and also citizens and consumers. To address the challenge of making sense of large data quantities, several speakers discussed the potential of storytelling for communicating complex issues as well as the power of numbers in the form of social indicators and benchmarks. There were too many interesting talks/sessions to write about here, so below you will find our personal (of course highly subjective) best-of list.


July 01, 2009 09:25 AM

Open Medicine Blog blogs

Top 25 Medical Applications - iPhone 3GS 2009

List of 25 iPhone medical applications

  1. 3D4Medical Skeletal System Application for the iPhone3GS
  2. AirStrip OB - remote patient monitoring
  3. Apple iPhone3GS demonstration
  4. DoctorCalc: Medical Apps for the iPhone and iPod touch
  5. Doctors Hangout - Social Networking "microblogging" on iPhone
  6. DynaMed and the iPhone - stay tuned for other EBSCO announcements
  7. Epocrates demonstration
  8. Glasgow Coma Scale
  9. Heart It iPhone demonstration
  10. ICD9 Consult for the iPhone 2000
  11. iChart Sync - CareTools for the iPhone
  12. MacPractice iPhone Interface
  13. Mediquations - Medical Calculator for iPhone and iPod Touch
  14. Medical eponyms database for handheld devices
  15. Merck Medicus - PDA tools
  16. MIMvista - Presents Multi-modality Imaging on the iPhone™
  17. Modality Learning: we make small screens smarter and potential iPhone applications for medical students
  18. Podcasts and Videocasts
  19. PubGet Mobile
  20. QxMD - Free medical software for the iPhone
  21. SonoAccess™ Medical Ultrasound iPhone application
  22. Skyscape on the iPhone
  23. Unbound Medicine - iPhone Medical Applications
  24. Unbound MEDLINE
  25. Wikipanion - wikipedia for mobiles

See also Yale's Cushing/Whitney Library Mobile site and PDAs, Handhelds and Mobile Technologies in Libraries


July 01, 2009 09:16 AM

Google Book Search Blog

Explore a book in 10 seconds



In his 1979 novel Se una notte d'inverno un viaggiatore (If on a winter's night a traveler), Italian writer Italo Calvino imagines a character, Lotaria, who uses an "electronic brain" to read her books. Her computer can read a book "in a few minutes", and show her all the words in it, sorted by frequency. In fact, Calvino was fascinated by the research of Mario Alinei, who in the late 1960s created Spogli Elettronici dell'Italiano Contemporaneo, an academic analysis of Italian literary masterworks (including Calvino's Il sentiero dei nidi di ragno).

Alinei's team looked at words used in the Italian language over time, noting changes in their frequency. You can imagine how this work was done forty years ago: operators punching computing cards, a big mainframe computer being fed words overnight, and an encoded output that had to be typeset again into book form.

Now our computing infrastructure can do Alinei's work in a few seconds. Starting today, you'll find a cloud of "Common Terms and Phrases" on the Book Overview page for some of our books. This cloud represents the distribution of words in a book: big terms are more common in the book, while small terms are rarer.



As with the other features on the Book Overview page, the word cloud is meant to offer a new way to explore our catalog. If you are trying to learn about Italian art, a search in our index will find many good books on the Renaissance period. Use the cloud of common terms to tell what each book is about. For example, The Renaissance is more focused on the "canon" of art (see the emphasis on beauty, Greek models, poetry of art), while Renaissance Art casts light on the role of patrons in the art scene (patrons, commission, family). After this 10-second glance at the contents, you can choose which book to study next. Happy reading!
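
The idea behind such a cloud (count how often each term occurs, then draw frequent terms larger) can be sketched in a few lines of Python. This is only an illustration, not a description of how Google Book Search builds its clouds: the function name term_weights, the font-size range, and the sample sentence are all assumptions made for the example, and a real system would at least filter out stop words and handle multi-word phrases.

```python
import re
from collections import Counter


def term_weights(text, top_n=5, min_size=10, max_size=40):
    """Return (term, count, font_size) for the top_n most frequent words.

    Hypothetical helper: font sizes are scaled linearly between
    min_size and max_size according to each term's count.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return []
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts tie
    return [
        (term, n, min_size + (n - lowest) * (max_size - min_size) // span)
        for term, n in counts
    ]


# Made-up sample text, loosely echoing the terms mentioned in the post.
sample = "patrons commission art patrons family art patrons art art"
cloud = term_weights(sample, top_n=3)
for term, n, size in cloud:
    print(f"{term}: count={n}, size={size}")
```

Terms with the highest count get max_size and the rarest of the selected terms get min_size, which mirrors the "big terms are more common, small terms are rarer" reading of the cloud.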

July 01, 2009 10:04 AM

information aesthetics

Typographic Reinterpretation of Cunningham's Dancing Hands

Ascenders & Descenders is a typographic reinterpretation of Merce Cunningham's dancing hands as recorded by OpenEnded Group for the Loops project.

The piece is a Cunningham dance work reconstructed from textual deconstructions of other Cunningham dance works. Each finger has an associated excerpt from an article, review, or essay on Cunningham from the last 5 decades. These texts become the "ink" with which each finger manifests its movements. Each text is dynamically typeset in 3 dimensional space along the curves traced by his fingertips.

The software keeps track of various movement parameters which it uses to modulate aspects of the visualization such as letter size, camera position, angle, and zoom. Merce not only dances the dance, but becomes typesetter and cinematographer, conducting the audience's view of the dance.

What, from the outside, appear to be subtle manipulations of the hands become a beautiful tangle of diving flocks and waterfalls of letters. Presenting dance in this way, we hope to get closer to the experience of the dance from the inside out.

Watch the video below.

Thnkx John.


July 01, 2009 08:51 AM

Semantic Forms Google Group

Mini query inside a form

I am creating a wiki that has wiki pages of tests. Test pages
automatically have a page name of test ID to ensure that each has a
unique name. They also have a name and description. They are created
with a semantic form.

When an article is tested and users enter the results through a
semantic form, they need to specify the ID of what test was conducted

July 01, 2009 07:06 AM

JMLR

Multi-task Reinforcement Learning in Partially Observable Stochastic Environments; Hui Li, Xuejun Liao, Lawrence Carin; 10(May):1131--1186, 2009.

We consider the problem of multi-task reinforcement learning (MTRL) in multiple partially observable stochastic environments. We introduce the regionalized policy representation (RPR) to characterize the agent's behavior in each environment. The RPR is a parametric model of the conditional distribution over current actions given the history of past actions and observations; the agent's choice of actions is directly based on this conditional distribution, without an intervening model to characterize the environment itself. We propose off-policy batch algorithms to learn the parameters of the RPRs, using episodic data collected when following a behavior policy, and show their linkage to policy iteration. We employ the Dirichlet process as a nonparametric prior over

July 01, 2009 06:55 AM

Universal Kernel-Based Learning with Applications to Regular Languages; Leonid (Aryeh) Kontorovich, Boaz Nadler; 10(May):1095--1129, 2009.

We propose a novel framework for supervised learning of discrete concepts. Since the 1970's, the standard computational primitive has been to find the most consistent hypothesis in a given complexity class. In contrast, in this paper we propose a new basic operation: for each pair of input instances, count how many concepts of bounded complexity contain both of them. Our approach maps instances to a Hilbert space, whose metric is induced by a universal kernel coinciding with our computational primitive, and identifies concepts with half-spaces. We prove that all concepts are linearly separable under this mapping. Hence, given a labeled sample and

July 01, 2009 06:55 AM

An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity; Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér; 10(May):1071--1094, 2009.

We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that

July 01, 2009 06:55 AM

Fourier Theoretic Probabilistic Inference over Permutations; Jonathan Huang, Carlos Guestrin, Leonidas Guibas; 10(May):997--1070, 2009.

Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the "low-frequency" terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for

July 01, 2009 06:55 AM

On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality; Wenxin Jiang; 10(Apr):977--996, 2009.

The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding's inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding's inequality to handle these more general situations. We will apply this inequality to provide probability bounds for uniform deviations in a very general framework, which can involve discrete decision rules, unbounded loss, and a dependence structure that can be more general than either martingale or strong mixing. We will consider two examples with high dimensional predictors: autoregression (AR) with l1-loss, and ARX model with variable selection for sign classification, which uses both lagged responses and exogenous predictors.
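
For context, the classical bound that the abstract refers to can be written out as follows. This is the standard textbook form of Hoeffding's inequality for independent, bounded variables, together with the uniform-deviation bound it gives for a finite class of decision rules via the union bound; it is not the extended inequality introduced in the paper. The symbols ($X_i$, $a_i$, $b_i$, $\mathcal{F}$, $R$, $\hat{R}_n$) are notation assumed here for illustration.

```latex
% Hoeffding's inequality: X_1, ..., X_n independent with a_i <= X_i <= b_i,
% and S_n = X_1 + ... + X_n.
\[
  \Pr\bigl( \lvert S_n - \mathbb{E}[S_n] \rvert \ge t \bigr)
  \;\le\; 2 \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),
  \qquad t > 0 .
\]

% Consequence for a finite class F of decision rules, a loss taking values
% in [0, 1], and i.i.d. data: the empirical risk \hat{R}_n(f) deviates
% uniformly from the true risk R(f) with probability bounded by
\[
  \Pr\Bigl( \sup_{f \in \mathcal{F}} \bigl\lvert \hat{R}_n(f) - R(f) \bigr\rvert \ge \varepsilon \Bigr)
  \;\le\; 2 \, \lvert \mathcal{F} \rvert \, \exp\!\left( - 2 n \varepsilon^2 \right).
\]
```

Both bounds rely on boundedness and independence, which are exactly the assumptions the abstract says it relaxes.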

July 01, 2009 06:55 AM

Nonextensive Information Theoretic Kernels on Measures; André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, Mário A. T. Figueiredo; 10(Apr):935--975, 2009.

Positive definite kernels on probability measures have been recently applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information theoretic quantities, such as (Shannon's) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon's information theory. This paper bridges these two trends by introducing nonextensive information theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon's entropy. The notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce
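
As a reference point for the excerpt above, the classical Jensen-Shannon divergence between two discrete distributions can be computed directly from Shannon's entropy: it is the entropy of the equal-weight mixture minus the average of the two entropies. The sketch below covers only this classical, equal-weight case, not the nonextensive JS-type divergences the paper introduces; the function names and example distributions are assumptions for illustration.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Classical Jensen-Shannon divergence with equal weights, in bits.

    JS(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


# Identical distributions have zero divergence; distributions with
# disjoint support reach the maximum of one bit.
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

The non-negativity of this quantity follows from the concavity of the entropy (Jensen's inequality), which is the convexity building block the abstract mentions.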

July 01, 2009 06:55 AM

Java-ML: A Machine Learning Library; Thomas Abeel, Yves Van de Peer, Yvan Saeys; 10(Apr):931--934, 2009.

Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license.

July 01, 2009 06:55 AM

Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods; Holger Höfling, Robert Tibshirani; 10(Apr):883--906, 2009.

We consider the problems of estimating the parameters as well as the structure of binary-valued Markov networks. For maximizing the penalized log-likelihood, we implement an approximate procedure based on the pseudo-likelihood of Besag (1975) and generalize it to a fast exact algorithm. The exact algorithm starts with the pseudo-likelihood solution and then adjusts the pseudo-likelihood criterion so that each additional iteration moves it closer to the exact solution. Our results show that this procedure is faster than the competing exact method proposed by Lee, Ganapathi, and Koller (2006a). However, we also find that

July 01, 2009 06:55 AM

Stable and Efficient Gaussian Process Calculations; Leslie Foster, Alex Waagen, Nabeela Aijaz, Michael Hurley, Apolonio Luis, Joel Rinsky, Chandrika Satyavolu, Michael J. Way, Paul Gazis, Ashok Srivastava; 10(Apr):857--882, 2009.

The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well known implementation of the technique. We discuss alternate implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data.

July 01, 2009 06:55 AM

Consistency and Localizability; Alon Zakai, Ya'acov Ritov; 10(Apr):827--856, 2009.

We show that all consistent learning methods---that is, that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)---are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that

July 01, 2009 06:55 AM

NLP News

Arabic software firm buys US-based Dial Directions

Combined technology can turn any iPhone, BlackBerry or Windows Mobile device into a voice and text translator

July 01, 2009 06:47 AM

Will Computers Replace Humans?

The Computer, one can safely predict, will be adjudged to have been the ultimate technological symbol of our century. Although not nearly as common as a car or a TV set (the runners-up in the race for the ultimate technological symbol), it has affected our views, our attitudes, and our outlook in more subtle and disquieting ways.

Content Type: Book Chapter; DOI: 10.1007/978-0-8176-4775-9_18; Book Series: Modern Birkhäuser Classics; Book: Discrete Thoughts (DOI 10.1007/978-0-8176-4775-9, Online ISBN 978-0-8176-4775-9, Print ISBN 978-0-8176-4774-2); Book Part: Part 3

July 01, 2009 06:10 AM

Husserl

The best philosophers of our century suffer from a common deficiency of expression. They seem bent upon making an already difficult message all but unintelligible by irritating mannerisms of style. For example, in Wittgenstein we meet a barrage of epigrammatic cryptography suited only for the Oxbridge market; in Heidegger truth is subordinated to alliteration and to a cunning desire to anger the reader by histrionic displays of German archaisms; Ortega would bury his finest insights in prefaces to his friends’ collections of Andalusian poems or in Sunday supplements of Argentine dailies, while feeding the grand public a dubious Kitsch calculated to keep himself financially afloat; Croce would use his pen to fly away from unpleasant Fascist reality into the anecdotes of the Kingdom of Naples of yore; Nicolai Hartmann was subject to attacks of graphomania; and so on, all the way to Sartre. Small wonder that the intellectual public, repelled by such antics, should fall into the arms of a demimonde of facile simplifiers and sweeping generalizers. The Russells, the Spenglers, the Toynbees, and their third-rate cohorts have lowered the understanding of philosophy to a level unseen since the seventh century.

Content Type: Book Chapter; DOI: 10.1007/978-0-8176-4775-9_15; Book Series: Modern Birkhäuser Classics; Book: Discrete Thoughts (DOI 10.1007/978-0-8176-4775-9, Online ISBN 978-0-8176-4775-9, Print ISBN 978-0-8176-4774-2); Book Part: Part 3

July 01, 2009 06:10 AM

ICT Tools and Systems Supporting Innovation in Product/Process Development

Information and communication technology (ICT) plays a key role in modern innovation and new product/process development. The chapter is dedicated to ICT tools supporting product/process innovation achieved throughout the development process. A general overview of such ICT tools is provided, specifically addressing Knowledge-based engineering systems and reasoning methods/tools, as well as tools to support innovation process in the Extended Enterprise context. Standardization aspects are of special relevance for application of ICT systems in industrial innovation processes. Due to their importance for the modern innovation processes, ICT to support collaborative product/process development and innovation and ontology management are addressed in more detail.

Content Type: Book Chapter; DOI: 10.1007/978-1-84882-545-1_4; Book: Innovating in Product/Process Development (DOI 10.1007/978-1-84882-545-1, Online ISBN 978-1-84882-545-1, Print ISBN 978-1-84882-544-4)

July 01, 2009 06:10 AM

Exploring Concepts’ Semantic Relations for Clustering-Based Query Senses Disambiguation

For most Web searching applications, queries are commonly ambiguous because words usually contain several senses. Traditional Word Sense Disambiguation (WSD) methods use statistical models or ontology-based knowledge models to find the most appropriate sense for the ambiguous word. Since queries are usually short and may not provide enough context information for disambiguating queries, more than one appropriate interpretation for ambiguous queries may be found. Thus, it is not always reasonable to look for only one interpretation of the query. In this paper, we propose a cluster-based WSD method, which finds all appropriate interpretations for the query. Because some senses of one ambiguous word usually have very close semantic relations, we may group those similar senses together for explaining the ambiguous word in one interpretation.

Content Type: Book Chapter; DOI: 10.1007/978-3-642-02962-2_85; Authors: Yan Chen (Georgia State University, Atlanta GA 30302, USA), Yan-Qing Zhang (Georgia State University, Atlanta GA 30302, USA); Book Series: Lecture Notes in Computer Science (Online ISSN 1611-3349, Print ISSN 0302-9743); Book Series Volume: 5589/2009; Book: Rough Sets and Knowledge Technology (DOI 10.1007/978-3-642-02962-2, Print ISBN 978-3-642-02961-5)

July 01, 2009 05:45 AM

Automated Grammar Checking of Tenses for ESL Writing

Various word-processing systems have been developed to identify grammatical errors and mark learners’ essays. However, they are not specifically developed for Malaysian ESL (English as a second language) learners. A marking tool which is capable of identifying errors in ESL writing for these learners is very much needed. Though there are numerous techniques adopted in grammar checking and automated essay marking systems, research on the formation and use of heuristics to aid the construction of automated essay marking systems has been scarce. This paper aims to introduce a heuristics based approach that can be utilized for grammar checking of tenses. This approach, which uses natural language processing techniques, can be applied as part of the software requirement for a CBEM (Computer Based Essay Marking) system for ESL learners. The preliminary result based on the training set shows that the heuristics are useful and can improve the effectiveness of automated essay marking tools for detecting grammatical errors of tenses in ESL writing.

Content Type: Book Chapter; DOI: 10.1007/978-3-642-02962-2_60; Authors: Nazlia Omar, Nur Asma Mohd. Razali, Saadiyah Darus (all Universiti Kebangsaan Malaysia, Faculty of Information Science and Technology, 43600 Bangi, Selangor, Malaysia); Book Series: Lecture Notes in Computer Science (Online ISSN 1611-3349, Print ISSN 0302-9743); Book Series Volume: 5589/2009; Book: Rough Sets and Knowledge Technology (DOI 10.1007/978-3-642-02962-2, Print ISBN 978-3-642-02961-5)

July 01, 2009 05:44 AM

Web Self-Service Will Make You Great

Destination CRM: Over the years, chatbot technology has matured and incorporates multilingual natural language processing, enhanced subject-matter expertise and even ...

July 01, 2009 04:13 AM

Say What? ‘Dial Directions’ Acquired By Arabic Language Specialist Sakhr Software

Bet you didn't see this one coming. Back in 2007 we wrote about a service called Dial Directions which lets you call a special phone number and verbally ask for directions, which are immediately sent to you via SMS. Today comes news that the company has been acquired by Sakhr Software, a development house specializing in Arabic natural language processing (NLP). And with their powers combined, ...

July 01, 2009 01:20 AM

Wikimedia Technical Blog

Downtime on en.wikipedia.org resolved

We had 52 minutes of downtime on the English-language Wikipedia site today; only en.wikipedia.org was affected. Our master database server was thrown into a funky state in which hundreds of access threads were stuck in the “statistics” state — which seems to be MySQL’s way of saying “I’ve fallen and I can’t get up”.

It’s unclear exactly what set it off, but basically nothing works until you restart MySQL. After switching the site to an alternate master database, all has been well.
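A pile-up like this can in principle be spotted by counting threads per state in the output of MySQL's `SHOW PROCESSLIST`. The following is only a minimal sketch under stated assumptions, not the monitoring Wikimedia actually runs: the rows are assumed to have already been fetched as dictionaries keyed by the `State` column, and the function name and the threshold of 100 are hypothetical.

```python
from collections import Counter

def stuck_states(processlist, threshold=100):
    """Return {state: count} for every state held by at least `threshold` threads.

    `processlist` is assumed to be a list of dicts, one per row of
    SHOW PROCESSLIST, each carrying a "State" key (may be empty or None).
    """
    counts = Counter(row.get("State") or "" for row in processlist)
    return {state: n for state, n in counts.items() if state and n >= threshold}

# Hypothetical sample: 300 threads stuck in "statistics", 5 idle threads
sample = [{"Id": i, "State": "statistics"} for i in range(300)]
sample += [{"Id": 300 + i, "State": ""} for i in range(5)]
print(stuck_states(sample))
```

An alerting script could page the on-call operator whenever this function returns a non-empty result.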

At 52 minutes from start of event, this took us a bit longer than I’d like to resolve — we had to percolate through a couple levels of alert calls before we finished diagnosing it and getting the DB switch pushed through. (Sorry to wake you up early Tim!)

A similar event in future should be fixable within a few minutes, thanks to Tim’s work on making the master-switch system more foolproof. We’re fixing up our internal documentation so all our site ops will now know how to run the database master switch script next time!

– brion

July 01, 2009 12:10 AM

June 30, 2009

tesseract-ocr Google Group

Tesseract 2.04 available as download

I have just completed the upload of the tesseract-2.04.tar.gz source
archive, and a corresponding tesseract-2.04.exe.tar.gz containing prebuilt
executables for Windows. This source archive corresponds to revision 279 in
svn.
NOTE that the 2008 vcproj and sln files for vc++2008 are obsoleted and not
included in the source archive. The non-numbered files should be used

June 30, 2009 10:50 PM

Icons

Hey guys, could this possibly be used to identify icons on a rather large
resolution CAD drawing (rasterized)?
It's a symbol that looks like a [T] with diagonal lines in the square... By
chance would I be able to add my own font for recognition? Thanks!

June 30, 2009 10:42 PM

Open Video Conference

OVC Interview with Pirate Bay’s Peter Sunde on Boing Boing

Boing Boing’s Xeni Jardin, who interviewed The Pirate Bay’s co-founder Peter Sunde at the Open Video Conference, has posted a video of the conversation online. Peter made a guest appearance at the conference to share his views on The Pirate Bay trial, copyright, and the future of media. During his talk, Peter promised some shocking news was to come soon, and yesterday The Pirate Bay announced that it was sold to Global Gaming Factory X AB.

Be sure to check out our updated Videos page, featuring talks from the main auditorium. We’ll be adding more videos soon, so stay tuned. We also understand that these videos are available as Flash videos currently, and we’re working on getting Ogg support with our media player.

June 30, 2009 09:55 PM

NLP News

It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates

Rajesh Ranganath, Dan Jurafsky, and Dan McFarland. 2009. It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates. In Proceedings of EMNLP 2009.

June 30, 2009 09:38 PM

Distant supervision for relation extraction without labeled data

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of ACL-IJCNLP 2009.

June 30, 2009 09:38 PM

tesseract-ocr Google Group

Need some recommendations

Hi guys,

I am new to Tesseract and I hope someone can help me or advise me in
the right direction.
My current project involves scanning of medical insurance cards. I
tested about 10 different insurance cards my clients uploaded with
Tesseract with the default English language data. However, the

June 30, 2009 09:17 PM

ocropus Google Group

Help needed to install ocropus on ubuntu

apt-get install libavfilter-dev
E: Could not open lock file /var/lib/dpkg/lock - open (13 Permission
denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are
you root?
synamed@ubuntu:~$ sudo apt-get install libavfilter-dev
Reading package lists... Done
Building dependency tree

June 30, 2009 08:56 PM

if:book

run, don't walk

jonathan harris, one of the most brilliant designer/thinkers around has just launched an awesome new project -- Sputnik Observatory.

June 30, 2009 07:18 PM

MusicBrainz Blog

Looking for a new maintainer for pymb2 and libdiscid

Matthias Friedrich (yalaforge) finds himself with little spare time on his hands these days and has asked me to find a new maintainer for the Python MusicBrainz library (pymusicbrainz2) and for the C based libdiscid library. While libdiscid doesn’t require immediate work, the pymb2 library needs to have support for Release Groups (from the 2009-05-24 server update) added in.

Matthias suggested that anyone interested in becoming the maintainer should write a patch to add the needed release groups support. I think this is a good idea — anyone interested?

If so, please post a comment!

And thanks for your hard work on these projects Matthias!

UPDATE: Our Help Wanted page has been updated to reflect our current needs.

June 30, 2009 07:15 PM

IOSA.it

Archaeology and Computing meetings: the "epic fail" year

Every year we try to go to at least one large archaeoinformatics meeting in Europe (other than our Italian workshop). It's a neat way to meet new and old friends, keep ourselves (and our readers) updated about the latest achievements in the field, and let the world know what we have been doing lately, possibly gaining an increasingly wider audience for the whole “Open Archaeology” concept.
This year marks an epic fail in the organization of such conferences/meetings/workshops. Between November 16th and 18th you can choose where you want to be: Heidelberg or Wien, “Scientific Computing and Cultural Heritage” or “Archäologie und Computer”. Either venue will do because both events have the very same dates, you see.

June 30, 2009 06:59 PM

Zotero: The Next-Generation Research Tool

Follow Libraries and Collections with Feeds

Anyone with a feed reader can now follow public Zotero libraries simply by clicking the feed icon at the right-hand side of the browser address bar. Feeds are generated at the library and collection level, and for group libraries as well as individual libraries. This feature provides a great way for people both inside and outside [...]

June 30, 2009 06:57 PM

Browse Blogs

LinuxCon Program and Event Details Take Shape

This year is moving by quickly and it seems that LinuxCon is now just a few months away. It has been very exciting watching the event take shape, and I know it will be a success this year and for years to come. Some of the highlights that I see for this year include:

The speakers: We’ve got the big names, some you see at many other events, and some that are rarely seen. Check out who is speaking.

June 30, 2009 06:54 PM

Linked Data Blog Aggregator

structWSF: A Framework for Data Mixing

Random Colour Swirl photo courtesy of PD Nathan at Photobucket

Interoperable Naïve Data Structs, Datasets and Canonical RDF

As I noted in my review of SemTech 2009, one of the key themes of the conference was data federation. Unfortunately, data federation has been a term a bit out of vogue for a while. (Though I still think it best captures the space.)

The current vernacular has been pushing forward an alternative: data mixing. One of the larger product pushes at the conference was by Zepheira for its new Freemix service and product. Freemix is a hosted service largely built around the Exhibit data display application, aided by some tools to make creating an exhibit easier. Exhibit is an attractive presentation system; for nearly three years AI3’s own Sweet Tools dataset listing of semantic Web and -related tools has been presented via Exhibit.

Freemix looks promising and is now being offered in beta. But one thing caught my ear when listening to the company’s announcement: they are not yet able and ready to show the “data mixing” part of the system. Its release is apparently being delayed until later this year because of the difficulties encountered.

This post coincides with the release of the alpha version of the structWSF code on the OpenStructs Web site. It is available for download under Apache 2 license. We’ll be blogging a few more times in the coming days regarding other possible uses and applications for this platform-independent Web services framework.

What is Data Mixing and Why is it So Hard?

As a new term there is no “official” definition of data mixing. However, I think we can consider it as generally equivalent to the older data federation concept.

Data federation is the bringing together of data from heterogeneous and often physically distributed data sources into a single, coherent view. Sometimes this is the result of searching across multiple sources, in which case it is called federated search. But it is not limited to search. Data federation is a key concept in business intelligence and data warehousing and a driver behind master data management (MDM).

As I first wrote about data federation about five years ago [1]:

Data federation first became a research emphasis within the biology and computer science communities in the 1980s. At that time, extreme diversity in physical hardware, operating systems, databases, software and immature networking protocols hampered the sharing of data. Yet it is easy to overlook the massive strides in overcoming these obstacles in the past two decades.

[Figure: Climbing the Data Federation Pyramid]

The Internet and its TCP/IP and Web HTTP protocols and XML standards in particular, have been major contributors to overcoming respective physical and syntactical and data exchange heterogeneities. The current challenge is to resolve differences in meaning, or semantics, between disparate data sources. Your “glad” may be someone else’s “happy” and you may organize the world into countries while others organize by regions or cultures.

Resolving semantic heterogeneities is also called semantic mediation or data mediation. Though it displays as a small portion of the pyramid above, resolving semantics is a complicated task and may involve structural conflicts (such as naming, generalization, aggregation), domain conflicts (such as schemas or units), or data conflicts (such as synonyms or missing values). Researchers have identified nearly 40 distinct types of possible semantic heterogeneities [2].

Ontologies provide a means to define and describe these different worldviews. Referentially integral languages such as RDF (Resource Description Framework) and its schema implementation (RDF-S) or the Web ontological description language OWL are leading standards among other emerging ones for machine-readable means to communicate the semantics of data.
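The “glad” versus “happy” mismatch and the unit conflicts mentioned above can be illustrated with a toy mediation step. This is only a sketch under stated assumptions: the synonym table, the unit table, the field names (`mood`, `distance`, `unit`) and the `mediate` function are all hypothetical, and real semantic mediation would draw on ontologies rather than hand-written dictionaries.

```python
# Hypothetical mediation tables; a real system would derive these from ontologies.
SYNONYMS = {"glad": "happy", "joyful": "happy"}       # data conflict: synonyms
TO_METERS = {"m": 1.0, "km": 1000.0, "ft": 0.3048}    # domain conflict: units

def mediate(record):
    """Return a copy of `record` with synonyms and units normalized."""
    out = dict(record)
    if "mood" in out:
        # Map each source's own term onto one shared term.
        out["mood"] = SYNONYMS.get(out["mood"], out["mood"])
    if "distance" in out and "unit" in out:
        # Convert every distance to meters so sources become comparable.
        out["distance"] = out["distance"] * TO_METERS[out["unit"]]
        out["unit"] = "m"
    return out

print(mediate({"mood": "glad", "distance": 2, "unit": "km"}))
```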

Fortunately, we have climbed most of this data federation pyramid. The stumbling block now is the semantics. This is made all the harder when we place too much burden on the data transmission or “packet” itself. In other words, does exchange also carry with it the burden of meaning? The rest of this post tries to explain what I mean by this and how it relates to our new structWSF Web services framework.

Is it Apples or Oranges?

Not to pick on any one thing or any individuals, but three recent threads on semantic Web-related mailing lists help illustrate in various ways some interesting mindsets. While there is much on each of these threads of other value, I’m only focusing on a narrow topic from each based on my thesis at hand.

And, what is that thesis? It is simply that we too often mix instance record and attribute assertions with schema representations and world views. And, when we do, we sometimes make mountains out of molehills (or mix apples and oranges to completely mix metaphors).

Example 1: Squeezing RDF into JSON

JSON (JavaScript Object Notation) is a data notation or syntax, easily created and widely used for current Web apps. It has a rather simple syntax for representing attribute-value pairs. Many useful tools and parsers for the serialization exist.

In keeping with his general and broad criticisms of how the semantic Web standards and approaches have been promulgated by the W3C to date, John Sowa most recently expressed his ideas in a posting to the ontolog-forum mailing list under the heading of ‘Semantic Systems’ [3]. In this thread, John proposes:

1. The recommended exchange form for RDF will become JSON. Any JSON documents that are limited to triples can use the old XML-based RDF form, but they can also use the more compact and more general full JSON.

Then, in a subsequent posting to that thread he notes:

5. The W3C made a major blunder with a one-size-fits-all approach that tried to use a document tagging language as a knowledge representation language. The result was the *worst* notation for logic ever invented.

Finally, he goes on to note in a further post:

JSON could be used as an alternative to XML for the syntax, but the lack of a standard semantics for JSON means that it could *not* be used as a replacement for RDF *unless* an official standard were adopted for mapping RDF to and from a particular subset of JSON whose semantics was defined in Common Logic.

All of this John proposes in the spirit of:

The goal of my proposal is nothing less than a total *integration* of the Semantic Web methodologies with the methodologies that have been used in the traditional software development community [3].

I find common ground with a couple of the ideas in this proposal. First, accepted formats like JSON should have a prominent place in data exchange. Second, leveraging methodologies used in the traditional community is definitely a good thing.

But John, while suggesting reuse of existing traditions, is also paradoxically recommending a wholesale replacement for RDF. He is also positing a single exchange standard (JSON). And, he stops tantalizingly short of recognizing an important truth that I’m sure he knows: simple instance record assertions and representations — the essence of data exchange — can and should be viewed separately from schema representations.

As I have noted in my earlier naïve data ‘structs’ series, there are in fact scores of existing data transfer formats that have been adopted by their communities (and are likely to remain popular within those communities for some time) that can play a similar role to JSON. So long as the role of data exchange is kept to the assertions (“metadata”) about instances, many formats can play in the sandbox.

The role of RDF may or may not reside with data exchange. To conflate and equate RDF and JSON is to reduce the power of keeping instance record representations separate from schema and world view representations. John’s basic sensibilities, I think, could be more effectively promoted by not posing ‘either-or’ strawmen and recognizing that data exchange formats will ALWAYS be diverse and heterogeneous.

Observation: Existing and emerging data ‘structs’ useful to data exchange will remain numerous and diverse in format; data exchange imperatives are a different matter from schema and knowledge representation.
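To make the point concrete, a flat JSON record of attribute-value pairs can be read as a handful of instance assertions with no schema attached. The sketch below is illustrative only: the `json_to_triples` name, the `id` key convention and the `http://example.org/` base URI are assumptions, not part of any standard mapping between JSON and RDF.

```python
import json

def json_to_triples(doc, base="http://example.org/"):
    """Turn one flat JSON record into (subject, predicate, object) tuples.

    Assumes the record has an "id" key; every other key becomes a predicate.
    List values yield one triple per item.
    """
    record = json.loads(doc)
    subject = base + str(record["id"])
    triples = []
    for key, value in record.items():
        if key == "id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, base + "attr/" + key, v))
    return triples

for triple in json_to_triples('{"id": "item1", "name": "Widget", "tags": ["a", "b"]}'):
    print(triple)
```

Nothing here says what `name` or `tags` mean; that would be the job of a separately maintained schema.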

Example 2: RDFa is Not ‘Expressive’ Enough

Somewhat in contrast to this thread was a different one by Martin Hepp, editor of the excellent Good Relations ontology, on the LOD (linked open data) mailing list [4]. This thread, which sensibly questions how difficult it is for mere mortals to configure an Apache server to support publishing RDF, reached further into the realm of RDFa as a document annotation language.

As Hepp states,

The reason is that, as beautiful the idea is of using RDFa to make a) the human-readable presentation and b) the machine-readable meta-data link to the same literals, the problematic is it in reality once the structure of a) and b) are very different. For very simple property-value pairs, embedding RDFa markup is no problem. But if you have a bit more complexity at the conceptual level and in particular if there are significant differences to the structure of the presentation (e.g. in terms of granularity, ordering of elements, etc.), it gets very, very messy and hard to maintain.

Further discussion in this thread elaborates the interest in having the documents in which the RDFa is embedded carry much more schema-level information.

Like the Sowa case, this raises the question of where to draw the line. Should embedded metadata in documents carry complex schema information as well? So, we now shift the focus from data exchange to schema representation.

I think this is really unnecessary since it is quite easy in RDFa to refer to a separately specified schema. By, in this case, conflating metadata transfer and exchange with schema, the bar has been raised unnecessarily high.

If we need to capture schema and world views, fine, let us do so directly and succinctly. Then, let our document metadata (in this case using RDFa) make attribute assertions about that “payload” simply and cleanly. The Web certainly does not need individual documents carrying with them entire schema representational views of the world.

Observation: Data exchange, even based on RDF (via RDFa), is best kept to the assertions of facts and attributes.

Example 3: Mixing Vocabularies

In a microformats context, Thomas Lörtsch posed some questions on mixing vocabularies [5] and how they should be interpreted. This caused an involved discussion of intent and possible implications and best practices, with discussants including Brian Suda, Peter Mika, Ben Ward and others. It also led to the start of a useful wiki page on how objects should be represented in Web pages when multiple microformats can be invoked.

For quite some time microformats, I think, have gotten the “mix” just about right. They have created well-reasoned attributes for distinct instance types and seek to keep their embedding of that information simple in existing documents. Some advocate while others question the rigor of the microformat structure; that is not the topic here.

What is interesting about this thread is that it evolved to discuss the implications and best practices when an author posts a document with more than one microformat. How do these vocabularies relate? How should we, as “consumers” of the document, parse the vocabularies?

Yahoo!’s SearchMonkey service has recognized microformats for some time, and its questions regarding interpretation and best practices in the thread were natural. But the interesting point that seemed to come out of this thread is that users will post microformats as they wish. While care and standards in the design of the microformats can help reduce confusion and conflict, they cannot eliminate it. The final responsibility for proper ingest and processing likely resides with the aggregators and publishers that consume such data.

So, here, too, we have another case of asserting metadata and embedding for data exchange in a slightly different native format than RDF. Huzzah!

Observation: Standards setters and consuming agents (often aggregators, publishers or search engines) should take lead responsibility for best practices and processing attribute data, realizing that original authors and developers may not fully comply.
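A consuming agent's first step with a page that mixes vocabularies might look like the following sketch, which uses Python's standard `html.parser` to list which microformat root classes appear. The `KNOWN_ROOTS` table covers only three formats and the class and function names are hypothetical; a real consumer would also need to handle nesting and property extraction.

```python
from html.parser import HTMLParser

# Root class names for a few microformats (deliberately partial list).
KNOWN_ROOTS = {"vcard": "hCard", "vevent": "hCalendar", "hreview": "hReview"}

class MicroformatFinder(HTMLParser):
    """Collect the microformats whose root class names occur in a document."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # An element may carry several class names, so check each one.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in KNOWN_ROOTS:
                self.found.append(KNOWN_ROOTS[name])

def find_microformats(html):
    finder = MicroformatFinder()
    finder.feed(html)
    return finder.found

page = '<div class="vcard"><span class="fn">Ann</span></div><div class="vevent summary">Talk</div>'
print(find_microformats(page))
```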

Revisiting the ABox and TBox Split

These examples are a bit of a long way around the barn to reinforce what we have been arguing for some time: the need for a proper split between the ABox (assertions related to instances) and the TBox (concept relationships, schema and world views) [6]. This has been a pretty constant theme in our writing, ranging from first introductions, to its relation to description logics, relationships to existing data ‘structs’, and explicit discussion of ABox and TBox roles in a four-part series.

One of the key points throughout this writing is that an ABox-TBox mindset provides a context and rigor for looking at questions such as our three examples above. In all three cases, I argue, the seeming conundrums result from lacking this mindset. Once this mindset is applied, the respective roles of various data formats, RDF, schema and the like naturally fall into place.
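As a rough illustration of the split, the sketch below keeps concept relationships (TBox) and instance assertions (ABox) in separate structures and only combines them at query time. The class names, the sample instance and both helper functions are hypothetical and are not drawn from structWSF.

```python
# TBox: concept relationships (schema). Maps each class to its parent class.
TBOX_SUBCLASS_OF = {"Novel": "Book", "Book": "CreativeWork"}

# ABox: assertions about instances, kept apart from the schema.
ABOX = [
    ("moby_dick", "type", "Novel"),
    ("moby_dick", "author", "Herman Melville"),
]

def superclasses(cls):
    """Return `cls` followed by all of its ancestors according to the TBox."""
    chain = [cls]
    while chain[-1] in TBOX_SUBCLASS_OF:
        chain.append(TBOX_SUBCLASS_OF[chain[-1]])
    return chain

def is_instance_of(instance, cls):
    """True if an ABox type assertion, combined with the TBox, implies membership."""
    return any(
        s == instance and p == "type" and cls in superclasses(o)
        for s, p, o in ABOX
    )

print(is_instance_of("moby_dick", "CreativeWork"))
```

The ABox could arrive in any exchange format; only the TBox needs the richer schema language.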

Of course, the Web is also a dirty and chaotic place where niceties of design and best practices are routinely ignored or unknown or purposefully rejected. So be it. This is reality. This reality needs to be accommodated. But good design can help overcome it and work to establish resilient, flexible architectures.

Of course, even though this might be good design, there is no ability to enforce such distinctions across the Web. However, insofar as key implementers are concerned (standards writers, major publishers, tools developers, industry experts, and the like) we can put in place better approaches. This mantra is at the heart of all that Structured Dynamics does, including the structWSF Web services framework, just released as open source code.

A General Data Mixing Model

So, now we can finally turn our attention to the structWSF Web services framework, more broadly described here.

There are a number of perspectives and contexts to view this structWSF framework. In this posting, we take the boundary conditions of data formats and data exchange [7]. The key question for this perspective is: given the realities noted above, what is an adaptive framework for data mixing on the Web? Our schematic answer to this question is below:

structWSF Data Model Relationships

The basic design has two key data considerations. First, all structWSF tools and Web services and schema work from the canonical RDF data model. It is the hub and common denominator for all structWSF installations. We are able to design and optimize generic tools and services (including converters) around this canonical framework.

Second, we assume most everything in the outside world to be non-compliant with this canonical model, with the data representations often naïve and incomplete. Converters (also known as translators or RDFizers) are an essential bridge to this external world, and need to be designed for re-use and extensibility.

Where parts of the outside world are compliant, they conform to the structWSF APIs or are themselves structWSF installations. In these cases, direct data exchange and access with permission rights occur at a dataset level (not shown).
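The hub-and-converter idea can be sketched as a small registry that maps each external format to a function producing triples in one common shape. This is a hypothetical illustration, not the structWSF API: the `converter` decorator, the `to_canonical` function, the `id` column convention and the use of plain tuples in place of real RDF are all assumptions.

```python
import csv
import io
import json

CONVERTERS = {}  # format name -> function returning a list of triples

def converter(fmt):
    """Decorator that registers a converter for one external format."""
    def register(func):
        CONVERTERS[fmt] = func
        return func
    return register

@converter("json")
def from_json(text):
    rec = json.loads(text)
    return [(rec["id"], k, v) for k, v in rec.items() if k != "id"]

@converter("csv")
def from_csv(text):
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples += [(row["id"], k, v) for k, v in row.items() if k != "id"]
    return triples

def to_canonical(fmt, text):
    """Dispatch to the registered converter; downstream tools see only triples."""
    return CONVERTERS[fmt](text)

print(to_canonical("json", '{"id": "a", "name": "X"}'))
print(to_canonical("csv", "id,name\nb,Y\n"))
```

Adding support for a new format means registering one more function; nothing downstream of `to_canonical` changes.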

The Naïve Part of the Spectrum

Converters are themselves bona fide Web services at the structWSF level. (Only a few are presently included in the alpha release.) While some may be one-off converters (sometimes off-the-shelf RDFizers), and often devoted to large volume external data sources, it is also helpful to emphasize one or more “standard” naïve external formats. A “standard” external format allows for a more sophisticated converter and enables specific tools to be more easily justified around the standard naïve format.

As noted above, this “standard” is often JSON or a derivative of JSON. But, just as readily, the common ‘naïve’ format could be SQL from relational databases or another format common to the community at hand. In many ways, because the emphasis of data exchange is on the ABox and instance records and assertions (and attribute extensions), the actual format and serialization is pretty much immaterial.

Emphasizing one or a few naïve external formats allows more tools and services to be cost-effectively developed for those formats. And, even though the format(s) chosen for this external standard may lack the expressiveness of RDF (and, ultimately, OWL), because the burden is principally related to data exchange, this layer can be readily optimized for the deployment at hand.

Besides import converters it is also important to have export services for the more broadly used naïve external formats. In fact, some structWSF services can be devoted to data cleanup or attribute (property) or object reconciliation (including disambiguation as a possibility). In this manner, structWSF installations could also improve the authority and trustworthiness of standard data in the wild.
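An export service runs the conversion in the other direction. The sketch below groups triples by subject and writes them back out as naïve JSON records; the `triples_to_json` name and the `id` key are hypothetical, and repeated predicates are simply collected into lists.

```python
import json

def triples_to_json(triples):
    """Group (subject, predicate, object) tuples into one JSON record per subject."""
    records = {}
    for s, p, o in triples:
        rec = records.setdefault(s, {"id": s})
        if p in rec:
            # A repeated predicate becomes a list of values.
            if not isinstance(rec[p], list):
                rec[p] = [rec[p]]
            rec[p].append(o)
        else:
            rec[p] = o
    return json.dumps(list(records.values()), sort_keys=True)

print(triples_to_json([("a", "name", "X"), ("a", "tag", "t1"), ("a", "tag", "t2")]))
```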

Another common service for this naïve data is to give it unique URI identifiers and to make it Web-accessible, thus turning it into linked data.
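Minting identifiers can be as simple as combining a base address, a dataset name and the record's local key. The following is a sketch only: `mint_uri` and the `http://example.org/data/` base are hypothetical, and making the resulting URIs actually dereferenceable is a separate server-side concern.

```python
from urllib.parse import quote

def mint_uri(base, dataset, local_id):
    """Build a URI for one record from a base, a dataset name and a local key."""
    # Percent-encode both path segments so spaces and slashes cannot break the URI.
    return f"{base.rstrip('/')}/{quote(dataset, safe='')}/{quote(str(local_id), safe='')}"

print(mint_uri("http://example.org/data/", "sweet tools", 42))
```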

The RDF Canonical Data Model

Such generic services are possible because the “highest common denominator” for the system is the canonical RDF model. Because it is the consistent basis for tools and services, once a converter is available and the external information schema is mapped to the internal structure, all existing tools and services are available for re-use. Moreover, this system and its datasets are now ready for sharing with other structWSF instances, within the enterprise or beyond.

Thus, we begin to see a network of canonical “hubs” in a sea of heterogeneity, the interoperation of which is facilitated by a structWSF framework at every network node. This design is discussed more in the next part of this series.

Some, such as Sowa noted above, would prefer a grounding in common logic (CL) as opposed to RDF. Our choice to use RDF is based on the simplicity and understandability of the data model, plus the richness of languages and standards from the W3C that surround the framework.

Even here, however, the RDF basis of structWSF need not be the final word. Because of a keen intent to keep all designs and ontologies used by structWSF firmly grounded in description logics, it is possible for the structWSF basis to be converted to other languages and frameworks such as CL that can be expressed in DL.

Bringing it Back to Data Federation

Data mixing — or more preferably, data federation — has as its heart the premise of heterogeneous and distributed data sources. It implicitly acknowledges differences in syntax, semantics and serializations.

The design and architecture of structWSF is similarly premised. While each of us may prefer one model or one format over others, we must interoperate in the real world. And that world, for many understandable and immutable reasons, will retain its diversity. Accepting this reality is a first step to adaptive design.

So, we control what we can control, and we adapt to what else exists. We have chosen RDF as the canonical data model that we can control and have embedded it in a Web services framework that is Web-based and scalable; in other words, a fully compliant Web-oriented architecture. These are the conceptual foundations to structWSF.

To be sure, structWSF in its current alpha release is quite raw in many areas and incomplete in others. But we will continue to work on it — and invite your participation to do the same — such that it can fulfill its destiny as a data federation framework for the Web.


[1] I first wrote about this while at BrightPlanet; a page is still up on that Web site with the text above. I have recast this material in various ways since.

[2] I have previously written on the “40 sources” of data heterogeneity. See here, for example.

[3] See http://ontolog.cim3.net/forum/ontolog-forum/2009-06/msg00210.html and continue to follow the noted thread.

[4] See the thread, ‘.htaccess a major bottleneck to Semantic Web adoption,’ at http://lists.w3.org/Archives/Public/public-lod/2009Jun/0341.html and continue to follow this thread.

[5] See http://microformats.org/discuss/mail/microformats-discuss/2009-June/012985.html and continue to follow the ‘mixing vocabularies’ thread.

[6] This is our working definition of the ABox and TBox in specific reference to description logics:

“Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”

[7] For functionality, download, documentation or other direct materials on structWSF, please see OpenStructs.org and its related resources. There is also a Drupal instantiation of the system called conStruct, also available for download.

June 30, 2009 06:22 PM

Wikimedia Technical Blog

Wikimedia Mobile is Officially Launched

iPhone Version in English

After spending about 6 months in alpha-beta-development-maybe-kind-live mode, we have recently moved Wikipedia Mobile over to a new fast and sexy server. With this new server, we’ve reached the point in development where we can call this baby “launched”!

When I was brought on board at Wikimedia, I was tasked with endowing Wikimedia with a compelling mobile offering. From the beginning, we knew we were going to focus on “fully featured” smart phones. These phones are taking more and more of the market and we believe they will have an easy majority-share in a couple years. The goal is to build for the future.

At the moment, the Mobile site supports iPhone, Kindle, Android, and Palm Pre. And we fully support both English and German. There are other working languages, but they haven’t been fully translated yet. Our goal is to grow slowly and do it really well. We are starting out simple with limited support in order to test the usability and the platform’s stability. So far, things are looking good.

During the beta test period, we’ve served around 10,000,000 pages. You can view the hourly stats here (updated every hour on the hour). And with this new test server, we should be able to do more.

Based on requests from Google and the Palm Pre folks, and on what just makes sense, we are doing default mobile redirects. That is, if you open a Wikipedia link on a supported mobile device, you get redirected automatically to the mobile gateway. If you click “View this page on main Wikipedia”, we disable that redirect with a cookie. This way, the 99% of people using mobile devices to read Wikipedia on-the-go have a seamless experience. And the 1% who like to edit on their mobile device can use their browser to view the main site and do all the fancy things that they like doing. We expect an initial outcry from the editors who use their mobile devices, but hope that will calm down. We’ve had very good feedback from the 99% and so we can’t forget those folks. If anyone has any suggestions on how to make this easier for the 1% who are editing while mobile, we’d love to hear from you.
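The redirect rule described above boils down to two checks. The sketch below is illustrative only and is not the gateway's actual code (which is written in Ruby): the cookie name `stopMobileRedirect`, the user-agent substrings and the `should_redirect` function are all hypothetical.

```python
# Hypothetical user-agent substrings for the supported devices.
SUPPORTED_TOKENS = ("iPhone", "Kindle", "Android", "webOS")

def should_redirect(user_agent, cookies):
    """Decide whether a request for the main site should go to the mobile gateway."""
    # Readers who chose the main site carry an opt-out cookie (name is an assumption).
    if cookies.get("stopMobileRedirect") == "true":
        return False
    # Otherwise redirect only devices that the mobile site supports.
    return any(token in user_agent for token in SUPPORTED_TOKENS)

print(should_redirect("Mozilla/5.0 (iPhone; U; CPU like Mac OS X)", {}))
```

Keeping the opt-out in a cookie means the choice persists across page views without requiring an account.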

If you want live updates about the Mobile site then you can follow WikimediaMobile on Twitter. Also, if you know any Ruby, you can grab the source code via git from GitHub and help out! Feel free to contact me via email with any questions.

Also, special thanks to Nic Williams and Ryan Bigg from Mocra for help with the Ruby 1.9 transition and thanks to Yehuda Katz for help with the XML parsing layer and for all his work on the Merb framework.

June 30, 2009 05:57 PM

IOSA.it

SCCH09 -- Scientific Computing & Cultural Heritage

Dates: 2009-11-16 to 2009-11-18 (Europe/Rome)
Location: Heidelberg, Germany
Reference URL: SCCH09 website

Following the successful 1st SCCH conference in 2007, the Interdisciplinary center for Scientific Computing of the University Heidelberg invites authors and guests to the 2nd SCCH workshop in Heidelberg, Germany, November 16th-18th, 2009. The workshop is endorsed by the German Excellency Initiative and the Heidelberg Graduate School (HGS) and held in conjunction with the Heidelberg Modeling-Day on November 19th.

The main aim of this SCCH Event is to create a forum for discussions between the researchers of humanities and natural sciences as well as cultural heritage institutions. Our mission is to establish and strengthen interdisciplinary relations to provide and develop novel computing tools for experts in cultural heritage. The focus will be on applications for Cultural Heritage as well as the theoretical advances driving them.

Extended Abstract submission deadline: July 8th, 2009.

June 30, 2009 05:52 PM

International Congress "Cultural Heritage and New Technologies" (Workshop "Archäologie & Computer")

Dates: 2009-11-16 to 2009-11-18 (Europe/Rome)
Location: Wien
Reference URL: Stadtarchäologie Wien

MAIN TOPIC 2009: ARCHIVING - or building an information system

Archiving is today central to nearly all aspects of Cultural Heritage Management.
Archives (Archaeological excavations, libraries, documents, data collections,...) are important data repositories.
The data contained in correctly treated and accessible archives makes wide and varied information available.
How can archiving in all its aspects best promote knowledge about and support the protection and conservation of cultural heritage?

June 30, 2009 05:50 PM

Wikimedia Technical Blog

Firefox 3.5 brings native open video support

Congratulations are in order for our friends and comrades-in-arms at Mozilla: they’ve released version 3.5 of their open-source Firefox browser today.

Aside from major improvements to speed and memory usage, one of the updates that has got us most excited at Wikimedia is the support for HTML 5’s native <video> and <audio> elements.

What does this mean? Well in short, it means that Firefox 3.5 is the best browser to run video and audio clips from Wikimedia Commons on!

File:Apollo_15_feather_and_hammer_drop.ogg

A few months more down the line, we’ll start being able to integrate support for our inline video sequencer, which’ll make it easy to extract snippets of a longer video and combine them — entirely using open-source, non-patent-encumbered web standards. This makes heavy use of the new HTML 5 multimedia support; while at first editing will be limited to Firefox 3.5 users, other browsers are continuing to improve and adopt the same support.

June 30, 2009 05:43 PM

On templates and programming languages

As many folks have noted, our current templating system works ok for simple things, but doesn’t scale well — even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise…

<includeonly><span style="white-space: nowrap;">{{#if:{{{3|}}}|
{{coord|{{{1|0}}}|{{{2|0}}}|{{{3|0}}}|{{{4|N}}}|{{{5|0}}}|{{{6|0}}}|{{{7|0}}}|{{{8|E}}}|{{{9|type:other}}}|format={{{format|dms}}}|display={{#if:{{{title|}}}|inline,title|inline}} }}| {{#if:{{{2|}}}|
{{coord|{{{1|0}}}|{{{2|0}}}|{{{4|N}}}|{{{5|0}}}|{{{6|0}}}|{{{8|E}}}|{{{9|type:other}}}|format={{{format|dms}}}|display={{#if:{{{title|}}}|inline,title|inline}}}}| {{#if:{{{4|}}}|
{{coord|{{{1|0}}}|{{{4|N}}}|{{{5|0}}}|{{{8|E}}}|{{{9|type:other}}}|format={{{format|dec}}}|display={{#if:{{{title|}}}|inline,title|inline}}}}| {{#if:{{{1|}}}|
{{coord|{{{1|0}}}|{{{5|0}}}|{{{9|type:other}}}|format={{{format|dec}}}|display={{#if:{{{title|}}}|inline,title|inline}}}}}}}}}}}}</span></includeonly><noinclude>
{{pp-template|small=yes}}
{{documentation}}
</noinclude>

And we all thought Perl was bad!  ;)
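For comparison, here is a rough sketch of what the same dispatch could look like in a conventional language. It is written in Python purely for illustration and is not one of the proposals in this post; the function name coord_call and the dict-based parameter passing are assumptions. It mirrors the template above: pick a branch based on which positional parameter is non-empty, forward a subset of parameters with their defaults, and build the inner coord call.

```python
# Defaults taken from the {{{n|default}}} fallbacks in the template above.
DEFAULTS = {1: "0", 2: "0", 3: "0", 4: "N", 5: "0",
            6: "0", 7: "0", 8: "E", 9: "type:other"}

# (parameter tested by #if, positional parameters forwarded, default format)
BRANCHES = [
    (3, (1, 2, 3, 4, 5, 6, 7, 8, 9), "dms"),  # degrees, minutes, seconds
    (2, (1, 2, 4, 5, 6, 8, 9), "dms"),        # degrees, minutes
    (4, (1, 4, 5, 8, 9), "dec"),              # decimal with N/S and E/W
    (1, (1, 5, 9), "dec"),                    # signed decimal
]


def coord_call(params):
    """Return the inner {{coord|...}} wikitext, or "" if no branch applies."""
    def given(key):
        # #if treats a missing or whitespace-only value as false.
        return str(params.get(key, "")).strip() != ""

    for tested, positions, default_format in BRANCHES:
        if given(tested):
            # A default applies only when the parameter is not passed at all.
            args = [str(params[p]) if p in params else DEFAULTS[p]
                    for p in positions]
            fmt = params.get("format", default_format)
            display = "inline,title" if given("title") else "inline"
            return "{{coord|%s|format=%s|display=%s}}" % (
                "|".join(args), fmt, display)
    return ""
```

For example, coord_call({1: "57.3", 5: "-4.2"}) takes the last branch and returns {{coord|57.3|-4.2|type:other|format=dec|display=inline}}.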

Lua

There’s been talk of Lua as an embedded templating language for a while, and there’s even an extension implementation.

One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.

An inherent disadvantage is that it’s a fairly rarely-used language, so still requires special learning on potential template programmers’ part.

An implementation disadvantage is that it currently is dependent on an external Lua binary installation — something that probably won’t be present on third-party installs, meaning Lua templates couldn’t be easily copied to non-Wikimedia wikis.

There are perhaps three primary alternative contenders that don’t involve making up our own scripting language (something I’d dearly like to avoid):

PHP

JavaScript

Python

Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter?  ;)

– brion

Update:

Hampton reminds me that Ruby has some sandboxing features and may also be a contender.

June 30, 2009 04:43 PM

ubiquity-firefox Google Group

Problem trying to install from source with manage.py

Dear all,
I have downloaded the Ubiquity source using HG in order to translate
the po files into Italian.
I have installed older versions of Ubiquity from source without
problems, but with the latest version I get an error when trying to
install Ubiquity.

I have Ubiquity in this folder:
d:\git\ubiquity-firefox

June 30, 2009 04:04 PM

Semantic Forms Google Group

BibTeX import

Hi Yaron,

Short question:

Is there an easy way to import several hundred BibTeX entries into a
Publication/Author SMW?

I am exploring the XML import from Data Transfer, but rather than
reinventing the wheel, I just wonder if someone has done this before.

I'm running into some language differences too. My SMW is in Dutch and

June 30, 2009 03:33 PM

NLP News

ICML/COLT/UAI 2009 retrospective

This will probably be a bit briefer than my corresponding NAACL post because even by day two of ICML, I was a bit burnt out; I was also constantly swapping in other tasks (grants, etc.). Note that John has already posted his list of papers.

  1. #317: Multi-View Clustering via Canonical Correlation Analysis (Chaudhuri, Kakade, Livescu, Sridharan). This paper shows a new application of CCA to clustering across multiple views. They use some Wikipedia data in experiments and actually prove something about the fact that (under certain multi-view-like assumptions), CCA does the "right thing."
  2. #295: Learning Nonlinear Dynamic Models (Langford, Salakhutdinov, Zhang). The cool idea here is to cut a deterministic classifier in half and use its internal state as a sort of sufficient statistic. Think about what happens if you represent your classifier as a circuit (DAG); then anywhere you cut along the circuit gives you a sufficient representation to predict. To avoid making circuits, they use neural nets, which have an obvious "place to cut" -- namely, the internal nodes.
  3. #364: Online Dictionary Learning for Sparse Coding (Mairal, Bach, Ponce, Sapiro). A new approach to sparse coding; the big take-away is that it's online and fast.
  4. #394: MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification (Zhu, Ahmed, Xing). This is a very cute idea for combining objectives across topic models (namely, the variational objective) and classification (the SVM objective) to learn topics that are good for performing a classification task.
  5. #393: Learning from Measurements in Exponential Families (Liang, Jordan, Klein). Suppose instead of seeing (x,y) pairs, you just see some statistics on (x,y) pairs -- well, you can still learn. (In a sense, this formalizes some work out of the UMass group; see also the Bellare, Druck and McCallum paper at UAI this year.)
  6. #119: Curriculum Learning (Bengio, Louradour, Collobert, Weston). The idea is to present examples in a well thought-out order rather than randomly. It's a cool idea; I've tried it in the context of unsupervised parsing (the unsearn paper at ICML) and it never helped and often hurt (sadly). I curriculum-ified by sentence length, though, which is maybe not a good model, especially when working with WSJ10 -- maybe using vocabulary would help.
  7. #319: A Stochastic Memoizer for Sequence Data (Wood, Archambeau, Gasthaus, James, Whye Teh). If you do anything with Markov models, you should read this paper. The take away is: how can I learn a Markov model with (potentially) infinite memory in a linear amount of time and space, and with good "backoff" properties. Plus, there's some cool new technology in there.
  8. A Uniqueness Theorem for Clustering Reza Bosagh Zadeh, Shai Ben-David. I already talked about this issue a bit, but the idea here is that if you fix k, then the clustering axioms become satisfiable, and are satisfied by two well known algorithms. Fixing k is a bit unsatisfactory, but I think this is a good step in the right direction.
  9. Convex Coding David Bradley, J. Andrew Bagnell. The idea is to make coding convex by making it infinite! And then do something like boosting.
  10. On Smoothing and Inference for Topic Models Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh. If you do topic models, read this paper: basically, none of the different inference algorithms do any better than the others (perplexity-wise) if you estimate hyperparameters well. Some are, of course, faster though.
  11. Correlated Non-Parametric Latent Feature Models Finale Doshi-Velez, Zoubin Ghahramani. This is an Indian-buffet-process-like model that allows factors to be correlated. It's somewhat in line with our own paper from NIPS last year. There's still something a bit unsatisfactory in both our approach and their approach that we can't do this "directly."
  12. Domain Adaptation: Learning Bounds and Algorithms. Yishay Mansour, Mehryar Mohri and Afshin Rostamizadeh. Very good work on some learning theory for domain adaptation based on the idea of stability.
Okay, that's it. Well, not really: there's lots more good stuff, but those were the things that caught my eye. Feel free to tout your own favorites in the comments.

June 30, 2009 01:13 PM

OpenGeoData

Iran maps

All eyes are on Tehran right now. As the center of the Iranian election protests, the city has become increasingly important to websites this week. To keep their site up-to-date with this latest crisis area, Flickr switched out the Yahoo road map with OpenStreetMap. When I heard about this I wondered how [...]

June 30, 2009 12:42 PM

NLP News

Enterprise Search Expert Joins EveryZing Executive Team

Legal Tech Base (press release): The company's core intellectual property and capabilities leverage speech-to-text technology and natural language processing to drive its suite of solutions ...

June 30, 2009 11:58 AM

ubiquity-firefox Google Group

Unable to get command hint after 0.1.5

Before 0.1.5, after triggering Ubiquity, I could get command hints:
typing 'g' would show the commands with 'g' as a prefix, like google,
and I could see the result in the right column.
After 0.1.5, I cannot get the column. I have tried Firefox 3.0.11
and 3.5rc3 on Ubuntu 9.04; both have this problem.

June 30, 2009 11:01 AM

Google Book Search Blog

New Features on Google Books



Think about how you use a book. You want to read it, sure--but there are a host of other ways for you to interact with the words between the covers. You might want to flip through the pages to find an image. You might want to open right up to the table of contents so you can find your favorite chapter. And you might want to pass it along to a friend so they can have a look at it, too.


Today I'm excited to announce that we're rolling out changes to Google Books that give readers and book lovers everywhere new ways to interact with the words and images contained within the books we've brought online. We've also made it easier for users to share previews of their favorite books on their blogs or websites. Here's a tour of some of the enhancements we've made to the way you search, browse, and share the books that we've digitized:


1. Embeds and links - This new toolbar option allows you to embed a preview of a full view or partner book in any of your websites or blogs--all with a simple HTML snippet. It's a lot like the embed tag that makes it so easy to share YouTube videos. Programmers comfortable with API tools could accomplish this via our Embedded Viewer API, but this new solution is much easier for everyone to use. You can also choose to grab a URL link to email or IM to friends that takes them to the same book and page on Google Books. For readers, this means they can more easily share pages from books they love, while publisher partners can gain even more awareness across the web to promote their books.



2. Better search within each book - You've always been able to search inside books you find on Google Books. Now, for public domain and partner books, we've made it easier to see exactly where your search term appears within the book by showing you more context around the term, including an image from the part of the page on which it appears. You can click on those images to navigate directly to the pages inside the book. You can also sort your search results by relevance in addition to page order in the book or magazine.



In the search results bar, you'll find 'Previous' and 'Next' buttons that allow you to browse through search hits quickly and easily.



3. Thumbnail view - Click on the thumbnail view button in the toolbar to see an overview of all the pages in a public domain book or in a magazine. Clicking on a thumbnail image will take you to that page in the reading view (available for "full view" books).



4. Contents drop-down menu - Above the book itself, you'll find a Contents drop-down that allows you to jump to chapters within the book--or articles within a magazine. (In case you're wondering, we built this using the same structure extraction technology that supports our mobile version of Google Books.)



5. Plain Text Mode - We've made it easier to find our plain text versions of public domain books. If a book is available in full view, you can click the 'Plain text' button in the toolbar to see our HTML version of the text (derived via OCR for full view books). This is especially useful for visually impaired Google users, who can use this format for text-to-speech and other types of software.



6. Page Turn Button and Animation - In addition to scrolling through the book, you can now also click the page turn button at the bottom of the screen, even if you haven't yet finished the page. An animated line moves with the page turn to make it easier to keep track of your location in the text.



7. Improved Book Overview Page - On the Overview page you'll find an assortment of useful data about the book, including reviews, ratings, summaries, related books, key words and phrases, references from the web, places mentioned in the book, publisher information, etc.



We hope that you enjoy these improvements to Google Books. As always, feel free to provide feedback. Happy reading!

June 30, 2009 11:29 AM

NLP News

Analyzing online content with OpenAmplify

NetworkWorld.com: A new service called OpenAmplify published by Hapax LLC uses a "patented Natural Language Processing technology" which analyzes every word used in a piece ...

June 30, 2009 10:24 AM

OpenGeoData

The future of mapping

Caught this on BBC news being shown on the Heathrow Express a few weeks ago. It’s the future.

June 30, 2009 09:59 AM

Open Access News

Reading the ground tremors

Michael Nielsen, Is scientific publishing about to be disrupted?  Michael Nielsen, June 29, 2009.  Excerpt:

...Today, scientific publishers are production companies, specializing in services like editorial, copyediting, and, in some cases, sales and marketing. My claim is that in ten to twenty years, scientific publishers will be technology companies....That is, their foundation will be technological innovation, and most key decision-makers will be people with deep technological expertise. Those publishers that don’t become technology driven will die off.

Predictions that scientific publishing is about to be disrupted are not new....

[Let me] draw your attention to a striking difference between today’s scientific publishing landscape, and the landscape of ten years ago. What’s new today is the flourishing of an ecosystem of startups that are experimenting with new ways of communicating research, some radically different to conventional journals. Consider Chemspider, the excellent online database of more than 20 million molecules, recently acquired by the Royal Society of Chemistry. Consider Mendeley, a platform for managing, filtering and searching scientific papers, with backing from some of the people involved in Last.fm and Skype. Or consider startups like SciVee (YouTube for scientists), the Public Library of Science, the Journal of Visualized Experiments, vibrant community sites like OpenWetWare and the Alzheimer Research Forum, and dozens more. And then there are companies like Wordpress, Friendfeed, and Wikimedia, that weren’t started with science in mind, but which are increasingly helping scientists communicate their research. This flourishing ecosystem is not too dissimilar from the sudden flourishing of online news services we saw over the period 2000 to 2005....

Scientific publishers should be terrified that some of the world’s best scientists, people at or near their research peak, people whose time is at a premium, are spending hundreds of hours each year creating original research content for their blogs, content that in many cases would be difficult or impossible to publish in a conventional journal. What we’re seeing here is a spectacular expansion in the range of the blog medium. By comparison, the journals are standing still.

This flourishing ecosystem of startups is just one sign that scientific publishing is moving from being a production industry to a technology industry. A second sign of this move is that the nature of information is changing. Until the late 20th century, information was a static entity. The natural way for publishers in all media to add value was through production and distribution, and so they employed people skilled in those tasks, and in supporting tasks like sales and marketing. But the cost of distributing information has now dropped almost to zero, and production and content costs have also dropped radically. At the same time, the world’s information is now rapidly being put into a single, active network, where it can wake up and come alive. The result is that the people who add the most value to information are no longer the people who do production and distribution. Instead, it’s the technology people, the programmers....

How many scientific publishers are run by people who know the difference between an INNER JOIN and an OUTER JOIN? Or who know what an A/B test is? Or who know how to set up a Hadoop cluster? Without technical knowledge of this type it’s impossible to run a technology-driven organization. How many scientific publishers are as knowledgeable about technology as Steve Jobs, Sergey Brin, or Larry Page?

I expect few scientific publishers will believe and act on predictions of disruption. One common response to such predictions is the appealing game of comparison: “but we’re better than blogs / wikis / PLoS One / …!” These statements are currently true, at least when judged according to the conventional values of scientific publishing. But they’re as irrelevant as the equally true analogous statements were for newspapers. It’s also easy to vent standard immune responses: “but what about peer review”, “what about quality control”, “how will scientists know what to read”. These questions express important values, but to get hung up on them suggests a lack of imagination much like Andrew Rosenthal’s defense of the New York Times editorial page. (I sometimes wonder how many journal editors still use Yahoo!’s human curated topic directory instead of Google?) In conversations with editors I repeatedly encounter the same pattern: “But idea X won’t work / shouldn’t be allowed / is bad because of Y.” Well, okay. So what? If you’re right, you’ll be intellectually vindicated, and can take a bow. If you’re wrong, your company may not exist in ten years. Whether you’re right or not is not the point. When new technologies are being developed, the organizations that win are those that aggressively take risks, put visionary technologists in key decision-making positions, attain a deep organizational mastery of the relevant technologies, and, in most cases, make a lot of mistakes. Being wrong is a feature, not a bug, if it helps you evolve a model that works: you start out with an idea that’s just plain wrong, but that contains the seed of a better idea. You improve it, and you’re only somewhat wrong. You improve it again, and you end up the only game in town. Unfortunately, few scientific publishers are attempting to become technology-driven in this way. The only major examples I know of are Nature Publishing Group (with Nature.com) and the Public Library of Science. 
Many other publishers are experimenting with technology, but those experiments remain under the control of people whose core expertise is in other areas....

Here’s a list of services I expect to see developed over the next few years....

June 30, 2009 10:56 AM

OpenGeoData

Ubiquitous Geocontext

Saw this in Italy – a car park with little LEDs above each space showing green if the space is free, red if not. So you can drive around the aisles of cars and easily see if there is a space. Now for integration with your TomTom…

June 30, 2009 09:54 AM

Open Access News

Harvesting ProQuest metadata for an ETD repository

Shawn Averkamp and Joanna Lee, Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository, code{4}lib, June 26, 2009.  (Thanks to Charles Bailey.) 

Abstract:   This article describes the workflow used by the University of Iowa Libraries to populate their institutional repository and their catalog with the data collected by ProQuest UMI Dissertation Publishing during the submission of students’ theses and dissertations. Re-purposing the metadata from ProQuest allowed the University of Iowa Libraries to streamline the process for ingesting theses and dissertations into their institutional repository. The article includes a discussion of the benefits and limitations of the workflow described.

June 30, 2009 10:17 AM

Another new OA publisher

Open Access Publications (OAP) is a new OA journal publisher.  (Thanks to Jim Till.) 

OAP will allow authors to retain copyright.  Though it doesn't indicate what license it will use, it will offer libre OA, allowing "any third party the right to download, print out, extract, archive, and distribute the article as long as its integrity is maintained and its original authors, citation details and publisher are identified."  It will charge a publication fee of £499.

OAP's first journal is Single Cell Analysis, whose inaugural issue is still forthcoming.

June 30, 2009 10:06 AM

information aesthetics

Cykelbarometer: Public Copenhagen Urban Bicycle Counter

The City of Copenhagen recently launched a public bicycle counter [copenhagenize.com], completely equipped with an air pump for the convenience of cyclists. The urban display counts the daily number of cyclists that use the new Green Path that slices diagonally across the Copenhagen and Frederiksberg pathway system. There is a 'sensor line' in the asphalt on the bike lane a few metres in front of the counter which registers the cyclists, probably via a motion sensor.

The idea is to encourage more people to ride by showing how many are using it. The numeric displays show the total so far today and this year. On the barometer-styled display, the left side will show last year's total.

The 500,000th cyclist passing by will get a fancy new bike.

Image taken from Matt Blackett at Flickr.


June 30, 2009 07:23 AM

Communicating the Noise Levels Caused by Heathrow Airport

The book Unseen Networks of Heathrow Airport [iancarr.net] is accompanied by a small collection of simple data visualization posters that illustrate the noise levels at different locations in and around Heathrow Airport's buildings and boundaries. The "Spheres" poster portrays the sound levels (dB) recorded at each location in and around Heathrow Airport as colored circles. The "Matrix" visualization displays the link between each recording location and the type of sound recorded over two minutes. In "Density", each column portrays noise levels and each line marks when the noise level rose above the average of 57 dB. The "Map" poster shows areas with a specific dB level at Heathrow Airport runways 1 and 2.

Thnkx Ian.


June 30, 2009 07:05 AM

natural language processing blog

ICML/COLT/UAI 2009 retrospective

This will probably be a bit briefer than my corresponding NAACL post because even by day two of ICML, I was a bit burnt out; I was also constantly swapping in other tasks (grants, etc.). Note that John has already posted his list of papers.

  1. #317: Multi-View Clustering via Canonical Correlation Analysis (Chaudhuri, Kakade, Livescu, Sridharan). This paper shows a new application of CCA to clustering across multiple views. They use some wikipedia data in experiments and actually prove something about the fact that (under certain multi-view-like assumptions), CCA does the "right thing."
  2. #295: Learning Nonlinear Dynamic Models (Langford, Salakhutdinov, Zhang). The cool idea here is to cut a deterministic classifier in half and use its internal state as a sort of sufficient statistic. Think about what happens if you represent your classifier as a circuit (DAG); then anywhere you cut along the circuit gives you a sufficient representation to predict. To avoid making circuits, they use neural nets, which have an obvious "place to cut" -- namely, the internal nodes.
  3. #364: Online Dictionary Learning for Sparse Coding (Mairal, Bach, Ponce, Sapiro). A new approach to sparse coding; the big take-away is that it's online and fast.
  4. #394: MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification (Zhu, Ahmed, Xing). This is a very cute idea for combining objectives across topic models (namely, the variational objective) and classification (the SVM objective) to learn topics that are good for performing a classification task.
  5. #393: Learning from Measurements in Exponential Families (Liang, Jordan, Klein). Suppose instead of seeing (x,y) pairs, you just see some statistics on (x,y) pairs -- well, you can still learn. (In a sense, this formalizes some work out of the UMass group; see also the Bellare, Druck and McCallum paper at UAI this year.)
  6. #119: Curriculum Learning (Bengio, Louradour, Collobert, Weston). The idea is to present examples in a well thought-out order rather than randomly. It's a cool idea; I've tried it in the context of unsupervised parsing (the unsearn paper at ICML) and it never helped and often hurt (sadly). I curriculum-ified by sentence length, though, which is maybe not a good model, especially when working with WSJ10 -- maybe using vocabulary would help.
  7. #319: A Stochastic Memoizer for Sequence Data (Wood, Archambeau, Gasthaus, James, Whye Teh). If you do anything with Markov models, you should read this paper. The take away is: how can I learn a Markov model with (potentially) infinite memory in a linear amount of time and space, and with good "backoff" properties. Plus, there's some cool new technology in there.
  8. A Uniqueness Theorem for Clustering Reza Bosagh Zadeh, Shai Ben-David. I already talked about this issue a bit, but the idea here is that if you fix k, then the clustering axioms become satisfiable, and are satisfied by two well known algorithms. Fixing k is a bit unsatisfactory, but I think this is a good step in the right direction.
  9. Convex Coding David Bradley, J. Andrew Bagnell. The idea is to make coding convex by making it infinite! And then do something like boosting.
  10. On Smoothing and Inference for Topic Models Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh. If you do topic models, read this paper: basically, none of the different inference algorithms do any better than the others (perplexity-wise) if you estimate hyperparameters well. Some are, of course, faster though.
  11. Correlated Non-Parametric Latent Feature Models Finale Doshi-Velez, Zoubin Ghahramani. This is an Indian-buffet-process-like model that allows factors to be correlated. It's somewhat in line with our own paper from NIPS last year. There's still something a bit unsatisfactory in both our approach and their approach that we can't do this "directly."
  12. Domain Adaptation: Learning Bounds and Algorithms. Yishay Mansour, Mehryar Mohri and Afshin Rostamizadeh. Very good work on some learning theory for domain adaptation based on the idea of stability.
Okay, that's it. Well, not really: there's lots more good stuff, but those were the things that caught my eye. Feel free to tout your own favorites in the comments.
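
To make the curriculum idea in item 6 concrete, here is a minimal Python sketch of ordering training sentences by length, the heuristic mentioned above. The function name, the staging scheme and the whitespace tokenization are assumptions for illustration; this is not the procedure from the paper or from the unsearn experiments.

```python
def curriculum_stages(sentences, num_stages=3):
    """Yield growing training sets, shortest sentences first."""
    # Length in whitespace-separated tokens stands in for "difficulty".
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    for stage in range(1, num_stages + 1):
        # Ceiling division, so the final stage always covers every sentence.
        cutoff = (len(ordered) * stage + num_stages - 1) // num_stages
        yield ordered[:cutoff]
```

A learner would train on each stage in turn, so later stages add longer sentences to the ones already seen. Swapping the sort key (for example, to vocabulary rarity) changes the curriculum without touching the rest.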

June 30, 2009 07:13 AM

NLP News

Sakhr Software Acquires Dial Directions

TMCnet: For nearly two decades, Sakhr Software has been pioneering and developing the world's richest knowledge base for Arabic natural language processing (NLP). ...

June 30, 2009 04:42 AM

Sakhr Software Acquires Dial Directions

WASHINGTON & SAN FRANCISCO -- Sakhr Software, a global leader in advanced Arabic speech and language solutions, today announced that it has acquired Dial Directions, a leading provider of voice-entry technology for mobile devices and services.

June 30, 2009 04:19 AM

Open Access News

More on the U. Kansas OA policy

A Web version of the text of the University of Kansas' new OA policy confirms what I'd suspected in my last post: that the policy as passed doesn't contain an OA mandate. It commits the university to OA, gives the university permission to provide OA to its faculty's research via the IR, and establishes a task force to work out the details -- including the details of how the manuscripts will get into the IR.

See also: Chad Lawhorn, KU plans to be first public university library to allow free online access to researchers’ work, KTKA, June 26, 2009.

... Members of the KU faculty proposed the “open access” policy, and believe that it will put KU on the leading edge of [an] emerging trend in how scholarly research is disseminated. ...

And once the system is fully functioning, KU leaders hope it will provide some interesting reading for the general public.

“We think one of the benefits is that this won’t just be for the research community, but even for lay people,” [Dean of Libraries Lorraine] Haricombe said.

June 30, 2009 03:50 AM

Updates on FRPAA

What's new with the Federal Research Public Access Act (FRPAA) since our last post:

Things to watch:

June 30, 2009 03:36 AM

NLP News

Silicon Valley should step up, help Iranians

San Francisco Chronicle: Google took things to an entirely new level by launching its Persian version of Google translate, which allows for decent machine translation between ...

June 30, 2009 01:58 AM

June 29, 2009

Wikimedia Technical Blog

Blog Downtime

I am sure that many folks noticed that on the morning of 2009-06-26, techblog.wikimedia.org and blog.wikimedia.org went down.  It turns out that some parts of our WordPress installations were compromised.  I do not want to get into a direct show and tell of what they did, but hopefully we have hardened the installation to the point that it will not occur again.

This is why the blogs exist on their own server, so when things like this happen we can minimize the impact.  The blogs are both up and running now, along with the other services that were affected.  All but techblog were back online before Friday was over; techblog lagged behind until today.  (As techblog was the point of exploit, we got everything else back up first.)  Other affected services were the Open Conference Systems site for Wikimania 2009, as well as our survey software.  Both of those were back online ASAP after the incident and the rest followed after.

Of course, it was hard to get this information out to folks when the blogs were down!  It goes to show how easy using the blogs to get info out has been, since without them we had to scramble to get the information out through other channels.

Thanks to everyone who assisted in the restoration, and also thanks to everyone for their patience while the system was fixed.

June 29, 2009 10:00 PM

EFF.org Updates

Help Protesters in Iran: Run a Tor Bridge or a Tor Relay

As turmoil over the disputed election in Iran continues, many techs are trying to find ways to help Iranian citizens safely communicate and receive information despite the barriers being established by Iranian authorities. One tactic that even moderately tech-savvy Internet users can employ is to set up a Tor relay or a Tor bridge.

More sophisticated users can skip this paragraph, but for the rest, here's the basic outline. Tor (an acronym of "The Onion Router") is free and open source software that helps users remain anonymous on the Internet. Normally, when accessing websites, your computer asks for and receives a webpage out in the open, a process that exposes your IP address, the URL of the website, and the contents of the site, among other information, to third parties. When accessing websites while using Tor, your computer essentially whispers its request for a website to another computer, which passes the request on to another computer, which passes it on to another computer, which passes it on to the computer where the website is hosted; the reply returns in the same chain-message manner. The whispers are encrypted, so that neither outside authorities nor the computers in the middle of the chain can tell what is being said, and to whom. And the website itself does not have your IP address either.
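The chain of encrypted whispers can be sketched in a few lines of Python. This is a toy illustration only: the XOR "cipher", the relay names, the keys and the `website.example` destination are all invented for the example, and real Tor negotiates keys with each relay and uses proper cryptography. What it shows is the layering idea: the sender wraps the request once per relay, and each relay can remove only its own layer, learning nothing but the next hop.

```python
import hashlib


def keystream_xor(key, data):
    """Toy cipher: XOR data with a SHA-256-derived keystream. Not secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def build_onion(message, route, destination):
    """Wrap message in one layer per relay; route is [(name, key), ...] in travel order."""
    payload = message
    next_hop = destination
    # Start with the innermost layer (last relay) and work outwards.
    for name, key in reversed(route):
        payload = keystream_xor(key, next_hop.encode() + b"|" + payload)
        next_hop = name
    return payload


def peel_layer(key, onion):
    """What a single relay does: remove its own layer and read only the next hop."""
    plain = keystream_xor(key, onion)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


# Hypothetical three-relay route with made-up keys.
route = [("relay_a", b"key-a"), ("relay_b", b"key-b"), ("relay_c", b"key-c")]
onion = build_onion(b"GET /index.html", route, "website.example")

packet = onion
for name, key in route:
    next_hop, packet = peel_layer(key, packet)
    print(name, "forwards to", next_hop)
print("website receives:", packet)
```

In this sketch only the last relay ever sees the destination, and only the first relay ever sees who sent the packet, which is the property the paragraph above describes.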

Internet users in Iran are using Tor to both (a) circumvent censorship systems and (b) remain anonymous while reading and writing on the Internet. Both are critically important to the safety of protesters, many of whom fear retaliation from the government. Preliminary reports indicate that use of the Tor client in Iran has increased in the days after the contested election.

However, Tor's design relies on a robust network of "volunteer computers" (a.k.a. relays) to pass messages back and forth. This means that the speed and quality of a Tor user's browsing experience depend heavily on the number of volunteer computers available to pass messages along. This is where volunteers can make a difference -- setting up additional relays improves access for dissident Iranians and other users of the Tor network. The more people who help out, the better and more quickly the network runs. If you're interested in helping out, find and follow instructions for configuring a Tor relay on the Tor website.

Those looking to help fight censorship should also consider providing a Tor bridge. Bridges come into play when an ISP decides to try blocking users' access to the Tor network. (For now, there seems to be only anecdotal evidence of Iran attempting to block the use of Tor. However, Iran has recently been practicing reactive and centralized blocking, which makes any effective block of Tor far more likely.) The Tor bridge configuration differs from a relay in that your computer does not appear in the public Tor network. Instead, users looking for access to the Internet through Tor can receive your Tor routing information through more private channels, then configure their Tor client to transmit requests through your computer. By not appearing in the public Tor network, your Tor routing information is less likely to end up on an ISP filter and can provide help for a longer period of time -- but recognize that the network needs both relays and bridges.

Tor provides strong protections for its users, but if you plan to use it to access the Net, take time to fully understand its limitations. Check the Tor "Warning" section for more information. You should also consider any limitations that may exist in your arrangement with your ISP.

If you have other questions about setting up a Tor bridge or relay, please check the Running a Tor relay FAQ page. For other concerns, The Onion Router Wiki may help.

For understanding the technical conditions of the Iranian Internet, we have found the OpenNet Initiative's ongoing research, Arbor Networks' network analyses, and the Tor Project's own blog status reports to be informative.

June 29, 2009 08:44 PM

ubiquity-firefox Google Group

Ubiquity Herd

For some reason, on Ubiquity Herd, the feed list does not show up.
Any help with that?

June 29, 2009 08:35 PM

The FRBR Blog

Tillett, Sharing Standards for Bibliographic Data Worldwide

Catching up on something from last month: Sharing Standards for Bibliographic Data Worldwide: An Overview of Changes in Cataloguing Practices, a talk by Barbara Tillett at the Atlantic Provinces Library Association Conference 2009 in Halifax, Nova Scotia.

Built on foundations established by the Anglo-American Cataloguing Rules (AACR), RDA (Resource Description and Access) will provide a comprehensive set of guidelines and instructions on resource description and access covering all types of content and media. The new standard is being developed for use primarily in libraries, but consultations are being undertaken with other communities (archives, museums, publishers, etc.) in an effort to attain an effective level of alignment between RDA and the metadata standards used in those communities, increasing the ability to share metadata among diverse communities. Cataloguers aren’t the only professionals who will be affected by these new rules. Increasing the ability to share metadata outside of our own organizations and changing description and access rules will impact the entire information profession. Along with providing an overview of RDA and its underlying conceptual model (FRBR, Functional Requirements for Bibliographic Records), examples of how FRBR can benefit circulation, reference and serials will be explored.

Laurel Tarulli says it was a very good talk:

Not only did she explain RDA and FRBR in a way that made complete sense (and I’ve been to other RDA sessions), but she also touched on how this is something the entire profession needs to be paying attention to, not just cataloguers. This is interesting because, up until now, many librarians have brushed it aside as a cataloguing issue. Not so! How information is retrieved, what it will retrieve and how it is presented will all change. The relationship gathering is what really excites me. And, it should excite all librarians in and out of the cataloguing department.

June 29, 2009 07:08 PM

ubiquity-firefox Google Group

undo command

hi,
i don't get the undo command, it doesn't seem to work for me.
whenever i make a change in the page and then try to use the undo, i
get an alert: "you're not in a rich text editing field"

i am using ubiquity 0.1.8

thanks
Asaf

June 29, 2009 07:03 PM

Updated release plans

Hello!
Firefox 3.5 comes out Tuesday (Tomorrow!!). All of Mozilla's launch
power is going to be going towards that. So we're pushing back the
Ubiquity 0.5 release date until Thursday, July 2nd. That will give us
a little longer to test out the localized versions, and to try to find
solutions to the install problems

June 29, 2009 06:54 PM

NLP News

PyGirl: Generating Whole-System VMs from High-Level Prototypes Using PyPy

Virtual machines (VMs) emulating hardware devices are generally implemented in low-level languages for performance reasons. This results in unmaintainable systems that are difficult to understand. In this paper we report on our experience using the PyPy toolchain to improve the portability and reduce the complexity of whole-system VM implementations. As a case study we implement a VM prototype for a Nintendo Game Boy, called PyGirl, in which the high-level model is separated from low-level VM implementation issues. We shed light on the process of refactoring from a low-level VM implementation in Java to a high-level model in RPython. We show that our whole-system VM written with PyPy is significantly less complex than standard implementations, without substantial loss in performance.

Content Type: Book Chapter
DOI: 10.1007/978-3-642-02571-6_19
Authors: Camillo Bruni (University of Bern, Software Composition Group, Switzerland); Toon Verwaest (University of Bern, Software Composition Group, Switzerland)
Book Series: Lecture Notes in Business Information Processing
Online ISSN: 1865-1356
Print ISSN: 1865-1348
Book Series Volume: Volume 33
Book: Objects, Components, Models and Patterns
Book DOI: 10.1007/978-3-642-02571-6
Online ISBN: 978-3-642-02571-6
Print ISBN: 978-3-642-02570-9
Book Part: Part 7

June 29, 2009 05:40 PM

Open Access News

OA to government statistics

Siu-Ming Tam, Informing The Nation – Open Access To Statistical Information In Australia, March 18, 2009.  A presentation at the UNECE Work Session on the Communication and Dissemination of Statistics (Warsaw, May 13-15, 2009).  (Thanks to Anne Fitzgerald.)  Excerpt:

...3. In 2005, the Australian Government released cost recovery guidelines...[requiring] fees and charges set by Government agencies to reflect the costs of producing and providing the products and services....

5. In...June 2005 the [Australian Bureau of Statistics (ABS)] sought and obtained additional funding from the Australian Government for free access to ABS publications on its website. In December 2005, the Minister made the announcement, in an event to mark the centenary for the establishment of the ABS, that as a centenary tribute to the people of Australia, all ABS statistical output on the web site would be made free of charge.

6. The recent advent of Web 2.0 technologies increases the potential to use, share and 'mix and match' ABS data sets to add value to ABS information. 'Mash ups' are an excellent example of how the value of a product may be significantly enhanced by including different layers of information with statistical information. To facilitate this, and other innovative uses of ABS data, the ABS needs to have an internationally recognised licensing framework for accessing, using and reusing its statistical information.

7. In December 2008, ABS introduced Creative Commons licensing by adopting the Attribution 2.5 Australia licence for its materials contained in the ABS website. ...

Also see Marc Debusschere, Dissemination Policies in the ESS, from the proceedings of the same conference.  Excerpt:

...27. The results of the survey show that all countries have well-established practices for disseminating statistical data, which for the larger part are disseminated for free; the most common exceptions are tailor-made data sets, microdata and paper publications....

29. ...[A] single policy document which coherently spells out dissemination principles is still absent in many countries. Specific dissemination conditions and procedures can, as a rule, be found on an ad hoc basis in many different places, but not bundled together in one place, on the web site or in a document.

30. The overview shows very markedly that policies, sometimes implicit ones, are quite similar across the [European Statistical System (ESS)]. The summary of current principles and practices of [National Statistical Institutes] could constitute a first outline of a basic 'Dissemination Policy Charter' for the European Statistical System:

June 29, 2009 05:55 PM

New OA publisher

PAGEPress is an apparently new publisher of OA journals in biomedicine.  It's based in Italy and is a brand of MeditGroup.

The PP journals charge a publication fee, which for 2009 is 500 Euros/article.  However, PP explains that "the ability of authors to pay publication charges will never be a consideration in the decision as to whether to publish."

PP says it uses CC-BY licenses.  But when it spells out what it means, it describes a CC-BY-NC license and links to one.  However, the sample article I looked at used a CC-BY license.

The site lists 17 journals in medicine and biology.  When I clicked through on each one, I found that 8 were operational, with published content (most still on their inaugural issue), and 9 were still on the drawing boards.

June 29, 2009 05:54 PM

blog.aksw.org

The Road to OntoWiki 1.0

As our new APIs slowly become stable it is time to announce some of the new features. OntoWiki 1.0 will have an enhanced plug-in architecture and a lot of APIs that allow you to customize the user interface. In addition to plug-ins which have been around for some time, there are three new extension types:

In short, components are pluggable controllers, modules are those little boxes OntoWiki has been using for some time now and wrappers are extensions for extracting triples from external sources and importing them into your knowledge base.

One wrapper that ships with OntoWiki can, e.g., load triples from Linked Data-enabled endpoints like Sindice or DBpedia. Among others, this feature is demonstrated in the new screencast. But the wrapper is only a part of OntoWiki’s new Linked Data enhancements. A plug-in publishes resources as linked data, provided their URI shares the prefix with the domain the respective OntoWiki installation runs under and the named graph is readable.

Besides extensibility, improved performance was one of the key goals for OntoWiki 1.0. Thus, it was only natural to make OntoWiki work with Virtuoso, one of the fastest RDF triple stores around. Besides Virtuoso, MySQL is still supported as well. The MySQL backend, however, underwent serious refactoring and is now based around ZendDb instead of ADOdb as its abstraction layer.

A preliminary version (OntoWiki 0.9) will be released in the coming weeks. It is based on the 1.0 code base but doesn’t contain all the features we’d like to include in 1.0.

June 29, 2009 04:42 PM

Linked Data Blog Aggregator

Virtuoso loads 110,500 triples-per-second on LUBM 8000

LUBM load speed still seems to be a metric that is quoted in comparisons of RDF stores. Consequently, we too measured the load time of LUBM 8000, 1,068-million triples, on the newest Virtuoso.

The real time for the load was 161m 3s. The rate was 110,532 triples-per-second. The hardware was one machine with 2 x Xeon 5410 (quad core, 2.33 GHz) and 16G 6667 MHz RAM. The software was Virtuoso 6 Cluster, configured into 8 partitions (processes) — one partition per CPU core. Each partition had its database striped over 6 disks total; the 6 disks on the system were shared between the 8 database processes.
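As a quick sanity check, the quoted rate follows from the triple count and the wall-clock time. The sketch below uses the rounded figure of 1,068 million triples from the post, so it lands near, not exactly on, the reported 110,532 triples-per-second; the variable names are only for illustration.

```python
# Rounded triple count from the post ("1,068-million triples"); the exact count is not given.
triples = 1_068_000_000

# Reported real time for the load: 161 minutes 3 seconds.
elapsed_seconds = 161 * 60 + 3

rate = triples / elapsed_seconds
print(f"{elapsed_seconds} s elapsed, roughly {rate:,.0f} triples per second")
```

The small gap to the reported figure is consistent with the true triple count being slightly above the rounded 1,068 million.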

The load was done on 8 streams, one per server process. At the beginning of the load, the CPU usage was 740% with no disk; at the end, it was around 700% with 25% disk wait. 100% counts here for one CPU core or one disk being constantly busy.

The RDF store was configured with the default two indices over quads, these being GSPO and OGPS. Text indexing of literals was not enabled. No materialization of entailed triples was made.

In comparison, Bigdata reported 200K triples-per-second for the first 8000 LUBM universities on a 15 blade box. We expect to do about that much on one new dual Xeon board; we’ll publish this when this is done.

We think that LUBM loading is not a realistic benchmark for the world but since other people publish such numbers, so do we.

June 29, 2009 04:12 PM

tesseract-ocr Google Group

Compressing a sequence of spaces

Tesseract is compressing a sequence of spaces in an input TIFF into a
single space in the output text. I want to preserve the original
spaces.

Tesseract 2.03
Debian 4 (2.6.18-5-686 kernel)
libtiff-tools
libtiff-dev

I'd appreciate any advice.

Thanks,
Rob

June 29, 2009 03:43 PM

Open Access News

Swords and plowshares: harvesting online knowledge

Mark Rutherford, Reading machine to snoop on Web, CNet News, June 27, 2009.  (Thanks to ResourceShelf.)

What if the wisdom of Web could be yours, without having to read through it one page at a time? That's what the military wants.

DARPA has hired a company to develop a reading machine to reduce the gap between the ever increasing mountain of digitized text and the intelligence community's insatiable appetite for data input.

BBN Technologies was awarded the $29.7 million contract to develop a universal text engine capable of capturing knowledge from written matter and rendering it into a format that artificial intelligence systems (AI) and human analysts can work with. (PDF)

The military will use the Machine Reading Program, as it's officially called, to automatically monitor the technological and political activities of nation states and transnational organizations --which could mean everything from al-Qaeda to the U.N....

BBN also expects the program to enable a plethora of new civilian applications, everything from intelligent bots to personal tutors. The system could provide unprecedented access and automated analysis of the world's libraries, allowing for vastly expanded cultural awareness and historical research....

BBN already offers a broadcast monitoring system that automatically transcribes real-time audio stream and translates it into English, creating a continuously updated, searchable archive of international television broadcasts....

Update.  Also see our past posts on open source intelligence.

June 29, 2009 04:25 PM

if:book

please discuss

In an as yet unpublished manuscript, historian Marshall Poe writes: "A book is a machine for focusing attention; the Internet is machine for diffusing it." I can see how he gets there, particularly if it's a P-book rather than an E-book, but it raises a bunch of interesting questions. If true, what are the implications . . . . ?

June 29, 2009 02:26 PM

Open Access News

No OA impact advantage seen in ophthalmology

V.C. Lansingh and M.J. Carter, Does Open Access in Ophthalmology Affect How Articles are Subsequently Cited in Research? Ophthalmology, June 20, 2009.  The article doesn't yet appear at the journal site, so I've linked to the abstract in PubMed.  Abstract:

OBJECTIVE: To determine whether the concept of open access affects how articles are cited in the field of ophthalmology.

DESIGN: Type of meta-analysis.

PARTICIPANTS: Examination of 480 articles in ophthalmology in the experimental protocol and 415 articles in the control protocol.

METHODS: Four subject areas were chosen to search the ophthalmology literature in the PubMed database using the terms "cataract," "diabetic retinopathy," "glaucoma," and "refractive errors." Searching started in December of 2003 and worked back in time to the beginning of the year. The number of subsequent citations for equal numbers of both open access (OA) and closed access (CA) (by subscription) articles was quantified using the Scopus database and Google search engine. Number of authors, article type, country/region in which the article was published, language, and funding data were also collected for each article. A control protocol was also carried out to ascertain that the sampling method was not systematically biased by matching 6 ophthalmology journals (3 OA, 3 CA) using their impact factors, and employing the same search methodology to sample OA and CA articles.

MAIN OUTCOME MEASURES: Number of citations.

RESULTS: The total number of citations was significantly higher for open access articles compared to closed access articles for Scopus (mean 15.2 versus 11.5, P < .0005, Mann-Whitney U = 20029) and Google (mean 6.4 versus 4.0, P < .0005, Mann-Whitney U = 21281). However, univariate general linear model (GLM) analysis showed that access was not a significant factor that explained the citation data. Author number, country/region of publication, subject area, language, and funding were the variables that had the most effect and were statistically significant. Control protocol results showed no significant difference between open and closed access articles in regard to number of citations found by Scopus: open access: mean = 17.8; SD (standard deviation) = 23.70; closed access: mean = 19.1; SD = 20.31; Mann-Whitney test, P = 0.730, Mann-Whitney U = 20584.

CONCLUSIONS: Unlike other fields of science, open access thus far has not affected how ophthalmology articles are cited in the literature.
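For readers unfamiliar with the Mann-Whitney U statistic quoted in the abstract, here is a minimal sketch of how it is computed from two samples. The citation counts below are made-up toy numbers, not data from the study, and the function name is only illustrative; the study's own software and procedure are not described in the abstract.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using average ranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted(
        [(value, 0) for value in sample_a] + [(value, 1) for value in sample_b]
    )
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        i = j

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical citation counts for a handful of OA and CA articles.
open_access = [3, 5, 7]
closed_access = [1, 2, 6]
u_oa, u_ca = mann_whitney_u(open_access, closed_access)
print("U (open access):", u_oa, "U (closed access):", u_ca)
```

U for a group counts the pairs in which an article from that group out-cites one from the other group (ties count as half), so the two values always sum to the product of the sample sizes.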

June 29, 2009 02:40 PM

Open Knowledge Foundation Blog

Open Database License (ODbL) v1.0 Released

Open Data Commons have released v1.0 of the Open Database License (ODbL), a share-alike license for data and databases. This is really big news for anyone working on open data as there are very few open data licenses available and none that provide for share-alike. From the announcement: We are delighted to announce the release of v1.0 [...]

June 29, 2009 12:02 PM

Open Access News

New OA journal on virology

Viruses is a new peer-reviewed OA journal published by MDPI.  The inaugural issue (June 2009) is now online.

June 29, 2009 01:00 PM

BioMed Central

Data publication and openness in the scientific community

Data publication: towards a database of everything, a Commentary article published in BMC Research Notes, discusses the changing nature of data publication, the challenges that face the Open Science movement, and why the publication of primary scientific data is important to us all.

BioMed Central has pioneered the open access publishing model and there has been rapid movement in the field of research publishing in the last few years, with open access publishing now firmly in the mainstream. The aim of BMC Research Notes is to reduce the loss suffered by the research community when results remain unpublished because they do not form a sufficiently complete story to justify the publication of a full research article. A key objective of the journal is to ensure that associated data sets are published in standard, reusable formats whenever possible, and are exposed to ensure that they are searchable and easily harvested for reuse.

This short Commentary article by Vincent S Smith is an interesting and timely contribution to the literature and debate surrounding publication of primary scientific data.

Rhian Cunliffe

Senior Journal Development Editor, BMC-series journals

June 29, 2009 11:30 AM

Open Access News

OA mandate at the Canadian Breast Cancer Research Alliance

The Canadian Breast Cancer Research Alliance has strengthened its OA policy from a request to a requirement.  (Thanks to Jim Till.)

From the old policy (adopted April 2007):

CBCRA requests that grant holders supply an electronic copy of final, accepted manuscripts funded in whole or in part by CBCRA grants. These articles will be posted on the CBCRA Open Access Archive as soon as possible after publication. A publisher’s embargo period of up to six months will be permitted....

From the new policy (revised April 2009):

CBCRA requires that grant holders supply an electronic copy of final, accepted manuscripts funded in whole or in part by CBCRA grants, to be posted in the CBCRA Open Access Archive, as soon as possible after publication. A publisher’s embargo period of up to six months will be permitted....

June 29, 2009 12:22 PM

Version 1.0 of the Open Database License

The Open Data Commons has released version 1.0 of the Open Database License (ODbL).  From today's announcement:

The Open Database License (ODbL) is an open license for data and databases which includes explicit attribution and share-alike requirements.

This license, the first of its kind, is a major step forward for open data. There are currently very few licenses available suited to data and databases and none which provide for share-alike (existing share-alike licenses such as the GPL, GFDL and CC By-SA are all unsuitable for data).

The development of the ODbL has been a major effort extending over more than one and a half years with an intensive consultation and review period for the last 6 months. We’d like to express our thanks to the communities and individuals who have contributed during this time.

PS:  Also see our past posts on the Open Database License --and our past posts on the Science Commons alternative (Protocol for Implementing Open Access Data), which favors the unrestricted public domain over open licenses for data.

June 29, 2009 11:43 AM

First funding pledge for ELIXIR

Sweden became the first country to pledge funding to ELIXIR, an ambitious European project to preserve and provide OA to biological data.

June 29, 2009 11:14 AM

A career in OA publishing

Charles W. Bailey, Jr., A Look Back at Twenty Years as an Internet Open Access Publisher, June 28, 2009.  Excerpt:

...In August 1989, I began my scholarly digital publishing efforts, launching one of the first e-journals on the Internet, The Public-Access Computer Systems Review.  This journal, if it was published today, would be called a "libre" open access journal since it was freely available, allowed authors to retain their copyrights, and had special copyright provisions for noncommercial use.

Aside from Public-Access Computer Systems News (also "libre" open access), my subsequent digital publications, such as the Scholarly Electronic Publishing Bibliography, were "gratis" open access until 2004, when all new versions of existing publications and new publications became "libre" open access under various versions of the Creative Commons Attribution-NonCommercial License.

For current information about my publication activities, see "Brief Resume of Charles W. Bailey, Jr." and "Selected Publications of Charles W. Bailey, Jr." ...

Below is a chronology of my digital publishing efforts from June 1989 through June 2009....

Also see the abridgment, A Brief Look Back at Twenty Years as an Internet Open Access Publisher, June 28, 2009.

June 29, 2009 10:23 AM

OpenSocial API Blog

Why Enterprise Software Provider Atlassian Chose OpenSocial

Hi, I'm Mark Halvorson the "Chief Imagineer" at Atlassian Software. Whenever I tell people my title it is usually received in one of two ways - a chuckle and a blank stare, or for those in the know some comment about Walt Disney. No, I don't make rides for an amusement park, but I do get to imagine inventive ways to combine thorny, enterprise challenges with some of the exciting things happening on the consumer web. That is why I'm particularly excited to blog in this forum about how Atlassian is bringing OpenSocial to the Enterprise.

Enterprise, meet OpenSocial

Much like "Imagineer" makes you think of Walt Disney, when you hear OpenSocial you are likely thinking of Orkut, MySpace, and other Internet social networks. When we heard OpenSocial we thought: now there's some cool technology we can use to bring our portfolio closer together, and closer to lots of great stuff on the Internet.

Atlassian is a seven-year young software company, hailing from Australia, and building collaboration and productivity tools for developers and teams. Many of you may have come across two of our better known products: JIRA, an issue tracker, and Confluence, an enterprise wiki. The rest of the portfolio includes a series of developer tools: FishEye, for exploring source code on the web; Crucible, for peer code review; Bamboo, a continuous integration server; and Clover, for test coverage analysis. We also offer Studio, which combines several of these products into a hosted integrated development suite.

Development is a social activity

Development is social. Developers work in teams, often with non-developers like product managers and technical writers. Those teams work together on a variety of shared objects: specifications, tasks, documentation, source code, builds and projects. Each of those shared objects generates lots of activity: comments, subtasks, notifications of changes and edits, build failures, code commits. These teams use lots of different tools and systems: wikis, bug trackers, build automation systems, source code repositories. That's a huge internal social network. People working with people, people working with systems, and systems working with systems - a river of activity that needs to funnel to the people who care about it most. Our mission is to help developers collaborate and communicate more easily, and in the process help them write higher quality code faster.

Okay, so why OpenSocial?

With eight products that support various parts of the development process, each with their own dashboard, and each spitting off data and activity that the others could benefit from, OpenSocial gave us an inventive, proven integration pattern: gadgets. We've embraced OpenSocial gadgets as a method of integration between our own products and between other enterprise software, and we're using OpenSocial gadgets as a mechanism to inject functionality and information from our products into other OpenSocial-compliant containers on the Internet, like Gmail or iGoogle.

JIRA 4.0 will be the first OpenSocial container in our portfolio to ship. JIRA has implemented OpenSocial through Shindig as a series of Atlassian plugins, which we call the Atlassian Gadgets plugins. JIRA produces Gadgets that can be displayed by other OpenSocial-compliant containers, including iGoogle and Gmail, and authentication between Gadget producers and consumers is handled through OAuth. We're excited about the possibilities. JIRA dashboards can now quickly assemble build status from Bamboo, project updates from Confluence, assigned code reviews from Crucible, all in the context of the issues and tasks assigned to a developer in the context of a JIRA project. Are you a team lead, and spend most of your time in Gmail? No problem, take all of that same information and park it there, so it's right alongside your inbox.

We've launched a little site that talks more about what we're doing at http://www.atlassian.com/opensocial. You can also follow us on twitter http://twitter.com/atlassian. I hope to do more blogging here about things we learn and cool stuff we're experimenting with. In the meantime, here's a short video of how a dev manager, who may live in Gmail, can file issues and track the state of projects and builds using Atlassian Gadgets in Gmail.



June 29, 2009 09:48 AM

A new addition to the OpenSocial family - the ActionScript3 client library!

The current generation of social applications has become increasingly interesting and attractive, with many apps sporting fancy animation effects and complex user interactions. One exciting result of this trend is the growing number of ActionScript developers in the community.

To support the development of OpenSocial apps using ActionScript, we are happy to introduce a new client library which exposes almost all of the OpenSocial v0.8 JavaScript APIs in native ActionScript 3 for Flash and Flex gadget developers. The library provides an event-driven development model that is prevalent in the ActionScript community, a FlexUnit-based testing framework, and samples for both Flash and Flex environments. We hope the library will ease the learning curve for ActionScript developers and shorten the development cycle. To check out the code, point your browsers to the Source tab linked from the ActionScript Client Library project page.

This library is completely open sourced under the Apache 2.0 license, and contributions are not only welcomed, but encouraged. In addition to a wiki page explaining the patch submission process, this project hosts an issue tracker which will be populated with known issues and requested enhancements. This tracker is the best place to start if you're interested in contributing to the project. Please use the tracker to report any new bugs or incompatibilities you find, or to request new features. You can also 'star' feature requests reported by other developers if they are significant to your own development. This will help us prioritize which bugs or features to work on next. Also, you are welcome to join the client library discussion forum and post your questions and feedback. We look forward to seeing you there!

June 29, 2009 09:35 AM

Open Video Conference

Columbia’s Educational Video Environment Released at OVC

The Columbia Center for New Media Teaching and Learning (CCNMTL) released its online video analysis tool to the public under an open source license this week at the Open Video Conference. Video Interactions for Teaching and Learning (VITAL) is an educational web environment that allows students to edit, annotate, and store video clips that they select from an accompanying video library. Students can then use these clips within multimedia essays that are published within the VITAL environment for review and critique by the professor and classmates.

CCNMTL released VITAL as an open source project to invite developers to join in advancing what has been a very successful educational tool at Columbia University and to encourage other universities to adopt and use the VITAL environment. A VITAL project page has been created in the Google code repository that provides details for using and participating in the environment. Check out the VITAL press release or VITAL demo for more information.

June 29, 2009 08:22 AM

ubiquity-firefox Google Group

Display More Results

I would like to know whether it's possible to ask Ubiquity to show
more than the top 3 results from Google search?

June 29, 2009 04:25 AM

June 28, 2009

ocropus Google Group

Patch for genAM.py (ocropus and iulib) for Python 2.3.4

Here's my humble first contribution to OCRopus...
I am trying to build OCRopus on a Bluehost server so I can use it as
part of a web service. Bluehost's version of Python is 2.3.4 (at
least on the machine my site is hosted on).
Their Python doesn't like the for loop inside the join statement.
Lines like this:
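
The excerpt is cut off at this point, so the actual lines from genAM.py are not shown. As a purely illustrative sketch (the list and names below are hypothetical and not taken from genAM.py): generator expressions were added in Python 2.4, so a bare `for` inside `join(...)` is a syntax error on 2.3.4, while the list-comprehension form is accepted by both.

```python
# Hypothetical data for illustration; not taken from genAM.py.
sources = ["a.cc", "b.cc", "c.cc"]

# Generator expression inside join: a syntax error before Python 2.4.
with_genexp = " ".join(s.replace(".cc", ".o") for s in sources)

# List comprehension inside join: also accepted by Python 2.3.
with_listcomp = " ".join([s.replace(".cc", ".o") for s in sources])

print(with_listcomp)
```

Wrapping the loop in square brackets is probably the smallest change that keeps such a line portable to older interpreters.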

June 28, 2009 10:40 PM

cleanup of wikis and documentation

As you may have noticed, the ocropus.org link has changed and now
points to the Google Code site. The primary resources for OCRopus are
now just two sites:

=== Google Code ===

[link] aka
[link]

Home page, source code, core documentation, links, issue tracker, etc.

June 28, 2009 09:58 PM

Confidence value

Hi,

I am wondering if there is any way to obtain the confidence value of
each recognized character by OCRopus.

Thanks for your reply

Thai

June 28, 2009 09:56 PM

tesseract-ocr Google Group

Confidence value for each character

Hi,

I am wondering if there is any way to get the confidence value for each
recognized character in tesseract.

waiting for the replies..

Thanks

Thai

June 28, 2009 09:45 PM

Open Access News

A new model for OA repositories

Laurent Romary and Chris Armbruster, Beyond Institutional Repositories, a preprint, self-archived June 26, 2009.

Abstract:   The current system of so-called institutional repositories, even if it has been a sensible response at an earlier stage, may not answer the needs of the scholarly community, scientific communication and accompanied stakeholders in a sustainable way. However, having a robust repository infrastructure is essential to academic work. Yet, current institutional solutions, even when networked in a country or across Europe, have largely failed to deliver. Consequently, a new path for a more robust infrastructure and larger repositories is explored to create superior services that support the academy. A future organization of publication repositories is advocated that is based upon macroscopic academic settings providing a critical mass of interest as well as organizational coherence. Such a macro-unit may be geographical (a coherent national scheme), institutional (a large research organization or a consortium thereof) or thematic (a specific research field organizing itself in the domain of publication repositories).

The argument proceeds as follows: firstly, while institutional open access mandates have brought some content into open access, the important mandates are those of the funders and these are best supported by a single infrastructure and large repositories, which incidentally enhances the value of the collection (while a transfer to institutional repositories would diminish the value). Secondly, we compare and contrast a system based on central research publication repositories with the notion of a network of institutional repositories to illustrate that across central dimensions of any repository solution the institutional model is more cumbersome and less likely to achieve a high level of service. Next, three key functions of publication repositories are reconsidered, namely a) the fast and wide dissemination of results; b) the preservation of the record; and c) digital curation for dissemination and preservation. Fourth, repositories and their ecologies are explored with the overriding aim of enhancing content and enhancing usage. Fifth, a target scheme is sketched, including some examples. In closing, a look at the evolutionary road ahead is offered.

June 28, 2009 08:41 PM

ubiquity-firefox Google Group

0.5 Conversion: Defining argument prepositions

Hi guys,

In parser 2 is it possible to define the names of prepositions that
are used in argument roles?

E.g. Previously there was a modifier called "in", and this meant the
user could enter "view-tasks in list". In Parser2 world the best role
I have found is "source" and this results in "view-tasks from list".

June 28, 2009 02:29 PM

Open Access News

Finland joins SCOAP3

Finland's National Electronic Library (FinELib) has joined the CERN SCOAP3 project.

June 28, 2009 01:27 PM

More on the history of OA and the preprint culture in physics

Richard Poynder, Open Access and the A-Bomb, Open and Shut?  June 22, 2009.  Excerpt:

Many have wondered why the first scientists to embrace Open Access (OA) were physicists.

That physicists were the OA trailblazers is not in doubt: it was, after all, theoretical physicist Paul Ginsparg who in 1991 created the seminal physics preprint repository arXiv....

Maybe because physicists have been sharing paper preprints with one another for decades? OA advocate Eberhard Hilf tells me that this began as long ago as 1932, when the Italian Nobel Laureate Enrico Fermi started to routinely mail preprints of his papers to colleagues prior to publishing them....

In this light, arXiv was simply a digital manifestation of a practice that began long before the Internet....

Miriam Blake head of the library at the Los Alamos National Laboratory (LANL)...was kind enough to ask one of her colleagues – LANL librarian Michelle Garcia – to see if she could find any reference to an OA mandate in the Los Alamos archives....

A few days later I had an email from Garcia. There was no mention of a mandate in her message, but she did send me something of greater inherent interest: a link to the 1945 Smyth Report.

The Smyth Report, Garcia explained, is “the earliest example of any kind of acknowledgement on the need for public release of information specifically on the development of atomic energy by the US government. Following the Smyth Report, there was a declassification program headed by a committee of senior scientists that led the Manhattan Project, which came up with the declassification guidelines in 1946.” ...

As the preface to the Report puts it, “The ultimate responsibility for our nation’s policy rests on its citizens and they can discharge such responsibilities wisely only if they are informed.”

[T]he Smyth Report stressed that scientific information should be released to the public not because its creation had been funded by taxpayers, but because it would enable them to make informed decisions about how the science should be used....

Back to the question of why physicists were the first to embrace OA: Could it be that the US atomic weapons declassification program helped create the preprint culture characteristic of the particle physics community?

In other words, in being asked to think through the reasons for and against making their research freely available, could it be that physicists became acculturated into assuming that the default position should be one in which scientific information is made as widely available as possible, as soon as possible – on the assumption that in most cases the benefits far outweigh the disadvantages? ...

June 28, 2009 01:15 PM

ubiquity-firefox Google Group

Ubiquity 0.5Pre2 does not work for me

I installed 0.5pre2 recently, but since then Ubiquity does not work any
more. I am using Firefox 3.0.11 in Windows Vista. What could be the reason
and solution for this?
Thanks
Tony

June 28, 2009 03:40 AM

Semantic Forms Google Group

internal link to "search by property"

hi, this may have been asked but i couldn't find it. say i have a
property called "Favorite Soda", and I want to make each page's data
link to the list of other pages using the same data. How do I do that,
if the field has a space?

basically, i want someone who put in "coca cola" to have those words

June 28, 2009 02:52 AM

June 27, 2009

ocropus Google Group

Newbie question: removing artifacts from mobile phone picture

Hello,

I'm rather new to Ocropus. I've managed to compile and install using
Ubuntu 8.04.
I'm now starting to look into ocroscript to find out how I can use it.

I've done a lot of work with tesseract and I'm able to use tesseract
to great extent.

However, I'm looking to do some document analysis now, and I know

June 27, 2009 05:46 PM

The FRBR Blog

FRSAD draft available, FRAD book published

Seen on David Bigwood’s Catalogablog, quoting something else:

IFLA Working Group on Functional Requirements for Subject Authority Records (FRSAR)

Invitation to participate:

Review of “Functional Requirements for Subject Authority Data (FRSAD) — Draft Report” Available through: http://nkos.slis.kent.edu/FRSAR/index.html or directly from: http://nkos.slis.kent.edu/FRSAR/report090623.pdf (2,800 kb)

Comments deadline: July 31, 2009

FRSAD is the new name for FRSAR, just as FRAD started as FRANAR, Functional Requirements and Numbering of Authority Records. Which you can now hold in your hands, because Functional Requirements for Authority Data is finished and now in book form.

This book represents one portion of the extension and expansion of the Functional Requirements for Bibliographic Records. FRBR has been published as Nr 19 in the present Series. It contains a further analysis of attributes of various entities that are the centre of focus for authority data (persons, families, corporate bodies, works, expressions, manifestations, items, concepts, objects, events, and places), the name by which these entities are known, and the controlled access points created by cataloguers for them. The conceptual model describes the attributes of these entities and the relationships between them.

It costs €69.95 or USD $84 for North Americans.

There are no links on IFLA’s site to a downloadable FRAD, and there’s no mention of the FRSAD draft. The FRSAD group is hosting the draft on their own web site. Neither group announced their news on the FRBR mailing list. I’m bewildered. I assume the final FRAD text will be available to download soon. Open access to FRBR was a major contributor to its success.

June 27, 2009 01:54 PM

tesseract-ocr Google Group

tessnet2.dll signing

Hi guys,

I was wondering if there is a way to sign the tessnet2_32.dll with a
strong name?

thanks,

snake.

June 27, 2009 01:33 PM

Linked Data Blog Aggregator

3sat TV magazine features Linked Data and DBpedia

The 3Sat computer magazine ‘neues‘ has broadcast a feature about Linked Data and DBpedia and the roles both efforts are playing in the evolution of the Web into a medium for the publication and linkage of data.

June 27, 2009 07:43 AM

ocropus Google Group

Moderation Required ...

Although most of us are adults :-), I would request that
all new members' messages be put under moderation.

Recently this Ocropus group and other similar groups
have been a target of unwarranted messages - sad.

thanks - mvs

June 27, 2009 07:02 AM

ubiquity-firefox Google Group

Case sensitive commands

Are commands supposed to be case sensitive?
It took me until I looked into the source of the command feed to
figure out why "google" to search google wasn't working for me on
0.5pre2, and the reason I believe is that the name is capitalized
("Google") in the command definition. This is also true of some other

June 27, 2009 04:06 AM

information aesthetics

NYTimes Michael Jackson's Billboard Rankings Over Time

The New York Times just released this interactive infographic about Jackson's Billboard Rankings Over Time [nytimes.com]. It shows the timeline of how Michael Jackson's songs performed on the Billboard Hot 100 chart, and how Michael Jackson's Billboard rankings compare with other notable artists, such as The Beatles, US or Mariah Carey.

More information about how the NYTimes graphics department was able to churn out this graphic so quickly can be found at the Revolutions blog.


June 27, 2009 01:35 AM

June 26, 2009

Free Our Data: the blog

Michael Cross: setting data free is an easy promise when in opposition – so would a Tory government do it?

Michael Cross, co-founder of this campaign, has an article at the Guardian’s Comment Is Free site on the Conservative pledges on data made on Thursday by David Cameron. Of note: The three-year-old Free Our Data campaign – founded by myself and the Guardian’s technology editor Charles Arthur – will welcome Cameron’s re-stated promise to publish every [...]

June 26, 2009 08:18 PM

EFF.org Updates

miniLinks for 2009-06-26

June 26, 2009 07:57 PM

Open Access News

U. Kansas adopts an OA policy

University of Kansas, KU becomes first U.S. public university to pass an open access policy, press release, June 26, 2009. (Thanks to A. Townsend Peterson.)

The University of Kansas has become the nation’s first public university to adopt an “open access” policy that makes its faculty’s scholarly journal articles available for free online. ...

Under the new faculty-initiated policy approved by Chancellor Robert Hemenway, digital copies of all articles produced by the university’s professors will be housed in KU ScholarWorks, an existing digital repository for scholarly work created by KU faculty and staff in 2005. ...

Professors will be allowed to seek a waiver but otherwise will be asked to provide electronic forms of all articles to the repository. KU’s Faculty Senate overwhelmingly endorsed the policy at a meeting earlier this year, but additional policy details, including the waiver process, will be developed by a senate task force in the coming academic year, said Faculty Senate President Lisa Wolf-Wendel, professor of education leadership and policy studies. The task force will be led by Ada Emmett, associate librarian for scholarly communications. ...

Via email: The policy was approved by the Faculty Senate on April 30, 2009; by the Provost on May 19; and by the Chancellor on May 22. From the text of the policy:

... Each faculty member grants to KU permission to make scholarly articles to which he or she made substantial intellectual contributions publicly available in the KU open access institutional repository, and to exercise the copyright in those articles. In legal terms, the permission granted by each faculty member is a nonexclusive, irrevocable, paid-up, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit. This license in no way interferes with the rights of the KU faculty author as the copyright holder of the work. The policy will apply to all scholarly articles authored or co-authored while a faculty member of KU. Faculty will be afforded an opt out opportunity. Faculty governance in consultation with the Provost's office will develop the details of the policy which will be submitted for approval by the Faculty Senate.

Comment. The university's press release is a bit misleading. Both the University of Oregon and Oregon State University, which are public universities, have departmental mandates. But KU is the first university-wide institutional mandate of any American public university, and only the second of any American university, after MIT.

I haven't found a final version of the policy text online. But an earlier draft of the policy contains several features missing from the version I received by email, most notably a deposit mandate. The version I received authorizes the university to provide OA to faculty articles (with an opt-out), but doesn't state that faculty will be required to deposit a copy. (The press release says that authors will be "asked" to deposit.)

June 26, 2009 07:35 PM

How to build free knowledge

Peter Eckersley, Finding a fair price for free knowledge, New Scientist, June 24, 2009. (Thanks to Garrett Eastman.)

... It makes no sense to limit and control access now we have technologies to give information to everyone. But it is also foolish to pretend we do not need incentives to help produce and publish that information. ...

[I]f we really want to end scarcity, we will have to build institutions that promote knowledge-sharing, while at the same time ensuring that there are incentives for creative and technical minds to contribute.

Science, and the universities that support it, is the grandest example of a system that has evolved to promote the abundance of knowledge. Universities offer incentives in the form of tenure, promotion and prestige to researchers who can discover and share the information which their peers consider most valuable. ...

Take the open access movement, which has campaigned to ensure that scientific articles are freely available to the public ... Within a decade or two, it is safe to say that all scientific literature will be online, free and searchable. Journal publishers will still be paid, but at a different point in the chain.

Outside the universities we have some even more remarkable developments. Fifteen years ago, who would have predicted that teenagers would be allowed to edit the world's primary reference source from their homes? ...

It's time to recognise that when we build institutions to promote the abundance of knowledge, everybody wins. When it comes to knowledge, you can never have too much of a good thing.

June 26, 2009 06:58 PM

More on publishing data

Vincent S. Smith, Data publication: towards a database of everything, BMC Research Notes, June 24, 2009. (Thanks to Garrett Eastman.) Abstract:

The fabric of science is changing, driven by a revolution in digital technologies that facilitate the acquisition and communication of massive amounts of data. This is changing the nature of collaboration and expanding opportunities to participate in science. If digital technologies are the engine of this revolution, digital data are its fuel. But for many scientific disciplines, this fuel is in short supply. The publication of primary data is not a universal or mandatory part of science, and despite policies and proclamations to the contrary, calls to make data publicly available have largely gone unheeded. In this short essay I consider why, and explore some of the challenges that lie ahead, as we work toward a database of everything.

June 26, 2009 06:31 PM

Video of Boyle on The Public Domain

A video of James Boyle's presentation on his book, The Public Domain: Enclosing the Commons of the Mind, at the Royal Society for the encouragement of Arts, Manufactures & Commerce (London, March 10, 2009) is now available. (Thanks to Michel Bauwens.)

June 26, 2009 06:17 PM

More on student support for OA

Nick Shockey, Students join access debate, Research Information, June 25, 2009.

... A group of six national and local American student associations, representing both graduates and undergraduates, have come together to issue the Student Statement on the Right to Research. This statement calls on researchers, universities, and governments to take relevant steps to increase access to the results of research.

In the past, discourse on scholarly publication and open access (OA) has largely been between academics, librarians, and publishers. This resolution marks students’ entry into the discussion. It reflects the large impact that limited access to research can have on students of all disciplines. ...

The new generation of scholars has grown up using the internet and having access to whatever information they need whenever they need it. Not having the same kind of unfettered access to information that is critical for their professional development is especially frustrating. ...

The statement has resonated with students in the USA but, while the current signatories are American, the resolution is not exclusive in its focus. It has also generated interest from students in Canada and across Europe and we look forward to reaching out to international student organisations in the near future. ...

As we move forward, we hope to use this statement as a rallying point for students to get engaged with the OA movement and as a solid foundation on which to build a rich student voice on OA. ...

June 26, 2009 06:06 PM

Impact factors of Hindawi journals rise

Hindawi's Impact Factors Increase by 27%, press release, June 23, 2009.

Hindawi Publishing Corporation is pleased to announce that it has seen very strong growth in the Impact Factors of its journals in the recently released 2008 Journal Citation Report published by Thomson Scientific. This most recent Journal Citation Report shows the average Impact Factor of Hindawi's journals increasing by more than 27% over the past year, with two of Hindawi's largest journals, EURASIP Journal on Advances in Signal Processing and Mathematical Problems in Engineering, rising by 70% and 45% respectively. ...

In addition to the 14 journals that were included in the 2007 Journal Citation Report, three of Hindawi's journals received Impact Factors for the first time this year ...

June 26, 2009 05:55 PM

ubiquity-firefox Google Group

How can I have Firefox remember all open windows after shutting down?

How can I have Firefox remember all open windows after shutting down?
Frequently I have all these tabs and windows open and I want to save
these open windows so I can begin where I left off the next day. I
don't want to make bookmarks, I just want all the same windows open
when I restart. I have searched a lot of groups and can't find an

June 26, 2009 04:05 PM

ocropus Google Group

Training

Hi,

I'm using the Mercurial code and getting good results! One big
problem is that words seem to run together. (I'm putting the image
and text together with hocrtopdf)

"are in an informal setting in a conference room, we must"

is recognized as:

"areinaninformalset6nginaconfe renceroom,wemust g"

June 26, 2009 03:57 PM

Open Access News

More on FRPAA

The Federal Research Public Access Act (FRPAA), an OA mandate for research funded by the U.S. federal government, was introduced yesterday by Sens. Joe Lieberman and John Cornyn. What's new since our last post:

June 26, 2009 04:54 PM

Linked Data Blog Aggregator

Linked Data Rules Simplified

As a complement to the most recent Linked Data Design Issues note by TimBL, I would like to add this subtle tweak to the enumerated rules:

  1. Identify or Name things using HTTP URIs
  2. Describe things using the RDF metadata model
  3. Increase link data mesh density on the Web by linking (referring) to things in other data spaces using their HTTP URIs.

If you perform the steps above, on any HTTP network (e.g. World Wide Web), you implicitly bind the Names/Identifiers of things to negotiable representations of their metadata (description) bearing documents.

Also note, you can create and deploy the resulting RDF metadata using any of the following approaches:

  1. RDFa within (X)HTML documents
  2. N3, Turtle, TriX, RDF/XML etc. based documents
  3. Programmatically generated variants of 1&2.
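
The three rules and the third deployment approach can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `example.org` URI and the helper names `uri`, `literal` and `triple` are hypothetical, the FOAF terms and the DBpedia URI are used as familiar examples, and the output format is N-Triples, a line-based subset of Turtle.

```python
# Rule 1: name things with HTTP URIs (the example.org name is hypothetical).
ALICE = "http://example.org/people/alice#me"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"
# Rule 3: refer to a thing in another data space by its HTTP URI.
DBPEDIA_BERLIN = "http://dbpedia.org/resource/Berlin"


def uri(value):
    """Format an HTTP URI as an N-Triples IRI term."""
    return "<%s>" % value


def literal(value):
    """Format a plain string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def triple(subject, predicate, obj):
    """Serialize one RDF statement as a single N-Triples line."""
    return "%s %s %s ." % (subject, predicate, obj)


# Rule 2: describe the thing with RDF statements.
description = [
    triple(uri(ALICE), uri(FOAF_NAME), literal("Alice")),
    triple(uri(ALICE), uri(FOAF_BASED_NEAR), uri(DBPEDIA_BERLIN)),
]

document = "\n".join(description)
print(document)
```

In practice an RDF library would usually handle serialization and escaping; plain strings are used here only to keep the sketch self-contained.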

June 26, 2009 02:49 PM

ocropus Google Group

allheaders.h accepted by the compiler, rejected by the preprocessor!

I'm trying to build OCRopus; I checked out the code from Mercurial last night.
It comes down to this:
./configure --prefix=/home/stuporgl/www/bookscanned/prog/
--with-tesseract=/home/stuporgl/www/bookscanned/prog/
--with-iulib=/home/stuporgl/www/bookscanned/prog/
leptheaders=/home/stuporgl/www/bookscanned/prog/include/liblept/

June 26, 2009 12:14 PM

Browse Blogs

TiddlyGuv: An Open-Source Governance System

This is an overview of TiddlyGuv, an open-source governance application under development at Osmosoft, the open source innovation arm of BT. TiddlyGuv helps enterprises manage their open-source activities, and TiddlyGuv itself is an open-source, BSD-licensed framework available for anyone to adopt and tailor to their needs.

June 26, 2009 11:06 AM