Unilever Centre for Molecular Informatics
 

petermr’s blog

A Scientist and the Web

 
 
Recent developments - August 25th, 2008

I have been offline for some time because a lot has been going on. As I mentioned earlier we have had 5 students over this summer and they have been phenomenal. Not that previous students - such as Joe Townsend - haven’t also been. But in the last 2 months we have achieved a huge amount of software and systems development and I’ll tell you about this later. However it has kept me 100% occupied on software and I haven’t had a chance to think about blogging or anything much else. There are several things that I shall write about (see http://wwmm.ch.cam.ac.uk):

  • The Chem4Word project where Microsoft and we are developing a chemical authoring system within the Office (and related) XML environments.
  • The Crystal editor that we have been working on with sponsorship from the Int. Union of Crystallography and which Nick Day announced at the Osaka IUCr meeting
  • A departmental data repository for crystallography
  • An ontology for chemical reactions (sponsorship from Royal Soc. Chemistry)
And there are upcoming meetings:
  • This week I am talking at the Ticer meeting on digital libraries in Tilburg
  • On Friday there is the Nature blogging meeting in London
So I have a full set of topics to cover. As I am in the process of changing machines and getting software ready for presentation I’ll leave the details for a day or two.

 
 
Update - July 29th, 2008

I have been off the air for some time because of travel and also technical problems with the blog which Jim Downing has solved (thanks). I hope to blog soon about data, repositories and escience among various topics.

 
 
ESOF2008 Alma Swan’s session - July 18th, 2008

Alma Swan has organised the session at Barcelona ESOF (http://www.esof2008.org/fileadmin/media/programme/scientific_programme_preliminary_abstracts.pdf) [Saturday, 18th July, 1630].

Sharing scientific data: who benefits? Alma Swan, Key Perspectives Ltd, United Kingdom

Abstract: Digital datasets—text-based, numeric, audio, video or image-based—form the output of all scientific disciplines. How are these data being made available for sharing? What quality control mechanisms are in place? What kinds of naming conventions, tags, and metadata are in use and how effective are they at helping to manage open data? Who is storing, archiving and curating open data and at which levels? And how is the production and sharing of open data assessed: what processes are in place for crediting scientists for making their raw data openly accessible for sharing and re-use. How much can and should data publication replace traditional forms of publication of research findings?

 
 
ESOF - July 18th, 2008

We have been so busy with our summer program - semantic authoring and capturing of chemistry - that I haven’t had a breathing space. I’ll be blogging more about that. However a change of scene - tomorrow I’m in Barcelona at ESOF: The Euroscience Open Forum. I’ll post more later. It’s very important that Europe is a world leader in this arena.

==========================================================

ESOF: The Euroscience Open Forum

About ESOF

For too long, Europe was lacking an independent arena for open dialogue on the role of all the sciences, including the humanities, in society. We have it now with the Euroscience Open Forum. The initiative was taken in 1999 by the researchers themselves: the Euroscience Open Forum was brought to life by Euroscience. Euroscience recognised the need for an interdisciplinary, pan-European meeting place for open dialogue and the exchange of ideas.

The ESOF concept

Science and technology are becoming increasingly important as they concern and affect everybody. The Euroscience Open Forum is not an ordinary scientific conference, but a totally new concept. It consists of a Forum for discussion of topical issues, an embedded conference (with an exhibition) to showcase European achievements right across the scientific and technological spectrum, and an outreach programme. The outreach programme consists of a large number of events and happenings throughout the ESOF host city, which are targeted to the public at large of all ages. At ESOF2004 in Stockholm, the outreach programme “Science in the City” attracted 11000 visitors. At ESOF2006, the outreach programme was linked to the “Wissenschaftssommer”, attracting some 60000 visitors.

ESOF also serves as a young scientists’ forum, encouraging students, PhD students and post-docs to share their experience and participate in debates about such subjects as the European Charter for Researchers, how to motivate young people to engage in scientific careers, and how the construction of the European Research Area enhances the prospects of young scientists.

ESOF’s aims are:
  • Presenting scientific and technological developments at the cutting edge in all their variety from natural sciences to the social sciences and the humanities
  • Stimulating the European public’s awareness of and interest in science and technology
  • Fostering a European dialogue on science and technology, society and policy by offering a platform for cross-disciplinary interaction and communication on current trends and future roads for science and technology, their interaction with society and policy and the role of the public

ESOF’s European itinerary

The Euroscience Open Forum is held every other year, visiting the major scientific cities of Europe and bringing European science to the attention of all citizens. The starting point of ESOF’s European journey was Stockholm, Sweden, in 2004. Two years later, ESOF’s itinerary brought the event to Munich, Germany. And, after ESOF2006, the route will continue southwards: ESOF2008 will be held in the capital of Catalonia, Barcelona, Spain. ESOF’s exciting host cities reflect Europe’s cultural diversity. Thus, you will experience that the spirit of every Euroscience Open Forum is different…

ESOF’s success depends on you, too!

You can contribute to this open dialogue on all the sciences and on their role in shaping a knowledge-based society. ESOF invites individuals and organisations to submit their best ideas in the form of proposals for the programme. The best of these proposals will be selected for the Forum by a Programme Committee of international standing.

For information about ESOF2008, please visit http://www.esof2008.org/

You can also propose the next destination for ESOF’s travel plans. For further information, please contact us:

 
 
John Sulston calls for reform of IPR policy - July 6th, 2008

Whether you support Open Access and Open Data or believe that Closed Access and patents are the best way of promoting high quality science, there is no doubt that restrictions on access to IPR are a major drain on scientific effort. We all spend a significant amount of time having to investigate contracts, and finding out whether or not we can actually do something. Now John Sulston has spoken out:

John Sulston, recipient of the 2002 Nobel Prize for medicine, has launched a new research institute, the Institute for Science, Ethics and Innovation at the University of Manchester. Sulston is using the launch to highlight his views on openness in science and the need to reform innovation and intellectual property policy. (Thanks to Subbiah Arunachalam.)

See the op-ed co-authored by Sulston and Joseph Stiglitz in the July 5 edition of The Times:

… The question of “Who owns science?” is therefore a crucial one, the answer to which will have broad-reaching implications for scientific progress and for the way in which the benefits of science are distributed, fairly or otherwise. Two of the most pressing issues concern equity of access to scientific knowledge and the useful products that arise from that knowledge. …

The second issue we wish to highlight is that of access to science itself. The ideal shared by almost all scientists is that science should be open and transparent, not just in its practices and procedures, but so that the results and the knowledge generated through research should be freely accessible to all. There is a broad consensus in the scientific community that such openness and transparency promotes the advancement of science and enhances the likelihood that the benefits of science are enjoyed by all. For more than a hundred years, these principles have been the bedrock of academia and the scientific community.

We call upon all interested in the future of science to join with us in an active and open-ended search for answers.

See also coverage in The Times and the BBC.

PMR: I hope that this message finds its way to the policy makers in academia as they have the power and the responsibility to act. In many cases the academic staff are unable to find the information they want or to allow it to reach those that they would hope to collaborate with. Not only are there patent and copyright restrictions, but universities often sign draconian contracts with the gatekeepers of scientific information. For example software companies can revoke licences or even sue the universities if we publicize bugs in the program. Publishers require libraries to sign contracts that forbid the use of the information in ways that individual staff don’t even know about. It’s only hearsay but I understand that these can include “excessive downloads” or data-mining.

In no way can any of this be seen as anything other than holding science back.

 
 
In praise of Undergraduates - July 5th, 2008

One of the highlights of my year is our summer program of undergraduate projects in the Centre. We’ve done this for six years and each student spends 8-10 weeks working on projects in Molecular Informatics.

I have been astonished and delighted by what the students have been able to achieve and the lasting legacy they have left and are continuing to leave. I’m leaving out names and will speak in general terms. The students are usually sponsored by an external organisation and we have built up good relations with quite a number - such as publishers and pharma companies. Some students are also supported by the Department, and some by Unilever. We advertise by word of mouth and by the subject email lists. In general the number of positions has roughly matched the number of applicants - this year we have four projects which are all filled and I hope to talk more about them in this blog.

Oscar - our chemical text- and data-mining/processing facility sprang from summer projects (support from Royal Society of Chemistry and Nature Publishing Group). I am consistently delighted with the standard of the Oscar summer software - the Experimental Data Checker has run for nearly 5 years without needing any software support. CrystalEye sprang from a summer project sponsored by the International Union of Crystallography.

You might think that 2 months is too little time to do anything useful, and most of the time you would be wrong. It’s not uncommon to start getting useful material in the first week. This is in some part because we work as a large team. Some of us Centre members hot-desk into the “training area” and we work communally - fixing each other’s problems and discussing strategy.

Most of the students get to present to the sponsors and this has been very useful. One presented over a video link to the US office of the sponsor.

And there is a longer-term benefit - 5 of the students are now doing - or have just finished - PhDs with us. That has been an enormous benefit to the knowledge, expertise and culture of the Centre.

In more general terms, when anyone asks me how they are going to adjust to the rapid changes in modern thinking I advise them to include undergraduates in their team. If you are in the Library sector you have to understand how students think and act and the only way to do this is to work alongside them. You’ll find that long-held views about metadata, bibliographies, customised databases, and the linear reading of articles no longer hold. The e-generation works differently. And it’s often us who have to be educated.

I’m not involved in formal undergraduate education here (I have done some demonstrating) but if I were I would turn the system on its head and involve the students in preparing and delivering course material. They are pretty good at finding it, after all.

 
 
Open Access Data Repositories - July 5th, 2008

Peter Suber has been working with colleagues to create a Wiki of Open Access Data repositories. From his blog:

List of data repositories The Open Access Directory (OAD) list of Data repositories is now open for community editing.

OAD is a wiki, and you can help the cause by adding or revising entries to its lists.

Data repositories are becoming very important now and it’s clear that they are primarily useful if they are Open. Some subjects such as bioscience have had a long history of Open data repositories - and if the Wiki listed every one it would dominate the field.

Of course there are lots of nuances to discuss. What is Data? And what is Open? I’ve spent time on this blog discussing these. At present I’ll just reiterate that we should label data as “Open Data” (from the Open Knowledge Foundation), and should protect freedom with Community Norms, not licences or contracts.

Every creator of an Open data resource should label it as such. All you need is:

This material is Open Knowledge

 
 
Research Repository System - July 2nd, 2008

Chris Rusbridge of the Digital Curation Centre (Edinburgh, UK) has come up with a great idea which I think has captured the zeitgeist. He started with the “negative click” repository - and has mutated the name to Research Repository System. I was about to blog something just to say that I really supported his idea but hadn’t time to comment more when I suddenly found SEVEN posts on his blog.

Here’s the latest post - it links back…

Research Repository System persistent storage

This is the seventh and last of a series of posts aiming to expand on the idea of the negative click, positive value repository, which I’m now calling a Research Repository System. I’ve suggested it should contain these elements:
  • spinoffs
I’ll try to find time to add comments. However we are preoccupied with very actively building our own repository system here for crystallographic and chemical data in the Department and I’ll be blogging bits as we go along. I’ll try to keep in sync with Chris.

    For me the true repository system has to be invisible…perhaps in the way the web is going. Universal = invisible. But that will take a while.

     
     
    Festival of Crystallography at Wellcome - June 30th, 2008

    Last Thursday a group of us went to see the Wellcome Collection (at 183 Euston Road, built in 1932). It’s literally across the road from Euston Station. We were all crystallographers and had worked with the great women and men crystallographers of the 20th Century. The particular reason is that the institute is housing an exhibition of a remarkable collaboration between crystallographers and the Festival of Britain in 1951. I remember being taken to the Festival and being wowed by it - it was a vision of the future and how science and technology could change the world. I don’t remember all the parts and I don’t think I remember the crystallography.

    The crystallographic community, spurred by Helen Megaw at Cambridge, donated some of their output to be used as visual displays (“patterns”) and this is now exhibited as From Atoms to Patterns. There were no computer displays in crystallography at that time, so they couldn’t use the graphics that we have now. Instead they created displays in fabrics and synthetic materials and glass. I can’t reproduce them here (copyright) but please click through to see what I’m talking about. Better still, if you are in London and have half an hour before catching a train at Euston drop in. It’s free. And the permanent exhibition from Henry Wellcome is also most interesting.

    Crystallography is very beautiful. That’s why I got into it as a teenager - I got excited by polyhedra and my chemistry teacher gave me Phillips’ book on crystallography to read. I made physical models of all 32 point groups. Perhaps there was a subliminal echo of the festival in that activity.

    Several of the patterns had been provided by Dorothy Crowfoot Hodgkin - all of us had worked with Dorothy - some for many years. She’d been asked for patterns and the question of copyright came up. Dorothy wrote: “I feel rather doubtful whether I own any copyright of a pattern perpetuated by nature”.

     
     
    Elsevier: The grand challenge - June 29th, 2008

    Some of you will have noticed that Elsevier has launched a competition:

    WHAT IF YOU WERE THE PUBLISHER?

    Demonstrate your best ideas for how scientific research articles should be presented on the web and compete to win great prizes!

    CONTEST OVERVIEW

    We’ve worked hard to build the Article 2.0 dataset, and now we’re opening it up to developers via a simple, straightforward REST API. We will provide contestants with access to approximately 7,500 full-text XML scientific articles (including images) and challenge each contestant to be the publisher. In other words, each contestant will have complete freedom for how they would like to present the scientific research articles contained in the Article 2.0 dataset. We will encourage the use of XQuery, but this will not be a mandate. By leveraging these APIs, the contestant becomes the publisher and can render scientific articles to meet their needs including integrating the article into existing applications or combining it with other web service APIs.

    I and my colleagues are excited by this and I have written off to Elsevier to ask more about the content of the dataset. However, what can you do with 5000 articles covering the whole of science? Citation analysis has already been done. What else is general to all the disciplines? Well, we have some ideas and we aren’t giving them away here, but here’s one you might like to work on.

    Let’s assume that the “fulltext” is chosen randomly from all items on the Elsevier site marked as “fulltext PDF”. These are, of course, chargeable at 31.50 USD. So I’ve done a pilot study on the latest issues of Tetrahedron.

    There are 29 fulltext articles (all 31.50 USD) and here are five of them (16% of the total):

    Editorial board Page IFC Open Preview Purchase PDF (71 K) | Related Articles
    2. You are not entitled to access the full text of this document
    Graphical contents list Pages 7445-7451
    Corrigendum to “The potential of intermolecular N···O interactions of nitro groups in crystal engineering, as revealed by structures of hexakis(4-nitrophenyl)benzene” [Tetrahedron 63(28) (2007) 6603–6613] Page 7650 Eric Gagnon, Thierry Maris, Kenneth E. Maly, James D. Wuest Open Preview Purchase PDF (68 K) | Related Articles
    28. You are not entitled to access the full text of this document
    Calendar Page I Open Preview Purchase PDF (60 K) | Related Articles
    29. You are not entitled to access the full text of this document
    IBC: Guide for Authors Page IBC Open Preview Purchase PDF (238 K) | Related Articles

    So it would normally cost 157.50 USD to read these. Hopefully 16% of the Elsevier Article 2.0 dataset will be of this category and we’ll be able to read and analyse the fulltext for free. That’s about 800 articles, so enough to work on. I do hope Elsevier have included them, because they are clearly worth paying for. Indeed the total cost of these articles would be 25,000 USD and we can get them for free!
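
    The arithmetic behind these figures can be checked with a short sketch. It is only an illustration: the function and variable names are hypothetical, the 5000-article dataset size is the figure used above, and the contest overview quotes approximately 7,500, so both sizes are shown. Note that 5 out of 29 is nearer 17% than 16%, and that 800 articles at 31.50 USD comes to 25,200 USD, which rounds to the 25,000 quoted.

```python
from decimal import Decimal

# Figures taken from the post; the names below are illustrative only.
PRICE_USD = Decimal("31.50")     # price per "fulltext PDF" item
ISSUE_ITEMS = 29                 # items in the Tetrahedron issue sampled
NON_RESEARCH_ITEMS = 5           # editorial board, contents list, etc.
ASSUMED_FRACTION = Decimal("0.16")  # share used in the post


def estimate(dataset_size, fraction=ASSUMED_FRACTION):
    """Return (item count, total list price) for a given dataset size."""
    count = int(dataset_size * fraction)
    return count, count * PRICE_USD


# Cost of the five items listed above.
cost_of_five = NON_RESEARCH_ITEMS * PRICE_USD

# Exact share of the sampled issue, as a fraction.
share = Decimal(NON_RESEARCH_ITEMS) / Decimal(ISSUE_ITEMS)

print("Cost of the five items:", cost_of_five, "USD")
print("Share of the issue:", round(share * 100, 1), "%")
for size in (5000, 7500):
    count, total = estimate(size)
    print(size, "articles ->", count, "items,", total, "USD")
```

    Decimal is used rather than float so that the prices stay exact to the cent.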

    I’ll be continuing my little adventure into which other publishers charge non-subscribers for:

    • Editorial Board Info
    • List of abstracts
    • Corrigenda
    • Calendar
    • Guide for authors
    and whether they have thought of other ways of making money. After all Web 2.0 is all about making money, isn’t it?