## Tuesday, January 30, 2007

### CDK Workshop - Day #2

Because of other obligations, I was unable to attend the first day of the CDK Workshop, though Christoph had set up Skype so that at least I could hear the talks from Prof. Berthold (Konstanz, Germany) about KNIME and Prof. Zielesny about CDK-Taverna.

Today, Miguel Rojas and Stefan Kuhn discussed their research. Miguel showed the state of mass spectrum prediction using the CDK and the MEDEA plugin for Bioclipse. Stefan demonstrated the NMRShiftDB and a new lab system for NMR experiment scheduling and management based on it. Dr. Ott (Nijmegen, Netherlands) showed the BioMeta Database, which contains metabolite and reaction information derived from KEGG, but which fixes a set of chemical problems in the latter (see also the article, DOI:10.1186/1471-2105-7-517).

The afternoons of CDK workshops traditionally have discussion sessions and hackathons. Two groups were formed: one consisted of the KNIME guys who, together with Miguel and Federico, focused on QSAR descriptor calculations in KNIME, while Stefan, Martin and I looked at the fingerprinter peculiarities that Martin found (see also this CDK News article), and came up with a possible further performance improvement of the AllRingsFinder. Because one class of molecules that is causing trouble consists of two ring systems connected by a long linker, like Choloyl-CoA (below), we anticipate that splitting the molecule up into ring systems prior to using the SSSR algorithm should speed up the complete all-ring finding process.

Currently, the spanning tree is calculated before deciding whether to use the SSSR finder. This spanning tree, we think, can be used to partition the molecule into separate ring systems. The further steps of the ring search can then be applied to each of them separately.
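The partitioning idea can be sketched as follows. This is a minimal illustration and not the CDK code: it assumes a molecule given as an atom count plus a list of bonds, and the function name `findRingSystems` is hypothetical. Instead of reusing the CDK spanning tree, it uses a standard depth-first bridge search to mark bonds that are not part of any ring (such as the linker bonds), and then collects the atoms that remain connected through ring bonds.

```javascript
// Partition a molecular graph into ring systems (illustrative sketch only).
// Atoms are numbered 0..atomCount-1; bonds is an array of [atomA, atomB] pairs.
function findRingSystems(atomCount, bonds) {
  // Adjacency list that also remembers which bond leads to each neighbour.
  const adjacency = Array.from({ length: atomCount }, () => []);
  bonds.forEach(([a, b], bondIndex) => {
    adjacency[a].push({ atom: b, bond: bondIndex });
    adjacency[b].push({ atom: a, bond: bondIndex });
  });

  // Step 1: a depth-first search builds a spanning tree and marks bridges,
  // that is, bonds that are not part of any ring.
  const discovered = new Array(atomCount).fill(-1);
  const low = new Array(atomCount).fill(0);
  const isRingBond = new Array(bonds.length).fill(true);
  let counter = 0;

  function visit(atom, parentBond) {
    discovered[atom] = low[atom] = counter++;
    for (const { atom: next, bond } of adjacency[atom]) {
      if (bond === parentBond) continue;
      if (discovered[next] === -1) {
        visit(next, bond);
        low[atom] = Math.min(low[atom], low[next]);
        // Nothing below "next" reaches back to "atom" or above: a bridge.
        if (low[next] > discovered[atom]) isRingBond[bond] = false;
      } else {
        low[atom] = Math.min(low[atom], discovered[next]);
      }
    }
  }
  for (let atom = 0; atom < atomCount; atom++) {
    if (discovered[atom] === -1) visit(atom, -1);
  }

  // Step 2: atoms connected through ring bonds only form one ring system.
  const system = new Array(atomCount).fill(-1);
  const ringSystems = [];
  for (let start = 0; start < atomCount; start++) {
    if (system[start] !== -1) continue;
    if (!adjacency[start].some(({ bond }) => isRingBond[bond])) continue;
    const members = [];
    const stack = [start];
    system[start] = ringSystems.length;
    while (stack.length > 0) {
      const atom = stack.pop();
      members.push(atom);
      for (const { atom: next, bond } of adjacency[atom]) {
        if (isRingBond[bond] && system[next] === -1) {
          system[next] = ringSystems.length;
          stack.push(next);
        }
      }
    }
    ringSystems.push(members.sort((x, y) => x - y));
  }
  return ringSystems;
}

// Two three-membered rings (atoms 0-2 and 5-7) joined by a linker (atoms 3-4).
const linked = [[0, 1], [1, 2], [2, 0], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 5]];
console.log(JSON.stringify(findRingSystems(8, linked)));
```

Each returned atom list could then be handed to the ring search on its own, so the linker no longer takes part in the search.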

After dinner (pasta/pizza), during the Spanish-German handball game, we continued the hacking and discussions, now focusing as a whole group on QSAR descriptors in KNIME. We looked at each descriptor and decided if it should go into a QSAR calculator node, or even in a node of its own.

Bugs found:
I won't close this blog entry without giving a list of problems we found in the current CDK; some minor, some more troublesome. Here goes:

• typos all over the place
• the OrderQueryBond lacks a return statement in an else clause
• the Mol2Reader does not mark atom and bond aromaticity properly: it reads a single bond as aromatic, and an aromatic bond as single
• the Renderer2D does not always highlight both atoms when hovering over a bond
• SmilesGenerator.parseBond() should output bond orders correctly
• the SSSR finder seems to have a messed up if-else statement for the ringBondCount limit of 37
• the BondCount descriptor should count all bonds by default, not just the single bonds
• IDescriptor.getParameters() should return null instead of Object[0]
• several programs use the SYBYL atomtype S.o2, while the specification and the CDK config define S.O2
• the IP descriptor now returns a variable length descriptor

## Wednesday, January 24, 2007

### Blogging and the Press

Today at the OSMB we again had a good lunch, and Rachel Sterne joined our table. She works at Ground Report, a New York based start-up, which is a news website where anyone, including bloggers, can post news stories. Not links to news stories, as on Slashdot, but actual news stories. Stories that can be submitted are not restricted to any topic, or country, or whatever. The good news is that the revenue from advertising is shared with the people who submit the stories, 50/50 even, if I understood correctly. The more visitor hits your story gets, the bigger your part of the revenue is.

Now, the reason why I advertise this is that Paul recently blogged about the status of bloggers as members of the press. ACS does not seem to think so, though even the Pulitzer organization disagrees. The ACS requires that freelancers are connected to a news organization, and I am wondering whether they would accept Ground Report as such...

### OSMB2007 Day #1: venture capital, scientific blogger and Kepler

The second day of Open Source Meets Business has just started, and I am now actually listening to the PHP talk, but here is a short update on day 1, which was the investment summit. It was not so crowded, but the talks from the venture capitalists in particular were interesting. During lunch we actually talked to one in person, which was insightful. I will be putting up links to interesting sites mentioned during this conference on my delicious account.

• an active community is important, cherish it
• support as a business model is not interesting for venture capitalists
• don't think you understand the legal implications

Noteworthy is that we have free wireless at the conference site :) So I downloaded a recent presentation by Jean-Claude about his open science work and blogging efforts, which I enjoyed watching very much. I skyped with my wife and children, and I booked a hotel for the ACS meeting in March in Chicago, as chances are high that I will attend that meeting.

Last night it started snowing, and it is completely white outside right now. The temperature has dropped to what is normal for the winter season, which made the burritos in downtown Nuernberg extra nice. Later today, Christoph's COSI talk is scheduled, and I was delighted to learn via Chemical blogspace that Carlos blogged about it yesterday! Cheers Carlos! In the same blog he also mentions that he is integrating the CDK with something called Kepler. Carlos, if you read this: what is the URL for Kepler?

## Monday, January 22, 2007

### Open Source Meets Business 2007

Today I leave for a two-day visit to the Open Source Meets Business conference in Nürnberg, where Christoph will speak about the Chemoinformatics OpenSource Initiative (COSI). If you happen to go to that meeting too, let's try to meet!

## Sunday, January 14, 2007

### CDK Literature #1

For each CDK News I try to write up what CDK related literature has been published recently, but I failed to do so for the last two issues. In order to not postpone writing it up until close to the deadline, I will write up things here, so that I can copy-paste it later for CDK News.

Oxidoreductase-catalyzed reactions

Mu et al. analyzed about 2000 oxidation/reduction reactions from KEGG using the CDK and JOELib for the chemoinformatics bits. The reactions were grouped into 12 subclasses, and SVM was used to train models to distinguish reactants from non-reactants. It seems that no independent test sets were used, but cross-validation indicates that their approach is feasible. The work uses CDK's HydrogenAdder, UniversalIsomorphismTester, and unnamed QSAR descriptors. It would be interesting to see how it compares to the work of Aires-de-Sousa.
Fangping Mu, Pat J. Unkefer, Clifford J. Unkefer and William S. Hlavacek, Prediction of oxidoreductase-catalyzed reactions based on atomic properties of metabolites, Bioinformatics, 2006 22(24):3082-3088; doi:10.1093/bioinformatics/btl535

Cognate ligands

Bashton et al. took a different approach in analyzing the metabolome. They looked at the correlation of ligand structure with enzyme domains, and propose a method to identify cognate ligands, that is, ligands that are present in vivo and are required for a functional metabolome. The CDK is used for calculating fingerprints and maximal common substructures (MCSS). The paper notes that the MCSS is not necessarily of biochemical relevance, indicating that there is room for a pharmacophore-like concept in the CDK.
Matthew Bashton, Irene Nobeli, and Janet M. Thornton, Cognate Ligand Domain Mapping for Enzymes, Journal of Molecular Biology, 364(4):836-852; doi:10.1016/j.jmb.2006.09.041

## Thursday, January 11, 2007

### Why do I blog?

Mitch blogged about a comment Bethany Halford, Associate Editor of C&EN, left in The Chem Blog. She is writing an opinion piece on chemistry blogs, and is wondering why I blog, whether I use a nickname, and if my employer knows I blog. So, here goes.

Why do I blog?

I started blogging in October 2005 to reduce my workload: being involved in open source chemoinformatics projects, I quite often emailed mailing lists about interesting websites/projects/events etc. Not uncommonly to multiple lists, which required me to tune the email to each list. I realized that blogging about it would make it possible to no longer post it to mailing lists, and, therefore, reduce my workload. A second reason is that I post tricks there, so that I have them available in a central place, and that I can post questions that, hopefully, others can answer. As such, it is a way of communicating with fellow scientists, without the need to specifically address them. Open, free and fast.

Deliberately, I did not start a personal diary blog, but a blog about my work as chemoinformatician. Nevertheless, the nature of blogging allows you to give what you write a personal twist. What stresses the scientific nature of my blog, and of many others, is that blogging scientists often cite and discuss literature, which nicely leads to scientific blog aggregators like Postgenomic.com and Chemical blogspace, which summarize the scientific literature being discussed in the blogosphere. The latter even recently started to blog about molecules being discussed. There are even blogs which specialize in discussing literature, such as the blog by Rajarshi, Gary and David.

Why do I not use a nickname?

In my blogging I am clear about who I am, even where I work; I blog about my scientific work, and a reader putting one and one together would arrive at my real name soon enough anyway. I did not discuss the blogging with the employer I had in 2005, but the blogging is mostly done outside office hours anyway, certainly in that period. My current employer is a scientific blogger himself. Even my nickname, or pseudonym, is not that obfuscated.

Moreover, I do make a statement in my blog (which sort of summarizes to: "you cannot do science if you cannot reproduce experimental results"), and I think it is not more than fair to identify myself. I'm not like Ender's brother Peter.

Why do I answer Bethany's questions?

I try to convince myself that I do not answer these questions out of procrastination, something Bethany is wondering about. Instead, I like blogging as a new way to communicate with fellow scientists on a scientific level (Bethany, do explore the full chemical blogspace, and be amazed by the gems of high scientific content around!), though this might qualify as catching up with current literature. Moreover, answering these questions allows me to advertise my blog, and some websites I like. I feel that blogging might fill a niche in scientific communication.

## Tuesday, January 09, 2007

### The del.icio.us tagometer on www2.blogger.com

Yesterday I blogged about how to include the new del.icio.us tagometer on a www.blogger.com blog, just like Improbulus did last December as I discovered later. Felix asked me how it could be done on the new www2.blogger.com template system. Well, here it is.

Like with the old blogger.com template system, you need to add this to the header, just before the </head> end tag:

    <!-- del.icio.us badge stuff -->
    <script type="text/javascript">
      if (typeof window.Delicious == "undefined") window.Delicious = {};
      Delicious.BLOGBADGE_MANUAL_MODE = true;
    </script>
    <link id="delicious-blogbadge-css"
          href="http://images.del.icio.us/static/css/blogbadge.css"
          rel="stylesheet" type="text/css" />
    <script src="http://images.del.icio.us/static/js/blogbadge.js"></script>

And, for the blog entry template bit, look for the <p> element of class 'post-footer-line post-footer-line-3', which was empty for me. Add this <div> to it:

    <p class='post-footer-line post-footer-line-3'>
      <div class="delicious-blogbadge-line" expr:id="data:post.id">
        <script type="text/javascript">
          Delicious.BlogBadge.register('<data:post.id/>', '<data:post.url/>', '<data:post.title/>');
        </script>
      </div>
    </p>

To get at the right place, with the full template XHTML content, go to your www2.blogger.com/home homepage, click the Template tab, then pick the Edit HTML option, and make sure to enable the Expand Widget Templates option.

## Monday, January 08, 2007

### The del.icio.us tagometer on Blogspot.com

Some days ago I read about the del.icio.us tagometer, which is basically the same sort of thing as I had before on this blog. The tagometer, however, shows some interesting properties of the blog items, like the number of people who bookmarked the item, and what tags they used. The tagometer help does not show how it can be integrated with blogspot.com (where this blog is hosted), but with the source from 0xDECAFBAD I got it working. These blogs are not yet moved to the new blogger.com system (so, www.blogger.com, not www2.blogger.com), so the below principally applies to the older system.

First you need to add this blob to the <head> of the template:

    <$BlogMetaData$>
    <!-- del.icio.us badge stuff -->
    <script type="text/javascript">
      if (typeof window.Delicious == "undefined") window.Delicious = {};
      Delicious.BLOGBADGE_MANUAL_MODE = true;
    </script>
    <link id="delicious-blogbadge-css"
          href="http://images.del.icio.us/static/css/blogbadge.css"
          rel="stylesheet" type="text/css" />
    <script src="http://images.del.icio.us/static/js/blogbadge.js"></script>
    </head>

where <$BlogMetaData$> and </head> should already be present in the template.

Further down the template, you need to add a bit in the <div class="blogPost"> section, just after the last <div class="byline"> element in your template. The bits you add use blogger variables, so make sure to get it right:

    <div class="delicious-blogbadge-line" id="badge-<$BlogItemNumber$>">
      <script type="text/javascript">
        Delicious.BlogBadge.register('badge-<$BlogItemNumber$>', '<$BlogItemPermalinkURL$>', "<$BlogItemTitle$>");
      </script>
    </div>

Note the quotes of the third argument. To do this properly, the quotes in the output of <$BlogItemTitle$> should be escaped, so that they do not interfere with the quotes of the register() JavaScript call. Can anyone tell me how to do that in JavaScript?
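One caveat on the escaping question: the title is substituted into the page before the browser parses the script, so a stray quote has already broken the string literal by the time any JavaScript runs. The escaping therefore has to happen wherever the title is written into the page. Whether the old Blogger template tags offer a way to do that is not something this sketch assumes or verifies; it only shows what the escaping itself would need to do, with `escapeForJsString` being a hypothetical helper name.

```javascript
// Hypothetical helper: escape text so that it can be placed between double
// quotes in a JavaScript string literal without ending the literal early.
function escapeForJsString(text) {
  return text
    .replace(/\\/g, '\\\\')    // backslashes first, so later escapes stay intact
    .replace(/"/g, '\\"')      // the double quotes that delimit the literal
    .replace(/\r/g, '\\r')     // raw line breaks are not allowed in a literal
    .replace(/\n/g, '\\n')
    .replace(/<\//g, '<\\/');  // keeps "</script>" in a title from closing the element
}

// Example: a title containing double quotes.
const title = 'The "tagometer" on Blogspot.com';
console.log('register("badge-1", "http://example.org/post", "' + escapeForJsString(title) + '");');
```

The example.org URL and the badge-1 identifier are placeholders for illustration only.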

## Thursday, January 04, 2007

### Chemical blogspace is getting more chemical

The best remedy for being depressed is the rush after hacking some nice new feature (unfortunately, it is addictive). After hacking InChI support into Chemical blogspace a couple of days back, adding some more visual feedback on those molecules is not that hard, with PubChem around that is:

Beware! Every marked up molecule in your blog is being picked up! So should the compound with the SMILES N(=NC1=CC=C(C=C1)N(CCO)CCO)C3=CC=C(C=CC2=C(C(=C(C#N)C#N)OC2(C)C)C#N)S3, which is reported to be the most light-sensitive molecule ever synthesized so far.

## Tuesday, January 02, 2007

### Chemistry in HTML: JavaScript from the server

Recently I blogged about a Greasemonkey script to take advantage of semantic markup of chemistry in blogs (and HTML in general), and later made some plans how this can be extended. One of the ideas was to make this userscript available from the server, instead of having people need to install Greasemonkey and the script separately. So, here it is.

sechemtic.js

Consider this (X)HTML:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:chem="http://www.blueobelisk.org/chemistryblogs/">
    <head>
      <title>m1</title>
      <script type="text/javascript" src="sechemtic.js"></script>
    </head>
    <body onload="addGoogleAndPubChemLinks(1,1)">
      <h1>The Output</h1>
      <p>This article is about <span class="chem:compound">m1</span>
        (SMILES:<span class="chem:smiles">CCCOC</span>).</p>
    </body>
    </html>

I think the above example shows the simple setup of the Sechemtic Web script (please forgive my habit of using bad linguistic mashups ;). Just load the script in the HTML <head>, and add the onload="addGoogleAndPubChemLinks(1,1)" attribute to the <body> element. With blogs these bits would be part of the template, and, therefore, need to be installed only once. From then on, just use the semantic markup as explained earlier. Both the microformat and the RDFa method are supported. In case of the latter, I recommend defining the chem namespace in the template of webpages too, instead of in the <span> elements.

    <html>
    <head>
      <title>m1</title>
      <script type="text/javascript" src="sechemtic.js"></script>
    </head>
    <body onload="addGoogleAndPubChemLinks(1,1)">
      <h1>The Output</h1>
      <p>This article is about <span class="compound">m1</span>
        (SMILES:<span class="smiles">CCCOC</span>).</p>
    </body>
    </html>
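To give an idea of what a script loaded this way might do with the marked up spans, here is a rough sketch. It is not the sechemtic.js source: the function names `classToKind` and `buildChemLinks` are hypothetical, and the Google search and PubChem PUG REST URL patterns are assumptions about what one could link to. The sketch accepts both class styles shown above, with and without the chem: prefix.

```javascript
// Illustrative sketch only, not the actual sechemtic.js code.

// Map a class attribute value to the kind of chemical markup it denotes.
// Accepts both the prefixed form ("chem:smiles") and the plain form ("smiles").
function classToKind(className) {
  const name = className.replace(/^chem:/, '');
  return name === 'smiles' || name === 'compound' ? name : null;
}

// Build link targets for the text of one marked up span.
// The URL patterns are assumptions chosen for this sketch.
function buildChemLinks(kind, text) {
  const query = encodeURIComponent(text);
  const links = [
    { label: 'Google', href: 'https://www.google.com/search?q=' + query }
  ];
  if (kind === 'smiles') {
    links.push({
      label: 'PubChem',
      href: 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/' + query + '/cids/TXT'
    });
  }
  return links;
}

// In a browser, append the links after every recognized <span>.
function addLinksToPage(doc) {
  for (const span of Array.from(doc.getElementsByTagName('span'))) {
    const kind = classToKind(span.className);
    if (kind === null) continue;
    for (const link of buildChemLinks(kind, span.textContent)) {
      const anchor = doc.createElement('a');
      anchor.href = link.href;
      anchor.textContent = ' [' + link.label + ']';
      span.parentNode.insertBefore(anchor, span.nextSibling);
    }
  }
}

console.log(buildChemLinks(classToKind('chem:smiles'), 'CCCOC'));
```

The SMILES string is passed through encodeURIComponent because characters such as # (triple bond) would otherwise be read as part of the URL syntax.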