Friday, 17 February 2012
Google Refine: a powerful tool also for genomic data?
Managing and working with genomic data, such as taxonomic databases or next-generation sequencing data, is challenging and often tortuous. One of the worst problems is compiling and combining different data sets quickly and easily and, why not, in a comprehensible and intuitive way. As experience teaches us, some software that seems to be the right choice turns out not to be the best one. For example, as one of us wrote on this blog some days ago, Excel shows difficulties and embarrassing limits when formatting text that contains certain gene names, potentially leading to mistakes in the bioinformatic analysis. Fortunately, a number of new programs have recently been developed to help with bioinformatic data analysis; one of them is VarSifter (we had a post about it a few weeks ago), a very useful tool for managing next-generation sequencing data and analyzing genomic variants.
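To make the Excel problem concrete, here is a minimal Python sketch. It assumes the issue is the commonly reported one, where gene symbols that look like dates (SEPT2, MARCH1, DEC1) are silently converted to dates on import. The function name and the list of month prefixes are our own illustrative choices, not part of any tool mentioned in this post.

```python
import re

# Month-like prefixes that a spreadsheet may interpret as the start of a date.
# This list is an assumption made for illustration; it is not exhaustive.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional hyphen, then one or two digits.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES),
    re.IGNORECASE,
)


def date_like_symbols(symbols):
    """Return the gene symbols that a spreadsheet might turn into dates."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


if __name__ == "__main__":
    genes = ["SEPT2", "TP53", "MARCH1", "BRCA1", "DEC1"]
    print(date_like_symbols(genes))
```

Running a check like this on a gene list before opening it in a spreadsheet shows which entries deserve a second look.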
Here we want to talk about a new program called Google Refine, developed by Google "for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase".
Google Refine is not specific to bioinformatic data, but it is very flexible and supports different file formats such as TSV, Excel, CSV, and XML. It offers a huge number of instruments that let you manage your data in almost infinite ways. It would be impossible to explain all the potential applications of Google Refine here. We suggest you try the tool yourself; we are sure you will find it extremely powerful and user-friendly at the same time!
Here is a link to the YouTube video describing the main features of Google Refine.