Recently I have been tidying up data for my research projects at NUS. Dealing with a few TBs of data in one day made me slightly paranoid about the integrity of the data: where it should be stored, which archiving and compression protocol should be used, which local/remote file-transfer tool should be used, and even which physical medium the data should travel over, USB or Ethernet.
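One habit that eased the paranoia: verifying checksums on both ends of any large transfer before deleting the original. Here is a minimal Python sketch of the idea; the file name is a hypothetical placeholder, and streaming in chunks is just one reasonable way to handle multi-GB files.

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MB chunks so large files never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical file: run this on both the source and the destination copy,
# then compare the two hex digests before removing the original.
print(sha256sum(Path("sample_data.tar.gz")))
```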
I am writing this post not as a guideline, but mainly for self-reference and hopefully a prompt for discussion.
The boom in bioinformatics in recent years has been coupled with cheaper technologies and, consequently, a surge in the amount of data available. The rapid development of the field is itself an anti-establishment movement: even the most experienced bioinformaticians must spend a significant amount of time keeping up with new resources and toolkits.
Here are some of the life savers that are not commonly introduced in a standard bioinformatics curriculum. I think these toolkits encapsulate my understanding of the spirit of programming: there must be an easier way to do it.
To compare directories and text files

Although unfavorable, it sometimes happens that a project directory gets duplicated and the analysis progresses differently in each copy. The tool I have found most helpful for comparing directories and files, so that you can merge the two directories into one and keep the most up-to-date files from each, is Meld.
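Meld is a GUI tool, so when I am stuck in an SSH session I sometimes do a rough first pass with Python's standard-library filecmp instead. A sketch of that approach, with hypothetical directory names:

```python
import filecmp

def report(cmp, prefix=""):
    """Recursively list files unique to each side and files whose contents appear to differ."""
    for name in cmp.left_only:
        print(f"only in left:  {prefix}{name}")
    for name in cmp.right_only:
        print(f"only in right: {prefix}{name}")
    for name in cmp.diff_files:
        print(f"differs:       {prefix}{name}")
    for name, sub in cmp.subdirs.items():
        report(sub, prefix=f"{prefix}{name}/")

# Hypothetical duplicated project directories.
# Note: diff_files uses a shallow (stat-based) comparison by default,
# so it is a quick triage, not a byte-for-byte guarantee.
report(filecmp.dircmp("project_v1", "project_v2"))
```

This only tells you where the two copies diverge; the actual merging, deciding which version of each file to keep, is still much more pleasant in Meld's side-by-side view.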