k

Biopython

Description

The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.

Features

Biopython FAQ (based on actual questions)

Why should I use Biopython and not Bioperl?
You should use what is comformatble to you. If you are a Perl guru (or monk), feel free to use Bioperl, it's your choice.

Why did you chose Biopython?
Because I program in Python so it is natural to do my bioinformatics work using Biopython.

Why do you use Python?
I already pointed out some Python advantages, but most important Python feature (IMHO) is readbility. I can easily read other's people code (even my own code!).

Jonathan Hilmer contributed with his opinion on this matter:
My work involves a lot of modeling and visualizations of protein sequences and structures: both Chimera and PyMol (interactive 3D protein modeling software) use Python. For more advanced and customized work there is Blender (general 3D modeling and animation), which also uses Python.
For dealing with large amounts of data, or numerical analysis, Scipy and/or Numpy are available and easily integrated. There are also examples of plotting packages, but I think most languages have pretty good plotting options.
Python can be used as an interface to R (statistics package, the primary community standard), and you can write Python code to run via Java with Jython. Sage (uses Python) is a mathematics software package which ties together many useful tools, both Python and non-Python.
There's sometime to be said for using the right tool for the job, but when you're a scientist first and a programmer second, there is a limit to the number of languages with which you can become proficient. If you can only use or learn one language, I think Python is by far the most useful and accessible.

Python runs always as a web page?
No. This is just a demo and I am showing Python and Biopython in a web page because I feel it is cool for a presentation, but it is not the standard way to run a Python script. In most cases this is done with a text editor.

Do you need a special editor for Python?
No. Any decent text editor will do it (no, Notepad is not a decent text editor).

Where can I retrieve Biopython?
Biopython download page is a good start if you dare to make manual installation. Software repository in your Linux may be a easier choice, but chances are that the package is already outdated. Another option is using easy_install:
# easy_install biopython

I am a Perl user and I find easier to code a biological function rather using Bioperl. Will I have the same problem with Biopython?
For some functions, you may be tempted to make your own code instead of using provided tools. My advice is to spend some time learning how to use what is already done instead of reinvent the wheel. It pays off at the end of the day.

Does Biopython works on recent Python versions?
Yes, Biopython works with Python 2.6. Python 3.0 is still new and some libraries are not ported yet some it doesn't make sense to have Biopython for Python 3.0 at this time.

Biopython for protein analysis

Retrieve a sequence

Retrieve the sequence for the Citrobacter freundii ampR gene. This code requires a working internet connection.
from Bio import Entrez

handle = Entrez.esearch(db="Nucleotide",
         term='"Citrobacter freundii ampR"[TITL]')
record = Entrez.read(handle)
for gid in record["IdList"]:
    msumm = Entrez.esummary(db="Nucleotide",id=gid)
    summ = Entrez.read(msumm)
    print "GI:%s, Title:%s"%(gid,summ[0]['Title'])
#Uncomment to see the GenBank file:
#handle = Entrez.efetch(db="Nucleotide", id="736668", rettype='gb')
#print handle.read()

Save it as a Genbank file

This script download the gene with GI 736668 from Entrez. It requires a working internet connection.
from Bio import Entrez

handle = Entrez.efetch(db="Nucleotide", id="736668", rettype='gb')
tmpfilename = 'ampRdna.gb'
tmp_fh = open(tmpfilename,'w')
tmp_fh.write(handle.read())
tmp_fh.close()

Read a Genbank file

This script reads the file generated in previous code snippet.
from Bio import SeqIO

handle = open('ampRdna.gb') # File created in previous script.
seq_record = SeqIO.read(handle, "genbank")
print 'Comment:',seq_record.annotations['comment']
print 'Taxonomy:',seq_record.annotations['taxonomy']
print 'Accessions:', seq_record.annotations['accessions']
print 'Organism:',seq_record.annotations['organism']
print "Description:",seq_record.description
#print "feat",seq_record.features
print "id:",seq_record.id
print "Gene name:",seq_record.name
print "Sequence:",seq_record.seq