Trying to work with NCBI’s Entrez API using Python


Hello, I’m currently working with Biopython’s ‘Entrez’ module and finding it very frustrating and poorly documented. I’m just trying to find all the sequence data for lacY in E. coli and load it with SeqIO.

https://www.ncbi.nlm.nih.gov/gene/949083

from Bio import Entrez, SeqIO
# Search for e.coli lacY gene and find id's (GenBank Accession Numbers)
handle = Entrez.esearch(db='nucleotide', retmax=10, term='Escherichia coli[Orgn] AND lacY[Gene]') 
record = Entrez.read(handle)
id_list = record['IdList']
# Efetch genbank data
handle1 = Entrez.efetch(db='nucleotide', id=id_list[0], rettype="gb", retmode="text")
print(handle1.read())
record1 = SeqIO.read(handle1, "genbank")
print(record1.seq)
# Error occurs, "ValueError: No records found in handle"

The library is able to download the ID list, and printing the handle’s contents shows that it found the record linked above, but SeqIO can’t then parse the sequence data from it. I’m interested to know if anyone else has been able to solve this programmatically. I could always download the files manually, but this was only a test run for a larger project I’m working on.

Thanks!


biopython


python


NCBI

You throw away your results in this line:

print(handle1.read())

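handle1 is a file-like stream, not a reusable string, so it can only be read once. print(handle1.read()) consumes everything, and when SeqIO.read(handle1, "genbank") then tries to read from the same handle, it gets nothing back (an empty string), hence "No records found in handle". It’s designed this way so that large results, potentially millions of sequences, can be streamed rather than loaded into memory all at once.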
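To see the read-once behaviour without touching the network, here is a minimal sketch that uses io.StringIO as a stand-in for the handle. The stand-in and its placeholder text are assumptions for illustration; the real Entrez.efetch handle is a network stream, but it is exhausted in the same way after one read().

```python
from io import StringIO

# Stand-in for the handle returned by Entrez.efetch (assumption: the real
# handle is likewise a text stream that is exhausted after one full read).
handle = StringIO("LOCUS       EXAMPLE\n//\n")

first = handle.read()   # consumes the whole stream
second = handle.read()  # nothing is left, so this is an empty string

print(repr(first))
print(repr(second))
```

The second read() is the situation SeqIO.read ends up in when the handle has already been printed.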

Do this if you’re sure it’s only one sequence:

from io import StringIO

handle1 = Entrez.efetch(db='nucleotide', id=id_list[0], rettype="gb", retmode="text")
res = handle1.read()
record1 = SeqIO.read(StringIO(res), 'genbank')
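If the fetch might return more than one record (for example, when several IDs are passed at once), SeqIO.read will raise a ValueError, and SeqIO.parse is the call to use instead. The sketch below assumes Biopython is installed and uses a made-up FASTA string in place of the text a real efetch call with rettype="fasta" would return; the IDs and sequences are illustrative only.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA text standing in for the result of handle.read();
# the record IDs and sequences here are placeholders, not real lacY data.
res = ">seq1 example\nATGTACTAT\n>seq2 example\nATGCCC\n"

# SeqIO.parse yields one SeqRecord per entry, so it handles any number of records.
records = list(SeqIO.parse(StringIO(res), "fasta"))

for rec in records:
    print(rec.id, len(rec.seq))
```

Keeping the text in res also means it can still be printed or saved to disk after parsing.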

