Regex Match Not Working On Simple String With Pyteomics Parser
I am performing an in silico digestion of the human proteome, meaning that I am trying to chop the amino acid sequence of every protein at a certain position. I am using the Pyt
Solution 1:
Pyteomics maintainer here.
The error message actually tells you the source of the problem: PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: {'sequence': 'AKDEVQKN'}"
It means that instead of a string 'AKDEVQKN' you passed a dictionary {'sequence': 'AKDEVQKN'}. This actually happens here:
pep_dic = [{'sequence': i} for i in unique_peptides]
for peptides in pep_dic:
    pep_dic['parsed_sequence'] = parser.parse(peptides, show_unmodified_termini=False)
    ...
You should pass the sequence itself to parse, not the dict:
pep_dic['parsed_sequence'] = parser.parse(peptides['sequence'], show_unmodified_termini=False)
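Putting the fix into a complete loop, a minimal sketch might look like the following. The helper name `add_parsed_sequences` and its `parse` parameter are hypothetical, introduced only for illustration, and `list` is used as a stand-in parser so the example runs without Pyteomics installed. Note that the result is stored on each per-peptide dict, since `pep_dic` itself is a list and cannot be indexed with a string key.

```python
def add_parsed_sequences(records, parse):
    """Attach parse(record['sequence']) to every record, in place."""
    for record in records:
        sequence = record['sequence']
        # Guard against the original mistake of handing over a non-string.
        if not isinstance(sequence, str):
            raise TypeError(f"expected str, got {type(sequence).__name__}")
        # Store on the per-peptide dict; the enclosing list only takes integer indices.
        record['parsed_sequence'] = parse(sequence)
    return records


unique_peptides = ['AKDEVQKN', 'PEPTIDE']
pep_dic = [{'sequence': s} for s in unique_peptides]

# `list` is only a stand-in parser so the sketch runs without Pyteomics.
add_parsed_sequences(pep_dic, parse=list)
print(pep_dic[0]['parsed_sequence'])
```

With Pyteomics available, one would presumably pass something like `functools.partial(parser.parse, show_unmodified_termini=False)` as the `parse` argument instead of `list`.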
Solution 2:
I tried using Pyteomics' valid function to test all peptides before running the parser, but none of my strings came back as False. I am now checking whether the problem lies in their function or in my own code.
for peptide in menu["Peptide"]:
    x = parser.valid(peptide)
    if not x:
        print(peptide)
        break
    else:
        print(x)
Solution 3:
Not a solution but some analysis...
In the following simple case example code, 'AKDEVQKN' does match using the regex in the post.
import re

line = 'AKDEVQKN'
pat = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')
x = re.match(pat, line)
if x:
    print(x)
    print(x.group())
    print(x.groups())
outputs:
<re.Match object; span=(0, 8), match='AKDEVQKN'>
AKDEVQKN
(None, 'AKDEVQKN', None)
That suggests that the issue is somewhere else in the code.
- Is 'AKDEVQKN' the complete line, or is there more?
- Is it possible that _modX_sequence has been changed by the time re.match is called with sequence 'AKDEVQKN'?
To check, temporarily change ~\Anaconda\envs\SciFly\lib\site-packages\pyteomics\parser.py at line 312 from:
try:
    n, body, c = re.match(_modX_sequence, sequence).groups()
except AttributeError:
to
try:
    if sequence == 'AKDEVQKN':
        print("DEBUG: ", sequence, _modX_sequence)
        # or drop into a debugger, pdb or IPython's
        # import pdb; pdb.set_trace()
        # dir()
    n, body, c = re.match(_modX_sequence, sequence).groups()
except AttributeError:
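A less invasive check that avoids editing the installed package is to run the pattern from the post against both candidate inputs in isolation. The sketch below uses only the standard library. It assumes the parser ends up matching the text form of whatever it receives, which the dict representation in the error message suggests but which is not confirmed here. The helper name `matches` is hypothetical.

```python
import re

# Pattern copied from the post.
pat = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')

good = 'AKDEVQKN'                # a plain sequence string
bad = {'sequence': 'AKDEVQKN'}   # the dict that was passed by mistake


def matches(value):
    # Assumption: the value is converted to text before the regex is applied.
    return pat.match(str(value)) is not None


print(matches(good))  # True
print(matches(bad))   # False: the trailing "'}" cannot be consumed by the pattern
```

If this assumption holds, the regex itself is fine and the failed match comes from the type of the argument, not from the peptide sequence.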