
Regex Match Not Working On Simple String With Pyteomics Parser

I am performing an in silico digestion of the human proteome, meaning that I am trying to chop the amino acid sequence of every protein at a certain position. I am using the Pyt

Solution 1:

Pyteomics maintainer here.

The error message actually tells you the source of the problem: PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: {'sequence': 'AKDEVQKN'}"

It means that instead of a string 'AKDEVQKN' you passed a dictionary {'sequence': 'AKDEVQKN'}. This actually happens here:

pep_dic = [{'sequence': i} for i in unique_peptides]
for peptides in pep_dic:
    pep_dic['parsed_sequence'] = parser.parse(peptides,show_unmodified_termini=False)
    ...

You should pass the sequence itself to parse, not the dict:

pep_dic['parsed_sequence'] = parser.parse(peptides['sequence'], show_unmodified_termini=False)
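
As a rough sketch of the whole corrected loop: the helper name `annotate_peptides` and its `parse` argument are hypothetical and exist only for illustration. `parse` stands for any callable that accepts a sequence string, so that the snippet runs without Pyteomics installed. The sketch also assumes the intent is one parsed result per peptide, so it stores the result on each individual dict; indexing the list itself with a string key would raise a TypeError.

```python
def annotate_peptides(unique_peptides, parse):
    """Build one record per peptide and attach its parsed form.

    `parse` is a hypothetical stand-in: any callable that takes a
    sequence string and returns its parsed representation.
    """
    records = [{'sequence': seq} for seq in unique_peptides]
    for record in records:
        # Pass the string stored under 'sequence', not the dict itself,
        # and store the result on this record rather than on the list.
        record['parsed_sequence'] = parse(record['sequence'])
    return records


if __name__ == '__main__':
    # `list` is only a placeholder parser that splits into single letters.
    print(annotate_peptides(['AKDEVQKN'], list))
```

With Pyteomics, the `parse` argument could presumably be something like `lambda s: parser.parse(s, show_unmodified_termini=False)`, matching the call shown above.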

Solution 2:

I tried using their valid function to test every peptide before running the parser, but none of the peptides came back as False. I am now trying to work out whether the problem is in their function or in my own code.

for peptide in menu["Peptide"]:
    x = parser.valid(peptide)
    if not x:
        print(peptide)
        break
    else:
        print(x)
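
A validity check on the strings will not catch an entry that is not a string at all, which is what the error message in the question points to. The following is a minimal sketch of a pre-check that looks at the type first. The function name `first_bad_entry` and the `is_valid` argument are hypothetical; `is_valid` stands for any callable that takes a string and returns a bool.

```python
def first_bad_entry(entries, is_valid):
    """Return (index, entry, reason) for the first problem entry, or None.

    `is_valid` is a hypothetical stand-in for a real validity check.
    """
    for index, entry in enumerate(entries):
        # Check the type before validating the content.
        if not isinstance(entry, str):
            return index, entry, 'not a string: ' + type(entry).__name__
        if not is_valid(entry):
            return index, entry, 'failed validation'
    return None


if __name__ == '__main__':
    # str.isalpha is only a placeholder for a real validity check.
    print(first_bad_entry(['AKDEVQKN', {'sequence': 'AKDEVQKN'}], str.isalpha))
```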

Solution 3:

Not a solution but some analysis...

In the following simple example, 'AKDEVQKN' does match the regex from the post.

import re

line = 'AKDEVQKN'

pat = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')

x = re.match(pat, line)

if x:
    print(x)
    print(x.group())
    print(x.groups())

outputs:

<re.Match object; span=(0, 8), match='AKDEVQKN'>
AKDEVQKN
(None, 'AKDEVQKN', None)

That suggests that the issue is somewhere else in the code.

  • Is 'AKDEVQKN' the complete line, or is there more?
  • Is it possible that _modX_sequence has been changed by the time re.match is called with sequence 'AKDEVQKN'? To check, temporarily change ~\Anaconda\envs\SciFly\lib\site-packages\pyteomics\parser.py at line 312 from:
try:
  n, body, c = re.match(_modX_sequence, sequence).groups()
except AttributeError: 

to

try:
  if sequence == 'AKDEVQKN':
    print("DEBUG: ", sequence, _modX_sequence)
    # or drop into a debugger (pdb or IPython's):
    # import pdb; pdb.set_trace()
    # dir()
  n, body, c = re.match(_modX_sequence, sequence).groups()
except AttributeError:
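
A further check in the same spirit, without editing the installed package, is to feed the regex the exact value shown in the error message. This sketch uses only the standard `re` module and the pattern from the post; the helper name `try_match` is hypothetical. It makes no assumption about how Pyteomics handles non-string input internally. It only shows that the pattern accepts the plain string but not the dictionary or its text form.

```python
import re

# Pattern copied from the regex test above.
pat = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')

good = 'AKDEVQKN'
bad = {'sequence': 'AKDEVQKN'}  # the value shown in the error message


def try_match(value):
    """Describe what the pattern does with `value`."""
    try:
        m = pat.match(value)
    except TypeError:
        # re refuses anything that is not a string or bytes-like object.
        return 'TypeError'
    return 'match' if m else 'no match'


print(try_match(good))      # the plain sequence
print(try_match(bad))       # the dict itself
print(try_match(str(bad)))  # the dict rendered as text
```

Only the first call finds a match, which is consistent with the regex itself being fine and the input being the problem.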
