
Peptidoform

Defines the Peptide class and associated utilities for handling peptidoforms.

This module provides a Peptide class for representing modified peptide sequences and their site localization probabilities. It offers methods to access and manipulate peptide information, summarize isoform probabilities, and retrieve modification sites. Additionally, it includes utility functions for parsing modified sequence strings and for converting site localization probabilities to and from a standardized string format.

Classes:

Peptide: Representation of a peptide sequence identified by mass spectrometry.

Functions:

parse_modified_sequence: Returns the plain sequence and a list of modification positions and tags.
modify_peptide: Returns a string containing the modifications within the peptide sequence.
make_localization_string: Generates a site localization probability string.
read_localization_string: Converts a site localization probability string into a dictionary.

Peptide

Peptide(
    modified_sequence: str,
    localization_probabilities: Optional[
        dict[str, dict[int, float]]
    ] = None,
    protein_position: Optional[int] = None,
)

Representation of a peptide sequence identified by mass spectrometry.

Methods:

make_modified_sequence: Returns a modified sequence string.
count_modification: Returns how often a specified modification occurs.
isoform_probability: Calculates the isoform probability for a given modification.
get_peptide_site_probability: Return the modification localization probability of the peptide position.
get_protein_site_probability: Return the modification localization probability of the protein position.
list_modified_peptide_sites: Returns a list of peptide positions containing the specified modification.
list_modified_protein_sites: Returns a list of protein positions containing the specified modification.

Source code in msreport\peptidoform.py
def __init__(
    self,
    modified_sequence: str,
    localization_probabilities: Optional[dict[str, dict[int, float]]] = None,
    protein_position: Optional[int] = None,
):
    plain_sequence, modifications = parse_modified_sequence(
        modified_sequence, "[", "]"
    )

    self.plain_sequence = plain_sequence
    self.modified_sequence = modified_sequence
    self.localization_probabilities = localization_probabilities
    self.protein_position = protein_position

    self.modification_positions = ddict(list)
    self.modified_residues = {}
    for position, mod_tag in modifications:
        self.modification_positions[mod_tag].append(position)
        self.modified_residues[position] = mod_tag
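The constructor indexes the parsed modifications twice: `modification_positions` maps each tag to its list of positions, and `modified_residues` maps each position to its tag. The sketch below reproduces only that indexing step so it runs without msreport installed. The `modifications` list is what the `parse_modified_sequence` code in this module yields for "PEPT[phospho]IDEM[ox]S[phospho]K" (traced by hand), and "ox" is a made-up tag used only for illustration.

```python
from collections import defaultdict

# Hypothetical example: the (position, tag) pairs that parse_modified_sequence
# yields for "PEPT[phospho]IDEM[ox]S[phospho]K" with "[" and "]" as delimiters.
# "ox" is a made-up tag used only for illustration.
plain_sequence = "PEPTIDEMSK"
modifications = [(4, "phospho"), (8, "ox"), (9, "phospho")]

# Same indexing as Peptide.__init__: tag -> positions and position -> tag.
modification_positions = defaultdict(list)
modified_residues = {}
for position, mod_tag in modifications:
    modification_positions[mod_tag].append(position)
    modified_residues[position] = mod_tag

print(dict(modification_positions))  # {'phospho': [4, 9], 'ox': [8]}
print(modified_residues)  # {4: 'phospho', 8: 'ox', 9: 'phospho'}

# count_modification is the length of the position list for a tag.
print(len(modification_positions["phospho"]))  # 2

# Position p refers to the p-th residue (1-based), here the residue "T".
print(plain_sequence[4 - 1])  # T
```

Because `modification_positions` is a defaultdict, `count_modification` checks membership before indexing; looking up an unknown tag directly would silently insert an empty list.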

make_modified_sequence

make_modified_sequence(
    include: Optional[list[str]] = None,
) -> str

Returns a modified sequence string.

Parameters:

include (Optional[list[str]], default None): Optional list of modification tags to include in the modified sequence string. By default, all modifications are included.

Returns:

str: A modified sequence string in which modified amino acids are indicated by square brackets containing a modification tag, for example "PEPT[phospho]IDE".

Source code in msreport\peptidoform.py
def make_modified_sequence(self, include: Optional[list[str]] = None) -> str:
    """Returns a modified sequence string.

    Args:
        include: Optional, list of modifications that are included in the modified
            sequence string. By default all modifications are added.

    Returns:
        A modified sequence string where modified amino acids are indicated by
        square brackets containing a modification tag. For example
        "PEPT[phospho]IDE"
    """
    if include is None:
        return self.modified_sequence

    selected_modifications = []
    for position, mod_tag in self.modified_residues.items():
        if mod_tag in include:
            selected_modifications.append((position, mod_tag))
    return modify_peptide(self.plain_sequence, selected_modifications)
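When `include` is given, only the listed tags are written back into the plain sequence. The standalone sketch below mirrors that filtering; `modify_peptide` is copied from this module so the snippet is self-contained, while the standalone `make_modified_sequence` function, the residue map, and the "ox" tag are hypothetical. Note that `include=None` returns the stored `modified_sequence` unchanged, whereas an empty list yields the plain sequence.

```python
def modify_peptide(sequence, modifications, tag_open="[", tag_close="]"):
    # Copied logic from this module: insert each tag after its residue position.
    last_pos = 0
    modified_sequence = ""
    for pos, mod in sorted(modifications):
        modified_sequence += sequence[last_pos:pos] + tag_open + mod + tag_close
        last_pos = pos
    return modified_sequence + sequence[last_pos:]


def make_modified_sequence(plain_sequence, modified_residues, include):
    # Hypothetical standalone version of Peptide.make_modified_sequence for a
    # non-None `include`: keep only the residues whose tag is listed.
    selected = [
        (position, tag) for position, tag in modified_residues.items() if tag in include
    ]
    return modify_peptide(plain_sequence, selected)


residues = {4: "phospho", 8: "ox", 9: "phospho"}  # hypothetical example data
print(make_modified_sequence("PEPTIDEMSK", residues, ["phospho"]))
# PEPT[phospho]IDEMS[phospho]K
print(make_modified_sequence("PEPTIDEMSK", residues, ["ox"]))  # PEPTIDEM[ox]SK
print(make_modified_sequence("PEPTIDEMSK", residues, []))  # PEPTIDEMSK
```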

count_modification

count_modification(modification: str) -> int

Returns how often a specified modification occurs.

Source code in msreport\peptidoform.py
def count_modification(self, modification: str) -> int:
    """Returns how often a specified modification occurs."""
    if modification not in self.modification_positions:
        return 0
    return len(self.modification_positions[modification])

isoform_probability

isoform_probability(modification: str) -> float | None

Calculates the isoform probability for a given modification.

Returns:

float | None: The isoform probability for the combination of the assigned modification sites, calculated as the product of the single-site localization probabilities. If a localization probability is missing for any site carrying the specified 'modification', None is returned.

Source code in msreport\peptidoform.py
def isoform_probability(self, modification: str) -> float | None:
    """Calculates the isoform probability for a given modification.

    Returns:
        The isoform probability for the combination of the assigned modification
        sites. Calculated as the product of the single modification localization
        probabilities. If a localization probability is missing for any site carrying
        the specified 'modification', None is returned.
    """
    probabilities = []
    for site in self.list_modified_peptide_sites(modification):
        probability = self.get_peptide_site_probability(site)
        if probability is None:
            return None
        probabilities.append(probability)
    return float(np.prod(probabilities))
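As a worked example, if phospho is assigned to two sites with localization probabilities 0.9 and 0.8, the isoform probability is 0.9 * 0.8 = 0.72. The sketch below reproduces the product rule with the standard-library `math.prod` in place of `np.prod`. The standalone function, its arguments, and the probability values are hypothetical, and reading probabilities directly from a `{position: probability}` dictionary is an assumption about what `get_peptide_site_probability` does internally.

```python
from __future__ import annotations

import math


def isoform_probability(
    sites: list[int], site_probabilities: dict[int, float]
) -> float | None:
    # Hypothetical standalone version of Peptide.isoform_probability.
    probabilities = []
    for site in sites:
        # Assumption: a missing dictionary entry stands in for a site whose
        # get_peptide_site_probability result would be None.
        probability = site_probabilities.get(site)
        if probability is None:
            return None
        probabilities.append(probability)
    # Product of the single-site probabilities; an empty product is 1.0.
    return float(math.prod(probabilities))


# Hypothetical phospho localization probabilities for candidate sites 3, 4, 9, 10.
phospho_probabilities = {3: 0.1, 4: 0.9, 9: 0.8, 10: 0.2}

# Only the assigned sites (4 and 9) enter the product: 0.9 * 0.8.
print(round(isoform_probability([4, 9], phospho_probabilities), 3))  # 0.72
# Site 11 has no probability, so the isoform probability is undefined.
print(isoform_probability([4, 11], phospho_probabilities))  # None
```

With `np.prod` as in the source, an empty site list likewise gives 1.0, so a peptide without any site for the requested modification yields 1.0 rather than None.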

get_peptide_site_probability

get_peptide_site_probability(position: int) -> float | None

Return the modification localization probability of the peptide position.

Parameters:

position (int, required): Peptide position whose modification localization probability is returned.

Returns:

float | None: Localization probability between 0 and 1. Returns None if the specified position does not contain a modification or if no localization probability is available.

Source code in msreport\peptidoform.py
def get_peptide_site_probability(self, position: int) -> float | None:
    """Return the modification localization probability of the peptide position.

    Args:
        position: Peptide position whose modification localization probability is
            returned.

    Returns:
        Localization probability between 0 and 1. Returns None if the specified
        position does not contain a modification or if no localization probability
        is available.
    """
    return self._get_site_probability(position, is_protein_position=False)

get_protein_site_probability

get_protein_site_probability(position: int) -> float | None

Return the modification localization probability of the protein position.

Parameters:

position (int, required): Protein position whose modification localization probability is returned.

Returns:

float | None: Localization probability between 0 and 1. Returns None if the specified position does not contain a modification or if no localization probability is available.

Source code in msreport\peptidoform.py
def get_protein_site_probability(self, position: int) -> float | None:
    """Return the modification localization probability of the protein position.

    Args:
        position: Protein position whose modification localization probability is
            returned.

    Returns:
        Localization probability between 0 and 1. Returns None if the specified
        position does not contain a modification or if no localization probability
        is available.
    """
    return self._get_site_probability(position, is_protein_position=True)

list_modified_peptide_sites

list_modified_peptide_sites(modification: str) -> list[int]

Returns a list of peptide positions containing the specified modification.

Source code in msreport\peptidoform.py
def list_modified_peptide_sites(self, modification: str) -> list[int]:
    """Returns a list of peptide positions containing the specified modification."""
    return self._list_modified_sites(modification, use_protein_position=False)

list_modified_protein_sites

list_modified_protein_sites(modification: str) -> list[int]

Returns a list of protein positions containing the specified modification.

Source code in msreport\peptidoform.py
def list_modified_protein_sites(self, modification: str) -> list[int]:
    """Returns a list of protein positions containing the specified modification."""
    return self._list_modified_sites(modification, use_protein_position=True)

parse_modified_sequence

parse_modified_sequence(
    modified_sequence: str, tag_open: str, tag_close: str
) -> tuple[str, list[tuple[int, str]]]

Returns the plain sequence and a list of modification positions and tags.

Parameters:

modified_sequence (str, required): Peptide sequence containing modifications.
tag_open (str, required): Symbol that indicates the beginning of a modification tag, e.g. "[".
tag_close (str, required): Symbol that indicates the end of a modification tag, e.g. "]".

Returns:

tuple[str, list[tuple[int, str]]]: A tuple containing the plain sequence as a string and a sorted list of modification tuples, each containing the position and the modification tag (excluding the tag_open and tag_close symbols). The position is the number of residues preceding the tag, so a tag following the fourth residue has position 4 and an N-terminal tag has position 0.

Source code in msreport\peptidoform.py
def parse_modified_sequence(
    modified_sequence: str,
    tag_open: str,
    tag_close: str,
) -> tuple[str, list[tuple[int, str]]]:
    """Returns the plain sequence and a list of modification positions and tags.

    Args:
        modified_sequence: Peptide sequence containing modifications.
        tag_open: Symbol that indicates the beginning of a modification tag, e.g. "[".
        tag_close: Symbol that indicates the end of a modification tag, e.g. "]".

    Returns:
        A tuple containing the plain sequence as a string and a sorted list of
        modification tuples, each containing the position and modification tag
        (excluding the tag_open and tag_close symbols).
    """
    start_counter = 0
    tags = []
    plain_sequence = ""
    for position, char in enumerate(modified_sequence):
        if char == tag_open:
            start_counter += 1
            if start_counter == 1:
                start_position = position
        elif char == tag_close:
            start_counter -= 1
            if start_counter == 0:
                tags.append((start_position, position))
        elif start_counter == 0:
            plain_sequence += char

    modifications = []
    last_position = 0
    for tag_start, tag_end in tags:
        mod_position = tag_start - last_position
        modification = modified_sequence[tag_start + 1 : tag_end]
        modifications.append((mod_position, modification))
        last_position += tag_end - tag_start + 1
    return plain_sequence, sorted(modifications)
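The returned positions count the residues that precede each tag, so an N-terminal tag gets position 0, and nested delimiters stay inside the tag text. The snippet below copies the function above with renamed local variables (same logic) so it runs on its own, then checks a few inputs; the "ac" and "a[b]" tags and the parenthesis delimiters are illustrative choices, not msreport conventions.

```python
def parse_modified_sequence(modified_sequence, tag_open, tag_close):
    # Pass 1: collect residues outside tags and the string indices of each
    # top-level tag; the depth counter keeps nested delimiters inside a tag.
    depth = 0
    tags = []
    plain_sequence = ""
    for index, char in enumerate(modified_sequence):
        if char == tag_open:
            depth += 1
            if depth == 1:
                tag_start = index
        elif char == tag_close:
            depth -= 1
            if depth == 0:
                tags.append((tag_start, index))
        elif depth == 0:
            plain_sequence += char

    # Pass 2: turn string indices into residue positions by subtracting the
    # length of all earlier tags, delimiters included.
    modifications = []
    removed = 0
    for start, end in tags:
        tag_text = modified_sequence[start + 1 : end]
        modifications.append((start - removed, tag_text))
        removed += end - start + 1
    return plain_sequence, sorted(modifications)


print(parse_modified_sequence("PEPT[phospho]IDE", "[", "]"))
# ('PEPTIDE', [(4, 'phospho')])
print(parse_modified_sequence("[ac]PEPT[phospho]IDE", "[", "]"))
# ('PEPTIDE', [(0, 'ac'), (4, 'phospho')])
print(parse_modified_sequence("PEPT[a[b]]IDE", "[", "]"))
# ('PEPTIDE', [(4, 'a[b]')])
print(parse_modified_sequence("PEPT(phospho)IDE", "(", ")"))
# ('PEPTIDE', [(4, 'phospho')])
```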

modify_peptide

modify_peptide(
    sequence: str,
    modifications: list[tuple[int, str]],
    tag_open: str = "[",
    tag_close: str = "]",
) -> str

Returns a string containing the modifications within the peptide sequence.

Returns:

str: Modified sequence. For example, "PEPT[phospho]IDE" for sequence = "PEPTIDE" and modifications = [(4, "phospho")].

Source code in msreport\peptidoform.py
def modify_peptide(
    sequence: str,
    modifications: list[tuple[int, str]],
    tag_open: str = "[",
    tag_close: str = "]",
) -> str:
    """Returns a string containing the modifications within the peptide sequence.

    Returns:
        Modified sequence. For example "PEPT[phospho]IDE", for sequence = "PEPTIDE" and
        modifications = [(4, "phospho")]
    """
    last_pos = 0
    modified_sequence = ""
    for pos, mod in sorted(modifications):
        tag = mod.join((tag_open, tag_close))
        modified_sequence += sequence[last_pos:pos] + tag
        last_pos = pos
    modified_sequence += sequence[last_pos:]
    return modified_sequence
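Because `modify_peptide` sorts its input and writes each tag after the residue at that position, it places tags at the same positions that `parse_modified_sequence` reports, so the two round-trip for simple tags. The copy below uses the same logic (with `tag_open + mod + tag_close` in place of `str.join`) and shows unsorted input, an N-terminal position 0, and custom delimiters; the "ac" and "x" tags are illustrative.

```python
def modify_peptide(sequence, modifications, tag_open="[", tag_close="]"):
    # Insert each tag after the residue it refers to, in ascending position order.
    last_pos = 0
    modified_sequence = ""
    for pos, mod in sorted(modifications):
        modified_sequence += sequence[last_pos:pos] + tag_open + mod + tag_close
        last_pos = pos
    return modified_sequence + sequence[last_pos:]


print(modify_peptide("PEPTIDE", [(4, "phospho")]))  # PEPT[phospho]IDE
print(modify_peptide("PEPTIDE", [(7, "x"), (0, "ac")]))  # [ac]PEPTIDE[x]
print(modify_peptide("PEPTIDE", [(4, "phospho")], "(", ")"))  # PEPT(phospho)IDE
```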

make_localization_string

make_localization_string(
    localization_probabilities: dict[str, dict[int, float]],
    decimal_places: int = 3,
) -> str

Generates a site localization probability string.

Parameters:

localization_probabilities (dict[str, dict[int, float]], required): A dictionary in the form {"modification tag": {position: probability}}, where positions are integers and probabilities are floats ranging from 0 to 1.
decimal_places (int, default 3): Number of decimal places used for the probabilities.

Returns:

str: A site localization probability string according to the MsReport convention. Multiple modification entries are separated by ";". Each modification entry consists of a modification tag and site probabilities, separated by "@". The site probability entries consist of f"{position}:{probability}" strings, and multiple probability entries are separated by ",". For example "15.9949@11:1.000;79.9663@3:0.200,4:0.800".

Source code in msreport\peptidoform.py
def make_localization_string(
    localization_probabilities: dict[str, dict[int, float]], decimal_places: int = 3
) -> str:
    """Generates a site localization probability string.

    Args:
        localization_probabilities: A dictionary in the form
            {"modification tag": {position: probability}}, where positions are integers
            and probabilities are floats ranging from 0 to 1.
        decimal_places: Number of decimal places used for the probabilities, default 3.

    Returns:
        A site localization probability string according to the MsReport convention.
        Multiple modification entries are separated by ";". Each modification entry
        consists of a modification tag and site probabilities, separated by "@". The
        site probability entries consist of f"{position}:{probability}" strings, and
        multiple probability entries are separated by ",".

        For example "15.9949@11:1.000;79.9663@3:0.200,4:0.800"
    """
    modification_strings = []
    for modification, probabilities in localization_probabilities.items():
        localization_strings = []
        for position, probability in probabilities.items():
            probability_string = f"{probability:.{decimal_places}f}"
            localization_strings.append(f"{position}:{probability_string}")
        localization_string = ",".join(localization_strings)
        modification_strings.append(f"{modification}@{localization_string}")
    localization_string = ";".join(modification_strings)
    return localization_string
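The documented example string can be reproduced from the dictionary below. The function is copied from above (with the inner loop written as a generator expression) so the snippet runs without msreport. Entries appear in dictionary insertion order, since neither the modification tags nor the positions are sorted, and an empty dictionary produces an empty string.

```python
def make_localization_string(localization_probabilities, decimal_places=3):
    # Join "position:probability" pairs with ",", then "tag@sites" entries with ";".
    modification_strings = []
    for modification, probabilities in localization_probabilities.items():
        sites = ",".join(
            f"{position}:{probability:.{decimal_places}f}"
            for position, probability in probabilities.items()
        )
        modification_strings.append(f"{modification}@{sites}")
    return ";".join(modification_strings)


probabilities = {"15.9949": {11: 1.0}, "79.9663": {3: 0.2, 4: 0.8}}
print(make_localization_string(probabilities))
# 15.9949@11:1.000;79.9663@3:0.200,4:0.800
print(make_localization_string(probabilities, decimal_places=1))
# 15.9949@11:1.0;79.9663@3:0.2,4:0.8
```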

read_localization_string

read_localization_string(
    localization_string: str,
) -> dict[str, dict[int, float]]

Converts a site localization probability string into a dictionary.

Parameters:

localization_string (str, required): A site localization probability string according to the MsReport convention. It can contain information about multiple modifications, which are separated by ";". Each modification entry consists of a modification tag and site probabilities, separated by "@". The site probability entries consist of f"{peptide position}:{localization probability}" strings, and multiple entries are separated by ",". For example "15.9949@11:1.000;79.9663@3:0.200,4:0.800".

Returns:

dict[str, dict[int, float]]: A dictionary in the form {"modification tag": {position: probability}}, where positions are integers and probabilities are floats ranging from 0 to 1.

Source code in msreport\peptidoform.py
def read_localization_string(localization_string: str) -> dict[str, dict[int, float]]:
    """Converts a site localization probability string into a dictionary.

    Args:
        localization_string: A site localization probability string according to the
            MsReport convention. Can contain information about multiple modifications,
            which are separated by ";". Each modification entry consists of a
            modification tag and site probabilities, separated by "@". The site
            probability entries consist of f"{peptide position}:{localization probability}"
            strings, and multiple entries are separated by ",".
            For example "15.9949@11:1.000;79.9663@3:0.200,4:0.800"

    Returns:
        A dictionary in the form {"modification tag": {position: probability}}, where
        positions are integers and probabilities are floats ranging from 0 to 1.
    """
    localization: dict[str, dict[int, float]] = {}
    if localization_string == "":
        return localization

    for modification_entry in localization_string.split(";"):
        modification, site_entries = modification_entry.split("@")
        site_probabilities = {}
        for site_entry in site_entries.split(","):
            position, probability = site_entry.split(":")
            site_probabilities[int(position)] = float(probability)
        localization[modification] = site_probabilities
    return localization
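Parsing the documented example recovers the dictionary, with probabilities limited to the precision written into the string. The function is copied from above to keep the snippet self-contained. Because each entry is split on ";" and then on "@", a modification tag containing either character cannot be read back; this follows from the splitting logic rather than from a documented restriction, and the malformed input in the last call is a hypothetical example.

```python
def read_localization_string(localization_string):
    # "" -> {}; otherwise split into "tag@sites" entries, then "position:probability" pairs.
    localization = {}
    if localization_string == "":
        return localization
    for modification_entry in localization_string.split(";"):
        modification, site_entries = modification_entry.split("@")
        site_probabilities = {}
        for site_entry in site_entries.split(","):
            position, probability = site_entry.split(":")
            site_probabilities[int(position)] = float(probability)
        localization[modification] = site_probabilities
    return localization


parsed = read_localization_string("15.9949@11:1.000;79.9663@3:0.200,4:0.800")
print(parsed)  # {'15.9949': {11: 1.0}, '79.9663': {3: 0.2, 4: 0.8}}
print(read_localization_string(""))  # {}

# A tag containing "@" splits into too many parts and raises ValueError.
try:
    read_localization_string("tag@with@at@1:0.5")
except ValueError as error:
    print("cannot parse:", error)
```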