Helper
A collection of widely used helper and utility functions.
This module re-exports commonly used functions from various msreport.helper
submodules for convenience.
Functions:
Name | Description |
---|---|
apply_intensity_cutoff | Sets values below the threshold to NA. |
find_columns | Returns a list of column names containing the substring. |
find_sample_columns | Returns column names that contain the substring and any entry of 'samples'. |
guess_design | Extracts sample name, experiment, and replicate from specified sample columns. |
intensities_in_logspace | Evaluates whether intensities are likely to be log transformed. |
keep_rows_by_partial_match | Filter a table to keep only rows partially matching any of the specified values. |
remove_rows_by_partial_match | Filter a table to remove rows partially matching any of the specified values. |
rename_mq_reporter_channels | Renames reporter channel numbers with sample names. |
rename_sample_columns | Renames sample names according to the mapping in a cautious manner. |
apply_intensity_cutoff
Sets values below the threshold to NA.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Dataframe containing the intensity columns to which the cutoff is applied. | required |
column_tag | str | Substring used to identify intensity columns from the 'table' to which the intensity cutoff is applied. | required |
threshold | float | Values below the threshold will be set to NA. | required |
Source code in msreport\helper\table.py
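A minimal sketch of the described behavior in plain pandas follows. The name `apply_intensity_cutoff_sketch` is hypothetical and this is not msreport's implementation; modifying 'table' in place (rather than returning a copy) is an assumption based on the missing Returns section. Values equal to the threshold are kept, since only values *below* it are masked.

```python
import numpy as np
import pandas as pd


def apply_intensity_cutoff_sketch(
    table: pd.DataFrame, column_tag: str, threshold: float
) -> None:
    """Set values below 'threshold' to NaN in every column containing 'column_tag'.

    Hypothetical re-implementation; modifies 'table' in place (assumption).
    """
    intensity_columns = [col for col in table.columns if column_tag in col]
    for col in intensity_columns:
        # Series.mask replaces entries where the condition is True with NaN.
        table[col] = table[col].mask(table[col] < threshold)


df = pd.DataFrame(
    {
        "Intensity S1": [5.0, 20.0, 12.0],
        "Intensity S2": [30.0, 8.0, 10.0],
        "Protein": ["P1", "P2", "P3"],
    }
)
apply_intensity_cutoff_sketch(df, "Intensity", 10.0)
print(df)
```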
find_columns
Returns a list of column names containing the substring.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Columns of this dataframe are queried. | required |
substring | str | String that must be part of column names. | required |
must_be_substring | bool | If True, column names that are exactly equal to the substring are not reported. | False |
Returns:
Type | Description |
---|---|
list[str] | A list of column names. |
Source code in msreport\helper\table.py
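The matching rule, including the effect of `must_be_substring`, can be sketched as a simple list comprehension over the column names. `find_columns_sketch` is a hypothetical stand-in written for illustration, not the library's code.

```python
import pandas as pd


def find_columns_sketch(
    table: pd.DataFrame, substring: str, must_be_substring: bool = False
) -> list[str]:
    """Return column names containing 'substring' (hypothetical re-implementation)."""
    return [
        col
        for col in table.columns
        # With must_be_substring=True, a column equal to the substring is excluded.
        if substring in col and not (must_be_substring and col == substring)
    ]


df = pd.DataFrame(columns=["Intensity", "Intensity S1", "Intensity S2", "Peptides"])
print(find_columns_sketch(df, "Intensity"))
# ['Intensity', 'Intensity S1', 'Intensity S2']
print(find_columns_sketch(df, "Intensity", must_be_substring=True))
# ['Intensity S1', 'Intensity S2']
```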
find_sample_columns
Returns column names that contain the substring and any entry of 'samples'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Columns of this dataframe are queried. | required |
substring | str | String that must be part of column names. | required |
samples | Iterable[str] | List of strings of which at least one must be present in matched columns. | required |
Returns:
Type | Description |
---|---|
list[str] | A list of column names containing the substring and any entry of 'samples'. Columns are returned in the order of entries in 'samples'. |
Source code in msreport\helper\table.py
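The ordering guarantee follows naturally if the outer loop runs over 'samples', as in this hypothetical sketch (`find_sample_columns_sketch` is not msreport's code). How the real function treats a column matching several samples, for example "S1" also matching "Intensity S10", is not documented; here such a column is simply kept at its first match, which is an assumption.

```python
from collections.abc import Iterable

import pandas as pd


def find_sample_columns_sketch(
    table: pd.DataFrame, substring: str, samples: Iterable[str]
) -> list[str]:
    """Return columns containing 'substring' and any sample, ordered by 'samples'."""
    matched: list[str] = []
    # Iterating over samples in the outer loop fixes the output order.
    for sample in samples:
        for col in table.columns:
            # Skip columns already matched by an earlier sample (assumption).
            if substring in col and sample in col and col not in matched:
                matched.append(col)
    return matched


df = pd.DataFrame(columns=["Intensity A", "Intensity B", "LFQ A", "Peptides"])
print(find_sample_columns_sketch(df, "Intensity", ["B", "A"]))
# ['Intensity B', 'Intensity A']
```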
guess_design
guess_design(table: DataFrame, tag: str) -> DataFrame
Extracts sample name, experiment, and replicate from specified sample columns.
"Total" and "Combined", and their lower case variants, are not allowed as sample names and will be ignored.
First, the subset of columns containing the column tag is identified. Then, sample names are extracted by removing the column tag from each column name. Finally, sample names are split into experiment and replicate at the last underscore.
This requires that the naming of samples follows a specific convention. Sample names must begin with the experiment name, followed by an underscore and a unique identifier of the sample, for example the replicate number. The experiment name can also contain underscores, as it is split only by the last underscore.
For example "ExpA_r1" would be split into experiment "ExpA" and replicate "r1", "Exp_A_1" would be experiment "Exp_A" and replicate "1".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Dataframe whose columns are used for extracting sample names. | required |
tag | str | Column names containing the 'tag' are selected for sample extraction. | required |
Returns:
Type | Description |
---|---|
DataFrame | A dataframe containing the columns "Sample", "Experiment", and "Replicate". |
Source code in msreport\helper\table.py
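The three steps described for guess_design (select tagged columns, strip the tag, split at the last underscore) can be sketched with `str.rpartition`. `guess_design_sketch` is a hypothetical name, and two behaviors are assumptions not stated in the docstring: a column equal to the bare tag is skipped, and a sample name without any underscore yields an empty experiment.

```python
import pandas as pd

# Names that are not accepted as samples, per the description above.
EXCLUDED_NAMES = {"Total", "total", "Combined", "combined"}


def guess_design_sketch(table: pd.DataFrame, tag: str) -> pd.DataFrame:
    """Extract sample, experiment, and replicate from tagged column names."""
    records = []
    for col in table.columns:
        if tag not in col:
            continue
        # Remove the tag and surrounding whitespace to obtain the sample name.
        sample = col.replace(tag, "").strip()
        # Skip empty names (assumption) and the excluded names.
        if not sample or sample in EXCLUDED_NAMES:
            continue
        # Split only at the last underscore, so experiments may contain "_".
        experiment, _, replicate = sample.rpartition("_")
        records.append(
            {"Sample": sample, "Experiment": experiment, "Replicate": replicate}
        )
    return pd.DataFrame(records, columns=["Sample", "Experiment", "Replicate"])


df = pd.DataFrame(
    columns=[
        "Intensity ExpA_r1",
        "Intensity ExpA_r2",
        "Intensity Exp_B_1",
        "Intensity Total",
        "Peptides",
    ]
)
design = guess_design_sketch(df, "Intensity")
print(design)
```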
intensities_in_logspace
Evaluates whether intensities are likely to be log transformed.
Assumes that intensities are log transformed if all values are smaller than or equal to 64. Intensity values (and intensity peak areas) reported by tandem mass spectrometry typically range from 10^3 to 10^12. To reach log2-transformed values greater than 64, intensities would need to exceed 2^64 (about 1.8 × 10^19), which is very unlikely to ever be encountered.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[DataFrame, ndarray, Iterable] | Dataset that contains only intensity values; can be any iterable, a numpy.ndarray, or a pandas.DataFrame. Multiple dimensions or columns are allowed. | required |
Returns:
Type | Description |
---|---|
bool | True if intensity values in 'data' appear to be log transformed. |
Source code in msreport\helper\table.py
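The heuristic reduces to a single comparison against 64 after converting the input to a float array. This is a hypothetical sketch (`intensities_in_logspace_sketch` is not the library function); ignoring missing values when testing the limit is an assumption.

```python
import numpy as np
import pandas as pd

LOG_SPACE_LIMIT = 64  # 2**64 is about 1.8e19, far above typical MS intensities


def intensities_in_logspace_sketch(data) -> bool:
    """Return True if all non-missing values are <= 64 (hypothetical sketch)."""
    if isinstance(data, pd.DataFrame):
        values = data.to_numpy(dtype=float)
    elif isinstance(data, np.ndarray):
        values = data.astype(float)
    else:
        # Materialize generic iterables first, so generators also work.
        values = np.asarray(list(data), dtype=float)
    finite = values[~np.isnan(values)]  # ignore missing values (assumption)
    return bool(np.all(finite <= LOG_SPACE_LIMIT))


raw = [1e3, 5e6, 2e9]
print(intensities_in_logspace_sketch(raw))           # False
print(intensities_in_logspace_sketch(np.log2(raw)))  # True
```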
keep_rows_by_partial_match
Filter a table to keep only rows partially matching any of the specified values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The input table that will be filtered. | required |
column | str | The name of the column in the 'table' whose entries are checked for partial matches to the values. This column must have the datatype 'str'. | required |
values | Iterable[str] | An iterable of strings that are used to filter the table. At least one of the specified values must match an entry from the specified 'column', at least partially, for a row to be kept in the filtered table. | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame containing only the rows that have a partial or complete match with any of the specified 'values'. |
Example

    >>> df = pd.DataFrame({"Modifications": ["phos", "acetyl;phos", "acetyl"]})
    >>> keep_rows_by_partial_match(df, "Modifications", ["phos"])
      Modifications
    0          phos
    1   acetyl;phos
Source code in msreport\helper\table.py
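Partial matching here means substring containment, which can be sketched without regular expressions, so special characters in the values need no escaping. `keep_rows_by_partial_match_sketch` is a hypothetical re-implementation; the original index is preserved, consistent with the example above.

```python
from collections.abc import Iterable

import pandas as pd


def keep_rows_by_partial_match_sketch(
    table: pd.DataFrame, column: str, values: Iterable[str]
) -> pd.DataFrame:
    """Keep rows whose 'column' entry contains any of 'values' (hypothetical)."""
    values = list(values)  # allow one-shot iterables such as generators
    # True for rows where at least one value is a substring of the entry.
    matches = table[column].apply(lambda entry: any(v in entry for v in values))
    return table[matches].copy()


df = pd.DataFrame({"Modifications": ["phos", "acetyl;phos", "acetyl"]})
kept = keep_rows_by_partial_match_sketch(df, "Modifications", ["phos"])
print(kept)
```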
remove_rows_by_partial_match
Filter a table to remove rows partially matching any of the specified values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The input table that will be filtered. | required |
column | str | The name of the column in the 'table' whose entries are checked for partial matches to the values. This column must have the datatype 'str'. | required |
values | Iterable[str] | An iterable of strings that are used to filter the table. If at least one of the specified values matches an entry from the specified 'column', at least partially, the row is removed from the filtered table. | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame containing no rows that have a partial or complete match with any of the specified 'values'. |
Example

    >>> df = pd.DataFrame({"Modifications": ["phos", "acetyl;phos", "acetyl"]})
    >>> remove_rows_by_partial_match(df, "Modifications", ["phos"])
      Modifications
    2        acetyl
Source code in msreport\helper\table.py
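Removal is the complement of the keep filter: the same substring mask is computed and inverted. `remove_rows_by_partial_match_sketch` is a hypothetical name used only for illustration, not msreport's code.

```python
from collections.abc import Iterable

import pandas as pd


def remove_rows_by_partial_match_sketch(
    table: pd.DataFrame, column: str, values: Iterable[str]
) -> pd.DataFrame:
    """Drop rows whose 'column' entry contains any of 'values' (hypothetical)."""
    values = list(values)
    matches = table[column].apply(lambda entry: any(v in entry for v in values))
    # Invert the mask to keep only rows without any partial match.
    return table[~matches].copy()


df = pd.DataFrame({"Modifications": ["phos", "acetyl;phos", "acetyl"]})
remaining = remove_rows_by_partial_match_sketch(df, "Modifications", ["phos"])
print(remaining)
```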
rename_mq_reporter_channels
Renames reporter channel numbers with sample names.
MaxQuant writes reporter channel names either in the format "Reporter intensity 1" or "Reporter intensity 1 Experiment Name", depending on whether an experiment name was specified. The function renames "Reporter intensity", "Reporter intensity count", and "Reporter intensity corrected" columns.
NOTE: This might not work for the peptides.txt table, as there are columns present with the experiment name and also without it.
Source code in msreport\helper\table.py
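This entry documents no parameters, so the following is a purely hypothetical sketch: the helper name `rename_reporter_channels_sketch`, its `channel_names` argument (channel number to sample name), and the output format "<prefix> <sample name>" are all assumptions. It only illustrates how both MaxQuant column formats described above can be recognized with one regular expression.

```python
import re

import pandas as pd

# Matches "Reporter intensity 1", "Reporter intensity count 1", and
# "Reporter intensity corrected 1", each optionally followed by " <Experiment>".
REPORTER_PATTERN = re.compile(
    r"^(Reporter intensity(?: count| corrected)?) (\d+)(?: .+)?$"
)


def rename_reporter_channels_sketch(
    table: pd.DataFrame, channel_names: dict[int, str]
) -> pd.DataFrame:
    """Replace reporter channel numbers by sample names (hypothetical format)."""
    new_columns = {}
    for col in table.columns:
        match = REPORTER_PATTERN.match(col)
        if match is None:
            continue  # not a reporter intensity column
        prefix, channel = match.group(1), int(match.group(2))
        if channel in channel_names:
            new_columns[col] = f"{prefix} {channel_names[channel]}"
    return table.rename(columns=new_columns)


df = pd.DataFrame(
    columns=[
        "Reporter intensity 1",
        "Reporter intensity corrected 2 ExpA",
        "Reporter intensity count 1 ExpA",
        "Protein",
    ]
)
renamed = rename_reporter_channels_sketch(df, {1: "Ctrl", 2: "Treat"})
print(list(renamed.columns))
```

With a peptides.txt-like table holding both "Reporter intensity 1" and "Reporter intensity 1 ExpA", this sketch would map both columns to the same new name, one way the problem mentioned in the NOTE can surface.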
rename_sample_columns
Renames sample names according to the mapping in a cautious manner.
In general, this function allows 'mapping' keys that are substrings of other keys, as well as values that are substrings of any of the keys.
Importantly, if the mapping keys (sample names) are substrings of other column names within the table, unintended renaming of those columns will occur. For instance, when renaming columns ["Abundance", "Intensity A"] with the mapping {"A": "Sample Alpha"}, the columns will be renamed to ["Sample Alphabundance", "Intensity Sample Alpha"].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Dataframe whose columns will be renamed. | required |
mapping | dict[str, str] | A mapping of old to new sample names that will be used to replace matching substrings in the column names of 'table'. | required |
Returns:
Type | Description |
---|---|
DataFrame | A copy of the table with renamed columns. |
Source code in msreport\helper\table.py
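The docstring does not say how the cautious renaming is achieved. One common technique, shown here as an assumption rather than msreport's actual code, is a two-pass replacement: keys are replaced longest first by unique placeholders, and only then are placeholders replaced by the new names, so new names are never scanned again. `rename_sample_columns_sketch` is hypothetical, and the placeholders assume that private-use Unicode characters never occur in column names or keys. The sketch reproduces the documented pitfall with "Abundance".

```python
import pandas as pd


def rename_sample_columns_sketch(
    table: pd.DataFrame, mapping: dict[str, str]
) -> pd.DataFrame:
    """Return a copy of 'table' with sample names in column names replaced."""
    # Longest keys first, so "S10" is consumed before its substring "S1".
    keys = sorted(mapping, key=len, reverse=True)
    # One private-use Unicode character per key acts as a temporary placeholder.
    placeholders = {key: chr(0xE000 + i) for i, key in enumerate(keys)}

    new_columns = []
    for col in table.columns:
        new_name = col
        # Pass 1: replace every key by its placeholder.
        for key in keys:
            new_name = new_name.replace(key, placeholders[key])
        # Pass 2: replace placeholders by the new sample names.
        for key in keys:
            new_name = new_name.replace(placeholders[key], mapping[key])
        new_columns.append(new_name)

    renamed = table.copy()
    renamed.columns = new_columns
    return renamed


df = pd.DataFrame(columns=["Intensity S1", "Intensity S10"])
print(list(rename_sample_columns_sketch(df, {"S1": "Ctrl_1", "S10": "Treat_1"}).columns))
# ['Intensity Ctrl_1', 'Intensity Treat_1']
```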