Analyze
Tools for post-processing and statistical analysis of Qtable
data.
All functions in this module take a Qtable
object and modify its data in place. The
module provides functionality for data evaluation, normalization, imputation of missing
values, and statistical testing, including integration with R's limma package.
Classes:
Name | Description |
---|---|
Transformer | |
CategoryTransformer | |
Functions:
Name | Description |
---|---|
analyze_missingness | Quantifies missing values of expression columns. |
validate_proteins | Validates protein entries (rows). |
apply_transformer | Applies a transformer to the values of a Qtable selected with the tag parameter. |
apply_category_transformer | Applies a category transformer to Qtable columns selected by tag. |
normalize_expression | Normalizes expression values in qtable. |
create_site_to_protein_normalizer | Creates a fitted CategoricalNormalizer for site-to-protein normalization. |
create_ibaq_transformer | Creates a fitted CategoricalNormalizer for iBAQ transformation. |
normalize_expression_by_category | Normalizes expression values in a Qtable based on categories. |
impute_missing_values | Imputes missing expression values in qtable. |
calculate_experiment_means | Calculates mean expression values for each experiment. |
calculate_multi_group_comparison | Calculates average expression and ratios for multiple comparison groups. |
two_group_comparison | Calculates comparison values for two experiments. |
calculate_multi_group_limma | Uses limma to perform a differential expression analysis of multiple experiments. |
calculate_two_group_limma | Uses limma to perform a differential expression analysis of two experiments. |
Transformer
Bases: Protocol
Methods:
Name | Description |
---|---|
fit | Fits the Transformer and returns a fitted Transformer instance. |
is_fitted | Returns True if the Transformer has been fitted. |
transform | Transform values in 'table'. |
fit
fit(table: DataFrame) -> Self
Fits the Transformer and returns a fitted Transformer instance.
Source code in msreport\analyze.py
is_fitted
is_fitted() -> bool
Returns True if the Transformer has been fitted.
transform
transform(table: DataFrame) -> DataFrame
Transform values in 'table'.
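The three methods above describe a structural interface, so any object that provides them can serve as a Transformer. The following is a minimal sketch under stated assumptions: `MedianShift` is a hypothetical class, not part of msreport, and a plain dict of column lists stands in for the DataFrame to keep the example dependency-free.

```python
from statistics import median
from typing import Dict, List, Optional, Protocol, runtime_checkable

Table = Dict[str, List[float]]  # assumption: stand-in for a pandas DataFrame


@runtime_checkable
class Transformer(Protocol):
    """Mirror of the documented protocol, redeclared here for illustration."""

    def fit(self, table: Table) -> "Transformer": ...

    def is_fitted(self) -> bool: ...

    def transform(self, table: Table) -> Table: ...


class MedianShift:
    """Hypothetical transformer that centers each column on its median."""

    def __init__(self) -> None:
        self._medians: Optional[Dict[str, float]] = None

    def fit(self, table: Table) -> "MedianShift":
        # Store one median per column and return the fitted instance.
        self._medians = {name: median(values) for name, values in table.items()}
        return self

    def is_fitted(self) -> bool:
        return self._medians is not None

    def transform(self, table: Table) -> Table:
        if self._medians is None:
            raise RuntimeError("fit() must be called before transform()")
        return {
            name: [value - self._medians[name] for value in values]
            for name, values in table.items()
        }


shifter = MedianShift()
table = {"Sample_A": [1.0, 2.0, 3.0], "Sample_B": [4.0, 6.0, 8.0]}
centered = shifter.fit(table).transform(table)
```

Because `fit` returns the instance itself, fitting and transforming can be chained as in the last line.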
CategoryTransformer
Bases: Protocol
Methods:
Name | Description |
---|---|
fit | Fits the Transformer and returns a fitted Transformer instance. |
is_fitted | Returns True if the Transformer has been fitted. |
transform | Transform values in 'table'. |
get_category_column | Returns the name of the category column. |
fit
fit(table: DataFrame) -> Self
Fits the Transformer and returns a fitted Transformer instance.
is_fitted
is_fitted() -> bool
Returns True if the Transformer has been fitted.
transform
transform(table: DataFrame) -> DataFrame
Transform values in 'table'.
get_category_column
get_category_column() -> str
Returns the name of the category column.
analyze_missingness
analyze_missingness(qtable: Qtable) -> None
Quantifies missing values of expression columns.
Adds columns to the qtable that count missing values per sample ("Missing sample_name"), per experiment ("Missing experiment_name"), and in total ("Missing total"), as well as quantification events per experiment ("Events experiment_name") and in total ("Events total").
Requires expression columns to be set. Missing values in expression columns must be present as NaN, and not as zero or an empty string.
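The counting can be illustrated for a single row. This is a sketch, not the msreport implementation: `count_missing` is a hypothetical helper, `design` is a plain dict mapping sample names to experiment names in place of qtable.design, and a quantification event is assumed to be any non-NaN value.

```python
import math
from typing import Dict


def count_missing(row: Dict[str, float], design: Dict[str, str]) -> Dict[str, int]:
    """Counts NaN values and quantification events for one row.

    'design' maps sample names to experiment names (stand-in for qtable.design).
    """
    counts: Dict[str, int] = {"Missing total": 0, "Events total": 0}
    # Initialize the per-experiment counters.
    for experiment in design.values():
        counts.setdefault(f"Missing {experiment}", 0)
        counts.setdefault(f"Events {experiment}", 0)
    for sample, experiment in design.items():
        missing = int(math.isnan(row[sample]))
        counts[f"Missing {sample}"] = missing
        counts[f"Missing {experiment}"] += missing
        counts["Missing total"] += missing
        # Assumption: every non-missing value counts as one event.
        counts[f"Events {experiment}"] += 1 - missing
        counts["Events total"] += 1 - missing
    return counts


nan = float("nan")
design = {"A_1": "A", "A_2": "A", "B_1": "B", "B_2": "B"}
row = {"A_1": 20.1, "A_2": nan, "B_1": nan, "B_2": nan}
counts = count_missing(row, design)
```

A zero or an empty string would not be detected by `math.isnan`, which is why missing values must be present as NaN.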
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance. | required |
validate_proteins
validate_proteins(
qtable: Qtable,
min_peptides: int = 0,
min_spectral_counts: int = 0,
remove_contaminants: bool = True,
min_events: Optional[int] = None,
max_missing: Optional[int] = None,
) -> None
Validates protein entries (rows).
Adds an additional column "Valid" to the qtable, containing Boolean values.
Requires expression columns to be set. Depending on the arguments, also requires the columns "Total peptides", "Spectral count Combined", "Potential contaminant", and the experiment columns "Missing experiment_name" and "Events experiment_name".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance. | required |
min_peptides | int | Minimum number of unique peptides, default 0. | 0 |
min_spectral_counts | int | Minimum number of combined spectral counts, default 0. | 0 |
remove_contaminants | bool | If true, the "Potential contaminant" column is used to remove invalid entries, default True. If no "Potential contaminant" column is present 'remove_contaminants' is ignored. | True |
min_events | Optional[int] | If specified, at least one experiment must have the minimum number of quantified events for the protein entry to be valid. | None |
max_missing | Optional[int] | If specified, at least one experiment must have no more than the maximum number of missing values. | None |
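The "at least one experiment" wording of 'min_events' and 'max_missing' can be sketched as follows. `passes_event_filters` is a hypothetical helper, the per-experiment counts are passed as plain dicts, and combining both criteria with a logical AND is an assumption.

```python
from typing import Dict, Optional


def passes_event_filters(
    events: Dict[str, int],
    missing: Dict[str, int],
    min_events: Optional[int] = None,
    max_missing: Optional[int] = None,
) -> bool:
    """Checks the per-experiment criteria for a single protein entry."""
    valid = True
    if min_events is not None:
        # One experiment reaching the threshold is sufficient.
        valid = valid and any(n >= min_events for n in events.values())
    if max_missing is not None:
        # One experiment at or below the limit is sufficient.
        valid = valid and any(n <= max_missing for n in missing.values())
    return valid


events = {"Control": 3, "Treatment": 0}
missing = {"Control": 0, "Treatment": 3}
kept = passes_event_filters(events, missing, min_events=2, max_missing=0)
```

An entry that is fully quantified in one experiment and absent in the other stays valid under both criteria, which keeps on/off proteins in the analysis.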
apply_transformer
apply_transformer(
qtable: Qtable,
transformer: Transformer,
tag: str,
exclude_invalid: bool,
remove_invalid: bool,
new_tag: Optional[str] = None,
) -> None
Applies a transformer to the values of a Qtable selected with the tag parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance, to which the transformer is applied. | required |
transformer | Transformer | The transformer to apply. | required |
tag | str | The tag used to identify the columns for applying the transformer. | required |
exclude_invalid | bool | Exclude invalid values from the transformation. | required |
remove_invalid | bool | Remove invalid values from the table after the transformation. | required |
new_tag | Optional[str] | Optional. If specified, the tag is replaced with this value in the column names and the transformed data is stored in these new columns. | None |
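How 'tag' and 'new_tag' relate to column names can be sketched with a small helper. `map_tagged_columns` is hypothetical, and selecting columns by substring match is an assumption about how a tag identifies columns.

```python
from typing import Dict, List, Optional


def map_tagged_columns(
    columns: List[str], tag: str, new_tag: Optional[str] = None
) -> Dict[str, str]:
    """Maps each column containing 'tag' to the column that receives the result."""
    # Assumption: a column is selected when its name contains the tag.
    selected = [column for column in columns if tag in column]
    if new_tag is None:
        # Without a new tag the transformed values overwrite the source columns.
        return {column: column for column in selected}
    return {column: column.replace(tag, new_tag) for column in selected}


columns = ["Protein", "Intensity Sample_A", "Intensity Sample_B"]
in_place = map_tagged_columns(columns, "Intensity")
renamed = map_tagged_columns(columns, "Intensity", new_tag="Expression")
```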
apply_category_transformer
apply_category_transformer(
qtable: Qtable,
transformer: CategoryTransformer,
tag: str,
exclude_invalid: bool,
remove_invalid: bool,
new_tag: Optional[str] = None,
) -> None
Applies a category transformer to Qtable columns selected by tag.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance, to which the transformer is applied. | required |
transformer | CategoryTransformer | The CategoryTransformer to apply. | required |
tag | str | The tag used to identify the columns for applying the transformer. | required |
exclude_invalid | bool | Exclude invalid values from the transformation. | required |
remove_invalid | bool | Remove invalid values from the table after the transformation. | required |
new_tag | Optional[str] | Optional. If specified, the tag is replaced with this value in the column names and the transformed data is stored in these new columns. | None |
Raises:
Type | Description |
---|---|
KeyError | If the category column of the CategoryTransformer is not found in the qtable. |
ValueError | If no sample columns are found for the specified tag. |
normalize_expression
normalize_expression(
qtable: Qtable,
normalizer: Transformer,
exclude_invalid: bool = True,
) -> None
Normalizes expression values in qtable.
Normalizes the values in the qtable expression columns, which requires that expression columns are defined. If the normalizer has not been fitted already, it is fitted with the expression values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance whose expression values will be normalized. | required |
normalizer | Transformer | A Normalizer instance from the msreport.normalize module. An already fitted normalizer must have been fitted with a dataframe whose column names correspond to the sample names present in qtable.design. An unfitted normalizer is fitted with the expression values present in the qtable. | required |
exclude_invalid | bool | If true, the column "Valid" is used to select the expression rows for fitting an unfitted normalizer; default True. All expression values are normalized regardless of whether exclude_invalid is True or False. | True |
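The difference between the rows used for fitting and the rows that are normalized can be sketched for one expression column. `normalize_column` is a hypothetical helper, and median centering is only an illustrative choice of normalization, not necessarily what a normalizer from msreport.normalize does.

```python
from statistics import median
from typing import List


def normalize_column(
    values: List[float], valid: List[bool], exclude_invalid: bool = True
) -> List[float]:
    """Median-centers one column of log-transformed expression values."""
    # Fit step: with exclude_invalid, only rows flagged as valid are used.
    fit_values = [v for v, ok in zip(values, valid) if ok or not exclude_invalid]
    shift = median(fit_values)
    # Transform step: every row is normalized, valid or not.
    return [v - shift for v in values]


values = [10.0, 12.0, 14.0, 30.0]
valid = [True, True, True, False]
fitted_on_valid = normalize_column(values, valid, exclude_invalid=True)
fitted_on_all = normalize_column(values, valid, exclude_invalid=False)
```

In the first call the invalid outlier does not influence the shift but is still shifted along with the other rows.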
create_site_to_protein_normalizer
create_site_to_protein_normalizer(
qtable: Qtable,
category_column: str = "Representative protein",
) -> CategoricalNormalizer
Creates a fitted CategoricalNormalizer for site-to-protein normalization.
The CategoricalNormalizer is fitted to protein expression profiles of the provided qtable. The protein expression profiles are calculated by subtracting the mean expression value of each protein from the protein expression values. Expression values must be log transformed. The generated CategoricalNormalizer can be used to normalize ion, peptide or site qtables based on protein categories.
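The profile calculation described above can be sketched with plain dicts. Both helpers are hypothetical, and applying a profile by subtracting it from the site values in log space is an assumption about how the fitted normalizer is used.

```python
from statistics import mean
from typing import Dict, List


def fit_protein_profiles(
    protein_expression: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Subtracts each protein's mean log expression from its per-sample values."""
    profiles = {}
    for protein, values in protein_expression.items():
        protein_mean = mean(values)
        profiles[protein] = [value - protein_mean for value in values]
    return profiles


def normalize_site(site_values: List[float], profile: List[float]) -> List[float]:
    """Removes the protein-level profile from site-level log expression values."""
    # Assumption: normalization is a per-sample subtraction in log space.
    return [site - protein for site, protein in zip(site_values, profile)]


# Keys play the role of the category column, e.g. "Representative protein".
profiles = fit_protein_profiles({"P1": [10.0, 12.0, 14.0]})
site_normalized = normalize_site([5.0, 7.0, 11.0], profiles["P1"])
```

After normalization the remaining differences between samples reflect changes at the site level that are not explained by changes in protein abundance.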
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | Qtable instance containing protein values for fitting the normalizer. | required |
category_column | str | The name of the column containing the protein categories. | 'Representative protein' |
Returns:
Type | Description |
---|---|
CategoricalNormalizer | A fitted CategoricalNormalizer. |
create_ibaq_transformer
create_ibaq_transformer(
qtable: Qtable,
category_column: str = "Representative protein",
ibaq_column: str = "iBAQ peptides",
) -> CategoricalNormalizer
Creates a fitted CategoricalNormalizer for iBAQ transformation.
The CategoricalNormalizer is fitted to iBAQ peptide counts of the provided qtable, and can be used to transform protein intensities by dividing them by the corresponding iBAQ peptide counts. Missing iBAQ peptide counts are replaced by 1 and values smaller than 1 are replaced by 1. iBAQ peptide counts are then log2 transformed because the CategoryTransformer expects log2 transformed values.
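Dividing an intensity by a peptide count corresponds to a subtraction once both are log2 transformed, which the following sketch illustrates. The two helpers are hypothetical and only mirror the replacement rules stated above.

```python
import math
from typing import Optional


def log2_ibaq_count(count: Optional[float]) -> float:
    """Returns the log2 peptide count; missing counts and counts below 1 become 1."""
    if count is None or math.isnan(count) or count < 1:
        count = 1.0
    return math.log2(count)


def ibaq_transform(log2_intensity: float, count: Optional[float]) -> float:
    """Divides by the peptide count, expressed as a subtraction in log2 space."""
    return log2_intensity - log2_ibaq_count(count)


# An intensity of 8 with 4 iBAQ peptides gives 8 / 4 = 2, i.e. log2(2) = 1.
log2_ibaq = ibaq_transform(math.log2(8), 4)
unchanged = ibaq_transform(5.0, None)
```

Replacing missing or small counts by 1 leaves the intensity unchanged, because log2(1) is 0.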
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | Qtable instance containing iBAQ peptide counts for fitting the normalizer. | required |
category_column | str | The name of the column containing the protein categories. | 'Representative protein' |
ibaq_column | str | The name of the column containing the iBAQ peptide counts. | 'iBAQ peptides' |
Returns:
Type | Description |
---|---|
CategoricalNormalizer | A fitted CategoricalNormalizer. |
normalize_expression_by_category
normalize_expression_by_category(
qtable: Qtable, normalizer: CategoryTransformer
) -> None
Normalizes expression values in a Qtable based on categories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance whose expression values will be normalized. | required |
normalizer | CategoryTransformer | A CategoryTransformer instance. | required |
Raises:
Type | Description |
---|---|
KeyError | If the category column of the CategoryTransformer is not found in the qtable. |
impute_missing_values
impute_missing_values(
qtable: Qtable,
imputer: Transformer,
exclude_invalid: bool = True,
) -> None
Imputes missing expression values in qtable.
Imputes missing values (NaN) present in the qtable expression columns, which requires that the qtable has defined expression columns. If the passed imputer object is not yet fitted, it is fitted with the expression values. If 'exclude_invalid' is True, only valid expression values are used for fitting the imputer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance whose missing expression values will be imputed. | required |
imputer | Transformer | An Imputer instance from the msreport.impute module. An already fitted imputer must have been fitted with a dataframe whose column names correspond to the sample names present in qtable.design. An unfitted imputer is fitted with the expression values present in the qtable. | required |
exclude_invalid | bool | If true, the column "Valid" is used to determine for which rows imputation is performed; default True. | True |
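The "fit only if not yet fitted" behavior can be sketched as follows. `ColumnMinimumImputer` and `impute` are hypothetical, the imputation strategy is chosen only for illustration, and a dict of column lists stands in for the DataFrame.

```python
import math
from typing import Dict, List, Optional

Table = Dict[str, List[float]]  # assumption: stand-in for a pandas DataFrame


class ColumnMinimumImputer:
    """Hypothetical imputer: fills NaN with the smallest observed column value."""

    def __init__(self) -> None:
        self._fill_values: Optional[Dict[str, float]] = None

    def fit(self, table: Table) -> "ColumnMinimumImputer":
        self._fill_values = {
            name: min(v for v in values if not math.isnan(v))
            for name, values in table.items()
        }
        return self

    def is_fitted(self) -> bool:
        return self._fill_values is not None

    def transform(self, table: Table) -> Table:
        if self._fill_values is None:
            raise RuntimeError("fit() must be called before transform()")
        return {
            name: [self._fill_values[name] if math.isnan(v) else v for v in values]
            for name, values in table.items()
        }


def impute(table: Table, imputer: ColumnMinimumImputer) -> Table:
    # Fit only when the imputer arrives unfitted; a fitted one is used as is.
    if not imputer.is_fitted():
        imputer.fit(table)
    return imputer.transform(table)


nan = float("nan")
expression = {"Sample_A": [20.0, nan, 18.0], "Sample_B": [nan, 22.0, 21.0]}
imputed = impute(expression, ColumnMinimumImputer())

# A pre-fitted imputer must have been fitted on the same column (sample) names.
reference = ColumnMinimumImputer().fit({"Sample_A": [15.0], "Sample_B": [16.0]})
imputed_with_reference = impute(expression, reference)
```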
calculate_experiment_means
calculate_experiment_means(qtable: Qtable) -> None
Calculates mean expression values for each experiment.
Adds a new column "Expression experiment_name" for each experiment, containing the mean expression values of the corresponding samples.
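A per-row sketch of the calculation, assuming a plain dict that maps sample names to experiment names in place of qtable.design. `experiment_means` is a hypothetical helper, and the handling of missing values is left out on purpose because it is not specified here.

```python
from statistics import mean
from typing import Dict, List


def experiment_means(row: Dict[str, float], design: Dict[str, str]) -> Dict[str, float]:
    """Averages the sample values of each experiment for one row."""
    grouped: Dict[str, List[float]] = {}
    for sample, experiment in design.items():
        grouped.setdefault(experiment, []).append(row[sample])
    return {
        f"Expression {experiment}": mean(values)
        for experiment, values in grouped.items()
    }


design = {"A_1": "A", "A_2": "A", "B_1": "B", "B_2": "B"}
row = {"A_1": 10.0, "A_2": 12.0, "B_1": 20.0, "B_2": 24.0}
means = experiment_means(row, design)
```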
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance for which mean experiment expression values will be calculated. | required |
calculate_multi_group_comparison
calculate_multi_group_comparison(
qtable: Qtable,
experiment_pairs: Iterable[Iterable[str]],
exclude_invalid: bool = True,
) -> None
Calculates average expression and ratios for multiple comparison groups.
For each experiment pair, adds new columns "Average expression Experiment_1 vs Experiment_2" and "Ratio [log2] Experiment_1 vs Experiment_2" to the qtable. Expression values must be log transformed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | Qtable instance that contains expression values for calculating group comparisons. | required |
experiment_pairs | Iterable[Iterable[str]] | A list containing one or multiple experiment pairs for which the group comparison should be calculated. The specified experiments must correspond to entries from qtable.design["Experiment"]. | required |
exclude_invalid | bool | If true, the column "Valid" is used to determine which rows are used for calculating the group comparisons; default True. | True |
Raises:
Type | Description |
---|---|
ValueError | If 'experiment_pairs' contains invalid entries. Each experiment pair must have exactly two entries and the two entries must not be the same. All experiments must be present in qtable.design. No duplicate experiment pairs are allowed. |
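The conditions listed for the ValueError can be sketched as a standalone check. `validate_experiment_pairs` is a hypothetical helper, not the library's own validation, and treating only pairs with the same order as duplicates is an assumption.

```python
from typing import Iterable, List, Set, Tuple


def validate_experiment_pairs(
    experiment_pairs: Iterable[Iterable[str]], design_experiments: Set[str]
) -> List[Tuple[str, ...]]:
    """Returns the pairs as tuples or raises ValueError for invalid input."""
    seen: Set[Tuple[str, ...]] = set()
    validated: List[Tuple[str, ...]] = []
    for raw_pair in experiment_pairs:
        pair = tuple(raw_pair)
        if len(pair) != 2:
            raise ValueError(f"Pair must have exactly two entries: {pair}")
        if pair[0] == pair[1]:
            raise ValueError(f"Pair entries must differ: {pair}")
        unknown = [name for name in pair if name not in design_experiments]
        if unknown:
            raise ValueError(f"Experiments not present in the design: {unknown}")
        # Assumption: ("A", "B") and ("B", "A") count as different pairs.
        if pair in seen:
            raise ValueError(f"Duplicate experiment pair: {pair}")
        seen.add(pair)
        validated.append(pair)
    return validated


experiments = {"Control", "Treatment_1", "Treatment_2"}
pairs = validate_experiment_pairs(
    [["Treatment_1", "Control"], ["Treatment_2", "Control"]], experiments
)
```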
two_group_comparison
two_group_comparison(
qtable: Qtable,
experiment_pair: Iterable[str],
exclude_invalid: bool = True,
) -> None
Calculates comparison values for two experiments.
Adds new columns "Average expression Experiment_1 vs Experiment_2" and "Ratio [log2] Experiment_1 vs Experiment_2" to the qtable. Expects that expression values are log2 transformed.
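With log2 transformed values, a ratio is a difference of group means. The sketch below assumes that the ratio is the mean of the first experiment minus the mean of the second, and that the average expression is the mean of the two group means; neither is confirmed by this page. `compare_two_groups` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List


def compare_two_groups(
    group_1: List[float], group_2: List[float], name_1: str, name_2: str
) -> Dict[str, float]:
    """Calculates comparison values for one row of log2 expression values."""
    mean_1, mean_2 = mean(group_1), mean(group_2)
    comparison = f"{name_1} vs {name_2}"
    return {
        # Assumption: average of the two group means.
        f"Average expression {comparison}": (mean_1 + mean_2) / 2,
        # Assumption: log2 ratio as the difference of the group means.
        f"Ratio [log2] {comparison}": mean_1 - mean_2,
    }


result = compare_two_groups([12.0, 14.0], [10.0, 10.0], "Treatment", "Control")
fold_change = 2 ** result["Ratio [log2] Treatment vs Control"]
```

A log2 ratio of 3 corresponds to an eightfold difference on the linear scale.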
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | A Qtable instance, containing expression values. | required |
experiment_pair | Iterable[str] | The two experiments that will be compared; both experiments must be present in qtable.design. | required |
exclude_invalid | bool | If true, the column "Valid" is used to determine for which rows comparison values are calculated. | True |
calculate_multi_group_limma
calculate_multi_group_limma(
qtable: Qtable,
experiment_pairs: Sequence[Iterable[str]],
exclude_invalid: bool = True,
batch: bool = False,
limma_trend: bool = True,
) -> None
Uses limma to perform a differential expression analysis of multiple experiments.
For each experiment pair specified in 'experiment_pairs', the following new columns are added to the qtable:
- "P-value Experiment_1 vs Experiment_2"
- "Adjusted p-value Experiment_1 vs Experiment_2"
- "Average expression Experiment_1 vs Experiment_2"
- "Ratio [log2] Experiment_1 vs Experiment_2"
Requires that expression columns are set and that expression values are log2 transformed. All rows with missing values are ignored; impute missing values to allow differential expression analysis of all rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | Qtable instance that contains expression values for differential expression analysis. | required |
experiment_pairs | Sequence[Iterable[str]] | A list containing lists of experiment pairs for which the results of the differential expression analysis should be reported. The specified experiment pairs must correspond to entries from qtable.design["Experiment"]. | required |
exclude_invalid | bool | If true, the column "Valid" is used to determine which rows are used for the differential expression analysis; default True. | True |
batch | bool | If true, batch effects are considered for the differential expression analysis. Batches must be specified in the design in a "Batch" column. | False |
limma_trend | bool | If true, an intensity-dependent trend is fitted to the prior variance during calculation of the moderated t-statistics, refer to limma.eBayes for details; default True. | True |
Raises:
Type | Description |
---|---|
ValueError | If 'experiment_pairs' contains invalid entries. Each experiment pair must have exactly two entries and the two entries must not be the same. All experiments must be present in qtable.design. No duplicate experiment pairs are allowed. |
KeyError | If the "Batch" column is not present in the qtable.design when 'batch' is set to True. |
ValueError | If all values from qtable.design["Batch"] are identical when 'batch' is set to True. |
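The two batch-related error conditions can be sketched as a standalone check. `validate_batch_design` is a hypothetical helper, and a dict of column lists stands in for qtable.design.

```python
from typing import Dict, List


def validate_batch_design(design: Dict[str, List[str]]) -> None:
    """Raises if the design cannot be used for a batch-aware analysis."""
    if "Batch" not in design:
        raise KeyError('The design requires a "Batch" column when batch is True')
    if len(set(design["Batch"])) < 2:
        # A single batch carries no information about batch effects.
        raise ValueError('All values of the "Batch" column are identical')


design = {
    "Sample": ["A_1", "A_2", "B_1", "B_2"],
    "Experiment": ["A", "A", "B", "B"],
    "Batch": ["1", "2", "1", "2"],
}
validate_batch_design(design)  # passes silently
```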
calculate_two_group_limma
calculate_two_group_limma(
qtable: Qtable,
experiment_pair: Sequence[str],
exclude_invalid: bool = True,
limma_trend: bool = True,
) -> None
Uses limma to perform a differential expression analysis of two experiments.
Adds new columns "P-value Experiment_1 vs Experiment_2", "Adjusted p-value Experiment_1 vs Experiment_2", "Average expression Experiment_1 vs Experiment_2", and "Ratio [log2] Experiment_1 vs Experiment_2" to the qtable.
Requires that expression columns are set and that expression values are log2 transformed. All rows with missing values are ignored; impute missing values to allow differential expression analysis of all rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qtable | Qtable | Qtable instance that contains expression values for differential expression analysis. | required |
experiment_pair | Sequence[str] | The names of the two experiments that will be compared; both experiments must be present in qtable.design. | required |
exclude_invalid | bool | If true, the column "Valid" is used to determine which rows are used for the differential expression analysis; default True. | True |
limma_trend | bool | If true, an intensity-dependent trend is fitted to the prior variances; default True. | True |
Raises:
Type | Description |
---|---|
ValueError | If 'experiment_pair' contains invalid entries. The experiment pair must have exactly two entries and the two entries must not be the same. Both experiments must be present in qtable.design. |