Normalize
Transformer classes for normalizing and transforming quantitative proteomics data.
This module defines various transformer classes for normalizing and scaling quantitative values in tabular data. Examples include normalizers such as median, mode, and LOWESS, as well as scalers such as PercentageScaler and ZscoreScaler. A specialized CategoricalNormalizer is also provided, which, when appropriately fitted and applied, can be used for complex transformations such as iBAQ or site-to-protein normalization.
These transformers can be fitted to a table containing quantitative values to learn parameters. Once fitted, they can then be applied to another table to adjust its values. The transformation returns a new copy of the table with the normalized/scaled values, leaving the original table unchanged.
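The fit-then-transform workflow described above can be sketched with a small stand-in class. `MedianShiftNormalizer` below is a hypothetical illustration written only with pandas. It is not part of this module and does not reproduce its implementation, and it raises a plain `RuntimeError` where the module's classes raise `NotFittedError`.

```python
import pandas as pd


class MedianShiftNormalizer:
    """Hypothetical stand-in that mimics the fit/transform protocol."""

    def __init__(self):
        self._fits = {}

    def fit(self, table: pd.DataFrame) -> "MedianShiftNormalizer":
        # Learn one shift per column: column median minus the mean of all medians.
        medians = table.median()
        self._fits = (medians - medians.mean()).to_dict()
        return self

    def is_fitted(self) -> bool:
        return bool(self._fits)

    def get_fits(self) -> dict:
        if not self.is_fitted():
            raise RuntimeError("The normalizer has not been fitted yet.")
        return dict(self._fits)

    def transform(self, table: pd.DataFrame) -> pd.DataFrame:
        # Subtraction returns a new DataFrame, so the input stays unchanged.
        shifts = pd.Series(self.get_fits())
        return table - shifts[list(table.columns)]


table = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0]})
normalizer = MedianShiftNormalizer().fit(table)
normalized = normalizer.transform(table)
```

Because `fit` returns the instance itself, fitting and transforming can be chained, and the learned parameters remain available through `get_fits` for inspection.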
Classes:
Name | Description |
---|---|
FixedValueNormalizer | Normalization by a constant normalization factor for each sample. |
ValueDependentNormalizer | Normalization with a value dependent fit for each sample. |
SumNormalizer | Normalizer that uses the sum of all values in each sample for normalization. |
MedianNormalizer | A FixedValueNormalizer that uses the median as the fitting function. |
ModeNormalizer | A FixedValueNormalizer that uses the mode as the fitting function. |
LowessNormalizer | A ValueDependentNormalizer that uses lowess as the fitting function. |
CategoricalNormalizer | Normalize samples based on category-dependent reference values. |
PercentageScaler | Transform column values to percentages by dividing them by the column sum. |
ZscoreScaler | Normalize samples by z-score scaling. |
Log2Transformer | Apply log2 transformation to column values. |
FixedValueNormalizer
Normalization by a constant normalization factor for each sample.
Expects log transformed intensity values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
center_function | Callable | A function that accepts a sequence of values and returns a center value such as the median. | required |
comparison | str | Must be "paired" or "reference". When "paired" is specified the normalization values are first calculated for each column pair. Then an optimal normalization value for each column is calculated by solving a matrix of linear equations of the column pair values with least squares. When "reference" is selected, a pseudo-reference sample is generated by calculating the mean value for each row. Only rows with valid values in all columns are used. Normalization values are then calculated by comparing each column to the pseudo-reference sample. | required |
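The two comparison modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the function names `reference_shifts` and `paired_shifts` are hypothetical, the median is hard-coded as the center function, and the final subtraction of the shifts from the log-transformed values is assumed.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def reference_shifts(table: pd.DataFrame) -> pd.Series:
    """Median deviation of each column from a row-mean pseudo-reference."""
    complete = table.dropna()  # only rows with valid values in all columns
    pseudo_reference = complete.mean(axis=1)
    return complete.sub(pseudo_reference, axis=0).median()


def paired_shifts(table: pd.DataFrame) -> pd.Series:
    """Least-squares shifts that best explain all pairwise column differences."""
    columns = list(table.columns)
    pairs = list(combinations(range(len(columns)), 2))
    design = np.zeros((len(pairs), len(columns)))
    observed = np.zeros(len(pairs))
    for row, (i, j) in enumerate(pairs):
        difference = table[columns[i]] - table[columns[j]]
        design[row, i], design[row, j] = 1.0, -1.0
        observed[row] = difference.median()  # pandas skips NaN by default
    # Minimum-norm least-squares solution of shift_i - shift_j = observed_ij.
    shifts, *_ = np.linalg.lstsq(design, observed, rcond=None)
    return pd.Series(shifts, index=columns)


rng = np.random.default_rng(0)
base = rng.normal(20, 2, size=50)
table = pd.DataFrame({"A": base - 1.0, "B": base, "C": base + 1.0})

# Assumed application step: subtract each column's shift in log space.
normalized = table - paired_shifts(table)
```

In this toy table every column is the same profile offset by a constant, so both modes recover the same shifts. With real data the paired mode can use rows that are valid in only some column pairs, whereas the reference mode is restricted to rows that are complete across all columns.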
Methods:
Name | Description |
---|---|
fit | Fits the FixedValueNormalizer. |
is_fitted | Returns True if the FixedValueNormalizer has been fitted. |
get_fits | Returns a dictionary containing the fitted center values per sample. |
transform | Applies a fixed value normalization to each column of the table. |
Source code in msreport\normalize.py
fit
fit(table: DataFrame) -> Self
Fits the FixedValueNormalizer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Dataframe used to calculate normalization values for each column. The normalization values are stored with the column names. | required |
Returns:
Type | Description |
---|---|
Self | Returns the instance itself. |
is_fitted
is_fitted() -> bool
Returns True if the FixedValueNormalizer has been fitted.
get_fits
Returns a dictionary containing the fitted center values per sample.
Raises:
Type | Description |
---|---|
NotFittedError | If the FixedValueNormalizer has not been fitted yet. |
transform
transform(table: DataFrame) -> DataFrame
Applies a fixed value normalization to each column of the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The data to normalize. Each column name must correspond to a column name from the table that was used for the fitting. | required |
Returns:
Type | Description |
---|---|
DataFrame | Transformed dataframe. |
Raises:
Type | Description |
---|---|
NotFittedError | If the FixedValueNormalizer has not been fitted yet. |
ValueDependentNormalizer
Normalization with a value dependent fit for each sample.
Expects log transformed intensity values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fit_function | Callable[[Iterable, Iterable], ndarray] | A function that accepts two sequences of values with equal length, with the first sequence being the observed sample values and the second the reference values. The function must return a numpy array with two columns. The first column contains the values and the second column the fitted deviations. | required |
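A function matching the `fit_function` contract above might look like the following sketch. `linear_deviation_fit` is a hypothetical example that fits a straight line with `numpy.polyfit` rather than a lowess curve, and the application step (interpolating the fitted deviation and subtracting it) is an assumption about how such a fit could be used, not a description of the module's code.

```python
import numpy as np


def linear_deviation_fit(observed, reference) -> np.ndarray:
    """Hypothetical fit function: straight-line fit of deviation against value."""
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    slope, intercept = np.polyfit(observed, observed - reference, deg=1)
    values = np.sort(observed)
    # Column 0: values, column 1: fitted deviation at each value.
    return np.column_stack([values, slope * values + intercept])


reference = np.linspace(10.0, 20.0, 11)
observed = 1.1 * reference + 0.5
fit = linear_deviation_fit(observed, reference)

# Assumed application step: interpolate the deviation and subtract it.
corrected = observed - np.interp(observed, fit[:, 0], fit[:, 1])
```

Returning the values sorted in ascending order keeps the array directly usable as the x-coordinates of `numpy.interp`, which requires increasing sample points.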
Methods:
Name | Description |
---|---|
fit | Fits the ValueDependentNormalizer. |
is_fitted | Returns True if the ValueDependentNormalizer has been fitted. |
get_fits | Returns a dictionary containing lists of fitting data per sample. |
transform | Applies a value dependent normalization to each column of the table. |
fit
fit(table: DataFrame) -> Self
Fits the ValueDependentNormalizer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Dataframe used to calculate normalization arrays for each column. | required |
Returns:
Type | Description |
---|---|
Self | Returns the instance itself. |
is_fitted
is_fitted() -> bool
Returns True if the ValueDependentNormalizer has been fitted.
get_fits
Returns a dictionary containing lists of fitting data per sample.
Returns:
Type | Description |
---|---|
dict[str, ndarray] | A dictionary mapping sample names to fitting data. Fitting data is a sequence of [intensity, deviation at this intensity] pairs. |
Raises:
Type | Description |
---|---|
NotFittedError | If the ValueDependentNormalizer has not been fitted yet. |
transform
transform(table: DataFrame) -> DataFrame
Applies a value dependent normalization to each column of the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The data to normalize. Each column name must correspond to a column name from the table that was used for the fitting. | required |
Returns:
Type | Description |
---|---|
DataFrame | Transformed dataframe. |
Raises:
Type | Description |
---|---|
NotFittedError | If the ValueDependentNormalizer has not been fitted yet. |
SumNormalizer
SumNormalizer()
Normalizer that uses the sum of all values in each sample for normalization.
Expects log2-transformed intensity values. To obtain normalization factors, the sum of non-log2-transformed values is calculated for each sample, then divided by the average of all sample sums and log2-transformed.
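The calculation of the normalization factors described above can be written out in a few lines of pandas. This is a sketch of the stated formula only. The example table is made up, and subtracting the factors from the log2 values is an assumed application step.

```python
import numpy as np
import pandas as pd

# Made-up log2 intensities; sample "B" is exactly twice as intense as "A".
log2_table = pd.DataFrame({"A": [10.0, 11.0, 12.0], "B": [11.0, 12.0, 13.0]})

# Sum of the non-log2-transformed values for each sample.
sample_sums = np.exp2(log2_table).sum()

# Divide by the average of all sample sums, then log2-transform.
factors = np.log2(sample_sums / sample_sums.mean())

# Assumed application step: subtract the factors in log2 space.
normalized = log2_table - factors
```

Because the factors live in log2 space, a twofold difference in total intensity between two samples shows up as a difference of exactly 1 between their factors.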
Methods:
Name | Description |
---|---|
fit | Fits the SumNormalizer and returns a fitted instance. |
is_fitted | Returns True if the Transformer has been fitted. |
get_fits | Returns a dictionary containing the fitted center values per sample. |
transform | Transform values in table. |
fit
fit(table: DataFrame) -> Self
Fits the SumNormalizer and returns a fitted instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Dataframe used to calculate normalization values for each column. | required |
Returns:
Type | Description |
---|---|
Self | Returns the instance itself. |
is_fitted
is_fitted() -> bool
Returns True if the Transformer has been fitted.
get_fits
Returns a dictionary containing the fitted center values per sample.
Raises:
Type | Description |
---|---|
NotFittedError | If the SumNormalizer has not been fitted yet. |
transform
transform(table: DataFrame) -> DataFrame
Transform values in table.
MedianNormalizer
MedianNormalizer()
Bases: FixedValueNormalizer
A FixedValueNormalizer that uses the median as the fitting function.
Use MedianNormalizer.fit(table: pd.DataFrame) to fit the normalizer, and then MedianNormalizer.transform(table: pd.DataFrame) with the fitted normalizer to apply the normalization.
ModeNormalizer
ModeNormalizer()
Bases: FixedValueNormalizer
A FixedValueNormalizer that uses the mode as the fitting function.
Use ModeNormalizer.fit(table: pd.DataFrame) to fit the normalizer, and then ModeNormalizer.transform(table: pd.DataFrame) with the fitted normalizer to apply the normalization.
LowessNormalizer
LowessNormalizer()
Bases: ValueDependentNormalizer
A ValueDependentNormalizer that uses lowess as the fitting function.
Use LowessNormalizer.fit(table: pd.DataFrame) to fit the normalizer, and then LowessNormalizer.transform(table: pd.DataFrame) with the fitted normalizer to apply the normalization.
CategoricalNormalizer
CategoricalNormalizer(category_column: str)
Normalize samples based on category-dependent reference values.
Values from the reference table are used for normalization of the corresponding categories in the table that will be transformed. The normalization is applied to each column of the input table based on the category of each row.
The reference table must not contain NaN values and values in the sample columns must be log-transformed. The table to be transformed must contain the same category_column as the reference table and only include sample columns that were used for fitting. Values from categories not present in the reference table will be set to NaN. The table sample columns must also be log-transformed.
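The category lookup described above can be illustrated with plain pandas. The sketch below is not the module's implementation: the column names and values are made up, and subtracting the reference values in log space is an assumption. It only shows how each row picks up the reference values of its category and how rows with unknown categories end up as NaN.

```python
import numpy as np
import pandas as pd

category_column = "Protein"
sample_columns = ["S1", "S2"]

# Reference values per category (for example log-transformed protein levels).
reference = pd.DataFrame(
    {"Protein": ["P1", "P2"], "S1": [1.0, 2.0], "S2": [0.5, 1.5]}
)
# Table to normalize (for example site intensities); "P3" has no reference entry.
table = pd.DataFrame(
    {
        "Protein": ["P1", "P1", "P2", "P3"],
        "S1": [5.0, 6.0, 7.0, 8.0],
        "S2": [5.0, 6.0, 7.0, 8.0],
    }
)

reference_values = reference.set_index(category_column)[sample_columns]
# One reference row per table row; categories missing from the reference give NaN.
aligned = reference_values.reindex(table[category_column])

normalized = table.copy()
# Assumed application step: subtract the category reference in log space.
normalized[sample_columns] = table[sample_columns].to_numpy() - aligned.to_numpy()
```

Both rows of category "P1" are corrected with the same reference values, which is the behavior needed for site-to-protein style normalization where several sites map to one protein.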
Parameters:
Name | Type | Description | Default |
---|---|---|---|
category_column | str | The name of the column containing the categories. This column must be present in the reference table and the table to be transformed. | required |
Methods:
Name | Description |
---|---|
is_fitted | Returns True if the CategoricalNormalizer has been fitted. |
fit | Fits the CategoricalNormalizer to a reference table. |
get_fits | Returns a copy of the reference table used for fitting. |
get_category_column | Returns the name of the category column. |
transform | Applies a category dependent normalization to the table. |
is_fitted
is_fitted() -> bool
Returns True if the CategoricalNormalizer has been fitted.
fit
fit(reference_table: DataFrame) -> Self
Fits the CategoricalNormalizer to a reference table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reference_table | DataFrame | The reference table used for fitting. | required |
Returns:
Type | Description |
---|---|
Self | Returns the instance itself. |
Raises:
Type | Description |
---|---|
ValueError | If the reference table contains NaN values. |
get_fits
get_fits() -> DataFrame
Returns a copy of the reference table used for fitting.
Raises:
Type | Description |
---|---|
NotFittedError | If the CategoricalNormalizer has not been fitted yet. |
get_category_column
get_category_column() -> str
Returns the name of the category column.
transform
transform(table: DataFrame) -> DataFrame
Applies a category dependent normalization to the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The table to normalize. | required |
Returns:
Type | Description |
---|---|
DataFrame | The normalized table. |
Raises:
Type | Description |
---|---|
KeyError | If the input table contains columns not present in the reference table. |
NotFittedError | If the CategoricalNormalizer has not been fitted yet. |
PercentageScaler
Transform column values to percentages by dividing them by the column sum.
Methods:
Name | Description |
---|---|
fit | Returns the instance itself. |
is_fitted | Always returns True because the Scaler does not need to be fitted. |
get_fits | Returns an empty dictionary. |
transform | Transforms column values into percentages by division by the column sum. |
fit
fit(table: DataFrame) -> Self
Returns the instance itself.
is_fitted
is_fitted() -> bool
Always returns True because the Scaler does not need to be fitted.
get_fits
get_fits() -> dict
Returns an empty dictionary.
transform
transform(table: DataFrame) -> DataFrame
Transforms column values into percentages by division by the column sum.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The table used to scale row values. | required |
Returns:
Type | Description |
---|---|
DataFrame | A copy of the table containing the scaled values. |
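The percentage scaling can be reproduced with a single pandas expression. The sketch below uses a made-up table and assumes that the result is expressed on a 0 to 100 scale. It is an illustration of the arithmetic, not the module's code.

```python
import pandas as pd

table = pd.DataFrame({"A": [1.0, 1.0, 2.0], "B": [5.0, 15.0, 30.0]})

# Divide every value by the sum of its column; the result is a new DataFrame.
percentages = table / table.sum() * 100  # assumed 0-100 scale
```

After scaling, every column sums to 100 regardless of its original total, so columns with very different overall intensities become directly comparable as compositions.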
ZscoreScaler
Normalize samples by z-score scaling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
with_mean | bool | If True, center row values by subtracting the row mean. | True |
with_std | bool | If True, scale row values by dividing by the row std. | True |
Methods:
Name | Description |
---|---|
fit | Returns the instance itself. |
is_fitted | Always returns True because the ZscoreScaler does not need to be fitted. |
get_fits | Returns a dictionary containing the parameters 'with_mean' and 'with_std'. |
transform | Applies a z-score normalization to each column of the table. |
fit
fit(table: DataFrame) -> Self
Returns the instance itself.
is_fitted
is_fitted() -> bool
Always returns True because the ZscoreScaler does not need to be fitted.
get_fits
get_fits() -> dict
Returns a dictionary containing the parameters 'with_mean' and 'with_std'.
transform
transform(table: DataFrame) -> DataFrame
Applies a z-score normalization to each column of the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | The table used to scale row values. | required |
Returns:
Type | Description |
---|---|
DataFrame | A copy of the table containing the scaled values. |
Log2Transformer
Apply log2 transformation to column values.
Methods:
Name | Description |
---|---|
fit | Returns the instance itself. |
is_fitted | Returns True if the transformer is fitted. |
transform | Applies a log2 transformation to each column of the table. |
fit
fit(table: DataFrame) -> Self
Returns the instance itself.
is_fitted
is_fitted() -> bool
Returns True if the transformer is fitted.
transform
transform(table: DataFrame) -> DataFrame
Applies a log2 transformation to each column of the table.
Zero values are replaced with NaN before the transformation to avoid an error during the log2 calculation.
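The zero handling described above can be sketched in two steps with pandas and NumPy. The table is made up for illustration and the snippet is not taken from the module's source.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({"A": [0.0, 1.0, 8.0], "B": [4.0, 0.0, 2.0]})

# Replace zeros with NaN first, so that log2 never sees a zero value.
log2_table = np.log2(table.replace(0, np.nan))
```

Treating zeros as missing values keeps them out of downstream statistics such as medians, instead of propagating negative infinity through later normalization steps.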