Impute
Transformer classes for imputing missing values in quantitative proteomics data.
This module defines transformer classes that can be fitted to a table containing quantitative values to learn imputation parameters. Once fitted, these transformers can then be applied to another table to transform it by filling in missing values. The transformation returns a new copy of the table with the imputed values, leaving the original table unchanged.
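The fit/transform contract described above can be sketched as follows. `Imputer` and `ColumnMinImputer` are hypothetical names used only for illustration; they are not part of msreport. The toy imputer learns one fill value per column (the column minimum) in `fit`, returns itself so calls can be chained, and returns a new table from `transform` while leaving the input unchanged.

```python
from typing import Protocol

import numpy as np
import pandas as pd


class Imputer(Protocol):
    """Hypothetical protocol summarizing the fit / is_fitted / transform contract."""

    def fit(self, table: pd.DataFrame) -> "Imputer": ...

    def is_fitted(self) -> bool: ...

    def transform(self, table: pd.DataFrame) -> pd.DataFrame: ...


class ColumnMinImputer:
    """Hypothetical toy imputer that fills each column with its fitted minimum."""

    def __init__(self) -> None:
        self._fill_values: dict = {}

    def fit(self, table: pd.DataFrame) -> "ColumnMinImputer":
        # Learn one parameter per column from the fitting table.
        self._fill_values = {col: float(table[col].min()) for col in table.columns}
        return self

    def is_fitted(self) -> bool:
        return bool(self._fill_values)

    def transform(self, table: pd.DataFrame) -> pd.DataFrame:
        if not self.is_fitted():
            raise RuntimeError("Call fit() before transform().")
        # fillna returns a new DataFrame; the input table is not modified.
        return table.fillna(value={col: self._fill_values[col] for col in table.columns})


def fit_transform(imputer: Imputer, table: pd.DataFrame) -> pd.DataFrame:
    """Fit on `table`, then impute it (fit returns the imputer, so calls chain)."""
    return imputer.fit(table).transform(table)


table = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})
imputed = fit_transform(ColumnMinImputer(), table)
```

Because `fit` returns the instance, the same object can be fitted on one table and then applied to another, which is the separation the module description relies on.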
Classes:
Name | Description |
---|---|
FixedValueImputer | Imputer for completing missing values with a fixed value. |
GaussianImputer | Imputer for completing missing values by drawing from a Gaussian distribution. |
PerseusImputer | Imputer for completing missing values as implemented in Perseus. |
FixedValueImputer
Imputer for completing missing values with a fixed value.
Replaces missing values either with a constant value or with an integer that is smaller than the minimum value of each column, or smaller than the minimum value of the whole array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
strategy | str | The imputation strategy. If "constant", missing values are replaced with 'fill_value'. If "below", missing values are replaced with an integer that is smaller than the minimal value of the fitted dataframe; the minimal value is calculated per column if 'column_wise' is True, otherwise over all columns together. | required |
fill_value | float | When 'strategy' is "constant", 'fill_value' is used to replace all missing values. | 0.0 |
column_wise | bool | If True, imputation is performed independently for each column; otherwise the whole dataframe is imputed together. | True |
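The two strategies can be sketched as a function that derives one fill value per column. This is a hypothetical re-creation, not msreport's code: `fixed_fill_values` and `integer_below` are illustrative names, and it is an assumption that "an integer smaller than the minimal value" means the largest integer strictly below the minimum. The library's actual rounding rule may differ.

```python
import math

import numpy as np
import pandas as pd


def integer_below(minimum: float) -> int:
    """Hypothetical helper: the largest integer strictly smaller than `minimum`."""
    candidate = math.floor(float(minimum))
    # If the minimum is already an integer, step one lower so the result stays below it.
    return candidate - 1 if candidate == minimum else candidate


def fixed_fill_values(
    table: pd.DataFrame,
    strategy: str,
    fill_value: float = 0.0,
    column_wise: bool = True,
) -> dict:
    """Hypothetical sketch of how per-column fill values could be derived."""
    if strategy == "constant":
        return {col: fill_value for col in table.columns}
    if strategy == "below":
        if column_wise:
            # One value per column, based on that column's observed minimum (NaNs skipped).
            return {col: integer_below(table[col].min()) for col in table.columns}
        # A single value shared by all columns, based on the global minimum.
        shared = integer_below(np.nanmin(table.to_numpy(dtype=float)))
        return {col: shared for col in table.columns}
    raise ValueError(f"Unknown strategy: {strategy!r}")


table = pd.DataFrame({"A": [12.3, np.nan, 15.0], "B": [20.0, 18.0, np.nan]})
print(fixed_fill_values(table, "below"))                     # {'A': 12, 'B': 17}
print(fixed_fill_values(table, "below", column_wise=False))  # {'A': 12, 'B': 12}
```

With 'column_wise' set to False, every column receives the same value because the minimum is taken over the whole table.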
Methods:
Name | Description |
---|---|
fit | Fits the FixedValueImputer. |
is_fitted | Returns True if the FixedValueImputer has been fitted. |
transform | Impute all missing values in 'table'. |
fit
fit(table: DataFrame) -> Self
Fits the FixedValueImputer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Input DataFrame used to compute the fill value for each column. | required |
Returns:
Type | Description |
---|---|
Self | Returns the fitted FixedValueImputer instance. |
is_fitted
is_fitted() -> bool
Returns True if the FixedValueImputer has been fitted.
transform
transform(table: DataFrame) -> DataFrame
Impute all missing values in 'table'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | A dataframe of numeric values to be completed. Each column name must correspond to a column name in the table that was used for fitting. | required |
Returns:
Type | Description |
---|---|
DataFrame | A copy of 'table' with missing values imputed. |
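The transform step can be sketched as filling each column with the value learned for it during fitting, refusing columns that were not seen at fit time. `apply_fill_values` is a hypothetical name, and raising `KeyError` for unknown columns is an assumption about how such a mismatch might be reported.

```python
import numpy as np
import pandas as pd


def apply_fill_values(table: pd.DataFrame, fill_values: dict) -> pd.DataFrame:
    """Hypothetical transform step: fill NaNs column by column from fitted values."""
    # Every column of `table` must have been seen when the fill values were fitted.
    unknown = [col for col in table.columns if col not in fill_values]
    if unknown:
        raise KeyError(f"Columns not present in the fitted table: {unknown}")
    # fillna returns a new DataFrame, so the caller's table keeps its missing values.
    return table.fillna(value={col: fill_values[col] for col in table.columns})


table = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})
completed = apply_fill_values(table, {"A": 0.0, "B": -1.0})
```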
GaussianImputer
Imputer for completing missing values by drawing from a Gaussian distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mu | float | Mean of the Gaussian distribution. | required |
sigma | float | Standard deviation of the Gaussian distribution; must be positive. | required |
seed | Optional[int] | Seed for initializing the random number generator. Using the same seed with the same input table generates the same imputed values each time. If None, different imputed values are generated on each call. | None |
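Drawing replacements from a fixed normal distribution, with an optional seed for reproducibility, can be sketched like this. `gaussian_impute` is a hypothetical function, and using NumPy's `default_rng` is an assumption about how the seed is applied; it is not taken from msreport.

```python
from typing import Optional

import numpy as np
import pandas as pd


def gaussian_impute(
    table: pd.DataFrame, mu: float, sigma: float, seed: Optional[int] = None
) -> pd.DataFrame:
    """Hypothetical sketch: replace each NaN with a draw from N(mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # A fixed seed makes the sequence of draws, and thus the imputed values, repeatable.
    rng = np.random.default_rng(seed)
    imputed = table.copy()
    for col in imputed.columns:
        missing = imputed[col].isna()
        if missing.any():
            imputed.loc[missing, col] = rng.normal(mu, sigma, size=int(missing.sum()))
    return imputed


table = pd.DataFrame({"A": [1.0, np.nan, np.nan], "B": [np.nan, 2.0, 3.0]})
first = gaussian_impute(table, mu=10.0, sigma=0.5, seed=42)
second = gaussian_impute(table, mu=10.0, sigma=0.5, seed=42)
```

Since mu and sigma are given up front, nothing needs to be learned from data, which matches the note that this imputer does not need to be fitted.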
Methods:
Name | Description |
---|---|
fit | Fits the GaussianImputer, although fitting is not necessary. |
is_fitted | Always returns True, as the GaussianImputer does not need to be fitted. |
transform | Impute all missing values in 'table'. |
fit
fit(table: DataFrame) -> Self
Fits the GaussianImputer, although fitting is not necessary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Input DataFrame for fitting. | required |
Returns:
Type | Description |
---|---|
Self | Returns the fitted GaussianImputer instance. |
is_fitted
is_fitted() -> bool
Always returns True, as the GaussianImputer does not need to be fitted.
transform
transform(table: DataFrame) -> DataFrame
Impute all missing values in 'table'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | A dataframe of numeric values to be completed. Each column name must correspond to a column name in the table that was used for fitting. | required |
Returns:
Type | Description |
---|---|
DataFrame | A copy of 'table' with missing values imputed. |
PerseusImputer
PerseusImputer(
median_downshift: float = 1.8,
std_width: float = 0.3,
column_wise: bool = True,
seed: Optional[int] = None,
)
Imputer for completing missing values as implemented in Perseus.
Perseus-style imputation replaces missing values with random numbers drawn from a normal distribution. The mean (mu) and standard deviation (sigma) of this distribution are derived from the median and standard deviation of the observed values.
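Reading the 'median_downshift' and 'std_width' parameter descriptions literally, the distribution parameters are presumably obtained as follows. This is an interpretation of those descriptions, not a formula quoted from msreport. Here x denotes the observed values (one column, or the whole table when 'column_wise' is False), s(x) their standard deviation, d = 'median_downshift' and w = 'std_width':

$$
\mu = \operatorname{median}(x) - d \cdot s(x), \qquad \sigma = w \cdot s(x)
$$

With the defaults (d = 1.8, w = 0.3), observed values with median 22 and standard deviation 2 give mu = 22 - 1.8 × 2 = 18.4 and sigma = 0.3 × 2 = 0.6.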
Parameters:
Name | Type | Description | Default |
---|---|---|---|
median_downshift | float | Number of standard deviations by which the observed median is shifted downward to obtain mu of the normal distribution. | 1.8 |
std_width | float | Factor by which the standard deviation of the observed values is multiplied to obtain sigma of the normal distribution. | 0.3 |
column_wise | bool | If True, imputation is performed independently for each column; otherwise the whole dataframe is imputed together. | True |
seed | Optional[int] | Seed for initializing the random number generator. Using the same seed with the same input table generates the same imputed values each time. If None, different imputed values are generated on each call. | None |
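The fitting step, computing mu and sigma either per column or from all values pooled together, can be sketched as below. `perseus_parameters` is a hypothetical name, and both the use of the population standard deviation (NumPy's default `ddof=0`) and the exact mu/sigma formulas are assumptions read from the parameter descriptions, not msreport's verified implementation.

```python
import numpy as np
import pandas as pd


def perseus_parameters(
    table: pd.DataFrame,
    median_downshift: float = 1.8,
    std_width: float = 0.3,
    column_wise: bool = True,
) -> dict:
    """Hypothetical fit step returning {column: (mu, sigma)}."""

    def mu_sigma(values: np.ndarray) -> tuple:
        observed = values[~np.isnan(values)]
        median = float(np.median(observed))
        # Population standard deviation (ddof=0); the ddof msreport uses is an assumption.
        std = float(np.std(observed))
        return median - median_downshift * std, std_width * std

    if column_wise:
        # Each column gets its own distribution, based on its own observed values.
        return {col: mu_sigma(table[col].to_numpy(dtype=float)) for col in table.columns}
    # One distribution shared by all columns, based on every observed value in the table.
    shared = mu_sigma(table.to_numpy(dtype=float).ravel())
    return {col: shared for col in table.columns}


table = pd.DataFrame({"A": [20.0, 24.0, np.nan], "B": [30.0, np.nan, 34.0]})
per_column = perseus_parameters(table)
pooled = perseus_parameters(table, column_wise=False)
```

For column A (median 22, standard deviation 2) this gives mu = 18.4 and sigma = 0.6 with the default parameters.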
Methods:
Name | Description |
---|---|
fit | Fits the PerseusImputer. |
is_fitted | Returns True if the PerseusImputer has been fitted. |
transform | Impute all missing values in 'table'. |
fit
fit(table: DataFrame) -> Self
Fits the PerseusImputer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Input DataFrame for calculating mu and sigma of the Gaussian distribution. | required |
Returns:
Type | Description |
---|---|
Self | Returns the fitted PerseusImputer instance. |
is_fitted
is_fitted() -> bool
Returns True if the PerseusImputer has been fitted.
transform
transform(table: DataFrame) -> DataFrame
Impute all missing values in 'table'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | A dataframe of numeric values to be completed. Each column name must correspond to a column name in the table that was used for fitting. | required |
Returns:
Type | Description |
---|---|
DataFrame | A copy of 'table' with missing values imputed. |
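Applying fitted per-column parameters can be sketched as drawing each column's replacements from its own normal distribution. `perseus_transform` is a hypothetical function; the `{column: (mu, sigma)}` layout of the fitted parameters and the use of NumPy's `default_rng` for the seed are assumptions for illustration.

```python
from typing import Optional

import numpy as np
import pandas as pd


def perseus_transform(
    table: pd.DataFrame, parameters: dict, seed: Optional[int] = None
) -> pd.DataFrame:
    """Hypothetical transform step: draw NaN replacements from each column's N(mu, sigma)."""
    unknown = [col for col in table.columns if col not in parameters]
    if unknown:
        raise KeyError(f"Columns not present in the fitted table: {unknown}")
    rng = np.random.default_rng(seed)
    imputed = table.copy()
    for col in imputed.columns:
        missing = imputed[col].isna()
        if missing.any():
            mu, sigma = parameters[col]  # fitted per column, or shared when pooled
            imputed.loc[missing, col] = rng.normal(mu, sigma, size=int(missing.sum()))
    return imputed


table = pd.DataFrame({"A": [20.0, np.nan], "B": [np.nan, 34.0]})
# Hypothetical fitted parameters; a tiny sigma makes the draws land almost exactly on mu.
result = perseus_transform(table, {"A": (18.4, 1e-9), "B": (28.4, 1e-9)}, seed=0)
```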