Plot

Plotting functions for visualizing proteomics data from Qtable.

The functions in this module generate a wide range of plots, including heatmaps, PCA plots, volcano plots, and histograms, to analyze and compare expression values, missingness, contaminants, and other features in proteomics datasets. The plots are designed to work with the Qtable class as input, which provides structured access to proteomics data and experimental design information.

Users can customize plot styles via the set_active_style function, which allows applying style sheets from the msreport library or those available in matplotlib.

Functions:

Name Description
expression_comparison

Generates an expression comparison plot for two experiments.

pvalue_histogram

Generates p-value histograms for one or multiple experiment comparisons.

volcano_ma

Generates a volcano and an MA plot for the comparison of two experiments.

experiment_ratios

Figure to compare the similarity of expression values between experiments.

replicate_ratios

Figure to compare the similarity of expression values between replicates.

expression_clustermap

Plot sample expression values as a hierarchically-clustered heatmap.

sample_pca

Figure to compare sample similarities with a principal component analysis.

contaminants

A bar plot that displays relative contaminant amounts (iBAQ) per sample.

missing_values_horizontal

Horizontal bar plot to analyze the completeness of quantification.

missing_values_vertical

Vertical bar plot to analyze the completeness of quantification.

sample_correlation

Generates a pair-wise correlation matrix of the samples' "Expression" values.

sample_intensities

Figure to compare the overall quantitative similarity of samples.

set_active_style

Set the active plotting style for the msreport.plot submodule.

set_dpi

Changes the default dots per inch settings for matplotlib plots.

expression_comparison

expression_comparison(
    qtable: Qtable,
    experiment_pair: list[str],
    comparison_tag: str = " vs ",
    plot_average_expression: bool = False,
    special_entries: Optional[list[str]] = None,
    special_proteins: Optional[list[str]] = None,
    annotation_column: str = "Gene name",
    exclude_invalid: bool = True,
) -> tuple[Figure, list[Axes]]

Generates an expression comparison plot for two experiments.

The subplot in the middle displays the log fold change on the x-axis and the maximum expression of the two experiments on the y-axis (or the average expression if plot_average_expression is True). The subplots on the left and right display entries with only missing values in one of the two experiments.
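The three-panel split can be sketched without any plotting code. The snippet below is a minimal illustration, assuming each entry has an "Events" count per experiment; the `partition_entries` helper and the toy counts are hypothetical and not part of msreport.

```python
def partition_entries(events_1, events_2):
    """Assign each row index to the left, middle, or right panel.

    events_1 / events_2 are per-entry event counts for the first and second
    experiment (hypothetical toy input, not an msreport data structure).
    """
    panels = {"left": [], "middle": [], "right": []}
    for index, (n_1, n_2) in enumerate(zip(events_1, events_2)):
        if n_1 + n_2 == 0:
            continue  # not quantified in either experiment: not plotted
        if n_1 == 0:
            panels["left"].append(index)  # absent in the first experiment
        elif n_2 == 0:
            panels["right"].append(index)  # absent in the second experiment
        else:
            panels["middle"].append(index)  # quantified in both experiments
    return panels


panels = partition_entries(events_1=[3, 0, 2, 0], events_2=[3, 2, 0, 0])
print(panels)
```

Entries on the left panel are those absent in the first experiment, and entries on the right panel are those absent in the second, which matches the axis labels set at the end of the function.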

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
experiment_pair list[str]

The names of the two experiments that will be compared, experiments must be present in qtable.design.

required
comparison_tag str

String used in comparison columns to separate a pair of experiments; default " vs ", which corresponds to the MsReport convention.

' vs '
plot_average_expression bool

If True, plot average expression instead of maximum expression. Default False.

False
special_entries Optional[list[str]]

Optional, allows specifying a list of entries from the qtable.id_column column to be annotated.

None
special_proteins Optional[list[str]]

This argument is deprecated, use 'special_entries' instead.

None
annotation_column str

Column used for labeling the points of special entries in the scatter plot. Default "Gene name". If the 'annotation_column' is not present in the qtable.data table, the qtable.id_column is used instead.

'Gene name'
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True

Raises:

Type Description
ValueError

If the "Expression" and "Events" columns for the specified experiments are missing in the Qtable.

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure object and a list of three Axes objects containing the expression comparison plots.
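The column naming that this function relies on can be illustrated in isolation. The following is a rough sketch, assuming the "Expression {experiment}" and "Events {experiment}" column pattern described under Raises; `find_missing_columns` and the example column names are made up for illustration.

```python
def required_columns(experiment_pair):
    """List the per-experiment columns the plot needs."""
    return [
        f"{tag} {exp}"
        for exp in experiment_pair
        for tag in ("Expression", "Events")
    ]


def find_missing_columns(available_columns, experiment_pair):
    """Return the required columns that are absent, sorted by name."""
    return sorted(
        column
        for column in required_columns(experiment_pair)
        if column not in available_columns
    )


# Hypothetical table header in which one column is absent.
available = ["Expression Control", "Events Control", "Expression Treated"]
experiment_pair = ["Control", "Treated"]

missing = find_missing_columns(available, experiment_pair)
comparison_group = " vs ".join(experiment_pair)  # default comparison_tag
print(missing)
print(f"Ratio [log2] {comparison_group}")
```

A non-empty `missing` list corresponds to the situation in which the function raises a ValueError.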

Source code in msreport\plot\comparison.py
@with_active_style
def expression_comparison(
    qtable: Qtable,
    experiment_pair: list[str],
    comparison_tag: str = " vs ",
    plot_average_expression: bool = False,
    special_entries: Optional[list[str]] = None,
    special_proteins: Optional[list[str]] = None,
    annotation_column: str = "Gene name",
    exclude_invalid: bool = True,
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Generates an expression comparison plot for two experiments.

    The subplot in the middle displays the log fold change on the x-axis and the maximum
    expression of the two experiments on the y-axis (or the average expression if
    'plot_average_expression' is True). The subplots on the left and right display
    entries with only missing values in one of the two experiments.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        experiment_pair: The names of the two experiments that will be compared,
            experiments must be present in qtable.design.
        comparison_tag: String used in comparison columns to separate a pair of
            experiments; default " vs ", which corresponds to the MsReport convention.
        plot_average_expression: If True, plot average expression instead of maximum
            expression. Default False.
        special_entries: Optional, allows specifying a list of entries from the
            `qtable.id_column` column to be annotated.
        special_proteins: This argument is deprecated, use 'special_entries' instead.
        annotation_column: Column used for labeling the points of special entries in the
            scatter plot. Default "Gene name". If the 'annotation_column' is not present
            in the `qtable.data` table, the `qtable.id_column` is used instead.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Raises:
        ValueError: If the "Expression" and "Events" columns for the specified
            experiments are missing in the Qtable.

    Returns:
        A matplotlib Figure object and a list of three Axes objects containing the
        expression comparison plots.
    """
    missing_columns = []
    for exp in experiment_pair:
        for tag in ["Expression", "Events"]:
            if f"{tag} {exp}" not in qtable.data.columns:
                missing_columns.append(f"{tag} {exp}")
    missing_columns = sorted(missing_columns)
    if missing_columns:
        raise ValueError(
            f"Missing the required columns in the Qtable: {missing_columns}."
        )

    if special_entries is None:
        special_entries = []
    if special_proteins is not None:
        warnings.warn(
            "The argument 'special_proteins' is deprecated, use 'special_entries' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        special_entries = list(special_entries) + list(special_proteins)

    exp_1, exp_2 = experiment_pair
    comparison_group = comparison_tag.join(experiment_pair)

    qtable_data = qtable.get_data(exclude_invalid=exclude_invalid)
    if annotation_column not in qtable_data.columns:
        annotation_column = qtable.id_column
    total_scatter_area = 5000
    params = {
        "highlight": {
            "s": 10,
            "color": "#E73C40",
            "edgecolor": "#000000",
            "lw": 0.2,
            "zorder": 3,
        },
        "default": {"alpha": 0.75, "color": "#40B7B5", "zorder": 2},
    }

    mask = (qtable_data[f"Events {exp_1}"] + qtable_data[f"Events {exp_2}"]) > 0
    qtable_data = qtable_data[mask]

    only_exp_1 = qtable_data[f"Events {exp_2}"] == 0
    only_exp_2 = qtable_data[f"Events {exp_1}"] == 0
    mask_both = np.invert(np.any([only_exp_1, only_exp_2], axis=0))

    # Test if plotting maximum intensity is better than average
    qtable_data[f"Maximum expression {comparison_group}"] = np.max(
        [qtable_data[f"Expression {exp_2}"], qtable_data[f"Expression {exp_1}"]], axis=0
    )
    qtable_data[f"Average expression {comparison_group}"] = np.nanmean(
        [qtable_data[f"Expression {exp_2}"], qtable_data[f"Expression {exp_1}"]], axis=0
    )

    def scattersize(df: pd.DataFrame, total_area) -> float:
        # Use the function argument rather than a variable from the enclosing scope.
        if len(df) > 0:
            size = min(max(np.sqrt(total_area / df.shape[0]), 0.5), 4)
        else:
            size = 1
        return size

    suptitle_space_inch = 0.4
    ax_height_inch = 3.2
    main_ax_width_inch = 3.2
    side_ax_width_inch = 0.65
    ax_wspace_inch = 0.6
    width_ratios = [side_ax_width_inch, main_ax_width_inch, side_ax_width_inch]

    fig_height = suptitle_space_inch + ax_height_inch
    fig_width = main_ax_width_inch + (side_ax_width_inch + ax_wspace_inch) * 2
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    subplot_wspace = ax_wspace_inch / np.mean(width_ratios)

    fig, axes = plt.subplots(
        1,
        3,
        figsize=fig_size,
        sharey=True,
        gridspec_kw={
            "bottom": 0,
            "top": subplot_top,
            "left": 0,
            "right": 1,
            "wspace": subplot_wspace,
            "width_ratios": width_ratios,
        },
    )
    fig.suptitle(f'Comparison of "Expression": {comparison_group}', y=1)

    # Plot values quantified in both experiments
    ax = axes[1]
    values = qtable_data[mask_both]
    s = scattersize(values, total_scatter_area)
    x_variable = "Ratio [log2]"
    y_variable = (
        "Average expression" if plot_average_expression else "Maximum expression"
    )
    x_col = " ".join([x_variable, comparison_group])
    y_col = " ".join([y_variable, comparison_group])
    x_values = values[x_col]
    y_values = values[y_col]
    ax.grid(axis="both", linestyle="dotted")
    ax.scatter(x_values, y_values, s=s, **params["default"])
    highlight_mask = values[qtable.id_column].isin(special_entries)
    annotated_scatter(
        x_values=x_values[highlight_mask],
        y_values=y_values[highlight_mask],
        labels=values[annotation_column][highlight_mask],
        ax=ax,
        scatter_kws=params["highlight"],
    )

    ax.set_xlabel(x_variable)
    ax.set_ylabel(y_variable)

    # Plot values quantified only in one experiment
    for ax, mask, exp in [(axes[2], only_exp_1, exp_1), (axes[0], only_exp_2, exp_2)]:
        y_variable = f"Expression {exp}"
        values = qtable_data[mask]
        highlight_mask = values[qtable.id_column].isin(special_entries)
        s = scattersize(values, total_scatter_area)

        ax.grid(axis="y", linestyle="dotted")
        ax.set_ylabel(y_variable)

        if len(values) == 0:
            continue

        sns.stripplot(
            y=values[y_variable],
            jitter=True,
            size=np.sqrt(s * 2),
            marker="o",
            edgecolor="none",
            ax=ax,
            **params["default"],
        )

        xlim = -0.2, 0.2
        ax.set_xlim(xlim)
        offsets = ax.collections[0].get_offsets()[highlight_mask]
        annotated_scatter(
            x_values=offsets[:, 0],
            y_values=offsets[:, 1],
            labels=values[annotation_column][highlight_mask],
            ax=ax,
            scatter_kws=params["highlight"],
        )
        ax.set_xlim(xlim)

    # Important to reverse the order here which experiment is on the left and right
    axes[0].set_xlabel(f"Absent in\n{exp_1}")
    axes[2].set_xlabel(f"Absent in\n{exp_2}")

    return fig, axes

pvalue_histogram

pvalue_histogram(
    qtable: Qtable,
    pvalue_tag: str = "P-value",
    comparison_tag: str = " vs ",
    experiment_pairs: Optional[
        Sequence[Iterable[str]]
    ] = None,
    exclude_invalid: bool = True,
) -> tuple[Figure, list[Axes]]

Generates p-value histograms for one or multiple experiment comparisons.

Histograms are generated with 20 bins of size 0.05. The p-value distribution of each experiment comparison is shown with a separate subplot.
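The binning can be reproduced in a few lines. This is a minimal pure-Python sketch of 20 equal-width bins over the range 0 to 1, not the matplotlib call used by the function; `pvalue_bin_counts` is a hypothetical helper, and the value 1.0 is placed in the last bin.

```python
def pvalue_bin_counts(p_values, num_bins=20):
    """Count p-values in equal-width bins spanning the range 0 to 1."""
    counts = [0] * num_bins
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            continue  # skip values outside the valid p-value range
        # A p-value of exactly 1.0 belongs to the last bin.
        index = min(int(p * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


counts = pvalue_bin_counts([0.001, 0.02, 0.049, 0.31, 0.51, 1.0])
print(counts)
```

With a bin width of 0.05, the first bin holds exactly the p-values below the commonly used 0.05 threshold.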

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
pvalue_tag str

String used for matching the pvalue columns; default "P-value", which corresponds to the MsReport convention.

'P-value'
comparison_tag str

String used in comparison columns to separate a pair of experiments; default " vs ", which corresponds to the MsReport convention.

' vs '
experiment_pairs Optional[Sequence[Iterable[str]]]

Optional, list of experiment pairs that will be used for plotting. For each experiment pair a p-value column must exist that follows the format f"{pvalue_tag} {experiment_1}{comparison_tag}{experiment_2}". If None, all experiment comparisons that are found in qtable.data are used.

None
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True

Raises:

Type Description
ValueError

If no experiment pairs are found in the Qtable for the provided p-value tag and comparison tag or if any of the provided experiment pairs does not exist in the Qtable.

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure and a list of Axes objects, containing the p-value plots.
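When no experiment pairs are given, comparisons are discovered from the column names. The sketch below illustrates that lookup, assuming the column format stated for experiment_pairs; `find_comparison_pairs` and the example columns are hypothetical and only mirror the idea.

```python
import itertools


def find_comparison_pairs(
    columns, experiments, pvalue_tag="P-value", comparison_tag=" vs "
):
    """Return ordered experiment pairs that have a matching p-value column."""
    pairs = []
    # Order matters: "A vs B" and "B vs A" are different columns.
    for pair in itertools.permutations(experiments, 2):
        column = f"{pvalue_tag} {comparison_tag.join(pair)}"
        if column in columns:
            pairs.append(pair)
    return pairs


# Hypothetical column names of a table with a single comparison.
columns = ["P-value Treated vs Control", "Ratio [log2] Treated vs Control"]
pairs = find_comparison_pairs(columns, ["Control", "Treated", "Mock"])
print(pairs)
```

An empty result corresponds to the case in which the function raises a ValueError because no comparison was found.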

Source code in msreport\plot\comparison.py
@with_active_style
def pvalue_histogram(
    qtable: Qtable,
    pvalue_tag: str = "P-value",
    comparison_tag: str = " vs ",
    experiment_pairs: Optional[Sequence[Iterable[str]]] = None,
    exclude_invalid: bool = True,
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Generates p-value histograms for one or multiple experiment comparisons.

    Histograms are generated with 20 bins of size 0.05. The p-value distribution of each
    experiment comparison is shown with a separate subplot.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        pvalue_tag: String used for matching the pvalue columns; default "P-value",
            which corresponds to the MsReport convention.
        comparison_tag: String used in comparison columns to separate a pair of
            experiments; default " vs ", which corresponds to the MsReport convention.
        experiment_pairs: Optional, list of experiment pairs that will be used for
            plotting. For each experiment pair a p-value column must exist that follows
            the format f"{pvalue_tag} {experiment_1}{comparison_tag}{experiment_2}".
            If None, all experiment comparisons that are found in qtable.data are used.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Raises:
        ValueError: If no experiment pairs are found in the Qtable for the provided
            p-value tag and comparison tag or if any of the provided experiment pairs
            does not exist in the Qtable.

    Returns:
        A matplotlib Figure and a list of Axes objects, containing the p-value plots.
    """
    data = qtable.get_data(exclude_invalid=exclude_invalid)

    def _get_valid_experiment_pairs(
        pairs: Iterable[Iterable[str]],
    ) -> list[Iterable[str]]:
        valid_pairs = []
        for pair in pairs:
            comparison_group = comparison_tag.join(pair)
            comparison_column = f"{pvalue_tag} {comparison_group}"
            if comparison_column in data.columns:
                valid_pairs.append(pair)
        return valid_pairs

    # Find all experiment pairs
    if experiment_pairs is not None:
        valid_pairs = _get_valid_experiment_pairs(experiment_pairs)
        # Compare pairs directly; building sets fails for unhashable pairs such as lists.
        invalid_pairs = [pair for pair in experiment_pairs if pair not in valid_pairs]
        if invalid_pairs:
            raise ValueError(
                "The following provided experiment pairs do not exist in the Qtable: "
                f"{invalid_pairs}"
            )
    else:
        experiment_pairs = _get_valid_experiment_pairs(
            itertools.permutations(qtable.get_experiments(), 2)
        )
        if not experiment_pairs:
            raise ValueError(
                "No experiment pairs found in the Qtable for p-value tag "
                f"'{pvalue_tag}' and comparison tag '{comparison_tag}'."
            )

    num_plots = len(experiment_pairs)

    suptitle_space_inch = 0.4
    ax_height_inch = 1.8
    ax_width_inch = 1
    ax_wspace_inch = 0.6

    fig_width = num_plots * ax_width_inch + (num_plots - 1) * ax_wspace_inch
    fig_height = ax_height_inch + suptitle_space_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    subplot_wspace = ax_wspace_inch / ax_width_inch

    fig, axes = plt.subplots(1, num_plots, figsize=fig_size, sharex=True, sharey=True)
    if num_plots == 1:
        axes = np.array([axes])
    else:
        axes = np.array(axes)
    fig.subplots_adjust(
        bottom=0, top=subplot_top, left=0, right=1, wspace=subplot_wspace
    )
    fig.suptitle("P-value histogram of pair-wise experiment comparisons", y=1)

    bins = np.arange(0, 1.01, 0.05)
    for ax_pos, experiment_pair in enumerate(experiment_pairs):  # type: ignore
        comparison_group = comparison_tag.join(experiment_pair)
        comparison_column = f"{pvalue_tag} {comparison_group}"
        comparison_label = f"{comparison_tag}\n".join(experiment_pair)
        p_values = data[comparison_column]

        ax = axes[ax_pos]
        ax2 = ax.twinx()
        ax2.set_yticks([])
        ax2.set_ylabel(comparison_label)

        ax.hist(
            p_values,
            bins=bins,
            color=None,
            edgecolor="#215e5d",
            linewidth=1.5,
            zorder=2,
        )
        ax.hist(
            p_values,
            bins=bins,
            color="#40B7B5",
            edgecolor=None,
            linewidth=0,
            zorder=2.1,
        )

        ax.set_xticks([0, 0.5, 1])
        # Need to remove the ticks manually because creating the twin axis somehow
        # overrides the rcParams settings.
        ax.tick_params(
            left=plt.rcParams["ytick.left"], right=plt.rcParams["ytick.right"]
        )
        ax.set_xlabel(pvalue_tag)
        ax.grid(False, axis="x")
        sns.despine(top=True, right=True)

    axes[0].set_ylabel(f"{pvalue_tag} count")
    ax.set_xlim(-0.05, 1.05)

    return fig, axes

volcano_ma

volcano_ma(
    qtable: Qtable,
    experiment_pair: Iterable[str],
    comparison_tag: str = " vs ",
    pvalue_tag: str = "P-value",
    special_entries: Optional[list[str]] = None,
    special_proteins: Optional[list[str]] = None,
    annotation_column: str = "Gene name",
    exclude_invalid: bool = True,
) -> tuple[Figure, list[Axes]]

Generates a volcano and an MA plot for the comparison of two experiments.

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
experiment_pair Iterable[str]

The names of the two experiments that will be compared, experiments must be present in qtable.design.

required
comparison_tag str

String used in comparison columns to separate a pair of experiments; default " vs ", which corresponds to the MsReport convention.

' vs '
pvalue_tag str

String used for matching the pvalue columns; default "P-value", which corresponds to the MsReport convention.

'P-value'
special_entries Optional[list[str]]

Optional, allows specifying a list of entries from the qtable.id_column column to be annotated.

None
special_proteins Optional[list[str]]

This argument is deprecated, use 'special_entries' instead.

None
annotation_column str

Column used for labeling the points of special entries in the scatter plot. Default "Gene name". If the 'annotation_column' is not present in the qtable.data table, the qtable.id_column is used instead.

'Gene name'
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True

Raises:

Type Description
ValueError

If the 'pvalue_tag', "Average expression" or "Ratio [log2]" column is missing in the Qtable for the specified experiment_pair.

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure object and a list of two Axes objects containing the volcano and the MA plot.
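Two small calculations shape the volcano plot: the p-values are shown as -log10 values, and the default marker size shrinks as the number of rows grows. The following sketch restates both with the standard library only; the helper names are hypothetical, and mapping non-positive p-values to infinity is an assumption made so that such points can be filtered out as non-finite.

```python
import math


def neg_log10(p_value):
    """Transform a p-value for the volcano plot y-axis."""
    if p_value <= 0:
        return math.inf  # assumption: treated as non-finite and not plotted
    return -math.log10(p_value)


def default_scatter_size(num_rows):
    """Marker size that decreases between 1,000 and 10,000 rows."""
    clipped = max(min(num_rows, 10000), 1000)
    return 2 / (clipped / 1000)


for n in (500, 5000, 50000):
    print(n, default_scatter_size(n))
print(neg_log10(0.01))
```

Row counts below 1,000 or above 10,000 are clipped, so the marker size stays within the range 0.2 to 2.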

Source code in msreport\plot\comparison.py
@with_active_style
def volcano_ma(
    qtable: Qtable,
    experiment_pair: Iterable[str],
    comparison_tag: str = " vs ",
    pvalue_tag: str = "P-value",
    special_entries: Optional[list[str]] = None,
    special_proteins: Optional[list[str]] = None,
    annotation_column: str = "Gene name",
    exclude_invalid: bool = True,
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Generates a volcano and an MA plot for the comparison of two experiments.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        experiment_pair: The names of the two experiments that will be compared,
            experiments must be present in qtable.design.
        comparison_tag: String used in comparison columns to separate a pair of
            experiments; default " vs ", which corresponds to the MsReport convention.
        pvalue_tag: String used for matching the pvalue columns; default "P-value",
            which corresponds to the MsReport convention.
        special_entries: Optional, allows specifying a list of entries from the
            `qtable.id_column` column to be annotated.
        special_proteins: This argument is deprecated, use 'special_entries' instead.
        annotation_column: Column used for labeling the points of special entries in the
            scatter plot. Default "Gene name". If the 'annotation_column' is not present
            in the `qtable.data` table, the `qtable.id_column` is used instead.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Raises:
        ValueError: If the 'pvalue_tag', "Average expression" or "Ratio [log2]" column
            is missing in the Qtable for the specified experiment_pair.

    Returns:
        A matplotlib Figure object and a list of two Axes objects containing the volcano
        and the MA plot.
    """
    ratio_tag = "Ratio [log2]"
    expression_tag = "Average expression"
    comparison_group = comparison_tag.join(experiment_pair)

    for tag in [ratio_tag, expression_tag, pvalue_tag]:
        tag_column = msreport.helper.find_sample_columns(
            qtable.data, comparison_group, [tag]
        )
        if not tag_column:
            raise ValueError(
                f"Missing the required '{tag}' column for the comparison group "
                f"'{comparison_group}' in the Qtable."
            )

    if special_entries is None:
        special_entries = []
    if special_proteins is not None:
        warnings.warn(
            "The argument 'special_proteins' is deprecated, use 'special_entries' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        special_entries = list(special_entries) + list(special_proteins)

    data = qtable.get_data(exclude_invalid=exclude_invalid)
    if annotation_column not in data.columns:
        annotation_column = qtable.id_column

    scatter_size = 2 / (max(min(data.shape[0], 10000), 1000) / 1000)

    masks = {
        "highlight": data[qtable.id_column].isin(special_entries),
        "default": ~data[qtable.id_column].isin(special_entries),
    }
    params = {
        "highlight": {
            "s": 10,
            "color": "#E73C40",
            "edgecolor": "#000000",
            "lw": 0.2,
            "zorder": 3,
        },
        "default": {"s": scatter_size, "color": "#40B7B5", "zorder": 2},
    }

    for column in msreport.helper.find_sample_columns(
        data, pvalue_tag, [comparison_group]
    ):
        data[column] = np.log10(data[column]) * -1

    suptitle_space_inch = 0.4
    ax_height_inch = 3.2
    ax_width_inch = 3.2
    ax_wspace_inch = 1

    fig_height = suptitle_space_inch + ax_height_inch
    fig_width = ax_width_inch * 2 + ax_wspace_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    subplot_wspace = ax_wspace_inch / ax_width_inch

    fig, axes = plt.subplots(1, 2, figsize=fig_size, sharex=True)
    fig.subplots_adjust(
        bottom=0, top=subplot_top, left=0, right=1, wspace=subplot_wspace
    )
    fig.suptitle(f"Volcano and MA plot of: {comparison_group}", y=1)

    for ax, x_variable, y_variable in [
        (axes[0], ratio_tag, pvalue_tag),
        (axes[1], ratio_tag, expression_tag),
    ]:
        x_col = " ".join([x_variable, comparison_group])
        y_col = " ".join([y_variable, comparison_group])
        x_values = data[x_col]
        y_values = data[y_col]
        xy_labels = data[annotation_column]

        valid_values = np.isfinite(x_values) & np.isfinite(y_values)
        mask_default = masks["default"] & valid_values
        mask_special = masks["highlight"] & valid_values

        ax.scatter(x_values[mask_default], y_values[mask_default], **params["default"])
        annotated_scatter(
            x_values=x_values[mask_special],
            y_values=y_values[mask_special],
            labels=xy_labels[mask_special],
            ax=ax,
            scatter_kws=params["highlight"],
        )

        ax.set_xlabel(x_variable)
        if y_variable == pvalue_tag:
            ax.set_ylabel(f"{y_variable} [-log10]")
        else:
            ax.set_ylabel(f"{y_variable} [log2]")
        ax.grid(axis="both", linestyle="dotted")

    return fig, axes

experiment_ratios

experiment_ratios(
    qtable: Qtable,
    experiments: Optional[str] = None,
    exclude_invalid: bool = True,
    ylim: Sequence[float] = (-2, 2),
) -> tuple[Figure, list[Axes]]

Figure to compare the similarity of expression values between experiments.

Intended to evaluate the bulk distribution of expression values after normalization. For each experiment a subplot is generated, which displays the distribution of log2 ratios to a pseudo reference experiment as a density plot. The pseudo reference values are calculated as the average intensity values of all experiments. Only rows with quantitative values in all experiments are considered.

Requires "Events experiment" columns and that average experiment expression values are calculated. This can be achieved by calling msreport.analyze.analyze_missingness(qtable: Qtable) and msreport.analyze.calculate_experiment_means(qtable: Qtable).
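The pseudo reference calculation can be illustrated with plain Python. This is a minimal sketch, assuming the expression values are already log2-transformed so that a difference corresponds to a log2 ratio; the function name and the toy numbers are made up for illustration.

```python
def log2_ratios_to_pseudo_reference(experiment_means):
    """Subtract the row-wise mean of all experiments from each experiment.

    experiment_means maps an experiment name to a list of log2 expression
    values, one per row, with no missing values (hypothetical toy input).
    """
    experiments = list(experiment_means)
    num_rows = len(experiment_means[experiments[0]])
    ratios = {exp: [] for exp in experiments}
    for row in range(num_rows):
        values = [experiment_means[exp][row] for exp in experiments]
        pseudo_reference = sum(values) / len(values)
        for exp, value in zip(experiments, values):
            ratios[exp].append(value - pseudo_reference)
    return ratios


means = {
    "Control": [20.0, 22.0],
    "Treated": [21.0, 22.0],
    "Mock": [22.0, 22.0],
}
ratios = log2_ratios_to_pseudo_reference(means)
print(ratios)
```

If the experiments are well normalized, the ratios of each experiment should be centered around zero, which is what the density plots are meant to show.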

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
experiments Optional[str]

Optional, list of experiments that will be displayed. If None, all experiments from qtable.design will be used.

None
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True
ylim Sequence[float]

Specifies the displayed range for the log2 ratios on the y-axis. Default is from -2 to 2.

(-2, 2)

Raises:

Type Description
ValueError

If only one experiment is specified in the experiments parameter or if the specified experiments are not present in the qtable design.

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure and a list of Axes objects, containing the comparison plots.

Source code in msreport\plot\distribution.py
@with_active_style
def experiment_ratios(
    qtable: Qtable,
    experiments: Optional[str] = None,
    exclude_invalid: bool = True,
    ylim: Sequence[float] = (-2, 2),
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Figure to compare the similarity of expression values between experiments.

    Intended to evaluate the bulk distribution of expression values after normalization.
    For each experiment a subplot is generated, which displays the distribution of log2
    ratios to a pseudo reference experiment as a density plot. The pseudo reference
    values are calculated as the average intensity values of all experiments. Only rows
    with quantitative values in all experiments are considered.

    Requires "Events experiment" columns and that average experiment expression values
    are calculated. This can be achieved by calling
    `msreport.analyze.analyze_missingness(qtable: Qtable)` and
    `msreport.analyze.calculate_experiment_means(qtable: Qtable)`.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        experiments: Optional, list of experiments that will be displayed. If None, all
            experiments from `qtable.design` will be used.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.
        ylim: Specifies the displayed range for the log2 ratios on the y-axis. Default
            is from -2 to 2.

    Raises:
        ValueError: If only one experiment is specified in the `experiments` parameter
            or if the specified experiments are not present in the qtable design.

    Returns:
        A matplotlib Figure and a list of Axes objects, containing the comparison plots.
    """
    tag: str = "Expression"

    if experiments is not None and len(experiments) == 1:
        raise ValueError(
            "Only one experiment is specified, please provide at least two experiments."
        )
    elif experiments is not None:
        experiments_not_in_design = set(experiments) - set(qtable.design["Experiment"])
        if experiments_not_in_design:
            raise ValueError(
                "All experiments must be present in qtable.design. The following "
                f"experiments are not present: {experiments_not_in_design}"
            )
    else:
        experiments = qtable.design["Experiment"].unique().tolist()

    if len(experiments) < 2:
        fig, ax = plt.subplots(1, 1, figsize=(2.5, 1.3))
        fig.suptitle("Comparison of experiments means", y=1.1)
        ax.text(
            0.5,
            0.5,
            "Comparison not possible.\nOnly one experiment\npresent in design.",
            ha="center",
            va="center",
        )
        ax.grid(False)
        ax.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
        sns.despine(top=False, right=False, fig=fig)
        return fig, np.array([ax])

    sample_data = qtable.make_sample_table(tag, samples_as_columns=True)
    experiment_means = {}
    for experiment in experiments:
        samples = qtable.get_samples(experiment)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            row_means = np.nanmean(sample_data[samples], axis=1)
        experiment_means[experiment] = row_means
    experiment_data = pd.DataFrame(experiment_means)

    # Only consider rows with quantitative values in all experiments
    mask = np.all([(qtable.data[f"Events {exp}"] > 0) for exp in experiments], axis=0)
    if exclude_invalid:
        mask = mask & qtable["Valid"]
    # `mask` is a NumPy array or, after combining with "Valid", a pandas Series;
    # convert to a plain array to avoid index alignment issues with the dataframe
    experiment_data = experiment_data[np.asarray(mask)]
    pseudo_reference = np.nanmean(experiment_data, axis=1)
    ratio_data = experiment_data.subtract(pseudo_reference, axis=0)

    color_wheel = ColorWheelDict()
    for exp in qtable.design["Experiment"].unique():
        _ = color_wheel[exp]
    num_experiments = len(experiments)

    suptitle_space_inch = 0.55
    ax_height_inch = 1.25
    ax_width_inch = 0.65
    ax_wspace_inch = 0.2
    fig_height = ax_height_inch + suptitle_space_inch
    fig_width = num_experiments * ax_width_inch + (num_experiments - 1) * ax_wspace_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    subplot_wspace = ax_wspace_inch / ax_width_inch

    fig, axes = plt.subplots(1, num_experiments, figsize=fig_size, sharey=True)
    fig.subplots_adjust(
        bottom=0, top=subplot_top, left=0, right=1, wspace=subplot_wspace
    )
    fig.suptitle("Comparison of experiments means", y=1)

    for exp_pos, experiment in enumerate(experiments):
        ax = axes[exp_pos]
        values = ratio_data[experiment]
        color = color_wheel[experiment]
        sns.kdeplot(y=values, fill=True, ax=ax, zorder=3, color=color, alpha=0.5)
        if exp_pos == 0:
            ax.set_title(
                f"n={str(len(values))}",
                fontsize=plt.rcParams["xtick.labelsize"],
                loc="left",
            )
        ax.tick_params(labelbottom=False)
        ax.set_xlabel(experiment, rotation=90)

    axes[0].set_ylabel("Ratio [log2]\nto pseudo reference")
    axes[0].set_ylim(ylim)
    for ax in axes:
        ax.axhline(y=0, color="#999999", lw=1, zorder=2)
        ax.grid(False, axis="x")
    sns.despine(top=True, right=True, fig=fig)
    return fig, axes
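
The ratio calculation in the listing above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: `ratios_to_pseudo_reference` is a hypothetical helper, the input is a plain dictionary of log2 experiment means rather than a `Qtable`, and a missing value is represented as `None`.

```python
from statistics import fmean


def ratios_to_pseudo_reference(experiment_means):
    """Return log2 ratios of each experiment to the row-wise pseudo reference.

    `experiment_means` maps experiment names to equally long lists of log2
    means; `None` marks a row without quantitative values in that experiment.
    """
    experiments = list(experiment_means)
    num_rows = len(experiment_means[experiments[0]])
    ratios = {experiment: [] for experiment in experiments}
    for row in range(num_rows):
        values = [experiment_means[experiment][row] for experiment in experiments]
        # Only rows that are quantified in all experiments are considered
        if any(value is None for value in values):
            continue
        # The pseudo reference is the average over all experiments
        pseudo_reference = fmean(values)
        for experiment, value in zip(experiments, values):
            # In log2 space a ratio is a difference
            ratios[experiment].append(value - pseudo_reference)
    return ratios


example = {
    "Control": [20.0, 18.0, None],
    "Treated": [22.0, 18.0, 25.0],
}
ratios = ratios_to_pseudo_reference(example)
```

Because the pseudo reference is the row mean, the ratios of every retained row sum to zero across experiments, so a density that is shifted away from zero points to a systematic difference of that experiment.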

replicate_ratios

replicate_ratios(
    qtable: Qtable,
    exclude_invalid: bool = True,
    xlim: Iterable[float] = (-2, 2),
) -> tuple[Figure, list[Axes]]

Figure to compare the similarity of expression values between replicates.

Displays the distribution of pair-wise log2 ratios between samples of the same experiment. Comparisons of the same experiment are placed in the same row. Requires log2 transformed expression values.
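
As a rough illustration of what "pair-wise log2 ratios" means here, the following sketch computes the ratios for every combination of two replicates. It works on plain lists instead of a `Qtable`; `replicate_log2_ratios` is a hypothetical helper name, and `math.nan` is assumed to mark missing values.

```python
import itertools
import math


def replicate_log2_ratios(samples):
    """Return pair-wise log2 ratios for all combinations of two samples.

    `samples` maps sample names to equally long lists of log2 values, with
    `math.nan` marking missing values.
    """
    ratios = {}
    for s1, s2 in itertools.combinations(samples, 2):
        differences = [a - b for a, b in zip(samples[s1], samples[s2])]
        # Drop rows where either sample has no finite value
        ratios[(s1, s2)] = [d for d in differences if math.isfinite(d)]
    return ratios


experiment = {
    "rep1": [10.0, 12.0, math.nan],
    "rep2": [10.5, 11.0, 9.0],
    "rep3": [9.5, 12.0, 9.5],
}
pairwise = replicate_log2_ratios(experiment)
num_pairs = len(pairwise)
```

An experiment with n replicates yields n * (n - 1) / 2 comparisons, which is why the number of subplots per row grows quickly with the number of replicates.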

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| qtable | Qtable | A Qtable instance whose data is used for plotting. | required |
| exclude_invalid | bool | If True, rows are filtered according to the Boolean entries of the "Valid" column. | True |
| xlim | Iterable[float] | Specifies the displayed range for the log2 ratios on the x-axis. Default is from -2 to 2. | (-2, 2) |

Returns:

| Type | Description |
| --- | --- |
| tuple[Figure, list[Axes]] | A matplotlib Figure and a list of Axes objects, containing the comparison plots. |

Source code in msreport\plot\distribution.py
@with_active_style
def replicate_ratios(
    qtable: Qtable,
    exclude_invalid: bool = True,
    xlim: Iterable[float] = (-2, 2),
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Figure to compare the similarity of expression values between replicates.

    Displays the distribution of pair-wise log2 ratios between samples of the same
    experiment. Comparisons of the same experiment are placed in the same row. Requires
    log2 transformed expression values.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.
        xlim: Specifies the displayed range for the log2 ratios on the x-axis. Default
            is from -2 to 2.

    Returns:
        A matplotlib Figure and a list of Axes objects, containing the comparison plots.
    """
    tag: str = "Expression"
    table = qtable.make_sample_table(
        tag, samples_as_columns=True, exclude_invalid=exclude_invalid
    )
    design = qtable.get_design()

    color_wheel = ColorWheelDict()
    for exp in design["Experiment"].unique():
        _ = color_wheel[exp]

    experiments = []
    for experiment in design["Experiment"].unique():
        if len(qtable.get_samples(experiment)) >= 2:
            experiments.append(experiment)
    if not experiments:
        fig, ax = plt.subplots(1, 1, figsize=(2, 1.3))
        fig.suptitle("Pair wise comparison of replicates", y=1.1)
        ax.text(0.5, 0.5, "No replicate\ndata available", ha="center", va="center")
        ax.grid(False)
        ax.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
        sns.despine(top=False, right=False, fig=fig)
        return fig, np.array([ax])

    num_experiments = len(experiments)
    max_replicates = max([len(qtable.get_samples(exp)) for exp in experiments])
    max_combinations = len(list(itertools.combinations(range(max_replicates), 2)))

    suptitle_space_inch = 0.55
    ax_height_inch = 0.6
    ax_width_inch = 1.55
    ax_hspace_inch = 0.35
    fig_height = (
        num_experiments * ax_height_inch
        + (num_experiments - 1) * ax_hspace_inch
        + suptitle_space_inch
    )
    fig_width = max_combinations * ax_width_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    subplot_hspace = ax_hspace_inch / ax_height_inch

    fig, axes = plt.subplots(
        num_experiments, max_combinations, figsize=fig_size, sharex=True
    )
    if num_experiments == 1 and max_combinations == 1:
        axes = np.array([[axes]])
    elif num_experiments == 1:
        axes = np.array([axes])
    elif max_combinations == 1:
        axes = np.array([axes]).T
    fig.subplots_adjust(
        bottom=0, top=subplot_top, left=0, right=1, hspace=subplot_hspace
    )
    fig.suptitle("Pair wise comparison of replicates", y=1)

    for row, experiment in enumerate(experiments):
        sample_combinations = itertools.combinations(qtable.get_samples(experiment), 2)
        for col, (s1, s2) in enumerate(sample_combinations):
            s1_label = design.loc[(design["Sample"] == s1), "Replicate"].tolist()[0]
            s2_label = design.loc[(design["Sample"] == s2), "Replicate"].tolist()[0]
            ax = axes[row, col]
            ratios = table[s1] - table[s2]
            ratios = ratios[np.isfinite(ratios)]
            ylabel = experiment if col == 0 else ""
            title = f"{s1_label} vs {s2_label}"
            color = color_wheel[experiment]

            sns.kdeplot(x=ratios, fill=True, ax=ax, zorder=3, color=color, alpha=0.5)
            ax.set_title(title, fontsize=plt.rcParams["axes.labelsize"])
            ax.set_ylabel(ylabel, rotation=0, va="center", ha="right")
            ax.set_xlabel("Ratio [log2]")
            ax.tick_params(labelleft=False)
            ax.locator_params(axis="x", nbins=5)

    axes[0, 0].set_xlim(xlim)
    for ax in axes.flatten():
        if not ax.has_data():
            ax.remove()
            continue

        ax.axvline(x=0, color="#999999", lw=1, zorder=2)
        ax.grid(False, axis="y")
    sns.despine(top=True, right=True, fig=fig)

    return fig, axes
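
The listings in this module size figures from fixed inch values and then convert them into the fractional values that matplotlib expects. A minimal sketch of that arithmetic, using the inch constants from the listing above as defaults; `grid_layout` is a hypothetical helper name and not part of msreport.

```python
def grid_layout(
    num_rows,
    num_cols,
    ax_height=0.6,
    ax_width=1.55,
    hspace=0.35,
    suptitle_space=0.55,
):
    """Convert inch-based axes sizes to a figure size and subplot parameters."""
    fig_height = num_rows * ax_height + (num_rows - 1) * hspace + suptitle_space
    fig_width = num_cols * ax_width
    return {
        "figsize": (fig_width, fig_height),
        # Fraction of the figure height that lies below the title area
        "top": 1 - suptitle_space / fig_height,
        # matplotlib expresses hspace as a fraction of the average axes height
        "hspace": hspace / ax_height,
    }


layout = grid_layout(num_rows=2, num_cols=3)
```

The resulting values would be passed on as `figsize=layout["figsize"]` when creating the subplots and as `fig.subplots_adjust(top=layout["top"], hspace=layout["hspace"])`. Keeping the axes size constant in inches means that plots with different numbers of experiments stay visually comparable.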

expression_clustermap

expression_clustermap(
    qtable: Qtable,
    exclude_invalid: bool = True,
    remove_imputation: bool = True,
    mean_center: bool = False,
    cluster_samples: bool = True,
    cluster_method: str = "average",
) -> ClusterGrid

Plot sample expression values as a hierarchically-clustered heatmap.

By default missing and imputed values are assigned an intensity value of 0 to perform the clustering. Once clustering is done, these values are removed from the heatmap, making them appear white.
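
The preparation described above can be sketched without pandas or seaborn. The example below zero-fills missing values for clustering, records which cells should be hidden afterwards, and shows the row-wise centering that corresponds to `mean_center=True`. Both helper names are hypothetical, and `math.nan` is assumed to mark missing or imputed values.

```python
import math


def zero_fill_with_mask(rows):
    """Return zero-filled values for clustering and a mask for display."""
    filled = [[0.0 if math.isnan(v) else v for v in row] for row in rows]
    # True marks cells that are hidden in the heatmap
    mask = [[math.isnan(v) for v in row] for row in rows]
    return filled, mask


def mean_center(rows):
    """Subtract the row mean from every value of a row."""
    return [[v - sum(row) / len(row) for v in row] for row in rows]


expression = [[20.0, math.nan, 22.0], [18.0, 18.0, 21.0]]
filled, mask = zero_fill_with_mask(expression)
centered = mean_center(filled)
```

Note that in the source listing for this function the mask is only applied when `mean_center` is False; with mean-centering enabled, the zero-filled cells take part in the centering and remain visible.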

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| qtable | Qtable | A Qtable instance whose data is used for plotting. | required |
| exclude_invalid | bool | If True, rows are filtered according to the Boolean entries of the "Valid" column. | True |
| remove_imputation | bool | If True, imputed values are set to 0 before clustering. Defaults to True. | True |
| mean_center | bool | If True, the data is mean-centered before clustering. Defaults to False. | False |
| cluster_samples | bool | If True, sample order is determined by hierarchical clustering. Otherwise, the order is determined by the order of samples in the qtable design. Defaults to True. | True |
| cluster_method | str | Linkage method to use for calculating clusters. See scipy.cluster.hierarchy.linkage documentation for more information. | 'average' |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If less than two samples are present in the qtable. |

Returns:

| Type | Description |
| --- | --- |
| ClusterGrid | A seaborn ClusterGrid instance. Note that ClusterGrid has a savefig method that can be used for saving the figure. |

Source code in msreport\plot\multivariate.py
@with_active_style
def expression_clustermap(
    qtable: Qtable,
    exclude_invalid: bool = True,
    remove_imputation: bool = True,
    mean_center: bool = False,
    cluster_samples: bool = True,
    cluster_method: str = "average",
) -> sns.matrix.ClusterGrid:
    """Plot sample expression values as a hierarchically-clustered heatmap.

    By default missing and imputed values are assigned an intensity value of 0 to
    perform the clustering. Once clustering is done, these values are removed from the
    heatmap, making them appear white.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.
        remove_imputation: If True, imputed values are set to 0 before clustering.
            Defaults to True.
        mean_center: If True, the data is mean-centered before clustering. Defaults to
            False.
        cluster_samples: If True, sample order is determined by hierarchical clustering.
            Otherwise, the order is determined by the order of samples in the qtable
            design. Defaults to True.
        cluster_method: Linkage method to use for calculating clusters. See
            `scipy.cluster.hierarchy.linkage` documentation for more information.

    Raises:
        ValueError: If less than two samples are present in the qtable.

    Returns:
        A seaborn ClusterGrid instance. Note that ClusterGrid has a `savefig` method
        that can be used for saving the figure.
    """
    tag: str = "Expression"
    samples = qtable.get_samples()
    experiments = qtable.get_experiments()

    if len(samples) < 2:
        raise ValueError("At least two samples are required to generate a clustermap.")

    data = qtable.make_expression_table(samples_as_columns=True)
    data = data[samples]

    for sample in samples:
        if remove_imputation:
            data.loc[qtable.data[f"Missing {sample}"], sample] = 0
        data[sample] = data[sample].fillna(0)

    if not mean_center:
        # Hide missing values in the heatmap, making them appear white
        mask_values = qtable.data[
            [f"Missing {sample}" for sample in samples]
        ].to_numpy()
    else:
        mask_values = np.zeros(data.shape, dtype=bool)

    if exclude_invalid:
        data = data[qtable.data["Valid"]]
        mask_values = mask_values[qtable.data["Valid"]]

    color_wheel = ColorWheelDict()
    for exp in experiments:
        _ = color_wheel[exp]
    sample_colors = [color_wheel[qtable.get_experiment(sample)] for sample in samples]

    suptitle_space_inch = 0.4
    sample_width_inch = 0.27
    cbar_height_inch = 3
    cbar_width_inch = sample_width_inch
    cbar_gap_inch = sample_width_inch
    col_colors_height_inch = 0.12
    col_dendrogram_height_inch = 0.6 if cluster_samples else 0.0
    heatmap_height_inch = 3
    heatmap_width_inch = len(samples) * sample_width_inch

    fig_width = cbar_width_inch + heatmap_width_inch + cbar_gap_inch
    fig_height = (
        suptitle_space_inch
        + col_dendrogram_height_inch
        + col_colors_height_inch
        + heatmap_height_inch
    )
    fig_size = fig_width, fig_height

    heatmap_width = heatmap_width_inch / fig_width
    heatmap_x0 = 0
    heatmap_height = heatmap_height_inch / fig_height
    heatmap_y0 = 0
    col_colors_height = col_colors_height_inch / fig_height
    col_colors_y0 = heatmap_y0 + heatmap_height
    col_dendrogram_height = col_dendrogram_height_inch / fig_height
    col_dendrogram_y0 = col_colors_y0 + col_colors_height
    cbar_widh = cbar_width_inch / fig_width
    cbar_x0 = (heatmap_width_inch + cbar_gap_inch) / fig_width
    cbar_height = cbar_height_inch / fig_height
    cbar_y0 = col_colors_y0 - cbar_height

    heatmap_args: dict[str, Any] = {
        "cmap": "magma",
        "yticklabels": False,
        "figsize": fig_size,
    }
    if mean_center:
        data = data.sub(data.mean(axis=1), axis=0)
        heatmap_args.update({"vmin": -2.5, "vmax": 2.5, "center": 0, "cmap": "vlag"})

    # Generate the plot
    grid = sns.clustermap(
        data,
        col_cluster=cluster_samples,
        col_colors=sample_colors,
        row_colors=["#000000" for _ in range(len(data))],
        mask=mask_values,
        method=cluster_method,
        metric="euclidean",
        **heatmap_args,
    )
    # Relocate clustermap axes to create a consistent layout
    grid.figure.suptitle(f'Hierarchically-clustered heatmap of "{tag}" values', y=1)
    grid.figure.delaxes(grid.ax_row_colors)
    grid.figure.delaxes(grid.ax_row_dendrogram)
    grid.ax_heatmap.set_position(
        [heatmap_x0, heatmap_y0, heatmap_width, heatmap_height]
    )
    grid.ax_col_colors.set_position(
        [heatmap_x0, col_colors_y0, heatmap_width, col_colors_height]
    )
    grid.ax_col_dendrogram.set_position(
        [heatmap_x0, col_dendrogram_y0, heatmap_width, col_dendrogram_height]
    )
    grid.ax_cbar.set_position([cbar_x0, cbar_y0, cbar_widh, cbar_height])

    # manually set xticks to guarantee that all samples are displayed
    if cluster_samples:
        sample_order = [samples[i] for i in grid.dendrogram_col.reordered_ind]
    else:
        sample_order = samples
    sample_ticks = np.arange(len(sample_order)) + 0.5
    grid.ax_heatmap.grid(False)
    grid.ax_heatmap.set_xticks(sample_ticks, labels=sample_order)
    grid.ax_heatmap.tick_params(
        axis="x", labelsize=plt.rcParams["axes.labelsize"], rotation=90
    )

    grid.ax_heatmap.set_facecolor("#F9F9F9")

    for ax in [grid.ax_heatmap, grid.ax_cbar, grid.ax_col_colors]:
        sns.despine(top=False, right=False, left=False, bottom=False, ax=ax)
        for spine in ["top", "right", "left", "bottom"]:
            ax.spines[spine].set_linewidth(0.75)
    return grid

sample_pca

sample_pca(
    qtable: Qtable,
    tag: str = "Expression",
    pc_x: str = "PC1",
    pc_y: str = "PC2",
    exclude_invalid: bool = True,
) -> tuple[Figure, list[Axes]]

Figure to compare sample similarities with a principal component analysis.

In the left subplot, two principal components of log2-transformed, mean-centered intensity values are shown. In the right subplot, the explained variance of the principal components is displayed as a bar plot.

It is possible to use intensity columns that are either log-transformed or not. The intensity values undergo an automatic evaluation to determine if they are already in log-space, and if necessary, they are transformed accordingly.
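
As a rough guide to which values `pc_x` and `pc_y` accept: the source listing for this function computes `min(number of samples - 1, 9)` components and labels them "PC1", "PC2", and so on. A minimal sketch of that labelling; `available_component_labels` is a hypothetical helper and not part of msreport.

```python
def available_component_labels(num_samples, max_components=9):
    """Return the component labels that can be passed as pc_x or pc_y."""
    num_components = min(num_samples - 1, max_components)
    return [f"PC{i + 1}" for i in range(num_components)]


labels_small = available_component_labels(4)
labels_large = available_component_labels(24)
```

With four samples only "PC1" to "PC3" are available, while large designs are capped at "PC9". Requesting a label outside this list would fail when the component is looked up.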

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| qtable | Qtable | A Qtable instance whose data is used for plotting. | required |
| tag | str | A string that is used to extract intensity containing columns. Default "Expression". | 'Expression' |
| pc_x | str | Principal component to plot on the x-axis of the scatter plot, default "PC1". The number of calculated principal components is the number of samples minus one, capped at 9. | 'PC1' |
| pc_y | str | Principal component to plot on the y-axis of the scatter plot, default "PC2". The number of calculated principal components is the number of samples minus one, capped at 9. | 'PC2' |
| exclude_invalid | bool | If True, rows are filtered according to the Boolean entries of the "Valid" column. | True |

Returns:

| Type | Description |
| --- | --- |
| tuple[Figure, list[Axes]] | A matplotlib Figure and a list of Axes objects, containing the PCA plots. |

Source code in msreport\plot\multivariate.py
@with_active_style
def sample_pca(
    qtable: Qtable,
    tag: str = "Expression",
    pc_x: str = "PC1",
    pc_y: str = "PC2",
    exclude_invalid: bool = True,
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Figure to compare sample similarities with a principal component analysis.

    In the left subplot, two principal components of log2-transformed, mean-centered
    intensity values are shown. In the right subplot, the explained variance of the
    principal components is displayed as a bar plot.

    It is possible to use intensity columns that are either log-transformed or not. The
    intensity values undergo an automatic evaluation to determine if they are already
    in log-space, and if necessary, they are transformed accordingly.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        tag: A string that is used to extract intensity containing columns.
            Default "Expression".
        pc_x: Principal component to plot on the x-axis of the scatter plot, default
            "PC1". The number of calculated principal components is the number of
            samples minus one, capped at 9.
        pc_y: Principal component to plot on the y-axis of the scatter plot, default
            "PC2". The number of calculated principal components is the number of
            samples minus one, capped at 9.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Returns:
        A matplotlib Figure and a list of Axes objects, containing the PCA plots.
    """
    design = qtable.get_design()
    if design.shape[0] < 3:
        fig, ax = plt.subplots(1, 1, figsize=(2, 1.3))
        fig.suptitle(f'PCA of "{tag}" values', y=1.1)
        ax.text(
            0.5,
            0.5,
            "PCA analysis cannot\nbe performed with\nless than 3 samples",
            ha="center",
            va="center",
        )
        ax.grid(False)
        ax.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
        sns.despine(top=True, right=True, fig=fig)
        return fig, np.array([ax])

    table = qtable.make_sample_table(
        tag, samples_as_columns=True, exclude_invalid=exclude_invalid
    )
    table = table.replace({0: np.nan})
    table = table[np.isfinite(table).sum(axis=1) > 0]
    if not msreport.helper.intensities_in_logspace(table):
        table = np.log2(table)
    table[table.isna()] = 0

    table = table.transpose()
    sample_index = table.index.tolist()
    table = sklearn.preprocessing.scale(table, with_std=False)

    num_components = min(len(sample_index) - 1, 9)
    pca = sklearn.decomposition.PCA(n_components=num_components)
    components = pca.fit_transform(table)
    component_labels = ["PC{}".format(i + 1) for i in range(components.shape[1])]
    components_table = pd.DataFrame(
        data=components, columns=component_labels, index=sample_index
    )
    variance = pca.explained_variance_ratio_ * 100
    variance_lookup = dict(zip(component_labels, variance, strict=True))

    # Prepare colors
    color_wheel = ColorWheelDict()
    for exp in qtable.get_experiments():
        _ = color_wheel[exp]

    # Prepare figure
    num_legend_cols = 3  # math.ceil(len(qtable.get_experiments()) / 8)
    bar_width = 0.8
    bar_width_inches = 0.25
    x_padding = 0.25

    suptitle_space_inch = 0.4
    ax_height_inch = 2.7
    ax_width_inch = ax_height_inch
    ax_wspace_inch = 0.6
    bar_ax_width_inch = (num_components + (2 * x_padding)) * bar_width_inches
    width_ratios = [ax_width_inch, bar_ax_width_inch]

    fig_height = suptitle_space_inch + ax_height_inch
    fig_width = ax_height_inch + bar_ax_width_inch + ax_wspace_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    subplot_wspace = ax_wspace_inch / np.mean([ax_width_inch, bar_ax_width_inch])

    bar_half_width = 0.5
    lower_xbound = (0 - bar_half_width) - x_padding
    upper_xbound = (num_components - 1) + bar_half_width + x_padding

    fig, axes = plt.subplots(
        1,
        2,
        figsize=fig_size,
        gridspec_kw={
            "bottom": 0,
            "top": subplot_top,
            "left": 0,
            "right": 1,
            "wspace": subplot_wspace,
            "width_ratios": width_ratios,
        },
    )
    fig.suptitle(f'PCA of "{tag}" values', y=1)

    # Comparison of two principal components
    ax = axes[0]
    texts = []
    for sample, data in components_table.iterrows():
        experiment = qtable.get_experiment(str(sample))
        label = design.loc[(design["Sample"] == sample), "Replicate"].tolist()[0]
        color = color_wheel[experiment]
        edge_color = color_wheel.modified_color(experiment, 0.4)
        ax.scatter(
            data[pc_x],
            data[pc_y],
            color=color,
            edgecolor=edge_color,
            lw=0.7,
            s=50,
            label=experiment,
        )
        texts.append(ax.text(data[pc_x], data[pc_y], label))
    adjustText.adjust_text(
        texts,
        force_text=0.15,
        expand_points=(1.4, 1.4),
        lim=20,
        ax=ax,
    )
    ax.set_xlabel(f"{pc_x} ({variance_lookup[pc_x]:.1f}%)")
    ax.set_ylabel(f"{pc_y} ({variance_lookup[pc_y]:.1f}%)")
    ax.grid(axis="both", linestyle="dotted")

    # Explained variance bar plot
    ax = axes[1]
    xpos = range(len(variance))
    ax.bar(xpos, variance, width=bar_width, color="#D0D0D0", edgecolor="#000000")
    ax.set_xticks(xpos)
    ax.set_xticklabels(
        component_labels,
        rotation="vertical",
        ha="center",
        size=plt.rcParams["axes.labelsize"],
    )
    ax.set_ylabel("Explained variance [%]")
    ax.grid(False, axis="x")
    ax.set_xlim(lower_xbound, upper_xbound)

    handles, labels = axes[0].get_legend_handles_labels()
    experiment_handles = dict(zip(labels, handles, strict=True))

    first_ax_bbox = axes[1].get_position()
    legend_xgap_inches = 0.25
    legend_ygap_inches = 0.03
    legend_bbox_x = first_ax_bbox.x1 + (legend_xgap_inches / fig.get_figwidth())
    legend_bbox_y = first_ax_bbox.y1 + (legend_ygap_inches / fig.get_figheight())
    num_legend_cols = int(np.ceil(len(qtable.get_experiments()) / 12))
    fig.legend(
        handles=experiment_handles.values(),
        loc="upper left",
        bbox_to_anchor=(legend_bbox_x, legend_bbox_y),
        title="Experiment",
        alignment="left",
        frameon=False,
        borderaxespad=0,
        ncol=num_legend_cols,
    )

    return fig, axes

contaminants

contaminants(
    qtable: Qtable, tag: str = "iBAQ intensity"
) -> tuple[Figure, Axes]

A bar plot that displays relative contaminant amounts (iBAQ) per sample.

Requires "iBAQ intensity" columns for each sample, and a "Potential contaminant" column to identify the potential contaminant entries.

The relative iBAQ values are calculated as: sum of contaminant iBAQ intensities / sum of all iBAQ intensities * 100

It is possible to use intensity columns that are either log-transformed or not. The intensity values undergo an automatic evaluation to determine if they are already in log-space, and if necessary, they are transformed accordingly.
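
The calculation above can be written out for a single sample as a minimal sketch. `relative_contaminant_amount` is a hypothetical helper, the inputs are plain lists rather than `Qtable` columns, and the sketch assumes that there are no missing values.

```python
def relative_contaminant_amount(intensities, is_contaminant, log2_space=False):
    """Return the contaminant share of the summed intensity in percent."""
    if log2_space:
        # Sums are only meaningful in linear space, so undo the log2 transform
        intensities = [2**value for value in intensities]
    total = sum(intensities)
    contaminant_total = sum(
        value for value, flag in zip(intensities, is_contaminant) if flag
    )
    return contaminant_total / total * 100


ibaq = [600.0, 300.0, 100.0]
flags = [False, False, True]
percent = relative_contaminant_amount(ibaq, flags)

# log2 values 3 and 1 correspond to linear intensities 8 and 2
percent_from_log = relative_contaminant_amount(
    [3.0, 1.0], [True, False], log2_space=True
)
```

In the first example the single contaminant entry contributes 100 of 1000 intensity units, which is 10 percent. The second example shows why log-transformed values have to be converted back before summing.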

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| qtable | Qtable | A Qtable instance whose data is used for plotting. | required |
| tag | str | A string that is used to extract iBAQ intensity containing columns. Default "iBAQ intensity". | 'iBAQ intensity' |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the "Potential contaminant" column is missing in the Qtable data, or if the Qtable does not contain any columns for the specified 'tag'. |

Returns:

| Type | Description |
| --- | --- |
| tuple[Figure, Axes] | A matplotlib Figure and an Axes object, containing the contaminants plot. |

Source code in msreport\plot\quality.py
@with_active_style
def contaminants(
    qtable: Qtable, tag: str = "iBAQ intensity"
) -> tuple[plt.Figure, plt.Axes]:
    """A bar plot that displays relative contaminant amounts (iBAQ) per sample.

    Requires "iBAQ intensity" columns for each sample, and a "Potential contaminant"
    column to identify the potential contaminant entries.

    The relative iBAQ values are calculated as:
    sum of contaminant iBAQ intensities / sum of all iBAQ intensities * 100

    It is possible to use intensity columns that are either log-transformed or not. The
    intensity values undergo an automatic evaluation to determine if they are already
    in log-space, and if necessary, they are transformed accordingly.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        tag: A string that is used to extract iBAQ intensity containing columns.
            Default "iBAQ intensity".

    Raises:
        ValueError: If the "Potential contaminant" column is missing in the Qtable data.
            If the Qtable does not contain any columns for the specified 'tag'.

    Returns:
        A matplotlib Figure and an Axes object, containing the contaminants plot.
    """
    if "Potential contaminant" not in qtable.data.columns:
        raise ValueError(
            "The 'Potential contaminant' column is missing in the Qtable data."
        )
    data = qtable.make_sample_table(tag, samples_as_columns=True)
    if data.empty:
        raise ValueError(f"The Qtable does not contain any '{tag}' columns.")
    if msreport.helper.intensities_in_logspace(data):
        data = np.power(2, data)

    relative_intensity = data / data.sum() * 100
    contaminants = qtable["Potential contaminant"]
    samples = data.columns.to_list()

    color_wheel = ColorWheelDict()
    colors = [color_wheel[exp] for exp in qtable.get_experiments(samples)]
    dark_colors = [
        color_wheel.modified_color(exp, 0.4) for exp in qtable.get_experiments(samples)
    ]

    num_samples = len(samples)
    x_values = range(relative_intensity.shape[1])
    bar_values = relative_intensity[contaminants].sum(axis=0)

    suptitle_space_inch = 0.4
    ax_height_inch = 1.6
    bar_width_inches = 0.24
    x_padding = 0.24

    fig_height = ax_height_inch + suptitle_space_inch
    fig_width = (num_samples + (2 * x_padding)) * bar_width_inches
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)

    bar_width = 0.8
    bar_half_width = 0.5
    lower_xbound = (0 - bar_half_width) - x_padding
    upper_xbound = (num_samples - 1) + bar_half_width + x_padding
    min_upper_ybound = 5

    fig, ax = plt.subplots(figsize=fig_size)
    fig.subplots_adjust(bottom=0, top=subplot_top, left=0, right=1)
    fig.suptitle("Relative amount of contaminants", y=1)

    ax.bar(
        x_values,
        bar_values,
        width=bar_width,
        color=colors,
        edgecolor=dark_colors,
        zorder=3,
    )
    ax.set_xticks(x_values)
    ax.set_xticklabels(samples, fontsize=plt.rcParams["axes.labelsize"], rotation=90)
    ax.set_ylabel(f"Sum contaminant\n{tag} [%]")

    ax.grid(False, axis="x")
    sns.despine(top=True, right=True)

    ax.set_ylim(0, max(min_upper_ybound, ax.get_ylim()[1]))
    ax.set_xlim(lower_xbound, upper_xbound)
    return fig, ax

missing_values_horizontal

missing_values_horizontal(
    qtable: Qtable, exclude_invalid: bool = True
) -> tuple[Figure, Axes]

Horizontal bar plot to analyze the completeness of quantification.

Requires the columns "Missing experiment_name" and "Events experiment_name", which are added by calling msreport.analyze.analyze_missingness(qtable: Qtable).
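
To make the three categories of the plot concrete, the sketch below classifies rows by the number of missing values in one experiment. It assumes that a "Missing experiment_name" column holds the count of missing values per row, as the source listing for this function suggests; `completeness_counts` is a hypothetical helper name.

```python
def completeness_counts(missing_per_row, num_replicates):
    """Count rows with no, some, or all values missing in one experiment."""
    total = len(missing_per_row)
    all_missing = sum(1 for n in missing_per_row if n == num_replicates)
    none_missing = sum(1 for n in missing_per_row if n == 0)
    return {
        "None missing": none_missing,
        "Some missing": total - all_missing - none_missing,
        "All missing": all_missing,
    }


# Number of missing values per protein in an experiment with three replicates
missing = [0, 0, 1, 3, 2, 0, 3]
counts = completeness_counts(missing, num_replicates=3)
```

The sketch returns disjoint counts. The plot itself draws overlapping bars of cumulative length (all rows, rows not completely missing, rows without missing values), so the visible segment of each colour corresponds to one of these counts.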

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| qtable | Qtable | A Qtable instance whose data is used for plotting. | required |
| exclude_invalid | bool | If True, rows are filtered according to the Boolean entries of the "Valid" column. | True |

Returns:

| Type | Description |
| --- | --- |
| tuple[Figure, Axes] | A matplotlib Figure and Axes object, containing the missing values plot. |

Source code in msreport\plot\quality.py
@with_active_style
def missing_values_horizontal(
    qtable: Qtable,
    exclude_invalid: bool = True,
) -> tuple[plt.Figure, plt.Axes]:
    """Horizontal bar plot to analyze the completeness of quantification.

    Requires the columns "Missing experiment_name" and "Events experiment_name", which
    are added by calling msreport.analyze.analyze_missingness(qtable: Qtable).

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Returns:
        A matplotlib Figure and Axes object, containing the missing values plot.
    """
    experiments = qtable.get_experiments()
    num_experiments = len(experiments)
    qtable_data = qtable.get_data(exclude_invalid=exclude_invalid)

    data: dict[str, list] = {"exp": [], "max": [], "some": [], "min": []}
    for exp in experiments:
        exp_missing = qtable_data[f"Missing {exp}"]
        total = len(exp_missing)
        num_replicates = len(qtable.get_samples(exp))
        missing_all = (exp_missing == num_replicates).sum()
        missing_none = (exp_missing == 0).sum()
        with_missing_some = total - missing_all

        data["exp"].append(exp)
        data["max"].append(total)
        data["some"].append(with_missing_some)
        data["min"].append(missing_none)

    bar_width = 0.35

    suptitle_space_inch = 0.4
    ax_height_inch = num_experiments * bar_width
    ax_width_inch = 4
    fig_height = ax_height_inch + suptitle_space_inch
    fig_width = ax_width_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)

    fig, ax = plt.subplots(figsize=fig_size)
    fig.subplots_adjust(bottom=0, top=subplot_top, left=0, right=1)
    fig.suptitle("Completeness of quantification per experiment", y=1)

    sns.barplot(y="exp", x="max", data=data, label="All missing", color="#EB3952")
    sns.barplot(y="exp", x="some", data=data, label="Some missing", color="#FAB74E")
    sns.barplot(y="exp", x="min", data=data, label="None missing", color="#31A590")

    ax.set_ylabel("")
    ax.set_xlabel("")
    ax.set_xlim(0, total)

    ax.legend().remove()
    handles, labels = ax.get_legend_handles_labels()
    legend_ygap_inches = 0.3
    legend_bbox_y = 0 - (legend_ygap_inches / fig.get_figheight())

    fig.legend(
        handles[::-1],
        labels[::-1],
        bbox_to_anchor=(0.5, legend_bbox_y),
        loc="upper center",
        ncol=3,
        frameon=False,
        borderaxespad=0,
        handlelength=0.95,
        handleheight=1,
    )

    ax.tick_params(axis="y", labelsize=plt.rcParams["axes.labelsize"])
    ax.grid(axis="x", linestyle="solid")
    sns.despine(fig=fig, top=True, right=True, bottom=True)

    return fig, ax
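
The three overlaid bars in the source above can be hard to read at first: each bar starts at zero and is drawn on top of the previous one, so the visible segment of each color is the difference between consecutive bar lengths. The following is a minimal, standard-library sketch of that counting logic only. The helper name `completeness_counts` and the example numbers are assumptions made for illustration and are not part of msreport.

```python
def completeness_counts(missing_per_row, num_replicates):
    """Return (total, with_some_values, complete) for one experiment.

    missing_per_row holds, for each row, the number of replicates without
    a quantified value (the content of a "Missing <experiment>" column).
    """
    total = len(missing_per_row)
    missing_all = sum(1 for m in missing_per_row if m == num_replicates)
    missing_none = sum(1 for m in missing_per_row if m == 0)
    # Rows with at least one quantified replicate.
    with_some_values = total - missing_all
    return total, with_some_values, missing_none


# Hypothetical missing-value counts for five proteins and three replicates.
missing = [0, 0, 1, 3, 2]
total, with_some_values, complete = completeness_counts(missing, num_replicates=3)

# Bars are drawn from longest to shortest, so only the part of a bar that is
# not covered by the next bar remains visible.
visible = {
    "None missing": complete,
    "Some missing": with_some_values - complete,
    "All missing": total - with_some_values,
}
print(visible)
```

The visible segments always sum to the total number of rows, which is why the x-axis limit is set to `total`.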

missing_values_vertical

missing_values_vertical(
    qtable: Qtable, exclude_invalid: bool = True
) -> tuple[Figure, list[Axes]]

Vertical bar plot to analyze the completeness of quantification.

Requires the columns "Missing experiment_name" and "Events experiment_name", which are added by calling msreport.analyze.analyze_missingness(qtable: Qtable).

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure and a list of Axes objects containing the missing values plots.

Source code in msreport\plot\quality.py
@with_active_style
def missing_values_vertical(
    qtable: Qtable,
    exclude_invalid: bool = True,
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Vertical bar plot to analyze the completeness of quantification.

    Requires the columns "Missing experiment_name" and "Events experiment_name", which
    are added by calling msreport.analyze.analyze_missingness(qtable: Qtable).

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Returns:
        A matplotlib Figure and a list of Axes objects containing the missing values
        plots.
    """
    # Deprecated in favor of `missing_values_horizontal`.
    warnings.warn(
        (
            "The function `missing_values_vertical` is deprecated. Use "
            "`missing_values_horizontal` instead."
        ),
        DeprecationWarning,
        stacklevel=2,
    )

    experiments = qtable.get_experiments()
    num_experiments = len(experiments)
    qtable_data = qtable.get_data(exclude_invalid=exclude_invalid)

    barwidth = 0.8
    barcolors = ["#31A590", "#FAB74E", "#EB3952"]
    figwidth = (num_experiments * 1.2) + 0.5
    figsize = (figwidth, 3.5)
    xtick_labels = ["No missing", "Some missing", "All missing"]

    fig, axes = plt.subplots(1, num_experiments, figsize=figsize, sharey=True)
    for exp_num, exp in enumerate(experiments):
        ax = axes[exp_num]

        exp_missing = qtable_data[f"Missing {exp}"]
        exp_values = qtable_data[f"Events {exp}"]
        missing_none = (exp_missing == 0).sum()
        missing_some = ((exp_missing > 0) & (exp_values > 0)).sum()
        missing_all = (exp_values == 0).sum()

        y = [missing_none, missing_some, missing_all]
        x = range(len(y))
        ax.bar(x, y, width=barwidth, color=barcolors)
        if exp_num == 0:
            ax.set_ylabel("# Proteins")
        ax.set_title(exp)
        ax.set_xticks(np.array([0, 1, 2]) + 0.4)
        ax.set_xticklabels(xtick_labels, rotation=45, va="top", ha="right")
        ax.grid(False, axis="x")
    sns.despine(top=True, right=True)
    fig.tight_layout()
    return fig, axes
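
Because this function emits a `DeprecationWarning`, callers may want to check that the warning surfaces in their own code or tests. The sketch below shows the general pattern using only the standard library. `old_plot` is a hypothetical stand-in for a deprecated function and does not call msreport.

```python
import warnings


def old_plot():
    """Hypothetical deprecated function that mimics the warning pattern."""
    warnings.warn(
        "The function `old_plot` is deprecated. Use `new_plot` instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to old_plot
    )
    return "figure"


# Record warnings instead of printing them, and make sure none are filtered.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_plot()

messages = [
    str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
]
print(messages)
```

Depending on the active warning filters, a `DeprecationWarning` raised inside library code may not be displayed, so enabling it explicitly as above is a reasonable way to find remaining calls before migrating to `missing_values_horizontal`.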

sample_correlation

sample_correlation(
    qtable: Qtable,
    exclude_invalid: bool = True,
    labels: bool = False,
) -> tuple[Figure, list[Axes]]

Generates a pair-wise correlation matrix of samples 'Expression' values.

Correlation values are calculated using the Pearson method and the "Expression" values.

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True
labels bool

If True, correlation values are displayed in the heatmap.

False

Raises:

Type Description
ValueError

If fewer than two samples are present in the qtable.

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure and a list of Axes objects, containing the correlation matrix plot and the color bar.

Source code in msreport\plot\quality.py
@with_active_style
def sample_correlation(
    qtable: Qtable, exclude_invalid: bool = True, labels: bool = False
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Generates a pair-wise correlation matrix of samples 'Expression' values.

    Correlation values are calculated using the Pearson method and the "Expression"
    values.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.
        labels: If True, correlation values are displayed in the heatmap.

    Raises:
        ValueError: If fewer than two samples are present in the qtable.

    Returns:
        A matplotlib Figure and a list of Axes objects, containing the correlation
        matrix plot and the color bar.
    """
    num_samples = qtable.design.shape[0]
    if num_samples < 2:
        raise ValueError(
            "At least two samples are required to generate a correlation matrix."
        )
    data = qtable.make_expression_table(
        samples_as_columns=True, exclude_invalid=exclude_invalid
    )
    samples = data.columns.tolist()
    corr = data.corr()
    mask = np.triu(np.ones_like(corr, dtype=bool))

    num_cells = num_samples - 1
    cell_size_inch = 0.3
    suptitle_space_inch = 0.4
    ax_height_inch = ax_width_inch = cell_size_inch * num_cells
    ax_wspace_inch = 0.4
    cbar_height_inch = max(1.2, min(3, cell_size_inch * num_cells))
    cbar_width_inch = 0.27
    width_ratios = [ax_width_inch, cbar_width_inch]
    subplot_wspace = ax_wspace_inch / np.mean([ax_width_inch, cbar_width_inch])

    fig_width = ax_width_inch + cbar_width_inch + ax_wspace_inch
    fig_height = ax_height_inch + suptitle_space_inch
    fig_size = (fig_width, fig_height)

    subplot_top = 1 - (suptitle_space_inch / fig_height)
    cbar_width = cbar_width_inch / fig_width
    cbar_height = cbar_height_inch / fig_height
    cbar_x0 = (ax_width_inch + ax_wspace_inch) / fig_width
    cbar_y0 = (ax_height_inch / fig_height) - cbar_height

    fig, axes = plt.subplots(
        1,
        2,
        figsize=fig_size,
        gridspec_kw={
            "bottom": 0,
            "top": subplot_top,
            "left": 0,
            "right": 1,
            "wspace": subplot_wspace,
            "width_ratios": width_ratios,
        },
    )
    fig.suptitle('Pairwise correlation matrix of sample "Expression" values', y=1)
    ax_heatmap, ax_cbar = axes
    ax_cbar.set_position((cbar_x0, cbar_y0, cbar_width, cbar_height))

    palette = sns.color_palette("rainbow", desat=0.8)
    cmap = mcolors.LinearSegmentedColormap.from_list("rainbow_desat", palette)
    sns.heatmap(
        corr,
        mask=mask,
        cmap=cmap,
        vmax=1,
        vmin=0.5,
        square=False,
        linewidths=0.5,
        ax=ax_heatmap,
    )
    cbar = ax_heatmap.collections[0].colorbar
    if cbar is not None:
        cbar.remove()
    fig.colorbar(ax_heatmap.collections[0], cax=ax_cbar)

    if labels:
        for i, j in itertools.product(range(num_cells + 1), range(num_cells + 1)):
            if i <= j:
                continue
            corr_value = corr.iloc[i, j]
            ax_heatmap.text(
                j + 0.5,
                i + 0.5,
                f"{corr_value:.2f}",
                ha="center",
                va="center",
                fontsize=8,  # Fontsize cannot be larger to fit in the cell
            )
    # Need to manually set ticks because sometimes not all are properly included
    ax_heatmap.set_yticks([i + 0.5 for i in range(1, len(samples))])
    ax_heatmap.set_yticklabels(samples[1:], rotation=0)
    ax_heatmap.set_xticks([i + 0.5 for i in range(0, len(samples) - 1)])
    ax_heatmap.set_xticklabels(samples[:-1], rotation=90)

    ax_heatmap.grid(False)
    ax_heatmap.tick_params(labelsize=plt.rcParams["axes.labelsize"])
    ax_heatmap.set_xlim(0, num_cells)
    ax_heatmap.set_ylim(1 + num_cells, 1)

    sns.despine(left=False, bottom=False, ax=ax_heatmap)
    for ax in [ax_heatmap, ax_cbar]:
        for spine in ["top", "right", "left", "bottom"]:
            ax.spines[spine].set_linewidth(0.75)
    return fig, axes
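
The heatmap shows only the lower triangle of the correlation matrix, which is why the y-axis lists all samples except the first and the x-axis all samples except the last. The following standard-library sketch illustrates which sample pairs end up in the plot. The `pearson` helper and the expression values are made up for illustration, and the sketch assumes there are no missing values, whereas the source relies on pandas `DataFrame.corr`.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences without missing values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 "Expression" values, one list per sample.
expression = {
    "A_1": [20.0, 22.0, 25.0, 27.0],
    "A_2": [20.5, 21.5, 25.5, 26.5],
    "B_1": [21.0, 22.5, 24.0, 27.5],
}
samples = list(expression)

# Keep only cells below the diagonal (row index > column index).
lower_triangle = {}
for i, row_sample in enumerate(samples):
    for j, col_sample in enumerate(samples):
        if i <= j:
            continue
        lower_triangle[(row_sample, col_sample)] = pearson(
            expression[row_sample], expression[col_sample]
        )

for (row_sample, col_sample), value in lower_triangle.items():
    print(f"{row_sample} vs {col_sample}: {value:.2f}")
```

For n samples this yields n * (n - 1) / 2 cells arranged in an (n - 1) by (n - 1) grid, which matches `num_cells = num_samples - 1` in the source.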

sample_intensities

sample_intensities(
    qtable: Qtable,
    tag: str = "Intensity",
    exclude_invalid: bool = True,
) -> tuple[Figure, list[Axes]]

Figure to compare the overall quantitative similarity of samples.

Generates two subplots to compare the intensities of multiple samples. For the top subplot a pseudo reference sample is generated by calculating the average intensity values of all samples. For each row and sample the log2 ratios to the pseudo reference are calculated. Only rows without missing values are selected, and for each sample the log2 ratios to the pseudo reference are displayed as a box plot. The lower subplot displays the summed intensity of all rows per sample as bar plots.

It is possible to use intensity columns that are either log-transformed or not. The intensity values undergo an automatic evaluation to determine if they are already in log-space, and if necessary, they are transformed accordingly.

Parameters:

Name Type Description Default
qtable Qtable

A Qtable instance whose data is used for plotting.

required
tag str

A string that is used to extract intensity containing columns. Default "Intensity".

'Intensity'
exclude_invalid bool

If True, rows are filtered according to the Boolean entries of the "Valid" column.

True

Returns:

Type Description
tuple[Figure, list[Axes]]

A matplotlib Figure and a list of Axes objects, containing the intensity plots.

Source code in msreport\plot\quality.py
@with_active_style
def sample_intensities(
    qtable: Qtable, tag: str = "Intensity", exclude_invalid: bool = True
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Figure to compare the overall quantitative similarity of samples.

    Generates two subplots to compare the intensities of multiple samples. For the top
    subplot a pseudo reference sample is generated by calculating the average intensity
    values of all samples. For each row and sample the log2 ratios to the pseudo
    reference are calculated. Only rows without missing values are selected, and for
    each sample the log2 ratios to the pseudo reference are displayed as a box plot. The
    lower subplot displays the summed intensity of all rows per sample as bar plots.

    It is possible to use intensity columns that are either log-transformed or not. The
    intensity values undergo an automatic evaluation to determine if they are already
    in log-space, and if necessary, they are transformed accordingly.

    Args:
        qtable: A `Qtable` instance whose data is used for plotting.
        tag: A string that is used to extract intensity containing columns.
            Default "Intensity".
        exclude_invalid: If True, rows are filtered according to the Boolean entries of
            the "Valid" column.

    Returns:
        A matplotlib Figure and a list of Axes objects, containing the intensity plots.
    """
    table = qtable.make_sample_table(
        tag, samples_as_columns=True, exclude_invalid=exclude_invalid
    )

    table = table.replace({0: np.nan})
    if msreport.helper.intensities_in_logspace(table):
        log2_table = table
        table = np.power(2, log2_table)
    else:
        log2_table = np.log2(table)
    samples = table.columns.tolist()

    finite_values = log2_table.isna().sum(axis=1) == 0
    pseudo_ref = np.nanmean(log2_table[finite_values], axis=1)
    log2_ratios = log2_table[finite_values].subtract(pseudo_ref, axis=0)

    bar_values = table.sum()
    box_values = [log2_ratios[c] for c in log2_ratios.columns]
    color_wheel = ColorWheelDict()
    colors = [color_wheel[exp] for exp in qtable.get_experiments(samples)]
    edge_colors = [
        color_wheel.modified_color(exp, 0.4) for exp in qtable.get_experiments(samples)
    ]

    fig, axes = box_and_bars(
        box_values, bar_values, samples, colors=colors, edge_colors=edge_colors
    )
    fig.suptitle(f'Comparison of "{tag}" values', y=1)
    axes[0].set_ylabel("Ratio [log2]\nto pseudo reference")
    axes[1].set_ylabel("Total intensity")
    return fig, axes
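
The pseudo reference calculation can be illustrated without pandas. The sketch below follows the steps of the source above: zeros are treated as missing, values are log2-transformed, only rows without missing values are kept, the pseudo reference is the row-wise mean of the log2 values, and each sample is expressed as a log2 ratio to it. Sample names and intensities are invented for illustration, and the input is assumed to be not yet log-transformed.

```python
import math

# Hypothetical raw intensities; None or 0 marks a missing value.
intensities = {
    "Sample_A": [100.0, 400.0, None, 800.0],
    "Sample_B": [200.0, 400.0, 50.0, 1600.0],
}
samples = list(intensities)
num_rows = len(intensities[samples[0]])

# Log2-transform and keep missing values as None.
log2_table = {
    s: [math.log2(v) if v is not None and v > 0 else None for v in values]
    for s, values in intensities.items()
}

# Only rows that are quantified in every sample enter the box plot.
complete_rows = [
    i for i in range(num_rows) if all(log2_table[s][i] is not None for s in samples)
]

# Pseudo reference: row-wise mean of the log2 values across all samples.
pseudo_ref = {
    i: sum(log2_table[s][i] for s in samples) / len(samples) for i in complete_rows
}
log2_ratios = {
    s: [log2_table[s][i] - pseudo_ref[i] for i in complete_rows] for s in samples
}

# The bar plot uses the summed intensity of all rows, including incomplete ones.
total_intensity = {
    s: sum(v for v in values if v is not None) for s, values in intensities.items()
}

print(log2_ratios)
print(total_intensity)
```

In this example `Sample_B` has twice the intensity of `Sample_A` in two of the three complete rows, so its log2 ratios to the pseudo reference are +0.5 in those rows while those of `Sample_A` are -0.5.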

set_active_style

set_active_style(
    style: str | None, rc: dict[str, Any] | None = None
)

Set the active plotting style for the msreport.plot submodule.

The chosen style, potentially modified by the rc dictionary, will be applied temporarily using a context manager within the library's plotting functions. This does not modify the global matplotlib rcParams permanently.

Parameters:

Name Type Description Default
style str | None

The name of the base style to activate. This can be one of the built-in msreport styles (e.g., 'notebook', 'powerpoint'), a standard matplotlib style, or a style registered by another library like Seaborn (if available).

required
rc dict[str, Any] | None

An optional dictionary mapping matplotlib rcParams names (strings) to their desired values. These settings will be applied after the base style, overriding any conflicting parameters from the base style for the duration of the plot context.

None

Raises:

Type Description
ValueError

If the specified base style name is not found among the library's styles or the available matplotlib styles.

TypeError

If rc is not a dictionary or None.

Source code in msreport\plot\style.py
def set_active_style(style: str | None, rc: dict[str, Any] | None = None):
    """Set the active plotting style for the msreport.plot submodule.

    The chosen style, potentially modified by the rc dictionary, will be
    applied temporarily using a context manager within the library's
    plotting functions. This does not modify the global matplotlib rcParams
    permanently.

    Args:
        style: The name of the base style to activate. This can be one of the
            built-in msreport styles (e.g., 'notebook', 'powerpoint'),
            a standard matplotlib style, or a style registered by another
            library like Seaborn (if available).
        rc: An optional dictionary mapping matplotlib rcParams names (strings)
            to their desired values. These settings will be applied *after*
            the base style, overriding any conflicting parameters from the
            base style for the duration of the plot context.

    Raises:
        ValueError: If the specified base style name is not found among the
            library's styles or the available matplotlib styles.
        TypeError: If rc is not a dictionary or None.
    """
    global _active_style_name, _active_style_rc_override

    if style is not None and style not in _AVAILABLE_STYLES:
        current_available = _get_available_styles()
        if style not in current_available:
            raise ValueError(
                f"Style '{style}' not found. Available styles are: "
                f"{', '.join(current_available)}"
            )

    if rc is not None and not isinstance(rc, dict):
        raise TypeError(f"rc argument must be a dictionary or None, got {type(rc)}")

    _active_style_name = style
    _active_style_rc_override = rc.copy() if rc is not None else None
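
The function above only stores the style name and the rc overrides; applying them is left to the `with_active_style` decorator, whose source is not shown in this section. The following standard-library sketch illustrates the general idea of a temporarily applied style. Everything in it is an assumption made for illustration: the plain dictionary `RC_PARAMS` stands in for matplotlib's rcParams, the style values are invented, and `set_style` and `with_style` are hypothetical names, not the msreport implementation.

```python
import contextlib
import functools

RC_PARAMS = {"font.size": 10, "figure.dpi": 100}  # stand-in for matplotlib rcParams
STYLES = {"notebook": {"font.size": 9}, "powerpoint": {"font.size": 14}}  # invented

_active_style = None
_active_rc = None


def set_style(style, rc=None):
    """Store the style name and rc overrides without touching RC_PARAMS."""
    global _active_style, _active_rc
    if style is not None and style not in STYLES:
        raise ValueError(f"Style '{style}' not found.")
    if rc is not None and not isinstance(rc, dict):
        raise TypeError(f"rc argument must be a dictionary or None, got {type(rc)}")
    _active_style = style
    _active_rc = rc.copy() if rc is not None else None


@contextlib.contextmanager
def _style_context():
    """Apply the base style, then the overrides, and restore everything on exit."""
    saved = RC_PARAMS.copy()
    try:
        if _active_style is not None:
            RC_PARAMS.update(STYLES[_active_style])
        if _active_rc is not None:
            RC_PARAMS.update(_active_rc)
        yield
    finally:
        RC_PARAMS.clear()
        RC_PARAMS.update(saved)


def with_style(func):
    """Run the decorated function inside the temporary style context."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _style_context():
            return func(*args, **kwargs)
    return wrapper


@with_style
def font_size_during_plot():
    return RC_PARAMS["font.size"]


set_style("powerpoint", rc={"font.size": 16})
inside = font_size_during_plot()  # override wins over the base style
after = RC_PARAMS["font.size"]    # global settings are unchanged
print(inside, after)
```

With matplotlib, the role of `_style_context` would typically be played by `matplotlib.style.context` and `matplotlib.rc_context`, both of which restore the previous rcParams when the block exits.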

set_dpi

set_dpi(dpi: int) -> None

Changes the default dots per inch settings for matplotlib plots.

This effectively makes figures smaller or larger, without affecting the relative sizes of elements within the figures.

Parameters:

Name Type Description Default
dpi int

New default dots per inch.

required
Source code in msreport\plot\style.py
def set_dpi(dpi: int) -> None:
    """Changes the default dots per inch settings for matplotlib plots.

    This effectively makes figures smaller or larger, without affecting the relative
    sizes of elements within the figures.

    Args:
        dpi: New default dots per inch.
    """
    plt.rcParams["figure.dpi"] = dpi
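
The effect of the dpi setting follows from simple arithmetic: a figure's size in pixels is its size in inches multiplied by the dpi, and sizes given in points (fonts, line widths) scale with the dpi as well, so proportions are preserved. A small sketch of this relationship, with a hypothetical helper `pixel_size` and an example figure size chosen for illustration:

```python
def pixel_size(fig_size_inches, dpi):
    """Return the (width, height) in pixels of a figure rendered at the given dpi."""
    width, height = fig_size_inches
    return round(width * dpi), round(height * dpi)


fig_size = (4.0, 2.5)  # inches, example value

small = pixel_size(fig_size, 100)
large = pixel_size(fig_size, 200)
print(small, large)
```

Doubling the dpi doubles both pixel dimensions while the aspect ratio stays the same.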