magine Documentation
Release 0.1a1
James C. Pino
Aug 26, 2020
Contents
1 Table of contents
    1.1 Installation
    1.2 Data
    1.3 Tutorial
    1.4 MAGINE Modules Reference
2 Indices and tables
Python Module Index
Index
magine Documentation, Release 0.1a1
Welcome to MAGINE's documentation. MAGINE was created to help organize and analyze modern high-throughput data. Specifically, we designed it for multi-sample (time series, drug dose, experimental conditions) and multiple omics platform (RNAseq, ph-silac, silac, label-free, metabolomics) data. The tools are designed to organize and explore raw data:
• Organize data
• Automate enrichment analysis
• Enable sample series enrichment exploration
• Integrate network and enrichment analysis
MAGINE environment.
MAGINE has four main modules:
• Data
• Enrichment
• Networks
• Tools
Our data classes are built to organize and facilitate exploration of both the raw data and the analysis results. The data class is the central structure that enables this.
CHAPTER 1
Table of contents
1.1 Installation
1. Install Anaconda
Our recommended approach is to use Anaconda, which is a distribution of Python containing most of the numeric and scientific software needed to get started. If you are a Mac or Linux user, have used Python before, and are comfortable using pip to install software, you may want to skip this step and use your existing Python installation.
Anaconda has a simple graphical installer, which can be downloaded from https://www.anaconda.com/distribution/#download-section. Select your operating system and download the Python 3.7 version. The default installer options are usually appropriate.
2. Open a terminal
We will install most packages with conda:
$ conda create -n magine_env python=3.7
$ source activate magine_env
$ conda config --add channels conda-forge
$ conda install jinja2 statsmodels networkx graphviz
$ conda install -c marufr python-igraph
Windows users: please download and install igraph and pycairo using the wheel files provided by Christoph Gohlke, found at https://www.lfd.uci.edu/~gohlke/. Assuming 64-bit Windows, download python_igraph-0.7.1.post6-cp37-cp37m-win_amd64.whl from https://www.lfd.uci.edu/~gohlke/pythonlibs/#python-igraph and pycairo-1.18.0-cp37-cp37m-win_amd64.whl from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycairo, then install them with pip:
$ pip install pycairo-1.18.0-cp37-cp37m-win_amd64.whl
$ pip install python_igraph-0.7.1.post6-cp37-cp37m-win_amd64.whl
3. Install MAGINE
Installation is straightforward. Type the following in a terminal to clone the repository, install its requirements with pip, and add MAGINE to your PYTHONPATH:
$ git clone https://github.com/LoLab-VU/magine
$ cd magine
$ pip install -r requirements.txt
$ export PYTHONPATH=`pwd`:$PYTHONPATH
4. Start Python and MAGINE
If you installed Python using Anaconda on Windows, search for and select Jupyter Notebook from your Start Menu. Otherwise, open a terminal and type jupyter notebook.
This opens Jupyter in your web browser. Create a new Python notebook, type import magine in a cell, and run it. If no error messages appear, you have succeeded in installing magine!
1.1.1 Documentation
The manual is available online at http://magine.readthedocs.io.
1.2 Data
This section describes the basic format of MAGINE's data, which uses the following columns:
• identifier : HGNC symbols for gene names and HMDB IDs for metabolites.
• label : used to modify the identifier column. For proteins, we tag it with any PTM or add a suffix for the experimental method.
• species_type : gene or metabolite.
• significant : Boolean flag specifying whether this is a significant species.
• fold_change : scalar fold change value. Values are expected on a linear scale (not log2); they can be converted to log2 later if desired.
• p_value : only needed if you want to use it in post-analysis.
• source : used to group data later; we use it to tag which experimental platform was used.
• sample_id : time point or condition. These can be chained together for more complicated systems.
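For illustration, here is a minimal sketch that builds a small table in this format with pandas and checks that every column listed above is present. The rows are invented example values, not part of the tutorial data, and the column order in REQUIRED_COLUMNS is our own choice rather than a requirement stated by MAGINE.

```python
import pandas as pd

# Columns described above (the order here is an illustrative choice).
REQUIRED_COLUMNS = ['identifier', 'label', 'species_type', 'significant',
                    'fold_change', 'p_value', 'source', 'sample_id']

# Invented example rows: two measurements of one gene and one metabolite.
example = pd.DataFrame({
    'identifier': ['TP53', 'TP53', 'HMDB0036114'],
    'label': ['TP53_rnaseq', 'TP53_S(ph)15_phsilac', '(-)-3-Thujone'],
    'species_type': ['gene', 'gene', 'metabolite'],
    'significant': [True, False, True],
    'fold_change': [2.5, -1.3, 1.6],  # linear scale, not log2
    'p_value': [0.001, 0.2, 0.021],
    'source': ['rna_seq', 'ph_silac', 'C18'],
    'sample_id': ['06hr', '06hr', '06hr'],
})

missing = [col for col in REQUIRED_COLUMNS if col not in example.columns]
print('missing columns:', missing)
```

A table like this could then be written to CSV with example.to_csv(...) and loaded as shown in the tutorial.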
1.2.1 Data management
Tools to process, organize, and query data. The classes are derived from pandas.DataFrame, meaning everything you can do with pandas you can do with MAGINE.
BaseData is the core DataFrame class. It provides commonly used functions and is used by both “Sample” and “EnrichmentResult”.
1.3 Tutorial
[1]: from IPython.display import display
     %matplotlib inline
     import matplotlib.pyplot as plt
[2]: import pandas as pd
     import seaborn as sns
     import numpy as np
1.3.1 ExperimentalData
Since MAGINE is built for multi-sample, multi-omics data, it is no surprise that the data is the most important aspect. Here we show how to use the ExperimentalData class.
[3]: # load the experimental data
     from magine.data.experimental_data import load_data_csv

     exp_data = load_data_csv('Data/norris_et_al_2017_cisplatin_data.csv.gz', low_memory=False)

C:\Users\James\miniconda3\envs\magine_37\lib\site-packages\ipykernel_launcher.py:4: DeprecationWarning:
load_data_csv will be removed in a future version of MAGINE. Use load_data instead.
[4]: help(exp_data)
Help on ExperimentalData in module magine.data.experimental_data object:

class ExperimentalData(builtins.object)
 |  ExperimentalData(data_file)
 |
 |  Manages all experimental data
 |
 |  Methods defined here:
 |
 |  __getitem__(self, name)
 |
 |  __init__(self, data_file)
 |      Parameters
 |      ----------
 |      data_file : str, pandas.DataFrame
 |          Name of file, generally csv.
 |          If provided a str, the file will be read in as a pandas.DataFrame
 |
 |  __setattr__(self, name, value)
 |      Implement setattr(self, name, value).
 |
 |  create_summary_table(self, sig=False, index='identifier', save_name=None, plot=False, write_latex=False)
 |      Creates a summary table of data.
 |
 |
 |      Parameters
 |      ----------
 |      sig: bool
 |          Flag to summarize significant species only
 |      save_name: str
 |          Name to save csv and .tex file
 |      index: str
 |          Index for counts
 |      plot: bool
 |          If you want to create a plot of the table
 |      write_latex: bool
 |          Create latex file of table
 |
 |
 |      Returns
 |      -------
 |      pandas.DataFrame
 |
 |  get_measured_by_datatype(self)
 |      Returns dict of species per data type
 |
 |      Returns
 |      -------
 |      dict
 |
 |  subset(self, species, index='identifier')
 |      Parameters
 |      ----------
 |      species : list, str
 |          List of species to create subset dataframe from
 |      index : str
 |          Index to filter based on provided 'species' list
 |
 |      Returns
 |      -------
 |      magine.data.experimental_data.Species
 |
 |  volcano_analysis(self, out_dir, use_sig_flag=True, p_value=0.1, fold_change_cutoff=1.5)
 |      Creates a volcano plot for each experimental method
 |
 |      Parameters
 |      ----------
 |      out_dir: str, path
 |          Path to where the output figures will be saved
 |      use_sig_flag: bool
 |          Use significant flag of data
 |      p_value: float, optional
 |          p value criteria for significant
 |          Will not be used if use_sig_flag
 |      fold_change_cutoff: float, optional
 |          fold change criteria for significant
 |          Will not be used if use_sig_flag
 |
 |      Returns
 |      -------
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
 |
 |  compounds
 |      Only compounds in data
 |
 |      Returns
 |      -------
 |      Sample
 |
 |  exp_methods
 |      List of source columns
 |
 |  genes
 |      All data tagged with gene
 |
 |      Includes protein and RNA.
 |
 |      Returns
 |      -------
 |
 |  proteins
 |      Protein level data
 |
 |      Tagged with "gene" identifier that is not RNA
 |
 |      Returns
 |      -------
 |
 |  rna
 |      RNA level data
 |
 |      Tagged with "RNA"
 |
 |      Returns
 |      -------
 |
 |  sample_ids
 |      List of sample_ids
 |
 |  species
 |      Returns data in Sample format
 |
 |      Returns
 |      -------
 |      Sample
Getting counts from data
[5]: display(exp_data.create_summary_table())
     display(exp_data.create_summary_table(sig=True))
     display(exp_data.create_summary_table(sig=True, index='label'))

sample_id       01hr    06hr    24hr    48hr  Total Unique Across
source
C18              522     227     653     685                 1402
HILIC            471     605     930     613                 1504
label_free      2766    2742    2551    2261                 3447
ph_silac        2608    3298    3384    3236                 5113
rna_seq        18741   19104   19992       -                20642
silac           2923    3357    3072    3265                 4086

sample_id       01hr    06hr    24hr    48hr  Total Unique Across
source
C18              522     227     653     685                 1402
HILIC            471     605     930     613                 1504
label_free       196      46     271     874                 1085
ph_silac         514     888    1227     851                 2278
rna_seq           73    1999   12215       -                12340
silac             38      52     228     266                  485

sample_id       01hr    06hr    24hr    48hr  Total Unique Across
source
C18              528     227     657     689                 1412
HILIC            479     611     941     621                 1521
label_free       201      46     281     911                 1149
ph_silac         594    1370    2414    1368                 4757
rna_seq           73    1999   12215       -                12340
silac             38      52     228     266                  485
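Counts like these can be approximated with plain pandas. The sketch below encodes our assumption of what create_summary_table computes (the number of unique values of the chosen index per source and sample_id, plus the number unique across all samples); it is not MAGINE's implementation. summary_counts is a hypothetical helper, and missing source/sample combinations appear as 0 here rather than '-'.

```python
import pandas as pd

def summary_counts(df, index='identifier', sig=False):
    """Hypothetical helper: unique `index` values per source and sample_id."""
    if sig:
        df = df[df['significant']]
    table = (df.groupby(['source', 'sample_id'])[index]
               .nunique()
               .unstack(fill_value=0))
    # Unique species across all samples of each source (not the row sum).
    table['Total Unique Across'] = df.groupby('source')[index].nunique()
    return table

# Tiny invented example.
data = pd.DataFrame({
    'identifier': ['A', 'A', 'B', 'C'],
    'source': ['silac', 'silac', 'silac', 'rna_seq'],
    'sample_id': ['01hr', '06hr', '06hr', '01hr'],
    'significant': [True, False, True, True],
})
table = summary_counts(data)
print(table)
```

Because the last column counts unique species rather than summing the row, it can be smaller than the sum of the per-sample counts, as in the tables above.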
[6]: exp_data.species.head(5)
[6]:
   identifier              label species_type  fold_change  p_value  \
0       HOXD1       HOXD1_rnaseq      protein  -520.256762  0.00102
1     MIR7704     MIR7704_rnaseq      protein  -520.256762  0.00102
2  AC078814.1  AC078814.1_rnaseq      protein   -76.022260  0.00102
3       PPM1H       PPM1H_rnaseq      protein   -76.022260  0.00102
4       PLCH1       PLCH1_rnaseq      protein   -17.888990  0.00102

   significant sample_id   source
0         True      06hr  rna_seq
1         True      06hr  rna_seq
2         True      06hr  rna_seq
3         True      06hr  rna_seq
4         True      06hr  rna_seq
Filter by category (experimental method)
The .species property contains all of the data.
MAGINE uses the species_type and source columns to split the data into compounds, genes (species_type == gene), rna (species_type == gene and source == rna), or proteins (species_type == gene and source != rna). Each subset is accessed as a property, such as
[7]: exp_data.genes.head(5)
[7]:
   identifier              label species_type  fold_change  p_value  \
0       HOXD1       HOXD1_rnaseq      protein  -520.256762  0.00102
1     MIR7704     MIR7704_rnaseq      protein  -520.256762  0.00102
2  AC078814.1  AC078814.1_rnaseq      protein   -76.022260  0.00102
3       PPM1H       PPM1H_rnaseq      protein   -76.022260  0.00102
4       PLCH1       PLCH1_rnaseq      protein   -17.888990  0.00102

   significant sample_id   source
0         True      06hr  rna_seq
1         True      06hr  rna_seq
2         True      06hr  rna_seq
3         True      06hr  rna_seq
4         True      06hr  rna_seq
[8]: exp_data.compounds.head(5)
[8]:
         identifier                                              label  \
128152  HMDB0036114                                      (-)-3-Thujone
128153  HMDB0001320    (13E)-11a-Hydroxy-9,15-dioxoprost-13-enoic acid
128154  HMDB0012113               (22Alpha)-hydroxy-campest-4-en-3-one
128155  HMDB0010361  (23S)-23,25-dihdroxy-24-oxovitamine D3 23-(bet...
128156  HMDB0011644        (24R)-Cholest-5-ene-3-beta,7-alpha,24-triol

        species_type  fold_change       p_value  significant  sample_id  source
128152   metabolites          1.6  2.100000e-02         True       06hr     C18
128153   metabolites         88.8  5.800000e-12         True       24hr     C18
128154   metabolites        100.0  9.500000e-04         True       48hr   HILIC
128155   metabolites       -100.0  1.000000e-12         True       48hr     C18
128156   metabolites          1.6  7.400000e-05         True       01hr     C18
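The split rules described above can be sketched with boolean masks in pandas. This illustrates the stated rules only and is not MAGINE's code: split_by_category is a hypothetical helper, the RNA source name 'rna_seq' is an assumption taken from this tutorial's source column, and the masks match species_type == 'gene' as the text states, even though the tutorial output shows labels such as 'protein' and 'metabolites'.

```python
import pandas as pd

def split_by_category(df, rna_source='rna_seq'):
    """Hypothetical sketch of the compounds/genes/rna/proteins split."""
    is_gene = df['species_type'] == 'gene'
    is_rna = df['source'] == rna_source
    return {
        'compounds': df[~is_gene],             # anything not tagged as a gene
        'genes': df[is_gene],                  # RNA and protein level data
        'rna': df[is_gene & is_rna],           # genes measured by RNA-seq
        'proteins': df[is_gene & ~is_rna],     # genes from any other platform
    }

data = pd.DataFrame({
    'identifier': ['TP53', 'TP53', 'HMDB0036114'],
    'species_type': ['gene', 'gene', 'metabolite'],
    'source': ['rna_seq', 'silac', 'C18'],
})
parts = split_by_category(data)
print({name: len(part) for name, part in parts.items()})
```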
Similarly, we can filter the data by source using .name, where name is any value in the source column. We can get a list of these by printing exp_data.exp_methods.
[9]: # prints all the available exp_methods
     exp_data.exp_methods
[9]: ['rna_seq', 'ph_silac', 'label_free', 'silac', 'C18', 'HILIC']
[10]: # filters to only the 'label_free' data
      exp_data.label_free.shape
[10]: (13085, 8)
[11]: exp_data.label_free.head(5)
[11]:
        identifier       label  species_type  fold_change  p_value  significant  \
102446       LIMS1    LIMS1_lf       protein        12.42  0.00003         True
102447     SMARCE1  SMARCE1_lf       protein        -2.49  0.00030         True
102448        HEXA     HEXA_lf       protein         6.42  0.00060         True
102449       SRSF1    SRSF1_lf       protein        -3.21  0.00060         True
102450       SF3B1    SF3B1_lf       protein        -1.57  0.00130         True

        sample_id      source
102446       01hr  label_free
102447       01hr  label_free
102448       01hr  label_free
102449       01hr  label_free
102450       01hr  label_free
[12]: exp_data.HILIC.head(5)
[12]:
         identifier                                        label  species_type  \
128154  HMDB0012113         (22Alpha)-hydroxy-campest-4-en-3-one   metabolites
128157  HMDB0011644  (24R)-Cholest-5-ene-3-beta,7-alpha,24-triol   metabolites
128162  HMDB0012114                    (3S)-3,6-Diaminohexanoate   metabolites
128164  HMDB0012114                    (3S)-3,6-Diaminohexanoate   metabolites
128166  HMDB0012115                 (3S,5S)-3,5-Diaminohexanoate   metabolites

        fold_change   p_value  significant  sample_id  source
128154        100.0  0.000950         True       48hr   HILIC
128157          1.7  0.000072         True       24hr   HILIC
128162         -1.9  0.000030         True       06hr   HILIC
128164         -3.0  0.002000         True       24hr   HILIC
128166         -1.9  0.000030         True       06hr   HILIC
Significant filter
The significant column is mapped to the .sig property.
[13]: exp_data.rna_seq.sig.head(5)
[13]:
   identifier              label species_type  fold_change  p_value  \
0       HOXD1       HOXD1_rnaseq      protein  -520.256762  0.00102
1     MIR7704     MIR7704_rnaseq      protein  -520.256762  0.00102
2  AC078814.1  AC078814.1_rnaseq      protein   -76.022260  0.00102
3       PPM1H       PPM1H_rnaseq      protein   -76.022260  0.00102
4       PLCH1       PLCH1_rnaseq      protein   -17.888990  0.00102

   significant sample_id   source
0         True      06hr  rna_seq
1         True      06hr  rna_seq
2         True      06hr  rna_seq
3         True      06hr  rna_seq
4         True      06hr  rna_seq
Filter data to up- or down-regulated species
For enrichment analysis, we will want to access up-regulated and down-regulated species.
[14]: exp_data.rna_seq.up.head(10)
[14]:
       identifier                 label  species_type  fold_change   p_value  \
13           DLX2           DLX2_rnaseq       protein     2.874358  0.001020
18         RETSAT         RETSAT_rnaseq       protein     2.325934  0.001020
21        SLC52A1        SLC52A1_rnaseq       protein     2.871869  0.001020
24          OTUD3          OTUD3_rnaseq       protein     1.821775  0.001020
35  RP11-209D14.2  RP11-209D14.2_rnaseq       protein     1.819533  0.025204
58         ZNF554         ZNF554_rnaseq       protein     2.309691  0.004153
59           FZD9           FZD9_rnaseq       protein     1.812798  0.001020
71           SBK1           SBK1_rnaseq       protein     1.806427  0.002689
88          PPM1D          PPM1D_rnaseq       protein     1.803186  0.001020
92         ZNF425         ZNF425_rnaseq       protein     2.846581  0.001020

    significant sample_id   source
13         True      06hr  rna_seq
18         True      06hr  rna_seq
21         True      06hr  rna_seq
24         True      06hr  rna_seq
35         True      06hr  rna_seq
58         True      06hr  rna_seq
59         True      06hr  rna_seq
71         True      06hr  rna_seq
88         True      06hr  rna_seq
92         True      06hr  rna_seq
[15]: exp_data.rna_seq.down.head(10)
[15]:
     identifier                label  species_type  fold_change   p_value  \
0         HOXD1         HOXD1_rnaseq       protein  -520.256762  0.001020
1       MIR7704       MIR7704_rnaseq       protein  -520.256762  0.001020
2    AC078814.1    AC078814.1_rnaseq       protein   -76.022260  0.001020
3         PPM1H         PPM1H_rnaseq       protein   -76.022260  0.001020
4         PLCH1         PLCH1_rnaseq       protein   -17.888990  0.001020
5  RP11-639F1.1  RP11-639F1.1_rnaseq       protein   -17.888990  0.001020
6          TP63          TP63_rnaseq       protein   -12.355659  0.001020
7        JARID2        JARID2_rnaseq       protein    -7.891502  0.001020
8          GLI2          GLI2_rnaseq       protein    -5.389009  0.001020
9        MAP3K5        MAP3K5_rnaseq       protein    -4.262353  0.001893

   significant sample_id   source
0         True      06hr  rna_seq
1         True      06hr  rna_seq
2         True      06hr  rna_seq
3         True      06hr  rna_seq
4         True      06hr  rna_seq
5         True      06hr  rna_seq
6         True      06hr  rna_seq
7         True      06hr  rna_seq
8         True      06hr  rna_seq
9         True      06hr  rna_seq
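Assuming .sig keeps rows whose significant flag is True, and .up/.down split on the sign of fold_change (our reading of the outputs above, not MAGINE's documented implementation), the same filters can be written in plain pandas. Keeping the significance and direction filters separate lets them be chained, as in exp_data.label_free.sig.up_by_sample later in this tutorial. GENE_X is an invented placeholder row.

```python
import pandas as pd

def sig(df):
    """Rows flagged as significant."""
    return df[df['significant']]

def up(df):
    """Rows with a positive fold change (assumed definition of up-regulated)."""
    return df[df['fold_change'] > 0]

def down(df):
    """Rows with a negative fold change (assumed definition of down-regulated)."""
    return df[df['fold_change'] < 0]

data = pd.DataFrame({
    'identifier': ['DLX2', 'HOXD1', 'TP63', 'GENE_X'],
    'fold_change': [2.874358, -520.256762, -12.355659, 1.2],
    'significant': [True, True, True, False],
})
print(up(sig(data))['identifier'].tolist())
print(down(sig(data))['identifier'].tolist())
```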
Extracting by sample (time point)
[16]: for i in exp_data.sample_ids:
          print(i)
          display(exp_data[i].head(5))
01hr
         identifier                label  species_type  fold_change   p_value  \
19160         GRIK4         GRIK4_rnaseq       protein    77.555651  0.019824
19161  GRIK4_3p_UTR  GRIK4_3p_UTR_rnaseq       protein    77.555651  0.019824
19162    AP001187.9    AP001187.9_rnaseq       protein   -25.455050  0.019824
19163        MIR192        MIR192_rnaseq       protein   -25.455050  0.019824
19164      MIR194-2      MIR194-2_rnaseq       protein   -25.455050  0.019824

       significant sample_id   source
19160         True      01hr  rna_seq
19161         True      01hr  rna_seq
19162         True      01hr  rna_seq
19163         True      01hr  rna_seq
19164         True      01hr  rna_seq
06hr
   identifier              label species_type  fold_change  p_value  \
0       HOXD1       HOXD1_rnaseq      protein  -520.256762  0.00102
1     MIR7704     MIR7704_rnaseq      protein  -520.256762  0.00102
2  AC078814.1  AC078814.1_rnaseq      protein   -76.022260  0.00102
3       PPM1H       PPM1H_rnaseq      protein   -76.022260  0.00102
4       PLCH1       PLCH1_rnaseq      protein   -17.888990  0.00102

   significant sample_id   source
0         True      06hr  rna_seq
1         True      06hr  rna_seq
2         True      06hr  rna_seq
3         True      06hr  rna_seq
4         True      06hr  rna_seq
24hr
       identifier            label  species_type  fold_change   p_value  \
37960        LHX3      LHX3_rnaseq       protein   202.225343  0.005180
37961    C17orf67  C17orf67_rnaseq       protein     2.571464  0.000123
37962        ALX1      ALX1_rnaseq       protein    -2.572587  0.000123
37963     MIR7844   MIR7844_rnaseq       protein     2.573033  0.009349
37964       TMCC3     TMCC3_rnaseq       protein     2.573033  0.009349

       significant sample_id   source
37960         True      24hr  rna_seq
37961         True      24hr  rna_seq
37962         True      24hr  rna_seq
37963         True      24hr  rna_seq
37964         True      24hr  rna_seq
48hr
       identifier                     label  species_type  fold_change  p_value  \
58025        TNS3    TNS3_1188_1197_phsilac       protein    -3.837129    0.049
58026     SIPA1L3  SIPA1L3_S(ph)158_phsilac       protein    -5.119600    0.049
58027        TNS3     TNS3_Y(ph)780_phsilac       protein    -4.986421    0.049
58028        FGD6     FGD6_S(ph)554_phsilac       protein    -3.900705    0.049
58029        GPN1     GPN1_S(ph)312_phsilac       protein     2.901199    0.049

       significant sample_id    source
58025         True      48hr  ph_silac
58026         True      48hr  ph_silac
58027         True      48hr  ph_silac
58028         True      48hr  ph_silac
58029         True      48hr  ph_silac
Pivot table to compare values across time points
[17]: exp_data.label_free.pivoter(
          convert_to_log=False,
          index='identifier',
          columns='sample_id',
          values=['fold_change', 'p_value']
      ).head(10)
[17]:
            fold_change                         p_value                     \
sample_id         01hr    06hr   24hr   48hr      01hr     06hr     24hr
identifier
A2M           1.040000   1.140  51.93  11.58  0.514800  0.44370  0.24260
AACS         -1.100000   3.740    NaN    NaN  0.281800  0.26950      NaN
AAGAB         1.000000  -1.150   1.46  -2.03  0.968100  0.39240  0.84450
AAK1          1.320000   1.590    NaN   1.72  0.715800  0.18110      NaN
AAMP         -1.200000  -1.460   1.85   1.78  0.836800  0.55420  0.13640
AAR2               NaN  -1.690    NaN    NaN       NaN  0.96510      NaN
AARS          0.326667  -0.035  -1.44  -3.12  0.299867  0.62425  0.46725
AARS2         1.170000     NaN    NaN    NaN  0.253000      NaN      NaN
AARSD1        1.210000   4.070  -2.05    NaN  0.459700  0.49160  0.78440
AASDHPPT     -0.330000   1.020   1.07  -1.11  0.709600  0.81160  0.45290

sample_id      48hr
identifier
A2M         0.11130
AACS            NaN
AAGAB       0.09760
AAK1        0.95660
AAMP        0.32460
AAR2            NaN
AARS        0.00045
AARS2           NaN
AARSD1          NaN
AASDHPPT    0.00070
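A comparable wide table can be produced directly with pandas.DataFrame.pivot_table. This is a sketch of the general idea, not MAGINE's pivoter, and the rows below are a small invented example. If an identifier appears more than once for the same sample (for example, several modified forms of one protein), pivot_table averages them by default; whether pivoter aggregates the same way is an assumption. Missing identifier/sample combinations become NaN, as in the table above.

```python
import pandas as pd

data = pd.DataFrame({
    'identifier': ['A2M', 'A2M', 'AARS', 'AARS', 'AARS2'],
    'sample_id': ['01hr', '06hr', '01hr', '01hr', '01hr'],
    'fold_change': [1.04, 1.14, 1.2, -0.4, 1.17],
    'p_value': [0.5148, 0.4437, 0.2, 0.3, 0.253],
})

wide = data.pivot_table(index='identifier',
                        columns='sample_id',
                        values=['fold_change', 'p_value'])  # aggfunc='mean' by default
print(wide)
```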
Note that the table above contains NaN values. This is because not every species was measured at every time point in our experimental data. We can easily check which species are found in all 4 of our label-free time points.
[18]: print(len(exp_data.label_free.present_in_all_columns(
          index='identifier',
          columns='sample_id',
      ).id_list))

Number in index went from 3447 to 1819
1819
This shows that out of the 3447 unique species measured by label-free proteomics, only 1819 were measured at all time points. What to do with this information depends on the analysis. For now, we will keep using the full dataset.
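The check above can be sketched in pandas by counting how many distinct samples each identifier appears in. present_in_all is a hypothetical helper written for this example; its printed message imitates the output shown above, but it is not MAGINE's present_in_all_columns, and the rows are invented.

```python
import pandas as pd

def present_in_all(df, index='identifier', columns='sample_id'):
    """Keep rows whose `index` value was measured in every `columns` value."""
    n_samples = df[columns].nunique()
    per_species = df.groupby(index)[columns].nunique()
    keep = per_species[per_species == n_samples].index
    print(f'Number in index went from {df[index].nunique()} to {len(keep)}')
    return df[df[index].isin(keep)]

data = pd.DataFrame({
    'identifier': ['A2M', 'A2M', 'A2M', 'A2M', 'AACS', 'AACS'],
    'sample_id': ['01hr', '06hr', '24hr', '48hr', '01hr', '06hr'],
})
complete = present_in_all(data)
```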
Visualization
Volcano plots
[19]: exp_data.label_free.volcano_plot();
[20]: exp_data.label_free.volcano_by_sample(sig_column=True);
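A volcano plot places each species at x = log2(fold change) and y = -log10(p value). The sketch below computes those coordinates with NumPy; it is not MAGINE's plotting code. Because MAGINE stores fold changes on a linear scale with negative values for decreases, we assume the same signed log2 convention that log2_normalize_df describes in the module reference (log2 for positive values, -log2 of the magnitude for negative values). Zero fold changes are not handled.

```python
import numpy as np

def volcano_coordinates(fold_change, p_value):
    """Signed log2 fold change (x) and -log10 p-value (y) for a volcano plot."""
    fc = np.asarray(fold_change, dtype=float)
    p = np.asarray(p_value, dtype=float)
    # +log2(fc) for increases, -log2(|fc|) for decreases
    x = np.sign(fc) * np.log2(np.abs(fc))
    y = -np.log10(p)
    return x, y

x, y = volcano_coordinates([4.0, -4.0, 1.0], [0.01, 0.001, 1.0])
print(x, y)
```

The resulting arrays can be drawn with plt.scatter(x, y); points further up and further from zero on the x axis are the stronger, more significant changes.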
[21]: exp_data.label_free.plot_histogram();
Plotting a subset of species
We provide a few plotting interfaces to explore subsets of the data. You create a list of species and pass it to the function, which filters the data to those species and returns the result.
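Filtering to a list of species works like a pandas isin lookup on the chosen column. The hypothetical subset helper below sketches that behaviour under our own assumptions. The substring match for a single string follows the heatmap parameter description in the module reference (.contains(subset)); passing regex=False, so that labels containing parentheses such as S(ph)22 match literally, is our choice. The example labels are invented.

```python
import pandas as pd

def subset(df, species, index='identifier'):
    """Hypothetical sketch: keep rows whose `index` matches `species`."""
    if isinstance(species, str):
        # Single string: substring match (literal, not a regular expression).
        return df[df[index].str.contains(species, regex=False)]
    return df[df[index].isin(species)]

data = pd.DataFrame({
    'identifier': ['LMNA', 'LMNA', 'VDAC1', 'TP53'],
    'label': ['LMNA_lf', 'LMNA_S(ph)22_lf', 'VDAC1_lf', 'TP53_lf'],
})
print(subset(data, ['LMNA', 'VDAC1']))
print(subset(data, 'S(ph)22', index='label'))
```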
Time series using plotly and matplotlib
[22]: exp_data.label_free.plot_species(['LMNA', 'VDAC1'], plot_type='plotly')
[23]: exp_data.label_free.plot_species(['LMNA', 'VDAC1'], plot_type='matplotlib');
Heatmaps
[24]: exp_data.label_free.heatmap(
          ['LMNA', 'VDAC1'],
          figsize=(6, 4),
          linewidths=0.01
      );
Notice that the above plot doesn't show any of the modified forms of LMNA (no _s(ph)22_lf). This is because the default index for pivoted plots is the identifier column. You can plot by the label column instead by passing index='label' to the function. If you still want to filter the data using the more generic 'identifier' column, specify that with subset_index='identifier'.
[25]: exp_data.label_free.heatmap(
          ['LMNA', 'VDAC1'],
          subset_index='identifier',
          index='label',
          figsize=(6, 4),
          linewidths=0.01
      );
Examples
Here are a few examples of how the above commands can be chained together to create plots with varying criteria.
Query 1:
Heatmap of label-free proteomics species that change significantly in at least 3 time points.
[26]: lf_sig = exp_data.label_free.require_n_sig(
          index='label',
          columns='sample_id',
          n_sig=3
      )
      lf_sig.heatmap(
          convert_to_log=True,
          cluster_row=True,
          index='label',
          values='fold_change',
          columns='sample_id',
          annotate_sig=True,
          figsize=(8, 12),
          div_colors=True,
          num_colors=21,
          linewidths=0.01
      );
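One plausible reading of require_n_sig is: keep every row of a species whose significant flag is True in at least n_sig distinct samples. The sketch below implements that reading in pandas as a hypothetical helper with invented rows; MAGINE's actual rule may differ, for example in whether the non-significant samples of a kept species are retained (here they are, so a heatmap can still show them).

```python
import pandas as pd

def require_n_sig(df, n_sig=3, index='label', columns='sample_id'):
    """Hypothetical helper: species significant in >= n_sig samples."""
    sig_rows = df[df['significant']]
    n_sig_per_species = sig_rows.groupby(index)[columns].nunique()
    keep = n_sig_per_species[n_sig_per_species >= n_sig].index
    return df[df[index].isin(keep)]

data = pd.DataFrame({
    'label': ['A_lf'] * 4 + ['B_lf'] * 4,
    'sample_id': ['01hr', '06hr', '24hr', '48hr'] * 2,
    'significant': [True, True, True, False, True, False, False, True],
})
kept = require_n_sig(data, n_sig=3)
print(kept)
```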
Query 2:
Changes that happen at all 3 time points for RNA-seq.
[27]: exp_data.rna.require_n_sig(n_sig=3, index='label').plot_species(plot_type='plotly');
Query 3:
Time series plot of proteins that are consistently up-regulated, and heatmap of proteins that are consistently down-regulated, in at least 3 time points.
[28]: exp_data.proteins.up.require_n_sig(n_sig=3, index='label').plot_species(plot_type='matplotlib');
      exp_data.proteins.down.require_n_sig(n_sig=3, index='label').heatmap(index='label', cluster_row=True);
Query 4:
Clustered heatmap of SILAC data
[29]: exp_data.silac.heatmap(linewidths=0.01, index='label',
                             cluster_row=True, min_sig=2, figsize=(12, 18));
Extending to other plots
Since exp_data is built on pandas.DataFrame, we can use other packages that accept that data format. Seaborn is one such tool, and it provides some very nice plots.
[30]: label_free = exp_data.label_free.copy()
      label_free.log2_normalize_df(column='fold_change', inplace=True)

      g = sns.PairGrid(label_free,
                       x_vars=('sample_id'),
                       y_vars=('fold_change', 'p_value'),
                       hue='source',
                       aspect=3.25, height=3.5)
      g.map(sns.violinplot,
            palette="pastel",
            split=True,
            order=label_free.sample_ids
            );
Venn diagram comparisons between measurements
[31]: from magine.plotting.venn_diagram_maker import create_venn2, create_venn3

      lf = exp_data.label_free.sig.id_list
      silac = exp_data.silac.sig.id_list
      phsilac = exp_data.ph_silac.sig.id_list
      hilic = exp_data.HILIC.sig.id_list
      rplc = exp_data.C18.sig.id_list

      create_venn2(hilic, rplc, 'HILIC', 'RPLC');
[32]: create_venn3(lf, silac, phsilac, 'LF', 'SILAC', 'ph-SILAC');
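The regions of the two- and three-set Venn diagrams above are ordinary set operations on the id_list values. The helpers below are a hypothetical sketch that computes region sizes with Python sets; they do not draw anything and are not part of MAGINE's venn_diagram_maker.

```python
def venn2_counts(a, b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(a), set(b)
    return {'a_only': len(a - b), 'b_only': len(b - a), 'both': len(a & b)}

def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        'a_only': len(a - b - c),
        'b_only': len(b - a - c),
        'c_only': len(c - a - b),
        'a_b_only': len((a & b) - c),
        'a_c_only': len((a & c) - b),
        'b_c_only': len((b & c) - a),
        'all_three': len(a & b & c),
    }

print(venn2_counts(['HMDB1', 'HMDB2'], ['HMDB2', 'HMDB3']))
```

With the lists defined above, venn3_counts(lf, silac, phsilac) would give the region sizes of the three-way comparison.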
1.3.2 Networks
Create a data-driven network
[33]: from magine.networks.network_generator import build_network
      import magine.networks.utils as utils
      import networkx as nx
      import os
2019-09-12 15:48:17.518 - magine - INFO - Logging started on MAGINE
2019-09-12 15:48:17.519 - magine - INFO - Log entry time offset from UTC: -7.00 hours
[34]: if not os.path.exists('Data/cisplatin_network.p'):
          network = build_network(
              seed_species=exp_data.species.sig.id_list,  # genes seed species
              all_measured_list=exp_data.species.id_list,  # all data measured
              use_biogrid=True,  # expand with biogrid
              use_hmdb=True,  # expand with hmdb
              use_reactome=True,  # expand with reactome
              use_signor=True,  # expand with signor
              trim_source_sink=True,  # remove all source and sink nodes not measured
              save_name='Data/cisplatin_network'
          )
      else:
          # build_network returns the network (above); on later runs we load
          # the saved copy instead of rebuilding it.
          # Note: nx.read_gpickle and nx.write_gpickle exist in networkx 2.x
          # but were removed in networkx 3.0.
          network = nx.read_gpickle('Data/cisplatin_network.p')

      utils.add_data_to_graph(network, exp_data)
      print("Saving network")
      # write to GML for Cytoscape or other programs
      nx.write_gml(
          network,
          os.path.join('Data', 'cisplatin_network_w_attributes.gml')
      )

      # write to gpickle for fast loading in Python
      nx.write_gpickle(
          network,
          os.path.join('Data', 'cisplatin_based_network.p'),
      )
Saving network
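The trim_source_sink=True option is described as removing source and sink nodes that were not measured. Below is a minimal networkx sketch of that idea under our own assumptions: repeatedly drop unmeasured nodes that have no incoming or no outgoing edges until none remain. trim_unmeasured_sources_sinks is hypothetical, uses a toy graph, and may not match MAGINE's exact rule.

```python
import networkx as nx

def trim_unmeasured_sources_sinks(graph, measured):
    """Iteratively remove unmeasured source and sink nodes from a DiGraph."""
    g = graph.copy()
    measured = set(measured)
    while True:
        # Unmeasured nodes that are sources (no in-edges) or sinks (no out-edges).
        to_remove = [n for n in g.nodes
                     if n not in measured
                     and (g.in_degree(n) == 0 or g.out_degree(n) == 0)]
        if not to_remove:
            return g
        g.remove_nodes_from(to_remove)

toy = nx.DiGraph([('X', 'A'), ('A', 'B'), ('B', 'Y'), ('Y', 'Z')])
trimmed = trim_unmeasured_sources_sinks(toy, measured={'A', 'B'})
print(sorted(trimmed.nodes))
```

The loop matters: removing the sink Z in the first pass turns Y into a new sink, which is only removed in the second pass.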
Explore subgraphs of network
[35]: from magine.networks.subgraphs import Subgraph
      from magine.networks.visualization import draw_igraph, draw_graphviz, draw_mpl, draw_cyjs

      net_sub = Subgraph(network)
[36]: print(len(network.nodes()))
      print(len(network.edges()))

13308
181300
[37]: bax_n = net_sub.neighbors('BAX', upstream=True, downstream=True)
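Conceptually, neighbors(..., upstream=True, downstream=True) collects a node together with its predecessors and successors. A minimal networkx sketch of that idea, assuming a directed graph; neighbors_subgraph is hypothetical (not the Subgraph class's code), and the toy edges are invented.

```python
import networkx as nx

def neighbors_subgraph(graph, node, upstream=True, downstream=True):
    """Subgraph of `node` plus its direct predecessors and/or successors."""
    nodes = {node}
    if upstream:
        nodes.update(graph.predecessors(node))
    if downstream:
        nodes.update(graph.successors(node))
    # subgraph also keeps any edges among the selected neighbors
    return graph.subgraph(nodes).copy()

toy = nx.DiGraph([('UP1', 'BAX'), ('BAX', 'DOWN1'), ('DOWN1', 'FAR')])
sub = neighbors_subgraph(toy, 'BAX')
print(sorted(sub.nodes))
```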
[38]: # display_graph(bax_n)
      draw_igraph(bax_n, bbox=[400, 400], node_size=25, inline=True)
[38]:
[39]: draw_mpl(bax_n, layout='fdp', scale=3, node_size=100, font_size=12);
[40]: draw_graphviz(bax_n, 'fdp')
[41]: expand = net_sub.expand_neighbors(bax_n, nodes='CASP3', downstream=True)
[42]: draw_igraph(expand,
                  bbox=[800, 800],
                  node_font_size=12,
                  font_size=8,
                  node_size=12,
                  inline=True,
                  layout='graphopt')
[42]:
[43]: draw_graphviz(expand, 'sfdp', width=500)
[44]: draw_graphviz(expand, 'twopi', width=500)
1.3.3 Enrichment analysis
[45]: from magine.enrichment.enrichr import Enrichr
[46]: e = Enrichr()
[47]: label_free_enrichment = e.run_samples(
          exp_data.label_free.sig.up_by_sample,
          exp_data.label_free.sig.sample_ids,
          gene_set_lib='Reactome_2016')
[48]: label_free_enrichment.head(10)
[48]:
                                            term_name  rank   p_value    z_score  combined_score  \
18     metabolism of fat-soluble vitamins_hsa-6806667    19  0.037587  26.143791       85.780009
19                             metabolism_hsa-1430728    20  0.047930   2.795248        8.491991
25                      l1cam interactions_hsa-373760    26  0.069654  13.888889       37.003049
28                cell-cell communication_hsa-1500931    29  0.093902  10.178117       24.076406
29    metabolism of vitamins and cofactors_hsa-196854    30  0.105465   9.009009       20.264628
42            mrna splicing - major pathway_hsa-72163     2  0.005673  17.559263       90.816629
43                            mrna splicing_hsa-72172     3  0.006522  16.339869       82.230741
46  processing of capped intron-containing pre-mrn...     6  0.011454  12.191405       54.488181
59                   mrna 3'-end processing_hsa-72187    19  0.042493  23.068051       72.858353
60  post-elongation processing of intron-containin...    20  0.042493  23.068051       72.858353

    adj_p_value                   genes  n_genes             db  significant  sample_id
18          1.0                  VKORC1        1  Reactome_2016        False       01hr
19          1.0  ACSL3,HEXA,RRM1,VKORC1        4  Reactome_2016        False       01hr
25          1.0                   NRCAM        1  Reactome_2016        False       01hr
28          1.0                   LIMS1        1  Reactome_2016        False       01hr
29          1.0                  VKORC1        1  Reactome_2016        False       01hr
42          1.0           EIF4A3,HNRNPC        2  Reactome_2016        False       06hr
43          1.0           EIF4A3,HNRNPC        2  Reactome_2016        False       06hr
46          1.0           EIF4A3,HNRNPC        2  Reactome_2016        False       06hr
59          1.0                  EIF4A3        1  Reactome_2016        False       06hr
60          1.0                  EIF4A3        1  Reactome_2016        False       06hr
[49]: label_free_enrichment.term_name = label_free_enrichment.term_name.str.split('_').str.get(0)
[50]: label_free_enrichment.head(10)
[50]:
                                            term_name  rank   p_value    z_score  combined_score  \
18                 metabolism of fat-soluble vitamins    19  0.037587  26.143791       85.780009
19                                         metabolism    20  0.047930   2.795248        8.491991
25                                 l1cam interactions    26  0.069654  13.888889       37.003049
28                            cell-cell communication    29  0.093902  10.178117       24.076406
29               metabolism of vitamins and cofactors    30  0.105465   9.009009       20.264628
42                      mrna splicing - major pathway     2  0.005673  17.559263       90.816629
43                                      mrna splicing     3  0.006522  16.339869       82.230741
46    processing of capped intron-containing pre-mrna     6  0.011454  12.191405       54.488181
59                             mrna 3'-end processing    19  0.042493  23.068051       72.858353
60  post-elongation processing of intron-containin...    20  0.042493  23.068051       72.858353

    adj_p_value                   genes  n_genes             db  significant  sample_id
18          1.0                  VKORC1        1  Reactome_2016        False       01hr
19          1.0  ACSL3,HEXA,RRM1,VKORC1        4  Reactome_2016        False       01hr
25          1.0                   NRCAM        1  Reactome_2016        False       01hr
28          1.0                   LIMS1        1  Reactome_2016        False       01hr
29          1.0                  VKORC1        1  Reactome_2016        False       01hr
42          1.0           EIF4A3,HNRNPC        2  Reactome_2016        False       06hr
43          1.0           EIF4A3,HNRNPC        2  Reactome_2016        False       06hr
46          1.0           EIF4A3,HNRNPC        2  Reactome_2016        False       06hr
59          1.0                  EIF4A3        1  Reactome_2016        False       06hr
60          1.0                  EIF4A3        1  Reactome_2016        False       06hr
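The combined_score column can be checked by hand: in the rows above it agrees with combined_score = -ln(p_value) × z_score. This matches the combined score described by Enrichr (ln(p) multiplied by a z-score, with Enrichr's own sign convention for z); treat the formula as our reading of the table rather than a statement about MAGINE's code. Small differences come from the rounded p-values printed in the table.

```python
import math

def combined_score(p_value, z_score):
    """Enrichr-style combined score: -ln(p) * z (sign convention inferred from the table)."""
    return -math.log(p_value) * z_score

# (p_value, z_score, reported combined_score) for rows 18 and 19 above
rows = [(0.037587, 26.143791, 85.780009),
        (0.047930, 2.795248, 8.491991)]
for p, z, reported in rows:
    print(round(combined_score(p, z), 3), reported)
```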
[51]: label_free_enrichment.heatmap(min_sig=1,
                                    figsize=(4, 16),
                                    linewidths=0.01,
                                    cluster_by_set=True);
[52]: label_free_enrichment_slim = label_free_enrichment.remove_redundant(level='dataframe')
Number of rows went from 72 to 27
[53]: label_free_enrichment_slim.heatmap(min_sig=1,
                                         figsize=(4, 12),
                                         linewidths=0.01,
                                         cluster_by_set=True
                                         );
[54]: display(sorted(label_free_enrichment_slim.term_name.unique()))
['amino acid transport across the plasma membrane',
 'antigen processing-cross presentation',
 'basigin interactions',
 'binding and uptake of ligands by scavenger receptors',
 'cargo concentration in the er',
 'caspase-mediated cleavage of cytoskeletal proteins',
 'cleavage of growing transcript in the termination region',
 'copi-mediated anterograde transport',
 'formation of atp by chemiosmotic coupling',
 'golgi-to-er retrograde transport',
 'hdl-mediated lipid transport',
 'initiation of nuclear envelope reformation',
 'integrin cell surface interactions',
 'l1cam interactions',
 'metabolism of fat-soluble vitamins',
 'mitochondrial protein import',
 'mrna splicing - major pathway',
 'n-glycan trimming in the er and calnexin/calreticulin cycle',
 'nephrin interactions',
 'regulation of complement cascade',
 'regulation of insulin secretion',
 'respiratory electron transport',
 'response to elevated platelet cytosolic ca2+',
 'srp-dependent cotranslational protein targeting to membrane',
 'syndecan interactions',
 'vitamin c (ascorbate) metabolism',
 'xbp1(s) activates chaperone genes']
For a selected term, we can extract its species and visualize them.
[55]: exp_data.label_free.heatmap(
          label_free_enrichment.sig.term_to_genes('caspase-mediated cleavage of cytoskeletal proteins'),
          subset_index='identifier',
          index='label',
          cluster_row=True,
          rank_index=True,
          min_sig=2,
          linewidths=0.01,
          figsize=(2, 4),
      );
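Since the genes column stores comma-separated gene symbols, extracting the genes behind a term is a simple split. term_to_genes_sketch below is a hypothetical pandas helper illustrating this; it is not MAGINE's term_to_genes, which may also apply other filters. The example rows are taken from the enrichment table above.

```python
import pandas as pd

def term_to_genes_sketch(enrichment, term):
    """Unique genes listed for `term` across all samples (hypothetical helper)."""
    rows = enrichment[enrichment['term_name'] == term]
    genes = set()
    for gene_string in rows['genes']:
        genes.update(gene_string.split(','))
    return sorted(genes)

# Rows copied from the enrichment table above.
enrichment = pd.DataFrame({
    'term_name': ['mrna splicing', "mrna 3'-end processing", 'metabolism'],
    'genes': ['EIF4A3,HNRNPC', 'EIF4A3', 'ACSL3,HEXA,RRM1,VKORC1'],
    'sample_id': ['06hr', '06hr', '01hr'],
})
print(term_to_genes_sketch(enrichment, 'mrna splicing'))
```

A term with no matching rows returns an empty list, which is one way a "No terms match subset" situation can arise.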
[56]: exp_data.label_free.heatmap(
          label_free_enrichment.sig.term_to_genes('tp53 regulates metabolic genes'),
          subset_index='identifier',
          index='label',
          cluster_row=True,
          rank_index=True,
          min_sig=2,
          linewidths=0.01,
          figsize=(2, 6),
      );
No terms match subset
[57]: ph_silac_enrichment = e.run_samples(
          exp_data.ph_silac.sig.up_by_sample,
          exp_data.ph_silac.sig.sample_ids,
          gene_set_lib='Reactome_2016')
[58]: ph_silac_enrichment.term_name = ph_silac_enrichment.term_name.str.split('_').str.get(0)
[59]: ph_silac_enrichment.heatmap(min_sig=2,
                                  figsize=(4, 24),
                                  linewidths=0.01,
                                  cluster_by_set=True);
[60]: ph_silac_enrichment_slim = ph_silac_enrichment.remove_redundant(level='dataframe')
      ph_silac_enrichment_slim.heatmap(min_sig=3,
                                       figsize=(4, 16),
                                       linewidths=0.01,
                                       cluster_by_set=True);
Number of rows went from 315 to 84
[61]: exp_data.ph_silac.heatmap(
          ph_silac_enrichment.sig.term_to_genes('apoptosis'),
          subset_index='identifier',
          index='label',
          cluster_row=True,
          rank_index=True,
          min_sig=2,
          linewidths=0.01,
          figsize=(2, 12),
      );
1.4 MAGINE Modules Reference
1.4.1 Data management
Tools to process, organize, and query data. The classes are derived from pandas.DataFrame, meaning everything you can do with pandas you can do with MAGINE.
BaseData is the core DataFrame class. It provides commonly used functions and is used by both “Sample” and “EnrichmentResult”.
1.4.2 BaseData
class magine.data.base.BaseData(*args, **kwargs)
Bases: pandas.core.frame.DataFrame
This class is derived from pd.DataFrame.
heatmap(subset=None, subset_index=None, convert_to_log=True, y_tick_labels='auto', cluster_row=False, cluster_col=False, cluster_by_set=False, index=None, values=None, columns=None, annotate_sig=True, figsize=(8, 12), div_colors=True, linewidths=0, num_colors=21, rank_index=False, min_sig=0)
Creates a heatmap of the data, handling the pivot and formatting.
Parameters
subset [list or str] Filter to only the provided list. If a str, filter based on .contains(subset)
subset_index [str] Index (column) that the subset list is matched against
convert_to_log [bool] Convert values to log2 scale
y_tick_labels [str] Column of values, default = ‘auto’
cluster_row [bool]
cluster_col [bool]
cluster_by_set [bool] Clusters by gene set, only used in EnrichmentResult derived class
index [str] Index of the heatmap; these will be the row variables
values [str] Values to display in heatmap
columns [str] Value that will be used as columns
annotate_sig [bool] Add '+' annotation to entries where the 'significant' column is True
figsize [tuple] Figure size to pass to matplotlib
div_colors [bool] Use colors that are divergent (red to blue, instead of shades of blue)
num_colors [int] How many colors to include on color bar
linewidths [float] line width between individual cols and rows
rank_index [bool] Rank index alphabetically
min_sig [int] Minimum number of significant 'index' across samples. Can be used to remove rows that are not significant across any sample.
Returns
matplotlib.figure
log2_normalize_df(column='fold_change', inplace=False)
Convert the "fold_change" column (or the column given by column) to log2.
Does so by taking log2 of all positive values and -log2 of the absolute value of all negative values.
Parameters
column [str] Column to convert
inplace [bool] Whether to apply log2 in place or return a new dataframe
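To illustrate the signed log2 rule described for log2_normalize_df, here is a minimal plain-Python sketch. It is not MAGINE's code: the helper name signed_log2 is hypothetical, it reads "-log2 of negative values" as -log2 of their absolute value, and mapping a fold change of exactly 0 to NaN is an assumption.

```python
import math


def signed_log2(values):
    """Hypothetical helper: apply the signed log2 rule to a list of fold changes."""
    result = []
    for v in values:
        if v > 0:
            result.append(math.log2(v))    # positive fold change: log2(x)
        elif v < 0:
            result.append(-math.log2(-v))  # negative fold change: -log2(|x|)
        else:
            result.append(float("nan"))   # assumption: 0 has no log2, so mark it NaN
    return result


print(signed_log2([4.0, -2.0, 1.0]))  # [2.0, -1.0, 0.0]
```

With this convention a value of 4 maps to 2 and a value of -2 maps to -1, so increases and decreases of the same magnitude land symmetrically around 0.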
pivoter(convert_to_log=False, columns='sample_id', values='fold_change', index=None, fill_value=None, min_sig=0)
Pivot data on provided axis.
Parameters
convert_to_log [bool] Convert values column to log2
index [str] Index for pivot table
columns [str] Columns to pivot
values [str] Values of pivot table
fill_value [float, optional] Value used to fill NaNs in the pivot table
min_sig [int] Required number of significant terms to keep in a row, default 0
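To make the interplay of index, columns, values, fill_value and min_sig concrete, here is a dependency-free sketch of a pivot that drops rarely significant rows. It is not the pivoter implementation: records are plain dicts, the function name pivot_with_min_sig is hypothetical, and it assumes a boolean 'significant' field per record and counts each column at most once toward min_sig.

```python
from collections import defaultdict


def pivot_with_min_sig(records, index, columns, values, min_sig=0, fill_value=None):
    """Hypothetical sketch: pivot long-format records into {index: {column: value}}."""
    table = defaultdict(dict)       # index value -> {column value -> cell value}
    sig_columns = defaultdict(set)  # index value -> columns where it is significant
    all_columns = set()
    for rec in records:
        key, col = rec[index], rec[columns]
        table[key][col] = rec[values]
        all_columns.add(col)
        if rec.get("significant", False):  # assumed flag name
            sig_columns[key].add(col)
    pivoted = {}
    for key, cells in table.items():
        if len(sig_columns[key]) < min_sig:
            continue  # not significant in enough columns: drop the row
        # Missing cells get fill_value, like the NaN gaps a pivot leaves
        pivoted[key] = {col: cells.get(col, fill_value) for col in sorted(all_columns)}
    return pivoted


records = [
    {"identifier": "BAX", "sample_id": "1hr", "fold_change": 2.0, "significant": True},
    {"identifier": "BAX", "sample_id": "6hr", "fold_change": 3.0, "significant": True},
    {"identifier": "TP53", "sample_id": "1hr", "fold_change": 1.1, "significant": False},
]
print(pivot_with_min_sig(records, "identifier", "sample_id", "fold_change", min_sig=2))
# {'BAX': {'1hr': 2.0, '6hr': 3.0}}
```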
present_in_all_columns(columns='sample_id', index=None, inplace=False)
Require index to be present in all columns
Parameters
columns [str] Columns to consider
index [str, list] The column with which to filter by counts
inplace [bool] Filter in place or return a copy of the filtered data
Returns
new_data [BaseData]
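Here is a plain-Python sketch of the filter described by present_in_all_columns. It assumes long-format records stored as dicts, and it keeps an index value only if that value appears under every distinct value of the columns field. The function and field names are illustrative, not the library's implementation.

```python
def present_in_all_columns(records, index, columns="sample_id"):
    """Hypothetical sketch: keep records whose index value appears under every column value."""
    all_columns = {rec[columns] for rec in records}
    seen = {}  # index value -> set of column values it appears in
    for rec in records:
        seen.setdefault(rec[index], set()).add(rec[columns])
    keep = {key for key, cols in seen.items() if cols == all_columns}
    return [rec for rec in records if rec[index] in keep]


records = [
    {"identifier": "BAX", "sample_id": "1hr"},
    {"identifier": "BAX", "sample_id": "6hr"},
    {"identifier": "TP53", "sample_id": "1hr"},
]
print([r["identifier"] for r in present_in_all_columns(records, "identifier")])
# ['BAX', 'BAX']
```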
require_n_sig(columns='sample_id', index=None, n_sig=3, inplace=False, verbose=False)
Filter index to have at least "n_sig" significant species.
Parameters
columns [str] Columns to consider
index [str, list] The column with which to filter by counts
n_sig [int] Minimum number of significant terms required for a row to be kept
inplace [bool] Filter in place or return a copy of the filtered data
verbose [bool]
Returns
new_data [BaseData]
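The n_sig rule can be sketched in the same way. This hypothetical version assumes a boolean 'significant' field. It counts the distinct values of the columns field (sample IDs by default) in which an index value is flagged significant. Whether MAGINE counts distinct samples or raw rows is not stated here, so treat that as an assumption.

```python
from collections import defaultdict


def require_n_sig(records, index, columns="sample_id", n_sig=3):
    """Hypothetical sketch: keep records whose index value is significant in >= n_sig columns."""
    sig_columns = defaultdict(set)  # index value -> columns where it is flagged significant
    for rec in records:
        if rec.get("significant", False):  # assumed flag name
            sig_columns[rec[index]].add(rec[columns])
    return [rec for rec in records if len(sig_columns[rec[index]]) >= n_sig]


records = [
    {"identifier": "BAX", "sample_id": "1hr", "significant": True},
    {"identifier": "BAX", "sample_id": "6hr", "significant": True},
    {"identifier": "TP53", "sample_id": "1hr", "significant": True},
    {"identifier": "TP53", "sample_id": "6hr", "significant": False},
]
print([r["identifier"] for r in require_n_sig(records, "identifier", n_sig=2)])
# ['BAX', 'BAX']
```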
sig
Terms with significant flag
1.4.3 Species data
class magine.data.experimental_data.Sample(*args, **kwargs)
Bases: magine.data.base.BaseData
Provides tools for subsets of data types
by_sample
List of significantly flagged species by sample
down
Return down-regulated species
down_by_sample
List of down-regulated species by sample
exp_methods
List of sample_ids in data
id_list
Set of species identifiers
label_list
Set of species labels
plot_all(html_file_name, out_dir='out', plot_type='plotly', run_parallel=False)
Creates a plot of all metabolites
Parameters
html_file_name [str] filename to save html of all plots
out_dir: str, path Directory that will contain all proteins
plot_type [str] plotly or matplotlib output
run_parallel [bool] Create the plots in parallel
plot_histogram(save_name=None, y_range=None, out_dir=None)
Plots a histogram of data
Parameters
save_name: str Name of figure
out_dir: str, path Path to location to save figure
y_range: array_like range of data
plot_pie_sig_ratio(save_name=None, ax=None, fig=None, figsize=None)
Parameters
save_name [str]
ax [matplotlib.axes, optional]
fig [matplotlib.figure]
figsize [tuple] Size of figure
plot_species(species_list=None, subset_index=None, save_name=None, out_dir=None, title=None, plot_type='plotly', image_format='png')
Create scatter plot of species list
Parameters
species_list [list] list of compounds
subset_index [list] Column to filter based on species_list
save_name [str] Name of html output file
out_dir [str] Location to place plots
title [str] Title for HTML page
plot_type [str] Type of plot outputs, can be “plotly” or “matplotlib”
image_format [str] pdf or png, only used if plot_type="matplotlib"
Returns
matplotlib.Figure or plotly.Figure
sample_ids
List of sample_ids in data
subset(species=None, index='identifier', sample_ids=None, exp_methods=None)
Parameters
species [list, str] List of species to create subset dataframe from
index [str] Index to filter based on provided ‘species’ list
sample_ids [str, list] List or string of sample IDs to filter by
exp_methods [str, list] List or string of experimental methods to filter by
Returns
magine.data.experimental_data.Species
up
Return up-regulated species
up_by_sample
List of up-regulated species by sample
volcano_by_sample(save_name=None, p_value=0.1, out_dir=None, fold_change_cutoff=1.5, y_range=None, x_range=None, sig_column=False)
Creates a figure of subplots of provided experimental method
Parameters
save_name: str name to save figure
out_dir: str, directory Location to save figure
sig_column: bool, optional Whether to use the significant flags of the data
p_value: float, optional Criterion for significance
fold_change_cutoff: float, optional Criterion for significance
y_range: array_like upper and lower bounds of plot in y direction
x_range: array_like upper and lower bounds of plot in x direction
volcano_plot(save_name=None, out_dir=None, sig_column=False, p_value=0.1, fold_change_cutoff=1.5, x_range=None, y_range=None)
Create a volcano plot of data
Parameters
save_name: str name to save figure
out_dir: str, directory Location to save figure
sig_column: bool, optional Whether to use the significant flags of the data
p_value: float, optional Criterion for significance
fold_change_cutoff: float, optional Criterion for significance
y_range: array_like upper and lower bounds of plot in y direction
x_range: array_like upper and lower bounds of plot in x direction
Returns
matplotlib.Figure
class magine.data.experimental_data.ExperimentalData(data_file)
Bases: object
Manages all experimental data
compounds
Only compounds in data
Returns
Sample
create_summary_table(sig=False, index='identifier', save_name=None, plot=False, write_latex=False)
Creates a summary table of data.
Parameters
sig: bool Flag to summarize significant species only
save_name: str Name to save csv and .tex file
index: str Index for counts
plot: bool Whether to create a plot of the table
write_latex: bool Create latex file of table
Returns
pandas.DataFrame
exp_methods
List of source columns
genes
All data tagged with gene
Includes protein and RNA.
get_measured_by_datatype()
Returns dict of species per data type
Returns
dict
proteins
Protein level data
Tagged with “gene” identifier that is not RNA
rna
RNA level data
Tagged with “RNA”
sample_ids
List of sample_ids
species
Returns data in Sample format
Returns
Sample
subset(species, index='identifier')
Parameters
species [list, str] List of species to create subset dataframe from
index [str] Index to filter based on provided ‘species’ list
Returns
magine.data.experimental_data.Species
volcano_analysis(out_dir, use_sig_flag=True, p_value=0.1, fold_change_cutoff=1.5)
Creates a volcano plot for each experimental method
Parameters
out_dir: str, path Path to where the output figures will be saved
use_sig_flag: bool Use significant flag of data
p_value: float, optional p-value criterion for significance. Not used if use_sig_flag is True.
fold_change_cutoff: float, optional Fold change criterion for significance. Not used if use_sig_flag is True.
1.4.4 Network Generators
magine.networks.network_generator module
magine.networks.network_generator.build_network(seed_species, species='hsa', save_name=None, all_measured_list=None, trim_source_sink=False, use_reactome=True, use_hmdb=False, use_biogrid=True, use_signor=True, verbose=False)
Construct a network from a list of gene names.
Parameters
seed_species [list] list of genes to construct network
save_name [str, optional] Output name to save network. Will save one before and after ID conversion
species [str] species of proteins (‘hsa’: human, ‘mmu’:murine)
all_measured_list [list] list of all species that should be considered in network
use_reactome [bool] Add ReactomeFunctionalInteraction reaction to network
use_biogrid [bool] Add BioGrid reaction to network
use_hmdb [bool] Add HMDB reaction to network all_measured_list
use_signor [bool] Add SIGNOR reaction to network
trim_source_sink [bool, optional] Remove source and sink nodes if they are not measured in network
verbose [bool]
Returns
networkx.DiGraph
magine.networks.network_generator.create_background_network(save_name='background_network', fresh_download=False, verbose=True, create_overlap=False)
Parameters
save_name [str] Name of the network
fresh_download [bool] Download a fresh copy of the databases
verbose: bool Print information about the databases
create_overlap [bool] Creates a figure comparing the databases
Returns
nx.DiGraph
magine.networks.network_generator.expand_by_db(starting_network, expansion_source, measured_list, verbose=False)
Add a reference network to the main network
Parameters
starting_network [nx.DiGraph]
expansion_source [nx.DiGraph]
measured_list [list_like]
verbose [bool]
Returns
new_graph [nx.DiGraph]
magine.networks.annotated_set module
magine.networks.subgraphs module
Subpackages
The following subpackages download and parse data.
Network Databases
Database downloads.
MAGINE downloads network information from:
• Reactome Functional Interactions
• HMDB
• BioGrid
• KEGG
• Signor
With the exception of KEGG and HMDB, all databases are downloaded and processed with pandas. KEGG is downloaded using Bioservices. HMDB is downloaded in XML format and processed using the lxml parser.
magine.networks.databases.load_reactome_fi()
Load reactome functional interaction network
Returns
pandas.DataFrame
magine.networks.databases.download_reactome_fi()
Downloads reactome functional interaction network
magine.networks.databases.load_biogrid_network(fresh_download=False)
Parameters
fresh_download [bool] Download a fresh copy from biogrid
Returns
nx.DiGraph
magine.networks.databases.download_biogrid()
magine.networks.databases.load_signor(fresh_download=False)
Load SIGNOR network
Parameters
fresh_download: bool Download fresh network
verbose [bool]
Returns
nx.DiGraph
magine.networks.databases.download_signor()
magine.networks.databases.load_kegg_mappings(species, fresh_download=False)
Load mappings of kegg_pathway_id to nodes and nodes to kegg_pathway_id
Parameters
species [str] Species type, currently ‘hsa’ is the only species with automatic name conversion
fresh_download [bool] Download KEGG fresh
Returns
dict, dict
magine.networks.databases.load_kegg(species='hsa', fresh_download=False)
Loads all KEGG pathways as a single network
Parameters
species [str] Default 'hsa'
fresh_download [bool] Download a fresh copy of KEGG
Returns
nx.DiGraph
magine.networks.databases.load_hmdb_network(fresh_download=False)
Create HMDB network containing all metabolite-protein interactions
Parameters
fresh_download [bool] Download fresh copy from HMDB
verbose [bool]
Returns
nx.DiGraph
Visualization tools
1.4.5 Enrichment Module
enrichR module
class magine.enrichment.enrichr.Enrichr(verbose=False)
Bases: object
print_valid_libs()
Print a list of all available libraries EnrichR has to offer.
run(list_of_genes, gene_set_lib='GO_Biological_Process_2017')
Parameters
list_of_genes [list_like] List of genes using HGNC gene names
gene_set_lib [str or list] Name of gene set library. To print options use Enrichr.print_valid_libs
Returns
df [EnrichmentResult] Results from enrichR
Examples
>>> import pandas as pd
>>> from magine.enrichment.enrichr import Enrichr
>>> pd.set_option('display.max_colwidth', 40)
>>> pd.set_option('display.precision', 3)
>>> e = Enrichr()
>>> df = e.run(['BAX', 'BCL2', 'CASP3', 'CASP8'], gene_set_lib='Reactome_2016')
>>> print(df[['term_name', 'combined_score']].head(5))  # doctest: +NORMALIZE_WHITESPACE
                                 term_name  combined_score
0  intrinsic pathway for apoptosis hsa ...       11814.410
1               apoptosis hsa r-hsa-109581        2365.141
2  programmed cell death hsa r-hsa-5357801        2313.527
3  caspase-mediated cleavage of cytoske...       10944.261
4  caspase activation via extrinsic apo...        4245.542
run_samples(sample_lists, sample_ids, gene_set_lib='GO_Biological_Process_2017', save_name=None, create_html=False, out_dir=None, run_parallel=False, exp_data=None, pivot=False)
Run enrichment analysis on a list of samples.
Parameters
sample_lists [list_like] List of lists of genes for enrichment analysis
sample_ids [list] list of ids for the provided sample list
gene_set_lib [str, list] Type of gene set, refer to Enrichr.print_valid_libs
save_name [str, optional] If provided, saves a file as a pivoted table with the term_ids vs. sample_ids
create_html [bool] Creates html of output with plots of species across sample
out_dir [str] If create_html, it will place all html plots into this directory
run_parallel [bool] If create_html, it will create plots using multiprocessing
exp_data [magine.data.ExperimentalData] Must be provided if create_html=True
pivot [bool]
Returns
EnrichmentResult
Examples
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from magine.enrichment.enrichr import Enrichr
>>> pd.set_option('display.max_colwidth', 40)
>>> pd.set_option('display.precision', 3)
>>> samples = [['BAX', 'BCL2', 'CASP3', 'CASP8'], ['ATR', 'ATM', 'TP53', 'CHEK1']]
>>> sample_ids = ['apoptosis', 'dna_repair']
>>> e = Enrichr()
>>> df = e.run_samples(samples, sample_ids, gene_set_lib='Reactome_2016')
>>> print(df[['term_name', 'combined_score']].head(5))  # doctest: +NORMALIZE_WHITESPACE
                                 term_name  combined_score
0  intrinsic pathway for apoptosis hsa ...       11814.410
1               apoptosis hsa r-hsa-109581        2365.141
2  programmed cell death hsa r-hsa-5357801        2313.527
3  caspase-mediated cleavage of cytoske...       10944.261
4  caspase activation via extrinsic apo...        4245.542
>>> df.filter_multi(rank=10, inplace=True)
>>> df['term_name'] = df['term_name'].str.split('_').str.get(0)
>>> fig = df.sig.heatmap(figsize=(6, 6), linewidths=.05)
Using an ExperimentalData instance, we can run enrichR for all the databases with a simple wrapper:
magine.enrichment.enrichr.run_enrichment_for_project(exp_data, project_name, databases=None, output_path=None)
Parameters
exp_data [magine.data.experimental_data.ExperimentalData]
project_name [str]
databases [list]
output_path [str] Location to save all individual enrichment output files created.
We also provide some tools to clean up and standardize enrichR's output.
Functions to clean up enrichR term names
Warning: these functions are in progress and not fully tested!
magine.enrichment.enrichr.clean_term_names(row)
magine.enrichment.enrichr.clean_lincs(df)
Cleans the LINCS databases' term_names from enrichR.
Parameters
df
magine.enrichment.enrichr.clean_drug_pert_geo(df)
magine.enrichment.enrichr.clean_tf_names(data)
Cleans transcription factor databases by removing everything after '_'.
Parameters
data [pd.DataFrame]
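The cleanup described for clean_tf_names keeps only the text before the first '_'. Below is a minimal sketch of that rule for a single string. It is not the library function, the helper name is hypothetical, and the term name used is only illustrative. On a DataFrame column the same idea is the str.split('_').str.get(0) pattern used in the tutorial.

```python
def strip_after_underscore(term_name):
    """Hypothetical helper: keep only the part of a term name before the first '_'."""
    return term_name.split("_", 1)[0]


print(strip_after_underscore("STAT3_K562_hg19"))  # STAT3
print(strip_after_underscore("TP53"))             # TP53 (no underscore: unchanged)
```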
Download reference databases
magine.enrichment.enrichr.get_background_list(lib_name)
Return reference list for given gene reference set
Parameters
lib_name [str]
EnrichmentResult
The results from enrichR are returned in an EnrichmentResult.
class magine.enrichment.enrichment_result.EnrichmentResult(*args, **kwargs)
Bases: magine.data.base.BaseData
all_genes_from_df()
Returns all genes from gene columns in a set
Returns
set
calc_dist(level='datafame')
dist_matrix(figsize=(8, 8), level='dataframe')
Create a distance matrix of all term similarity
Parameters
figsize [tuple] Size of figure
level [str, {'dataframe', 'each'}] How to map term_name to genes. 'dataframe' compresses all genes from all sample_ids into the same term; 'each' treats each term_name individually.
Returns
matplotlib.Figure
filter_based_on_words(words, inplace=False)
Filter term_name based on key terms
Parameters
words [list, str] List of words to use to keep rows in dataframe
inplace [bool] Filter the dataframe in place or return filtered copy
Returns
pandas.DataFrame
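Here is a minimal sketch of keyword filtering in the spirit of filter_based_on_words. It works on a plain list of term names instead of an EnrichmentResult. The helper is hypothetical, and the case-sensitive substring match is an assumption; the tutorial's term names happen to be lowercase.

```python
def filter_based_on_words(term_names, words):
    """Hypothetical sketch: keep term names containing any of the given words."""
    if isinstance(words, str):
        words = [words]  # accept a single word as well as a list
    # Plain, case-sensitive substring match (an assumption of this sketch)
    return [name for name in term_names if any(word in name for word in words)]


terms = ["apoptosis", "tp53 regulates metabolic genes", "dna repair"]
print(filter_based_on_words(terms, ["apoptosis", "repair"]))
# ['apoptosis', 'dna repair']
```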
filter_multi(p_value=None, combined_score=None, db=None, sample_id=None, category=None, rank=None, inplace=False)
Filters an enrichment array.
This is an aggregate function that allows one to filter an entire dataframe with a single function call.
Parameters
p_value [float] Keep rows with p-values less than or equal to this value
combined_score [float] Keep rows with combined scores greater than or equal to this value
db [str, list]
sample_id [str, list]
category [str, list]
rank [int]
inplace [bool] Filter inplace
Returns
new_data [EnrichmentResult]
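filter_multi combines several row filters in one call. The sketch below works on a list of dicts. It shows how the p_value threshold (less than or equal), the combined_score threshold (greater than or equal), and the str-or-list db and sample_id options could compose. The field names adj_p_value, combined_score, db and sample_id are assumptions, and the rows are toy values. category and rank are left out because their exact semantics are not described above.

```python
def filter_multi(rows, p_value=None, combined_score=None, db=None, sample_id=None):
    """Hypothetical sketch: apply several enrichment filters in one call."""

    def as_set(option):
        # Mirror the "str or list" parameters: wrap a single string in a set
        return {option} if isinstance(option, str) else set(option)

    kept = list(rows)
    if p_value is not None:
        kept = [r for r in kept if r["adj_p_value"] <= p_value]
    if combined_score is not None:
        kept = [r for r in kept if r["combined_score"] >= combined_score]
    if db is not None:
        kept = [r for r in kept if r["db"] in as_set(db)]
    if sample_id is not None:
        kept = [r for r in kept if r["sample_id"] in as_set(sample_id)]
    return kept


rows = [  # toy values for illustration only
    {"term_name": "apoptosis", "adj_p_value": 0.01, "combined_score": 2365.1,
     "db": "Reactome_2016", "sample_id": "apoptosis"},
    {"term_name": "dna repair", "adj_p_value": 0.20, "combined_score": 150.0,
     "db": "Reactome_2016", "sample_id": "dna_repair"},
]
print([r["term_name"] for r in filter_multi(rows, p_value=0.05, db="Reactome_2016")])
# ['apoptosis']
```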
filter_rows(column, options, inplace=False)
Filters a pandas DataFrame given a column and filter selection.
Parameters
column [str]
options [str, list] Can be a single entry or a list
inplace [bool] Filter inplace
Returns
pd.DataFrame
find_similar_terms(term, level='sample', remove_subset=True)
Calculates similarity of all other terms to the given term
Parameters
term [str]
level [str] Sample or dataframe level, flattens all terms to one set of genes
remove_subset [bool] If either term is a subset of the other, a score of 1 is used instead of the Jaccard index.
Returns
pd.DataFrame
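The similarity used by find_similar_terms is described as a Jaccard index with a subset rule. Below is a minimal sketch of that score for two gene sets. The function name term_similarity is hypothetical, it works on plain Python sets, and treating empty sets as dissimilar is an assumption.

```python
def term_similarity(genes_a, genes_b, remove_subset=True):
    """Hypothetical sketch: Jaccard index of two gene sets, with the subset rule."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0  # assumption: empty gene sets are treated as dissimilar
    if remove_subset and (a <= b or b <= a):
        return 1.0  # one term's genes are contained in the other's
    return len(a & b) / len(a | b)  # size of intersection / size of union


print(term_similarity({"BAX", "BCL2", "CASP3"}, {"CASP3", "CASP8"}))      # 0.25
print(term_similarity({"CASP3"}, {"CASP3", "CASP8"}))                     # 1.0
print(term_similarity({"CASP3"}, {"CASP3", "CASP8"}, remove_subset=False))  # 0.5
```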
remove_redundant(threshold=0.75, verbose=False, level='sample', sort_by='combined_score', inplace=False)
Calculate similarity between all term sets and removes redundant terms.
Parameters
threshold [float, default 0.75]
verbose [bool, default False] Print similarity scores and removed terms.
level [{'sample', 'dataframe'}, default 'sample'] Level to filter dataframe. 'sample' will pivot the dataframe and filter each group of 'sample_id' individually. 'dataframe' will merge all genes that share the same 'term_name'.
sort_by [{'combined_score', 'rank', 'adj_p_value', 'n_genes'}, default 'combined_score'] Keyword to sort the dataframe. The scoring starts at the top term and compares it to all the lower terms.
inplace [bool] Filter the dataframe in place or return filtered copy
Returns
pandas.DataFrame
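Here is a sketch of the greedy procedure described for remove_redundant. It ranks terms by sort_by, keeps the top term, and drops any lower-ranked term whose Jaccard similarity to an already kept term reaches the threshold. This is an illustration under stated assumptions, not the library's implementation. Terms are dicts with a 'genes' set, the >= comparison is an assumption, and so is the sort direction per key (descending for combined_score and n_genes, ascending for rank and adj_p_value).

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def remove_redundant(terms, threshold=0.75, sort_by="combined_score"):
    """Hypothetical sketch of greedy redundancy removal over enrichment terms."""
    # Assumption: larger is better for combined_score / n_genes, smaller for rank / adj_p_value
    descending = sort_by in ("combined_score", "n_genes")
    ranked = sorted(terms, key=lambda t: t[sort_by], reverse=descending)
    kept = []
    for term in ranked:
        # A lower-ranked term survives only if it is not too similar to any kept term
        if all(jaccard(term["genes"], k["genes"]) < threshold for k in kept):
            kept.append(term)
    return kept


terms = [
    {"term_name": "A", "genes": {"a", "b", "c", "d"}, "combined_score": 10.0},
    {"term_name": "B", "genes": {"x", "y"}, "combined_score": 5.0},
    {"term_name": "C", "genes": {"a", "b", "c", "d", "e"}, "combined_score": 9.0},
]
print([t["term_name"] for t in remove_redundant(terms)])  # ['A', 'B']
```

Only surviving terms can remove later ones, so checking each term against the kept list is equivalent to starting at the top term and comparing it to all lower terms. Here C is dropped because its similarity to A is 4/5 = 0.8.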
show_terms_below(term, level='dataframe', threshold=0.7, remove_subset=True)
Find terms that were removed by remove_redundant
Parameters
term [str]
level [str]
threshold [float]
remove_subset [bool]
Returns
EnrichmentResult
term_to_genes(term)
Get set of genes of the provided term(s)
Parameters
term [str, list]
Returns
set
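term_to_genes returns the set of genes behind one or more terms. Here is a hypothetical sketch over a list of dicts. It assumes each row stores its genes as a comma-separated string in a 'genes' field, and it takes the union across all rows (for example, across samples) that share a term_name.

```python
def term_to_genes(rows, term):
    """Hypothetical sketch: union of the genes of one or more terms across all rows."""
    terms = {term} if isinstance(term, str) else set(term)
    genes = set()
    for row in rows:
        if row["term_name"] in terms:
            # Assumption: genes are stored as a comma-separated string
            genes.update(g for g in row["genes"].split(",") if g)
    return genes


rows = [
    {"term_name": "apoptosis", "sample_id": "1hr", "genes": "BAX,CASP3"},
    {"term_name": "apoptosis", "sample_id": "6hr", "genes": "CASP3,CASP8"},
    {"term_name": "dna repair", "sample_id": "1hr", "genes": "ATM,ATR"},
]
print(sorted(term_to_genes(rows, "apoptosis")))  # ['BAX', 'CASP3', 'CASP8']
```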
term_to_genes_dict(term_list=None)
Parameters
term_list [list]
Returns
OrderedDict
unique_terms(threshold=0.75, verbose=False, level='dataframe')
Parameters
threshold [float]
verbose [bool]
level [str, {‘dataframe’, ‘each’}]
An EnrichmentResult can be saved just like a pandas.DataFrame and loaded back in using
magine.enrichment.enrichment_result.load_enrichment_csv(file_name, **args)
Load data into EnrichmentResult data class
Parameters
file_name [str]
Returns
EnrichmentResult
1.4.6 Plotting tools
Generate Heatmaps
magine.plotting.heatmaps.cluster_distance_mat(dist_mat, names, figsize=(8, 8))
Parameters
dist_mat [np.array] Distance matrix array.
names [list_like] Names of ticks for distance matrix
figsize [tuple] Size of figure, passed to matplotlib
magine.plotting.heatmaps.heatmap_by_terms(data, term_labels, term_sets, colors=None, min_sig=None, convert_to_log=False, y_tick_labels='auto', columns='sample_id', index='identifier', values='fold_change', linewidths=0, cluster_row=False, cluster_col=False, div_colors=False, num_colors=21, figsize=None, annotate_sig=False, **kwargs)
Parameters
data [pd.DataFrame]
term_labels [list_like] List of labels for grouping
term_sets [list_like] List of list like that create the terms
colors [list_like] Colors for plotting, if not provided it will be created
convert_to_log [bool]
y_tick_labels [list_like]
columns [str] Name of columns of df for pivot
index [str] Name of index of df for pivot
values [str] Name of values of df for pivot
cluster_col [bool] Cluster the data using seaborn.clustermap
cluster_row [bool] Cluster rows
div_colors [bool] Use divergent colors for plotting
figsize [tuple] Size of figure, passed to matplotlib/seaborn
num_colors [int] Number of colors for color bar
annotate_sig [bool] Add ‘*’ annotation to plot for significant changed terms
linewidths [float or None] Add white line between plots
min_sig [int] Minimum number of significant 'index' across samples. Can be used to remove rows that are not significant across any sample.
Returns
plt.Figure
magine.plotting.heatmaps.heatmap_from_array(data, convert_to_log=False, y_tick_labels='auto', cluster_row=False, cluster_col=False, columns='sample_id', index='term_name', values='combined_score', div_colors=False, num_colors=7, figsize=(6, 4), rank_index=False, annotate_sig=False, linewidths=0.0, cluster_by_set=False, min_sig=0)
Parameters
data [magine.data.base.BaseData]
convert_to_log [bool] Convert fold_change column to log2 scale
y_tick_labels [list_like]
columns [str] Name of columns of df for pivot
index [str] Name of index of df for pivot
values [str] Name of values of df for pivot
cluster_col [bool] Cluster the data using seaborn.clustermap
cluster_row [bool] Cluster the data using seaborn.clustermap
div_colors [bool] Use divergent colors for plotting
figsize [tuple] Size of figure, passed to matplotlib/seaborn
rank_index [bool] Order by index.
num_colors [int] Number of colors for color bar
annotate_sig [bool] Add ‘*’ annotation to plot for significant changed terms
linewidths [float or None] Add white line between plots
cluster_by_set: bool Cluster by gene set column. Only works for enrichment_array
min_sig [int] Minimum number of significant 'index' across samples. Can be used to remove rows that are not significant across any sample.
Returns
plt.Figure
magine.plotting.species_plotting module
magine.plotting.species_plotting.plot_dataframe(exp_data, html_filename, out_dir='proteins', plot_type='plotly', run_parallel=False)
Creates
Parameters
exp_data [magine.BaseData]
html_filename [str]
out_dir: str, path Directory that will contain all proteins
plot_type [str] plotly or matplotlib output
run_parallel [bool] create plots in parallel
magine.plotting.species_plotting.plot_genes_by_ont(data, list_of_terms, save_name, out_dir=None, exp_data=None, run_parallel=False, plot_type='plotly')
Creates a figure for each GO term in data
BaseData should be a result of running calculate_enrichment. This function creates a plot of all proteins per term if the term is significant, the number of genes in the reference set is larger than 5, and the total number of species measured is less than 100.
Parameters
data [pandas.DataFrame] previously ran enrichment analysis
list_of_terms [list_like]
save_name [str] name to save file
out_dir [str] output path for file
exp_data [magine.ExperimentalData] data to plot
run_parallel [bool] To run in parallel using pathos.multiprocessing
plot_type [str] plotly or matplotlib
Returns
out_array [dict] dict where keys are pointers to figure locations
magine.plotting.species_plotting.plot_species(df, species_list=None, save_name='test', out_dir=None, title=None, plot_type='plotly', image_format='pdf', close_plots=False)
Parameters
df: pandas.DataFrame magine formatted dataframe
species_list: list List of genes to be plotted
save_name: str Filename to be saved as
out_dir: str Path for output to be saved
title: str Title of plot, useful when list of genes corresponds to a GO term
plot_type [str] Use plotly to generate html output or matplotlib to generate pdf
image_format [str] pdf or png, only used if plot_type="matplotlib"
close_plots [bool] Close plot after making, use when creating lots of plots in parallel.
magine.plotting.species_plotting.write_table_to_html(data, save_name='index', out_dir=None, run_parallel=False, exp_data=None, plot_type='matplotlib')
Creates a html table of plots of genes for each ontology term.
Parameters
data [magine.enrichment.enrichment_result.EnrichmentResult]
save_name [str] name of html output file
out_dir [str, optional] output path for all plots
run_parallel [bool] Create plots in parallel
exp_data [magine.data.ExperimentalData]
plot_type [str {‘matplotlib’, ‘plotly’}]
magine.plotting.venn_diagram_maker module
magine.plotting.venn_diagram_maker.create_venn2(list1, list2, label1, label2, save_name=None, title=None, image_format='png', ax=None)
Creates a Venn diagram for 2 lists
Parameters
list1 [list_like]
list2 [list_like]
label1 [str]
label2 [str]
save_name [str]
title [str]
image_format [str, optional] default png
ax [matplotlib.axes]
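A two-set Venn diagram such as the one create_venn2 draws shows the sizes of three regions. This hypothetical helper computes those sizes with set operations. It does not draw anything and is not part of MAGINE.

```python
def venn2_region_sizes(list1, list2):
    """Hypothetical helper: sizes of the three regions of a two-set Venn diagram."""
    a, b = set(list1), set(list2)  # duplicates in the input lists are collapsed
    return {"only_1": len(a - b), "only_2": len(b - a), "both": len(a & b)}


print(venn2_region_sizes(["BAX", "BCL2", "CASP3"], ["CASP3", "CASP8"]))
# {'only_1': 2, 'only_2': 1, 'both': 1}
```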
magine.plotting.venn_diagram_maker.create_venn3(list1, list2, list3, label1, label2, label3, save_name=None, image_format='png', title=None, ax=None, colors=('g', 'r', 'b'))
Creates a Venn diagram for 3 lists
Parameters
list1 [list_like]
list2 [list_like]
list3 [list_like]
label1 [str]
label2 [str]
label3 [str]
save_name [str]
image_format [str] default png
title: str
ax [matplotlib.axes]
magine.plotting.volcano_plots module
magine.plotting.volcano_plots.add_volcano_plot(fig_axis, section_0, section_1, section_2)
Adds a volcano plot to a fig axis
Parameters
fig_axis [plt.Figure.axes]
section_0 [pd.DataFrame]
section_1 [pd.DataFrame]
section_2 [pd.DataFrame]
magine.plotting.volcano_plots.create_mask(data, use_sig=True, p_value=0.1, fold_change_cutoff=1.5)
Creates a mask for volcano plots.
# Visual example of volcano plot
# section 0 meets the significance criteria
#     0     #     1     #     0     #
#           #           #           #
#####################################
#           #           #           #
#     2     #     2     #     2     #
#           #           #           #
#####################################
Parameters
data [pd.DataFrame]
use_sig [bool]
p_value [float] p_value threshold
fold_change_cutoff [float] fold change threshold
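Here is one way to read the create_mask section layout as code. A point that passes the p-value criterion falls in section 0 if its fold change also passes the cutoff (in either direction), and in section 1 otherwise. A point that fails the p-value criterion falls in section 2. This is an assumption-based reading, not create_mask itself. The helper name volcano_section is hypothetical, it covers only the use_sig=False path, it treats fold changes as signed (negative meaning down-regulated), and it uses inclusive comparisons.

```python
def volcano_section(p_value, fold_change, p_cutoff=0.1, fold_change_cutoff=1.5):
    """Hypothetical sketch: assign a point to volcano-plot section 0, 1 or 2."""
    if p_value <= p_cutoff:
        # Passes the p-value criterion: section 0 if the fold change is also large enough
        return 0 if abs(fold_change) >= fold_change_cutoff else 1
    return 2  # fails the p-value criterion


points = [(0.01, 3.0), (0.01, -2.0), (0.01, 1.2), (0.5, 4.0)]
print([volcano_section(p, fc) for p, fc in points])  # [0, 0, 1, 2]
```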
magine.plotting.volcano_plots.save_plot(fig, save_name, out_dir=None, image_type='png')
Saves fig
Parameters
fig [plt.Figure] Figure to be saved
save_name [str] output file name
out_dir [str, optional] output path
image_type [str, optional] output type of file, {"png", "pdf", etc.}
magine.plotting.volcano_plots.volcano_plot(data, save_name=None, out_dir=None, sig_column=False, p_value=0.1, fold_change_cutoff=1.5, x_range=None, y_range=None)
Creates a volcano plot of the provided data.
Parameters
data [pandas.DataFrame] data to create volcano plots from
save_name: str name to save figure
out_dir: str, directory Location to save figure
sig_column: bool, optional Whether to use the significant flags of the data
p_value: float, optional Criterion for significance
fold_change_cutoff: float, optional Criterion for significance
y_range: array_like upper and lower bounds of plot in y direction
x_range: array_like upper and lower bounds of plot in x direction
1.4.7 Other useful tools
magine.html_templates.html_tools module
magine.html_templates.html_tools.create_yadf_filters(table)
magine.html_templates.html_tools.format_ploty(text, save_name)
Parameters
text [str] html code to embed in file
save_name [str] html output filename
magine.html_templates.html_tools.write_filter_table(table, save_name)
Parameters
table [pandas.DataFrame]
save_name [str]
ID Mapping
Interface to mapping between species IDs.
We constructed two classes for mapping IDs:
1. GeneMapper
2. ChemicalMapper
Currently supported options
1. Genes: KEGG, HGNC, Uniprot, Entrez, ensembl_gene_id
2. Metabolites/compounds: kegg_id, name, accession, chebi_id, chemspider_id, biocyc_id, synonyms, pubchem_compound_id, protein_associations, inchikey, iupac_name, ontology, drugbank_id, chemical_formula, smiles, metlin_id, average_molecular_weight
class magine.mappings.ChemicalMapper(fresh_download=False)
Bases: object
Convert chemical species across various ids.
Database was created using HMDB
check_synonym_dict(term, format_name)
Checks the HMDB database for synonyms and returns the formatted name
Parameters
term [str]
format_name [str]
Returns
dict
Examples
>>> cm = ChemicalMapper()
>>> cm.check_synonym_dict(term='dodecene', format_name='main_accession')
['HMDB0000933', 'HMDB0059874']
chem_name_to_hmdb
convert_kegg_nodes(network)
Maps network from kegg to gene names
Parameters
network [nx.DiGraph]
Returns
dict
drugbank_to_hmdb
hmdb_accession_to_main
hmdb_main_to_protein
hmdb_to_kegg
hmdb_to_protein
print_info()
Print information about the dataframe
valid_columns = ['kegg_id', 'name', 'accession', 'chebi_id', 'inchikey', 'chemspider_id', 'biocyc_id', 'synonyms', 'iupac_name', 'pubchem_compound_id', 'protein_associations', 'ontology', 'drugbank_id', 'chemical_formula', 'smiles', 'metlin_id', 'average_molecular_weight', 'secondary_accessions']
class magine.mappings.GeneMapper(species='hsa')
Bases: object
Mapping class between common gene ids
Database was created by pulling down from NCBI, UNIPROT, HGNC
check_synonym_dict(term, format_name)
Checks the HMDB database for synonyms and returns the formatted name
Parameters
term [str]
format_name [str]
Returns
dict
convert_kegg_nodes(network, species='hsa')
Convert kegg ids to HGNC gene symbol.
Parameters
network [nx.DiGraph]
species [str {‘hsa’}] Main support for humans only.
Returns
kegg_to_gene_name, kegg_short [dict, dict]
gene_name_to_alias_name
gene_name_to_ensembl
gene_name_to_kegg
gene_name_to_uniprot
kegg_to_gene_name
kegg_to_hugo(genes, species='hsa')
Converts all KEGG names to HGNC
Parameters
genes [list]
species [str]
Returns
dict
kegg_to_symbol_through_uniprot(unknown_genes)
kegg_to_uniprot
ncbi_to_symbol
uniprot_to_gene_name
uniprot_to_kegg
Database Mapping
Interface to downloading ID mapping databases.
Databases supported
URL https://www.uniprot.org/
URL https://www.ncbi.nlm.nih.gov/
URL http://www.hmdb.ca/
magine.mappings.databases.load_hgnc()
magine.mappings.databases.load_ncbi()
magine.mappings.databases.load_uniprot()
class magine.mappings.databases.HMDB
Bases: object
Downloads and processes HMDB metabolites database
http://www.hmdb.ca/
download_db(fresh_download)
Parse HMDB to pandas.DataFrame
load_db(fresh_download=False)
CHAPTER 2
Indices and tables
• genindex
• modindex
• search
Python Module Index
• magine.html_templates.html_tools
• magine.mappings
• magine.mappings.databases
• magine.networks.databases
• magine.networks.network_generator
• magine.plotting.heatmaps
• magine.plotting.species_plotting
• magine.plotting.venn_diagram_maker
• magine.plotting.volcano_plots
63
magine Documentation, Release 0.1a1
64 Python Module Index
Index
A
add_volcano_plot() (in module magine.plotting.volcano_plots), 56
all_genes_from_df() (magine.enrichment.enrichment_result.EnrichmentResult method), 49

B
BaseData (class in magine.data.base), 39
build_network() (in module magine.networks.network_generator), 44
by_sample (magine.data.experimental_data.Sample attribute), 40

C
calc_dist() (magine.enrichment.enrichment_result.EnrichmentResult method), 50
check_synonym_dict() (magine.mappings.ChemicalMapper method), 58
check_synonym_dict() (magine.mappings.GeneMapper method), 59
chem_name_to_hmdb (magine.mappings.ChemicalMapper attribute), 58
ChemicalMapper (class in magine.mappings), 58
clean_drug_pert_geo() (in module magine.enrichment.enrichr), 49
clean_lincs() (in module magine.enrichment.enrichr), 49
clean_term_names() (in module magine.enrichment.enrichr), 49
clean_tf_names() (in module magine.enrichment.enrichr), 49
cluster_distance_mat() (in module magine.plotting.heatmaps), 52
compounds (magine.data.experimental_data.ExperimentalData attribute), 43
convert_kegg_nodes() (magine.mappings.ChemicalMapper method), 58
convert_kegg_nodes() (magine.mappings.GeneMapper method), 59
create_background_network() (in module magine.networks.network_generator), 45
create_mask() (in module magine.plotting.volcano_plots), 56
create_summary_table() (magine.data.experimental_data.ExperimentalData method), 43
create_venn2() (in module magine.plotting.venn_diagram_maker), 55
create_venn3() (in module magine.plotting.venn_diagram_maker), 56
create_yadf_filters() (in module magine.html_templates.html_tools), 57

D
dist_matrix() (magine.enrichment.enrichment_result.EnrichmentResult method), 50
down (magine.data.experimental_data.Sample attribute), 40
down_by_sample (magine.data.experimental_data.Sample attribute), 41
download_biogrid() (in module magine.networks.databases), 46
download_db() (magine.mappings.databases.HMDB method), 60
download_reactome_fi() (in module magine.networks.databases), 46
download_signor() (in module magine.networks.databases), 46
drugbank_to_hmdb (magine.mappings.ChemicalMapper attribute), 58
E
EnrichmentResult (class in magine.enrichment.enrichment_result), 49
Enrichr (class in magine.enrichment.enrichr), 47
exp_methods (magine.data.experimental_data.ExperimentalData attribute), 43
exp_methods (magine.data.experimental_data.Sample attribute), 41
expand_by_db() (in module magine.networks.network_generator), 45
ExperimentalData (class in magine.data.experimental_data), 43

F
filter_based_on_words() (magine.enrichment.enrichment_result.EnrichmentResult method), 50
filter_multi() (magine.enrichment.enrichment_result.EnrichmentResult method), 50
filter_rows() (magine.enrichment.enrichment_result.EnrichmentResult method), 50
find_similar_terms() (magine.enrichment.enrichment_result.EnrichmentResult method), 51
format_ploty() (in module magine.html_templates.html_tools), 57

G
gene_name_to_alias_name (magine.mappings.GeneMapper attribute), 59
gene_name_to_ensembl (magine.mappings.GeneMapper attribute), 59
gene_name_to_kegg (magine.mappings.GeneMapper attribute), 59
gene_name_to_uniprot (magine.mappings.GeneMapper attribute), 59
GeneMapper (class in magine.mappings), 59
genes (magine.data.experimental_data.ExperimentalData attribute), 43
get_background_list() (in module magine.enrichment.enrichr), 49
get_measured_by_datatype() (magine.data.experimental_data.ExperimentalData method), 43

H
heatmap() (magine.data.base.BaseData method), 39
heatmap_by_terms() (in module magine.plotting.heatmaps), 52
heatmap_from_array() (in module magine.plotting.heatmaps), 53
HMDB (class in magine.mappings.databases), 60
hmdb_accession_to_main (magine.mappings.ChemicalMapper attribute), 58
hmdb_main_to_protein (magine.mappings.ChemicalMapper attribute), 58
hmdb_to_kegg (magine.mappings.ChemicalMapper attribute), 58
hmdb_to_protein (magine.mappings.ChemicalMapper attribute), 58

I
id_list (magine.data.experimental_data.Sample attribute), 41

K
kegg_to_gene_name (magine.mappings.GeneMapper attribute), 59
kegg_to_hugo() (magine.mappings.GeneMapper method), 59
kegg_to_symbol_through_uniprot() (magine.mappings.GeneMapper method), 59
kegg_to_uniprot (magine.mappings.GeneMapper attribute), 59

L
label_list (magine.data.experimental_data.Sample attribute), 41
load_biogrid_network() (in module magine.networks.databases), 46
load_db() (magine.mappings.databases.HMDB method), 60
load_enrichment_csv() (in module magine.enrichment.enrichment_result), 52
load_hgnc() (in module magine.mappings.databases), 60
load_hmdb_network() (in module magine.networks.databases), 46
load_kegg() (in module magine.networks.databases), 46
load_kegg_mappings() (in module magine.networks.databases), 46
load_ncbi() (in module magine.mappings.databases), 60
load_reactome_fi() (in module magine.networks.databases), 45
load_signor() (in module magine.networks.databases), 46
load_uniprot() (in module magine.mappings.databases), 60
log2_normalize_df() (magine.data.base.BaseData method), 39
M
magine.html_templates.html_tools (module), 57
magine.mappings (module), 58
magine.mappings.databases (module), 60
magine.networks.databases (module), 45
magine.networks.network_generator (module), 44
magine.plotting.heatmaps (module), 52
magine.plotting.species_plotting (module), 54
magine.plotting.venn_diagram_maker (module), 55
magine.plotting.volcano_plots (module), 56

N
ncbi_to_symbol (magine.mappings.GeneMapper attribute), 59

P
pivoter() (magine.data.base.BaseData method), 40
plot_all() (magine.data.experimental_data.Sample method), 41
plot_dataframe() (in module magine.plotting.species_plotting), 54
plot_genes_by_ont() (in module magine.plotting.species_plotting), 54
plot_histogram() (magine.data.experimental_data.Sample method), 41
plot_pie_sig_ratio() (magine.data.experimental_data.Sample method), 41
plot_species() (in module magine.plotting.species_plotting), 55
plot_species() (magine.data.experimental_data.Sample method), 41
present_in_all_columns() (magine.data.base.BaseData method), 40
print_info() (magine.mappings.ChemicalMapper method), 58
print_valid_libs() (magine.enrichment.enrichr.Enrichr method), 47
proteins (magine.data.experimental_data.ExperimentalData attribute), 43

R
remove_redundant() (magine.enrichment.enrichment_result.EnrichmentResult method), 51
require_n_sig() (magine.data.base.BaseData method), 40
rna (magine.data.experimental_data.ExperimentalData attribute), 43
run() (magine.enrichment.enrichr.Enrichr method), 47
run_enrichment_for_project() (in module magine.enrichment.enrichr), 48
run_samples() (magine.enrichment.enrichr.Enrichr method), 47

S
Sample (class in magine.data.experimental_data), 40
sample_ids (magine.data.experimental_data.ExperimentalData attribute), 43
sample_ids (magine.data.experimental_data.Sample attribute), 42
save_plot() (in module magine.plotting.volcano_plots), 57
show_terms_below() (magine.enrichment.enrichment_result.EnrichmentResult method), 51
sig (magine.data.base.BaseData attribute), 40
species (magine.data.experimental_data.ExperimentalData attribute), 43
subset() (magine.data.experimental_data.ExperimentalData method), 44
subset() (magine.data.experimental_data.Sample method), 42

T
term_to_genes() (magine.enrichment.enrichment_result.EnrichmentResult method), 51
term_to_genes_dict() (magine.enrichment.enrichment_result.EnrichmentResult method), 52

U
uniprot_to_gene_name (magine.mappings.GeneMapper attribute), 59
uniprot_to_kegg (magine.mappings.GeneMapper attribute), 59
unique_terms() (magine.enrichment.enrichment_result.EnrichmentResult method), 52
up (magine.data.experimental_data.Sample attribute), 42
up_by_sample (magine.data.experimental_data.Sample attribute), 42

V
valid_columns (magine.mappings.ChemicalMapper attribute), 59
volcano_analysis() (magine.data.experimental_data.ExperimentalData method), 44
volcano_by_sample() (magine.data.experimental_data.Sample method), 42
volcano_plot() (in module magine.plotting.volcano_plots), 57
volcano_plot() (magine.data.experimental_data.Sample method), 42

W
write_filter_table() (in module magine.html_templates.html_tools), 57
write_table_to_html() (in module magine.plotting.species_plotting), 55