Quick Start
This vignette demonstrates how to use the LAMP Python package. The input data and reference files are available at https://github.com/wanchanglin/lamp/tree/master/examples/data.
Setup
To use LAMP, first import the required Python libraries, including LAMP itself.
[1]:
import sqlite3
import pandas as pd
from lamp import anno, stats, utils
Data Loading
LAMP reads text files delimited by commas (,) or tabs (\t). Microsoft Excel XLSX files are also supported; use the argument sheet_name to select the sheet that holds the input data. The default is 0, the first sheet.
Here we use a small example data set in TSV format. Load it into Python and inspect it:
[2]:
d_data = "./data/df_pos_2.tsv"
data = pd.read_table(d_data, header=0, sep="\t")
data
[2]:
| | name | namecustom | mz | mzmin | mzmax | rt | rtmin | rtmax | npeaks | . | ... | X210 | X209 | X208 | X207 | X206 | X205 | X204 | X203 | X202 | X201 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M151T34 | M150.8867T34 | 150.886715 | 150.886592 | 150.886863 | 34.152700 | 33.637595 | 35.465548 | 97 | 97 | ... | 4.224942e+06 | 3.946599e+06 | 3.668948e+06 | 3.754321e+06 | 3.853724e+06 | 3.787350e+06 | 3.584464e+06 | 3.499711e+06 | 3.623205e+06 | 4.145770e+06 |
| 1 | M151T40 | M151.0402T40 | 151.040235 | 151.040092 | 151.040350 | 39.838172 | 37.556072 | 40.532315 | 95 | 95 | ... | 1.419062e+06 | 1.251606e+06 | 1.214826e+06 | 8.143028e+05 | 5.331963e+05 | 1.930928e+06 | 1.479001e+06 | 1.076354e+06 | 9.293218e+05 | 5.298062e+05 |
| 2 | M152T40 | M152.0436T40 | 152.043607 | 152.043451 | 152.043737 | 40.303700 | 38.092678 | 40.909428 | 81 | 81 | ... | 1.203919e+05 | 9.970442e+04 | 9.384000e+04 | 4.186335e+04 | NaN | 2.115447e+05 | 1.285713e+05 | 9.389346e+04 | 7.163655e+04 | 4.916483e+04 |
| 3 | M153T34 | M152.8838T34 | 152.883824 | 152.883678 | 152.883959 | 34.174647 | 33.637595 | 35.465548 | 98 | 98 | ... | 5.592065e+06 | 5.761380e+06 | 5.845419e+06 | 5.576013e+06 | 5.552878e+06 | 6.132789e+06 | 5.891378e+06 | 5.418082e+06 | 5.036840e+06 | 5.733794e+06 |
| 4 | M153T36 | M153.0195T36 | 153.019474 | 153.019331 | 153.019633 | 35.785847 | 34.130244 | 36.287354 | 98 | 98 | ... | 7.284938e+06 | 1.083289e+07 | 1.140072e+07 | 8.220552e+06 | 9.255154e+06 | 7.648211e+06 | 7.723814e+06 | 5.571163e+06 | 5.362560e+06 | 9.259675e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | M283T339 | M283.2646T339 | 283.264583 | 283.264341 | 283.264809 | 338.763489 | 338.398380 | 339.165948 | 94 | 94 | ... | 3.509767e+05 | 4.117633e+05 | 3.948000e+05 | 4.338804e+05 | 5.335221e+05 | 6.224684e+05 | 7.009340e+05 | 3.005173e+05 | 3.133173e+05 | 8.204783e+05 |
| 396 | M284T60 | M284.1953T60 | 284.195294 | 284.194939 | 284.195536 | 59.593561 | 58.844217 | 60.107058 | 59 | 59 | ... | NaN | NaN | NaN | NaN | NaN | 2.558004e+04 | 4.020517e+04 | NaN | 3.162670e+04 | 5.446684e+04 |
| 397 | M284T108 | M284.2235T108 | 284.223499 | 284.223156 | 284.223692 | 108.406389 | 107.880510 | 108.971046 | 72 | 72 | ... | 7.477652e+04 | 7.482219e+04 | 3.399667e+04 | 7.233564e+04 | 1.043879e+05 | 2.506785e+04 | 2.753769e+04 | NaN | NaN | NaN |
| 398 | M284T339 | M284.268T339 | 284.267962 | 284.267634 | 284.268204 | 338.725056 | 338.268300 | 339.370098 | 84 | 84 | ... | 3.697604e+04 | 5.398264e+04 | 5.340109e+04 | 6.557698e+04 | 7.656575e+04 | 1.040606e+05 | 1.063727e+05 | NaN | 3.059370e+04 | 1.358056e+05 |
| 399 | M285T34 | M284.775T34 | 284.775031 | 284.774635 | 284.775287 | 34.079641 | 33.667172 | 35.198181 | 97 | 97 | ... | 3.439330e+06 | 3.359842e+06 | 3.375577e+06 | 3.789056e+06 | 3.478506e+06 | 3.391588e+06 | 5.067802e+06 | 3.497546e+06 | 3.316025e+06 | 3.906000e+06 |
400 rows × 110 columns
This data set contains a peak list followed by an intensity data matrix. LAMP requires the feature name, m/z value and retention time from the peak list. The user must give the column positions (counting from 1) of the feature name, m/z value and retention time, and the position of the first column of the data matrix. Here they are 1, 3, 6 and 11, respectively.
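To make the 1-based positions concrete, the sketch below maps them onto the header of the example file. It is only an illustration in plain Python, not LAMP's own code; the `header` list is abbreviated to the first eleven column names of the example file as they appear in the tables of this vignette.

```python
# First eleven column names of the example peak table (abbreviated).
header = ["name", "namecustom", "mz", "mzmin", "mzmax",
          "rt", "rtmin", "rtmax", "npeaks", ".", "QC9"]

# 1-based positions: feature name, m/z, retention time, first intensity column.
cols = [1, 3, 6, 11]

# Python lists are 0-based, so position c corresponds to index c - 1.
name_col, mz_col, rt_col, first_data_col = (header[c - 1] for c in cols)
print(name_col, mz_col, rt_col, first_data_col)
```

Every column from `first_data_col` onwards is treated as part of the intensity matrix, so the columns between the retention time and the matrix (rtmin, rtmax, npeaks and so on) are dropped.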
Load the input data for LAMP, here from the XLSX file:
[3]:
cols = [1, 3, 6, 11]
# d_data = "./data/df_pos_2.tsv"
# df = anno.read_peak(d_data, cols, sep='\t')
d_data = "./data/df_pos_2.xlsx" # use xlsx file
df = anno.read_peak(d_data, cols, sheet_name=0)
df
[3]:
| | name | mz | rt | QC9 | QC5 | QC4 | QC3 | QC26 | QC25 | QC24 | ... | X210 | X209 | X208 | X207 | X206 | X205 | X204 | X203 | X202 | X201 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M151T34 | 150.886715 | 34.152700 | 3.664879e+06 | 3.735147e+06 | 5.190263e+06 | 2.742966e+06 | 3.824723e+06 | 3.722932e+06 | 3.804188e+06 | ... | 4.224942e+06 | 3.946599e+06 | 3.668948e+06 | 3.754321e+06 | 3.853724e+06 | 3.787350e+06 | 3.584464e+06 | 3.499711e+06 | 3.623205e+06 | 4.145770e+06 |
| 1 | M151T40 | 151.040235 | 39.838172 | 7.406381e+05 | 7.524075e+05 | NaN | 6.429245e+05 | 1.167016e+06 | 1.175981e+06 | 1.122533e+06 | ... | 1.419062e+06 | 1.251606e+06 | 1.214826e+06 | 8.143028e+05 | 5.331963e+05 | 1.930928e+06 | 1.479001e+06 | 1.076354e+06 | 9.293218e+05 | 5.298062e+05 |
| 2 | M152T40 | 152.043607 | 40.303700 | 6.105241e+04 | 5.335546e+04 | NaN | NaN | 6.875157e+04 | 7.807399e+04 | 8.943068e+04 | ... | 1.203919e+05 | 9.970442e+04 | 9.384000e+04 | 4.186335e+04 | NaN | 2.115447e+05 | 1.285713e+05 | 9.389346e+04 | 7.163655e+04 | 4.916483e+04 |
| 3 | M153T34 | 152.883824 | 34.174647 | 5.141479e+06 | 5.496344e+06 | 8.335846e+06 | 3.860588e+06 | 5.316874e+06 | 5.988232e+06 | 5.844917e+06 | ... | 5.592065e+06 | 5.761380e+06 | 5.845419e+06 | 5.576013e+06 | 5.552878e+06 | 6.132789e+06 | 5.891378e+06 | 5.418082e+06 | 5.036840e+06 | 5.733794e+06 |
| 4 | M153T36 | 153.019474 | 35.785847 | 5.336758e+06 | 5.558265e+06 | 1.118557e+07 | 6.876715e+06 | 9.967314e+06 | 9.073822e+06 | 9.328573e+06 | ... | 7.284938e+06 | 1.083289e+07 | 1.140072e+07 | 8.220552e+06 | 9.255154e+06 | 7.648211e+06 | 7.723814e+06 | 5.571163e+06 | 5.362560e+06 | 9.259675e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | M283T339 | 283.264583 | 338.763489 | 7.330602e+05 | 8.243956e+05 | NaN | 1.159506e+06 | 4.294760e+05 | 4.641813e+05 | 4.570657e+05 | ... | 3.509767e+05 | 4.117633e+05 | 3.948000e+05 | 4.338804e+05 | 5.335221e+05 | 6.224684e+05 | 7.009340e+05 | 3.005173e+05 | 3.133173e+05 | 8.204783e+05 |
| 396 | M284T60 | 284.195294 | 59.593561 | 2.310932e+04 | NaN | NaN | NaN | 1.759336e+04 | 2.645392e+04 | 2.727266e+04 | ... | NaN | NaN | NaN | NaN | NaN | 2.558004e+04 | 4.020517e+04 | NaN | 3.162670e+04 | 5.446684e+04 |
| 397 | M284T108 | 284.223499 | 108.406389 | 3.748444e+04 | 2.993283e+04 | NaN | NaN | 3.175596e+04 | 3.879604e+04 | 4.299529e+04 | ... | 7.477652e+04 | 7.482219e+04 | 3.399667e+04 | 7.233564e+04 | 1.043879e+05 | 2.506785e+04 | 2.753769e+04 | NaN | NaN | NaN |
| 398 | M284T339 | 284.267962 | 338.725056 | 1.161886e+05 | 1.476514e+05 | NaN | NaN | NaN | 6.753490e+04 | 5.436219e+04 | ... | 3.697604e+04 | 5.398264e+04 | 5.340109e+04 | 6.557698e+04 | 7.656575e+04 | 1.040606e+05 | 1.063727e+05 | NaN | 3.059370e+04 | 1.358056e+05 |
| 399 | M285T34 | 284.775031 | 34.079641 | 4.063268e+06 | 3.807148e+06 | 4.645099e+06 | 2.232221e+06 | 4.576754e+06 | 4.533339e+06 | 4.559356e+06 | ... | 3.439330e+06 | 3.359842e+06 | 3.375577e+06 | 3.789056e+06 | 3.478506e+06 | 3.391588e+06 | 5.067802e+06 | 3.497546e+06 | 3.316025e+06 | 3.906000e+06 |
400 rows × 103 columns
The argument sep is ignored if the input file is XLSX. The data frame df now contains only name, mz, rt and the intensity data matrix.
Metabolite Annotation
To perform metabolite annotation, users can provide their own reference file; otherwise LAMP uses its default reference file. Here we load the default reference file for compound annotation. Since the input data is in positive mode, we use only the positive part of the reference file. If ion_mode is empty, all reference entries are used for matching.
[4]:
ion_mode = "pos"
ref_path = "" # if empty, use default reference file for matching
# load reference library
cal_mass = False
ref = anno.read_ref(ref_path, ion_mode=ion_mode, calc=cal_mass)
ref
[4]:
| | compound_name | molecular_formula | monoisotopic_mass | exact_mass | ion_type | ion_mode | smiles | inchikey | inchi | kegg_id | hmdb_id | chebi_id | pubchem_id | lipidmaps_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34230 | (-)-Salsoline | C11H15NO2 | 193.110265 | 232.073425 | [M+39K]+ | positive | COc1cc2c(cc1O)CCN[C@H]2C | YTPRLBGPGZHUPD-ZETCQYMHSA-N | InChI=1S/C11H15NO2/c1-7-9-6-11(14-2)10(13)5-8(... | C09640 | -X- | CHEBI:112 | 442356 | -X- |
| 34231 | (-)-trans-carveol | C10H16O | 152.120110 | 191.083270 | [M+39K]+ | positive | C=C(C)[C@@H]1CC=C(C)[C@@H](O)C1 | BAVONGHXFVOKBV-ZJUUUORDSA-N | InChI=1S/C10H16O/c1-7(2)9-5-4-8(3)10(11)6-9/h4... | C00964 | -X- | CHEBI:15389 | -X- | -X- |
| 34232 | (-)-ureidoglycolic acid | C3H6N2O4 | 134.032730 | 172.995890 | [M+39K]+ | positive | NC(=O)N[C@@H](O)C(=O)O | NWZYYCVIOKVTII-SFOWXEAESA-N | InChI=1S/C3H6N2O4/c4-3(9)5-1(6)2(7)8/h1,6H,(H,... | C00603 | HMDB0001005 | CHEBI:15412 | 439269 | -X- |
| 34233 | (11R)-11-hydroperoxylinoleic acid | C18H32O4 | 312.230040 | 351.193200 | [M+39K]+ | positive | CCCCCC=CC(C=CCCCCCCCC(=O)O)OO | PLWDMWAXENHPLY-UHFFFAOYSA-N | -X- | -X- | -X- | CHEBI:134247 | 5230520 | -X- |
| 34234 | (11Z,14Z)-eicosadienoylcarnitine | C27H49NO4 | 451.366135 | 490.329295 | [M+39K]+ | positive | CCCCC/C=C\C/C=C\CCCCCCCCCC(=O)OC(CC(=O)[O-])C[... | OLZWDVKTOGTVLC-UTJQPWESSA-N | InChI=1S/C27H49NO4/c1-5-6-7-8-9-10-11-12-13-14... | -X- | -X- | CHEBI:73119 | -X- | -X- |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 83155 | N(6),N(6),N(6)-trimethyl-L-lysine | C9H21N2O2+ | 189.160301 | 189.159751 | M+ | positive | C[N+](C)(C)CCCC[C@H](N)C(=O)O | MXNRLFUSFKVQSK-QMMMGPOBSA-O | InChI=1S/C9H20N2O2/c1-11(2,3)7-5-4-6-8(10)9(12... | C03793 | HMDB0001325 | CHEBI:17311 | 440120 | -X- |
| 83156 | nicotinic acid D-ribonucleotide | C11H15NO9P+ | 336.048436 | 336.047886 | M+ | positive | O=C(O)c1ccc[n+]([C@@H]2O[C@H](COP(=O)(O)O)[C@@... | JOUIQRNQJGXQDC-ZYUZMQFOSA-O | InChI=1S/C11H14NO9P/c13-8-7(5-20-22(17,18)19)2... | C01185 | -X- | CHEBI:15763 | 53477721 | -X- |
| 83157 | phosphocholine | C5H15NO4P+ | 184.073866 | 184.073316 | M+ | positive | C[N+](C)(C)CCOP(=O)(O)O | YHHSONZFOIEMCP-UHFFFAOYSA-O | InChI=1S/C5H14NO4P/c1-6(2,3)4-5-10-11(7,8)9/h4... | C00588 | HMDB0001565 | CHEBI:18132 | 1014 | -X- |
| 83158 | S-adenosyl-L-methionine | C15H23N6O5S+ | 399.145060 | 399.144510 | M+ | positive | C[S+](CC[C@H](N)C(=O)O)C[C@H]1O[C@@H](n2cnc3c(... | MEFKEPWMEQBLKI-AIRLBKTGSA-O | InChI=1S/C15H22N6O5S/c1-27(3-2-7(16)15(24)25)4... | C00019 | HMDB0001185 | CHEBI:15414 | 16757548 | -X- |
| 83159 | S-adenosylmethioninamine | C14H23N6O3S+ | 355.155232 | 355.154682 | M+ | positive | C[S+](CCCN)C[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@... | ZUNBITIXDCPNSD-LSRJEVITSA-N | InChI=1S/C14H23N6O3S/c1-24(4-2-3-15)5-8-10(21)... | C01137 | HMDB0000988 | CHEBI:15625 | 439415 | -X- |
39150 rows × 14 columns
The reference file must have a molecular_formula (or formula) column if it has no column called ion m/z (or m/z, or exact_mass). The exact_mass column is optional: if it is absent, LAMP calculates exact_mass from molecular_formula using the NIST Atomic Weights and Isotopic Compositions for All Elements. If your reference file has exact_mass but you still want to recalculate it from the NIST data, set calc to True. The exact_mass is matched against a range of mz values in the data frame df; the width of that range is controlled by ppm.
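As a rough illustration of what calculating exact_mass from a formula involves, the sketch below sums monoisotopic masses for a simple formula. `monoisotopic_mass` and the `MONOISOTOPIC` table are hypothetical names for this example, not LAMP functions, and the parser assumes plain formulas without charges, brackets or isotope labels.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, from the NIST
# tables; only the few elements needed for this example are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C7H11N3O2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


print(round(monoisotopic_mass("C7H11N3O2"), 6))  # 1-Methylhistidine
```

For C7H11N3O2 the result agrees, to six decimal places, with the exact_mass listed for 1-Methylhistidine in the HMDB urine reference shown in this vignette.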
Like the input data, the reference file can also be an XLSX file. Another reference file is the HMDB database for urine:
[5]:
ref_path = "./data/hmdb_urine_v4_0_20200910_v1.tsv"
ref = anno.read_ref(ref_path, calc=True)
ref
[5]:
| | id | molecular_formula | compound_name | inchi | inchi_key | exact_mass |
|---|---|---|---|---|---|---|
| 0 | HMDB0000001 | C7H11N3O2 | 1-Methylhistidine | InChI=1S/C7H11N3O2/c1-10-3-5(9-4-10)2-6(8)7(11... | BRMWTNUJHUMWMS-LURJTMIESA-N | 169.085127 |
| 1 | HMDB0000002 | C3H10N2 | 1,3-Diaminopropane | InChI=1S/C3H10N2/c4-2-1-3-5/h1-5H2 | XFNJVJPLKCPIBV-UHFFFAOYSA-N | 74.084398 |
| 2 | HMDB0000005 | C4H6O3 | 2-Ketobutyric acid | InChI=1S/C4H6O3/c1-2-3(5)4(6)7/h2H2,1H3,(H,6,7) | TYEYBOSBBBHJIV-UHFFFAOYSA-N | 102.031694 |
| 3 | HMDB0000008 | C4H8O3 | 2-Hydroxybutyric acid | InChI=1S/C4H8O3/c1-2-3(5)4(6)7/h3,5H,2H2,1H3,(... | AFENDNXGAFYKQO-VKHMYHEASA-N | 104.047344 |
| 4 | HMDB0000010 | C19H24O3 | 2-Methoxyestrone | InChI=1S/C19H24O3/c1-19-8-7-12-13(15(19)5-6-18... | WHEUWNKSCXYKBU-QPWUGHHJSA-N | 300.172545 |
| ... | ... | ... | ... | ... | ... | ... |
| 1606 | HMDB0012308 | C8H8O3 | Vanillin | InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-... | MWOOGOJBHIARFG-UHFFFAOYSA-N | 152.047344 |
| 1607 | HMDB0012322 | C10H8O | 2-Naphthol | InChI=1S/C10H8O/c11-10-6-5-8-3-1-2-4-9(8)7-10/... | JWAZRIHNYRIHIV-UHFFFAOYSA-N | 144.057515 |
| 1608 | HMDB0012325 | C5H10O5 | Arabinofuranose | InChI=1S/C5H10O5/c6-1-2-3(7)4(8)5(9)10-2/h2-9H... | HMFHBZSHGGEWLO-HWQSCIPKSA-N | 150.052823 |
| 1609 | HMDB0012451 | C20H28O3 | all-trans-5,6-Epoxyretinoic acid | InChI=1S/C20H28O3/c1-15(8-6-9-16(2)14-17(21)22... | KEEHJLBAOLGBJZ-WEDZBJJJSA-N | 316.203845 |
| 1610 | HMDB0012467 | C15H13O9S | (-)-Epicatechin sulfate | InChI=1S/C15H14O9S/c16-9-3-8-5-13(24-25(20,21)... | WTXWEAXATVSZQX-AFYYWNPRSA-M | 369.028028 |
1611 rows × 6 columns
Next we match compounds against the HMDB reference file. The function argument ppm controls the m/z matching tolerance, in parts per million.
[6]:
ppm = 5.0
match = anno.comp_match_mass(df, ppm, ref)
match
[6]:
| | id | mz | molecular_formula | compound_name | inchi | inchi_key | exact_mass | ppm_error |
|---|---|---|---|---|---|---|---|---|
| 0 | M154T37 | 154.062402 | C8H10O3 | Hydroxytyrosol | InChI=1S/C8H10O3/c9-4-3-6-1-2-7(10)8(11)5-6/h1... | JUUBCHWRXWPFFH-UHFFFAOYSA-N | 154.06 | -3.84 |
| 1 | M164T119 | 164.046774 | C9H8O3 | Phenylpyruvic acid | InChI=1S/C9H8O3/c10-8(9(11)12)6-7-4-2-1-3-5-7/... | BTNMPGBKDVTSJY-UHFFFAOYSA-N | 164.05 | -3.47 |
| 2 | M164T119 | 164.046774 | C9H8O3 | m-Coumaric acid | InChI=1S/C9H8O3/c10-8-3-1-2-7(6-8)4-5-9(11)12/... | KKSDGJDHHZEWEP-SNAWJCMRSA-N | 164.05 | -3.47 |
| 3 | M164T119 | 164.046774 | C9H8O3 | 4-Hydroxycinnamic acid | InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/... | NGSWKAQJJWESNS-ZZXKWVIFSA-N | 164.05 | -3.47 |
| 4 | M164T119 | 164.046774 | C9H8O3 | 2-Hydroxycinnamic acid | InChI=1S/C9H8O3/c10-8-4-2-1-3-7(8)5-6-9(11)12/... | PMOWTIHVNWZYFI-AATRIKPKSA-N | 164.05 | -3.47 |
| 5 | M164T233 | 164.046832 | C9H8O3 | Phenylpyruvic acid | InChI=1S/C9H8O3/c10-8(9(11)12)6-7-4-2-1-3-5-7/... | BTNMPGBKDVTSJY-UHFFFAOYSA-N | 164.05 | -3.12 |
| 6 | M164T233 | 164.046832 | C9H8O3 | m-Coumaric acid | InChI=1S/C9H8O3/c10-8-3-1-2-7(6-8)4-5-9(11)12/... | KKSDGJDHHZEWEP-SNAWJCMRSA-N | 164.05 | -3.12 |
| 7 | M164T233 | 164.046832 | C9H8O3 | 4-Hydroxycinnamic acid | InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/... | NGSWKAQJJWESNS-ZZXKWVIFSA-N | 164.05 | -3.12 |
| 8 | M164T233 | 164.046832 | C9H8O3 | 2-Hydroxycinnamic acid | InChI=1S/C9H8O3/c10-8-4-2-1-3-7(8)5-6-9(11)12/... | PMOWTIHVNWZYFI-AATRIKPKSA-N | 164.05 | -3.12 |
| 9 | M164T53 | 164.046825 | C9H8O3 | Phenylpyruvic acid | InChI=1S/C9H8O3/c10-8(9(11)12)6-7-4-2-1-3-5-7/... | BTNMPGBKDVTSJY-UHFFFAOYSA-N | 164.05 | -3.16 |
| 10 | M164T53 | 164.046825 | C9H8O3 | m-Coumaric acid | InChI=1S/C9H8O3/c10-8-3-1-2-7(6-8)4-5-9(11)12/... | KKSDGJDHHZEWEP-SNAWJCMRSA-N | 164.05 | -3.16 |
| 11 | M164T53 | 164.046825 | C9H8O3 | 4-Hydroxycinnamic acid | InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/... | NGSWKAQJJWESNS-ZZXKWVIFSA-N | 164.05 | -3.16 |
| 12 | M164T53 | 164.046825 | C9H8O3 | 2-Hydroxycinnamic acid | InChI=1S/C9H8O3/c10-8-4-2-1-3-7(8)5-6-9(11)12/... | PMOWTIHVNWZYFI-AATRIKPKSA-N | 164.05 | -3.16 |
| 13 | M167T35 | 167.021095 | C7H5NO4 | Quinolinic acid | InChI=1S/C7H5NO4/c9-6(10)4-2-1-3-8-5(4)7(11)12... | GJAWHXHKYYXBSV-UHFFFAOYSA-N | 167.02 | -4.56 |
| 14 | M173T36_3 | 173.104423 | C8H15NO3 | Hexanoylglycine | InChI=1S/C8H15NO3/c1-2-3-4-5-7(10)9-6-8(11)12/... | UPCKIPHSXMXJOX-UHFFFAOYSA-N | 173.11 | -4.45 |
| 15 | M174T35 | 174.088395 | C8H14O4 | Suberic acid | InChI=1S/C8H14O4/c9-7(10)5-3-1-2-4-6-8(11)12/h... | TYFQFVWCELRYAO-UHFFFAOYSA-N | 174.09 | -4.67 |
| 16 | M181T36 | 181.060407 | C6H7N5O2 | 8-Hydroxy-7-methylguanine | InChI=1S/C6H7N5O2/c1-11-2-3(9-6(11)13)8-5(7)10... | VHPXSVXJBWZORQ-UHFFFAOYSA-N | 181.06 | 2.39 |
| 17 | M212T39 | 212.067866 | C10H12O5 | Vanillactic acid | InChI=1S/C10H12O5/c1-15-9-5-6(2-3-7(9)11)4-8(1... | SVYIZYRTOYHQRE-UHFFFAOYSA-N | 212.07 | -2.87 |
| 18 | M276T36 | 276.077397 | C10H16N2O5S | Biotin sulfone | InChI=1S/C10H16N2O5S/c13-8(14)4-2-1-3-7-9-6(5-... | QPFQYMONYBAUCY-ZKWXMUAHSA-N | 276.08 | -2.16 |
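The ppm_error column can be read as the relative difference between the observed m/z and the reference mass, in parts per million. The sketch below uses that common definition. The helper names are hypothetical, and the sign convention is inferred from the table above rather than taken from LAMP's source.

```python
def ppm_error(mz, exact_mass):
    """Relative deviation of an observed m/z from a reference mass, in ppm."""
    return (mz - exact_mass) / exact_mass * 1e6


def within_ppm(mz, exact_mass, ppm):
    """True if the observed m/z lies inside the +/- ppm window."""
    return abs(ppm_error(mz, exact_mass)) <= ppm


# Peak M154T37 against Hydroxytyrosol (C8H10O3); 154.062994 is the
# monoisotopic mass computed from the formula.
err = ppm_error(154.062402, 154.062994)
print(round(err, 2), within_ppm(154.062402, 154.062994, ppm=5.0))
```

With ppm = 5.0 a reference mass near 154.063 accepts observed values within roughly 0.0008 of it, which is why only a handful of the 400 peaks find a match.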
match holds the compound matching results. LAMP also provides an option to adjust masses with an adduct library. You can provide your own adduct library; otherwise LAMP uses its default one.
The adduct library looks like this:
[7]:
add_path = './data/adducts_short.tsv'
lib_df = pd.read_csv(add_path, sep="\t")
lib_df
[7]:
| | label | exact_mass | charge | ion_mode |
|---|---|---|---|---|
| 0 | [M+H]+ | 1.007276 | 1 | pos |
| 1 | [M+NH4]+ | 18.033826 | 1 | pos |
| 2 | [M+Na]+ | 22.989221 | 1 | pos |
| 3 | [M+Mg]+ | 23.984493 | 1 | pos |
| 4 | [M+K]+ | 38.963158 | 1 | pos |
| 5 | [M+Fe]+ | 55.934388 | 1 | pos |
| 6 | [M+Cu]+ | 62.929049 | 1 | pos |
| 7 | [M+2H]+ | 2.015101 | 1 | pos |
| 8 | [M+3H]+ | 3.022926 | 1 | pos |
| 9 | [M-H]- | -1.007276 | 1 | neg |
| 10 | [M+35Cl]- | 34.969401 | 1 | neg |
| 11 | [M+Formate]- | 44.998203 | 1 | neg |
| 12 | [M+Acetate]- | 59.013853 | 1 | neg |
The adduct library must have the columns label, exact_mass, charge and ion_mode.
We use this adduct file to adjust the masses:
[8]:
# if empty, use default adducts library
add_path = "./data/adducts_short.tsv"
lib_add = anno.read_lib(add_path, ion_mode)
lib_add
[8]:
| | label | exact_mass | charge |
|---|---|---|---|
| 0 | [M+H]+ | 1.007276 | 1 |
| 1 | [M+NH4]+ | 18.033826 | 1 |
| 2 | [M+Na]+ | 22.989221 | 1 |
| 3 | [M+Mg]+ | 23.984493 | 1 |
| 4 | [M+K]+ | 38.963158 | 1 |
| 5 | [M+Fe]+ | 55.934388 | 1 |
| 6 | [M+Cu]+ | 62.929049 | 1 |
| 7 | [M+2H]+ | 2.015101 | 1 |
| 8 | [M+3H]+ | 3.022926 | 1 |
Now use function comp_match_mass_add to match compounds:
[9]:
match_1 = anno.comp_match_mass_add(df, ppm, ref, lib_add)
match_1
[9]:
| | id | mz | molecular_formula | compound_name | inchi | inchi_key | exact_mass | adduct | ppm_error |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M152T40 | 152.043607 | C5H8N2O2 | Dihydrothymine | InChI=1S/C5H8N2O2/c1-3-2-6-5(9)7-4(3)8/h3H,2H2... | NBAKTGXDIBVZOO-VKHMYHEASA-N | 152.04 | [M+Mg]+ | 3.52 |
| 1 | M154T37 | 154.062402 | C8H8O3 | p-Hydroxyphenylacetic acid | InChI=1S/C8H8O3/c9-7-3-1-6(2-4-7)5-8(10)11/h1-... | XQXPVVBIMDBYFF-UHFFFAOYSA-N | 154.06 | [M+2H]+ | -0.28 |
| 2 | M154T37 | 154.062402 | C8H8O3 | 3-Hydroxyphenylacetic acid | InChI=1S/C8H8O3/c9-7-3-1-2-6(4-7)5-8(10)11/h1-... | FVMDYYGIDFPZAX-UHFFFAOYSA-N | 154.06 | [M+2H]+ | -0.28 |
| 3 | M154T37 | 154.062402 | C8H8O3 | ortho-Hydroxyphenylacetic acid | InChI=1S/C8H8O3/c9-7-4-2-1-3-6(7)5-8(10)11/h1-... | CCVYRRGZDBSHFU-UHFFFAOYSA-N | 154.06 | [M+2H]+ | -0.28 |
| 4 | M154T37 | 154.062402 | C8H8O3 | Mandelic acid | InChI=1S/C8H8O3/c9-7(8(10)11)6-4-2-1-3-5-6/h1-... | IWYDHOAUDWTVEP-ZETCQYMHSA-N | 154.06 | [M+2H]+ | -0.28 |
| 5 | M154T37 | 154.062402 | C8H8O3 | 3-Cresotinic acid | InChI=1S/C8H8O3/c1-5-3-2-4-6(7(5)9)8(10)11/h2-... | WHSXTWFYRGOBGO-UHFFFAOYSA-N | 154.06 | [M+2H]+ | -0.28 |
| 6 | M154T37 | 154.062402 | C8H8O3 | 4-Hydroxy-3-methylbenzoic acid | InChI=1S/C8H8O3/c1-5-4-6(8(10)11)2-3-7(5)9/h2-... | LTFHNKUKQYVHDX-UHFFFAOYSA-N | 154.06 | [M+2H]+ | -0.28 |
| 7 | M154T37 | 154.062402 | C8H8O3 | Vanillin | InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-... | MWOOGOJBHIARFG-UHFFFAOYSA-N | 154.06 | [M+2H]+ | -0.28 |
| 8 | M157T35 | 157.036819 | C4H10N2O2 | 2,4-Diaminobutyric acid | InChI=1S/C4H10N2O2/c5-2-1-3(6)4(7)8/h3H,1-2,5-... | OGNSCSPNOLGXSM-UHFFFAOYSA-N | 157.04 | [M+K]+ | -3.61 |
| 9 | M157T35 | 157.036819 | C4H10N2O2 | L-2,4-diaminobutyric acid | InChI=1S/C4H10N2O2/c5-2-1-3(6)4(7)8/h3H,1-2,5-... | OGNSCSPNOLGXSM-VKHMYHEASA-N | 157.04 | [M+K]+ | -3.61 |
| 10 | M167T35 | 167.021095 | C5H8N2O2 | Dihydrothymine | InChI=1S/C5H8N2O2/c1-3-2-6-5(9)7-4(3)8/h3H,2H2... | NBAKTGXDIBVZOO-VKHMYHEASA-N | 167.02 | [M+K]+ | -3.83 |
| 11 | M174T35 | 174.088395 | C9H13NO | Phenylpropanolamine | InChI=1S/C9H13NO/c1-7(10)9(11)8-5-3-2-4-6-8/h2... | DLNKOYKMWOXYQA-VXNVDRBHSA-N | 174.09 | [M+Na]+ | -3.10 |
| 12 | M174T35 | 174.088395 | C10H14O | Thymol | InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)6-10(9)11/h4... | MGSRCZKZVOBKFT-UHFFFAOYSA-N | 174.09 | [M+Mg]+ | -3.23 |
| 13 | M174T35 | 174.088395 | C10H14O | (S)-Carvone | InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4... | ULDHMXUKGWMISQ-VIFPVBQESA-N | 174.09 | [M+Mg]+ | -3.23 |
| 14 | M174T35 | 174.088395 | C8H12O4 | 2-Octenedioic acid | InChI=1S/C8H12O4/c9-7(10)5-3-1-2-4-6-8(11)12/h... | BNTPVRGYUHJFHN-HWKANZROSA-N | 174.09 | [M+2H]+ | -1.52 |
| 15 | M174T35 | 174.088395 | C8H12O4 | cis-4-Octenedioic acid | InChI=1S/C8H12O4/c9-7(10)5-3-1-2-4-6-8(11)12/h... | LQVYKEXVMZXOAH-UPHRSURJSA-N | 174.09 | [M+2H]+ | -1.52 |
| 16 | M181T36 | 181.060407 | C8H8N2O3 | Nicotinuric acid | InChI=1S/C8H8N2O3/c11-7(12)5-10-8(13)6-2-1-3-9... | ZBSGKPYXQINNGF-UHFFFAOYSA-N | 181.06 | [M+H]+ | -2.00 |
| 17 | M184T38 | 184.097942 | C10H13N2 | Nicotine imine | InChI=1S/C10H13N2/c1-12-7-3-5-10(12)9-4-2-6-11... | GTQXYYYOJZZJHL-UHFFFAOYSA-N | 184.10 | [M+Na]+ | 4.60 |
| 18 | M185T39_2 | 185.082034 | C5H15NO4P | Phosphorylcholine | InChI=1S/C5H14NO4P/c1-6(2,3)4-5-10-11(7,8)9/h4... | YHHSONZFOIEMCP-UHFFFAOYSA-O | 185.08 | [M+H]+ | 4.80 |
| 19 | M186T36 | 186.045606 | C6H14N2O | N-Acetylputrescine | InChI=1S/C6H14N2O/c1-6(9)8-5-3-2-4-7/h2-5,7H2,... | KLZGKIDSEJWEDW-UHFFFAOYSA-N | 186.05 | [M+Fe]+ | 3.25 |
| 20 | M187T38 | 187.097642 | C5H15NO4P | Phosphorylcholine | InChI=1S/C5H14NO4P/c1-6(2,3)4-5-10-11(7,8)9/h4... | YHHSONZFOIEMCP-UHFFFAOYSA-O | 187.10 | [M+3H]+ | 4.52 |
| 21 | M193T40 | 193.050761 | C5H14N4 | Agmatine | InChI=1S/C5H14N4/c6-3-1-2-4-9-5(7)8/h1-4,6H2,(... | QYPPJABKJHAVHS-UHFFFAOYSA-N | 193.05 | [M+Cu]+ | -0.70 |
| 22 | M200T36 | 200.061328 | C7H16N2O | N-Acetylcadaverine | InChI=1S/C7H16N2O/c1-7(10)9-6-4-2-3-5-8/h2-6,8... | RMOIHHAKNOFHOE-UHFFFAOYSA-N | 200.06 | [M+Fe]+ | 3.39 |
| 23 | M201T39_1 | 201.051849 | C10H10O3 | 4-Methoxycinnamic acid | InChI=1S/C10H10O3/c1-13-9-5-2-8(3-6-9)4-7-10(1... | AFDXODALSZRGIH-QPJJXVBHSA-N | 201.05 | [M+Na]+ | -1.82 |
| 24 | M203T36_1 | 203.002108 | C9H9NO | Indole-3-carbinol | InChI=1S/C9H9NO/c11-6-7-5-10-9-4-2-1-3-8(7)9/h... | IVYPNXXAYMYVSP-UHFFFAOYSA-N | 203.00 | [M+Fe]+ | -3.42 |
| 25 | M212T39 | 212.067866 | C8H15NO3 | Hexanoylglycine | InChI=1S/C8H15NO3/c1-2-3-4-5-7(10)9-6-8(11)12/... | UPCKIPHSXMXJOX-UHFFFAOYSA-N | 212.07 | [M+K]+ | -2.29 |
| 26 | M212T39 | 212.067866 | C10H10O5 | Vanilpyruvic acid | InChI=1S/C10H10O5/c1-15-9-5-6(2-3-7(9)11)4-8(1... | YGQHQTMRZPHIBB-UHFFFAOYSA-N | 212.07 | [M+2H]+ | -0.28 |
| 27 | M217T37_1 | 217.018279 | C10H11NO | Tryptophol | InChI=1S/C10H11NO/c12-6-5-8-7-11-10-4-2-1-3-9(... | MBBOMCVGYCRMEA-UHFFFAOYSA-N | 217.02 | [M+Fe]+ | -0.79 |
| 28 | M221T37 | 221.012328 | C9H11NO2 | L-Phenylalanine | InChI=1S/C9H11NO2/c10-8(9(11)12)6-7-4-2-1-3-5-... | COLNVLDHVKWLRT-QMMMGPOBSA-N | 221.01 | [M+Fe]+ | -4.70 |
| 29 | M223T38 | 223.008162 | C4H10NO6P | O-Phosphothreonine | InChI=1S/C4H10NO6P/c1-2(3(5)4(6)7)11-12(8,9)10... | USRGIUJOYOXOQJ-GBXIJSLDSA-N | 223.01 | [M+Mg]+ | -4.06 |
| 30 | M223T40 | 223.096863 | C12H14O4 | Monoisobutyl phthalic acid | InChI=1S/C12H14O4/c1-8(2)7-16-12(15)10-6-4-3-5... | RZJSUWQGFCHNFS-UHFFFAOYSA-N | 223.10 | [M+H]+ | 1.69 |
| 31 | M226T44 | 226.128007 | C8H18N4O2 | Asymmetric dimethylarginine | InChI=1S/C8H18N4O2/c1-12(2)8(10)11-5-3-4-6(9)7... | YDGMGEXADBMOMJ-LURJTMIESA-N | 226.13 | [M+Mg]+ | 2.38 |
| 32 | M226T44 | 226.128007 | C8H18N4O2 | Symmetric dimethylarginine | InChI=1S/C8H18N4O2/c1-10-8(11-2)12-5-3-4-6(9)7... | HVPFXCBJHIIJGS-LURJTMIESA-N | 226.13 | [M+Mg]+ | 2.38 |
| 33 | M227T36 | 227.066175 | C9H10N2O5 | 3-Nitrotyrosine | InChI=1S/C9H10N2O5/c10-6(9(13)14)3-5-1-2-8(12)... | FBTSQILOGYXGMD-LURJTMIESA-N | 227.07 | [M+H]+ | -0.32 |
| 34 | M229T38 | 229.069418 | C4H10N3O5P | Phosphocreatine | InChI=1S/C4H10N3O5P/c1-7(2-3(8)9)4(5)6-13(10,1... | DRBBFCLWYRJSJZ-UHFFFAOYSA-N | 229.07 | [M+NH4]+ | -0.94 |
| 35 | M233T38 | 233.043479 | C8H10N4O2 | Caffeine | InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)... | RYYVLZVUVIJVGH-UHFFFAOYSA-N | 233.04 | [M+K]+ | -0.23 |
| 36 | M245T44 | 245.045772 | C7H15N3O3 | Homocitrulline | InChI=1S/C7H15N3O3/c8-5(6(11)12)3-1-2-4-10-7(9... | XIGSAGMEBXLVJJ-YFKPBYRVSA-N | 245.05 | [M+Fe]+ | 0.17 |
| 37 | M245T37_2 | 245.093315 | C13H18O2 | Ibuprofen | InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10... | HEFNNWSXXWATRW-UHFFFAOYSA-N | 245.09 | [M+K]+ | -2.13 |
| 38 | M249T38 | 249.038309 | C8H10N4O3 | 1,3,7-Trimethyluric acid | InChI=1S/C8H10N4O3/c1-10-4-5(9-7(10)14)11(2)8(... | BYXCFUMGEBZDDI-UHFFFAOYSA-N | 249.04 | [M+K]+ | -0.56 |
| 39 | M261T43 | 260.972975 | C10H7NO4 | Xanthurenic acid | InChI=1S/C10H7NO4/c12-7-3-1-2-5-8(13)4-6(10(14... | FBZONXHGGPHHIY-UHFFFAOYSA-N | 260.97 | [M+Fe]+ | 4.14 |
| 40 | M269T37_2 | 269.088048 | C10H12N4O5 | Inosine | InChI=1S/C10H12N4O5/c15-1-4-6(16)7(17)10(19-4)... | UGQMRVRMYYASKQ-KQYNXXCUSA-N | 269.09 | [M+H]+ | 0.01 |
| 41 | M275T168 | 275.201932 | C18H24O2 | Estradiol | InChI=1S/C18H24O2/c1-18-9-8-14-13-5-3-12(19)10... | VOXZDWNPVJITMN-ZBRFXRBCSA-N | 275.20 | [M+3H]+ | 5.00 |
| 42 | M275T168 | 275.201932 | C18H24O2 | 17a-Estradiol | InChI=1S/C18H24O2/c1-18-9-8-14-13-5-3-12(19)10... | VOXZDWNPVJITMN-SFFUCWETSA-N | 275.20 | [M+3H]+ | 5.00 |
| 43 | M277T181 | 277.217564 | C18H28O2 | 19-Norandrosterone | InChI=1S/C18H28O2/c1-18-9-8-14-13-5-3-12(19)10... | UOUIARGWRPHDBX-CQZDKXCPSA-N | 277.22 | [M+H]+ | 4.90 |
| 44 | M277T181 | 277.217564 | C18H28O2 | 19-Noretiocholanolone | InChI=1S/C18H28O2/c1-18-9-8-14-13-5-3-12(19)10... | UOUIARGWRPHDBX-DHMVHTBWSA-N | 277.22 | [M+H]+ | 4.90 |
| 45 | M278T71 | 278.148195 | C11H20N2O6 | Saccharopine | InChI=1S/C11H20N2O6/c12-7(10(16)17)3-1-2-6-13-... | ZDGJAHTZVHVLOT-YUMQZZPRSA-N | 278.15 | [M+2H]+ | 3.44 |
| 46 | M279T233 | 279.233232 | C18H30O2 | alpha-Linolenic acid | InChI=1S/C18H30O2/c1-2-3-4-5-6-7-8-9-10-11-12-... | DTOSIQBPPRVQHS-PDBXOOCHSA-N | 279.23 | [M+H]+ | 4.93 |
| 47 | M279T233 | 279.233232 | C18H28O2 | 19-Norandrosterone | InChI=1S/C18H28O2/c1-18-9-8-14-13-5-3-12(19)10... | UOUIARGWRPHDBX-CQZDKXCPSA-N | 279.23 | [M+3H]+ | 4.93 |
| 48 | M279T233 | 279.233232 | C18H28O2 | 19-Noretiocholanolone | InChI=1S/C18H28O2/c1-18-9-8-14-13-5-3-12(19)10... | UOUIARGWRPHDBX-DHMVHTBWSA-N | 279.23 | [M+3H]+ | 4.93 |
| 49 | M281T287 | 281.248903 | C18H32O2 | Linoleic acid | InChI=1S/C18H32O2/c1-2-3-4-5-6-7-8-9-10-11-12-... | OYHQOLUKZRVURQ-HZJYTTRNSA-N | 281.25 | [M+H]+ | 4.97 |
| 50 | M281T287 | 281.248903 | C18H30O2 | alpha-Linolenic acid | InChI=1S/C18H30O2/c1-2-3-4-5-6-7-8-9-10-11-12-... | DTOSIQBPPRVQHS-PDBXOOCHSA-N | 281.25 | [M+3H]+ | 4.97 |
| 51 | M282T61 | 282.070271 | C10H14N2O6 | Ribothymidine | InChI=1S/C10H14N2O6/c1-4-2-12(10(17)11-8(4)16)... | DWRXFEITVBNRMK-JXOAFFINSA-N | 282.07 | [M+Mg]+ | 2.10 |
| 52 | M282T61 | 282.070271 | C10H14N2O6 | 3-Methyluridine | InChI=1S/C10H14N2O6/c1-11-6(14)2-3-12(10(11)17... | UTQUILVPBZEHTK-UHFFFAOYSA-N | 282.07 | [M+Mg]+ | 2.10 |
| 53 | M283T37 | 283.103695 | C11H14N4O5 | 1-Methylinosine | InChI=1S/C11H14N4O5/c1-14-3-13-9-6(10(14)19)12... | WJNGQIYEQLPJMN-IOSLPCCCSA-N | 283.10 | [M+H]+ | -0.00 |
Note that this adduct library is also used to adjust the mass calculation when loading a reference file that has a column called ion_type.
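As a rough illustration of the adduct adjustment, the sketch below adds an adduct mass to a neutral monoisotopic mass and compares the result with an observed peak. It covers only the singly charged case, because how LAMP uses the charge column is not shown in this vignette. `adduct_mz` is a hypothetical helper, not a LAMP function.

```python
def adduct_mz(neutral_mass, adduct_mass):
    """Expected m/z of a singly charged ion: neutral mass plus adduct mass."""
    return neutral_mass + adduct_mass


# Inosine, C10H12N4O5: monoisotopic neutral mass 268.080770 (from the formula).
# [M+H]+ adds 1.007276, as listed in the adduct library above.
expected = adduct_mz(268.080770, 1.007276)
observed = 269.088048  # m/z of peak M269T37_2
print(round(expected, 6), round((observed - expected) / expected * 1e6, 2))
```

The resulting ppm value is consistent with the ppm_error reported for Inosine in match_1.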
Correlation Analysis
The next step is correlation analysis, based on the intensity data matrix across all peaks. The results are filtered by correlation coefficient, p-value and retention time difference. That is, a pair of peaks is kept only if its retention time difference lies within a window (such as 1 second), its correlation coefficient is larger than a threshold (such as 0.5) and its correlation p-value is less than a threshold (such as 0.05).
LAMP supports two correlation methods, pearson and spearman. The parameter positive lets the user keep only positive correlations; otherwise both positive and negative correlations are used.
Two functions, _tic and _toc, record the correlation computation time in seconds.
[10]:
thres_rt = 1.0
thres_corr = 0.5
thres_pval = 0.05
method = "spearman" # "pearson"
positive = True
[11]:
utils._tic()
corr = stats.comp_corr_rt(df, thres_rt, thres_corr, thres_pval, method,
                          positive)
utils._toc()
corr
Elapsed time: 4.374748706817627 seconds.
[11]:
| | name_a | name_b | r_value | p_value | rt_diff |
|---|---|---|---|---|---|
| 0 | M151T34 | M153T34 | 0.80 | 1.267076e-23 | 0.02 |
| 1 | M151T34 | M155T34 | 0.71 | 1.752854e-16 | 0.20 |
| 2 | M151T34 | M161T34 | 0.78 | 1.869949e-21 | 0.14 |
| 3 | M151T34 | M163T34 | 0.69 | 3.239594e-15 | 0.20 |
| 4 | M151T34 | M167T35 | 0.51 | 5.776482e-08 | 0.73 |
| ... | ... | ... | ... | ... | ... |
| 1783 | M283T34_1 | M283T34_2 | 0.62 | 4.214876e-12 | 0.29 |
| 1784 | M283T34_1 | M285T34 | 0.82 | 5.937139e-26 | 0.08 |
| 1785 | M283T34_2 | M285T34 | 0.66 | 7.898957e-14 | 0.37 |
| 1786 | M283T60 | M284T60 | 0.86 | 1.033010e-29 | 0.15 |
| 1787 | M283T339 | M284T339 | 0.91 | 4.031333e-39 | 0.04 |
1788 rows × 5 columns
corr gives the correlation coefficient (r_value), the correlation p-value (p_value) and the retention time difference (rt_diff) for each pair of peaks.
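The filtering rule behind this table can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not LAMP's implementation. `keep_pair` is a hypothetical helper. Whether each bound is inclusive is an assumption, and so is the use of the absolute correlation when positive is False.

```python
def keep_pair(r, p, rt_diff, thres_rt=1.0, thres_corr=0.5,
              thres_pval=0.05, positive=True):
    """Apply the three filters to one pair of peaks."""
    # With positive=False, strong negative correlations also pass.
    strength = r if positive else abs(r)
    return rt_diff <= thres_rt and strength > thres_corr and p < thres_pval


pairs = [
    # (name_a, name_b, r_value, p_value, rt_diff)
    ("M151T34", "M153T34", 0.80, 1.267076e-23, 0.02),  # row 0 of corr
    ("peak_x", "peak_y", -0.70, 1.0e-10, 0.10),        # made up: negative r
    ("peak_x", "peak_z", 0.90, 1.0e-20, 5.00),         # made up: rt too far
]
kept = [(a, b) for a, b, r, p, dt in pairs if keep_pair(r, p, dt)]
print(kept)
```

Only the first pair survives with the default settings. Calling `keep_pair(-0.70, 1.0e-10, 0.10, positive=False)` would also accept the second one.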
Based on the correlation analysis, we can extract the groups and their sizes by:
[12]:
# get correlation group and size
corr_df = stats.corr_grp_size(corr)
corr_df
[12]:
| | name | cor_grp_size | cor_grp |
|---|---|---|---|
| 0 | M219T35 | 52 | M221T34::M223T34::M225T35::M226T35::M229T34::M... |
| 1 | M217T35 | 52 | M218T35::M219T34::M219T35::M221T34::M223T34::M... |
| 2 | M216T35 | 52 | M217T35::M218T35::M219T34::M219T35::M221T34::M... |
| 3 | M215T35 | 52 | M216T35::M217T35::M218T35::M219T34::M219T35::M... |
| 4 | M218T35 | 51 | M219T34::M219T35::M221T34::M223T34::M225T35::M... |
| ... | ... | ... | ... |
| 335 | M171T180 | 1 | M173T181 |
| 336 | M257T51 | 1 | M258T51 |
| 337 | M163T415 | 1 | M219T415 |
| 338 | M203T34 | 1 | M229T35 |
| 339 | M171T119 | 1 | M173T119 |
340 rows × 3 columns
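One way to picture this table is as a list of correlation partners for every peak, joined with "::" and sorted by group size. The sketch below builds such a table from a list of pairs. It assumes that a pair counts for both of its peaks, which is a guess from the output shown here and not confirmed against LAMP's code. `group_sizes` is a hypothetical helper.

```python
from collections import defaultdict


def group_sizes(pairs):
    """Collect, for every peak, the peaks it is correlated with."""
    partners = defaultdict(list)
    for name_a, name_b in pairs:
        partners[name_a].append(name_b)
        partners[name_b].append(name_a)
    # Each entry is (name, size, "::"-joined partners); largest groups first.
    rows = [(name, len(grp), "::".join(grp)) for name, grp in partners.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


pairs = [("M151T34", "M153T34"), ("M151T34", "M155T34"), ("M283T60", "M284T60")]
for row in group_sizes(pairs):
    print(row)
```

Here M151T34 ends up with a group of size 2, while the other four peaks each have a single partner.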
Summarize Results
The final step builds summary tables in different formats, which can be saved for further analysis.
[13]:
# get summary of metabolite annotation
sr, mr = anno.comp_summ(df, match)
This function combines the peak table with the compound matching results and returns them in two formats. sr holds a single row for each peak in the peak table df:
[14]:
sr
[14]:
| | name | mz | rt | exact_mass | ppm_error | molecular_formula | compound_name | inchi | inchi_key |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M151T34 | 150.886715 | 34.152700 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | M151T40 | 151.040235 | 39.838172 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | M152T40 | 152.043607 | 40.303700 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | M153T34 | 152.883824 | 34.174647 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | M153T36 | 153.019474 | 35.785847 | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | M283T61 | 283.068474 | 60.739869 | NaN | NaN | NaN | NaN | NaN | NaN |
| 396 | M284T108 | 284.223499 | 108.406389 | NaN | NaN | NaN | NaN | NaN | NaN |
| 397 | M284T339 | 284.267962 | 338.725056 | NaN | NaN | NaN | NaN | NaN | NaN |
| 398 | M284T60 | 284.195294 | 59.593561 | NaN | NaN | NaN | NaN | NaN | NaN |
| 399 | M285T34 | 284.775031 | 34.079641 | NaN | NaN | NaN | NaN | NaN | NaN |
400 rows × 9 columns
mr has multiple rows for a peak that matches more than one entry in the reference file:
[15]:
mr
[15]:
| name | mz | rt | molecular_formula | compound_name | inchi | inchi_key | exact_mass | ppm_error | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M151T34 | 150.886715 | 34.152700 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | M151T40 | 151.040235 | 39.838172 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | M152T40 | 152.043607 | 40.303700 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | M153T34 | 152.883824 | 34.174647 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | M153T36 | 153.019474 | 35.785847 | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 404 | M283T61 | 283.068474 | 60.739869 | NaN | NaN | NaN | NaN | NaN | NaN |
| 405 | M284T108 | 284.223499 | 108.406389 | NaN | NaN | NaN | NaN | NaN | NaN |
| 406 | M284T339 | 284.267962 | 338.725056 | NaN | NaN | NaN | NaN | NaN | NaN |
| 407 | M284T60 | 284.195294 | 59.593561 | NaN | NaN | NaN | NaN | NaN | NaN |
| 408 | M285T34 | 284.775031 | 34.079641 | NaN | NaN | NaN | NaN | NaN | NaN |
409 rows × 9 columns
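To see which peaks account for the extra rows in mr, one option is to count the rows per peak name. The following is a minimal sketch on a hypothetical data frame (mr_toy and its compound names are invented for illustration and are not LAMP output); with real results the same calls could be applied to mr directly.

```python
import pandas as pd

# Hypothetical stand-in for `mr`: peak "M2" matched two reference compounds.
mr_toy = pd.DataFrame({
    "name": ["M1", "M2", "M2", "M3"],
    "compound_name": [None, "Compound A", "Compound B", "Compound C"],
})

# Number of rows per peak name; a count above 1 means multiple matches.
n_match = mr_toy.groupby("name").size()
multi = n_match[n_match > 1]
print(multi)
```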
Now we merge the single-row results with the correlation results:
[16]:
# merge summary table with correlation analysis
res = anno.comp_summ_corr(sr, corr_df)
res
[16]:
| name | mz | rt | exact_mass | ppm_error | molecular_formula | compound_name | inchi | inchi_key | cor_grp_size | cor_grp | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M167T35 | 167.021095 | 34.882147 | 167.02 | -4.56 | C7H5NO4 | Quinolinic acid | InChI=1S/C7H5NO4/c9-6(10)4-2-1-3-8-5(4)7(11)12... | GJAWHXHKYYXBSV-UHFFFAOYSA-N | 25.0 | M171T34::M197T36::M209T34::M211T34::M213T34::M... |
| 1 | M276T36 | 276.077397 | 36.385373 | 276.08 | -2.16 | C10H16N2O5S | Biotin sulfone | InChI=1S/C10H16N2O5S/c13-8(14)4-2-1-3-7-9-6(5-... | QPFQYMONYBAUCY-ZKWXMUAHSA-N | 13.0 | M277T36_2::M278T36::M173T36_2::M186T36::M187T3... |
| 2 | M154T37 | 154.062402 | 37.183625 | 154.06 | -3.84 | C8H10O3 | Hydroxytyrosol | InChI=1S/C8H10O3/c9-4-3-6-1-2-7(10)8(11)5-6/h1... | JUUBCHWRXWPFFH-UHFFFAOYSA-N | 12.0 | M155T38::M158T37_2::M164T36::M171T37_2::M173T3... |
| 3 | M181T36 | 181.060407 | 35.734801 | 181.06 | 2.39 | C6H7N5O2 | 8-Hydroxy-7-methylguanine | InChI=1S/C6H7N5O2/c1-11-2-3(9-6(11)13)8-5(7)10... | VHPXSVXJBWZORQ-UHFFFAOYSA-N | 9.0 | M224T36::M225T35::M226T35::M227T36::M269T37_2:... |
| 4 | M174T35 | 174.088395 | 35.001130 | 174.09 | -4.67 | C8H14O4 | Suberic acid | InChI=1S/C8H14O4/c9-7(10)5-3-1-2-4-6-8(11)12/h... | TYFQFVWCELRYAO-UHFFFAOYSA-N | 9.0 | M211T34::M213T34::M219T34::M221T34::M229T35::M... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | M279T50 | 279.159930 | 50.055451 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 396 | M279T79 | 279.163910 | 78.758079 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 397 | M282T85 | 282.207859 | 84.719202 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 398 | M283T47 | 283.110871 | 46.822069 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 399 | M284T108 | 284.223499 | 108.406389 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
400 rows × 11 columns
The result data frame res is arranged in four parts from top to bottom:
1st part: identified metabolites that satisfy the correlation analysis
2nd part: identified metabolites that do not satisfy the correlation analysis
3rd part: unidentified peaks that satisfy the correlation analysis
4th part: unidentified peaks that do not satisfy the correlation analysis
Users should focus on the first part for their further analysis.
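The four parts can also be pulled out explicitly with boolean masks. This is a minimal sketch on a hypothetical data frame (res_toy and its values are invented); it assumes that a missing compound_name marks an unidentified peak and a missing cor_grp marks a peak without a correlation group, which is how the NaN entries in the output above read.

```python
import pandas as pd

# Hypothetical stand-in for `res`, one row per part.
res_toy = pd.DataFrame({
    "name": ["P1", "P2", "P3", "P4"],
    "compound_name": ["Compound A", "Compound B", None, None],
    "cor_grp": ["P3::P9", None, "P1", None],
})

# Assumption: NaN/None means "no match" or "no correlation group".
identified = res_toy["compound_name"].notna()
correlated = res_toy["cor_grp"].notna()

part1 = res_toy[identified & correlated]    # identified, correlated
part2 = res_toy[identified & ~correlated]   # identified, not correlated
part3 = res_toy[~identified & correlated]   # unidentified, correlated
part4 = res_toy[~identified & ~correlated]  # unidentified, not correlated
print(part1)
```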
You can save the results in text formats such as TSV or CSV. You can also save all results into a sqlite3 database and view them with DB Browser for SQLite:
[17]:
f_save = False # here we do NOT save results
db_out = "test.db"
sr_out = "test_s.tsv"
[18]:
if f_save:
    # save all results into a sqlite3 database
    conn = sqlite3.connect(db_out)
    df[["name", "mz", "rt"]].to_sql("peaklist",
                                    conn,
                                    if_exists="replace",
                                    index=False)
    corr_df.to_sql("corr_grp", conn, if_exists="replace", index=False)
    corr.to_sql("corr_pval_rt", conn, if_exists="replace", index=False)
    match.to_sql("match", conn, if_exists="replace", index=False)
    mr.to_sql("anno_mr", conn, if_exists="replace", index=False)
    res.to_sql("anno_sr", conn, if_exists="replace", index=False)
    conn.commit()
    conn.close()
    # save final results
    res.to_csv(sr_out, sep="\t", index=False)
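Besides DB Browser for SQLite, the saved tables can be read back into pandas. The sketch below uses an in-memory database and a hypothetical two-row table so that it runs on its own; the table name anno_sr follows the cell above, while the toy data is invented. For a real run you would connect to the database file you saved instead.

```python
import sqlite3
import pandas as pd

# In-memory database with a toy "anno_sr" table (illustrative data only).
conn = sqlite3.connect(":memory:")
toy = pd.DataFrame({
    "name": ["P1", "P2"],
    "compound_name": ["Compound A", None],
})
toy.to_sql("anno_sr", conn, if_exists="replace", index=False)

# List the tables stored in the database.
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table'", conn
)

# Read back only the peaks that have a matched compound.
hits = pd.read_sql_query(
    "SELECT * FROM anno_sr WHERE compound_name IS NOT NULL", conn
)
conn.close()
print(hits)
```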
End User Usages
For end users, LAMP provides two options: a command line interface (CLI) and a graphical user interface (GUI).
To use the GUI, open a terminal and type:
$ lamp gui
To use the CLI, open a terminal and type the command with the required arguments, for example:
$ lamp cli \
--input-data "./data/df_pos_3.tsv" \
--sep "tab" \
--col-idx "1, 2, 3, 4" \
--add-path "" \
--ref-path "" \
--ion-mode "pos" \
--cal-mass \
--thres-rt "1.0" \
--thres-corr "0.5" \
--thres-pval "0.05" \
--method "pearson" \
--positive \
--ppm "5.0" \
--save-db \
--save-mr \
--db-out "./res/test.db" \
--sr-out "./res/test_s.tsv" \
--mr-out "./res/test_m.tsv"
As a best practice, you can create a bash script .sh (Linux and MacOS) or a Windows script .bat to hold these CLI arguments. Change the parameters in these files each time you process a new data set.
For example, there are lamp_cli.sh and lamp_cli.bat in https://github.com/wanchanglin/lamp/tree/master/examples. You can run them and check the results in directory examples/res:
For Linux and MacOS terminal:
$ chmod +x lamp_cli.sh
$ ./lamp_cli.sh
For Windows terminal:
$ lamp_cli.bat
Note that if you use xlsx files for the input data and reference file with the GUI or CLI, all data must be in the first sheet. If you call LAMP functions from your own python scripts, there is no such requirement.