1. Introduction
The BIOPEP-UWM database is freely accessible without registration at the following website:
The BIOPEP-UWM™ database is continuously updated and modified. Several new options have been introduced since the publication of the last article describing it [9]. The aim of the present publication is to provide information helpful in work with the current version of the database and associated tools, including the use of new options introduced in the last three years.
2. Database Organization
The scheme of organization of the BIOPEP-UWM homepage is presented in Figure 1. The screenshot of the homepage is available in Supplementary Figure S1. Apart from a database of bioactive peptides described in this article, the BIOPEP-UWM contains databases of proteins, allergenic proteins, and their epitopes [14] as well as sensory peptides and amino acids [9]. The homepage also has a tab that allows users to submit new peptide sequences (not annotated yet in the database) or new activities (not annotated) of the existing peptides (See Supplementary Figure S2), and also a new BIOPEP-UWM news tab (not indicated in Figure 1).
The “bioactive peptides” tab links to the list of bioactive peptides (Supplementary Figure S3). More detailed information about a particular peptide sequence is available via the “peptide data” tab attributed to each peptide. The page with the peptide list contains links to associated tools for processing peptide and protein sequences (via the “analysis” tab). Scrolling down using the bar to the left of the table (Supplementary Figure S3) opens the window that accepts search queries.
3. Enlarging the Number of Peptides in the Database by BIOPEP-UWM™ Users
The BIOPEP-UWM database is a curated database. Although it is regularly enriched with new peptides, it is practically impossible to insert all bioactive peptides that are continuously being reported in the literature. Thus, the “submit new peptides” option (see the BIOPEP-UWM homepage; Supplementary Figure S1) enables users to send us a peptide sequence not yet found in our database. The peptide sequence to be added to BIOPEP-UWM has to be provided in one-letter code by pasting it into the window that appears after clicking the “submit new peptides” tab. All peptides sent this way are verified by our curators and can be uploaded to the database on condition that the sender has provided an e-mail address and reference data (i.e., details of the article in which the peptide was published). The sender’s e-mail address is used to generate an automatic message confirming that the peptide of interest was successfully submitted to the BIOPEP-UWM database. Publication details are needed to verify the information sent. Both fields are mandatory for a successful submission: if the sender’s e-mail address or the reference data are missing, the submitted information is incomplete and the sequence may be temporarily excluded from the upload process.
4. Peptide Information
The current layout of peptide information in the BIOPEP-UWM database was previously used in the database of sensory peptides and amino acids [9]. Its implementation in the database of bioactive peptides is still in progress. Information about an example peptide with the sequence GHS (BIOPEP-UWM ID 9473) [15] is presented in Table 1. The screenshot of a peptide page is presented in Supplementary Figure S4.
The ID number is the first piece of information displayed on a peptide page. A peptide with a single activity annotated in the BIOPEP-UWM possesses one ID number. A peptide annotated as multifunctional possesses more ID numbers. The representative GHS peptide is annotated in the BIOPEP-UWM database twice, i.e., as an inhibitor of renin (EC 3.4.23.15) and angiotensin-converting enzyme (EC 3.4.15.1) [15] (ID 9472 and 9473, respectively). Database ID may serve as an unambiguous identifier of a compound, e.g., peptide. Examples of using ID numbers from peptide databases (e.g., BIOPEP-UWM) as peptide identifiers may be found in, e.g., recent publications of Skrzypczak et al. [16] and Khazaei et al. [17].
Name is the second piece of information on the page of an individual compound. Peptide names are often identical to their activity (e.g., ACE inhibitor). Some well-known peptides possess their own names, e.g., soybean lunasin (BIOPEP-UWM ID 9525 and 9526), the role of which has been reviewed by Hsieh et al. [18].
Peptide sequences are annotated in the BIOPEP-UWM database of bioactive peptides using a standard one-letter code describing 20 protein amino acids and their
Information about the biological activity is inserted as activity (short version), activity code (abbreviation of activity), and function (more detailed version). The current list of activities of peptides found in the BIOPEP-UWM database of bioactive peptides is provided in Table 2. The list of bioactivities has been rearranged compared to that published in 2008 [8] to remove redundancy (e.g., synonymous or extremely rare activities). On the other hand, several new activities, especially those concerning inhibition of enzymes, have recently been added. Annotation of bioactive peptides as compounds interacting with individual enzymes is preferred by users of the BIOPEP-UWM database, as in the case of, e.g., renin inhibitors [23,24,25]. Information concerning the role particular enzymes play in metabolic pathways has recently become available in specialized databases [26].
The peptide entry page also provides the chemical (average) and monoisotopic molecular mass of the peptide and a reference describing its given activity.
Completion of the contents of the “additional information” and “database references” tabs is in progress. The “additional information” tab includes the peptide structure written using two chemical codes: SMILES [27], the most popular chemical code, and InChI, recommended by IUPAC [21]. These codes represent a typical language of cheminformatics (i.e., chemical informatics) [26]. Cheminformatics is considered an emerging method in food science [28,29]. SMILES and InChI codes, as well as InChIKeys, are used as input data for the search of molecules in chemical databases [26,30]. The supplement to our previous review [31] may provide insights into how much information about peptide bioactivity is presented in chemical databases. An InChIKey is sufficient for a search via common search engines such as Google™. This option enables, e.g., finding peptides annotated in the BIOPEP-UWM database. Many types of software predict the physicochemical and biological properties of chemical compounds from input such as SMILES. This code may be converted into more than one hundred formats used in chemical informatics, for instance, by OpenBabel software [32]. Examples of using programs which require chemical codes as input data for in silico analysis and prediction of properties of food peptides have been recently presented by Ortiz-Martinez et al. [33], Mojica et al. [34], and Yu et al. [35]. Amino acid sequences are converted into SMILES code using applications available in the BIOPEP-UWM database via the “analysis” tab. Conversion of SMILES representations into InChI and InChIKeys is performed using OpenBabel or MarvinSketch software.
A peptide with a C-terminal amide group cannot be found in protein sequences. Precursors of these peptides, containing C-terminal glycine residues, are thus added. The mechanism of amidation includes the substitution of C-terminal glycine with an amide group [36]. A peptide with ID 2580 may serve as an example of this type of annotation. It is a precursor of antibacterial peptide [37] annotated as ID 2579. Information about amidation is provided in the “additional information” tab of a peptide, being a precursor of the amidated form (in the above example, peptide annotated as ID 2580).
The “additional information” tab also contains brief information about activities of the peptide taken from the BIOPEP-UWM database of bioactive peptides and other databases as well as information about peptide taste from the BIOPEP-UWM database of sensory peptides and amino acids [9].
For some of the peptides, the “additional information” tab also includes information about food resources and products as well as different IC50 values.
The “database reference” summarizes databases providing information about a given peptide (for example, see Table 1). The list of databases most commonly cited in the above tab is presented in Table 3. The list has been significantly enriched since the publication of our previous article describing the database [9]. ID numbers of peptides are also provided in particular databases. Some databases (such as ACToR [43] or ChemIDPlus [44]) use CAS registry numbers as compound identifiers. The databases are available via the MetaComBio website [11] or the “useful links” tab on the BIOPEP-UWM website. The list of databases cited has been significantly enlarged since 2016 (Table 3).
The last tab “screen and print peptide data” summarizes all data concerning a given peptide. Supplementary Table S1 is copied directly from the above tab. In the supplement to our previous publication [9], we have pointed out the opportunity for providing links to this tab from other resources. Examples of such links are available in the supplement to our review concerning taste-affecting peptides [31]. Here we offer the opportunity to construct links to peptide pages (“activity” tabs). The data of example peptide (ID 9473) can be found at the following address:
5. Search Options
Search options are summarized in Table 4 and Supplementary Figure S5.
Search options available in the BIOPEP-UWM database of bioactive peptides fall into the following major categories: text-based (ID, name, activity, reference, and InChIKey), structure-based (sequence-based), and property-based (number of amino acid residues and molecular mass). They are typical of peptide databases [4]. The use of an ID number as a query is the first search option. A single ID number corresponds to a single peptide with one defined activity. Search by name or by activity offers two possibilities to the user: finding all names or all activities including the chosen word or text fragment or exact search (see Supplementary Figure S5). The first opportunity leads to finding more peptides that fulfill the search criterion. Using the word “hemorphin-7” as a query, we can find four peptides (ID 2570, 2973, 3079, and 9001) without using the exact search option and only one (ID 3079) using the exact search option (search performed on 30 August 2019).
The search menu contains a link to the list of activities (Supplementary Figure S5), from which a query can be chosen. In contrast to Table 2, the bioactivities are listed in chronological (not alphabetical) order. Again, it is possible to use the exact search option. Using the word “inhibitor” as a query without the exact search option gave a list of 1552 peptides as an output (30 August 2019). The list contains all inhibitors of enzymes (e.g., ACE, dipeptidyl peptidase IV, and dipeptidyl peptidase III). The exact search option with the same query found only 67 peptides with the activity annotated as “inhibitor” (see Table 2).
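The difference between the two text-search modes can be illustrated with a minimal sketch. The record layout, the function name `search_by_activity`, and the four entries are hypothetical and serve only as an illustration; they do not reproduce the database schema or its contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified peptide record (not the real database schema)."""
    peptide_id: int
    activity: str


def search_by_activity(entries, query, exact=False):
    """Return entries whose activity equals the query (exact) or contains it."""
    q = query.strip().lower()
    if exact:
        return [e for e in entries if e.activity.lower() == q]
    return [e for e in entries if q in e.activity.lower()]


# Illustrative records only; IDs and activities are made up for this sketch.
ENTRIES = [
    Entry(1, "ACE inhibitor"),
    Entry(2, "dipeptidyl peptidase IV inhibitor"),
    Entry(3, "inhibitor"),
    Entry(4, "antioxidative"),
]

broad = search_by_activity(ENTRIES, "inhibitor")
narrow = search_by_activity(ENTRIES, "inhibitor", exact=True)
print([e.peptide_id for e in broad])   # [1, 2, 3]
print([e.peptide_id for e in narrow])  # [3]
```

As in the database, the non-exact query returns every activity containing the word, whereas the exact query returns only entries annotated with exactly that term.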
InChIKey is the most typical identifier of compounds (e.g., peptides) in chemical databases (e.g., PubChem [22], ChemSpider [54], and ChEMBL [40]). Although it is a unique identifier of any chemical compound, it does not provide information about its structure [21]. InChIKeys in the BIOPEP-UWM database correspond to linear peptides with all chirality centers defined, acidic and basic groups electrically neutral, and cysteine residues (if any) reduced. An incomplete InChIKey used as a query may result in finding more peptides. For instance, the fragment “DYKIIFRCSA-N” occurs in three InChIKeys corresponding to the celiac toxic peptide with the sequence PSQQQP (ID 2578), the ACE inhibitor GPAGAPGAA (ID 3363), and the antibacterial peptide ALCSEK (ID 4011). These peptides have no common fragments (subsequences). The use of an incomplete InChIKey with the exact search option will fail to produce any results.
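The behavior of complete and incomplete InChIKey queries can be sketched as follows. A standard InChIKey consists of three hyphen-separated blocks of 14, 10, and 1 characters. The key of GHS (ID 9473) is taken from Table 1; the full keys of the other three peptides are not given in the text, so their first blocks below are placeholders, and the function names are hypothetical.

```python
def split_inchikey(key):
    """Split a standard 27-character InChIKey into its three blocks (14-10-1)."""
    parts = key.split("-")
    if [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return tuple(parts)


def search_by_inchikey(keys_by_id, query, exact=False):
    """Return sorted IDs whose key equals the query (exact) or contains it."""
    query = query.strip().upper()
    if exact:
        return sorted(i for i, k in keys_by_id.items() if k == query)
    return sorted(i for i, k in keys_by_id.items() if query in k)


KEYS = {
    9473: "LPCKHUXOGVNZRS-YUMQZZPRSA-N",  # GHS, as listed in Table 1
    # Placeholder first blocks: only the shared fragment is quoted in the text.
    2578: "AAAAAAAAAAAAAA-DYKIIFRCSA-N",
    3363: "BBBBBBBBBBBBBB-DYKIIFRCSA-N",
    4011: "CCCCCCCCCCCCCC-DYKIIFRCSA-N",
}

partial = search_by_inchikey(KEYS, "DYKIIFRCSA-N")
partial_exact = search_by_inchikey(KEYS, "DYKIIFRCSA-N", exact=True)
full_exact = search_by_inchikey(KEYS, "LPCKHUXOGVNZRS-YUMQZZPRSA-N", exact=True)
print(partial)        # [2578, 3363, 4011]
print(partial_exact)  # []
print(full_exact)     # [9473]
```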
The sequence-based search is the most common and most intuitive option used to find peptide information in the database [4]. The BIOPEP-UWM database offers an opportunity to find all longer sequences containing a query fragment or to find exactly the given sequence (exact search). The first opportunity allows the user to find peptides containing a defined continuous motif, e.g., one attributed to a given function [74,75]. This search option also follows the fragmentomics concept [76], which assumes that shorter (functional) bioactive subsequences present in a sequence may be crucial for the biological activity of the entire peptide molecule. Examples of peptides fitting this concept may be found in the BIOPEP-UWM (e.g., hemorphins or ACE inhibitors from caseins) and in other peptide databases such as EROP-Moscow [20], PepBank [67], SATPdb [70], or AHTPDB [45]. The exact search option is sufficient to check the bioactivity of peptides identified among protein hydrolysis products. Examples of such experiments have recently been described by Martini et al. [77] and Garcia-Vaquero et al. [78].
In the case of the property-based search (involving the number of amino acid residues or molecular mass range), choosing the exact search option does not change the output. We generally recommend using the exact search option for the sequence-based search.
6. Analysis
The “analysis” page includes the following tabs: “profiles of potential biological activity”, “calculations”, “enzyme(s) action”, “find”, “batch processing”, “definitions”, “SMILES”, and “find the enzyme for peptide release” (Supplementary Figure S6).
The profile of a potential biological activity is defined as the type and location of bioactive fragments in a protein or a peptide chain [79]. This idea is based on the assumption that the same bioactive fragment, especially a short one (2–3 amino acid residues), is not specific to a single protein but may be present in many sequences (many such fragments are so-called common subsequences) [75,79]. The concept of profiles of the potential activity of peptide fragments is consistent with the fragmentomic approach proposed by Zamyatnin [76] (see above). By default (asterisk selected), the profile of potential biological activity covers all activities. Examples of published profiles of the potential activity of peptide or protein fragments may be found in publications of Bauchart et al. [80], Huang et al. [81], Tapal et al. [82], Khazaei et al. [17], and Jakubczyk et al. [83]. The profile may also be constructed for a specific bioactivity (bioactivity of interest) by selecting that activity instead of the asterisk from the toolbar. The menu to be used for the construction of potential biological activity profiles is shown in Supplementary Figures S7–S9. The profile of a potential biological activity of a protein or a peptide sequence is presented as a table including the following columns: ID, name of peptide, activity, number of repetitions of a particular bioactive fragment in a query sequence, sequence of the bioactive fragment, and location of the bioactive fragment in a query sequence. An example of such a profile is presented in Supplementary Table S2.
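The construction of a profile can be sketched as a scan of the query sequence for every known bioactive fragment, recording the number of repetitions and the locations. The fragment dictionary below is a tiny illustrative subset (GHS is annotated as an ACE inhibitor in Table 1; the other entries and the query sequence are assumptions made for the example), and the function name is hypothetical.

```python
def activity_profile(sequence, fragments):
    """Locate every (possibly overlapping) occurrence of each known fragment.

    Returns rows of (fragment, activity, repetitions, locations), where each
    location is a 1-based, inclusive (first, last) residue range.
    """
    rows = []
    for frag, activity in fragments.items():
        locations = []
        start = sequence.find(frag)
        while start != -1:
            locations.append((start + 1, start + len(frag)))
            start = sequence.find(frag, start + 1)  # step by 1 to allow overlaps
        if locations:
            rows.append((frag, activity, len(locations), locations))
    return rows


# Illustrative fragment list and query; not the database contents.
FRAGMENTS = {"GHS": "ACE inhibitor", "VPP": "ACE inhibitor", "IPP": "ACE inhibitor"}
QUERY = "MGHSVPPAVPPK"

profile = activity_profile(QUERY, FRAGMENTS)
for row in profile:
    print(row)
# ('GHS', 'ACE inhibitor', 1, [(2, 4)])
# ('VPP', 'ACE inhibitor', 2, [(5, 7), (9, 11)])
```

Fragments absent from the query (here IPP) produce no row, which mirrors a profile table listing only the fragments actually found.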
The “calculations” tab enables calculating two quantitative parameters that characterize proteins as potential precursors of bioactive peptides: the frequency of bioactive fragments occurrence in a protein sequence (A) and a potential biological activity of protein fragments (B). Equations 1 and 2 enabling calculation of the above parameters are provided in Table 5. The menu of the “calculations” tab is shown in Supplementary Figure S10. An example of the output is presented in Supplementary Table S3. The frequency of bioactive fragments occurrence in a protein sequence (A) is calculated for all bioactive peptides present in the query sequence (using the asterisk) or for one specific peptide (by choosing the bioactivity from a toolbar). Potential biological activity of protein fragments (B) may be calculated only if peptide IC50 or EC50 is available. The program skips peptides without known IC50 or EC50 value. For instance, Supplementary Table S3 provides B values for ACE and DPPIV inhibitors only. In the case of other activities, B values have not been calculated due to the lack of IC50 or EC50 attributed to particular peptides. Articles published by Udenigwe et al. [84] and Lin et al. [85] contain representative results of calculations of quantitative parameters characterizing food proteins as potential precursors of bioactive peptides.
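The two parameters can be sketched in code. Table 5 remains the authoritative definition; the sketch assumes the forms usually quoted for these parameters, A = a/N and B = (Σ a_i/IC50_i)/N, where a is the number of bioactive fragments, a_i the number of repetitions of the i-th fragment, IC50_i (or EC50_i) its concentration value in µM, and N the number of amino acid residues. Fragment counts and IC50 values below are made up, and treating a stored value of zero as missing is an additional assumption.

```python
def frequency_a(counts, n_residues):
    """A: number of bioactive fragment occurrences per residue (assumed a / N)."""
    return sum(counts.values()) / n_residues


def potential_activity_b(counts, ic50_um, n_residues):
    """B: sum of a_i / IC50_i over fragments with a known value, divided by N.

    Fragments without an IC50/EC50 value are skipped, as described in the text.
    Non-positive values are also treated as missing (assumption of this sketch).
    """
    total = 0.0
    for fragment, a_i in counts.items():
        value = ic50_um.get(fragment)
        if value is None or value <= 0:
            continue
        total += a_i / value
    return total / n_residues


# Hypothetical counts for a 12-residue query and hypothetical IC50 values (µM).
COUNTS = {"VPP": 2, "GHS": 1}
IC50_UM = {"VPP": 10.0}  # no usable value for GHS, so it is skipped in B

A = frequency_a(COUNTS, 12)
B = potential_activity_b(COUNTS, IC50_UM, 12)
print(A)  # 0.25
print(B)
```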
The “enzyme(s) action” tab allows simulating proteolysis catalyzed by endopeptidases. The scheme of steps required to obtain the peptides potentially released by a given enzyme (or enzymes) is presented in Figure 2. Screenshots of menus of particular tabs are presented in Supplementary Figures S11–S16. The menu also enables enzyme choice (Supplementary Figures S12 and S13). It allows the simulation of proteolysis using one to three enzymes. Example information about a single enzyme (plasmin; EC 3.4.21.7; MEROPS ID: S01.233) is presented in Supplementary Figure S14. The enzyme is annotated using an enzyme ID and a connection ID, the latter indicating a single peptide bond hydrolyzed by the enzyme. One enzyme may cover several connection IDs (two in the case of plasmin). Enzyme specificity is described using two terms: a recognition sequence, understood as a fragment of an amino acid sequence recognized by the proteolytic enzyme, and a cutting sequence, understood as an amino acid residue preceding or following the bond hydrolyzed by the protease [8]. The recognition sequence may contain a single amino acid residue (e.g., for plasmin) or a longer fragment, as for the ginger protease zingipain (EC 3.4.22.67; MEROPS ID: C01.017). The annotations “C-terminus” and “N-terminus” indicate that the bond hydrolyzed by the enzyme is formed by the carboxyl or the amine group of an amino acid residue, respectively. Data concerning particular enzymes contain references: databases such as MEROPS [41] and CutDB [56] or publications (Bastian and Brown [89] for plasmin and Huang et al. [90] for zingipain). Apart from the addition of new enzymes, the specificity has recently been modified for some of the existing ones. The modification included the addition of new recognition sequences and cutting sequences (possessing connection IDs within the range 141–184), according to data presented in the so-called specificity matrices in the MEROPS database.
These matrices are continuously updated to follow newly appearing information about new sites susceptible to proteolysis in protein sequences [41]. Proteolysis simulation is simplified. It assumes that all bonds theoretically susceptible to a given proteinase are hydrolyzed. In real experiments, the proteolysis is often incomplete. This finding may explain false-positive results, i.e., lack of expected peptides. False-negative results may be explained by incomplete knowledge about proteolytic specificity, i.e., the situation when some bonds susceptible to the proteolytic enzyme are considered resistant. The addition of new recognition and cutting sequences to the enzyme data aims to minimize the occurrence of false-negative results.
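The simplifying assumption of complete hydrolysis can be made concrete with a short sketch. It handles only single-residue cutting rules, and the rule used in the example (cleavage on the carboxyl side of K and R) describes a hypothetical enzyme, not an entry copied from the database; the function name and the query sequence are likewise illustrative.

```python
def simulate_proteolysis(sequence, cut_after=frozenset(), cut_before=frozenset()):
    """Cleave every susceptible bond, i.e., assume complete hydrolysis.

    cut_after:  residues whose carboxyl-side bond is cleaved ("C-terminus" rule)
    cut_before: residues whose amine-side bond is cleaved ("N-terminus" rule)
    Returns (fragment, first, last) tuples with 1-based, inclusive positions.
    """
    if not sequence:
        return []
    cut_points = set()  # i marks the bond between sequence[i - 1] and sequence[i]
    for i in range(1, len(sequence)):
        if sequence[i - 1] in cut_after or sequence[i] in cut_before:
            cut_points.add(i)
    bounds = [0] + sorted(cut_points) + [len(sequence)]
    return [(sequence[a:b], a + 1, b) for a, b in zip(bounds, bounds[1:])]


# Hypothetical enzyme cleaving after K and R, applied to a made-up sequence.
products = simulate_proteolysis("AKGHSRVPPK", cut_after={"K", "R"})
print(products)
# [('AK', 1, 2), ('GHSR', 3, 6), ('VPPK', 7, 10)]
```

Combining two or three enzymes amounts to taking the union of their cutting rules before the fragments are generated.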
Results of simulated proteolysis of an example peptide can be found in Supplementary Figure S15. Displayed results of the initial step of simulation include sequences of peptides being products of proteolysis and their location in the precursor sequence. The next step may include the search for bioactive peptides among products of simulated proteolysis or calculation of quantitative parameters characterizing the proteolysis (Figure 2 and Figure S15). The parameters available via the “enzyme(s) action” tab are calculated according to Equations (3)–(7) from Table 5. Representative results of the search for active peptides among simulated proteolysis products and calculation of quantitative parameters are presented in the Supplement (Tables S4 and S5, respectively). Calculation of parameters BE and V involves EC50 or IC50 values. If they are not available, peptides are not taken into account. Simulation of proteolysis using the BIOPEP-UWM database has recently been described by, e.g., Lin et al. [85], Yu D. et al. [91], and Kandemir-Cavas et al. [92]. Data concerning proteolysis simulation may be interpreted together with protein structures [93].
A new tab named “search for enzymes with given specificity” enables the search for information about an enzyme using recognition sequence, cutting sequence, and choice between C- and N-terminus (bond formed by carboxyl or amine group of amino acid residue, respectively). Results include the list of enzymes with a given specificity. An example of a query and result produced using the above option may be found in Supplementary Figure S16 and Supplementary Table S6. For most of the enzymes, the recognition sequence contains only one amino acid residue.
The new “find” tab enables quick retrieval of information from the protein and bioactive peptide databases. Particular tabs display a full list of protein sequences annotated in the BIOPEP-UWM database, a full list of peptides revealing a given activity, and a list of all proteins or peptides containing the query sequence (see Supplementary Figure S17). The last option enables finding all proteins or peptides containing a given bioactive fragment or a recognition sequence available for the proteolytic enzyme. An example result of a search for the VPP sequence in the database of bioactive peptides is presented in Supplementary Table S7. Results cover links to peptide or protein data, ID number, name, and sequence.
Another new option, “batch processing”, serves for the simultaneous processing of a set of several sequences of proteins or peptides being potential precursors of bioactive peptides. The total length of all sequences forming the query set may be up to ca. 1500 amino acid residues. The scheme of activities available via this option is presented in Figure 3. The screenshot of the input window is available in Supplementary Figure S18. The FASTA format [94] is used to input a set of sequences. The “batch processing” option enables performing any action available via the tabs “profiles of potential biological activity”, “calculations”, and “enzyme(s) action”. Moreover, there are new parameters characterizing the occurrence and possibility of enzymatic release of an individual peptide from several precursor sequences (aT, aS, AS, aTE, aSE, ATE), calculated according to Equations (8)–(10) and (12)–(14) in Table 5. Distribution of particular fragments in the set of sequences may be in the focus of scientific interest when using in silico methodologies [76,77,95]. Analysis may cover all possible or selected options. Supplementary Figure S18 shows a set of sequences ready for an analysis concerning bioactive peptides (excluding options concerning data from the database of allergenic proteins). The batch analysis is performed in two steps (Figure 3). The first step may be performed for all activities (default option) or a selected one. The second step may be performed after the first one has been completed. The parameters may be calculated for all bioactive fragments found in the set of sequences or for manually selected peptides. Results of the first and the second step of batch analysis are presented in Supplementary Tables S8 and S9, respectively.
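Before a set of sequences is submitted for batch processing, it may be convenient to check the FASTA input and its total length locally. The following sketch does so under stated assumptions: the function names are hypothetical, the limit of 1500 residues is the approximate value quoted above, and both example sequences are made up.

```python
MAX_TOTAL_RESIDUES = 1500  # approximate limit quoted in the text


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}; sequences may span lines."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate header: {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def check_batch(records, limit=MAX_TOTAL_RESIDUES):
    """Return (total number of residues, True if the set fits within the limit)."""
    total = sum(len(seq) for seq in records.values())
    return total, total <= limit


EXAMPLE = """>seq1 hypothetical precursor
RWAFAPGFAPGHIP
>seq2 hypothetical precursor
MGHSVPPA
VPPK
"""

records = parse_fasta(EXAMPLE)
total, fits = check_batch(records)
print(total, fits)  # 26 True
```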
The “definitions” tab summarizes terms and definitions used in the BIOPEP-UWM database including equations used to calculate quantitative parameters, as shown in Table 5.
The “SMILES” tab, introduced in 2018, enables translating amino acid sequences (written using the standard one-letter code) into the chemical language SMILES. SMILES representations are built according to a simplified algorithm described by Siani et al. [96]. SMILES codes of particular amino acid residues are written using the same layout as used in the SwissSidechain database [72] and the source code of the CycloPs program [97] (program temporarily unavailable). The procedure was tested and verified according to recommendations proposed in our previous publication [98]. The MarvinSketch 17.28 software (ChemAxon, Budapest, Hungary) was used to test and verify SMILES strings of peptides. The application accepts sequences of peptides built from the 20 protein amino acids, their D-enantiomers, L- and D-phosphoserine (symbols B and b, respectively), and a C-terminal amide group. It is easy and fast to use and can process linear peptides only. Disulfide bonds and other modifications may be inserted using molecule editors (e.g., MarvinSketch, the Dendrimer Builder program provided by the University of Bern, Switzerland, and the molecule editor of the NANPDB [65] database), which may serve as alternatives to our application. The first one may be used to construct any molecule from building blocks drawn or imported as SMILES strings, the second to build representations of branched peptides containing some non-protein amino acids, and the third to encode pyrrolysine and selenocysteine in addition to the 20 most common protein amino acids. Our application converts amino acid sequences into the so-called aromatic SMILES. Some search engines do not accept this version [30]. The aromatic version of the SMILES string may be converted into the alternative, so-called Kekule version using, e.g., the molecule editor of the PubChem database [99] or MarvinSketch software. Screenshots of the “SMILES” tab window with query and result are given in Supplementary Figures S19 and S20, respectively.
Two types of SMILES representations of the example peptide may be found in Supplementary Table S10.
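The general idea of translating a sequence into SMILES by concatenating residue codes can be sketched as follows. This is not the code of the BIOPEP-UWM application: it covers only a small subset of L-amino acids chosen for illustration, the residue templates are an assumption of this sketch, and the resulting strings may differ from the application's output in atom order or in the histidine tautomer.

```python
# Side chains (aromatic SMILES) for a small illustrative subset of L-residues.
SIDE_CHAINS = {
    "G": None,  # glycine has no side chain and no stereocenter
    "A": "C",
    "S": "CO",
    "V": "C(C)C",
    "L": "CC(C)C",
    "F": "Cc1ccccc1",
    "H": "Cc1c[nH]cn1",
    "K": "CCCCN",
}


def residue_smiles(letter):
    """Return the N-to-C backbone unit of one residue, without the final O/N."""
    try:
        side = SIDE_CHAINS[letter]
    except KeyError:
        raise ValueError(f"residue {letter!r} is not covered by this sketch") from None
    if side is None:
        return "NCC(=O)"
    return f"N[C@@H]({side})C(=O)"


def peptide_to_smiles(sequence, c_terminal_amide=False):
    """Concatenate residue units of a linear peptide and close the C-terminus."""
    if not sequence:
        raise ValueError("empty sequence")
    backbone = "".join(residue_smiles(r) for r in sequence)
    return backbone + ("N" if c_terminal_amide else "O")


print(peptide_to_smiles("GHS"))
# NCC(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](CO)C(=O)O
```

Each residue unit ends with an open carbonyl carbon, so the nitrogen of the next unit forms the peptide bond, and the terminal `O` or `N` yields a free carboxyl or an amide group, respectively.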
The interpretation of the output of the new tab entitled “find the enzyme for peptide release” is summarized in Figure 4. The screenshot of the menu of this tab and representative results are shown in Supplementary Figure S21 and Supplementary Table S11, respectively. The input includes peptide sequences provided in FASTA format and the precursor (protein or peptide) sequence. The output includes a list of all enzymes with the specificity sufficient to catalyze particular proteolytic events. A proteolytic event is understood as a case of cleavage of an individual peptide bond. This term was introduced in the CutDB database [56]. Release of a peptide from the precursor sequence requires two proteolytic events: cleavage of the bond preceding the N-terminus and of the bond following the C-terminus (indicated in Supplementary Table S11 as N and C, respectively). If a given peptide appears in the precursor sequence more than once, the particular events attributed to this peptide are indicated as 1N, 1C, 2N, 2C, and so forth. An example peptide with the AP sequence occurs twice in the precursor sequence RWAFAPGFAPGHIP (positions 5–6 and 9–10). Its release is associated with four proteolytic events: 1N, cleavage of the bond between residues 4 and 5; 1C, cleavage of the bond between residues 6 and 7; 2N, cleavage of the bond between residues 8 and 9; and 2C, cleavage of the bond between residues 10 and 11. Displayed results concerning the enzymes catalyzing the particular proteolytic events cover the following data: name, EC number, enzyme ID in the BIOPEP-UWM database, connection ID, cutting sequence, and recognition sequence. The cutting sequences are described using the symbols “+” and “−” assigned to the amino acid symbols. The symbol “+” means that the amino acid residue follows the cleaved bond, i.e., this bond is formed by the amine group. For example, the symbol “A+” means that the cleaved bond is formed by the amine group of alanine.
The symbol “−” means that the amino acid residue is located before the cleaved bond, i.e., this bond is formed by the carboxyl group of the amino acid. For instance, the symbol “W−” means a bond formed by the carboxyl group of tryptophan. Enzymes releasing the N- and C-terminus are summarized separately, because a peptide may be released by more than one enzyme (the N- and C-termini need not be released by the same enzyme). This process can be exemplified by protein digestion in the human gastrointestinal tract [100].
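The enumeration of proteolytic events for the AP example can be reproduced with a short sketch. It lists only the events and the cutting-sequence symbols that an enzyme would have to match; the lookup of enzymes themselves is not modeled, the function name is hypothetical, and an ASCII hyphen stands in for the minus sign.

```python
def proteolytic_events(peptide, precursor):
    """List the cleavage events needed to release each occurrence of a peptide.

    Each event is (label, (i, j), before, after): the bond between residues
    i and j (1-based), the residue preceding it marked with "-", and the
    residue following it marked with "+".
    """
    events = []
    occurrence = 0
    start = precursor.find(peptide)
    while start != -1:
        occurrence += 1
        first = start + 1            # 1-based position of the peptide's N-terminus
        last = start + len(peptide)  # 1-based position of the peptide's C-terminus
        if first > 1:                # no bond to cleave at the precursor's own N-terminus
            events.append((f"{occurrence}N", (first - 1, first),
                           precursor[first - 2] + "-", precursor[first - 1] + "+"))
        if last < len(precursor):    # no bond to cleave at the precursor's own C-terminus
            events.append((f"{occurrence}C", (last, last + 1),
                           precursor[last - 1] + "-", precursor[last] + "+"))
        start = precursor.find(peptide, start + 1)
    return events


events = proteolytic_events("AP", "RWAFAPGFAPGHIP")
for event in events:
    print(event)
# ('1N', (4, 5), 'F-', 'A+')
# ('1C', (6, 7), 'P-', 'G+')
# ('2N', (8, 9), 'F-', 'A+')
# ('2C', (10, 11), 'P-', 'G+')
```

An enzyme annotated with the cutting sequence “A+” or “F-” would thus be a candidate for events 1N and 2N, provided that its recognition sequence also matches.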
7. Useful Links and Other Tabs
The BIOPEP-UWM plays the role of a metaserver enabling access to databases and software useful in research concerning peptides and proteins. The linked tools available via the “useful links” tab (Figure 1; Supplementary Figure S1) are divided into categories according to Minkiewicz et al. [101]. These categories are summarized in Table 6.
Other tabs available from the BIOPEP-UWM main page are as follows: a list of publications of our group concerning the BIOPEP-UWM database, a brief summary of the database (“about BIOPEP-UWM” tab), publications concerning particular parts of the BIOPEP-UWM database that users are asked to cite, and contact data of the database curators.
8. Final Remarks
This paper presents the current status of the BIOPEP-UWM™ database, including changes introduced within the period of 2016–2019. Apart from the addition of new peptides (562 items added since submission of our last publication describing the database of sensory peptides and amino acids [9]), information about the existing ones has been completed (especially chemical codes and database references). We also added several new options, which are summarized in Table 7.
The content of this publication is not restricted to description of new changes in the database and associated tools during the last three years. We try to provide a complete description including both old and new options.
Future modifications will aim at removing the weak points of the database and associated applications. We would like to ask users to submit new peptides (via the current version of the “submit new peptides” tab) and any remarks helpful in improving the bioinformatic tool described in this paper.
Supplementary Materials
Supplementary materials can be found at
Author Contributions
P.M., A.I., and M.D. are curators of the BIOPEP-UWM database. P.M., M.D., and A.I. designed new options and applications associated with the BIOPEP-UWM database. P.M., A.I., and M.D. have written the manuscript. Funding acquisition—M.D. and A.I.
Funding
The project was financially supported by the Minister of Science and Higher Education under the program entitled “Regional Initiative of Excellence” for the years 2019–2022, Project No. 010/RID/2018/19, amount of funding 12,000,000 PLN, and by the University of Warmia and Mazury, grant number 17.610.014-300.
Acknowledgments
The authors thank Krzysztof Sieniawski and Mariusz Falkowski (Enter Krzysztof Sieniawski, Olsztyn, Poland) for IT support; Monika Hrynkiewicz, Marta Turło, Agnieszka Skwarek, Monika Pliszka, and Piotr Starowicz for adding new data to the BIOPEP-UWM database; Iwona Szerszunowicz and Kamila Licka for pointing out some weak points of the database and associated software; and ChemAxon (Budapest, Hungary) for the academic license for the MarvinSketch program.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
ACE | Angiotensin-converting enzyme (EC 3.4.15.1) |
ACToR | Aggregated Computational Toxicology Online Resource |
AHTPDB | Antihypertensive Peptide Database |
APD | Antimicrobial peptide database |
BioPepDB | Bioactive Peptide Database |
BRENDA | Braunschweig Enzyme Database |
CAMKII | Ca2+/calmodulin-dependent protein kinase (EC 2.7.11.17) |
CAMP | Collection of antimicrobial peptides |
CaMPDE | Calmodulin-dependent phosphodiesterase 1 (EC 3.1.4.17) |
CancerPPD | Anticancer protein and peptide database |
CAS | Chemical Abstract Service provided by American Chemical Society |
CID | Compound Identifier (in PubChem database) |
DB | database |
DBAASP | Database of Antimicrobial Activity and Structure of Peptides |
EBI | European Bioinformatics Institute |
EC50 | Concentration corresponding to half-maximal activity |
EMBL | European Molecular Biology Laboratory |
EROP | Endogenous Regulatory Oligopeptide knowledgebase |
FeptideDB | Food Peptide Database |
GPR14 | Abbreviation of urotensin II receptor |
HMDB | Human Metabolome Database |
IC50 | Concentration corresponding to half-maximal inhibition |
InChI | International Chemical Identifier |
HMG-CoA | 3-hydroxy-3-methyl-glutaryl-coenzyme A (PubChem CID: 445127; CAS registry No 1553-55-5) |
InChIKey | Key of International Chemical Identifier |
IUPAC | International Union of Pure and Applied Chemistry |
KEGG | Kyoto Encyclopedia of Genes and Genomes |
MBPDB | Milk Bioactive Peptide Database |
MetaComBio | Meta Compound Bioactivity |
MilkAMP | Milk antimicrobial peptide database |
SATPdb | Structurally Annotated Therapeutic Peptide database |
SMILES | Simplified Molecular Input Line Entry System or Simplified Molecular Input Line Entry Specification |
UWM | University of Warmia and Mazury |
Figures and Tables
Figure 1. Scheme of organization of the BIOPEP-UWM database of bioactive peptides.
Figure 2. Scheme of the “enzyme(s) action” tab. Option (see Figure 1) “search for enzyme with given specificity” is not included in the Figure. A screenshot of the menu of this tab is presented in Supplementary Figure S11.
Table 1. Content of a page of a representative peptide.
ID | 9473 | ||
Name | ACE inhibitor | ||
Sequence | GHS | ||
InChIKey | LPCKHUXOGVNZRS-YUMQZZPRSA-N | ||
Function | Inhibitor of Angiotensin-Converting Enzyme (ACE) (EC 3.4.15.1) (MEROPS ID: M02-001) | ||
Number of Amino Acid Residues | 3 | Activity Code | ah |
Activity | ACE inhibitor | ||
Chemical Mass | 299.2740 | Monoisotopic Mass | 299.1110 |
IC50 | 0.00 µM | ||
Bibliographic Data | |||
Authors | He R., Malomo S. A., Alashi A., Girgih A. T., Ju X., Aluko R. E. | ||
Title | Glycinyl-histidinyl-serine (GHS), a novel rapeseed protein-derived peptide, has a blood pressure-lowering effect in spontaneously hypertensive rats. J. Agric. Food Chem., 61, 8396-8402, 2013 | ||
Year | 2013 | Source | Journal |
Additional Information | |||
BIOPEP-UWM database of bioactive peptides |
|||
Database Reference | |||
AHTPDB: ID 1053, 2949 |
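The peptide page content shown above maps naturally onto a small record type. The Python sketch below is illustrative only: the class `PeptideRecord`, its field names, and the helper `looks_like_inchikey` are assumptions made for this example and are not part of BIOPEP-UWM. The helper checks only the general layout of a standard InChIKey (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and 1 uppercase letter); it does not verify that the key actually corresponds to the sequence.

```python
import re
from dataclasses import dataclass

# General layout of a standard InChIKey: 14 + 10 + 1 uppercase letters,
# separated by hyphens. This is a format check, not a chemical validation.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """Return True if the text has the general layout of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


@dataclass(frozen=True)
class PeptideRecord:
    """Hypothetical container for the fields of one peptide page."""
    peptide_id: int
    name: str
    sequence: str          # one-letter code
    inchikey: str
    activity: str
    activity_code: str
    chemical_mass: float
    monoisotopic_mass: float

    def residue_count(self) -> int:
        # Assumes a plain one-letter sequence with one character per residue.
        return len(self.sequence)


# Values copied from the representative peptide page (ID 9473).
ghs = PeptideRecord(
    peptide_id=9473,
    name="ACE inhibitor",
    sequence="GHS",
    inchikey="LPCKHUXOGVNZRS-YUMQZZPRSA-N",
    activity="ACE inhibitor",
    activity_code="ah",
    chemical_mass=299.2740,
    monoisotopic_mass=299.1110,
)
```

A record built this way can be checked before it is used as a search key, e.g., `looks_like_inchikey(ghs.inchikey)`.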
Table 2. List of activities of peptides annotated in the BIOPEP-UWM database of bioactive peptides.
Activity | Description 1 |
---|---|
ACE inhibitor 2 | Inhibitors of angiotensin-converting enzyme (ACE) (EC 3.4.15.1) (MEROPS ID: M02-001) |
activating ubiquitin-mediated proteolysis | Peptides activating proteolysis mediated by ubiquitin |
alpha-amylase inhibitor 2 | Inhibitors of α-amylase (EC 3.2.1.1) |
alpha-glucosidase inhibitor 2 | Inhibitors of α-glucosidase (EC 3.2.1.20) |
anorectic | Peptides causing a decrease in food intake and suppression of appetite. |
antiamnestic | Inhibitors of prolyl oligopeptidase (EC 3.4.21.26) (MEROPS ID: S09.001). The enzyme catalyzes degradation of neuropeptides, e.g., involved in processes associated with memory. |
antibacterial | Peptides revealing any action against bacteria |
anticancer | Peptides revealing any action against cancers |
antifungal | Peptides revealing any action against fungi |
anti-inflammatory | Peptides reducing inflammation or swelling |
antioxidative | Peptides inhibiting oxidation |
antithrombotic | Inhibitors of blood coagulation. Inhibitors of thrombin (EC 3.4.21.5) (MEROPS ID: S01.217) are attributed to this activity. |
antiviral | Peptides revealing any action against viruses. Inhibitors of viral enzymes are included. |
bacterial permease ligand | Ligands of bacterial permeases |
binding 2 | Peptides binding any biomolecules. Mineral binding peptides are also attributed to this activity. |
CaMKII inhibitor 2 | Inhibitors of Ca2+/calmodulin-dependent protein kinase (CaMKII) (EC 2.7.11.17) |
CaMPDE inhibitor 2 | Inhibitors of 3′,5′-cyclic-nucleotide phosphodiesterase (Calmodulin-dependent phosphodiesterase 1—CaMPDE) (EC 3.1.4.17) |
chemotactic | Peptides inducing chemotaxis, i.e. movement in response to a chemical stimulus |
celiac toxic | Peptides toxic to people suffering from celiac disease |
contracting | Peptides stimulating muscle contraction |
dipeptidyl peptidase III inhibitor 2 | Inhibitors of dipeptidyl peptidase III (EC 3.4.14.4) (MEROPS ID M49.001) |
dipeptidyl peptidase IV inhibitor 2 | Inhibitors of dipeptidyl peptidase IV (EC 3.4.14.5) (MEROPS ID S09.003) |
embryotoxic | Peptides toxic to animal embryos |
hemolytic | Peptides destroying red blood cells |
heparin binding 2 | Heparin binding peptides |
HMG-CoA reductase inhibitor 2 | Inhibitors of 3-hydroxy-3-methyl-glutaryl-coenzyme A reductase (HMG-CoA reductase) (EC 1.1.1.34) |
hypotensive | Peptides causing blood pressure decrease |
immunomodulating | Peptides modulating activity of the immune system |
immunostimulating | Peptides stimulating activity of the immune system |
inhibitor 2 | Peptides inhibiting various biological processes. Information about processes is provided on the pages of individual peptides. |
membrane-active 2 | Peptides affecting transmembrane transport |
natriuretic | Peptides inducing the excretion of sodium by kidneys (natriuresis) |
neuropeptide | Peptides affecting activity of the nervous system |
opioid | Ligands of opioid receptors |
opioid agonist | Agonists of opioid receptors |
opioid antagonist | Antagonists of opioid receptors |
orphan receptor GPR14 agonist | Agonists of orphan receptor GPR14 |
Protein Kinase C inhibitor 2 | Inhibitors of protein kinase C (EC 2.7.11.13) |
regulating | Peptides regulating various biological processes. Information about processes is provided on the pages of individual peptides. |
renin inhibitor 2 | Inhibitors of renin (EC 3.4.23.15) (MEROPS ID A01.007) |
stimulating | Peptides stimulating various biological processes. Information about processes is provided on the pages of individual peptides. |
toxic 2 | Toxic peptides |
vasoconstrictor | Peptides causing blood pressure increase |
1 More information concerning enzymes inhibited by peptides is available in the following databases: ExplorEnz [38], BRENDA [39], ChEMBL [40], and MEROPS [41]. Information about associations between abnormal enzyme activity and diseases may be found in the OpenTargets database [42].
2 Activities absent in the version described in our publication from 2008 [8].
Table 3. Databases cited on the “Database reference” page and other bioinformatic tools mentioned in the publication.
Database Name | Website 1 | Reference |
---|---|---|
ACToR 2 | | [43] |
AHTPDB 2 | | [45] |
APD | | [46] |
BindingDB 2 | | [47] |
BioPepDB | | [48] |
BitterDB 2 | | [49] |
Brainpeps | | [50] |
BRENDA 1 | | [39] |
CAMP | | [51] |
CancerPPD | | [52] |
ChEBI 2 | | [53] |
ChEMBL 2 | | [40] |
ChemIDplus 2 | | [44] |
ChemSpider 2 | | [54] |
CompTox | | [55] |
CutDB | | [56] |
DBAASP | | [57] |
Dendrimer Builder | | * |
DrugBank 2 | | [58] |
EROP-Moscow 2 | | [20] |
ExplorEnz | | [38] |
FeptideDB | | [13] |
FooDB 2 | | * |
Hemolytik | | [59] |
HMDB 2 | | [60] |
J-Global | | * |
KEGG 2 | | [61] |
LabWorm | | * |
MarvinSketch | | * |
MBPDB | | [62] |
MEROPS 2 | | [41] |
MetaboLights | | [63] |
MetaComBio 2 | | [11] |
MilkAMP | | [64] |
NANPDB | | [65] |
NeuroPep | | [66] |
omicX | | * |
OpenBabel 2 | | [32] |
OpenTargets | | [42] |
PepBank 2 | | [67] |
PeptideDB | | [68] |
ProPepper | | [69] |
PubChem 2 | | [22] |
SATPdb 2 | | [70] |
SpirPep | | [12] |
SureChEMBL | | [71] |
SwissSidechain | | [72] |
ZINC 2 | | [73] |
1 Accessed in July and August 2019. 2 Tools cited in our previous publication [9]. * No reference available.
Table 4. List of search options available in the BIOPEP-UWM database of bioactive peptides. Options described in this table have been announced in [30].
Search Option | Output: Version without Exact Search | Output: Version with Exact Search 1 |
---|---|---|
ID | Peptide with the given ID | Peptide with the given ID |
Name | List of all peptides with the name containing the given word (words) | Peptide with the given name (may appear more than once if it is annotated with more activities) |
Activity | Complete list of peptides with all activities named using the given word (e.g., inhibitor) | List of all peptides with the given activity |
Mass | List of all peptides having molecular masses within the given range (e.g., 500–600) | List of all peptides having molecular masses within the given range (e.g., 500–600) |
Reference | List of all peptides described in articles published by the given author (or authors with the same surname) | List of all peptides described in articles published by the given author (or authors with the same surname) |
Sequence | List of all peptides with sequences containing the given fragment | Peptide with the given sequence (may appear more than once if it is annotated with more activities) 2 |
Number of amino acid residues | List of all peptides containing the given number of amino acid residues (e.g., 3) | List of all peptides containing the given number of amino acid residues (e.g., 3) |
InChIKey 1 | Peptide with the given InChIKey. A peptide exhibiting more than one activity annotated in the BIOPEP-UWM will appear more than once 2 | Peptide with the given InChIKey. A peptide exhibiting more than one activity annotated in the BIOPEP-UWM will appear more than once 2 |
1 New search options. 2 These options give equivalent search results.
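The difference between the two versions of the sequence search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the database's implementation: the `Entry` type, the function `search_by_sequence`, and the contents of `TOY_ENTRIES` (including the IDs and the second activity assigned to GHS) are hypothetical and serve only to show the behavior described in the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    """Hypothetical database row: one peptide annotated with one activity."""
    peptide_id: int
    sequence: str
    activity: str


def search_by_sequence(entries: List[Entry], query: str, exact: bool = False) -> List[Entry]:
    """Return entries matching the query sequence.

    exact=False: any entry whose sequence contains the query fragment.
    exact=True:  only entries whose sequence equals the query.
    """
    query = query.strip().upper()
    if exact:
        return [e for e in entries if e.sequence == query]
    return [e for e in entries if query in e.sequence]


# Toy data for illustration only; not real database content.
TOY_ENTRIES = [
    Entry(1, "GHS", "ACE inhibitor"),
    Entry(2, "GHS", "antioxidative"),   # same peptide, second (made-up) activity
    Entry(3, "AGHSL", "ACE inhibitor"),
    Entry(4, "VPP", "ACE inhibitor"),
]

fragment_hits = search_by_sequence(TOY_ENTRIES, "GHS")            # entries 1, 2, 3
exact_hits = search_by_sequence(TOY_ENTRIES, "GHS", exact=True)   # entries 1, 2
```

In this sketch the exact search returns the same peptide twice because it carries two activity annotations, which mirrors the remark in the table that a peptide may appear more than once.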
Table 5. Quantitative parameters characterizing proteins as potential precursors of bioactive peptides, available in the BIOPEP-UWM database.
Equation No. | Parameter | Reference |
---|---|---|
1. 1 | The frequency of bioactive fragments occurrence in a protein sequence (A) | [86] |
2. 1 | Potential biological activity of protein fragments (B) [μM⁻¹] | [86] |
3. 2 | The frequency of release of fragments with a given activity by selected enzymes (AE) | [87] |
4. 2 | The relative frequency of release of fragments with a given activity by selected enzymes (W) | [87] |
5. 2 | Activity of fragments potentially released by proteolytic enzyme (enzymes) (BE) | * |
6. 2 | Relative activity of fragments potentially released by proteolytic enzyme (enzymes) (V) | * |
7. 2 | Theoretical degree of hydrolysis (DHT) | [88] |
8. 3 | The number of repetitions of the bioactive fragment in all sequences of the protein/peptide set analyzed (aT) | * |
9. 3 | The number of repetitions of a given fragment in all sequences of the selected protein/peptide fraction (aS) | * |
10. 3 | The mean frequency of the occurrence of a single fragment in a sequence of protein/peptide classified to a given group (AS) | * |
11. 4 | The total number of amino acid residues in all protein/peptide sequences belonging to the set (NT) | * |
12. 3 | The number of cases of release of the bioactive fragment from all sequences of the protein/peptide set analyzed (aTE) | * |
13. 3 | Mean number of cases of predicted release of a single fragment by a selected enzyme from the chain of protein/peptide belonging to the set analyzed (aSE) | * |
14. 3 | Predicted frequency of release of a single peptide by proteolytic enzyme from the set of protein/peptide sequences analyzed (ASE) | * |
1 Available via the “profiles” tab and the “batch processing” tab. 2 Available via the “enzyme(s) action” tab and the “batch processing” tab. 3 Available via the “batch processing” tab only. 4 Not displayed among the results; shown only to explain the calculation of other parameters. * New parameters described for the first time in this publication. Some of them have been announced in [4].
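As a worked illustration of the first parameter in the table above, the sketch below computes the frequency of occurrence of bioactive fragments, assuming the definition A = a/N, where a is the number of fragments with a given activity found in the sequence and N is the number of amino acid residues. The table itself only names the parameter, so this formula should be checked against [86]. The function names, the counting of overlapping occurrences, and the example sequence and fragments are assumptions made for this example, not the database's code or data.

```python
from typing import Iterable


def count_occurrences(sequence: str, fragment: str) -> int:
    """Count occurrences of a fragment in a sequence, overlaps included."""
    if not fragment:
        return 0
    count = 0
    start = sequence.find(fragment)
    while start != -1:
        count += 1
        # Advance by one position so that overlapping matches are counted.
        start = sequence.find(fragment, start + 1)
    return count


def frequency_of_occurrence(sequence: str, fragments: Iterable[str]) -> float:
    """Assumed definition A = a / N.

    a: total number of occurrences of the listed fragments in the sequence
    N: number of amino acid residues in the sequence
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    a = sum(count_occurrences(sequence, f) for f in fragments)
    return a / n


# Toy input: a made-up 7-residue sequence and two made-up fragments.
A = frequency_of_occurrence("GHSAGHS", ["GHS", "SA"])  # a = 3, N = 7
```

Here GHS occurs twice and SA once, so a = 3 and A = 3/7 for the toy sequence.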
Table 6. Categories of bioinformatic tools available via the “useful links” tab.
Category | Description |
---|---|
Bioactive peptide databases | Databases of biologically active peptides including general databases (covering several activities) or databases of particular activities (e.g., antimicrobial) |
Bioactivity prediction | Software predicting biological activity of peptides, especially interactions with proteins, e.g., enzymes |
Immunology of proteins and peptides | Databases of allergens and epitopes, software for predicting allergenicity and occurrence of epitopes as well as other software from the area of immunology |
Literature data mining | Software supporting search for biomedical data (e.g., concerning proteins and peptides) in literature |
Miscellaneous | Databases and software not belonging to other categories. Chemical databases and metabases are attributed to this category. |
Motifs | Programs enabling constructing sequence motifs and finding them in protein or peptide sequences |
Physicochemical properties | Software used to predict and exploit the physicochemical properties of peptides |
Prediction of post-translational modifications | Software used to predict the location of post-translational modifications (phosphorylation, glycosylation) in protein and peptide sequences |
Programs supporting peptide design | Software supporting design of peptides with desired biological properties |
Protein resources | Databases and software concerning proteins but not peptides, including databases of protein sequences and structures |
Proteolysis | Databases annotating proteolytic enzymes, software for proteolysis simulation |
Proteomic tools | Tools supporting proteomics research including mass spectrometry |
Sequence alignments | Software for constructing protein and peptide sequence alignments and for searching in protein sequence databases |
Structure prediction and visualization | Software for modeling secondary and tertiary structures of proteins and peptides |
Table 7. New options in the BIOPEP-UWM database and modifications of existing ones, not described in the previous publications [8,9].
Option | Description |
---|---|
Peptide annotation | Possibility of annotation of peptides containing D-amino acids |
Search options 1 | Search on the basis of InChIKey; addition of “exact match” search as user’s choice, designed especially for sequence search |
List of peptide activities | List of peptide activities rearranged and enriched |
Proteolytic enzyme annotation | Updated list of bonds susceptible to proteolytic enzyme action |
New search options | Search on the basis of InChIKey; addition of “exact match” search as user’s choice |
“SMILES” tab 1 | Application converting amino acid sequences into the SMILES code |
New options available via the “enzyme(s) action” tab | New quantitative parameters describing possibility of release of bioactive peptides by proteolytic enzymes—Equations (5)–(7) in Table 5, option enabling finding enzyme with a given specificity among proteinases annotated in the database |
“find the enzymes for peptide release” tab | Option which enables finding proteolytic enzymes liberating the N- and C-termini of bioactive peptides |
“find” tab | Shortcut to the list of peptides with a given activity |
Batch processing | Option which enables finding profiles of potential biological activity of fragments, calculating quantitative parameters that characterize protein or peptide, and simulating proteolysis for a set of sequences |
Quantitative parameters characterizing occurrence and possibility of release of bioactive peptide from a set of sequences | Parameters calculated via the “batch processing” option: Equations (8)–(10) and (12)–(14) in Table 5 |
The “BIOPEP-UWM news” tab | Tab designed to provide important news concerning the database |
1 The application for converting amino acid sequences into SMILES code has been announced in [4].
© 2019 by the authors.
Abstract
The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in research on bioactive peptides, especially those derived from foods and being constituents of diets that prevent the development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about the existing ones (e.g., chemical codes and references to other databases) are in progress. New options include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, conversion of amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.