Pf Target Browser
This web app, created by the Winzeler Lab, was designed as a resource for the MalDA (Malaria Drug Accelerator) Consortium's Plasmodium falciparum target prioritization efforts. We have compiled a variety of annotations relevant to antimalarial target potential for all genes in the P. falciparum 3D7 genome (PlasmoDB release 66).
To navigate this app, you can:
- Enter a keyword into the search bar, which will show matches to protein-coding genes based on Pf3D7 gene ID, gene symbol, and product description. If there is exactly one match (e.g. mdr1), the app redirects to the corresponding gene page.
- Enter the Pf3D7 ID of the gene of interest (e.g. PF3D7_0417200) at the end of the URL
- Visit the Gene List tab for a full list of protein-coding genes
  - There, you can view up to 6 additional gene annotation columns and sort by a specific column
All data displayed on this website can be downloaded in tabular format from the following repository: Figshare link.
Methods
P. falciparum 3D7 Genes
5,318 protein-coding genes (5,389 transcripts) were considered from the Plasmodium falciparum 3D7 reference genome in PlasmoDB Release 66 - 28 Nov 2023 (accessed January 3rd, 2024).
PlasmoDB Annotations
The following gene/transcript annotations were taken directly from PlasmoDB Release 66 using the "Search Strategy" function:
- Gene/transcript ID, Entrez ID, UniProt ID(s), gene symbol, product description, ortholog group (OrthoMCL)
- Genomic location, gene/CDS/protein length, molecular weight, isoelectric point
- Domain annotations (Interpro, PFam, Superfamily), # transmembrane (TM) domains, SignalP peptide prediction [ref]
- Computed and curated Gene Ontology (GO) components, functions and processes; Enzyme Commission (EC) numbers
- Genetic variation across all strains in PlasmoDB (number of unique noncoding, synonymous, nonsynonymous, and nonsense SNPs)
Pf Target Browser includes linkouts to BRENDA for EC numbers and AmiGO for GO terms.
Functional Information
To provide additional data on biological function, we include the following:
- Summary of gene expression across sporozoite, ring, trophozoite, schizont and gametocyte stages based on the Le Roch et al. oligonucleotide array dataset [ref] and Malaria Cell Atlas Chromium 10x single cell RNA-seq dataset [ref]. RNA-seq expression data was taken directly from pf-ch10x-exp.csv in pf.zip downloaded from the Malaria Cell Atlas website in February 2024.
  - For the Le Roch et al. microarray dataset, which reports expression and logP (likelihood of gene absence calculated using the MOID algorithm) values for 9 substages synchronized using two different methods (sorbitol or thermocycling), genes were labelled "expressed" in a particular stage if at least one substage with either synchronization method had expression ≥ 30 and logP ≤ -1, "not expressed" if no substage had expression ≥ 10 and logP ≤ -0.5, and otherwise "possibly expressed."
  - For the Malaria Cell Atlas scRNA-seq dataset, mean normalized expression across cells and proportion of cells with expression value of 0 for the gene are reported. There are a total of 37,624 cells with the following distribution of cells per stage:
    - Ring - 7,071 | Trophozoite - 13,436 | Schizont - 8,159 | Gametocyte - 8,958
- Link to gene ID search in the Malaria Parasite Metabolic Pathways website, which includes a variety of information on function and sub-cellular localization of genes included in its metabolic pathway maps. MPMP is maintained by Hagai Ginsburg from Hebrew University.
- Link to possible entry in iAM_Pf480, a genome-scale metabolic model (GeMM) with 480 Pf genes constructed by Abdel-Haleem et al. 2018 [ref], in BiGG Models.
- Link to gene ID search in the STRING website, which displays networks of known and predicted protein-protein interactions from the STRING database [ref].
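The three-way labelling rule for the Le Roch et al. microarray data can be sketched as follows. This is a minimal illustration, not the code used to build the site: the function name and the input shape (a list of `(expression, logP)` pairs covering every substage and synchronization method for one stage) are assumptions made for the example.

```python
from typing import Iterable, Tuple


def label_stage_expression(substages: Iterable[Tuple[float, float]]) -> str:
    """Label a gene in one stage from (expression, logP) pairs.

    `substages` is assumed to hold one pair per substage and per
    synchronization method (sorbitol or thermocycling) for that stage.
    """
    pairs = list(substages)
    # "expressed": at least one substage passes the strict thresholds.
    if any(expr >= 30 and logp <= -1 for expr, logp in pairs):
        return "expressed"
    # "not expressed": no substage passes even the relaxed thresholds.
    if not any(expr >= 10 and logp <= -0.5 for expr, logp in pairs):
        return "not expressed"
    # Everything in between.
    return "possibly expressed"
```

Note that the strict thresholds imply the relaxed ones, so the order of the two checks does not change the result.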
Essentiality Information
Three sources of evidence were used to assess Pf gene essentiality:
- Zhang et al. 2018 [ref]: used piggyBac transposon insertion mutagenesis to generate 38,000 Pf mutants; calculated mutagenesis index scores (MIS) and mutagenesis fitness scores (MFS) for 5,399 nuclear genes, identifying 2,680 genes as essential for in vitro asexual blood stage (ABS) growth. TABLE S5
  - Pf3D7 gene IDs were directly mapped to the PlasmoDB release 66 gene list
- PlasmoGEM (Plasmodium Genetic Modification Project) [ref]: produced P. berghei gene disruption vectors with enhanced transfection efficiency; website includes blood stage phenotypic data (relative growth rate) for 2,578 Pb genes. PlasmoGEM link
  - PbANKA gene IDs were mapped to Pf3D7 gene IDs based on OrthoMCL (release 6.19) ortholog groups
  - "Download CSV" accessed January 2024
- RMgmDB (Rodent Malaria genetically modified Parasites database) [ref]: manually curated web repository of genotype and phenotype information for 5,300+ P. berghei mutants (as of August 12, 2023 update). RMgmDB link
  - Searched database using all Pf3D7 gene IDs (RMgmDB stores its own PbANKA-Pf3D7 mapping); considered "successful" gene modifications of any type with reported asexual blood stage phenotype
  - Accessed January 2024
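The ortholog-group mapping used for the PlasmoGEM data (PbANKA gene IDs to Pf3D7 gene IDs through shared OrthoMCL groups) amounts to a join on the group identifier. The sketch below assumes the group assignments are already loaded into two plain dictionaries; the function name and the IDs in the usage example are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def map_pb_to_pf(pb_groups: Dict[str, str],
                 pf_groups: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each PbANKA gene ID to the Pf3D7 gene IDs in its ortholog group.

    Both arguments map a gene ID to an ortholog group ID. A Pb gene whose
    group contains no Pf3D7 gene maps to an empty list.
    """
    group_to_pf = defaultdict(set)
    for pf_id, group in pf_groups.items():
        group_to_pf[group].add(pf_id)
    return {
        pb_id: sorted(group_to_pf.get(group, ()))
        for pb_id, group in pb_groups.items()
    }


# Hypothetical placeholder IDs, for illustration only.
example = map_pb_to_pf(
    {"PB_GENE_A": "GROUP_1", "PB_GENE_B": "GROUP_2"},
    {"PF_GENE_X": "GROUP_1", "PF_GENE_Y": "GROUP_1"},
)
print(example)
```

Because an ortholog group can hold several genes per species, the mapping is one-to-many, which is why each Pb gene maps to a list.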
Binding Information
We also defined the presence of a small-molecule binding domain as a key factor for target potential. To approximate this, we searched for evidence of similarity to proteins with known ligand interactions through three complementary approaches:
- Orthology of Pf3D7 protein to at least one protein in BindingDB [ref], a public database of experimentally determined protein-ligand binding affinities (1,218,340 compounds and 9,195 targets as of 2024-01-28 update). BindingDB link
  - Accessed January 30th, 2024 with help from Mike Gilson
  - 6,202 proteins with measured affinity of ≥10 micromolar with at least one compound considered
  - Orthology determined based on presence of P. falciparum 3D7 gene in the same ortholog group as BindingDB UniProt entry, according to four different orthology databases (HOGENOM, OMA, OrthoDB available in the UniProt ID mapping web tool [accessed 2/2/2024] and OrthoMCL)
  - Additionally, we ran BLAST on the protein sequence of each BindingDB entry against the full OrthoMCL protein database (FASTA release OrthoMCL-6.19). Protein sequences for all UniProt BindingDB identifiers were obtained from UniProt's REST tool and were compared against the OrthoMCL custom database with the BLAST 2.15 blastp function. Pf genes were mapped based on ortholog group match.
- AlphaFill [ref] predictions of ligands corresponding to Pf3D7 AlphaFold(v4) models, taken from the AlphaFill databank. The AlphaFill algorithm determines candidate ligands by searching for sequence homologs in PDB with known ligands. Of 5,099 Pf3D7 genes having a corresponding UniProt ID with AlphaFill information, 2,771 had at least one ligand hit. AlphaFill link
  - To focus on AlphaFill hits that are more likely to indicate druggability, we excluded small ligands with <10 atoms as well as additional salts, solvents and polymers used for protein crystallization.
  - We only show confident hits with global r.m.s.d. <10 (a measure of structural similarity between the Pf protein of interest and its potential homolog in PDB) and local r.m.s.d. <4 (structural similarity of the backbone atoms within 6 Å from the transplanted ligand, after local structural alignment).
  - The "best" AlphaFill hit for each gene was defined as the confident hit with the lowest local RMSD among all ligands, considering the lowest local RMSD hit if multiple exist for a single ligand.
- Inhibitors linked to EC (Enzyme Commission) number classes in the BRENDA Enzyme Database [ref] (Release 2023.1, updated 2/1/2023) for Pf3D7 genes based on EC number annotations in PlasmoDB release 66, applicable only to enzymes. Other ligand types were not considered. For genes with incomplete EC number annotations, all EC numbers matching the wildcard were considered. BRENDA link
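The AlphaFill filtering and "best hit" selection described above can be sketched as a filter followed by a minimum. This is an illustration under stated assumptions: the `Hit` record, its field names, and the `excluded` ligand set (standing in for the salts, solvents and polymers that were removed) are hypothetical, and only the numeric thresholds come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Hit:
    ligand: str         # ligand identifier (hypothetical field name)
    n_atoms: int        # number of atoms in the ligand
    global_rmsd: float  # global r.m.s.d. to the PDB homolog
    local_rmsd: float   # local r.m.s.d. around the transplanted ligand


def best_alphafill_hit(hits: Iterable[Hit],
                       excluded: FrozenSet[str] = frozenset()) -> Optional[Hit]:
    """Return the confident hit with the lowest local RMSD, or None."""
    confident = [
        h for h in hits
        if h.n_atoms >= 10            # drop small ligands (<10 atoms)
        and h.ligand not in excluded  # drop crystallization additives
        and h.global_rmsd < 10
        and h.local_rmsd < 4
    ]
    # Taking the minimum over all hits also picks the lowest-RMSD
    # transplant when one ligand appears several times.
    return min(confident, key=lambda h: h.local_rmsd, default=None)
```

A gene with no confident hit returns `None`, which corresponds to having no "best" hit to display.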
Human Orthologs
Existence of a highly similar human ortholog poses cytotoxicity concerns for a candidate target. Homo sapiens genes orthologous to Pf3D7 genes were determined from OrthoMCL, and both sequence and structural similarity were evaluated through pairwise comparison of Pf3D7 and human ortholog AlphaFold(v4) structures in TM-align [ref]. Of 2,006 genes with human ortholog(s), 1,972 had AlphaFold structures enabling TM-align comparison.
Resistome Mutations
In vitro selections with antimalarial compounds yield mutations that may play roles in resistance. We have compiled a Pf "resistome" database consisting of all SNV and indel mutations identified from whole genome sequencing of 651 Pf clones or bulk culture samples with evolved resistance to one of 109 diverse compounds. Here, we report data on the following classes of mutations in the resistome database:
- "Disruptive" mutations: any SNV or indel occurring within a gene that is not a synonymous, intron, or stop retained variant (e.g. inframe indels, frameshift, start lost, splice region, nonsense, missense)
- Missense mutations: missense SNVs
- "Interesting" missense mutations: missense SNVs that do not fall in a low complexity region (PlasmoDB) and have pLDDT >70 for the affected residue where the protein's AlphaFold structure is available. These are highlighted in green in Pf Target Browser
To determine variants, WGS reads were aligned to the Pf3D7 v13 genome and processed according to the GATK 3.5 pipeline. GATK HaplotypeCaller with default parameters was used to generate SNV/indel variant calls, and SnpEff was used to annotate genes and variant effects. Initial filtering retained SNVs and indels with total depth ≥2 and alternate allele frequency (AAF) >0.2 in a resistant sample, along with depth ≥3 and a genotype call of 0/0 in its parent sample. Next, these variants were filtered for the presence of a SnpEff gene annotation (which excludes intergenic variants >1000 nt away from the nearest gene) and for either major alt allele depth >40 and major AAF >0.4, or major alt allele depth >10 and AAF >0.8.
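The two filtering steps above reduce to a single boolean test per variant. The sketch below is only an illustration: the function and parameter names are hypothetical, and it assumes that "AAF" and "major AAF" both refer to the frequency of the major alternate allele in the resistant sample.

```python
def passes_resistome_filters(depth: int,
                             alt_depth: int,
                             aaf: float,
                             parent_depth: int,
                             parent_genotype: str,
                             has_gene_annotation: bool) -> bool:
    """Apply the initial and second-stage variant filters to one call.

    depth / alt_depth / aaf describe the resistant sample;
    parent_depth / parent_genotype describe its parent sample.
    """
    # Initial filter: variant seen in the resistant sample and
    # confidently absent (0/0) in the parent.
    initial = (
        depth >= 2
        and aaf > 0.2
        and parent_depth >= 3
        and parent_genotype == "0/0"
    )
    # Second filter: either deep coverage at moderate frequency,
    # or shallower coverage at high frequency.
    strict = (alt_depth > 40 and aaf > 0.4) or (alt_depth > 10 and aaf > 0.8)
    return initial and has_gene_annotation and strict
```

The second filter trades depth against allele frequency, so a variant supported by few reads is kept only when it is nearly fixed in the sample.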
Protein Information
All records from Protein Data Bank (PDB) associated with a Pf gene ID were obtained by querying Pf3D7 gene symbols and UniProt IDs, or by searching "Plasmodium" in the PDB website in February 2024.
We include visualization of the AlphaFold protein structure, if available, in an embedded version of iCn3D, a web-based 3D structure viewer from the NIH.
Field Variants
We used the MalariaGEN Pf7 [ref] dataset of 20,864 worldwide samples to assess field genetic variation. Variant (SNV and indel) call data was accessed via the malariagen_data Python package. We restricted our analysis to 5,868,659 variants marked as "passing" based on quality filters and exclusion of variants in hypervariable regions, mitochondrial and apicoplast genomes (see Methods of the Pf7 paper). Note: this means that no variants are reported for subtelomeric and non-nuclear genes, even though they may have field variants. Variants were then sorted based on effect ("synonymous" includes stop retained, "disruptive" was defined as anything other than synonymous, and "missense" is restricted to missense SNVs) and prevalence among all Pf7 samples ("singleton" occurs in only one sample, "doubleton" occurs in two samples, "rare" variants exclude singletons/doubletons but occur in less than 21 samples (<0.1% prevalence), all else defined as "common"). Whether a variant occurs in a sample or not was further stratified by homozygous genotype call, e.g. 1/1 indicating that allele 1 is present at nearly 100% frequency, or any genotype call containing the allele (e.g. 1/2 would be counted as both allele 1 and allele 2 occurring in a sample).
TL;DR: Each number in the table indicates how many unique variants (non-3D7 reference allele at a locus within the gene) of a particular effect type fall into a certain prevalence bin across the Pf7 dataset. In computing prevalence, whether or not a sample "has" a variant is defined either by homozygous only or any genotype call.
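The prevalence bins and the two ways of deciding whether a sample "has" an allele can be sketched as below. The function names are hypothetical, and genotype calls are assumed to be given as tuples of allele indices (for example `(1, 2)` for a 1/2 call).

```python
from typing import Tuple


def sample_has_allele(genotype: Tuple[int, ...], allele: int,
                      homozygous_only: bool) -> bool:
    """Decide whether a genotype call counts as carrying `allele`."""
    if homozygous_only:
        # e.g. (1, 1): every called allele equals the allele of interest.
        return len(genotype) > 0 and all(a == allele for a in genotype)
    # Any call containing the allele counts, so (1, 2) counts for both 1 and 2.
    return allele in genotype


def prevalence_bin(n_samples: int) -> str:
    """Bin a variant by the number of Pf7 samples in which it occurs."""
    if n_samples < 1:
        raise ValueError("variant must occur in at least one sample")
    if n_samples == 1:
        return "singleton"
    if n_samples == 2:
        return "doubleton"
    if n_samples < 21:  # fewer than 21 of 20,864 samples (<0.1%)
        return "rare"
    return "common"
```

Running the binning once with `homozygous_only=True` and once with `False` gives the two sets of counts shown in the table.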
Associated Publications
The gene2pubmed.gz file (version 2024-02-21), which lists tax ID, gene ID (Entrez) and PubMed ID, was downloaded from the NCBI FTP site. Gene IDs were mapped to this annotation set and the corresponding PMIDs were extracted. To include references linked to the gene name (symbol) rather than to an Entrez ID, PMID details were obtained programmatically using the NCBI Eutils4 efetch function. Title, authors and DOI were retrieved for each record.
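Extracting PMIDs per Entrez gene ID from gene2pubmed can be sketched as follows. The sketch assumes the file is tab-separated with the columns tax ID, gene ID and PubMed ID in that order, preceded by a header line starting with `#`, and that rows for P. falciparum 3D7 carry NCBI taxonomy ID 36329. The function name is hypothetical and the usage example reads from in-memory lines with made-up IDs rather than the real download.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def pmids_by_gene(lines: Iterable[str],
                  tax_id: str = "36329") -> Dict[str, Set[str]]:
    """Collect PubMed IDs per Entrez gene ID for one taxonomy ID."""
    result = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and the '#'-prefixed header.
        if not row or row[0].startswith("#"):
            continue
        row_tax, gene_id, pmid = row[0], row[1], row[2]
        if row_tax == tax_id:
            result[gene_id].add(pmid)
    return dict(result)


# Made-up rows for illustration; real use could pass
# gzip.open("gene2pubmed.gz", "rt") instead of a list.
demo = [
    "#tax_id\tGeneID\tPubMed_ID\n",
    "36329\t111\t1000\n",
    "36329\t111\t1001\n",
    "9606\t222\t2000\n",
]
print(pmids_by_gene(demo))
```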
Number Formatting
To facilitate interpretation, certain numeric data are colored to reflect their value. Colors should not be construed as indications of whether a gene is suitable or not as an antimalarial drug target. Details are listed below.
- Essentiality - Zhang MIS: linear scale from green to red, green being 0, red being 1
- Essentiality - Zhang MFS: green if MFS is negative, else red
- Essentiality - Zhang #Insertions: green if #Insertions = 0, yellow if #Insertions = 1, else red
- Essentiality - PlasmoGEM Relative Growth Rate: linear scale from green to red, green being 0, red being 1
- Binding Evidence - AlphaFill "best hit" Local RMSD: uncolored (white) if <1, yellow if 1-4, red if >4
- Binding Evidence - AlphaFill "best hit" Global RMSD: linear scale from green to red based on percent rank among all Pf3D7 genes' best AlphaFill hits
- Resistome Mutations - # Samples with Disruptive Mutations: linear scale from red to green based on percent rank among all Pf3D7 genes
- Genetic Variation - PlasmoDB Total / Non-coding / Synonymous / Stop Codon SNPs: linear scale from green to red based on percent rank among all Pf3D7 genes
- Orthology Information - Most Similar Human Ortholog TM-align score: linear scale from red to green based on percent rank among all Pf3D7 genes with a human ortholog
- Orthology Information - Most Similar Human Ortholog RMSD: linear scale from green to red based on percent rank among all Pf3D7 genes with a human ortholog
- Orthology Information - Most Similar Human Ortholog Seq Identity: linear scale from red to green based on percent rank among all Pf3D7 genes with a human ortholog
- Protein Information - Protein Length: linear scale from red to green based on percent rank among all Pf3D7 genes
- Protein Information - Molecular Weight: linear scale from red to green based on percent rank among all Pf3D7 genes
- Protein Information - Isoelectric Point: linear scale from red to green based on percent rank among all Pf3D7 genes
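The two building blocks behind most of the rules above, a percent rank and a linear two-color scale, can be sketched as follows. This is not the site's implementation: the function names, the exact RGB endpoints, and the handling of tied values are assumptions made for the example.

```python
from typing import Sequence, Tuple

GREEN = (0, 128, 0)  # assumed endpoint colors
RED = (255, 0, 0)


def percent_rank(value: float, values: Sequence[float]) -> float:
    """Fraction of the other values that lie strictly below `value` (0 to 1)."""
    if len(values) < 2:
        return 0.0
    below = sum(v < value for v in values)
    return below / (len(values) - 1)


def linear_color(t: float,
                 start: Tuple[int, int, int] = GREEN,
                 end: Tuple[int, int, int] = RED) -> str:
    """Interpolate linearly from `start` (t=0) to `end` (t=1) as a hex color."""
    t = min(1.0, max(0.0, t))  # clamp out-of-range inputs
    rgb = tuple(round(s + (e - s) * t) for s, e in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

A "green to red" column would call `linear_color(t)` directly (with `t` being either the raw 0 to 1 value or a percent rank), while a "red to green" column would swap the endpoints: `linear_color(t, start=RED, end=GREEN)`.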
More Resources
- Essentiality and localization for "conserved protein, unknown function" genes on Pf3D7 chromosome 3: TABLE (Kimmel et al. 2023 [ref])
- Structural similarities to known domains found in 353 Pf3D7 proteins of unknown function, using Alphafold predictions and DALI search against PDB: TABLES (Behrens and Spielmann 2024 [ref])
References
- Abdel-Haleem, A. M., Hefzi, H., Mineta, K., Gao, X., Gojobori, T., Palsson, B. O., Lewis, N. E., & Jamshidi, N. (2018). Functional interrogation of Plasmodium genus metabolism identifies species- and stage-specific differences in nutrient essentiality and drug targeting. PLoS computational biology, 14(1), e1005895. https://doi.org/10.1371/journal.pcbi.1005895
- Behrens, H. & Spielmann, T. Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on Alphafold predictions. bioRxiv 2023.06.05.543710; doi: https://doi.org/10.1101/2023.06.05.543710 [Preprint]
- Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic acids research, 49(D1), D498–D508. https://doi.org/10.1093/nar/gkaa1025
- Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic acids research, 44(D1), D1045–D1053. https://doi.org/10.1093/nar/gkv1072
- Hekkelman, M.L., de Vries, I., Joosten, R.P. et al. (2023). AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 20, 205–213. https://doi.org/10.1038/s41592-022-01685-y
- Howick, V. M. et al. (2019). The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science 365, eaaw2619. https://doi.org/10.1126/science.aaw2619
- Khan, S.M., Kroeze, H., Franke-Fayard, B., & Janse, C.J. (2013). Standardization in generating and reporting genetically modified rodent malaria parasites: the RMgmDB database. Methods in molecular biology (Clifton, N.J.) 923, 139–150. https://doi.org/10.1007/978-1-62703-026-7_9
- Kimmel, J., Schmitt, M., Sinner, A. et al. (2023). Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell systems, 14(1), 9–23.e7. https://doi.org/10.1016/j.cels.2022.12.001
- Le Roch, K. G., Zhou, Y., Blair, P. L., Grainger, M., Moch, J. K., Haynes, J. D., … Winzeler, E. A. (2003). Discovery of gene function by expression profiling of the malaria parasite life cycle. Science, 301(5639), 1503–1508. https://doi.org/10.1126/science.1087025
- MalariaGEN, Abdel Hamid, M. M., Abdelraheem, M. H., Acheampong, D. O., Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., Amenga-Etego, L., Andagalu, B., Anderson, T., Andrianaranjaka, V., Aniebo, I., Aninagyei, E., Ansah, F., Ansah, P. O., Apinjoh, T., Arnaldo, P., Ashley, E., … van der Pluijm, R. W. (2023). Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples. Wellcome open research 8, 22. https://doi.org/10.12688/wellcomeopenres.18681.1
- Schwach, F., Bushell, E., Gomes, A.R., Anar, B., Girling, G., Herd, C., Rayner, J.C., & Billker, O. (2015). PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites. Nucleic Acids Research 43(D1), D1176–D1182. https://doi.org/10.1093/nar/gku1143
- Szklarczyk, D., Kirsch, R., Koutrouli, M., Nastou, K., Mehryary, F., Hachilif, R., Gable, A. L., Fang, T., Doncheva, N. T., Pyysalo, S., Bork, P., Jensen, L. J., & von Mering, C. (2023). The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research, 51(D1), D638–D646. https://doi.org/10.1093/nar/gkac1000
- Teufel, F., Almagro Armenteros, J.J., Johansen, A.R. et al. (2022). SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol 40, 1023–1025. https://doi.org/10.1038/s41587-021-01156-3
  For more information about SignalP, see this Springer Nature blog post
- Zhang, M. et al. (2018). Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis. Science 360, eaap7847. https://doi.org/10.1126/science.aap7847
- Zhang, Y., & Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research, 33(7), 2302–2309. https://doi.org/10.1093/nar/gki524