Pf Target Browser

This web app, created by the Winzeler Lab, was designed as a resource for the MalDA (Malaria Drug Accelerator) Consortium's Plasmodium falciparum target prioritization efforts. We have compiled a variety of annotations, relevant to antimalarial target potential, for all genes in the P. falciparum 3D7 genome (PlasmoDB release 66).

To navigate this app, you can:

All data displayed in this website can be downloaded in tabular format from the following repository: Figshare link.

Methods

P. falciparum 3D7 Genes

5,318 protein coding genes (5,389 transcripts) were considered from the Plasmodium falciparum 3D7 reference genome in PlasmoDB Release 66 - 28 Nov 2023 (accessed January 3rd, 2024).

PlasmoDB Annotations

The following gene/transcript annotations were taken directly from PlasmoDB Release 66 using the "Search Strategy" function:

Pf Target Browser includes linkouts to BRENDA for EC numbers and AmiGO for GO terms.

Functional Information

To provide additional data on biological function, we include the following:

Essentiality Information

Three sources of evidence were used to assess Pf gene essentiality:

  1. Zhang et al. 2018 [ref]: used piggyBac transposon insertion mutagenesis to generate 38,000 Pf mutants; calculated mutagenesis index scores (MIS) and mutagenesis fitness scores (MFS) for 5,399 nuclear genes, identifying 2,680 genes as essential for in vitro ABS growth. See Table S5.
    • Pf3D7 gene IDs were directly mapped to the PlasmoDB release 66 gene list
  2. PlasmoGEM (Plasmodium Genetic Modification Project) [ref]: produced P. berghei gene disruption vectors with enhanced transfection efficiency; website includes blood stage phenotypic data (relative growth rate) for 2,578 Pb genes. PlasmoGEM link
    • PbANKA gene IDs were mapped to Pf3D7 gene IDs based on OrthoMCL (release 6.19) ortholog groups
    • "Download CSV" accessed January 2024
  3. RMgmDB (Rodent Malaria genetically modified Parasites database) [ref]: manually curated web repository of genotype and phenotype information for 5,300+ P. berghei mutants (as of August 12, 2023 update) RMgmDB link
    • Searched database using all Pf3D7 gene IDs (RMgmDB stores its own PbANKA-Pf3D7 mapping); considered "successful" gene modifications of any type with reported asexual blood stage phenotype
    • Accessed January 2024
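
The ortholog-group mapping described for PlasmoGEM can be sketched roughly as follows. This is a minimal illustration, not the code used to build the site: it assumes a hypothetical list of (gene ID, ortholog group ID) pairs, infers species from the gene ID prefix, and uses made-up gene and group IDs.

```python
from collections import defaultdict


def map_pb_to_pf(group_members):
    """Map each PbANKA gene ID to the Pf3D7 gene IDs in the same ortholog group.

    group_members: iterable of (gene_id, group_id) pairs (hypothetical layout).
    Species is inferred from the gene ID prefix, an assumption of this sketch.
    """
    pf_by_group = defaultdict(set)
    pb_by_group = defaultdict(set)
    for gene_id, group_id in group_members:
        if gene_id.startswith("PF3D7_"):
            pf_by_group[group_id].add(gene_id)
        elif gene_id.startswith("PBANKA_"):
            pb_by_group[group_id].add(gene_id)

    mapping = {}
    for group_id, pb_genes in pb_by_group.items():
        for pb_gene in pb_genes:
            # A Pb gene with no Pf member in its group maps to an empty list.
            mapping[pb_gene] = sorted(pf_by_group.get(group_id, set()))
    return mapping


# Illustrative IDs only; these are not real genes or OrthoMCL groups.
example = [
    ("PF3D7_0000001", "OG_A"),
    ("PF3D7_0000002", "OG_A"),
    ("PBANKA_0000001", "OG_A"),
    ("PBANKA_0000002", "OG_B"),
]
pb_to_pf = map_pb_to_pf(example)
```

Because an ortholog group can hold several Pf3D7 genes, one P. berghei phenotype may be attached to more than one Pf gene under this kind of mapping.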

Binding Information

We also considered the presence of a small-molecule binding domain to be a key factor for target potential. To approximate this, we searched for evidence of similarity to proteins with known ligand interactions through three complementary approaches:

  1. Orthology of Pf3D7 protein to at least one protein in BindingDB [ref], a public database of experimentally determined protein-ligand binding affinities (1,218,340 compounds and 9,195 targets as of 2024-01-28 update). BindingDB link
    • Accessed January 30th, 2024 with help from Mike Gilson
    • 6,202 proteins with measured affinity of ≥10 micromolar with at least one compound considered
    • Orthology was determined based on the presence of a P. falciparum 3D7 gene in the same ortholog group as a BindingDB UniProt entry, according to four different orthology databases (HOGENOM, OMA, and OrthoDB, available through the UniProt ID mapping web tool [accessed 2/2/2024], and OrthoMCL)
    • Additionally, we ran BLAST on the protein sequence of each BindingDB entry against the full OrthoMCL protein database (FASTA release OrthoMCL-6.19). Protein sequences for all BindingDB UniProt identifiers were obtained from UniProt's REST tool and compared against the custom OrthoMCL database using the blastp function of BLAST 2.15. Pf genes were mapped based on ortholog group match.
  2. AlphaFill [ref] predictions of ligands corresponding to Pf3D7 AlphaFold(v4) models, taken from the AlphaFill databank. The AlphaFill algorithm determines candidate ligands by searching for sequence homologs in PDB with known ligands. Of 5,099 Pf3D7 genes having a corresponding UniProt ID with AlphaFill information, 2,771 had at least one ligand hit. AlphaFill link
  3. Inhibitors linked to EC (Enzyme Commission) number classes in the BRENDA Enzyme Database [ref] (Release 2023.1, updated 2/1/2023) for Pf3D7 genes based on EC number annotations in PlasmoDB release 66, applicable only to enzymes. Other ligand types were not considered. For genes with incomplete EC number annotations, all EC numbers matching the wildcard were considered. BRENDA link
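
The wildcard handling for incomplete EC numbers can be illustrated with a short sketch. It assumes that unspecified levels are written as "-" (for example "2.7.11.-") and that a list of complete EC numbers is available; the function names and the short EC list are illustrative, not part of BRENDA or PlasmoDB.

```python
def ec_matches(partial_ec, full_ec):
    """Return True if full_ec is consistent with partial_ec ('-' matches any value)."""
    partial_parts = partial_ec.split(".")
    full_parts = full_ec.split(".")
    # EC numbers have exactly four levels; anything else is rejected.
    if len(partial_parts) != 4 or len(full_parts) != 4:
        return False
    return all(p == "-" or p == f for p, f in zip(partial_parts, full_parts))


def expand_ec(partial_ec, known_ecs):
    """List every complete EC number in known_ecs that matches the partial one."""
    return [ec for ec in known_ecs if ec_matches(partial_ec, ec)]


# A short illustrative list of complete EC numbers.
known = ["2.7.11.1", "2.7.11.22", "2.7.1.1", "3.4.21.62"]
expanded = expand_ec("2.7.11.-", known)
broader = expand_ec("2.7.-.-", known)
exact = expand_ec("3.4.21.62", known)
```

A gene annotated only at a higher level of the EC hierarchy therefore inherits the inhibitors of every matching EC class, which can inflate its inhibitor count relative to fully annotated enzymes.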

Human Orthologs

Existence of a highly similar human ortholog poses cytotoxicity concerns for a candidate target. Homo sapiens genes orthologous to Pf3D7 genes were determined from OrthoMCL, and both sequence and structural similarity were evaluated through pairwise comparison of Pf3D7 and human ortholog AlphaFold(v4) structures in TM-align [ref]. Of 2,006 genes with human ortholog(s), 1,972 had AlphaFold structures enabling TM-align comparison.

Resistome Mutations

In vitro selections with antimalarial compounds yield mutations that may play roles in resistance. We have compiled a Pf "resistome" database consisting of all SNV and indel mutations identified from whole genome sequencing of 651 Pf clones or bulk culture samples with evolved resistance to one of 109 diverse compounds. Here, we report data on the following classes of mutations in the resistome database:

To determine variants, WGS reads were aligned to the Pf3D7 v13 genome and processed according to the GATK 3.5 pipeline. GATK HaplotypeCaller with default parameters was used to generate SNV/indel variant calls, and SnpEff was used to annotate genes and variant effects. Initial filtering retained SNVs and indels with total depth ≥2 and alternate allele frequency (AAF) >0.2 in a resistant sample, along with depth ≥3 and a genotype call of 0/0 in its parent sample. Next, these variants were filtered for a SnpEff gene annotation (which excludes intergenic variants more than 1,000 nt from the nearest gene) and for either major alt allele depth >40 and major AAF >0.4, or major alt allele depth >10 and AAF >0.8.
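
The two filtering stages can be expressed as simple predicates. The sketch below is only an illustration of the thresholds stated above: the field names are hypothetical (they are not GATK or SnpEff output columns), and it simplifies by using a single AAF value for the major alternate allele in both stages.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    # Hypothetical field names chosen for this sketch.
    sample_depth: int          # total read depth in the resistant sample
    sample_aaf: float          # alternate allele frequency of the major alt allele
    parent_depth: int          # total read depth in the parent sample
    parent_genotype: str       # genotype call in the parent, e.g. "0/0"
    has_gene_annotation: bool  # True if SnpEff assigned the variant to a gene
    alt_depth: int             # read depth supporting the major alt allele


def passes_initial_filter(v):
    """Present in the resistant sample and called as reference in the parent."""
    return (
        v.sample_depth >= 2
        and v.sample_aaf > 0.2
        and v.parent_depth >= 3
        and v.parent_genotype == "0/0"
    )


def passes_final_filter(v):
    """Gene-annotated, with either deep or near-fixed alternate allele support."""
    deep_support = v.alt_depth > 40 and v.sample_aaf > 0.4
    near_fixed = v.alt_depth > 10 and v.sample_aaf > 0.8
    return v.has_gene_annotation and (deep_support or near_fixed)


def keep_variant(v):
    return passes_initial_filter(v) and passes_final_filter(v)


# Made-up calls covering each outcome.
deep = VariantCall(100, 0.5, 30, "0/0", True, 50)
fixed = VariantCall(15, 0.9, 10, "0/0", True, 13)
in_parent = VariantCall(100, 0.5, 30, "0/1", True, 50)
weak = VariantCall(30, 0.5, 30, "0/0", True, 15)
intergenic = VariantCall(100, 0.9, 30, "0/0", False, 90)
```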

Protein Information

All records from the Protein Data Bank (PDB) associated with a Pf gene ID were obtained in February 2024 by querying Pf3D7 gene symbols and UniProt IDs, or by searching "Plasmodium" on the PDB website.

We include visualization of the AlphaFold protein structure, if available, in an embedded version of iCn3D, a web-based 3D structure viewer from the NIH.

Field Variants

We used the MalariaGEN Pf7 [ref] dataset of 20,864 worldwide samples to assess field genetic variation. Variant (SNV and indel) call data were accessed via the malariagen_data Python package. We restricted our analysis to the 5,868,659 variants marked as "passing" based on quality filters and exclusion of variants in hypervariable regions and in the mitochondrial and apicoplast genomes (see Methods of the Pf7 paper). Note: this means that no variants are reported for subtelomeric and non-nuclear genes, even though they may have field variants. Variants were then sorted by effect ("synonymous" includes stop retained, "disruptive" is anything other than synonymous, and "missense" is restricted to missense SNVs) and by prevalence among all Pf7 samples ("singleton" occurs in only one sample, "doubleton" occurs in two samples, "rare" excludes singletons and doubletons but occurs in fewer than 21 samples (<0.1% prevalence), and everything else is "common"). Whether a variant occurs in a sample was further stratified by genotype call: either homozygous calls only (e.g. 1/1, indicating that allele 1 is present at nearly 100% frequency) or any call containing the allele (e.g. 1/2 counts as both allele 1 and allele 2 occurring in the sample).

TL;DR: Each number in the table indicates how many unique variants (non-3D7 reference allele at a locus within the gene) of a particular effect type fall into a certain prevalence bin across the Pf7 dataset. In computing prevalence, whether or not a sample "has" a variant is defined either by homozygous only or any genotype call.
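
The prevalence binning can be sketched as below. This is an illustration only: it assumes genotype calls are available as strings such as "0/1" (the malariagen_data package exposes them differently), and the handling of missing calls and of alleles seen in zero samples is an assumption of this sketch.

```python
def sample_has_allele(genotype, allele, homozygous_only):
    """Decide whether a sample carries an allele (0 is the 3D7 reference allele).

    genotype: string such as "0/1"; a "." marks a missing call (assumed format).
    """
    calls = genotype.split("/")
    if "." in calls:
        return False  # missing calls never count as carrying the allele
    alleles = [int(c) for c in calls]
    if homozygous_only:
        return all(a == allele for a in alleles)
    return allele in alleles


def prevalence_bin(n_samples):
    """Assign a prevalence bin from the number of samples carrying the allele."""
    if n_samples == 0:
        return None  # assumption: an allele seen in no sample is not counted
    if n_samples == 1:
        return "singleton"
    if n_samples == 2:
        return "doubleton"
    if n_samples < 21:
        return "rare"
    return "common"


def bin_allele(genotypes, allele, homozygous_only):
    n = sum(sample_has_allele(g, allele, homozygous_only) for g in genotypes)
    return prevalence_bin(n)


# Five made-up samples at one locus.
genotypes = ["0/0", "1/1", "1/2", "0/1", "./."]
any_call_bin = bin_allele(genotypes, 1, homozygous_only=False)
homozygous_bin = bin_allele(genotypes, 1, homozygous_only=True)
```

In this toy example allele 1 is carried by three samples under the "any genotype call" definition but by only one under the homozygous-only definition, so the same allele falls into different bins in the two columns.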

Associated Publications

The file gene2pubmed.gz (version 2024-02-21), containing tax ID, Entrez gene ID, and PubMed ID, was downloaded from the NCBI FTP site. Gene IDs were mapped to this annotation set and the corresponding PMIDs were extracted. To include references linked to a gene name (symbol) rather than an Entrez ID, PMID details were obtained programmatically using the NCBI E-utilities efetch function. Title, authors, and DOI were retrieved for each record.
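
Extracting PMIDs per gene from a gene2pubmed-style table can be sketched as follows. The sketch assumes a tab-delimited layout with a commented header line and three columns (tax ID, gene ID, PubMed ID), and assumes 36329 is the taxonomy ID of interest; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def pmids_by_gene(handle, tax_id):
    """Collect PubMed IDs per Entrez gene ID for one taxon.

    handle: text-mode file object with tab-delimited rows (tax ID, gene ID, PMID).
    tax_id: taxonomy ID as a string.
    """
    result = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        row_tax, gene_id, pmid = row[:3]
        if row_tax == tax_id:
            result[gene_id].add(pmid)
    return {gene: sorted(pmids) for gene, pmids in result.items()}


# Tiny in-memory stand-in for the real file; all values are made up.
sample = (
    "#tax_id\tGeneID\tPubMed_ID\n"
    "36329\t111\t1000\n"
    "36329\t111\t1001\n"
    "9606\t222\t1002\n"
)
links = pmids_by_gene(io.StringIO(sample), "36329")
```

For the downloaded file, the same function could be given a handle opened with `gzip.open("gene2pubmed.gz", "rt")` from the Python standard library.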

Number Formatting

To facilitate interpretation, certain numeric data are colored to reflect their value. Colors should not be construed as indications of whether a gene is suitable or not as an antimalarial drug target. Details are listed below.

More Resources

References

  1. Abdel-Haleem, A. M., Hefzi, H., Mineta, K., Gao, X., Gojobori, T., Palsson, B. O., Lewis, N. E., & Jamshidi, N. (2018). Functional interrogation of Plasmodium genus metabolism identifies species- and stage-specific differences in nutrient essentiality and drug targeting. PLoS computational biology, 14(1), e1005895. https://doi.org/10.1371/journal.pcbi.1005895
  2. Behrens, H. & Spielmann, T. Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on Alphafold predictions. bioRxiv 2023.06.05.543710; doi: https://doi.org/10.1101/2023.06.05.543710 [Preprint]
  3. Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic acids research, 49(D1), D498–D508. https://doi.org/10.1093/nar/gkaa1025
  4. Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic acids research, 44(D1), D1045–D1053. https://doi.org/10.1093/nar/gkv1072
  5. Hekkelman, M.L., de Vries, I., Joosten, R.P. et al. (2023). AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 20, 205–213. https://doi.org/10.1038/s41592-022-01685-y
  6. Howick, V. M. et al. (2019). The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science 365, eaaw2619. https://doi.org/10.1126/science.aaw2619
  7. Khan, S.M., Kroeze, H., Franke-Fayard, B., & Janse, C.J. (2013). Standardization in generating and reporting genetically modified rodent malaria parasites: the RMgmDB database. Methods in molecular biology (Clifton, N.J.) 923, 139–150. https://doi.org/10.1007/978-1-62703-026-7_9
  8. Kimmel, J., Schmitt, M., Sinner, A. et al. (2023). Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell systems, 14(1), 9–23.e7. https://doi.org/10.1016/j.cels.2022.12.001
  9. Le Roch, K. G., Zhou, Y., Blair, P. L., Grainger, M., Moch, J. K., Haynes, J. D., … Winzeler, E. A. (2003). Discovery of gene function by expression profiling of the malaria parasite life cycle. Science, 301(5639), 1503–1508. https://doi.org/10.1126/science.1087025
  10. MalariaGEN, Abdel Hamid, M. M., Abdelraheem, M. H., Acheampong, D. O., Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., Amenga-Etego, L., Andagalu, B., Anderson, T., Andrianaranjaka, V., Aniebo, I., Aninagyei, E., Ansah, F., Ansah, P. O., Apinjoh, T., Arnaldo, P., Ashley, E., … van der Pluijm, R. W. (2023). Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples. Wellcome open research 8, 22. https://doi.org/10.12688/wellcomeopenres.18681.1
  11. Teufel, F., Almagro Armenteros, J.J., Johansen, A.R. et al. (2022). SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol 40, 1023–1025. https://doi.org/10.1038/s41587-021-01156-3
  12. For more information about SignalP, see this Springer Nature blog post
  13. Schwach, F., Bushell, E., Gomes, A.R., Anar, B., Girling, G., Herd, C., Rayner, J.C., & Billker, O. (2015). PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites. Nucleic Acids Research 43(D1), D1176–D1182. https://doi.org/10.1093/nar/gku1143
  14. Szklarczyk, D., Kirsch, R., Koutrouli, M., Nastou, K., Mehryary, F., Hachilif, R., Gable, A. L., Fang, T., Doncheva, N. T., Pyysalo, S., Bork, P., Jensen, L. J., & von Mering, C. (2023). The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research, 51(D1), D638–D646. https://doi.org/10.1093/nar/gkac1000
  15. Zhang, M. et al. (2018). Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis. Science 360, eaap7847. https://doi.org/10.1126/science.aap7847
  16. Zhang, Y., & Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research, 33(7), 2302–2309. https://doi.org/10.1093/nar/gki524