FlyAtlas 2 – Docs
About FlyAtlas 2
FlyAtlas 2 shares with the original FlyAtlas (Nature Genetics [2007] 39, 715–720) the objective of allowing one to find which genes are expressed in individual tissues of Drosophila melanogaster. The flies and tissues are the same as before, but in FlyAtlas 2 the gene transcripts have been quantified by RNA-Seq rather than by hybridization to microarrays. This removes any ambiguity in the identification of genes, and identification now extends to individual RNA transcripts. Approximately 17,500 genes and 34,500 transcripts are represented. The original ‘FlyAtlas 1’ data are still available, and links to these are provided for each gene.
General Instructions for Use
Main Search Types
Three main types of search can be performed after selecting the appropriate section from the menu. Gene searches are made for a particular gene or list of genes using the names or identifiers assigned by FlyBase; Category searches are made for protein-coding genes using descriptive terms that are part of their gene ontologies; and Tissue searches produce a list of the most abundant or enriched protein-coding or microRNA genes in a particular tissue.
Why You should use Autosuggest
For every field in which a gene name or identifier is entered there is an autocomplete facility based on the entries in the database underlying FlyAtlas 2. Autosuggest for a Symbol or Name starts after two characters, for an Annotation symbol after five, and for a FlyBase ID after nine. There are two reasons to use it. First, gene names are, of necessity, case-sensitive, because some distinct Drosophila genes differ only by case. Second, it saves you the frustration of making a query only to be told that the gene is not in the database: if the gene is not in the autocomplete menu, you can be sure that it is not in the database.
Greek Characters in Input
Autosuggest will add Greek characters (e.g. eIF-2β from eI), and will autocomplete after a single Greek character (e.g. βTub56D from β). The Greek characters that start gene symbols or names are α β γ δ ε η ι κ λ θ ζ, and they may be copied from here and pasted into a text box. Alternatively, one can type the Latin equivalents (alpha, beta, etc.).
Abundance and Enrichment
The term ‘abundance’, used in the original FlyAtlas, is only used in the Tissue section of FlyAtlas 2. Elsewhere it is replaced by the specific units: FPKM (Fragments Per Kilobase of transcript per Million mapped reads) or, for microRNA genes, TPM (normalized counts per million). The two are not comparable. Enrichment is a measure of the abundance of a gene in a particular tissue relative to that in the whole fly, and allows one to determine whether or not the expression of a particular gene is tissue-specific. To avoid division by zero and misleading enrichment values, whole-fly abundances that fall below the background threshold (2 FPKM or 200 TPM) are set to the threshold value before enrichment is calculated. This may cause enrichment to be underestimated, so users should inspect the data carefully. Enrichments are not presented for individual transcripts.
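A minimal Python sketch of the enrichment calculation described above. It assumes that enrichment is the plain ratio of tissue abundance to whole-fly abundance, with the whole-fly value raised to the background threshold when it falls below it. The function and constant names are illustrative and are not taken from the FlyAtlas 2 code.

```python
# Background thresholds stated in the documentation.
FPKM_THRESHOLD = 2.0    # genes (FPKM)
TPM_THRESHOLD = 200.0   # microRNA genes (TPM)

def enrichment(tissue_value: float, whole_fly_value: float,
               threshold: float = FPKM_THRESHOLD) -> float:
    """Assumed formula: tissue / max(whole fly, threshold).

    Flooring the denominator avoids division by zero and inflated ratios,
    at the cost of underestimating enrichment when the whole-fly value is low.
    """
    return tissue_value / max(whole_fly_value, threshold)

print(enrichment(50.0, 10.0))                   # 5.0
print(enrichment(50.0, 0.0))                    # 25.0 (denominator floored at 2 FPKM)
print(enrichment(1000.0, 50.0, TPM_THRESHOLD))  # 5.0 (denominator floored at 200 TPM)
```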
How the Category Search works
The default interface for the Category Search is based solely on gene ontology (GO) terms obtained from http://geneontology.org. The terms in FlyAtlas2 are drawn from the file go-basic.obo, which lists each entry with, among other fields, an id and a name, e.g.
id: GO:0000019
name: regulation of mitotic recombination
A FlyBase file (gene_association.fb) lists the GO ids assigned to each gene, and data from both files have been incorporated into the FlyAtlas2 relational database, allowing each FBgn to be associated with zero or more ‘GO names’. GO entries with no associated FBgn have been removed from the database. Searching by ‘Category’ produces an autosuggest menu showing all the ‘GO names’ that include the letters typed, from which one may select the term of interest. The disadvantage is that only one term can be selected. A ‘Free Search’ instead includes all the GO names that contain the keyword. The disadvantages here are that one may get very many ‘hits’ (or none), and that some ‘GO names’ may contain the keyword in a context that is irrelevant to one’s interest.
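The following Python sketch illustrates how the two files could be combined; it is not the FlyAtlas 2 code, which uses a MySQL database. It assumes that gene_association.fb follows the tab-separated GO Annotation File layout (gene id in column 2, qualifier in column 4, GO id in column 5), drops annotations qualified by ‘NOT’ (as the 17.10.07 database version did), and uses case-insensitive substring matching. The function names and the FBgnA/FBgnB/FBgnC ids are hypothetical.

```python
from collections import defaultdict

def parse_obo_names(obo_text: str) -> dict[str, str]:
    """Map GO ids to names using the id/name lines of [Term] stanzas."""
    names: dict[str, str] = {}
    in_term, current_id = False, None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe GO terms.
            in_term, current_id = line == "[Term]", None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("name: "):
            names[current_id] = line[len("name: "):]
    return names

def parse_gaf(gaf_text: str) -> dict[str, set[str]]:
    """Map gene ids (column 2) to GO ids (column 5), skipping '!' comment
    lines and annotations whose qualifier (column 4) contains NOT."""
    go_by_gene: dict[str, set[str]] = defaultdict(set)
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        gene_id, qualifier, go_id = cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue
        go_by_gene[gene_id].add(go_id)
    return dict(go_by_gene)

def suggest(fragment, names, go_by_gene):
    """GO names containing the typed letters, limited to terms linked to a gene."""
    used = {go for gos in go_by_gene.values() for go in gos}
    frag = fragment.lower()
    return sorted(n for go, n in names.items() if go in used and frag in n.lower())

def free_search(keyword, names, go_by_gene):
    """Genes annotated with any GO name that contains the keyword."""
    kw = keyword.lower()
    return sorted(gene for gene, gos in go_by_gene.items()
                  if any(kw in names.get(go, "").lower() for go in gos))

# Toy inputs: hypothetical gene ids, GAF rows truncated to their first 5 columns.
obo_text = """format-version: 1.2

[Term]
id: GO:0000019
name: regulation of mitotic recombination

[Term]
id: GO:0006310
name: DNA recombination

[Term]
id: GO:0006412
name: translation
"""
gaf_text = "\n".join([
    "!gaf-version: 2.2",
    "\t".join(["FB", "FBgnA", "geneA", "involved_in", "GO:0000019"]),
    "\t".join(["FB", "FBgnB", "geneB", "NOT|involved_in", "GO:0006310"]),
    "\t".join(["FB", "FBgnC", "geneC", "involved_in", "GO:0006412"]),
])
names = parse_obo_names(obo_text)
go_by_gene = parse_gaf(gaf_text)
print(suggest("recomb", names, go_by_gene))            # ['regulation of mitotic recombination']
print(free_search("translation", names, go_by_gene))   # ['FBgnC']
```

Because the only annotation to GO:0006310 is qualified by NOT, that term has no associated gene and is excluded from the suggestions, mirroring the removal of GO entries without an FBgn.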

The alternative interface for the Category Search employs a different categorization — one derived from FlyBase ‘groups’. It offers a selection of 126 groups (ranging from 3 to 258 members) in a pull-down menu.

About the Profile Search
The Profile Search compares the expression profile of one gene across different tissues with those of other Drosophila genes, and reports those that are most similar. It calculates one of two correlation coefficients, Pearson's or Spearman's (r), explanations of which can be found on Wikipedia, e.g. https://en.wikipedia.org/wiki/Correlation_and_dependence. The output includes the values of r and of p, the associated statistical probability. The cut-off value for r can be set by the user before starting the search. Profile searches for microRNAs are currently not supported.
There are several options for the tissues used in constructing the profile (see below). The default, ‘adult and larval’, is generally recommended, even though it lacks certain tissues because of computational memory restrictions. (Limiting common tissues to one sex also avoids bias towards the adult.) The excluded tissues are present in the adult-only alternatives. Users should, of course, check that any alternative tissue selection includes the tissues in which the query gene is expressed.
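The correlation step of a profile search can be sketched in Python as below. This is a minimal illustration, not the FlyAtlas 2 implementation: it computes Pearson's r directly and Spearman's coefficient as Pearson's r on ranks (ties given their mean rank), keeps genes at or above the user's cut-off, and does not compute the p value that FlyAtlas 2 reports. The function names and FPKM values are hypothetical.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r; NaN if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman coefficient: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def profile_search(query, others, cutoff=0.9, method=pearson):
    """Genes whose tissue profile correlates with the query at r >= cutoff, best first."""
    hits = [(gene, method(query, profile)) for gene, profile in others.items()]
    # NaN never satisfies >=, so constant profiles are dropped automatically.
    return sorted([h for h in hits if h[1] >= cutoff], key=lambda h: h[1], reverse=True)

# Hypothetical FPKM values for each gene across the same four tissues.
query = [1.0, 5.0, 10.0, 2.0]
others = {"geneA": [2.0, 10.0, 20.0, 4.0], "geneB": [10.0, 5.0, 1.0, 8.0]}
print(profile_search(query, others))                   # [('geneA', 1.0)]
print(profile_search(query, others, method=spearman))  # [('geneA', 1.0)]
```

Every profile must list the tissues in the same order, which is why the choice of tissue set matters: a gene expressed mainly in a tissue omitted from the selection contributes little signal to either coefficient.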

The Midgut Facility
The Drosophila larval midgut is composed of five regions of different pH: the neutral gastric caeca/anterior region, the acidic region, the neutral region, the slightly acidic transitional region, and the alkaline posterior region. We prepared RNA-Seq libraries from each of these five regions in the course of studies on this topic, and the data are presented in this separate section of FlyAtlas 2, together with a graphic indicating their anatomical locations. Please note that microRNA data are not available for these midgut regions.
What the Colours Mean
The colouring of cell backgrounds (heat mapping) is merely a visual aid to help users quickly identify the results of most interest. It is not a surrogate for the numerical data and, in the case of transcripts, users should download the tables to inspect values of interest. The scales have been devised to accommodate the particular ranges of values in the database. For abundance values the white-to-black spectrum illustrated below is used. For FPKMs this has a 15-step log1.6 scale, but all values below 2 FPKM (considered background) are coloured white, and no discrimination is made between values above 1150 FPKM (pure black).

For the generally higher TPM values of microRNAs a log2.6 scale is used with no indication of background:

For enrichment values a divergent white/yellow/red spectrum is used:

This employs an asymmetric log scale running from white to yellow (enrichment values 0–1) to represent ‘decreased enrichment’, and then from yellow through orange and a wider range of reds to cover ‘increased enrichment’.
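The FPKM shading rule can be sketched in Python as follows. The exact bin boundaries used by FlyAtlas 2 are not documented here, so this sketch assumes one step per factor of 1.6 above the 2 FPKM background, which yields 14 graded steps between 2 and 1150 FPKM plus pure black; the function and constant names are hypothetical.

```python
import math

BACKGROUND_FPKM = 2.0     # values below this are shown white
SATURATION_FPKM = 1150.0  # values at or above this are pure black
BASE = 1.6                # log base of the FPKM scale

def fpkm_shade_step(fpkm: float) -> int:
    """Hypothetical shade index: 0 = white, 1-14 = graded greys, 15 = black.

    Assumption: one step per factor of 1.6 above background; the real bin
    edges may differ.
    """
    if fpkm < BACKGROUND_FPKM:
        return 0
    if fpkm >= SATURATION_FPKM:
        return 15
    return 1 + math.floor(math.log(fpkm / BACKGROUND_FPKM, BASE))

for value in (1.0, 2.0, 10.0, 1000.0, 5000.0):
    print(value, fpkm_shade_step(value))
```

A logarithmic scale is the natural choice here because FPKM values span several orders of magnitude; on a linear scale almost every gene would fall in the palest steps.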
Result Presentation Options
The default presentation of the table of query results has been simplified for ease of initial inspection. Checkboxes provide the option of more detailed information. SD appends standard deviations (based on triplicates) to the values. Whole Body includes the abundance values for whole flies. Male v. Female shows the ratio of male to female abundance where the tissue is not sex-specific. Checkbox selections are retained in subsequent queries until changed.
Correlating Gene and Transcript data
The table with the transcript data is, of necessity, quite compact. However, the part of the transcript table corresponding to a particular tissue can be highlighted by clicking/tapping the cell for that tissue in the gene table. Click/tap again to turn the highlighting off. (This does not apply to microRNAs, whose transcripts are presented differently.)
Downloading Tables and Graphics
There are download icons below each of the tables of gene and transcript data. These generate tab-separated files with the extension ‘.txt’. Such files will probably open in a text editor when double-clicked, but can be imported into spreadsheet applications such as MS Excel, LibreOffice Calc or Apple's Numbers, and then formatted and saved in the application’s native format if required.
The microRNA bar chart can be downloaded in SVG format for editing in a suitable application such as Adobe Illustrator or Inkscape.
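The tab-separated downloads can also be read programmatically. The Python sketch below uses the standard csv module; the column headings (Tissue, FPKM, SD) and the sample rows are hypothetical, since the actual layout of the downloaded files may differ, and cells such as ‘N.A.’ are treated as missing values.

```python
import csv
import io
from typing import Optional

def to_number(value: str) -> Optional[float]:
    """Parse a numeric cell; return None for non-numeric entries such as 'N.A.'."""
    try:
        return float(value)
    except ValueError:
        return None

def read_table(text: str) -> list[dict[str, str]]:
    """Parse tab-separated text with a header row into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Hypothetical excerpt; real downloads may use different column headings.
sample = "Tissue\tFPKM\tSD\nBrain\t12.5\t1.1\nCrop\t0.8\t0.2\nHeart\tN.A.\tN.A.\n"
rows = read_table(sample)
above_background = [r["Tissue"] for r in rows
                    if (v := to_number(r["FPKM"])) is not None and v >= 2]
print(above_background)  # ['Brain']
```

For a saved file, `csv.DictReader(open(path, newline="", encoding="utf-8"), delimiter="\t")` can replace the in-memory string; the UTF-8 encoding is an assumption.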
Links Out
The ‘name line’ above the results table contains a ‘link-out’ icon at its extreme right. Clicking this opens a menu of options, most of which link to third-party Drosophila resources. Note, however, that the first links to the corresponding gene results in FlyAtlas 1, although only where the gene is represented in that facility (e.g. not for RNA genes of any sort).
Paralogues
The ‘name line’ above the results table contains an item entitled ‘paralogues’. This allows one to view the results for the gene and any paralogues together (in the same format as for bulk searches). The list of paralogues is taken from FlyBase and includes both close and distant homologues. The final cut-off is to some extent arbitrary, and it is ultimately up to users to perform their own sequence comparisons to decide on the biological significance of the paralogues.
View in UCSC Browser
Transcript results provide an option to ‘View in UCSC Genome Browser’. Clicking this gives an explanatory page with a link to the gene of interest in the UCSC Browser, with the reads from FlyAtlas 2 superimposed. This is useful both for assessing the quality of the data for oneself, and for checking unique exons that define particular transcripts. You are strongly advised to read the explanatory page the first time you use this. We think that the facility offered by the UCSC Browser is excellent, but its richness means that it does require some effort on the user’s part.
In-page Help
Clicking on the icons showing a question mark in a circle invokes in-page help about downloading tables and about correlating gene and transcript data, as explained above.
Questions & Problems
Why is my Gene not in the Database?
First, make sure that the appropriate identifier type (e.g. symbol, name or FlyBase ID) has been selected, start typing, and check the autocomplete for related spellings. If this fails, you may be using a discontinued or superseded designation, or one that is newer than the reference genome used for the database. Check by searching FlyBase for the gene designation you used.
What is the Problem with Di- and Poly-cistronic mRNAs?
The vast majority of Drosophila mRNAs are monocistronic — as with other eukaryotes. However there are 68 transcripts in the Drosophila reference genome which are di- or poly-cistronic, encompassing 139 genes with distinct FlyBase identifiers. In these cases the software used to process the RNA-Seq data assigns the expression, apparently randomly, to just one of these genes — expression values will be absent from the other(s). In such cases ‘genes’ are defined by the expression of particular proteins rather than a discrete region of DNA, and the extent of expression of a transcript with the translational potential for two or more proteins can give no information on the expression of the proteins, regardless of which gene the software has assigned it to. In certain cases such genes have multiple transcripts, some of which may be monocistronic and thus have the potential of yielding useful information. The user should consult FlyBase for information on this.
Why are there Transcripts in FlyBase that are absent from the FlyAtlas2 Output?
In 573 cases these ‘missing’ transcripts are identical to other transcripts (present in the output) even though each has a distinct FlyBase identifier. This is because certain transcripts encode more than one protein, and for consistency of nomenclature a distinct corresponding transcript ID is generated for each distinct protein ID. In 139 cases the different proteins are translated from different start sites on di- or poly-cistronic mRNAs (see above). In other cases additional polypeptides are generated by read-through of stop codons. (An ‘X’ can be found in the amino-acid sequence at the position of these stop codons.)
Another type of missing transcript is the stem-loop precursor of each of the 260 microRNAs. The method used to analyse microRNAs did not allow these precursors to be quantified, and only the two mature cleaved products are included.
What is the situation regarding Tissue Comparability?
Adult fat body, spermatheca and heart, and larval garland cells were sequenced using a different chemistry from the other tissues. The results for these tissues are comparable within this group, but not with other tissues. The identification of their individual transcripts is also less rigorous.
Can I rely on the Data for RNA genes?
Unlike FlyAtlas 1, FlyAtlas 2 includes data for genes that are transcribed into RNA that does not encode protein. There are different classes of such genes, and the reliability of the results varies with the class. The microRNA genes were analysed separately, and we regard the results for them as being as reliable as those for protein-coding genes. Some other RNA genes are too small to be detected in the total-RNA workup (e.g. tRNAs), some large genes may overlap protein-coding genes and should be examined using the UCSC browser, and repeated genes (e.g. rRNAs) are not called by our analysis. Users' attention is drawn to the particular problems of any RNA gene they have searched for.
Why are some results marked ‘N.A.’?
Generally this is because the data are ‘Not Available’, as the Cuffdiff program flagged their status as other than ‘OK’. Most of the genes involved do not encode proteins. (The sequence reads can still be inspected in the UCSC Browser.) In other cases, N.A. indicates that no enrichment value has been presented because the FPKM value was below 2, the cut-off used for background.
Male and Female Abundances and Enrichments seem Inconsistent
For male/female comparisons of a gene in a particular tissue, only abundances are meaningful. This is because enrichments are calculated relative to expression in whole flies, and males and females obviously differ, most pertinently in the large contribution made by the ovaries of the female.
FlyAtlas 1 and 2 give different Results for my Gene
The differentially expressed genes identified in the original FlyAtlas work are generally corroborated by FlyAtlas 2. However, it would be surprising if there were no inconsistencies in data for so many genes. It is worth emphasizing some of the limitations of the previous microarray approach. Although microarray probe-sets were designed to be specific for individual genes, in many cases they turned out not to be, and the data for over 700 genes in FlyAtlas 1 were ambiguous and are marked as such in the FlyAtlas 2013 facility. Some genes were detected by more than one probe-set, but with marked differences in signal, suggesting that the probe-sets for some other genes may not have been optimal. Nevertheless, do not hesitate to use the Feedback Form to let us know of any discrepancies that you feel are cause for concern.
Why are some Features not available on Mobile?
We designed the FlyAtlas 2 web application so that it responds to the smaller viewing area of mobile devices by maintaining core functionality but dispensing with some secondary features. Thus, on tablets the ‘link-out’ options (including to the UCSC browser) are not available. Selecting ‘Request Desktop Site’ from the options in your mobile web browser may, depending on your device, give you access to these features.
My Problem or Question is not covered here
If you have a question or a problem that is not covered here, let us know using the feedback form. We shall try to respond within 24 hours, although circumstances may sometimes prevent this.
FlyAtlas 2 & Third-party Data — Technical & Versioning
Insects
Insects were wild-type Canton S Drosophila melanogaster, reared at 22 °C under a 12:12 h light:dark regime on standard Drosophila diet. Larvae were crawling third instar, and adults were one week after emergence. Tissue dissection was as previously described by Chintapalli, V. et al. (2007) Nature Genetics, 39, 715–720.
Biological Replicates
The biological replicates for individual tissues were analysed using the same technology. Most of the data are for biological triplicates, but the following are for duplicates:
Adult male and female Crop
Adult male and female Eye
Adult female Rectal Pad
Adult male and female Salivary Gland
Adult male and female Thoracicoabdominal Ganglion
RNA-Seq
RNA was isolated in triplicate using the Qiagen miRNeasy kit, with separate libraries prepared for total RNA or microRNA sequencing. The quality of RNA samples was analysed using an Agilent Bioanalyzer. Most of the tissues were sequenced by Edinburgh Genomics, using Illumina TruSeq technology for whole RNA. Adult Fat Body, Spermatheca and Heart, and larval Garland cells were sequenced by Glasgow Polyomics using Clontech SMARTer Pico Input methodology for the small amounts of RNA available. Different chemistries were also used with different tissues for microRNAs.
Computational Analysis
Total RNA was analysed using the Tuxedo pipeline (Trapnell et al. (2012) Nature Protocols 7, 562–578) and the Drosophila Release 6 reference genome (Ensembl version as provided by Illumina — March 2016). Data for all tissues were normalized together using Cuffdiff. MicroRNA was originally analysed using CapMirSeq, but in August 2022 was reanalysed for all tissues using sRNAPipe (Pogorelcnik et al. (2018) Mobile DNA 9:25).
FlyAtlas 2 Database & Third-Party Data
The RNA-Seq data were loaded into a MySQL database, which underlies this web application. It uses gene information from Drosophila Release 6 reference genome (Santos et al. (2015) Nucleic Acids Research 43, D690–D697) and gene ontology information from FlyBase (flybase.org) and the Gene Ontology Consortium (www.geneontology.org).
Version Information
DATABASE VERSION HISTORY
The history of the various updates to the downloadable FlyAtlas2.sql database is shown with version numbers based on dates in yy.mm.dd format. Current versions of constituent third-party data are shown below this.
FlyAtlas2_24.01.08.sql: Paralogue flag added to Gene table for revised paralogue code. (Live from 24.03.07.)
FlyAtlas2_23.06.29.sql: Table of paralogues added.
FlyAtlas2_22.10.14.sql: MicroRNA precursors (RM forms) removed from database.
FlyAtlas2_22.09.23.sql: Previous MicroRNA data replaced by normalized TPM data. Mitochondrial genes and their transcripts removed.
FlyAtlas2_22.08.02.sql: Heart and Garland cell MicroRNA data added, and all data reanalysed using sRNAPipe. RM forms included and some newly identified microRNAs added.
FlyAtlas2_21.09.30.sql: FlyBase Groups updated.
FlyAtlas2_21.09.27.sql: Additional profiles added.
FlyAtlas2_21.09.19.sql: Gene nomenclature update to FlyBase FB2021_04.
FlyAtlas2_21.08.09.sql: Larval Garland Cell FPKMs added.
FlyAtlas2_21.04.18.sql: Male and Female Heart FPKMs added.
FlyAtlas2_19.10.15.sql: MicroRNA genes and transcripts revised to include RA and RB forms and exclude RM forms. Data re-normalized.
FlyAtlas2_19.09.18.sql: MicroRNA data added for adult Crop, Fat Body, Salivary Gland and Spermatheca.
FlyAtlas2_19.09.02.sql: MicroRNA data revised for Eye and Thoracicoabdominal Ganglion. (Replaces 19.08.31)
FlyAtlas2_19.08.31.sql: MicroRNA data added for Eye and Thoracicoabdominal Ganglion.
FlyAtlas2_19.08.19.sql: Tissue replicate numbers adjusted as described in Biological Replicates.
FlyAtlas2_19.07.01.sql: Male and Female Fat Body FPKMs added.
FlyAtlas2_19.04.10.sql: Gene Names and Symbols updated to FlyBase versions 2019_01.
FlyAtlas2_19.03.04.sql: Third replicate of Male and Female eye and salivary gland FPKMs added.
FlyAtlas2_18.08.15.sql: Virgin and Mated Spermatheca FPKMs added.
FlyAtlas2_18.05.25.sql: Male and Female crop, eye, salivary gland and thoracicoabdominal ganglion FPKMs added (2 replicates).
FlyAtlas2_18.05.25.sql: Field for Cuffdiff ‘status’ values added to GeneFPKM and TranscriptFPKM tables.
FlyAtlas2_18.05.05.sql: New Female Head replicate substituted for one of previous.
FlyAtlas2_18.05.05.sql: Anatomical nomenclature correction: ‘Anal Pad’ changed to ‘Rectal Pad’.
FlyAtlas2_17.11.09.sql: Male and Female brain and anal pad, and Male accessory gland RPMs added for microRNAs.
FlyAtlas2_17.10.07.sql: gene_association.fb FBgn/GOid pairs qualified by ‘NOT’ removed.
FlyAtlas2_17.10.05.sql: Male and Female brain and anal pad, and Male accessory gland FPKMs added.
FlyAtlas2_17.10.04.sql: Gene symbol, names and annotation symbols updated to FlyBase FB2017_04.
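If several versions of the database file have been downloaded, the date encoded in each name identifies the most recent. The Python sketch below is a hypothetical helper that accepts both the yy.mm.dd form listed above and the yyyy.mm.dd form used in the download link; mixing the two forms breaks plain alphabetical sorting, which is why the names are converted to dates first.

```python
import re
from datetime import date

# Hypothetical pattern: accepts both yy.mm.dd and yyyy.mm.dd version names.
VERSION_RE = re.compile(r"FlyAtlas2_(\d{4}|\d{2})\.(\d{2})\.(\d{2})\.sql$")

def version_date(filename: str) -> date:
    """Convert a FlyAtlas2 database file name to the date it encodes."""
    match = VERSION_RE.search(filename)
    if match is None:
        raise ValueError(f"unrecognised version file name: {filename}")
    year, month, day = (int(g) for g in match.groups())
    if year < 100:
        year += 2000  # assume two-digit years are in the 2000s
    return date(year, month, day)

files = ["FlyAtlas2_23.06.29.sql", "FlyAtlas2_2024.01.08.sql", "FlyAtlas2_22.10.14.sql"]
print(max(files))                    # FlyAtlas2_23.06.29.sql (alphabetical: wrong)
print(max(files, key=version_date))  # FlyAtlas2_2024.01.08.sql
```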
GENE DATA
The Drosophila Reference Genome (Ensembl) defines the FBgn numbers of the genes in the database and the aim is to update this once per year. The gene symbols and names associated with the FBgns are derived from two FlyBase files and are updated in concert with the reference genome, and sometimes more frequently. The current versions in FlyAtlas2 are:
Drosophila_melanogaster_Ensembl_BDGP6_chr : 03.10.2015
fbgn_annotation_ID_fb_2021_04.tsv : 04.08.2021
gene_snapshots_fb_2021_04.tsv : 04.08.2021
ONTOLOGY DATA
The list of GO ids and associated names is obtained from the go-basic.obo file at http://geneontology.org. The correlation between Drosophila FBgn identifiers and GO identifiers is made using the FlyBase gene association file. The current versions in FlyAtlas2 are:
go-basic.obo.txt : 01.09.2021
gene_association.fb : 04.08.2021
PARALOGUE DATA
The list of paralogues is also from FlyBase — the current version in the database is given below.
dmel_paralogs_fb : 14.11.2022
Citing FlyAtlas 2 & Data Availability
Citing FlyAtlas 2
In any published work relying on these data, please cite: Krause, S. A., Overend, G., Dow, J. A. T. and Leader, D. P. (2022) FlyAtlas 2 in 2022: enhancements to the Drosophila melanogaster expression atlas. Nucleic Acids Research 50, D1010–D1015. Free access to this paper is available.
Authorship and Acknowledgements
This work was conceived by Julian Dow and Shireen Davies, and supported by grant BB/K019953/1 from the BBSRC. The biological work was performed by Sue Krause and Gayle Overend (midgut regions), the bioinformatics, database and web construction by David Leader, and the microRNA analysis by Aniruddha Pandit and Ahmed Khalid Omar. RNA sequencing was carried out by Edinburgh Genomics, partly supported through core grants from NERC (R8/H10/56), MRC (MR/K001744/1) and BBSRC (BB/J004243/1). We thank the support teams at FlyBase and UCSC Browser (genome.ucsc.edu) for help in making use of their facilities.
Data Availability
• RNA-Seq data have been deposited at the European Nucleotide Archive under accession numbers PRJEB22205 (Main), PRJEB11865 (Midgut) and PRJEB48667 (Garland cells).
• The latest version of the FlyAtlas 2 database can be downloaded from:
motif.mvls.gla.ac.uk/downloads/FlyAtlas2_2024.01.08.sql .
• An Excel workbook containing the gene data can be downloaded from:
motif.mvls.gla.ac.uk/downloads/FlyAtlas2_gene_data.xlsx .
Linking to FlyAtlas 2
You are welcome to link to FlyAtlas 2 from your own website.
If you wish to link directly to the results of a search for a specific gene you need to provide the two parameters, GENE_ID and ID_TYPE, in a query of the type:

https://motif.mvls.gla.ac.uk/FlyAtlas2/index.html?search=gene&gene=GENE_ID&idtype=ID_TYPE

where:
GENE_ID is the identifier appropriate to the ID type.
ID_TYPE can be any of ‘fbgn’, ‘cgnum’, ‘symbol’ or ‘name’.
e.g.

https://motif.mvls.gla.ac.uk/FlyAtlas2/index.html?search=gene&gene=FBgn0016075&idtype=fbgn

https://motif.mvls.gla.ac.uk/FlyAtlas2/index.html?search=gene&gene=CG16858&idtype=cgnum

https://motif.mvls.gla.ac.uk/FlyAtlas2/index.html?search=gene&gene=vkg&idtype=symbol

https://motif.mvls.gla.ac.uk/FlyAtlas2/index.html?search=gene&gene=viking&idtype=name
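Such links can be generated in Python with the standard urllib.parse.urlencode, which also escapes characters that are not safe in a URL. The function name below is hypothetical; whether the site accepts percent-encoded Greek characters in symbols (e.g. βTub56D) has not been confirmed.

```python
from urllib.parse import urlencode

GENE_PAGE = "https://motif.mvls.gla.ac.uk/FlyAtlas2/index.html"
ID_TYPES = {"fbgn", "cgnum", "symbol", "name"}

def gene_link(gene_id: str, id_type: str) -> str:
    """Build a FlyAtlas 2 gene-search URL from an identifier and its ID type."""
    if id_type not in ID_TYPES:
        raise ValueError(f"id_type must be one of {sorted(ID_TYPES)}")
    # urlencode percent-encodes non-ASCII characters (e.g. Greek letters) as UTF-8.
    query = urlencode({"search": "gene", "gene": gene_id, "idtype": id_type})
    return f"{GENE_PAGE}?{query}"

print(gene_link("FBgn0016075", "fbgn"))
print(gene_link("βTub56D", "symbol"))
```

The first call reproduces the FBgn example URL above exactly.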

Direct Data Download
The results tables for individual genes and transcripts can be downloaded directly without the need for a web browser. Automating this allows batch downloads, which are not currently available through the web interface. To do this, provide the two parameters, FBGN and DATA_TYPE, in a query of the type:

https://motif.mvls.gla.ac.uk/FA2Direct/index.html?fbgn=FBGN&tableOut=DATA_TYPE

where:
FBGN is the FlyBase gene identifier only.
DATA_TYPE can be ‘gene’, ‘transcriptGene’, ‘mir’ or ‘transcriptMir’, i.e. gene or transcript data for genes other than microRNAs (‘gene’, ‘transcriptGene’) or for microRNAs (‘mir’, ‘transcriptMir’).
e.g.

https://motif.mvls.gla.ac.uk/FA2Direct/index.html?fbgn=FBgn0016075&tableOut=gene

https://motif.mvls.gla.ac.uk/FA2Direct/index.html?fbgn=FBgn0016075&tableOut=transcriptGene

https://motif.mvls.gla.ac.uk/FA2Direct/index.html?fbgn=FBgn0262177&tableOut=mir

https://motif.mvls.gla.ac.uk/FA2Direct/index.html?fbgn=FBgn0262177&tableOut=transcriptMir
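A batch download could be automated along the lines of the Python sketch below, which uses only the standard library. The function names, the one-second pause between requests and the assumption that each response is UTF-8 text are illustrative choices, not documented behaviour of the service.

```python
import time
import urllib.request
from urllib.parse import urlencode

DIRECT_PAGE = "https://motif.mvls.gla.ac.uk/FA2Direct/index.html"
TABLE_TYPES = {"gene", "transcriptGene", "mir", "transcriptMir"}

def direct_url(fbgn: str, table_out: str) -> str:
    """Build a FA2Direct URL; only FlyBase gene identifiers are accepted."""
    if not fbgn.startswith("FBgn"):
        raise ValueError("fbgn must be a FlyBase gene identifier (FBgn...)")
    if table_out not in TABLE_TYPES:
        raise ValueError(f"table_out must be one of {sorted(TABLE_TYPES)}")
    return f"{DIRECT_PAGE}?{urlencode({'fbgn': fbgn, 'tableOut': table_out})}"

def batch_download(fbgns: list[str], table_out: str = "gene",
                   pause: float = 1.0) -> dict[str, str]:
    """Fetch one table per gene, pausing between requests to spare the server."""
    tables: dict[str, str] = {}
    for fbgn in fbgns:
        with urllib.request.urlopen(direct_url(fbgn, table_out), timeout=30) as response:
            tables[fbgn] = response.read().decode("utf-8")  # assumes UTF-8 text
        time.sleep(pause)
    return tables

print(direct_url("FBgn0016075", "gene"))
# tables = batch_download(["FBgn0016075"], "transcriptGene")  # requires network access
```

Validating the parameters before each request catches typing errors locally, and the pause keeps an automated loop from placing undue load on a shared academic server.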