Three main types of search can be performed after selecting the appropriate section from the menu. Gene searches are made for a particular gene or list of genes using the names or identifiers assigned by FlyBase; Category searches are made for protein-coding genes using descriptive terms that are part of their gene ontologies; and Tissue searches produce a list of the most abundant or enriched protein-coding or microRNA genes in a particular tissue.
▷ Why You should use Autosuggest
For every field in which a gene name or identifier is entered there is an autocomplete facility based on the entries in the database underlying FlyAtlas 2. The autosuggest for a Symbol or Name starts after two characters, that for an Annotation symbol starts after five characters, and that for a FlyBase ID after nine. There are two reasons that you should use this. The first is that gene names are, of necessity, case-sensitive because there are some distinct Drosophila genes that differ only by case. The second is to save yourself the frustration of making a query, only to be told that the gene is not in the database. If the gene is not in the autocomplete menu you can be sure that it is not in the database.
▷ Greek Characters in Input
Autosuggest will add Greek characters (e.g. eIF-2β from eI), and will autocomplete after a single Greek character (e.g. βTub56D from β). The Greek characters that start gene symbols or names are α β γ δ ε η ι κ λ θ ζ; these may be copied from here and pasted into a text box. Alternatively one can type the Latin equivalents (alpha, beta etc.).
▷ Abundance and Enrichment
The term ‘abundance’, used in the original FlyAtlas, is only used in the Tissue section of FlyAtlas 2. Elsewhere it is replaced by the specific units: either FPKM (Fragments Per Kilobase of transcript per Million mapped reads) or, in the case of microRNA genes, TPM (normalized counts per million). The two are not comparable. The enrichment is a measure of the abundance of a gene in a particular tissue relative to that in the whole fly. This allows one to determine whether the expression of a particular gene is tissue-specific or not. In order to avoid division by zero and misleading values of enrichment, abundance values for whole flies that fall below the background threshold (2 FPKM or 200 TPM) are set to the threshold value when the enrichment is calculated. This may result in quantitative underestimations, so users should inspect the data carefully. Enrichments are not presented for individual transcripts.
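The thresholding described above can be sketched as follows. This is a minimal illustration, assuming enrichment is the simple ratio of tissue abundance to whole-fly abundance; the function name `enrichment` and the `BACKGROUND` table are hypothetical and not taken from the FlyAtlas 2 code.

```python
# Background thresholds quoted in the text above.
# The names used here are illustrative assumptions.
BACKGROUND = {"FPKM": 2.0, "TPM": 200.0}


def enrichment(tissue_abundance, whole_fly_abundance, unit="FPKM"):
    """Tissue abundance relative to whole-fly abundance.

    Whole-fly values below the background threshold are raised to the
    threshold, which avoids division by zero and inflated ratios.
    """
    threshold = BACKGROUND[unit]
    denominator = max(whole_fly_abundance, threshold)
    return tissue_abundance / denominator


# A gene at 50 FPKM in a tissue and 0.5 FPKM in the whole fly:
# the denominator becomes 2 FPKM, so the enrichment is 25, not 100.
example = enrichment(50.0, 0.5)
```

The example shows why the text warns of underestimation: whenever the whole-fly value is below the threshold, the reported enrichment is lower than the raw ratio would be.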
▷ How the Category Search works
The default interface for the Category Search is based solely on gene ontology (GO) terms obtained from http://geneontology.org. Those in FlyAtlas2 are drawn from a file (go-basic.obo) that provides a listing of each entry that includes an id and name, e.g.
id: GO:0000019
name: regulation of mitotic recombination
A FlyBase file (gene_association.fb) is available that lists the GO ids assigned to each gene, and data from both files have been incorporated into the FlyAtlas2 relational database, allowing each FBgn to be correlated with zero or more ‘GO names’. GO entries for which there is no related FBgn have been removed from the database.
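As a rough sketch of how the two files relate, the code below reads id/name pairs from OBO-style text and links them to gene identifiers. It assumes that gene_association.fb follows the tab-separated GO annotation (GAF) layout, with the gene identifier in the second column and the GO id in the fifth; the function names and the FBgn value in the sample are hypothetical, and this is not the loader used by FlyAtlas 2.

```python
def parse_obo_names(obo_text):
    """Return {GO id: GO name} from the stanzas of OBO-formatted text."""
    names = {}
    current_id = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            current_id = None  # a new stanza begins
        elif line.startswith("id: "):
            current_id = line[4:]
        elif line.startswith("name: ") and current_id is not None:
            if current_id.startswith("GO:"):  # skip non-term stanzas
                names[current_id] = line[6:]
    return names


def go_names_by_gene(gaf_text, names):
    """Return {FBgn: set of GO names}, assuming GAF-style columns."""
    genes = {}
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):  # blank or comment line
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            continue
        fbgn, go_id = cols[1], cols[4]
        if go_id in names:
            genes.setdefault(fbgn, set()).add(names[go_id])
    return genes


OBO_SAMPLE = """[Term]
id: GO:0000019
name: regulation of mitotic recombination
"""

# Hypothetical annotation line; the FBgn and symbol are made up.
GAF_SAMPLE = "FB\tFBgn0000001\texample\t\tGO:0000019\n"

go_names = parse_obo_names(OBO_SAMPLE)
gene_to_names = go_names_by_gene(GAF_SAMPLE, go_names)
```

A gene that has no annotation lines simply does not appear in the result, which corresponds to the "zero or more" GO names mentioned above.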
A FlyAtlas2 Category Search by ‘Category’ produces an autosuggest menu showing all the ‘GO names’ that include the letters typed, so that one may select the one of interest. The disadvantage is that only a single term can be selected. A ‘Free Search’ instead includes all the GO names that contain the keyword entered. The disadvantages here are that one may get very many ‘hits’ (or none), and that an individual ‘GO name’ may sometimes contain the keyword in a context that is not relevant to one’s interest.
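The difference between the two modes can be illustrated with a small sketch. It assumes simple case-insensitive substring matching, which may not be exactly how the site matches terms; the function names and the sample data are hypothetical.

```python
def category_suggestions(go_names, typed):
    """GO names containing the typed letters, as an autosuggest menu might list."""
    needle = typed.lower()
    return sorted(name for name in go_names if needle in name.lower())


def free_search(gene_to_names, keyword):
    """Genes having at least one GO name that contains the keyword."""
    needle = keyword.lower()
    hits = {}
    for gene, names in gene_to_names.items():
        matched = sorted(n for n in names if needle in n.lower())
        if matched:
            hits[gene] = matched
    return hits


# Hypothetical gene identifiers with illustrative GO names.
SAMPLE = {
    "geneA": {"regulation of mitotic recombination"},
    "geneB": {"mitotic spindle assembly", "DNA repair"},
    "geneC": {"DNA repair"},
}

all_names = set().union(*SAMPLE.values())
menu = category_suggestions(all_names, "mitotic")
hits = free_search(SAMPLE, "mitotic")
```

Here the menu offers two separate GO names, of which only one could be chosen, whereas the free search returns the genes annotated with either of them.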
The alternative interface for the Category Search employs a different categorization — one derived from FlyBase ‘groups’. It offers a selection of 126 groups (ranging from 3 to 258 members) in a pull-down menu.
▷ About the Profile Search
The Profile Search compares the expression of one gene across different tissues with that of other genes in Drosophila, and reports those that are most similar. It determines either of two statistical values, the Pearson or Spearman correlation coefficient, r, explanation of which can be found on Wikipedia pages, e.g. https://en.wikipedia.org/wiki/Correlation_and_dependence. The output includes the values of r and of p, a statistical probability. The cut-off value for r can be set by the user before starting the search. Currently searches for microRNAs are not supported.
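For readers who want to see what the two coefficients measure, here is a minimal sketch in plain Python. It is not the FlyAtlas 2 implementation: the function names, the handling of constant profiles and the sample profiles are assumptions, and the p value is not computed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant profile; 0 is used here for simplicity
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def most_similar(query, others, cutoff, method=pearson):
    """(gene, r) pairs with r at or above the cut-off, highest first."""
    scored = [(gene, method(query, profile)) for gene, profile in others.items()]
    return sorted((s for s in scored if s[1] >= cutoff), key=lambda s: -s[1])


# Hypothetical abundance profiles across four tissues.
query_profile = [1.0, 2.0, 3.0, 4.0]
candidates = {"geneX": [2.0, 4.0, 6.0, 8.0], "geneY": [4.0, 3.0, 2.0, 1.0]}
similar = most_similar(query_profile, candidates, cutoff=0.9)
```

Pearson responds to linear relationships between the abundance values themselves, while Spearman depends only on the ordering of tissues, so it is less affected by a single tissue with a very high value.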
There are several options for the tissues used in constructing the profile (see below). The default, ‘adult and larval’, is generally recommended, even though it lacks certain tissues because of computational memory restrictions. (Limiting common tissues to one sex also avoids bias to the adult.) The excluded tissues are present in the adult-only alternatives. The user should, of course, check that any alternative tissue selection used contains those tissues in which the query gene is expressed.
The Drosophila larval midgut is composed of five regions of different pH: the neutral gastric caeca/anterior region, the acidic region, the neutral region, the slightly acidic transitional region, and the posterior alkaline region. We prepared an RNA-Seq library from all five of these regions in the course of studies on this topic, and the data are presented in this separate section of FlyAtlas 2, together with a graphic indicating their anatomical locations. Please note that microRNA data are not available for these midgut regions.
The colouring of cell backgrounds (heat mapping) is merely a visual aid to help users quickly identify the results of most interest. The colouring is not a surrogate for the numerical data, and in the case of transcripts, the user should download the tables to inspect values of interest. The scales have been devised to accommodate the particular ranges of values in the database. For abundance values the white-to-black spectrum illustrated below is used. For FPKMs this has a 15-step log1.6 scale, but all values below 2 FPKM (considered background) are coloured white, and no discrimination is employed for values above 1150 FPKM (pure black).
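One way to picture the FPKM scale is as a mapping from a value to one of 15 shades. The sketch below is only an assumption about how such binning might work, since the exact step boundaries used by the site are not given here; the function name and its parameters are hypothetical. It does reflect the stated limits: 1.6 raised to the 15th power is roughly 1153, which is consistent with the quoted ceiling of about 1150 FPKM.

```python
import math


def fpkm_shade_step(fpkm, base=1.6, steps=15, background=2.0):
    """Return a shade index from 0 (white) to `steps` (pure black).

    Values below the background threshold are white; values above
    base**steps are all given the darkest shade.
    """
    if fpkm < background:
        return 0
    step = math.ceil(math.log(fpkm, base))
    return min(max(step, 1), steps)


white = fpkm_shade_step(1.0)       # below background
saturated = fpkm_shade_step(5000)  # beyond the top of the scale
```

Because the scale is logarithmic, two cells of the same shade can differ considerably in FPKM, which is why the numerical values should be consulted.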
For the generally higher TPM values of microRNAs a log2.6 scale is used with no indication of background:
For enrichment values a divergent white/yellow/red spectrum is used:
This employs an asymmetric log scale running from white to yellow (0-1) to represent genes with ‘decreased enrichment’, and then from yellow through orange and a wider range of reds to encompass values of ‘increased enrichment’.
▷ Result Presentation Options
The default presentation of the table containing the results of a query has been simplified for initial ease of inspection. Checkboxes provide options for more detailed information. SD appends standard deviations (based on triplicates) to the values. Whole Body includes the abundance values for whole flies. Male v. Female shows the male-to-female ratio of abundances, in cases where the tissue is not sex-specific. Checkbox selections are retained in subsequent queries until changed.
▷ Correlating Gene and Transcript data
The table with the transcript data is, of necessity, quite compact. However it is possible to highlight the part of the transcript table corresponding to a particular tissue by clicking/tapping on the cell in question in the gene table. Click/tap again to turn the highlighting off. (This does not apply to microRNAs, where the transcripts are presented in a different manner.)
▷ Downloading Tables and Graphics
There are download icons below each of the tables of gene and transcript data. These generate files in tab-separated format with the extension ‘txt’. Such files will likely open in a text editor when double-clicked, but can be imported into open spreadsheet applications, such as MS Excel, LibreOffice or Apple’s Numbers. The files can then be formatted and saved in the application’s native format, if required. The microRNA bar chart can be downloaded in SVG format for editing in a suitable application such as Adobe Illustrator or Inkscape.
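The downloaded tables can also be read programmatically. The following is a minimal sketch using Python's standard csv module; it assumes that the first line of the file holds the column headings, and the column names and values in the sample are hypothetical rather than the actual layout of a FlyAtlas 2 download.

```python
import csv
import io


def read_tab_separated(handle):
    """Read tab-separated text into a list of dicts keyed by the header row."""
    reader = csv.reader(handle, delimiter="\t")
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical content standing in for a downloaded 'txt' file.
sample = io.StringIO("Tissue\tFPKM\nHead\t12.5\nMidgut\t3.0\n")
table = read_tab_separated(sample)
```

For a real file, open it with `open(path, newline="")` and pass the resulting handle to the function in place of the `io.StringIO` object.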
The ‘name line’ above the results table contains a ‘link-out’ icon on its extreme right. Clicking this invokes a menu of options, the majority to third-party Drosophila resources. Note, however, that the first is to the corresponding gene results for FlyAtlas 1 — although only in cases where the gene is represented in that facility (e.g. not for RNA genes of any sort).
The ‘name line’ above the results table contains an item entitled ‘paralogues’. This allows one to view the results for the gene and any paralogues together (in the same format as for bulk searches). The list of paralogues is taken from FlyBase, and one should realize that it includes both close and distant homologues. The final cut-off is to some extent arbitrary, and it is ultimately up to individual users to perform their own sequence comparisons to decide on their biological significance.
Transcript results provide an option to ‘View in UCSC Genome Browser’. Clicking this gives an explanatory page with a link to the gene of interest in the UCSC Browser, with the reads from FlyAtlas 2 superimposed. This is useful both for assessing the quality of the data for oneself, and for checking unique exons that define particular transcripts. You are strongly advised to read the explanatory page the first time you use this. We think that the facility offered by the UCSC Browser is excellent, but its richness means that it does require some effort on the user’s part.
Clicking on the icons showing a question mark in a circle invokes in-page help about downloading tables and about correlating gene and transcript data, as explained above.