General FAQ about AraPheno
What is AraPheno?
AraPheno is a central repository for population-scale phenotype data for the model plant Arabidopsis thaliana.
Is the data in AraPheno public?
Yes, the data in AraPheno is public. If you use any data from this database, please cite the phenotype, the original study of the phenotype, and AraPheno.
Which information is stored in AraPheno?
This database contains public phenotype data from different studies. RNASeq data was also added recently.
Which data formats are supported?
AraPheno supports several data formats, including CSV, JSON, PLINK and ISA-TAB.
Is it possible to download the phenotype data?
Yes, phenotype data can be downloaded from the individual phenotype views. You can choose whether to download the phenotypic meta-information or the actual phenotype values. Several formats are available for this purpose, including CSV, JSON and PLINK.
Can I download the full database?
Yes. Click the download database link on the home page. This generates a zip file containing a CSV file that lists the studies (and their details) as well as one folder per study, with the study id as the folder name. Each folder contains information about the study's phenotypes as well as the values in both CSV and PLINK format.
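As a rough illustration of that layout, the following Python sketch groups the entries of such a zip file by their top-level folder, which corresponds to the study id. The file names used here (studies.csv, phenotypes.csv, values.csv, values.plink) and the helper names are made up for the demo; the actual names inside the AraPheno archive may differ.

```python
import io
import zipfile
from collections import defaultdict


def group_by_study(archive):
    """Map each top-level folder name (the study id) to the files inside it."""
    studies = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            folder, sep, rest = name.partition("/")
            if sep and rest:  # skip top-level files and bare directory entries
                studies[folder].append(rest)
    return dict(studies)


def make_demo_archive():
    """Build a small in-memory zip with hypothetical names, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("studies.csv", "id,name\n1,demo study\n")
        zf.writestr("1/phenotypes.csv", "id,name\n10,pheno1\n")
        zf.writestr("1/values.csv", "accession_id,replicate_id,pheno1\n6909,1,24.5\n")
        zf.writestr("1/values.plink", "FID IID pheno1\n6909 1 24.5\n")
    buffer.seek(0)
    return buffer


print(group_by_study(make_demo_archive()))
```

To inspect a real download, pass the path of the zip file to group_by_study instead of the demo archive.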
Should I upload mean/average values or replicates?
Whenever replicate values are available, you should upload the replicate values and not the averages/means. Both submission formats (ISA-TAB and PLINK) support uploading replicate values.
Is it possible to preserve the replicate information across multiple phenotypes?
Yes, it is possible to preserve the specific value of each replicate across multiple phenotypes. For PLINK or CSV, repeat the FID (accession_id) on multiple rows and put an arbitrary number into the IID (replicate_id) column, or alternatively leave it empty (it is not used by AraPheno).
For PLINK this should look as follows:
    FID IID pheno1 pheno2
    6909 1 24.5 100.2
    6909 2 23.2 101.5
    6909 3 25.2 99.4
    6414 4 5.4 10.4
    6414 5  11.2
    6414 6 4.2 9.8
    ...
For CSV this should look as follows:
    accession_id,replicate_id,pheno1,pheno2
    6909, 1, 24.5, 100.2
    6909, 2, 23.2, 101.5
    6909, 3, 25.2, 99.4
    6414, 4, 5.4, 10.4
    6414, 5, , 11.2
    6414, 6, 4.2, 9.8
    ...
The main difference between PLINK and CSV is that PLINK uses a space as the delimiter and CSV uses a comma. Additionally, the headers are different.
Empty values are encoded as empty cells in both CSV and PLINK (see pheno1 for accession_id/FID 6414 and replicate_id/IID 5).
AraPheno will create separate replicate values for each accession and make sure that, for example, replicate 1 of 6909 has 24.5 for pheno1 and 100.2 for pheno2.
This also works for the ISA-TAB format.
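The PLINK and CSV layouts described above can be produced from the same replicate table. The Python sketch below is only an illustration, not AraPheno's own code: the function names to_csv and to_plink are hypothetical, cells are written without padding spaces, and a missing value is rendered as an empty cell in both formats, following the description above.

```python
# Replicate rows: (accession_id, replicate_id, pheno1, pheno2).
# None marks a missing measurement.
ROWS = [
    (6909, 1, 24.5, 100.2),
    (6909, 2, 23.2, 101.5),
    (6909, 3, 25.2, 99.4),
    (6414, 4, 5.4, 10.4),
    (6414, 5, None, 11.2),
    (6414, 6, 4.2, 9.8),
]
PHENOTYPES = ["pheno1", "pheno2"]


def _cells(row):
    """Render one row as strings; a missing value becomes an empty cell."""
    return ["" if value is None else str(value) for value in row]


def to_csv(rows, phenotypes):
    """Comma-delimited layout with accession_id/replicate_id headers."""
    header = ",".join(["accession_id", "replicate_id", *phenotypes])
    return "\n".join([header, *(",".join(_cells(r)) for r in rows)])


def to_plink(rows, phenotypes):
    """Space-delimited layout with FID/IID headers."""
    header = " ".join(["FID", "IID", *phenotypes])
    return "\n".join([header, *(" ".join(_cells(r)) for r in rows)])


print(to_csv(ROWS, PHENOTYPES))
print(to_plink(ROWS, PHENOTYPES))
```

Because each row carries all phenotype values of one replicate, the link between pheno1 and pheno2 for that replicate is kept in both outputs.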
Why is the RNASeq data on a separate page?
RNASeq experiments usually generate a lot of data, hence treating each gene in an RNASeq experiment as a separate phenotype on AraPheno would completely overshadow the other reported phenotypes. Therefore, we report RNASeq data separately, but in a similar way.