BIGA does not support querying unharmonized data from the GWAS Catalog because the column layouts differ between studies. However, if the original summary statistics file is available, you can still run a BIGA analysis by uploading it. In the figure below, the harmonised folder is missing but the tsv file is available.
To use these summary statistics (coronary artery disease data), download the original tsv file. The file is around 3 GB, which exceeds BIGA's 600 MB limit for input files. We recommend using the Linux `awk` command to extract only the necessary columns, such as snp, chromosome, position, a1, a2, eaf, odds_ratio, se and pvalue.
The original coronary artery disease file includes the following columns:
p_value chromosome base_pair_location effect_allele other_allele effect_allele_frequency odds_ratio beta standard_error markername freqse minfreq maxfreq direction hetisq hetchisq hetdf hetpval cases effective_cases n meta_analysis
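Because the extraction command in this section selects columns by position, it can help to list each header name together with its position first. The following is a minimal sketch; the function name `list_columns` is hypothetical and not part of BIGA.

```shell
# list_columns FILE
# Print every column name of a tab-separated file with its 1-based position.
list_columns() {
  awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i; exit }' "$1"
}
```

For example, `list_columns your_file.tsv` prints one line per column in the form `1 p_value`, so you can read off the number to use for each `$N` field.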
You can use the following command to extract desired columns and gzip the file.
awk -F'\t' 'BEGIN {OFS="\t"} {if(NR==1) print "p_value", "chromosome", "base_pair_location", "effect_allele", "other_allele", "effect_allele_frequency", "odds_ratio", "beta", "standard_error", "n"; else print $1, $2, $3, $4, $5, $6, $7, $8, $9, $21}' your_file.tsv | gzip > output_file.tsv.gz
Explanation of the command:
-F'\t': Sets the field separator to a tab character for TSV files.
BEGIN {OFS="\t"}: Sets the output field separator to a tab character.
if(NR==1) ...: For the first row (the header), prints the names of the columns you want to keep.
else print $1, $2, $3, $4, $5, $6, $7, $8, $9, $21: For data rows, prints the specified columns by position. In this file, n is the 21st column and the last column is meta_analysis, so $21 is used rather than $NF (the last field of each row). If your file has a different column order, replace these numbers with the correct positions. Further columns can be added the same way, for example $10 for markername.
your_file.tsv: Replace this with the path to your TSV file.
| gzip > output_file.tsv.gz: Pipes the awk output to gzip, which compresses it, and writes the compressed data to output_file.tsv.gz. Replace output_file.tsv.gz with your desired output file name.
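Selecting columns by position breaks silently if a file orders its columns differently. As an alternative, the sketch below selects columns by header name instead. The function name `extract_columns` is hypothetical, and the sketch assumes a tab-separated file whose first row is the header.

```shell
# extract_columns FILE NAME1,NAME2,...
# Write the named columns (header included) to stdout, tab-separated,
# in the order requested. Exits with status 1 if a name is not in the header.
extract_columns() {
  awk -F'\t' -v want="$2" '
    BEGIN { OFS = "\t"; n = split(want, names, ",") }
    NR == 1 {
      # Map each header name to its column position
      for (i = 1; i <= NF; i++) pos[$i] = i
      for (j = 1; j <= n; j++) {
        if (!(names[j] in pos)) {
          print "missing column: " names[j] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      # Header and data rows are printed the same way
      line = $(pos[names[1]])
      for (j = 2; j <= n; j++) line = line OFS $(pos[names[j]])
      print line
    }
  ' "$1"
}
```

A possible call for the coronary artery disease file would be `extract_columns your_file.tsv p_value,chromosome,base_pair_location,effect_allele,other_allele,effect_allele_frequency,odds_ratio,beta,standard_error,n | gzip > output_file.tsv.gz`. Running `ls -lh output_file.tsv.gz` afterwards shows whether the result is under the 600 MB limit.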
Then you can upload the gzipped data to do a BIGA analysis.
If the trait in IEU OpenGWAS does not have download buttons as in the figure below, it means that the data is not downloadable. BIGA is not able to query this kind of data.
Description: Problems were encountered while processing your uploaded data. Potential reasons:
1. Incomplete Compressed Files: The file may not have completed saving or compressing before upload.
2. Corrupted Files: Files may become damaged during download, storage, or transfer.
Resolution: Please re-compress and save your summary statistics data before re-uploading.
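One way to catch an incomplete or corrupted archive before uploading is to test it locally with `gzip -t`, which checks the integrity of a compressed file without extracting it. A minimal sketch, with `check_gzip` as a hypothetical helper name:

```shell
# check_gzip FILE.gz
# Return 0 if the gzip archive passes the integrity test, 1 otherwise.
check_gzip() {
  if gzip -t "$1" 2>/dev/null; then
    echo "OK: $1"
  else
    echo "FAILED: $1 is incomplete or corrupted" >&2
    return 1
  fi
}
```

If `check_gzip output_file.tsv.gz` reports a failure, re-create the archive from the uncompressed file and test it again before uploading.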
Description: Missing required columns in your file.
Resolution: Please check the columns in your uploaded data against our input data column requirements, which can be found here.
Description: Significant missing values in critical fields such as beta and p-value columns.
Resolution: Please double-check your data for completeness before uploading. We will provide detailed information in the log file about specific columns that have significant missing values.
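To check completeness before uploading, you could compute the fraction of missing values per column locally. The sketch below is an illustration only: the function name `missing_report` is hypothetical, and treating empty fields, `NA`, `NaN` and `.` as missing is an assumption, not BIGA's own definition.

```shell
# missing_report FILE
# For each column of a tab-separated file with a header row, print the
# column name and the fraction of data rows in which the value is missing.
missing_report() {
  awk -F'\t' '
    NR == 1 { ncol = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    {
      rows++
      for (i = 1; i <= ncol; i++)
        if ($i == "" || $i == "NA" || $i == "NaN" || $i == ".") miss[i]++
    }
    END {
      for (i = 1; i <= ncol; i++)
        printf "%s\t%.3f\n", name[i], (rows ? miss[i] / rows : 0)
    }
  ' "$1"
}
```

Columns such as beta or p-value that show a large fraction here are the ones most likely to trigger this error.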
Description: Duplicate columns detected.
Resolution: Please remove duplicate columns or specify the correct column name in the Summary statistics file column names section on our upload page.
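Duplicate header names can be found locally before uploading. A minimal sketch, assuming a tab-separated file with the header in the first row and an exact (case-sensitive) name comparison; `duplicate_columns` is a hypothetical helper name:

```shell
# duplicate_columns FILE
# Print each column name that appears more than once in the header row.
# Prints nothing when all column names are unique.
duplicate_columns() {
  head -n 1 "$1" | tr '\t' '\n' | sort | uniq -d
}
```

For a gzipped file, decompress it first, for example with `gzip -dc output_file.tsv.gz > output_file.tsv`.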
Description: Unable to download queried data from GWAS Catalog.
Resolution: The download link may not exist. The Trait ID could be incorrect or may not exist in the GWAS Catalog harmonized database. We are unable to process the data further.
Description: Failed to execute the data query from the GWAS Catalog. This is a general error that occurs when data cannot be queried from the GWAS Catalog.
Resolution: Please provide us with the detailed log file for further diagnosis on our Google Forum: BIGA GWAS.
Description: Missing necessary columns in the GWAS Catalog dataset, possibly leading to issues in column recognition. Harmonized data may exhibit missing values or use outdated formats.
Resolution: Since a necessary column is missing in the original GWAS Catalog data, we are unable to process the data further.
Description: Necessary columns have missing values exceeding 50%. Some harmonized data from the GWAS Catalog may still exhibit significant missing values in critical fields, such as both beta and odds ratio.
Resolution: Since the necessary columns have missing values in the original GWAS Catalog data, we are unable to process the data further.
Description: Missing required columns in the dataset from IEU OpenGWAS.
Resolution: Since a necessary column is missing in the original IEU OpenGWAS data, we are unable to process the data further. We found that some data only lack the AF column, which is essential for harmonization. However, we will consider making this column optional in future analyses.
Description: Necessary columns have missing values exceeding 50%.
Resolution: Since the necessary columns have missing values in the original IEU OpenGWAS data, we are unable to process the data further.
Description: Possible non-existent URL, unable to download queried data from IEU OpenGWAS.
Resolution: The download link may not exist. Please ensure that your input trait ID is accurate before querying. Double-check for typing errors, such as extra underscores or other mistakes.
Description: Failed to execute the data query from IEU OpenGWAS. This is a general error that occurs when data cannot be queried from IEU OpenGWAS.
Resolution: Please provide us with the detailed log file for further diagnosis on our Google Forum: BIGA GWAS.
Description: Failure in executing the data query from Neale Lab.
Resolution: The download link may not exist. Please ensure that your input trait ID is accurate before querying. Double-check for typing errors, such as extra underscores or other mistakes.