This report summarizes the results of running correlation computations on the exon dataset (UMUTAffyExon_0209_RMA) with the `correlation_rust` program. It focuses on the workflow that executed successfully and on the performance observed while computing correlations.
The dataset is stored in LMDB and was converted to CSV for processing.
Dataset structure: each row represents a trait and each column represents a sample.
```shell
$ wc -l output.csv
1236087 output.csv
$ ls -lh output.csv
-rw-r--r-- 1 alexm alexm 867M Mar 11 03:57 output.csv
```
The generated CSV file is approximately **867 MB** and contains **1,236,087 rows**, each corresponding to a trait.
Example row (truncated):

```
5.17816,5.04923,6.71493,5.52693,5.02245,5.21265,5.51605,5.40495,...
```
Each row contains **93 values**, corresponding to the samples.
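As a quick sanity check, a row of this CSV can be parsed with Python's standard `csv` module. The sketch below uses only the truncated example values shown in this report (a full row in `output.csv` has 93 values):

```python
import csv
import io

# Truncated example row from the report (a full row has 93 sample values).
example_row = "5.17816,5.04923,6.71493,5.52693,5.02245,5.21265,5.51605,5.40495"

# Parse the comma-separated trait values into floats.
reader = csv.reader(io.StringIO(example_row))
values = [float(v) for v in next(reader)]

print(len(values))   # 8 values in the truncated example
print(values[0])     # 5.17816
```

Against the real file, the same loop over `csv.reader(open("output.csv"))` would be expected to yield 1,236,087 rows of 93 values each.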
The workflow that successfully ran:
Program execution command:

```shell
cargo run ./tests/data/sample_json_file.json
```
Two correlation methods were executed, each run with Cargo in both **debug** and **release** modes.
First method:

- Debug (`dev` profile, unoptimized + debuginfo): 50.63 s
- Release (`release` profile, optimized): 10.13 s

Second method:

- Debug (`dev` profile, unoptimized + debuginfo): 59.92 s
- Release (`release` profile, optimized): 19.31 s
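The relative benefit of the release build follows directly from the reported timings; a small sketch of the arithmetic:

```python
# Reported elapsed times in seconds for the two correlation runs.
runs = {
    "method 1": {"debug": 50.63, "release": 10.13},
    "method 2": {"debug": 59.92, "release": 19.31},
}

# Speedup of the optimized release build over the debug build.
for name, t in runs.items():
    speedup = t["debug"] / t["release"]
    print(f"{name}: {speedup:.1f}x faster in release mode")
```

This works out to roughly a 5.0x speedup for the first method and 3.1x for the second.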
The correlation program successfully processed the exon dataset containing more than **1.2 million traits**.
Key observations:

- The correlation program successfully processed the exon dataset containing more than **1.2 million traits**, each with 93 sample values, from an 867 MB CSV input.
- Release builds were markedly faster than debug builds (roughly 10–19 s versus 51–60 s).
- The LMDB → CSV workflow enabled correlation computation across the full dataset.

The workflow of converting LMDB data to CSV and running the Rust `correlation_rust` program completed successfully and produced the expected results.
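The report does not name the two correlation methods. Pearson correlation is a common choice for this kind of trait × sample matrix, so purely as an illustration (not a description of `correlation_rust` internals), here is a minimal Pearson coefficient between two hypothetical trait rows:

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two hypothetical trait rows (a real row has 93 sample values).
trait_a = [5.18, 5.05, 6.71, 5.53, 5.02]
trait_b = [5.21, 5.52, 5.40, 5.10, 5.33]

print(f"r = {pearson(trait_a, trait_b):.4f}")
```

Computing this pairwise over 1.2 million traits is O(n²) in the number of rows, which is why build-level optimizations (debug vs. release) matter so much at this scale.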