Requirement #4625 (new)
OMERO/HIC data storage & analysis
Reported by: jamoore
Owned by: (unassigned)
Cc: jburel, cxallan, jrswedlow
Business Value: n.a.
Total Story Points: n.a.
Roif: n.a.
Mandatory Story Points: n.a.
Description (last modified by jmoore)
The initial data to be used for the first demo(s) will be a representative subset of the GoDARTS data provided by Andy and Alison (step 0). This will approximate the project sets exported to "research data centers". This data will be loaded into OMERO.tables (step 1) via a command-line script. Some work has already been done on a generic loader, which may be reusable for this task; otherwise, a custom script will be written.
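A minimal sketch of the type-inference half of such a loader script. OMERO.tables stores data column-wise, with each column uniformly typed, so a loader must coerce CSV strings into typed columns first; the plain dict below stands in for the actual OMERO.tables calls (constructing column objects and initializing/populating the table via the server API), and the GoDARTS-like column names are invented for illustration.

```python
import csv
import io

def infer_columns(csv_text):
    """Read CSV text into typed, column-oriented data: the shape
    OMERO.tables expects (one uniformly typed value list per column)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = {}
    for name in rows[0].keys():
        values = [r[name] for r in rows]
        # Try the narrowest type first: int, then float, else leave as string.
        for cast in (int, float):
            try:
                values = [cast(v) for v in values]
                break
            except ValueError:
                continue  # cast failed; `values` is unchanged
        columns[name] = values
    return columns

# Hypothetical sample subset (column names are invented, not GoDARTS fields):
sample = "patient_id,age,hba1c\nP001,54,7.2\nP002,61,6.8\n"
cols = infer_columns(sample)
```

A real loader would then map each typed list onto the corresponding OMERO.tables column type before writing to the server.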
Once the data is in OMERO, another command-line script will be written to export the OMERO.tables data to a CSV file (step 2). This represents the current state of the researchers' workflow: easy to implement on the OMERO side, but it does not add any security to the system. The script should include functionality for choosing columns and filtering the exported data. The usability of this script should be validated by the researchers. Other options may need to be added later: export to TSV or XLS, more advanced querying, etc.
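The column-selection and row-filtering behaviour of the export script could look like the following sketch. It operates on an in-memory, column-oriented table; a real script would read the rows from OMERO.tables server-side instead, and the field names here are invented.

```python
import csv
import io

def export_csv(columns, selected, row_filter=None):
    """Export a column-oriented table (dict of name -> value list) to CSV
    text, keeping only `selected` columns and rows passing `row_filter`."""
    n_rows = len(next(iter(columns.values())))
    rows = [{name: columns[name][i] for name in selected}
            for i in range(n_rows)]
    if row_filter is not None:
        rows = [r for r in rows if row_filter(r)]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=selected)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical table; export two columns, filtering on age:
table = {"patient_id": ["P001", "P002", "P003"],
         "age": [54, 61, 47],
         "hba1c": [7.2, 6.8, 8.1]}
text = export_csv(table, ["patient_id", "age"],
                  row_filter=lambda r: r["age"] >= 50)
```

Swapping `csv` for a TSV dialect or an XLS writer is the natural extension point for the other formats mentioned above.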
The next steps add security constraints around full export to support the Safe Haven requirements. Any API methods used by partial export will have auditing added (step 3), so that it is clear which researchers have accessed which data. Further, an authorization class will be assigned to each column in the data set (step 4). Levels may include (from least to most secure): full access, aggregations, aggregations without outliers, correlations, absolute subset, and admin-only access. With column security in place, full export can be disabled, leaving only partial data export (step 5). At this point, the researchers should again be asked for feedback to determine which features must be added to keep this modified workflow viable for them.
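One way the per-column authorization classes of step 4 could be modelled, together with the check that partial export (step 5) would apply once full export is disabled. The enum values come from the list above; the enforcement rule and all names are illustrative assumptions, not the ticket's design.

```python
from enum import IntEnum

class Access(IntEnum):
    """Column authorization classes, ordered least to most secure."""
    FULL = 0
    AGGREGATIONS = 1
    AGGREGATIONS_NO_OUTLIERS = 2
    CORRELATIONS = 3
    ABSOLUTE_SUBSET = 4
    ADMIN_ONLY = 5

def may_export_raw(column_level):
    """Assumed rule: raw row-level export only for fully open columns;
    all other levels permit derived results (aggregates etc.) at most."""
    return column_level == Access.FULL

def exportable_columns(policy, requested):
    """Filter a requested column list against the per-column policy,
    as partial export would after full export is disabled."""
    return [c for c in requested if may_export_raw(policy[c])]

# Hypothetical policy over invented column names:
policy = {"age": Access.AGGREGATIONS,
          "sex": Access.FULL,
          "date_of_death": Access.ADMIN_ONLY}
allowed = exportable_columns(policy, ["age", "sex", "date_of_death"])
```

Using an ordered `IntEnum` means a later, finer-grained rule ("allow anything at or below level N for this operation") is a one-line comparison.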
The final step (6) of the initial phase is then to allow researchers to submit a script for execution against the entire data set.
- we will work with an anonymised subset of the GoDARTS data as the basis for the pilot
- we will define a couple of analyses that can be demonstrated
- we will define the requirements, stories and tasks
- we will work on porting a couple of key scripts (e.g. genome imputation, date of death validation, drug exposure)
- the initial focus (i.e. for July) will be on developing basic tool(s) for data loading, querying, and export, all of which are to be audited; the researchers will then use the exports as usual with their preferred tools.
- the overall focus will be on developing APIs for a key set of defined tools (e.g. R, STATA, PLINK, VCFTOOLS) so that these can interact directly with the OMERO architecture
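The auditing that steps 3 and the July tools both call for could be wired in as a thin wrapper around every data-access entry point, as in this sketch. The log structure, function names, and the in-memory `AUDIT_LOG` list are assumptions for illustration; a real deployment would persist the trail server-side.

```python
import functools
import datetime

AUDIT_LOG = []  # stand-in for a persistent, append-only audit store

def audited(action):
    """Decorator recording who performed which data-access action,
    with what arguments, before the action runs."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user, *args, **kwargs):
            AUDIT_LOG.append({
                "when": datetime.datetime.utcnow().isoformat(),
                "user": user,
                "action": action,
                "args": args,
            })
            return fn(user, *args, **kwargs)
        return inner
    return wrap

@audited("export")
def export(user, columns):
    # Placeholder for the real export logic.
    return list(columns)

export("alice", ["age", "sex"])
```

Because the wrapper logs before delegating, even a failed or rejected export still leaves a record of the attempt.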
- Summary changed from "Safe Haven data storage & analysis" to "OMERO/HIC data storage & analysis"
Changed 2 years ago by adjudson
- attachment Safe Haven - Omero Model Project Outline 20110301.pdf added