Requirement #4625 (new)

Opened 2 years ago

Last modified 3 months ago

OMERO/HIC data storage & analysis

Reported by: jamoore Owned by:
Priority: critical Milestone: Unscheduled
Component: General Keywords: n.a.
Cc: jburel, cxallan, jrswedlow Business Value: n.a.
Total Story Points: n.a. Roif: n.a.
Mandatory Story Points: n.a.

Description (last modified by jmoore)

The initial data to be used for the first demo(s) will be a representative subset of the GoDARTS data provided by Andy and Alison (step 0). This will approximate the project sets exported to "research data centers". The data will be loaded into OMERO.tables (step 1) via a command-line script. Some work has already been done on a generic loader, which may be reusable for this task; otherwise, a custom script will be written.

Once the data is in OMERO, another command-line script will be written to export the OMERO.tables data to a CSV file (step 2). This represents the current state of the researchers' workflow: it is easy to implement on the OMERO side but does not add any security to the system. The script should include functionality for choosing columns and filtering the exported data. The usability of this script should be validated by the researchers. Other options may need to be added, such as export to TSV or XLS and more advanced querying.
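As an illustration of the export behaviour described in step 2, the following is a minimal sketch of column selection and row filtering using only the Python standard library. It does not touch OMERO.tables: the function name `export_rows`, the sample rows and the column names are all hypothetical, and rows are assumed to have already been read into plain dictionaries.

```python
import csv
import io


def export_rows(rows, columns, out, predicate=None, delimiter=","):
    """Write the chosen columns of every row accepted by `predicate` to `out`.

    `rows` is an iterable of dicts; keys not listed in `columns` are dropped.
    Returns the number of data rows written.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    written = 0
    for row in rows:
        if predicate is None or predicate(row):
            writer.writerow(row)
            written += 1
    return written


# Hypothetical sample data standing in for rows read from a table.
rows = [
    {"patient_id": 1, "age": 54, "hba1c": 7.2},
    {"patient_id": 2, "age": 61, "hba1c": 8.9},
    {"patient_id": 3, "age": 47, "hba1c": 6.4},
]

buf = io.StringIO()
count = export_rows(
    rows,
    columns=["patient_id", "hba1c"],
    out=buf,
    predicate=lambda r: r["age"] > 50,
)
```

Passing `delimiter="\t"` would give TSV output from the same function; XLS export would need a separate writer.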

The next steps will add security constraints to full export in order to support the Safe Haven requirements. Any API methods used by partial export will have auditing added (step 3) so that it is clear which researchers have accessed which data. Further, a class of authorization will be assigned to each column in the data set (step 4). Levels may include (from least to most secure): full access, aggregations, aggregations without outliers, correlations, absolute subset, admin-only access. With column security in place, full export can be disabled, leaving only partial data export (step 5). At this point, the researchers should again be asked for feedback to determine which features must be added to keep this modified workflow viable for them.
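One possible shape for steps 3 and 4 is sketched below in plain Python, independent of any OMERO API. The level names follow the list above, but their exact semantics are not defined in this ticket, so the sketch only distinguishes raw access from a simple aggregate and denies everything else. The names `Access`, `audited`, `COLUMN_ACCESS`, `read_values` and `read_mean`, as well as the sample data, are assumptions made for illustration.

```python
import enum
import functools
import statistics


class Access(enum.IntEnum):
    """Column authorization levels, ordered from least to most secure."""
    FULL = 0
    AGGREGATIONS = 1
    AGGREGATIONS_NO_OUTLIERS = 2
    CORRELATIONS = 3
    ABSOLUTE_SUBSET = 4
    ADMIN_ONLY = 5


AUDIT_LOG = []  # in-memory stand-in for a persistent audit trail


def audited(func):
    """Record (user, method, column) before the wrapped method runs,
    so that denied attempts are logged as well."""
    @functools.wraps(func)
    def wrapper(user, column, *args, **kwargs):
        AUDIT_LOG.append((user, func.__name__, column))
        return func(user, column, *args, **kwargs)
    return wrapper


# Hypothetical per-column levels and data.
COLUMN_ACCESS = {"age": Access.FULL, "hba1c": Access.AGGREGATIONS}
DATA = {"age": [54, 61, 47], "hba1c": [7.2, 8.9, 6.4]}


@audited
def read_values(user, column):
    """Return raw values; allowed only for full-access columns."""
    if COLUMN_ACCESS[column] != Access.FULL:
        raise PermissionError(f"raw access to {column!r} is not allowed")
    return list(DATA[column])


@audited
def read_mean(user, column):
    """Return the mean; allowed for full-access and aggregation columns."""
    if COLUMN_ACCESS[column] > Access.AGGREGATIONS:
        raise PermissionError(f"aggregation of {column!r} is not allowed")
    return statistics.mean(DATA[column])
```

With these definitions, `read_mean("alice", "hba1c")` succeeds while `read_values("alice", "hba1c")` raises `PermissionError`, and both calls leave an entry in `AUDIT_LOG`.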

The final step (6) for the initial phase is then to allow researchers to submit a script for execution on the entire data set.


In summary:

  • we will work with an anonymised subset of the GoDARTS data as the basis for the pilot
  • we will define a couple of analyses that can be demonstrated
  • we will define the requirements, stories and tasks
  • we will work on porting a couple of key scripts (e.g. genome imputation, date of death validation, drug exposure)
  • the initial focus (i.e. for July) will be on developing basic tool(s) for data loading, querying, and export, all of which are to be audited; the researchers will then use the exports as usual with their preferred tools
  • the overall focus will be on developing APIs for a key set of defined tools (e.g. R, STATA, PLINK, VCFTOOLS) so that these can interact directly with the OMERO architecture

Attachments

Safe Haven - Omero Model Project Outline 20110301.pdf (214.6 KB) - added by adjudson 2 years ago.

Change History

comment:1 Changed 2 years ago by jmoore

  • Summary changed from Safe Haven data storage & analysis to OMERO/HIC data storage & analysis

comment:2 Changed 2 years ago by jmoore

  • Description modified

comment:3 Changed 2 years ago by jburel

  • Cc jburel, szwells, cxallan, jrswedlow added

comment:4 Changed 2 years ago by jmoore

  • Milestone changed from OMERO-Beta4.3 to Unscheduled

Moving to Unscheduled since 4.3 is going into freeze. A new milestone may need to be created just for this requirement.

Changed 2 years ago by adjudson

comment:5 Changed 3 months ago by jamoore

  • Cc szwells removed
  • Component set to General

All stories & tasks closed since no work is on-going.
