
Task #12217 (closed)

Opened 10 years ago

Closed 9 years ago

RFE: in-place import speed (ln_s)

Reported by: mtbcarroll
Owned by: mtbcarroll
Priority: critical
Milestone: 5.1.2
Component: Import
Version: 5.0.1
Keywords: n.a.
Cc: ux@…, rleigh, fs@…, mlinkert
Resources: n.a.
Referenced By: https://trello.com/c/EOL8YL14/54-bug-import
References: n.a.
Remaining Time: n.a.
Sprint: n.a.

Description

At http://pastebin.com/njqqv00A I have pasted my experience of importing a 273 MB MIAS plate from a local NTFS filesystem mounted under Linux via FUSE. A normal command-line import takes just over two minutes. However, when I add --transfer=ln_s, the import takes nearly fourteen minutes.

What's going on? Could access to the files somehow be improved, or should the documentation warn about this aspect of in-place import? I shudder to think what the slowdown might have been had the filesystem been remotely mounted. (I would do that experiment too, but I don't think I have time today!)
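(For reproducibility, a minimal sketch of this timing comparison is given below; the plate path and the location of the OMERO CLI are placeholders, and only the --transfer=ln_s flag comes from the report itself.)

#!/usr/bin/env python3
# Sketch of the timing comparison described above. The plate path and the
# location of bin/omero are placeholders, not values taken from the report.
import subprocess
import time

PLATE = "/mnt/ntfs-fuse/plate/001-365700055641"  # hypothetical path

def timed_import(extra_args=()):
    """Run one command-line import and return the wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(["bin/omero", "import", *extra_args, PLATE], check=True)
    return time.monotonic() - start

normal = timed_import()
in_place = timed_import(["--transfer=ln_s"])
print("normal: %.0fs, ln_s: %.0fs" % (normal, in_place))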

Attachments (7)

session.log (204.2 KB) - added by mtbcarroll 10 years ago.
performance-cellworx.text.xz (30.7 KB) - added by mtbcarroll 10 years ago.
CIFS-mount cellworx import well over twice as slow with ln_s
12217_logs.zip (123.6 KB) - added by sbesson 9 years ago.
Log files for the MIAS dataset import with/without symlinks
hang-stack (21.2 KB) - added by mtbcarroll 9 years ago.
stack trace during slow import
hang-stack.2 (21.2 KB) - added by mtbcarroll 9 years ago.
here's another in case the comparison is useful
12217_logs_2.zip (125.1 KB) - added by sbesson 9 years ago.
Log files for the MIAS dataset import with/without symlinks with CIFS mount
12217_logs_3.zip (92.3 KB) - added by sbesson 9 years ago.
Log files for the MIAS dataset import with/without symlinks with CIFS mount post unmounting/remounting

Change History (48)

comment:1 Changed 10 years ago by jamoore

  • Cc fs@… added
  • Milestone changed from Unscheduled to 5.0.2
  • Owner jamoore deleted
  • Priority changed from minor to critical

We should at least know if this is something critical for 5.0.2. If not, it can be pushed.

comment:2 Changed 10 years ago by jburel

Is somebody looking at this? Nobody is assigned.

comment:3 Changed 10 years ago by mtbcarroll

The plate I used was data_repo/from_skyking/mias/frans/siRNA_PRIM1_03102008/001-365700055641.

comment:4 Changed 10 years ago by bpindelski

Tested on an Ubuntu laptop with the images stored on the desktop. A normal bin/omero import took 2m46.342s; using the in-place logic it took 2m43.599s. I can't observe the difference in import times that Mark did.

comment:5 Changed 10 years ago by mtbcarroll

What is the significance of "desktop" here? Is it on a remote mount?

comment:6 Changed 10 years ago by bpindelski

"desktop" means the /home/<user>/Desktop folder. No, it is not on a remote mount, it is on my laptop's built-in HDD, which uses the ext4 FS. My testing only proves that there shouldn't be any import lag due to using in-place import. If I had a setup with a remote mount, I'd first verify the speed difference between the local HDD and the mount using some big file and copying it over. What I'm suggesting is that maybe the combination of NTFS, FUSE and LSC network might have caused some speed issues.

comment:7 Changed 10 years ago by jamoore

I'd certainly be interested in knowing if there's a short list of components that show the same behavior. But if others have not seen it in practice, I'm probably happy to have this pushed to 5.0.3.

comment:8 Changed 10 years ago by mtbcarroll

I have tried it again in a hopefully easily reproducible fashion:

  1. I have squig mounted over CIFS.
  2. I import data_repo/from_skyking/mias/frans/siRNA_PRIM1_03102008/001-365700055641 normally and it takes 2m33.
  3. I import it again with --transfer=ln_s and it takes 8m49.

I will attach the session log.

Changed 10 years ago by mtbcarroll

comment:9 Changed 10 years ago by mtbcarroll

(This was at work on the LSC wired LAN.)

comment:10 Changed 10 years ago by jamoore

Interesting. Looking at the elapsed times from your log split into a "copy" and a "soft"(link) file with

jamoore@blue:/tmp$ cat s.py 
#!/usr/bin/python
import fileinput
import datetime
import time

# ['2014-05-08', '11:58:22,463', '528809']
previous = None
for line in fileinput.input():
    parts = line.split(" ")
    t1 = " ".join(parts[0:2])
    t2 = time.strptime(t1, "%Y-%m-%d %H:%M:%S,%f")
    if previous != None:
        print "elapsed: %4s  " % \
            ((t2.tm_min - previous.tm_min) * 60 + (t2.tm_sec - previous.tm_sec)),
        print line,
    previous = t2

we see:

jamoore@blue:/tmp$ ./s.py soft.log | grep Step:
elapsed:  3     2014-05-08 11:50:24,043 50389      [l.Client-1] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 207
elapsed:  202   2014-05-08 11:53:46,127 252473     [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 207
elapsed:  275   2014-05-08 11:58:21,796 528142     [l.Client-4] INFO   ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5  Logfile: 207
elapsed:  1     2014-05-08 11:58:22,147 528493     [l.Client-7] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 207
elapsed:  0     2014-05-08 11:58:22,172 528518     [l.Client-4] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 207
jamoore@blue:/tmp$ ./s.py copy.log | grep Step:
elapsed:  3     2014-05-08 11:47:20,683 59062      [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 44
elapsed:  39    2014-05-08 11:47:59,735 98114      [l.Client-1] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 44
elapsed:  46    2014-05-08 11:48:45,028 143407     [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5  Logfile: 44
elapsed:  1     2014-05-08 11:48:46,654 145033     [l.Client-1] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 44
elapsed:  0     2014-05-08 11:48:46,679 145058     [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 44

Frequencies of times are:

jamoore@blue:/tmp$ ./s.py soft.log | cut -b9-13 | sort -n | uniq -c
    393     0
     42     1
      2     2
      2     3
      1   202
      1   275

and

jamoore@blue:/tmp$ ./s.py copy.log | cut -b9-13 | sort -n | uniq -c
    383     0
     52     1
      1     2
      2     3
      1    39
      1    46

comment:11 Changed 10 years ago by jamoore

Sorry, that should have printed out the *previous* lines, so for soft:

elapsed:  202   2014-05-08 11:50:24,043 50389      [l.Client-1] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 207
elapsed:  275   2014-05-08 11:53:46,127 252473     [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 207

and copy

elapsed:   39   2014-05-08 11:47:20,683 59062      [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 44
elapsed:   46   2014-05-08 11:47:59,735 98114      [l.Client-1] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 44
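(For reference, a Python 3 rework of the s.py above with this correction applied: each elapsed interval is printed next to the log line that opens it. The "YYYY-MM-DD HH:MM:SS,mmm ..." field layout is assumed to match the excerpts quoted here.)

#!/usr/bin/env python3
# Sketch: Python 3 rework of s.py with the correction described above, so each
# elapsed interval is attributed to the log line that starts it.
import fileinput
from datetime import datetime

previous_time = None
previous_line = None
for line in fileinput.input():
    parts = line.split(" ")
    stamp = " ".join(parts[0:2])  # e.g. "2014-05-08 11:58:22,463"
    current = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    if previous_time is not None:
        elapsed = (current - previous_time).total_seconds()
        print("elapsed: %4d  %s" % (elapsed, previous_line), end="")
    previous_time, previous_line = current, line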

comment:12 Changed 10 years ago by mtbcarroll

A concern is what else takes rather longer too: Export? Caching of big image tiles? Etc.

Changed 10 years ago by mtbcarroll

CIFS-mount cellworx import well over twice as slow with ln_s

comment:13 Changed 10 years ago by jamoore

What's the next step here? Is anyone still looking into this? (If so, please take the ticket.) Do we need to count this as a blocker? If so, speak up post-haste.

comment:14 Changed 10 years ago by mlinkert

  • Cc mlinkert added

I'd argue that this falls under the category of "slow things are slow", and so probably shouldn't block 5.0.2. #6586 is likely related. Push to 5.0.3 and then take the time to properly sort out what else might be affected?

comment:16 Changed 10 years ago by jamoore

  • Milestone changed from 5.0.2 to 5.0.3
  • Priority changed from critical to blocker

Pushing to 5.0.3 and elevating so that we decide on a course of action.

comment:17 Changed 10 years ago by jburel

comment:18 Changed 10 years ago by mlinkert

Worth seeing if https://github.com/openmicroscopy/openmicroscopy/pull/2758 has any effect on this?

comment:19 Changed 10 years ago by mtbcarroll

With that PR and the Bio-Formats PR it references merged in, repeating the experiment from https://trac.openmicroscopy.org.uk/ome/ticket/12217#comment:8: without --transfer=ln_s it took 2m08; with --transfer=ln_s it took 6m54.

comment:20 Changed 10 years ago by jamoore

If the only way to improve this further is to somehow reduce the number of symlink lookups, is there any chance of tackling this in 5.0.3, Melissa? If not, we should probably push to 5.1 at this point.
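(As an illustration of the general idea only, not of how Bio-Formats or OMERO actually handle it: one way to cut repeated symlink lookups is to resolve each link once and reuse the real path for later opens.)

import os
from functools import lru_cache

@lru_cache(maxsize=None)
def resolved(path):
    """Resolve symlinks once and cache the result."""
    return os.path.realpath(path)

def open_data(path):
    # Later opens of the same path reuse the cached, fully resolved target
    # instead of re-traversing the symlink on every access.
    return open(resolved(path), "rb")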

comment:21 Changed 10 years ago by mlinkert

I'm not likely to be able to do any serious work on this for 5.0.3, so +1 to pushing given there is already an improvement.

comment:22 Changed 10 years ago by jamoore

  • Milestone changed from 5.0.3 to 5.1.0-m1
  • Summary changed from RFE: in-place import speed to RFE: in-place import speed (ln_s)

comment:23 Changed 10 years ago by bpindelski

Tested with http://ci.openmicroscopy.org/view/5.1/job/OMERO-5.1-merge-build/ICE=3.4,label=ome-c6100-3/245/.

Uploads using mias from ~/Desktop/import/testing/mias/siRNA_PRIM1_03102008 to /OMERO:

  • without ln_s - 1m49.904s
  • with ln_s - 1m37.502s

Uploads using mias from /Volumes/bpindelski/WINDOWS/mias/frans/siRNA_PRIM1_03102008 (smbfs) to /OMERO:

  • without ln_s - 1m49.580s
  • with ln_s - 1m57.512s

It seems this issue doesn't exist on develop. Is it worth closing the ticket?

comment:24 Changed 10 years ago by mtbcarroll

Will re-test when not blocked by #12530.

comment:25 Changed 10 years ago by mtbcarroll

I vote against closing: using JAVA_OPTS=-Domero.import.active_close=false from http://trac.openmicroscopy.org.uk/ome/ticket/12530#comment:4 and repeating the experiment from https://trac.openmicroscopy.org.uk/ome/ticket/12217#comment:8: without --transfer=ln_s it took 1m50; with --transfer=ln_s it took 8m11.

If this ticket does get fixed, we may want to adjust the language on http://www.openmicroscopy.org/site/support/omero5.1-staging/sysadmins/in-place-import.html about plate imports possibly running "rather more slowly", though it may remain wise to encourage relative performance profiling.

comment:26 Changed 10 years ago by jamoore

  • Milestone changed from 5.1.0-m1 to 5.1.0

comment:27 Changed 9 years ago by mtbcarroll

https://github.com/openmicroscopy/bioformats/pull/1522 shaves a couple of minutes off that time difference.

comment:28 Changed 9 years ago by jamoore

  • Milestone changed from 5.1.0 to 5.1.1

comment:29 Changed 9 years ago by sbesson

  • Owner set to sbesson
  • Status changed from new to accepted

comment:30 Changed 9 years ago by sbesson

Tested with today's merge build on the Linux integration server. I ran the import of the MIAS plate linked above with and without symlinks, and with and without skipping import steps. Each import was run 5 consecutive times; the mean import time, including transfer and processing, is reported below:

Import type | Skip | Mean import time (+/- SD)
copy        | none | 1min15s (+/- 7s)
copy        | all  | 33s (+/- 2s)
symlink     | none | 1min13s (+/- 2s)
symlink     | all  | 31s (+/- 2s)

So it seems the symlink overhead has been worked around, at least for this type of format, on 5.1.x. Additionally, we now expose a fast import mode for skipping data access in the PIXELDATA import step. Unless we have reproducible metrics to work against, I suggest we close this ticket.
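(A minimal sketch of that measurement protocol, five consecutive runs reporting mean and standard deviation; run_import is a hypothetical callable that performs one import.)

import statistics
import time

def benchmark(run_import, repeats=5):
    """Time `repeats` consecutive imports; return (mean, SD) in seconds.
    `run_import` is a hypothetical callable performing one import."""
    times = []
    for _ in range(repeats):
        start = time.monotonic()
        run_import()
        times.append(time.monotonic() - start)
    return statistics.mean(times), statistics.stdev(times)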

Changed 9 years ago by sbesson

Log files for the MIAS dataset import with/without symlinks

comment:31 Changed 9 years ago by mtbcarroll

On what kind of filesystem mount was the MIAS plate?

comment:32 Changed 9 years ago by sbesson

This is squig over NFS (as mounted by default on the CI node).

comment:33 Changed 9 years ago by mtbcarroll

And it looks like ln_s instead of normal upload made almost no difference? I'll try re-testing locally.

comment:34 Changed 9 years ago by mtbcarroll

Using 3d44990f on my work laptop with my wired ethernet card in CTIR:

In an import test of squig over CIFS, normal upload took 1m56 then with ln_s took ... well, I've been looking at "METADATA_IMPORTED Step: 1 of 5" for over an hour now and network activity on my laptop remains heavy.

Mount options are: rw,nosuid,nodev,relatime,vers=1.0,cache=strict,username=mtbcarroll,domain=LIFESCI-AD,uid=1000,forceuid,gid=1000,forcegid,addr=134.36.66.3,file_mode=0664,dir_mode=0775,nounix,serverino,rsize=61440,wsize=65536,actimeo=1

Trying local NTFS over FUSE again, normal upload took 52s then with ln_s took 58s. So, that's great!

Last edited 9 years ago by mtbcarroll

comment:35 Changed 9 years ago by jamoore

Mark: can you take stack traces at the hang please?

Changed 9 years ago by mtbcarroll

stack trace during slow import

Changed 9 years ago by mtbcarroll

here's another in case the comparison is useful

comment:36 Changed 9 years ago by sbesson

  • Milestone changed from 5.1.1 to 5.1.2
  • Priority changed from blocker to critical

Retested after mounting squig locally over CIFS and running the same import script as described above - see 12217_logs_2.zip. The first symlink import is much slower than the data copy (7min30 for ln_s vs 2min for copy), but every following symlink import of the same data is significantly faster (~30-40s server-side). This most likely means we are seeing the effect of the cold cache associated with the CIFS mount.

Unmounting and remounting the CIFS mount is sufficient to reproduce the same trend: the first symlink import lasts ~7min30 vs ~50s for all subsequent symlink imports of the same dataset - see 12217_logs_3.zip.

Addressing this issue likely involves some profiling via perf4j to find out why the very first file access via symlink in the PixelData step is so much slower than copying the data locally. Pushing this ticket to 5.1.2.
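(One way to check the cold-cache hypothesis outside OMERO, sketched below with a hypothetical plate directory: read every file under the plate twice, right after (re)mounting the share, and compare the two timings.)

#!/usr/bin/env python3
# Sketch for checking the cold-cache hypothesis outside OMERO: read every file
# under the plate directory twice and compare timings. The directory path is
# hypothetical; run once right after (re)mounting the CIFS share.
import os
import time

PLATE_DIR = "/mnt/cifs/plate"  # hypothetical

def read_all(root):
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                while f.read(1 << 20):
                    pass
    return time.monotonic() - start

print("cold read: %.1fs" % read_all(PLATE_DIR))
print("warm read: %.1fs" % read_all(PLATE_DIR))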

Changed 9 years ago by sbesson

Log files for the MIAS dataset import with/without symlinks with CIFS mount

Changed 9 years ago by sbesson

Log files for the MIAS dataset import with/without symlinks with CIFS mount post unmounting/remounting

comment:37 Changed 9 years ago by jamoore

  • Owner changed from sbesson to mtbcarroll

Mark: does the above sound plausible? And if so, what do we do with this ticket? It doesn't seem to be directly actionable (we have other general performance cards).

comment:38 Changed 9 years ago by mtbcarroll

The timings I quoted tended to be at least for a second import of the same data in the same manner; I tried to control for cold-cache issues, so I wasn't sensitive to them. I'm really not sure; it might be worth asking Roger.

This ticket was useful for capturing a specific reproducible egregious performance issue but, given that I commented,

Trying local NTFS over FUSE again, normal upload took 52s then with ln_s took 58s. So, that's great!

then I suppose it has now served its purpose, and from my side I'm happy as long as something analogous to the 3d44990f I mentioned has actually been merged. Does anyone know what that commit is? Should I retest?

Last edited 9 years ago by mtbcarroll

comment:39 Changed 9 years ago by jamoore

Sadly, I don't know what commit you mean either, Mark. If you'd like to retest, especially with a warm cache, by all means do; as this stands, it's unclear what the next step is.

comment:40 Changed 9 years ago by mtbcarroll

Testing the original issue I reported: without ln_s it takes 53s, with ln_s it takes 59s, so that's close to what the mystery commit achieved. I guess it has somehow made it in, and I am happy: from my point of view, this ticket is resolved.

comment:41 Changed 9 years ago by jamoore

  • Resolution set to fixed
  • Status changed from accepted to closed

Assuming you meant to click "fixed".
