Task #12217 (closed)
Opened 10 years ago
Closed 9 years ago
RFE: in-place import speed (ln_s)
| Reported by: | mtbcarroll | Owned by: | mtbcarroll |
|---|---|---|---|
| Priority: | critical | Milestone: | 5.1.2 |
| Component: | Import | Version: | 5.0.1 |
| Keywords: | n.a. | Cc: | ux@…, rleigh, fs@…, mlinkert |
| Resources: | n.a. | Referenced By: | https://trello.com/c/EOL8YL14/54-bug-import |
| References: | n.a. | Remaining Time: | n.a. |
| Sprint: | n.a. | | |
Description
At http://pastebin.com/njqqv00A I have pasted my experiences importing a 273 MB MIAS plate from a local NTFS filesystem mounted under Linux via FUSE. A normal command-line import takes just over two minutes; however, when I add --transfer=ln_s, the import takes nearly fourteen minutes.
What's going on? Could accesses to the files somehow be improved, or should we warn in documentation about this side of in-place import? I shudder to think what the slowdown might have been had the filesystem been remotely mounted. (I would do that experiment too, but don't think I have time today!)
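(Editorial note: MIAS plates consist of many small files, so per-file operations can dominate import time. The sketch below is only illustrative, not part of OMERO: it times copying versus symlinking a batch of small stand-in files on a local filesystem. All paths, counts, and sizes are made up.)

```python
import os
import shutil
import tempfile
import time

def make_files(directory, count=200, size=1024):
    """Create `count` small files standing in for a plate's images."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "well_%03d.tif" % i)
        with open(path, "wb") as f:
            f.write(b"\0" * size)
        paths.append(path)
    return paths

def time_transfer(paths, target_dir, transfer):
    """Time copying or symlinking every file into `target_dir`."""
    start = time.time()
    for path in paths:
        dest = os.path.join(target_dir, os.path.basename(path))
        if transfer == "copy":
            shutil.copy(path, dest)
        else:  # symlink, analogous to --transfer=ln_s
            os.symlink(path, dest)
    return time.time() - start

src = tempfile.mkdtemp()
paths = make_files(src)
copy_dir = tempfile.mkdtemp()
link_dir = tempfile.mkdtemp()
print("copy: %.3fs" % time_transfer(paths, copy_dir, "copy"))
print("ln_s: %.3fs" % time_transfer(paths, link_dir, "ln_s"))
```

On a local filesystem both are fast; the slowdown reported in this ticket appears when later reads traverse symlinks into a FUSE/CIFS mount, which this local sketch cannot reproduce.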
Attachments (7)
Change History (48)
comment:1 Changed 10 years ago by jamoore
- Cc fs@… added
- Milestone changed from Unscheduled to 5.0.2
- Owner jamoore deleted
- Priority changed from minor to critical
comment:2 Changed 10 years ago by jburel
Is somebody looking at this? Nobody is assigned.
comment:3 Changed 10 years ago by mtbcarroll
The plate I used was data_repo/from_skyking/mias/frans/siRNA_PRIM1_03102008/001-365700055641.
comment:4 Changed 10 years ago by bpindelski
Tested on an Ubuntu laptop, with the images stored on the desktop. A normal bin/omero import took 2m46.342s; using the in-place logic, 2m43.599s. I can't observe the difference in import times that Mark did.
comment:5 Changed 10 years ago by mtbcarroll
What is the significance of "desktop" here? Is it on a remote mount?
comment:6 Changed 10 years ago by bpindelski
"desktop" means the /home/<user>/Desktop folder. No, it is not on a remote mount; it is on my laptop's built-in HDD, which uses the ext4 FS. My testing only shows that there shouldn't be any import lag due to using in-place import on a local disk. If I had a setup with a remote mount, I'd first verify the speed difference between the local HDD and the mount by copying some big file over. What I'm suggesting is that the combination of NTFS, FUSE and the LSC network might have caused the speed issues.
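(Editorial note: the big-file check suggested above could be sketched as below. The ~273 MB size roughly matches the plate in the original report; the remote path is a placeholder and is left commented out, since it depends on the mount under test.)

```python
import os
import shutil
import tempfile
import time

def time_copy(src, dst):
    """Return the wall-clock seconds taken to copy src to dst."""
    start = time.time()
    shutil.copy(src, dst)
    return time.time() - start

# Create a ~273 MB file of zeros, written in 1 MB chunks.
SIZE_MB = 273
srcpath = os.path.join(tempfile.mkdtemp(), "bigfile")
chunk = b"\0" * (1024 * 1024)
with open(srcpath, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(chunk)

local_dst = os.path.join(tempfile.mkdtemp(), "bigfile")
print("local copy: %.2fs" % time_copy(srcpath, local_dst))
# remote_dst = "/mnt/remote/bigfile"  # placeholder: point at the mount under test
# print("remote copy: %.2fs" % time_copy(srcpath, remote_dst))
```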
comment:7 Changed 10 years ago by jamoore
I'd certainly be interested in knowing if there's a short-list of components that shows the same behavior. But if others have not seen it in practice, I'm probably happy to have this pushed to 5.0.3.
comment:8 Changed 10 years ago by mtbcarroll
I have tried it again in a hopefully easily reproducible fashion:
- I have squig mounted over CIFS.
- I import data_repo/from_skyking/mias/frans/siRNA_PRIM1_03102008/001-365700055641 normally and it takes 2m33.
- I import it again with --transfer=ln_s and it takes 8m49.
Will attach log of session.
Changed 10 years ago by mtbcarroll
comment:9 Changed 10 years ago by mtbcarroll
(This was at work on the LSC wired LAN.)
comment:10 Changed 10 years ago by jamoore
Interesting. Looking at the elapsed times from your log split into a "copy" and a "soft"(link) file with:

```
jamoore@blue:/tmp$ cat s.py
#!/usr/bin/python
import fileinput
import datetime
import time

# ['2014-05-08', '11:58:22,463', '528809']
previous = None
for line in fileinput.input():
    parts = line.split(" ")
    t1 = " ".join(parts[0:2])
    t2 = time.strptime(t1, "%Y-%m-%d %H:%M:%S,%f")
    if previous != None:
        print "elapsed: %4s " % \
            ((t2.tm_min - previous.tm_min) * 60 +
             (t2.tm_sec - previous.tm_sec)),
    print line,
    previous = t2
```
we see:
```
jamoore@blue:/tmp$ ./s.py soft.log | grep Step:
elapsed:    3  2014-05-08 11:50:24,043 50389 [l.Client-1] INFO ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5 Logfile: 207
elapsed:  202  2014-05-08 11:53:46,127 252473 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5 Logfile: 207
elapsed:  275  2014-05-08 11:58:21,796 528142 [l.Client-4] INFO ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5 Logfile: 207
elapsed:    1  2014-05-08 11:58:22,147 528493 [l.Client-7] INFO ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5 Logfile: 207
elapsed:    0  2014-05-08 11:58:22,172 528518 [l.Client-4] INFO ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5 Logfile: 207
jamoore@blue:/tmp$ ./s.py copy.log | grep Step:
elapsed:    3  2014-05-08 11:47:20,683 59062 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5 Logfile: 44
elapsed:   39  2014-05-08 11:47:59,735 98114 [l.Client-1] INFO ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5 Logfile: 44
elapsed:   46  2014-05-08 11:48:45,028 143407 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5 Logfile: 44
elapsed:    1  2014-05-08 11:48:46,654 145033 [l.Client-1] INFO ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5 Logfile: 44
elapsed:    0  2014-05-08 11:48:46,679 145058 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5 Logfile: 44
```
Frequencies of times are:
```
jamoore@blue:/tmp$ ./s.py soft.log | cut -b9-13 | sort -n | uniq -c
    393  0
     42  1
      2  2
      2  3
      1  202
      1  275
```
and
```
jamoore@blue:/tmp$ ./s.py copy.log | cut -b9-13 | sort -n | uniq -c
    383  0
     52  1
      1  2
      2  3
      1  39
      1  46
```
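(Editorial note: for reference, the same elapsed-time analysis in Python 3, using datetime subtraction so that intervals spanning an hour boundary are also handled; the original script differences only the minutes and seconds fields.)

```python
#!/usr/bin/env python3
from datetime import datetime

def elapsed_seconds(lines):
    """Yield (elapsed, line) pairs; elapsed is None for the first line."""
    previous = None
    for line in lines:
        parts = line.split(" ")
        # e.g. "2014-05-08 11:58:22,463" -> datetime
        stamp = datetime.strptime(" ".join(parts[0:2]),
                                  "%Y-%m-%d %H:%M:%S,%f")
        elapsed = (None if previous is None
                   else (stamp - previous).total_seconds())
        yield elapsed, line
        previous = stamp

if __name__ == "__main__":
    import fileinput
    for elapsed, line in elapsed_seconds(fileinput.input()):
        if elapsed is not None:
            print("elapsed: %4d " % elapsed, end="")
        print(line, end="")
```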
comment:11 Changed 10 years ago by jamoore
Sorry, that should have printed out the *previous* lines, so for soft:
```
elapsed:  202  2014-05-08 11:50:24,043 50389 [l.Client-1] INFO ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5 Logfile: 207
elapsed:  275  2014-05-08 11:53:46,127 252473 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5 Logfile: 207
```
and copy
```
elapsed:   39  2014-05-08 11:47:20,683 59062 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5 Logfile: 44
elapsed:   46  2014-05-08 11:47:59,735 98114 [l.Client-1] INFO ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5 Logfile: 44
```
comment:12 Changed 10 years ago by mtbcarroll
A concern is what else might also take rather longer: export? Caching of big image tiles? Etc.
comment:13 Changed 10 years ago by jamoore
What's the next step here? Is anyone still looking into this? (If so, please take the ticket). Do we need to count this as a blocker? If so, speak up post-haste.
comment:14 Changed 10 years ago by mlinkert
- Cc mlinkert added
I'd argue that this falls under the category of "slow things are slow", and so probably shouldn't block 5.0.2. #6586 is likely related. Push to 5.0.3 and then take the time to properly sort out what else might be affected?
comment:15 Changed 10 years ago by mtbcarroll
Added warning in https://github.com/openmicroscopy/ome-documentation/pull/780.
comment:16 Changed 10 years ago by jamoore
- Milestone changed from 5.0.2 to 5.0.3
- Priority changed from critical to blocker
Pushing to 5.0.3 and elevating so that we decide on a course of action.
comment:17 Changed 10 years ago by jburel
- Referenced By set to https://trello.com/c/EOL8YL14/54-bug-import
comment:18 Changed 10 years ago by mlinkert
Worth seeing if https://github.com/openmicroscopy/openmicroscopy/pull/2758 has any effect on this?
comment:19 Changed 10 years ago by mtbcarroll
With that PR and the Bio-Formats PR it references merged in, I repeated the experiment from https://trac.openmicroscopy.org.uk/ome/ticket/12217#comment:8: without --transfer=ln_s it took 2m08; with --transfer=ln_s it took 6m54.
comment:20 Changed 10 years ago by jamoore
If the only way to improve this further is to somehow reduce the number of symlink lookups, is there any chance of tackling this in 5.0.3, Melissa? If not, we should probably push to 5.1 at this point.
comment:21 Changed 10 years ago by mlinkert
I'm not likely to be able to do any serious work on this for 5.0.3, so +1 to pushing given there is already an improvement.
comment:22 Changed 10 years ago by jamoore
- Milestone changed from 5.0.3 to 5.1.0-m1
- Summary changed from RFE: in-place import speed to RFE: in-place import speed (ln_s)
comment:23 Changed 10 years ago by bpindelski
Tested with http://ci.openmicroscopy.org/view/5.1/job/OMERO-5.1-merge-build/ICE=3.4,label=ome-c6100-3/245/.
Uploads using mias from ~/Desktop/import/testing/mias/siRNA_PRIM1_03102008 to /OMERO:
- without ln_s - 1m49.904s
- with ln_s - 1m37.502s
Uploads using mias from /Volumes/bpindelski/WINDOWS/mias/frans/siRNA_PRIM1_03102008 (smbfs) to /OMERO:
- without ln_s - 1m49.580s
- with ln_s - 1m57.512s
It seems this issue doesn't exist on develop. Is it worth closing the ticket?
comment:24 Changed 10 years ago by mtbcarroll
Will re-test when not blocked by #12530.
comment:25 Changed 10 years ago by mtbcarroll
I vote against closing: Using JAVA_OPTS=-Domero.import.active_close=false from http://trac.openmicroscopy.org.uk/ome/ticket/12530#comment:4 and repeating the experiment from https://trac.openmicroscopy.org.uk/ome/ticket/12217#comment:8, without --transfer=ln_s it took 1m50, with --transfer=ln_s it took 8m11.
If this ticket does get fixed, we may want to adjust the language on http://www.openmicroscopy.org/site/support/omero5.1-staging/sysadmins/in-place-import.html about plate imports possibly running "rather more slowly", though it may remain wise to encourage relative performance profiling.
comment:26 Changed 10 years ago by jamoore
- Milestone changed from 5.1.0-m1 to 5.1.0
comment:27 Changed 9 years ago by mtbcarroll
https://github.com/openmicroscopy/bioformats/pull/1522 shaves a couple of minutes off that time difference.
comment:28 Changed 9 years ago by jamoore
- Milestone changed from 5.1.0 to 5.1.1
comment:29 Changed 9 years ago by sbesson
- Owner set to sbesson
- Status changed from new to accepted
comment:30 Changed 9 years ago by sbesson
Tested with today's merge build on the Linux integration server. I ran the import of the MIAS plate linked above with/without symlinks and with/without skipping of import steps. Each import was run 5 consecutive times; the mean import time, including transfer and processing, is reported below:
| Import type | Skip | Mean import time (+/- SD) |
|---|---|---|
| copy | | 1min15s (+/- 7s) |
| copy | all | 33s (+/- 2s) |
| symlink | | 1min13s (+/- 2s) |
| symlink | all | 31s (+/- 2s) |
So it seems the symlink overhead has been worked around, at least for this type of format, on 5.1.x. Additionally, we now expose a fast import mode that skips data access in the PIXELDATA import step. Unless we have reproducible metrics to work against, I suggest we close this ticket.
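(Editorial note: the summary statistics above, a mean over 5 runs plus/minus the standard deviation, can be computed with a small helper like the following. The run times below are placeholder values for illustration, not measurements from this ticket.)

```python
import statistics

def summarize(seconds):
    """Return (mean, sample standard deviation) of a list of run times."""
    return statistics.mean(seconds), statistics.stdev(seconds)

runs = [75.0, 70.0, 82.0, 68.0, 80.0]  # placeholder seconds for five runs
mean, sd = summarize(runs)
print("mean %ds (+/- %ds)" % (mean, sd))
```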
comment:31 Changed 9 years ago by mtbcarroll
On what kind of filesystem mount was the MIAS plate?
comment:32 Changed 9 years ago by sbesson
This is squig over NFS (as mounted by default on the CI node).
comment:33 Changed 9 years ago by mtbcarroll
And it looks like ln_s instead of normal upload made almost no difference? I'll try re-testing locally.
comment:34 Changed 9 years ago by mtbcarroll
Using 3d44990f on my work laptop with my wired ethernet card in CTIR:
In an import test of squig over CIFS, normal upload took 1m56 then with ln_s took ... well, I've been looking at "METADATA_IMPORTED Step: 1 of 5" for over an hour now and network activity on my laptop remains heavy.
Mount options are:

```
rw,nosuid,nodev,relatime,vers=1.0,cache=strict,username=mtbcarroll,domain=LIFESCI-AD,uid=1000,forceuid,gid=1000,forcegid,addr=134.36.66.3,file_mode=0664,dir_mode=0775,nounix,serverino,rsize=61440,wsize=65536,actimeo=1
```
Trying local NTFS over FUSE again, normal upload took 52s then with ln_s took 58s. So, that's great!
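(Editorial note: if the suspicion is that per-symlink metadata lookups are the bottleneck, note the `actimeo=1` in the mount options above, which expires the CIFS attribute cache after one second. A quick way to quantify per-lookup latency on any path is to time repeated `lstat` calls; this is purely a diagnostic sketch, not part of the import code, and the CIFS path is a placeholder.)

```python
import os
import tempfile
import time

def lstat_latency(path, n=1000):
    """Average seconds per os.lstat() call on `path` over n iterations."""
    start = time.time()
    for _ in range(n):
        os.lstat(path)
    return (time.time() - start) / n

# Compare a local file against a path on the mount under investigation.
local = tempfile.NamedTemporaryFile(delete=False).name
print("local lstat: %.1f us" % (lstat_latency(local) * 1e6))
# print("cifs lstat: %.1f us" % (lstat_latency("/mnt/squig/somefile") * 1e6))
```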
comment:35 Changed 9 years ago by jamoore
Mark: can you take stack traces at the hang please?
comment:36 Changed 9 years ago by sbesson
- Milestone changed from 5.1.1 to 5.1.2
- Priority changed from blocker to critical
Retested after mounting squig locally over CIFS and running the same import script as described above; see 12217_logs_2.zip. The first symlink import is much slower than the data copy (7min30 for ln_s vs 2min for copy), but every following symlink import of the same data is significantly faster (~30-40s server-side). Most likely this means we are hitting the cold cache associated with the CIFS mount.
Unmounting and remounting the CIFS mount is sufficient to reproduce the same trend: first symlink import lasts ~7min30 vs ~50s for all subsequent symlink imports of the same dataset - see 12217_logs_3.zip.
Addressing this issue likely involves some profiling via perf4j to find out why the very first file access via symlink in the PixelData step is so much slower than copying the data locally. Pushing this ticket to 5.1.2.
Changed 9 years ago by sbesson
Log files for the MIAS dataset import with/without symlinks with CIFS mount
Changed 9 years ago by sbesson
Log files for the MIAS dataset import with/without symlinks with CIFS mount post unmounting/remounting
comment:37 Changed 9 years ago by jamoore
- Owner changed from sbesson to mtbcarroll
Mark: does the above sound plausible? And if so, what do we do with this ticket? It doesn't seem to be directly actionable (we have other general performance cards).
comment:38 Changed 9 years ago by mtbcarroll
The timings I quoted tended to be at least for a second import of the same data in the same manner; I tried to control for cold-cache issues, so I wasn't sensitive to them. I'm really not sure; it might be worth asking Roger.
This ticket was useful for capturing a specific reproducible egregious performance issue but, given that I commented,
Trying local NTFS over FUSE again, normal upload took 52s then with ln_s took 58s. So, that's great!
then I suppose it has now served its purpose, and from my side I'm happy if something analogous to the 3d44990f commit I mentioned was actually merged. Does anyone know what it is? Should I retest?
comment:39 Changed 9 years ago by jamoore
I sadly don't know which commit you mean either, Mark. If you'd like to retest, especially with a warm cache, by all means do; it's just that, as this stands, it's unclear what the next step is.
comment:40 Changed 9 years ago by mtbcarroll
Testing the original issue I reported, without ln_s it takes 53s, with ln_s it takes 59s, so that's close to what the mystery commit achieved. I guess it is thus somehow in, and I am happy: from my point of view, this ticket is resolved.
comment:41 Changed 9 years ago by jamoore
- Resolution set to fixed
- Status changed from accepted to closed
Assuming you meant to click "fixed".