User Story #860 (new)

Opened 3 years ago

Last modified 5 months ago

Add ServerErrorEvent subsystem for notification of internal errors

Reported by: jmoore Owned by: jmoore
Priority: critical Milestone: Unscheduled
Component: Services Keywords: errors, exceptions, logging, asynchronous
Cc: atarkowska, cxallan, jburel, jrswedlow Story Points: n.a.
Sprint: n.a. Importance: n.a.
Total Remaining Time: 4.0d Estimated Remaining Time: n.a.

Description (last modified by jmoore)

With more asynchronous logic in the server -- full-text search processing, job processing, etc. -- it's difficult for server administrators to find problems when they only show up in the rather bloated logs.

All asynchronous processing subsystems should start raising a ServerErrorEvent in addition to logging an exception. The event can be handled by multiple listeners. E.g.:

  • A simple LoggingServerErrorEventListener can write a special log file
  • An EmailingServerErrorEventListener can send an email to a specified admin (emails are disabled if the configuration property is set to "", e.g. omero.servererror.email=)
  • A WebAdminServerErrorEventListener could pass the information on to the WebAdmin console, which administrators could check periodically.
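
As a rough illustration of the "one event, multiple listeners" idea above, here is a minimal plain-Java sketch. Only the names ServerErrorEvent, LoggingServerErrorEventListener, EmailingServerErrorEventListener and the property omero.servererror.email come from this ticket; the listener interface, the publisher class, all fields and method names are assumptions made for the example, and the "log file" and "email" are simulated with in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event shape; the fields are assumptions, not an existing server API.
class ServerErrorEvent {
    final String source;    // subsystem that raised the event, e.g. "search"
    final String message;   // short human-readable description
    final Throwable cause;  // the exception that was also logged, may be null

    ServerErrorEvent(String source, String message, Throwable cause) {
        this.source = source;
        this.message = message;
        this.cause = cause;
    }
}

// Hypothetical listener contract shared by all handlers.
interface ServerErrorEventListener {
    void onServerError(ServerErrorEvent event);
}

// Stand-in for the "special log file": lines are collected in memory here.
class LoggingServerErrorEventListener implements ServerErrorEventListener {
    final List<String> lines = new ArrayList<>();

    @Override
    public void onServerError(ServerErrorEvent event) {
        lines.add(event.source + ": " + event.message);
    }
}

// Stand-in for the emailing listener: an empty recipient disables it,
// mirroring "omero.servererror.email=" from the description.
class EmailingServerErrorEventListener implements ServerErrorEventListener {
    final String recipient;
    final List<String> outbox = new ArrayList<>();

    EmailingServerErrorEventListener(String recipient) {
        this.recipient = recipient == null ? "" : recipient.trim();
    }

    boolean isEnabled() {
        return !recipient.isEmpty();
    }

    @Override
    public void onServerError(ServerErrorEvent event) {
        if (!isEnabled()) {
            return; // emails are switched off by configuration
        }
        outbox.add("To: " + recipient + " | " + event.source + ": " + event.message);
    }
}

// Hypothetical publisher that fans one event out to every registered listener.
class ServerErrorEventPublisher {
    private final List<ServerErrorEventListener> listeners = new ArrayList<>();

    void addListener(ServerErrorEventListener listener) {
        listeners.add(listener);
    }

    void raise(ServerErrorEvent event) {
        for (ServerErrorEventListener listener : listeners) {
            try {
                listener.onServerError(event);
            } catch (RuntimeException e) {
                // A failing listener should not prevent the others from running.
                System.err.println("listener failed: " + e);
            }
        }
    }
}
```

An asynchronous subsystem would then call something like publisher.raise(new ServerErrorEvent("search", "index locked", e)) right after logging the exception. Keeping the publisher unaware of what each listener does is what would allow a WebAdmin listener to be added later without touching the subsystems.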

Events which are of importance include:

  • CorruptedFileServerError - when the SHA-1 of a Pixels or an OriginalFile does not match the value in the DB
  • LuceneLockedServerError - some exceptions can leave Lucene in a locked state, making search mostly unusable.
  • NoJobProcessorServerError - if all jobs are failing or not being accepted, then JobHandler is essentially useless. The problem may be that all compute nodes are down.

Perhaps an "error level" can determine, for example, whether or not an email will be sent.
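
One way the "error level" idea could look, sketched in plain Java. The three event names are taken from the list above; the ErrorLevel enum, the ServerError base class, the EmailPolicy class and, in particular, the level assigned to each event are assumptions chosen only to make the example concrete.

```java
// Hypothetical severity scale; declaration order defines the ranking.
enum ErrorLevel { WARNING, ERROR, FATAL }

// Hypothetical common base class carrying the level and a message.
abstract class ServerError {
    final ErrorLevel level;
    final String message;

    ServerError(ErrorLevel level, String message) {
        this.level = level;
        this.message = message;
    }
}

// The levels below are illustrative guesses, not decisions from this ticket.
class CorruptedFileServerError extends ServerError {
    CorruptedFileServerError(String message) {
        super(ErrorLevel.ERROR, message);
    }
}

class LuceneLockedServerError extends ServerError {
    LuceneLockedServerError(String message) {
        super(ErrorLevel.FATAL, message);
    }
}

class NoJobProcessorServerError extends ServerError {
    NoJobProcessorServerError(String message) {
        super(ErrorLevel.WARNING, message);
    }
}

// Decides whether an error is severe enough to warrant an email.
class EmailPolicy {
    private final ErrorLevel threshold;

    EmailPolicy(ErrorLevel threshold) {
        this.threshold = threshold;
    }

    boolean shouldEmail(ServerError error) {
        // Enum.compareTo orders constants by declaration position.
        return error.level.compareTo(threshold) >= 0;
    }
}
```

With new EmailPolicy(ErrorLevel.ERROR), the corrupted-file and Lucene-lock events in this sketch would trigger an email while the no-job-processor event would only be logged. The threshold itself could come from a configuration property next to omero.servererror.email, though no such property is defined in this ticket.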

See:

  • #1840 - notification needs to find new jobs and start processing

References

Change History

Changed 3 years ago by jmoore

  • priority changed from minor to critical
  • owner changed from josh to jmoore
  • milestone changed from Future to 3.0-Beta4

Changed 20 months ago by jmoore

  • milestone changed from OMERO-Beta4 to OMERO-Beta4.1

Too much work for 4.0. Pushing.

Changed 7 months ago by jmoore

  • cc jburel, jrswedlow added
  • milestone changed from Unscheduled to OMERO-Beta4.2

Has not been clearly discussed with the team, but is on the 4.2 roadmap, so moving.

Changed 6 months ago by jmoore

r6190 disables JobNotification (and an annoying exception in the log). This will need to get replaced by some other notification system.

Changed 6 months ago by jmoore

  • description modified

Changed 5 months ago by jmoore

  • milestone changed from OMERO-Beta4.2 to Unscheduled