access:access_server_2013_robustification_project

Differences

This shows you the differences between two versions of the page.

access:access_server_2013_robustification_project [2013/04/03 18:36]
mjallison [Possible Solutions]
access:access_server_2013_robustification_project [2013/04/05 19:02] (current)
mjallison [Possible Solutions]
Line 5: Line 5:
 ===== Problems ===== ===== Problems =====
 There are many potential problems with the current Access server architecture. In a rough order of severity they are: There are many potential problems with the current Access server architecture. In a rough order of severity they are:
-  * Complete region failure (rare, but something close to this happened in 2010 or 2011)
+  * Complete region failure (rare, but something close to this happened in 2010 or 2011). Fortunately the S3 buckets are not specific to a region.
   * RDS or EC-2 instance failure (has happened in 2011 and 2012, about once per year)   * RDS or EC-2 instance failure (has happened in 2011 and 2012, about once per year)
   * AWS fabric failure (at least once per year), e.g. S3, network, virtual host failure   * AWS fabric failure (at least once per year), e.g. S3, network, virtual host failure
   * Storage and retrieval failures, mostly experienced by the Agtek Access Java Client   * Storage and retrieval failures, mostly experienced by the Agtek Access Java Client
 +  * Lack of track redundancy, because tracks are stored in instance-specific storage.
   * Potential black hat attacks (mainly on the AccessWeb application)   * Potential black hat attacks (mainly on the AccessWeb application)
 +  * Throughput of operations, which appears to be DB related.
   * Client failures (losing keys)   * Client failures (losing keys)
  
 ===== Possible Solutions ===== ===== Possible Solutions =====
 +
 +AWS Tools
 +   * RDS snapshots are currently taken and retained for the last 3 days (50 GiB). Can restore from a snapshot.
 +   * EC2 snapshots are taken once per day, keeping the last two days (100 GiB). Can restore from a snapshot.
 +   * EBS Snapshots can be moved between regions.
 +   * AWS console operations (new instance, snapshot, etc) can be automated.
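
As a rough illustration of the retention rule described above (keep the last N days of snapshots), here is a minimal Python sketch. The `Snapshot` record and the `split_by_retention` function are hypothetical names made up for this example. It only decides which snapshots fall outside the window; it does not call AWS, and a real tool would read the snapshot list from AWS instead.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical record; real snapshot metadata would come from AWS."""
    snapshot_id: str
    created: datetime


def split_by_retention(snapshots, keep_days, now):
    """Split snapshots into (keep, expire) using a keep_days retention window."""
    cutoff = now - timedelta(days=keep_days)
    keep = [s for s in snapshots if s.created >= cutoff]
    expire = [s for s in snapshots if s.created < cutoff]
    return keep, expire


def demo():
    """Apply a 3-day window to three made-up snapshots."""
    now = datetime(2013, 4, 5, tzinfo=timezone.utc)
    snapshots = [
        Snapshot("snap-a", datetime(2013, 4, 4, tzinfo=timezone.utc)),
        Snapshot("snap-b", datetime(2013, 4, 3, tzinfo=timezone.utc)),
        Snapshot("snap-c", datetime(2013, 4, 1, tzinfo=timezone.utc)),
    ]
    return split_by_retention(snapshots, keep_days=3, now=now)
```

Keeping the decision logic separate from any AWS calls means the retention rule can be checked on its own before it is wired into an automated job.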
  
 Virtual Machine failure recovery strategies Virtual Machine failure recovery strategies
Line 35: Line 43:
       * Real time connection monitoring       * Real time connection monitoring
       * Operation duration       * Operation duration
 +
 +Storage problems:
 +   * Most storage issues appear to be related to the Access Java Client; fix the client.
 +   * Track storage can be moved to S3, increasing the safety of track storage.
 +
 +===== Recommended Solution =====
 +   * TBD: Identify robustness goals, recovery speed, etc. to guide solution selection.
 +   * Attend April 30 AWS conference to get briefed on more AWS tech.
 +   * Document modern (2013) system architecture
 +   * Document the failover process (for manual recovery) and the recovery procedures for each failure mode.
 +   * Modify the AccessSupport tool to automate instance creation (from existing AMI-create snap of existing AMI, reattach EBS), recover EBS from snaps, and repopulate the DB from a backup snap.
 +   * Modify the AccessSupport tool to copy snaps to another region to prepare for the region failover process.
 +   * Consider automatically copying snaps to another region for backup.
 +===== Possible Track items to consider at the same time =====
 +These modifications would likely be made only when we rework a track product.
 +  * Move track storage to S3, integrate with Access Files. ​
 +  * Integrate track api with regular API?
 +  * Drop support for firmware loads on devices (old grey boxes).
 +  * Drop support for TSMAdmin client
 +  * Drop support for TSMAdmin server
 +     * SQL customer tables; assetid, association, gps, rtk, rtktrack, track, vehicle
 +     * SQL customer tables; tsm, tmm, device
 +   * Remove SupportTool tabs for Devices, Trackwork Modules, Trackwork Servers (associated tables if not already present).
 ===== Server Architecture Improvements ===== ===== Server Architecture Improvements =====
 The following areas are routine maintenance items and/or feature requests that need to be done. The timing is right to do these at the same time as the other efforts. The following areas are routine maintenance items and/or feature requests that need to be done. The timing is right to do these at the same time as the other efforts.
Line 41: Line 72:
   * Routine update of AMI Linux server upgrades (security)   * Routine update of AMI Linux server upgrades (security)
   * Possible update of entire Linux AMI (2013-03 variant released).   * Possible update of entire Linux AMI (2013-03 variant released).
 +  * Performance improvements: add indexes to problematic tables (licence, licenseuser, licenselog).
 +  * Add licenselog pruning.  
 +  * File/Folder level permissions. 
 +  * Complete removal of the Box stuff.
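
The index and licenselog pruning items above could look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the `licenseid` and `logged_at` columns, the index name, and the use of an in-memory SQLite database as a stand-in for the production DB, whose schema is not given on this page.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the licenselog table (columns are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE licenselog ("
        "id INTEGER PRIMARY KEY, licenseid INTEGER, logged_at TEXT)"
    )
    # Index on the column used by the pruning query below.
    conn.execute(
        "CREATE INDEX idx_licenselog_logged_at ON licenselog (logged_at)"
    )
    conn.executemany(
        "INSERT INTO licenselog (licenseid, logged_at) VALUES (?, ?)",
        [(1, "2011-06-01"), (1, "2012-11-15"), (2, "2013-03-20")],
    )
    conn.commit()
    return conn


def prune_licenselog(conn, cutoff_iso):
    """Delete log rows older than cutoff_iso (ISO date string); return rows removed."""
    cur = conn.execute(
        "DELETE FROM licenselog WHERE logged_at < ?", (cutoff_iso,)
    )
    conn.commit()
    return cur.rowcount
```

Calling `prune_licenselog(make_demo_db(), "2013-01-01")` removes the two rows dated before 2013 and leaves the most recent one. A real pruning job would also need an agreed retention period, which this page does not specify.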
access/access_server_2013_robustification_project.1365014180.txt.gz · Last modified: 2013/04/03 18:36 by mjallison