Differences

This shows you the differences between two versions of the page.

--- access:access_server_2013_robustification_project [2013/04/03 17:35]
mjallison [Possible Solutions]
+++ access:access_server_2013_robustification_project [2013/04/05 19:02] (current)
mjallison [Possible Solutions]
@@ Line 5: / Line 5: @@
 ===== Problems =====
 There are many potential problems with the current Access server architecture. In a rough order of severity they are:
+  * Complete region failure (rare, but something close to this happened in 2010 or 2011). Fortunately the S3 buckets are not specific to a region.
   * RDS or EC2 instance failure (has happened about once per year, in 2011 and 2012)
   * AWS fabric failure (at least once per year), e.g. S3, network, virtual host failure
   * Storage and retrieval failures, mostly experienced by the Agtek Access Java Client
+  * Lack of track redundancy, because tracks are stored in instance-specific storage.
+  * Potential black hat attacks (mainly on the AccessWeb application)
+  * Throughput of operations, which appears to be DB-related.
   * Client failures (losing keys)
 ===== Possible Solutions =====
+AWS Tools
+   * RDS snapshots are currently taken and retained for the last 3 days (50 GiB). Can restore from a snapshot.
+   * EC2 snapshots are taken once per day and retained for the last two days (100 GiB). Can restore from a snapshot.
+   * EBS Snapshots can be moved between regions.
+   * AWS console operations (new instance, snapshot, etc) can be automated.
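The retention windows noted above (last 3 days for RDS, last two days for EC2) could be enforced by a small pruning step once console operations are automated. This is a minimal sketch only: the `(snapshot_id, created_at)` pair format, the snapshot ids, and the function name are assumptions for illustration, not what AWS returns or what the AccessSupport tool does today.

```python
from datetime import datetime, timedelta


def snapshots_to_prune(snapshots, keep_days, now):
    """Return the ids of snapshots that fall outside the retention window.

    `snapshots` is a list of (snapshot_id, created_at) pairs. This shape is
    hypothetical and chosen only to keep the example self-contained.
    """
    cutoff = now - timedelta(days=keep_days)
    return [snap_id for snap_id, created_at in snapshots if created_at < cutoff]


# Made-up snapshot records for illustration.
now = datetime(2013, 4, 5, 19, 0)
rds_snaps = [
    ("rds-snap-a", datetime(2013, 4, 1, 3, 0)),
    ("rds-snap-b", datetime(2013, 4, 3, 3, 0)),
    ("rds-snap-c", datetime(2013, 4, 5, 3, 0)),
]
expired = snapshots_to_prune(rds_snaps, keep_days=3, now=now)
```

Keeping the selection logic separate from the actual delete call makes it possible to review the list of expired snapshots before anything is removed.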
 Virtual Machine failure recovery strategies
@@ Line 17: / Line 28: @@
       * built into the AccessSupport tool (NOT keyforge)
       * Scripting is easy, but not easy to transfer the skills to another person
       * Building into the support tool makes them easy to use, but not as easy to adapt for future changes.
+Recreate RDS, EC2 constellation in new region
+   * Quick instance recreation (as in last section): allow region specifier
+   * Migrate EBS / RDS instances: cross-region migration/snapshot.
+Security issues:
+   * Implement HTTPS for the web application
+   * Add a security analyzer to look for anomalies and send alerts
+   * Include failed attempts (404, 501, bad login) in the automated security analysis
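One way the failure-based part of the security analysis might work is to count failed requests per client and flag clients that cross a threshold. The sketch below assumes a hypothetical event format (dicts with `client`, `status`, and `login_ok` keys) and an arbitrary threshold; neither comes from the existing AccessWeb logs.

```python
from collections import Counter

# Status codes named in the list above.
FAILURE_STATUSES = {404, 501}


def suspicious_clients(events, threshold):
    """Return client addresses with at least `threshold` failed attempts.

    Each event is a dict with hypothetical keys: "client", "status", and
    "login_ok" (absent or None when the request was not a login attempt).
    """
    failures = Counter()
    for event in events:
        bad_status = event["status"] in FAILURE_STATUSES
        bad_login = event.get("login_ok") is False
        if bad_status or bad_login:
            failures[event["client"]] += 1
    return sorted(c for c, count in failures.items() if count >= threshold)


# Made-up events using documentation-only IP addresses.
events = [
    {"client": "203.0.113.7", "status": 404},
    {"client": "203.0.113.7", "status": 404},
    {"client": "203.0.113.7", "status": 401, "login_ok": False},
    {"client": "192.0.2.10", "status": 200, "login_ok": True},
    {"client": "192.0.2.10", "status": 404},
]
flagged = suspicious_clients(events, threshold=3)
```

The returned list is what an alerting step would act on; how alerts are sent is left open here.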
+Monitoring:
+   * Extend real-time monitoring goals to include:
+      * Real-time connection monitoring
+      * Operation duration
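Operation duration could be captured by wrapping each server operation in a timer. The server itself is Java, so the Python below is only a sketch of the idea; the `timed` helper, the `durations` store, and the operation name are hypothetical, and connection monitoring is not covered.

```python
import time
from contextlib import contextmanager

# Operation name -> list of elapsed times in seconds.
durations = {}


@contextmanager
def timed(operation, clock=time.perf_counter):
    """Record how long the wrapped block takes under the given operation name."""
    start = clock()
    try:
        yield
    finally:
        # Record the duration even if the operation raised an exception.
        durations.setdefault(operation, []).append(clock() - start)


with timed("file_upload"):
    sum(range(1000))  # stand-in for a real server operation
```

Recording in a `finally` clause means failed operations are timed too, which is often where the slow cases show up.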
+Storage problems:
+   * Most storage issues appear to be related to the Access Java Client; fix the client.
+   * Track storage can be moved to S3, increasing the safety of track storage.
+===== Recommended Solution =====
+   * TBD: Identify robustness goals, recovery speed, etc. to guide solution selection.
+   * Attend April 30 AWS conference to get briefed on more AWS tech.
+   * Document modern (2013) system architecture
+   * Document failover process (for manual recovery), recovery procedures for failure modes.
+   * Modify AccessSupport tool to automate instance creation (from an existing AMI: create a snap of the existing AMI, reattach EBS), recover EBS from snaps, and repopulate the DB from a backup snap.
+   * Modify AccessSupport tool to copy snaps to another region to prep for region failover process.
+   * Consider automatically copying snaps to another region for backup.
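Copying snaps to another region could be scripted along the following lines. This is a sketch assuming a boto3-style EC2 client bound to the destination region, whose `copy_snapshot` call takes `SourceRegion` and `SourceSnapshotId`. The client is passed in so the logic can run against a stub; the region name and snapshot ids are made up.

```python
def copy_snapshots_to_region(dest_ec2, source_region, snapshot_ids):
    """Copy EBS snapshots into the region that `dest_ec2` is bound to.

    `dest_ec2` is expected to behave like a boto3 EC2 client created for the
    destination region. Returns a mapping of source id -> new snapshot id.
    """
    copied = {}
    for snapshot_id in snapshot_ids:
        response = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description="failover copy of " + snapshot_id,
        )
        copied[snapshot_id] = response["SnapshotId"]
    return copied


class StubEC2:
    """Stand-in for an EC2 client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def copy_snapshot(self, **kwargs):
        self.calls.append(kwargs)
        return {"SnapshotId": "snap-copy-%d" % len(self.calls)}


# Hypothetical region and snapshot ids.
stub = StubEC2()
result = copy_snapshots_to_region(stub, "us-west-1", ["snap-1111", "snap-2222"])
```

With real credentials, the stub would be replaced by a client created for the backup region; the returned mapping is what a failover procedure would need to find the copies.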
+===== Possible Track items to consider at the same time =====
+These modifications would likely be made only when we rework a track product.
+  * Move track storage to S3, integrate with Access Files.
+  * Integrate track api with regular API?
+  * Drop support for firmware loads on devices (old grey boxes).
+  * Drop support for TSMAdmin client
+  * Drop support for TSMAdmin server
+     * SQL customer tables: assetid, association, gps, rtk, rtktrack, track, vehicle
+     * SQL customer tables: tsm, tmm, device
+   * Remove SupportTool tabs for Devices, Trackwork Modules, Trackwork Servers (associated tables if not already present).
 ===== Server Architecture Improvements =====
-The following areas are routine maintenance items and/or feature requests that need to be done.
+The following areas are routine maintenance items and/or feature requests that need to be done. The timing is right to do these at the same time as the other efforts.
   * Upgrade the server infrastructure to the latest Java 7
   * Add wildcard search to admin api for users
   * Routine update of AMI Linux server upgrades (security)
   * Possible update of entire Linux AMI (2013-03 variant released).
+  * Performance improvements: add index to problematic tables (licence, licenseuser, licenselog).
+  * Add licenselog pruning.
+  * File/Folder level permissions.
+  * Complete removal of the Box stuff.
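The index and licenselog pruning items might look roughly like this. The sketch uses an in-memory SQLite database as a stand-in for the RDS database; the `logged_at` and `message` columns, the index name, and the cutoff date are assumptions, since the real licenselog schema is not described here.

```python
import sqlite3

# In-memory stand-in for the production database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE licenselog (id INTEGER PRIMARY KEY, logged_at TEXT, message TEXT)"
)
# An index on the date column keeps both lookups and pruning from scanning the table.
conn.execute("CREATE INDEX idx_licenselog_logged_at ON licenselog (logged_at)")
conn.executemany(
    "INSERT INTO licenselog (logged_at, message) VALUES (?, ?)",
    [("2012-01-15", "old"), ("2013-03-20", "recent"), ("2013-04-04", "recent")],
)


def prune_licenselog(connection, cutoff_date):
    """Delete log rows older than `cutoff_date` (ISO date string); return the count."""
    cursor = connection.execute(
        "DELETE FROM licenselog WHERE logged_at < ?", (cutoff_date,)
    )
    connection.commit()
    return cursor.rowcount


removed = prune_licenselog(conn, "2013-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM licenselog").fetchone()[0]
```

ISO-formatted date strings sort in date order, which is why a plain string comparison works for the cutoff in this sketch.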

AGTEK R & D Wiki
