19/08/2012

--== RE-UPLOADING ALL FILES ==--

Goo.im is moving to new servers, so I need to re-upload all of my files...

The Good:
  • New distribution servers: More redundancy and download bandwidth.
  • New web server: Better uptime on the site and services.
  • New file server: MOAR SPACE for hosting files.
  • New switch: A new switch and fiber modules will let us move from our current uplink to a fiber link, which we can upgrade all the way up to 10 Gbps before we need more hardware.
The Bad:
  • The switch module that lets the fiber SFPs hook in has yet to arrive due to shipping errors beyond our control. This means that until it arrives, and Snipa can install it, we're still on our current uplink. Current schedule is for the module, SFPs, and fiber line to all be hooked up in about two weeks.
  • Because of how the system is now set up with the additional servers, we can no longer have separate account names and folder names for developers. We are looking into a workaround, but in the meantime each developer's folder in /devs/ will have to match their username.
The Ugly:
We had an issue with our Lustre cluster. Ours is laid out with a metadata server (MDS) and two object storage targets/servers (OSTs, or basically, file servers). You can think of the MDS as being like the MBR or index of a hard drive: it lists where every file is on the OSTs and holds the metadata, and if it gets wiped out, recovery of the data is nigh-impossible. As it turns out, bringing the new OST online caused a race condition in the MDS: it thought the new OST was the old one and that all the data was missing, and it started wiping out the data. By the time everything was over, the cluster was up and running normally, but all the metadata was gone. Because of that, all of the files were inaccessible.
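To picture why losing the metadata makes the files unreachable even though the bytes are still on disk, here is a rough toy model in Python. It is only an illustration, not how Lustre actually stores or places data: the `ToyCluster` class, the round-robin placement, and the `/devs/example/rom.zip` path are all made up for the example.

```python
# Toy model only: this is NOT Lustre's real format, placement policy, or API.
import uuid


class ToyCluster:
    def __init__(self, num_targets):
        # Each "target" maps an opaque object ID to bytes.
        # It knows nothing about file names or folders.
        self.targets = [dict() for _ in range(num_targets)]
        # The "metadata" index maps a path to (target index, object ID).
        self.metadata = {}

    def write(self, path, data):
        # Naive round-robin placement (an assumption for this sketch).
        target = len(self.metadata) % len(self.targets)
        object_id = uuid.uuid4().hex
        self.targets[target][object_id] = data
        self.metadata[path] = (target, object_id)

    def read(self, path):
        # Raises KeyError if the path is no longer in the metadata index.
        target, object_id = self.metadata[path]
        return self.targets[target][object_id]


cluster = ToyCluster(num_targets=2)
cluster.write("/devs/example/rom.zip", b"zip bytes")  # hypothetical path
assert cluster.read("/devs/example/rom.zip") == b"zip bytes"

# Simulate losing the metadata index: the objects are still stored,
# but nothing says which object belongs to which file name.
cluster.metadata.clear()
objects_still_stored = sum(len(t) for t in cluster.targets)
```

After the `clear()`, `objects_still_stored` is still 1, but `cluster.read(...)` fails because nothing maps the path to its object any more.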

To give you an idea, imagine trying to find a specific quote from a certain page in a particular book inside all of the libraries of the world, but with no librarians, no index cards, no computer index, nothing. Just you and endless rows of books. That's what it would be like to try and pull out a ZIP from almost a dozen terabytes of storage space. On top of that, we did not have the means nor budget to have a backup system for all of the data that was hosted. Don't worry, we have already budgeted for a backup system, and will be implementing it likely in the next round of upgrades or sooner to prevent just such an issue in the future. Unlike some folks, we learn from our mistakes.

We were able to restore a backup of the site itself, along with all developer and sponsor accounts and the GApps. We're working with developers to help them get their data re-uploaded from mirrors, and all developers can now resume uploading and restoring their data. However, we do not have any backups for some folders, such as /stock. We will try to find as many of those files as we can around the web and restore them, but it will take a while, and it's likely that we will not be able to restore all of them.

The buildbox, at the very least, was untouched, so no data is missing from it and developers can continue to use it normally. On top of that, this has given us a chance for a clean start: we can wipe out old, unused files and completely reorganize, with all developers located under the /devs folder and all of them having full compatibility with GooManager from now on, so long as they make their ROMs compatible.

We apologize for the disruption this data loss has caused. We picked Lustre specifically because it was designed to be very reliable and to provide high throughput to all of our distribution servers. Despite this setback, Goo.im will continue to operate, and we will work to restore what we can.

