Here’s the full story of how TheExperiment and the other Pilot-hosted sites went down. What follows is a play-by-play account of how Qwest deals with its home office business customers.
In summary: I forgot to pay my bill and they turned my service off. That same day I called in a payment and was told that “service should be restored within 2 to 24 hours.”
February 5, 2001: Late at night, while doing the bills, I noticed that my Qwest bill was late and that I had already been sent a temporary disconnect notice. I set the bill aside and made a note to take care of it first thing in the morning before going to work. Little did I know that this would be the beginning of one of the biggest computer technology fiascos of my life.
February 6, 2001: On my way to work that morning, I noticed that my net connection looked like it was down. I immediately called Qwest to find out what had happened. They told me that, due to non-payment of the bill, they had disconnected my phone line. I apologized for the late payment and paid the full amount due over the phone, which at that time was nearly $600. They assured me that the phone service would be restored promptly, which I was told was within 2 to 24 hours from the time the bill was paid.
February 6, 2001: After about 12 hours, I got concerned that the phone service was not yet restored, so I called in to check on the status of my service. I was told that service usually takes 2 to 24 hours to be restored and that I should call back the following morning.
February 7, 2001: It was now over 27 hours since the original disconnect, so I called to check on the status of my account. This time I was told that, because I had DSL service in addition to residential telephone service, it would take 3 business days to restore the service. It being a Wednesday, the earliest that service could be restored would be the following Monday. I explained that my DSL service was very important to me and that I would do anything they needed in order to restore service as fast as possible. The operator placed me on hold for several minutes while consulting with “higher powers.” After they returned, they told me that the earliest that service could be restored on a “non-pay” was 3 business days. I accepted my admonishment and resigned myself to wait until Monday for my service to be restored. I felt that since this would still be within 5 days of the original service interruption, I would get my servers back online just in time to avoid everyone’s email bouncing. In the meantime, I took advantage of the downtime to back up files between computers and make preparations for moving some of the core services from my older server to the newer, faster machine.
February 8, 2001: After thinking about the whole situation over the following day, I decided to call back and see if talking to a different operator might result in a different experience. So I called and explained my entire situation, including the late bill payment, etc., to the operator, and they graciously offered to see what they could do to help expedite the restoration of service. I was told that my telephone service could be restored by the end of the day and my DSL service could be restored as early as Friday morning! I was ecstatic at the news and conveyed my gratitude to the Qwest operator. I was shocked that simply by calling back and getting a different operator I could receive such a different quality of service.
February 8, 2001: When I returned from work on Thursday at about 6PM, sure enough, my phone service was restored! However, I still had no WAN connectivity with my DSL modem, indicating that my DSL service had not yet been restored. Since the operator had said they would begin the restoration of the DSL on Friday morning, I waited with anticipation until the following day.
February 9, 2001: I received a call at about 8:45AM on Friday morning from the same operator who had helped me restore my telephone service the previous day. They explained that there was a disconnect order that had been “placed in the system” on the same day that the telephone service was temporarily turned off, even though I had paid the bill that very same day. It came out in conversation that the operator who accepted my credit card payment was supposed to have canceled any pending disconnect orders at that time. However, since the various “systems” of Qwest are apparently so poorly interconnected, the group that would have canceled the disconnect order doesn’t get its records updated to show that the bill was paid until the following day. Qwest tells us to “Ride the light” in their ads; what the hell are they riding? Camels? The upshot of this is that my DSL line had been completely “torn down” and even had the wire pairs reassigned to other circuits, not to mention my static IP addresses. The only way to restore DSL service at this point, I am told, is to become a brand new Qwest customer again and wait the customary 14 days for service to be installed. Graciously, they offer to waive the $70 installation fee, since there won’t really be much to install. After much wrangling, haranguing and hand wringing, I find that there is nothing that I can do to make this giant bureaucracy move any faster. They even explain to me that if they gave me preferential treatment over other customers, they would be fined by the FCC. Aw, too bad.
February 10, 2001: After explaining my horrific situation to my neighbor Shane, he offered up a brilliant solution. Since he is already a Qwest DSL subscriber, he suggests that we simply bridge our two networks and that I run behind his proxy! It was pure genius on his part and since I was going to be moving into his apartment, after he moved out, it would make the transition even easier when that day arrived. So I ran into work to borrow one of our spare spools of CAT-5 cable. I returned and we were able to string a long cable between our two houses without any difficulty at all. I connected my end of the cable to the uplink port on my hub, where my DSL modem had been connected. At Shane’s house we connected to one of the free ports on his main hub and voila! Now things weren’t totally peachy just yet. I was online, but I was behind a proxy and didn’t have a “real” IP address. So I agreed to take over the billing on Shane’s telephone service and pay for the upgrade to “OfficeWorks” so we could get static IP addresses. I remember having to do this myself over a year ago when I first established DSL service with Qwest (then USWest). As I recall it only took 24 hours for the change in service to occur when USWest was running things. So naturally I assumed that after a brief delay of 24 hours, we would be able to have static IP addresses and I could start updating the DNS records on my server (and at Network Solutions) accordingly. Shane and I tried to contact Qwest that night to upgrade the service to “OfficeWorks” but the business office was closed on weekends and we would have to call on Monday morning to do the upgrade.
February 11, 2001: On Sunday morning Shane came over to my house with the most peculiar expression on his face. A mix of fear/disgust/anger/amusement/sadness/incomprehension… I asked him what was wrong and before he could even open his mouth, it dawned on me. Qwest had disconnected his DSL service too! We were both in a state of utter shock. It was so infuriating that our anger could only be transmogrified into laughter. We just could not believe the comedy of errors and incompetence exhibited by Qwest in the handling of our affairs. Shane assured me that he would call Qwest first thing Monday morning and do whatever was necessary to restore his DSL service again.
February 12, 2001: Shane called me at work in the afternoon to inform me that Qwest assured him that DSL service would be restored by the end of the day. Sure enough, at the end of the day, I return from work and we’re all online again! Now the next order of business was ordering the static IP addresses. We call Qwest that night and order the “OfficeWorks” package and are told that the upgrade would take between 2 hours and 5 business days to become effective. Incredible as it may seem, Qwest doesn’t seem to know its own internal processes well enough to predict the delivery of a service upgrade with any degree of accuracy at all.
February 13, 2001: Anxious about the wide margin of error in the estimate of when the OfficeWorks upgrade will kick in, I call Qwest to check up on the status of our order. To my shock and alarm, I am told that there is no record of the order “in the system.” I wonder to myself, “What the fuck is wrong with this company? How can they possibly be successful yet operate like this?” I am told that I am not authorized to make changes to this account, which I assure the operator is not my intention. I reiterate my statement that “I am checking on the status of an order that was placed yesterday. I am not changing anything.” After being placed on hold for several minutes, the operator returns and confirms that there is indeed no order in “the system” for the “OfficeWorks” upgrade. The operator then proceeds to make the required entry in “the system” for the “OfficeWorks” upgrade to take place. (Remember, I was just told that I am not authorized to make changes in service for this account.) I am then told that the upgrade will take 7 business days. This means that I won’t even be able to lease static IP addresses until the 21st of February. At this point my mind is completely exhausted from exploring the myriad ways in which things could not have gone worse, so I can’t come up with anything clever to say.
February 14, 2001: After considering this whole fiasco, I started searching around for a way to get my Internet connectivity through a channel other than the worst phone company in the world (as I felt Qwest was at this point). I called a couple of local ISPs that provide “colocation” services. That way, I could put a couple of my server boxes on a shelf in a phone room somewhere and they would be online 24/7. The starting price for leasing a shelf at most ISPs is $200 a month with a one-year contract, and that doesn’t include bandwidth. One usually would pay on the order of $700 a month for the 1 megabit per second service that I was getting from my DSL for $200 a month. I decided that this was more money than I was prepared to pay to keep my systems online, for what is a largely philanthropic enterprise. After thinking about other possibilities, I managed to convince a friend of mine at work (the senior network administrator) to allow me to temporarily house one of my server machines (my old, slow machine) within their network until this whole Qwest fiasco blew over. We configured the DNS records on my box and the DNS server at work. We assigned the IP addresses, configured the firewall, tested, fixed, tested et voila! I have a single static IP address that can be used to service SMTP and DNS requests. This is all I need to reconfigure my systems and update the records at Network Solutions to point to my new DNS server. Meanwhile I call Qwest again to hassle them about expediting the “OfficeWorks” upgrade again and I’m told, “nothing else can be done.”
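For the curious, re-homing a domain like this mostly comes down to putting a zone file on the new name server and updating the registrar's records to point at it. A minimal sketch of what such a BIND zone file might look like (all hostnames and addresses below are made up for illustration, not my actual records):

```
; Hypothetical zone file served from the new DNS box
$TTL 3600
@    IN  SOA  ns1.example.net. hostmaster.example.net. (
         2001021401 ; serial (YYYYMMDDnn)
         3600       ; refresh
         900        ; retry
         604800     ; expire
         3600 )     ; negative-cache TTL
     IN  NS   ns1.example.net.
     IN  MX   10 mail.example.net.
mail IN  A    192.0.2.10
www  IN  A    192.0.2.10
```

The MX and NS records are the important ones here: once the registrar's update goes through, mail and lookups start flowing to the new static IP instead of the dead DSL line.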
February 15, 2001: I log onto www.nsi.com and update my Domain Service Agreement forms to include the new IP addresses of my new DNS box and the DNS server at my work. Moments later I receive these forms via email (I put my work email address into the form at the Network Solutions web site) and print them out. I write a brief note authorizing Network Solutions to make the changes noted on the forms. I photocopy my driver’s license and have the whole pile of paper notarized prior to faxing it to Network Solutions.
February 16, 2001: We had the heaviest snow in 5 years in Seattle on Thursday night. It dropped between 5 and 12 inches of snow all over the Puget Sound, including over 5 inches on Capitol Hill where I live. Everything was snowed in and I couldn’t reach anyone at the office, so I stayed home and contemplated my situation further. I got a call from Jason, who told me that a friend of his also had to fax some records to Network Solutions and it took them nearly 7 days to make the changes. He doesn’t want to get me down; he just wants to keep me informed.
February 17, 2001: Logged onto Network Solutions web site to check my domain records. Still no change. Logged onto Qwest account services web site to check on status of “OfficeWorks” upgrade, still not in effect. Got drunk later that night with friends.
February 18, 2001: Started this document while I meditated on the magnitude of the situation.
February 21, 2001: I anxiously awaited some glimmer of life in the little light on the modem that indicates a good net connection. However, my patience would have to be tested, since I had to go to sleep early to get to a morning meeting at work the next day.
February 22, 2001: Needless to say, service was not restored by the following morning. So after my meeting I called Qwest to find out what was going on. At that time I was told that service would be restored between 10PM and midnight on Friday the 23rd. At this point I didn’t really believe anything that Qwest said, so I left work early to go see what the DSL modem was doing. I reconfigured all of the various wires and cables to their former configuration and powered up the modem. To my surprise, the WAN light (the “good” light) was on and it even appeared that my old static IP block was still enabled! I was ecstatic! I hopped in my car and drove back to work to pick up the box that I had brought there in my vain attempts at routing around the service outage. Earlier that day, I had rebooted the box just to make sure everything was working properly, and it was in good health at that time. So I carefully downed the OS by doing a nice clean shutdown. Then I approached the (open) case of the computer and neutralized any static potential by touching the power supply. I then powered off the machine and waited for the hard drives to spool down. I then removed all of the cables from the connectors on the back and carried the machine to my car. I did not jostle or jar the machine on my way to the elevator, nor when loading it into the front seat of my car, and proceeded home.
The drive home was uneventful and I parked in my driveway and went around to the passenger side to get the computer. I picked up the machine and carried it into the house without incident. I then placed the computer in its original location (on the floor) and reattached all of its cables to their familiar receptacles. I then powered up the machine and went to take a leak while it booted up. When I returned, I noticed that the machine was hung at the point in the boot process where it would just be loading the boot record from the primary hard drive. I was puzzled, and thought that perhaps something funny had happened to the CMOS memory, the facility by which the computer records information about its own configuration and any peripherals such as hard drives. I checked the CMOS and sure enough the very first hard drive entry in the list was set to “None”. So I reset the CMOS to its default state and had the BIOS redetect the hard drives. All four drives then showed up as they should have and I restarted the machine. This time when it booted up, the message on the screen was something to the effect of “INVALID BOOT DISK OR DISK ERROR. PRESS RETURN TO BOOT AGAIN.” I couldn’t believe my eyes. After many, many hours of trying all possible combinations of parameters, at last I had to admit to myself that somewhere during the mile-and-a-half ride from work to home, the primary hard drive had somehow “lost” its partition information. All of the other drives in the system were untouched and working perfectly. Only the one drive out of four that was the most critical component was the one that had this inexplicable event happen to it.
Fortunately, when this fiasco was just getting started, I took advantage of the downtime to mirror most of the services from my older server to my new faster server. As it turns out I had even considered moving /all/ of the services over to the new server so I had copied all of the user records, the web sites and the local database onto the new machine. I moved all of the users, but sadly, not all of their home directories survived; only three home directories made it out of nearly 40 total.
Since I believed that I had all of the critical information that was lost on the old machine safely backed up on the new machine, I proceeded to install FreeBSD on the now “blank” boot drive, using boot floppies and the FTP installation media. This takes quite a while, but it is thankfully a hands-off process, so I got everything started and went to bed.
February 23, 2001: I wake up and lo and behold, my old machine is now running smoothly with the latest build of FreeBSD (4.1.1 at the time) and it seems to be just as happy as a clam. I made sure that SSH was set up and left for work. After getting to work, I spent most of the day SSH’d into my “old” machine, gleefully copying files from my “new” machine (running FreeBSD 4.1) in preparation for getting all of the old services configured and running. Somehow, I managed to copy one (or more) too many files from the “new” machine to the “old” machine, and the “old” machine stopped working properly. I even lost my SSH shell and couldn’t reconnect from work. I ended up leaving work early that afternoon so I could continue to fight with my machine and somehow get it working again.
Upon arriving home, I find that I’ve hosed the box so badly that the easiest way of getting it working again is to reinstall FreeBSD all over again. I end up doing this, but I choose the most minimal installation set, to make it go faster. After several hours, it finished and I got the machine going once again. This time, I was much more careful about what I copied from the “new” machine, but I still ran into trouble occasionally. Suffice it to say that I only got some of the web sites and most of the DNS restored before I finally had to call it a night.
February 24, 2001: I wake up and spend the rest of the day fiddling with sendmail, BIND, HTTP, sshd, and rsync, getting everything configured and synched up between the three machines that comprise my little service pool. It appears at this time (21:14) that I have restored all services to their original configuration, or as closely as was possible given the circumstances. If you find that any service that was working is now broken, please let me know and I’ll see if there’s anything left in my big can of bug spray. 😉
Sincerely, your bedraggled sysadmin,
Author: Randy Antler
News Service: TheExperiment Network