MARC News
This, er, blog brought to you by the letters v and i.

News:

2009-07-13
We restored our feeds for the vger lists, but we have holes. If you have raw copies (UNIX mailspools preferred) of any of these lists from vger.kernel.org from June 15th - July 4th (inclusive), please
contact us. The lists we currently need are (I will update this list as we fill in the gaps): dccp git-commits-24 kvm kvm-commits kvm-ia64 kvm-ppc linux-admin linux-alpha linux-btrace linux-crypto linux-doc linux-hams linux-kbuild linux-m68k linux-man linux-msdos linux-newbie linux-nfs linux-omap linux-parisc linux-ppp linux-raid linux-rt-users linux-s390 linux-sctp linux-security-module linux-serial linux-sh linux-sound linux-sparse netfilter netfilter-devel reiserfs-devel sparc ultrasparc util-linux-ng

And thanks to various folks who have helped us fill gaps so far: Andrew Morton, Peter Huewe, David Bettman, Malcolm Scott, Robin Holt

2009-06-25
We have had some hiccups this month, related to moving one of our data center locations. Two pieces of bad news: one, our subscriptions to a number of mailing lists appear to have been dropped on June 15th (probably auto-unsubscribed after a number of bounces); it looks like the busiest VGER lists are affected (linux-kernel, some git lists, etc). I'll have to re-subscribe us to those lists and see if I can track down the missing mails. Two, the server that got moved has apparently developed some hardware issues; it's now randomly locking up. MARC will be in read-only mode until we can get to the bottom of that.

2008-05-18
MARC has another sponsor, Chakpak.com. They may in the future help us out with getting an Asian mirror site, but in the meantime their support will enable the next round of hardware updates we need. Please go check them out!

2008-04-01 04:30 EST
Big power spike at our VA facility crashed the UPS's. Hardware is OK, but the site will be read-only until I can clean up any table corruption, etc in the morning.

2007-06-22
Minor policy change for donations: now any donor of $100 or more can get a link in the donor list (if they want), not just their name.

I'm going to attempt to contact existing qualified donors, but if you notice this before you hear from me, please feel free to ping me w/the desired update. Thanks!

2007-04-28
We've been back read-write, and caught up on incoming mails, since the 19th or so. But full-text-indexes are still behind; load is too high during the day for it to make much progress catching up. I'll be adding another server soon, that should help.

In other news, we've got a new sponsor, Terra-International, a Japanese company. I have wanted mirror sites in Asia and Europe for a while; the support from Terra-International may allow us to buy and colo a server in Japan. ...I know nothing about the Japanese hosting/colo market, so if anyone has suggestions based on dealings there, or better, works at a facility that might be able to give MARC a deal, please contact us.

2007-04-16 19:00 EST
We're back up read-only now; will be starting to catch up the backlogged mail later tonight, around midnight EST.

2007-04-16 15:30 EST
Extended power failure this morning, still working on bringing stuff back up. Someone please divert the hurricane so it does not hit us.

2007-04-15 13:30 EST
We've had incoming mail coming through a different server since late last night. I think we are mostly caught up at this point. The backup MXs should have spooled things silently, so that list servers wouldn't complain or drop our subscriptions, but only time will tell. We should be fully caught up by the end of today _except_ that the full-text-search indexes will not be updating until the second server (currently on a FedEx plane) is operational and we can offload some of the work to it. Expect search indexes to be caught up in the Wednesday/Thursday time frame.

2007-04-13 19:00 EST
Several hard lockups later, we're going to just pull the plug on this server. I'm prepping a new box to be the mail receiver; will bake it some tonight, and then ship it tomorrow. That means the site will remain read-only for the most part until the new hardware arrives at the primary location and the folks there have time to set it up, Monday or Tuesday.

2007-04-12 21:00 EST
One of the servers has dropped dead suddenly. It handles incoming mail, and is one of a couple of webservers. Since it is after hours, and nobody happens to be close by to mess with it, we are going to just wait until morning. I've redirected all web traffic to the other boxes. The site will be read-only until we get this fixed.

2007-03-18
The name transition to marc.info has kicked in. All old links to http://marc.theaimsgroup.com/foo will still work, but will result in a 301 redirect to the corresponding new URL. Wherever possible/convenient, please update your links in documentation, support web pages, etc to point to the new name. But, old links will continue to work indefinitely.

2007-03-17
I believe we are finally current on list contents, and search indexes. Unless there's another tarball of backlogged mqueue entries behind the couch. Please let us know if you see lists that still appear stalled.

2007-03-15 03:00 EST
MARC is read-write again, i.e. new mails are going to the database. We are just about caught up processing the queued backlog, except for the full-text search indexes, which will likely take a day or more to catch up.

We will need some downtime in the not-distant-future for the guys to clean up after the furious part-swapping, but that should be planned and short, two things this was not. Let's hope things stay uneventful for a while...

2007-03-14 01:30 EST
The SATA controller turned out to be flaky; any time there was activity across both channels, the second disk would error and drop out. It only took one disk swap, one cable swap, one kernel upgrade, and one BIOS upgrade to determine that. :( Finally the md-raid resync could finish successfully, and I started a long line of myisamchk's. They were moving along smoothly, until... about five minutes ago, when the server spontaneously rebooted. No warning, no errors. Yay! That's a very bad sign.
So, after the RAID resyncs _again_, I still have to repair a whole bunch of corrupt MySQL tables before I can start taking incoming mail again. And who knows if it'll actually be stable after that...

2007-03-12 21:20 EST
We managed to get enough of the old disks to talk to get the array back up in degraded state on a replacement RAID controller. The system is being migrated over to some new SATA disks, which should actually be a good bit faster than these 6 year old SCSI disks. But the transition won't be finished until tomorrow morning, at which time we'll start catching up the increasing backlog of mail.

2007-03-12 01:00 EST
So, it looks as if the power supply died a violent death, taking with it two of the five disks in the RAID5 array, and the cache RAM chip on the RAID controller for good measure. Unless we can get at least one of the dead disks to talk to us again, that means no recovery of the old server will be possible. Preparing last rites now; working on getting incoming mail set up on an alternate server in case we can't get this one back.

2007-03-11 19:00 EST
The primary server in Florida has suddenly gone dark, with barely any warning. Tom drove in to check it out; so far looks like the RAID controller may have lost the contents of one SCSI bus. Not sure how long it'll take to get back. In the meantime, the site is readonly.

2007-03-08
MARC is changing domain names! The site will soon switch to being marc.info, instead of marc.theaimsgroup.com. Links to the old URLs will still work, for the Internet definition of forever; they'll just get redirected.
Several reasons for this. First, it's shorter, and I am a huge fan of short URLs. Second, The AIMS Group, the company that originally[*] sponsored MARC, ceased to exist a long time ago. The same people have been 10 East for years now, and have continued co-hosting MARC, but the old name, marc.theaimsgroup.com, was already embedded in all kinds of project documentation, etc, so I was reluctant to change it. But it is time for MARC to be established as its own entity.
[*] Actually, the first sponsor was Progressive Computer Concepts, which was the company name in the mid-90's before they became AIMS.

2007-03-08
I've been "rescuing" a number of mailing lists whose archives had fallen off, mostly SourceForge-hosted lists. Since our subscribed addresses are pretty easy for spammers to guess, we use some rudimentary header checks on the incoming mails. SF changed their configuration -- some time ago, now -- which caused the incoming mails to not pass the whitelist header check, and be quarantined instead. I fixed about two dozen lists, and resurrected their messages from the spam sandbox.
Unfortunately that meant a couple of times I had to do full scans of the headers table, so had to disable the web interface for a few minutes at a time. Should be settled down now.

2007-02-23
Had an extended power failure at one of our facilities this morning, which drained UPS batteries before anybody could lay hands on things. I'm working on restoring at least full read-only access to the site, with read-write coming back sometime late tonight or tomorrow. For now only lists.kde.org and the old MARC server are up, but searches are disabled. Sigh.

2007-01-30 11:10 EST
Had some corruption in the MySQL tables on the primary server; MARC has been read-only for the last 12 hours or so while I a) fix the corruption (done) and b) get the replicas re-synced (still going). The oldest of the MARC servers really wants to be replaced; it's too old, too small, too slow.

2007-01-16 15:10 EST
Just fixed a minor replication glitch that has caused about the last week's worth of messages to not show up in the web interface. It will take some time (a few hours, at least, I would guess) to finish replicating, but then we should be caught up. Thanks to those who reported the lists stalling.

2006-12-11 17:20 EST
Ack, minor DB corruption again, working on it. MARC is read-only in the meantime.

2006-11-22 12:30 EST
I've uncovered some hopefully minor corruption of one of the main MySQL tables, will need to take the system offline long enough to do some myisamchks.

2006-08-30
The ext2-devel mailing list at SourceForge is shutting down, in favor of a new linux-ext4 list at vger.kernel.org. To preserve continuity, I've renamed the old ext2-devel list to linux-ext4. So, readers of the new list will have the context of the old list readily available. There is a URL redirect in place, so any bookmarks to the old list will be silently redirected to the new one.
This is pretty much my standard way of handling list renames, but I don't usually bother posting about it. Which means list-admins don't know it's an available option :-P Fixed.

2006-08-07 22:30 EDT
Another replication glitch, my fault (I let /var fill on the secondary server, choking up replication logs...). Funny how things always seem to break on weekends, almost as if I were off doing other things. Anyway, replication is now restored but it'll take a while (at least overnight, is my bet) to catch up.

2006-04-26
There's been a minor replication glitch, which prevented new messages from showing up in the web interface since about mid-Saturday. That is now fixed, and messages are replicating over and catching up. I expect everything should be caught up by late tonight / early tomorrow morning at the latest.

2006-03-01 13:20 EST
Ouch, the damage was worse than I thought. Turns out that while incoming mail was down, the backup MXs, which are supposed to sit there and silently accumulate backlogged mail... were sitting there silently accumulating backlogged mail, and eating a big chunk of it because they felt like it. Ouch! Out of about 12,000 queued emails, possibly as many as a thousand had their bodies shredded--we have nothing but the headers. That sucks.
That's more data loss than MARC has ever had in 9 years of operation. I'll be compiling a list of Message-ID's that I know or think have been truncated, and would appreciate any help refilling them from any available sources (I'll keep revising the list downward as I fill in the holes from other archives, copies forwarded to me, etc).

2006-03-01 05:00 EST
That was... decidedly unpleasant. There turned out to be no way to flush the damage and sync the replicas short of just myisamchk'ing everything on the master, rsync'ing to the slaves and blowing away the replication logs. Which of course, takes quite a while to do for ~80 GB of data (the set that had to be rsync'ed) over a fractional T1...
Everything is back now, and read-write is enabled. Some of the queued mail was moved out of the way earlier; it'll be replayed later this morning. I'm afraid that a small bit of data may have been lost, though.

2006-02-27 14:30 EST
Ack, the site is in read-only mode while I clean up some (hopefully) minor DB corruption.

2006-01-13
Everything went smoothly (for the usual definition of "smoothly" ;), and MARC is back to read-write, and should be caught up on the mails that backlogged in the meantime.

2006-01-12
The site will be read-only for a while today while the primary server is taken down to reroute power. Browsing traffic should be fine, routed to the other server, but incoming mail will be stalled until the primary is back online.

2005-12-13
Can I tell time?

So the final cleanups due to the outage on Friday finished last night, or rather this morning around 4 AM, at which time I promptly went to bed. ...Without having turned searches back on and removing the warning about them being disabled. Um, oops.

2005-12-12
Not so fast... there have been problems with one of the MARC webservers complaining about nonexistent messages. That's because the crash caused MySQL's replication log to be truncated, so when things came back, a slave kept asking for an offset that didn't exist, and erroring out. I've taken that server out of the rotation, am rsync'ing data over, and will restore it once the rsyncs are done and I'm back from dinner (whichever happens last). In the meantime, traffic is all being redirected to the remaining webserver; you should not notice anything other than perhaps the site being a bit slower than usual.

2005-12-10
OK, things are back, healthy, and caught up, I think. Searches are enabled again.

2005-12-09
A UPS failed this morning, taking the primary search-index database server down with it. They're back, but I've had to disable searching for a while while I finish myisamchk'ing and rsync'ing data.

2005-07-14
I've just checked in an experimental hack to try to do something useful with encoded From: headers (Quoted-Printable encodings, and Base64 encodings), only when we can be reasonably sure the decoded text will look decent in the default US charsets (because MARC is not really locale-aware or alternate-charset aware, it just tries to fake it). I'd appreciate pointers to cases where we seem to do The Wrong Thing, and/or additional charset encodings that would be safe/worthwhile to decode.
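
For the curious, the general shape of that hack looks something like the sketch below. The charset whitelist and the printability test here are assumptions for illustration, not MARC's actual code; only MIME::QuotedPrint and MIME::Base64 (both standard Perl modules) are real.

#!/usr/bin/perl
# Sketch: decode RFC 2047 encoded-words in a From: header, but only for
# charsets that should render sanely in the default US charsets.
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);
use MIME::Base64 qw(decode_base64);

my %safe_charset = map { $_ => 1 } qw(us-ascii iso-8859-1);   # assumed whitelist

sub decode_from_header {
    my ($header) = @_;
    $header =~ s{=\?([^?]+)\?([bq])\?([^?]*)\?=}{
        my ($charset, $enc, $text) = (lc $1, lc $2, $3);
        my $out = $&;                                   # default: leave it alone
        if ($safe_charset{$charset}) {
            my $decoded = $enc eq 'b'
                ? decode_base64($text)
                : do { (my $t = $text) =~ tr/_/ /; decode_qp($t) };
            # only accept results that will look decent as ASCII/Latin-1
            $out = $decoded if $decoded =~ /^[\x20-\x7e\xa0-\xff]*$/;
        }
        $out;
    }egi;
    return $header;
}

print decode_from_header('From: =?ISO-8859-1?Q?Andr=E9?= <andre@example.org>'), "\n";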

2005-04-21 23:25 EST
Searches have been active again since sometime around noon, but as usual I forgot to update this log at the time to reflect that fact :-P

2005-04-21 03:45 EST
Searches are disabled while I replicate the database to a new server I'm bringing online soon, which was donated by KoreLogic (my company; funny how that works...). I had hoped to stay up until the replication finished and re-enable searches before I went to bed, but that's not happening.

2005-04-01
I just gave lists.kde.org a much-needed minor facelift, changing the color scheme to what other kde.org sites have been using for who knows how long :-P

2004-12-30 12:20 EST
There was a hiccup with the replication a while ago; it didn't impact normal operations, but meant that the backup copies were incomplete. I had to shut down incoming feed for a while during the resync; it is back on now, and the system is catching up with the backlogged messages. It will likely be a few hours before things have fully caught up.

2004-10-21 02:30 EST
Things went mostly but not completely smoothly--tinkering a bit to sync things back up.

2004-10-21 00:05 EST
I'm preparing to switch to using the new server for all searches, while regular browsing, etc will still go through the old one. This will let me keep testing it out while it's within reach of me, but not rely on it too much; worst case only searches will be impacted.

2004-10-11
The new server is still a bit flaky, but the old/existing one is right at 100% full. So, I'm moving full-text-searches off onto the new server--if it dies that's all we'll lose. But in the meantime, I've got the search index update scripts suspended, while I sync data between the two. So, search indexes won't be updated for probably another day or two (did I mention, the new server is still at my house, so I have access to it while trying to get it stable... sigh).

2004-09-07
Hurricane damage took out both of the primary Internet circuits to the data center MARC is homed in, power has been out for nearly 24 hours, etc. /me kisses the generator. We are still online via a slower tertiary link, but if real production traffic (read: the stuff that pays bills) needs the bandwidth, MARC may go away again off and on today and tomorrow.

2004-08-18
While beating up the box trying to reproduce an OOPS, a hard drive died. The hard drive that was a replacement for one that died back in April. Sigh.

2004-08-17
Still getting absolutely nowhere trying to get the new server stable. At this point I have replaced: case, power supply, motherboard, CPUs, RAM, hard drive controller, hard drives, NIC, video card, fans, and all cables. In other words, everything. Oh, and the kernel, too. The box is just cursed. Anybody got a dual Opteron with a few gigs of RAM and half a terabyte of storage just hanging around? I'll give you a nice sticker, or big wet sloppy kiss, or something.

In the meantime, the old server is completely running out of space; it's at 99% capacity. I've been forced to take a few lists' full-text-indexes offline to free up some space. These lists (which have not been added to for some time, either because the list moved/got renamed, or because our subscription died some time ago, or because the list changed to setting X-No-Archive on every post) are no longer full-text searchable: apache-bugdb axp-redhat basslist dbi-dev caldera-users kde-user linux-router lynx-dev squeak-dev xfree-newbie. This move buys me another month at most; after that I'll probably have to start disabling more.

Needless to say, there's a moratorium on adding new lists until further notice; they'd just accelerate doomsday.

2004-06-02 04:00 EST
That didn't last long. About 1/4 of the way through rsync'ing, the box froze with no explanation whatsoever. Now it gets endless timeouts on the 3Ware card when the driver initializes, decides no drives are present, and panics trying to mount a root filesystem.
I think I will just ask the guys who host the servers for me (about 800 miles away from me) to put it in a box and ship it to me; I'll either ship it back to them once it's fixed, or break it into a thousand tiny little pieces and pelt the windows of hardware manufacturers' CEOs with them in the middle of the night.

2004-06-02 01:10 EST
OK, after forever and a half, it looks as if the new/primary server is healthy again (cross fingers). Now to rsync 90 GB of data...

2004-04-27 13:05 EST
We're still on the old server, waiting for power supply replacements for the primary. (It's got redundant power supplies... which is why the connecting glue is what cooked, not the swappable PSs themselves, naturally.) In the meantime, we just got (and are still being) slashdotted. But she held together (esp after I realized I should turn off KeepAlives so that people don't hold open httpd children). I feel like Han Solo on the Millennium Falcon. Hm, now where'd I put Carrie Fisher?

2004-04-11 03:15 EST
The primary MARC server went dark about an hour and a half ago, not sure why, yet. In the meantime we're back on the old server, which will handle the load fine at least until Monday. /me crosses fingers...

[Snip some stuff that hasn't been rsync'ed off the new/primary server for a while...]

2004-02-07 18:00 EST
I've lost track, but some time around here the install and rebuild on the new old array was finished, and MARC feed was back up and catching up; I will probably leave things this way for the week, and next weekend blow away the old OS install and migrate OS and the "hotter" tables back to the old new array, which is faster than the old one we just added. Whee ;)

2004-02-07 12:45 EST
Almost there...

Stay on target...

2004-02-07 05:50 EST
Finally working... so I'm migrating 100GB or so of data, and doing sanity checks of all the data while I'm at it. Well, mostly I'm waiting for things to finish, so it's bedtime.

2004-02-06 16:15 EST
One more try. This time with a brand-spanking new SCSI card; the other ones we were trying (handy, spare) were all 5+ years old... Once again MARC will be read-only for a while.

2004-02-05 12:50 EST
OK, we'll try again with the same SCSI card the array liked in its old home... MARC will be read-only for at least the next few hours (mails will queue until the dust settles, and then feed into the archives).

...And again, no joy getting the old array to work.

2004-01-30 17:50 EST
So much for that... the old, retired external RAID array we tried to hook up to the old MARC server (that's running out of space) worked fine when it was taken out of production on a different server a few days ago... but now it pukes, and kills the SCSI bus if connected...<sigh>. Maybe we'll try again after the weekend.

2004-01-30 13:45 EST
The site is going read-only for a while, while we do some upgrades to the old server--which is still primary for incoming feeds, etc. I'll probably be working on it all weekend, but I may checkpoint myself periodically and let backlogged mail catch up. The first phase will probably last the rest of today, though.

2004-01-27
I've just implemented the first phase of some spam-filtering/protection for MARC. This first part was easy: for a number of the lists MARC carries, spammers have gotten MARC's subscribed address, and send spam directly to it. For those lists where I could, I added sanity checks that verify that the incoming messages really came from the list's server. Mails which fail the test are shoved into a 'spam' pseudo-list, from which they can be resurrected easily if this turns out to false-positive on legit mails.
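
The check itself is nothing fancy; roughly the sketch below. The list names, header patterns, and the deliver() routine here are made up for illustration, not MARC's actual rules.

use strict;
use warnings;

my %list_check = (
    'linux-kernel' => qr/^List-Id:.*linux-kernel\.vger\.kernel\.org/mi,
    'kde-devel'    => qr/^Received:\s+from\s+\S*kde\.org/mi,
);

# Returns the list to file the message under: the real list if the headers
# look like they came through the list's server, otherwise 'spam'.
sub classify {
    my ($list, $raw_headers) = @_;
    my $check = $list_check{$list};
    return $list unless $check;          # no rule for this list: accept as-is
    return $raw_headers =~ $check ? $list : 'spam';
}

# deliver(classify($list, $headers), $message);   # deliver() is hypothetical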

I've been wondering for a while about filtering incoming feed with SpamAssassin or similar; I have several concerns about doing so, and haven't talked myself into it completely. I still might, one day, but this is an easy and safe step in the meantime.

2003-12-31
OK, it is now possible to link to messages within MARC by their Message-ID. Each message whose Message-ID is known (meaning, everything that comes in from now on) will be accessible by a URL of the form /?i=<message-id>, where message-id can be "raw" (cut-and-pasted from mail headers) or munged in the manner that MARC munges From: and Message-ID headers. Each message now includes a link to itself in that format, as well. When a unique Message-ID is requested, a redirect is issued to the usual /?l=foo&m=123456 URL for that message. But, when there are multiple messages with the same Message-ID (crossposts, duplicate sends, and deliberate attempts to "hide" valid messages by sending collisions), a list of candidate message-links is printed.
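
For the curious, the lookup-and-redirect amounts to something like this DBI sketch; the table and column names are hypothetical, not the real CGI's schema.

use strict;
use warnings;
use DBI;
use CGI qw(param header);

my $dbh = DBI->connect('DBI:mysql:database=marc', 'reader', 'secret',
                       { RaiseError => 1 });
my $mid  = param('i');
my $rows = $dbh->selectall_arrayref(
    'SELECT list, msgnum FROM messages WHERE message_id = ?', undef, $mid);

if (@$rows == 1) {
    # Unique hit: bounce to the normal message URL.
    my ($list, $msgnum) = @{ $rows->[0] };
    print header(-status => '302 Found', -location => "/?l=$list&m=$msgnum");
} else {
    # Zero or several hits: print the candidates and let the reader pick.
    print header(-type => 'text/html');
    print qq{<a href="/?l=$_->[0]&amp;m=$_->[1]">$_->[0] $_->[1]</a><br>\n}
        for @$rows;
}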

These URLs are ugly to me, not least because they are awfully long. That will get a little better when I find a shorter domain name to move MARC to, the shorter the better (is there a country code '.rc' in which 'ma' is available? ;) But, this should allow the creation of self-referential links: that is, one could compose a message, force its Message-ID: header to be fixed, and embed in it a MARC link back to itself, which is something I know some people have wanted to be able to do for quite a while.

Happy New Year :)

2003-12-30
At long last, Message-ID support is coming. New mails that come in will have their message-ids recorded (we were previously throwing them away--stupid, stupid, stupid). And when messages are viewed, their Message-ID will be printed if we know it (i.e. if it's come in since about midnight EST today). Coming soon will be a way to use Message-ID's to find messages, or rather, a way to build a URL to a message knowing its Message-ID.

2003-12-23
Somewhere around here, I started pointing MARC's CGIs at the new database server, which is a slave replicating off the primary, old server. So far so good.

2003-12-22 04:30 EST
OK, mostly done. Backlogged mails are feeding into the DB now. Unfortunately along the way we discovered a problem with one of the backup MXs that has been causing some mail to bounce rather than queue. Grr. That's an investigation for tomorrow, sleep now...

2003-12-22 01:30 EST
Still syncing...

2003-12-21 17:30 EST
I hate hardware.

Turns out the power supply in the old server can only handle one more disk; more than that, and disks randomly drop offline. The RAID card really likes that, let me tell you. So, after taking care of the cooling issues and some other stuff in about 30 minutes, I've spent the past 4.5 hours adding 36GB of space to the old server. Somehow, I think this was a bad deal.

I'm leaving MARC in read-only mode, and with searches disabled, while a) more myisamchks run to fix any data thrashed by today's fun, and b) data syncs to the new server. Will turn writes back on tonight after I fly home (if the copies are done!), and the backlogged mail should catch up overnight.

2003-12-21 12:30 EST
Minor setbacks... the old server's case won't fit more hot-swap drive sleds without some surgery, which I'm preparing to do. The new server runs outrageously hot, so I'm trying to engineer better cooling. In the meantime, MARC is up and down and up and down. It's Sunday. Step away from the computer, go take a walk, go back to bed, something! Wish I could do that right now :(

2003-12-20
The new hardware is happy, and I'm taking the primary server down for a few minutes to add a couple more disks to it, as well. I won't put the new server into production for several more days of burn-in, but this is the last physical hardware shuffle that has to happen while I'm on-site; everything else can happen remotely.

2003-12-17
I'm flying to Jacksonville today to meet a bunch of hardware I bought and had shipped to 10East, to build a new server for MARC. The new server won't go into production until after at least a few days of burn-in, but in the meantime I may throttle/stop writes periodically to make it easier to toss data sets around, etc. If there is any downtime the rest of the week or this weekend, take advantage of it to make a donation :-P

2003-12-08 07:30 GMT
...Done, feed resumed, and things are catching up quickly.

2003-12-08 02:00 GMT
I'll be switching MARC into read-only mode for a while, doing some database shuffling and bookkeeping. Provided I don't forget to switch back into read-write mode before I go to bed, everything will have caught up by morning.

2003-10-29
Adding a bunch of lists, including just about every FreeBSD list that MARC did not already have. This got me thinking that it is nearly hardware-upgrade time, at least time for more storage. MARC's RAID array is puny by current standards; four 36 GB SCSI drives in RAID5 (so about 100GB of usable space, which is about 85% full).
So, I'm thinking of trying 3Wares again. Despite the troubles MARC had with an older (5xxx?) 3Ware card nearly two years ago, I've had two 6xxx cards in systems in my office for about as long, which I beat up on daily, with no problems. I'm probably looking at getting an 8-port 7506, populating it halfway for now with four 180 or 200 GB drives (probably either the 8MB cache WDs, or IBM/Hitachi Deskstars), in 3Ware's rdc-400 hot-swap drive cage. I'd appreciate hearing from people with particularly positive or negative 3Ware stories.
Um, or from people who want to know where they should ship a .5+ TB RAID array, so this can become a non-issue ;)

2003-10-13 05:45 GMT
Looks like everything went well (cross fingers). Downtime was about 15 minutes. I've had the DB in read-only mode since; just changed that, so mails will be catching up for a little bit.

2003-10-13
I will be rebooting the MARC server for a kernel upgrade at 05:00 GMT today. Hopefully things will go smoothly and downtime will be only 10-15 minutes at most, but if things go badly MARC may be down until morning US-Eastern, when someone is physically in the office to hit the Big Red Button.

2003-08-25
Fixed a small longstanding bug in the bad-robot detection code that caused it to misfire when someone does a search that includes the string 'l='. Thanks to Lorrin for the bug report.

2003-08-06 14:00
OK, the backups were done and writes re-enabled sometime around 10 AM. I didn't update this page until now though, because *I* was read-only (read: ZZZZ) until lunchtime.

2003-08-06 04:00
Things will be read-only for a bit while I make some DB dumps...

2003-07-11
Changed how quoted-printable messages are decoded: many moons ago I rolled my own code to decode quoted-printable text. Duh, there is a standard perl MIME::QuotedPrint module which does it better than my hack did, so I've switched to using it. I don't anticipate any problems, and I've found multiple test cases this gets right where the old code didn't, but I'd appreciate any notes about messages which appear broken now.
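
For reference, the module boils the whole job down to a single call; a trivial sketch:

use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# decode_qp handles =XX escapes and soft line breaks ("=" at end of line)
# per RFC 2045 -- the fiddly bits a hand-rolled decoder tends to get wrong.
my $encoded = "Caf=E9 au lait, split across a soft=\nline break.\n";
print decode_qp($encoded);   # prints: Café au lait, split across a softline break.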

2003-07-08
Just added some code to take advantage of FOUND_ROWS() in MySQL 4.x to print 'Last' links on result-sets that can be paged through (browsing messages in a given list, messages in a thread or by an author, and search results). This gives an easy way to get to the last results (which in most cases means, the earliest chronologically).
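
Roughly, the query pattern looks like the DBI sketch below; the table and column names are hypothetical, not MARC's actual schema.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=marc', 'reader', 'secret',
                       { RaiseError => 1 });
my ($list, $offset, $page) = ('linux-kernel', 0, 30);

# Fetch one page, but have the server also count what the full result set
# would have been, so the offset of the last page can be computed without
# a second full scan.
my $sql = sprintf(
    'SELECT SQL_CALC_FOUND_ROWS msgnum, subject FROM headers
      WHERE list = ? ORDER BY msgnum DESC LIMIT %d, %d', $offset, $page);
my $msgs    = $dbh->selectall_arrayref($sql, undef, $list);
my ($total) = $dbh->selectrow_array('SELECT FOUND_ROWS()');

my $last_offset = $total > $page ? $total - ($total % $page || $page) : 0;
print "rows $offset..", $offset + @$msgs, " of $total; ",
      "the 'Last' link points at offset $last_offset\n";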

2003-04-30 02:25 EDT
OK, things went happily, back up with a new kernel. Nothing went wrong. What gives? Next someone will have a Windows 95 machine stay up for more than 50 days.

2003-04-30 00:58 EDT
MARC has a birthday! Or rather, the primary server for MARC hits 365 days of uptime. W00p. ...And now that we've hit that, I'm taking it down soon for a kernel upgrade && reboot.

2003-04-22 13:00 EDT
Fiber cut between us and the CO, took out all circuits. Had backup MX in place offsite to spool mail, but we were otherwise completely dead for about 5 hours. Now that we're back the server's going to be beaten all to hell catching up with the backlogged mail, I'll probably bounce the incoming feed a time or two, etc.

2003-04-01 04:00 EST
Somewhere around here things were finished, and the MySQL upgrade to 4.x was complete. It took a little while for backlogged feeds to catch up, etc, but no further complications (crosses fingers...).

2003-04-01 01:25 EST
Flushing lots and lots of table data. Takes. For. Ever.

2003-04-01 00:45 EST
Full-text searches temporarily disabled during the shuffle. Author and thread searches are still available, but will be disabled briefly too.

2003-04-01 00:40 EST
MySQL upgrade time; MARC will be read-only for a while and the web interface will be up and down for a bit.

2003-03-25
MARC is now Begware :-P

2003-03-17
Updated the usage stats which hadn't been run in quite some time. Forgot to add a changelog entry that I'd done so...

2003-03-03 18:30 EST
OK, back, upgrades done, backlogged mail almost finished catching up.

2003-03-03 12:15 EST
Due to the just-released sendmail flaw, and the fact that I have a plane to catch, inbound mail is suspended in MARC for now--i.e., read-only mode. Will get it back as soon as I have time; tonight from the hotel at the latest. Sorry for the inconvenience...

2003-02-22 22:15
Putting MARC in read-only mode while I do some backups. Incoming mail will just queue up until I'm done.

2003-01-23 11:45
OK, back, and mostly better. Remind me not to bulk-add list spools at peak load times while we are being slammed by robots. Sigh.

2003-01-23 11:35
Whee, repeated load spikes with no obvious cause, which I'm making worse by checking on things; disabling the system for a few minutes while I investigate.

2003-01-03
Just, finally, moved incoming mail handling from the old server to the new. That's the last dependency MARC has on the old server, so now it's free to be rebuilt and returned to useful-backup-server status. Wonder when that will happen :-P

2002-12-14 02:30
I've been tinkering with a couple of things; may have gotten to the bottom of a longstanding weird "feature" with $_ usage in mod_perl code. In the meantime though, I've been causing sporadic errors for the past 20 minutes or so. Sorry about that :-P

2002-12-12 03:00
Google likes us. I mean, they really, really like us, apparently. At right around midnight, we started getting load spikes for no reason. Turns out Googlebots--which normally give MARC a brief once-over of high-level list-of-list pages, and leave--were showing us major love. A bit too much, in fact, and it was dragging the site down :-P. It looks like we're being crawled by an average of 4-5 Googlebots at a time, so while each one was pretty "respectful" wrt request-rate, together they started to hurt. I added an explicit delay when serving Googlebots, which seems to be throttling their traffic down to a reasonable level. If that proves not to be enough (esp once peak traffic kicks in tomorrow), I'll have to robots.txt them out. :( Anyway, things were crawling and/or shut down for a little while while I figured out what was going on; sorry for the downtime.
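
The throttle itself is nothing fancy; a mod_perl-flavored sketch is below. The 2-second figure and the handler shape are guesses for illustration, not what's actually running here.

package MARC::CrawlerThrottle;
use strict;
use warnings;
use Apache::Constants qw(OK);

# Runs early in the request cycle; if the client looks like Googlebot,
# sleep a little before the real content handler does any work.
sub handler {
    my $r  = shift;
    my $ua = $r->header_in('User-Agent') || '';
    sleep 2 if $ua =~ /Googlebot/i;
    return OK;
}
1;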

2002-12-07
After numerous requests, most recently from Dan Kriwitsky, I've implemented some email-address munging of poster addresses. I've long had reservations about doing this, but I hope what's now set up is a reasonable balance. I'd welcome any feedback positive or negative regarding this change to our usual contact address, webguy.

2002-12-05
MARC has always(?) supported X-No-Archive headers, leaving out of the database any mails that have it set. I just learned how hard (effectively impossible) it still is to add arbitrary headers in popular mail clients like M$ Outlook. In addition to the header, Deja / Google check for X-No-Archive: yes in the first line of messages. That seems a reasonable thing to add (though still a little ugly / painful to use, but better than nothing); done.
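
In code terms the test now amounts to something like this hypothetical helper (not the actual archiver code):

use strict;
use warnings;

# True if the mail asks not to be archived, either via the real header or
# via "X-No-Archive: yes" on the first line of the body.
sub wants_no_archive {
    my ($headers, $body) = @_;
    return 1 if $headers =~ /^X-No-Archive:\s*yes/mi;
    my ($first_line) = split /\n/, $body;
    return 1 if defined $first_line && $first_line =~ /^\s*X-No-Archive:\s*yes/i;
    return 0;
}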

2002-12-04 0something
I finished cleaning up missed or broken mails in the middle of the night sometime from my hotel and crappy dialup, forgot to make a note of it at the time :-P

2002-12-03 16:30
Found some ugly inconsistencies that I'm going to have to look at manually, but in the meantime MARC is back up in read-only mode.

2002-12-03 15:30
Ouch. Some mysql tables have been silently corrupted. Not sure when it happened; there shouldn't have been any correlation at all to the disk errors on the other server last night. Repairing the damage now; the browsing interface is disabled currently.

2002-12-03 10:30
The old MARC server, which still serves as the mail-receiver frontend for MARC, had a disk fail and kicked into read-only mode at about 3 AM. Incoming mails have been backing up since then. We've got the box healthy again, and backlogged mails will be catching up shortly, although the MARC server will be bogged down a bit catching up indexing the new messages, etc...

2002-11-24
I've never gotten around to making the user-login features very useful, or setting up any way to apply for an account, for that matter :-P So having the login form has caused some confusion for very little reason. For the time being, I've removed the 'Log in / Log out' link.

2002-10-28, 23:30
Turns out that for who knows how long, message bodies have been limited to 64k, which truncates long posts (with big diffs / attachments) at a rate of about .1% per month. That's too many. I'm doing some DB shuffling (and beating the crap out of the server) now to remove the limit; after that I'm going to attempt to automate fixing the broken mails, which will be time consuming but won't cause any noticeable server load.
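
A guess at what that shuffling boils down to: a 64k ceiling smells like MySQL's TEXT column type, and removing it would essentially be a column widening, along these lines (table and column names are hypothetical, and the ALTER rewrites the whole table, hence the server-beating):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=marc', 'admin', 'secret',
                       { RaiseError => 1 });
# TEXT tops out at 64KB per value; MEDIUMTEXT raises the limit to 16MB.
$dbh->do('ALTER TABLE bodies MODIFY body MEDIUMTEXT NOT NULL');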

2002-10-28, 01:00
Added a note about our general responsiveness (or lack thereof) to mails sent to us, for various reasons.
I'm adding the subversion lists with full histories; 26,000 mails or so. This being off-peak hours for MARC, the extra server load probably won't be felt much.

2002-10-21, 20:50
A firewall upgrade/replacement took longer than expected; instead of ~5 minutes downtime we were up and down for an hour or two. Sorry about that :-P

2002-10-08, 05:00
I'm doing a poor-man's defrag on some of the biggest and most fragmented mysql tables. Should result in substantially lower disk seek times, once finished. But in the meantime I'm beating the box to death. Incoming feed and full-text searches are temporarily suspended--hope I remember to turn them back on before going to bed for the, er, night.

2002-08-25
I'm adding a dozen or so new FreeBSD lists, and inserting all older mails from the list spools available on freebsd.org. Many of the lists are fairly low volume, but some are large; you may be able to feel a slowdown this afternoon while the indexers, etc catch up with the new mail.

2002-05-28, 00:50
For the first time in quite a while we had a MySQL glitch. About a hundred or so messages came in and were in delayed_insert buffers when mysqld segfaulted. After some table checks, etc, things are happy again, but there are about a hundred cases where we have a message header and no matching body, so those messages will be shown in a message-list view, but viewing one of those mails will result in a "No such message" error. I've pulled out the metadata for the affected mails, and will extract their bodies from backups and re-insert them tomorrow; I'm tired (and seem to have sprained my wrist, ugh).

2002-04-30, 00:10
Most of Jacksonville, Florida spent the last eight hours without power. Fun with fire at a power station, then a collapsed tower, and then a burning tree falling on a substation (according to initial reports). We were on generator power for the first few hours, but decided to shut down when it was impossible to get an ETA for repairs. We're now back up. I'm going to take the opportunity to do some db checking before flipping writes back on and starting to catch up on the backlog.
For the curious: http://www.jacksonville.com/tu-online/stories/043002/met_9278005.html

2002-04-12
Next on my todo list is a European mirror, since ~25% of MARC traffic comes from Europe, and a more local site would probably be much faster for them. I've got an almost-sure-thing offer for free colocation with a large (and generous) ISP in Sweden (will give names and sing their praises more if/when it all goes through ;). Now I just need a box to ship there. Since this will be a secondary site, I'd be willing to go cheap-but-good rather than trying to buy all high end components (and look where *that* got me). Does anybody have a UNI or SMP AMD Athlon MP and a couple of 100+ GB EIDE disks they'd like to donate? :)

2002-04-04
Sometime around here I made the real shift to the new server, but forgot a changelog entry. Things have been humming along nicely since.

2002-03-21
OK, data is migrated. The old server is still the master, but the new box is a slave, and the CGIs (currently running on the new box) are querying it. So it doesn't matter how slow the old box is :-P. Some of the background daemons (*cough* fulltextindexer *cough*) are not caught up, although now they may actually be able to do so in a reasonable amount of time (we'll see). Once I'm comfortable with the stability of the new box, everything will move to it as primary. Enjoy.

2002-03-19
OK, going to be doing some big data shuffling into the wee hours. Should not be very noticeable other than the system being read-only for a while. But... famous last words.

2002-03-10
...Nevermind. The new server isn't racked in its final resting place yet, and as such has only a 10mbit connection. I'll wait to move the 35 GB of data, thanks. :-P

2002-03-10
The system is going to be in read-only mode for a bit while I sync the databases over to our new server. Mail will back up for a bit (and browsing will slow down some) while the current DBs are copied over.

2002-02-28, 3AM
Just got back from a business trip. SSL access is disabled until I get some sleep, and then can patch mod_ssl. Just because this sounds hard to exploit doesn't mean there aren't hundreds of people trying hard to make it happen (think of the payoff). So, sorry those of you who use HTTPS to access MARC to blind nosy proxy servers :-P

2002-02-14
The Intel EtherExpress Pro/100 in the MARC server decided to stop passing packets. Reloading the driver didn't help; had to bounce the box (God love serial consoles). Things seem to be back and happy...

2002-01-21
<sigh>
The RAID array in the MARC server went tits up at about 2 AM today. It seems that there were multiple bus resets before a total hang which caused the data to be inconsistent between two disks of a mirror set. It took some smacking around but things are back. The DB is still catching up on linking/indexing the backlogged mail, though...

2001-12-16
Early in the AM one of our 36G 10k RPM IBM SCA drives bit the dust, 37 days after its purchase, and only 30 days of service :( ...but it's on a raid card, so we're still up and running. That's a big improvement, huh!? New drive is on the way.
Guess they just don't make them like our old 18G 7200 RPM Seagate cudas anymore. They have been in service, in MARC, since late 1998!

2001-12-09
Er... I've managed to blow away the last several changelog entries. I think I need some sleep.
In other news, this box is now upgraded to latest apache, perl, mysql, openssl, mod_perl, mod_ssl... and the CGI is now running under mod_perl. We'll see if it blows up...

2001-11-16 17:30
OK, only a couple of bios updates and other silly noise later... we're back. I'm going to be shuffling data to the new drives soon so there might be some slowdowns, but probably not much.

2001-11-16 13:15
Boot disk swap is done, with some complications. Lunchtime. Adding the RAID array at 19:30 GMT.

2001-11-16
We're going to take the server down for some disk shuffling today: replace the boot drives with some 18GB barracudas, and add a SCSI RAID array. I don't want to give an ETA because I'll just be proven wrong by Murphy.

2001-11-15
Some of you might have noticed problems accessing MARC this evening from about 7:15 through 9:45 pm US/Eastern time. We had one of those rare combinations of factors that came together to make a normally redundant system fail. Human error from a few weeks ago, combined with a server crash at around 7:15pm killed all DNS in our network. This was one of those rare times that required a drive into the office to fix stuff -- thus the extended down-time. Redundancy has been restored, appropriate lashings distributed, and we are reviewing procedures, and planning to add a third level of redundancy to our internal DNS. --Lester Hightower

2001-11-11
I got sick of not knowing when the 3Ware card falls over, and feeling like the box needs checking every fifteen minutes. So last night I banged out a wheel-reinvention--a small daemon to watch syslog logs for signs of the 3Ware wigging out. Code slightly sanitized here. Hopefully that'll improve reaction time when something happens. Hopefully also this will only be in place for a couple of days, until we slap new SCSI RAID stuff on this machine and sleep at night.
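
The whole thing is little more than a tail-and-grep loop; a sketch is below. The alert address and polling interval here are made up, and the real (sanitized) code is what the link above pointed at.

#!/usr/bin/perl
# Watch syslog for 3w-xxxx driver errors (like the sample lines quoted in
# the entry below) and page somebody when they show up.
use strict;
use warnings;

my $log = '/var/log/messages';
open my $fh, '<', $log or die "open $log: $!";
seek $fh, 0, 2;                     # start at end of file; only new lines matter

sub notify {
    my ($line) = @_;
    open my $mail, '|-', 'mail', '-s', '3Ware trouble on marc1',
        'webguy@example.org' or return;              # hypothetical address
    print $mail $line;
    close $mail;
}

while (1) {
    while (defined(my $line = <$fh>)) {
        notify($line) if $line =~ /3w-xxxx: .*[Uu]nexpected bits/;
    }
    sleep 5;
    seek $fh, 0, 1;                 # clear EOF so newly appended lines are seen
}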

2001-11-10 18:30
Great... two days between 3Ware crashes. If anyone's interested the errors we're seeing are an endless stream of:
Nov 10 15:44:53 marc1 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x57132a90).
Nov 10 15:44:53 marc1 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 10 15:44:53 marc1 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x57132a90).
Nov 10 15:44:53 marc1 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
etc.
I'm isamchk'ing things now, in the meantime the site is read-only; you know the drill.

2001-11-09 20:30
Ugh. Two and a half hours, three DAC960 cards, two pairs of disks, and two cables later, and we're back where we started minus 36GB of disk space. Um.

2001-11-09 14:50
Taking the system down RSN to do some disk shuffling and add a SCSI RAID controller the AIMS group is donating. There should be another outage early next week when some new drives arrive.

2001-11-08 17:00
Isamchks of the primary tables were clean; checking the full-text indexes now. Starting to feed backlogged mail into the DB, which will cause even more cache thrashing.
And making plans to get some mo bettah drives...

2001-11-08 10:00
OK, running isamchk foo while I go to a day of meetings. It will probably take a good six hours for everything to be isamchk'ed, but better safe than sorry.
Screw Sun V880's... anybody got a spare A3500 array they want to donate?
Sigh. Hardware sucks. Cheap hardware sucks more. :-P

2001-11-08
This is worthless elf Dave. Sigh. See 11-04? Yeah, happened again. Sigh. Got a little more error info outta 3ware stuff before I rebooted it, perhaps it'll help. I left MARC in "Read Only" mode which means... you can search what's there, but the mail is spooling up on 2 backup MX's.. I'll let Hank check this out, but he went to bed just a few hours ago (ya know, that whole the sun comes up, must find coffin, ..err..), so it might be later today. Oh well. Anyone wanna donate a Sun V880? ;)

2001-11-07
I just hacked up a front-page-cache. The front page of MARC is the biggest, and it's fairly slow to generate too--lots of DB queries (and not very efficiently done). A cached copy now gets generated and rechecked every five minutes, and the common case (no cookies set, no color/other preference parameters set) simply reads from the cached file. This reduces backend load but more importantly removes latency when building the front page, which I think shows up as a long load time on medium and slow links.
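
In outline it looks something like the sketch below; the cache path and the build_front_page() routine are stand-ins, not the real code.

use strict;
use warnings;
use CGI qw(param header);

my $cache = '/var/cache/marc/frontpage.html';      # hypothetical path

sub build_front_page { "<html>...the slow, query-heavy version...</html>\n" }   # stand-in

sub serve_front_page {
    my @params    = param();
    my $cacheable = !$ENV{HTTP_COOKIE} && !@params;   # the common case

    # Common case and the cache is younger than five minutes: serve the file.
    if ($cacheable && -f $cache && -M _ < 5 / (24 * 60)) {
        if (open my $fh, '<', $cache) {
            local $/;
            print header(-type => 'text/html'), <$fh>;
            return;
        }
    }

    # Otherwise build it, and refresh the cache if this was a plain request.
    my $html = build_front_page();
    if ($cacheable) {
        if (open my $out, '>', "$cache.tmp") {
            print $out $html;
            close $out;
            rename "$cache.tmp", $cache;              # swap the new copy in
        }
    }
    print header(-type => 'text/html'), $html;
}

serve_front_page();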

2001-11-06
Reading this message on the linux-kernel list from Nathan Scott got me wishing MARC supported footnotes in messages (made them into links). So now it does. The code is still wet and probably buggy; I'd appreciate any feedback, especially about things that now seem to have broken. Current behavior:

One bit of brokenness is that the code will try to react to quoted footnote-tags (see this example). I think the answer is, "tough." Another is that the reverse-pointing HREF's (going from footnotes back up into the document) will find the lowest/last reference to that footnote, not the first or the specific one that was clicked on. The answer for that might be the same; not sure.
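
For the curious, the transformation is roughly a pair of substitutions like these; an illustration only, not the code actually running:

use strict;
use warnings;

sub linkify_footnotes {
    my ($html) = @_;   # message body, already HTML-escaped
    # Footnote definitions: a line starting with "[1]" becomes the target,
    # plus a back-link up into the text.
    $html =~ s{^\[(\d+)\]}{<a name="fn$1" href="#fnref$1">[$1]</a>}mg;
    # In-text references: any remaining "[1]" points down at its definition.
    # Repeated references yield duplicate fnref anchors, which is where the
    # back-link quirk described above comes from.
    $html =~ s{\[(\d+)\](?!</a>)}{<a name="fnref$1" href="#fn$1">[$1]</a>}g;
    return $html;
}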

2001-11-04
Well, the good news is, the ECC memory seems to have increased the stability of the 3Ware card. The bad news is, just when I was starting to think it was the non-ECC memory all along (and the 3Ware was just the canary in a coal mine)... we started having 3Ware errors this evening. Unlike before, the box did not crash, but just entered the rough equivalent of a bus-reset-storm, so disk I/O got more and more backlogged, running the load up to 35 or so and killing responsiveness. I was able to shut things down and reboot && fsck, no errors found. There were a number of minor corruptions in MySQL tables though, so I've been isamchk'ing like a fiend. Incoming mail was restored by 2-3 AM on the 5th, but some of the background processes still haven't been re-enabled.
Crossing fingers that this was an isolated hiccup with the 3Ware card...

2001-08-20
Took the server down for a few minutes this morning while the guys replaced the memory in it with two new ECC DIMMs. We'll see if this helps the stability...

2001-08-17 02:00
OK, got the databases checked and new mails fed back in.
I may have figured out what's been causing the crashes: when we upgraded this box to 512MB, the second DIMM added was not ECC. Looking back through the logs, three times in the past year the kernel has printk'ed a single message which looks like the result of memory corruption (complaints about bogus entries on the free list or a page deref where only one bit is set in an otherwise NULL value). This isn't good. What's odd is that this--if it is the cause of the stability problems--never caused problems anywhere but the 3Ware driver; the other errors were non-fatal. It might explain the one or two MySQL incidents we've had in the past year, too.
Anyway, we've got some new ECC DIMMs being overnighted (over the weekend of course!) so will probably be taking the box down on Monday to swap out memory. Cross fingers that this does it... but I'm not holding my breath.

2001-08-16
Sorry about the downtime. Hank's outta town, this is worthless elf Dave again. Something made things unhappy around 4am, someone there rebooted it, unfortunately, didn't get a chance to see the console before reboot.. ;/ Maybe Hank glanced, I forgot he has serial console up. So, MARC is in readonly mode, Hank requested as such, and said he'll check the dbs out from hotel land (welleavethelightonforyoucoughcough) tonight. But everything'll be back into place, all the mails are spooling on backup mx's, so they'll be in place eventually.

2001-08-14
Long overdue upgrade to MySQL 3.23.41 today. Seems to have gone without a hitch. The way things are supposed to work.

2001-08-07
Got another complaint about the white text & black background. That makes about five complaints in four years, which means to me that most of you people are normal, moral, God-fearing folks who prefer light text on a dark background, and there's only a small perverse minority who prefer black-on-white. But hey, I'm flexible. I just added some quick scheme buttons to the 'Configure' page, so you can quickly pick a black-on-white scheme rather than dial-a-color, if that's what you want. I probably should have done that a long time ago anyway...
I'll take suggestions for other schemes to add, BTW.

2001-08-06
Or not (see below). During the long cold winter we were forced to eat Robin's minstrels, and there was much rejoicing. Other than that, things have been pretty smooth for the past month (cross fingers).
No real idea what was causing the 3Ware deaths. We got a few concerned emails from 3Ware tech support (thanks to those who wrote to them citing us and worrying ;), but we never really got an answer from them for the one question we really wanted an answer to: were the symptoms we described a) consistent with things the new firmware fixes, or b) consistent with the behavior of flaky/broken/can-be-fixed-by-replacing 3Ware cards? So we are rolling with it until/unless the card kills the box again, at which time, it's pulled-Barracuda time.

2001-07-06
Whee indeed. Hank's in meetings and out of town. This is Dave, I figured I'd put something for people wondering what's up, and he can change it later. Basically, we upgraded one of our proxy servers this morning w/ a new 3ware firmware. It was another box w/ some 3w related issues (but none that cause crashes like whatever's up w/ MARC), and the upgrade went fine, so we decided what the heck, let's upgrade MARC, Hank's not looking, what he doesn't know can't hurt him, right? .. er.. I mean. So anyhow, we upgrade it, then reboot. Somehow, the upgrade screwed up something to make LILO really unhappy with us. Which irritated us somewhat, seeing as the 3w drives aren't even the system disks, so it did something that the BIOS saw that made LILO hate us, as best we can figure. We're not really sure. Even after we got into the box, re-running LILO didn't help, whatever had happened made the old version of LILO broken, period. So we punted and tried a new-ish version, and that seemed to do the trick.
Hank will probably kill this later and add some helpful info that is actually coherent.

2001-07-05
Whee. Another 3Ware crash. Since several people have asked what exactly our problems/symptoms have been (because they were planning to buy some 3Ware cards and are now reconsidering) I'll post details later.

In the meantime, the site is up, read-only. I'm making full backups of everything because this probably won't be the last crash we have before we get this resolved, and if/when we finally start losing data because of them, I don't want to play the incremental game. So, write access is disabled for now (mail is just spooling up). I should be able to start catching up the incoming feeds in about an hour.

2001-06-30
Wow, almost 3 months of uptime (er, nevermind the short outage a few weeks ago as power was rerun...) before the 3Ware driver crashed the box again.

Things are mostly back together. The DB is read-mostly for now while a few dozen GB's get isamchk'ed (new mail is coming in and will show up, but author, thread, and full-body indexes won't be updated until later tonight).

Whee.

2001-04-01
OK. There've been some updates to the 3Ware driver, which hopefully fix the problems that bit us earlier. We'll see... I'm going to reboot the box to 2.2.19 shortly, and then beat on it for a while, laying off when it gets late enough nobody'll want to drive back to the office to hit the switch if it dies. In the meantime, I'm going to have incoming feed to the databases turned off, so all tables will be read-only and thus reduce the chances of data corruption...

...OK, running 2.2.19. Let's see how the new 3Ware driver stands up to some abuse...

2001-03-19
Fsck. fsck, fsck, fsck, fsck, fsck, fsck, fsck, fsck...
Sigh. Started copying some data to the 3Ware'd drives. Everything's happy for the first 4 hours of doing so (hey, moving 25GB of data slowly enough that the cache-thrashing doesn't kill performance of the live site is a slow process). The box waits until past midnight to lock up tight. Andy Orr rules for having gotten up and driven to the office. 3Ware driver'd spat up some error messages and left the box completely locked with interrupts disabled (not even sysrq magic worked). A reboot, lots of fscks, and a promise not to touch the 3Ware stuff again any time soon, and we're back... (running a whole lot of isamchk's now just to be on the safe side).

2001-03-13
Whee. Why do I keep hearing the line from the Gilligan's Island song, "a three hour tour" ?
So. Shut the box down, install the 3Ware IDE-RAID card. Hm, motherboard bios hates the 3Ware card. Check around: there's BIOS updates from Intel for the mobo. Maybe that'll fix things? Pull it down, try to update BIOS. Find that it's not possible to save existing settings; they all get nuked, thank you Intel. Cross fingers. Get the new BIOS happy, the box sees the 3Ware card, life is good. The 3Ware card chews on the IDE disks a while setting up the mirror. Eventually, reboot. Things seem fine, except for the 50% packet loss on eth0. Um. Realize that IO-APIC is disabled in the BIOS now. Reboot; fix that. No change. Reboot, get the bios to stop putting every damn thing on IRQ11. Reboot. No change. Try forcing the Tulip to come up half duplex. No change. Punt; enable the on-board Intel EEpro (which is among the bad run of chips that had multicast filter lockup bugs some time ago, hence the presence of the Tulip card...). Presto, everything's working great. And there was much rejoicing.
So, that one hour upgrade ended up taking about 5 hours. Sorry about that. But hey, at least now the box is Y2K compliant--with the new BIOS, the clock no longer thinks that it's 1996. Note that the server was new in 1999...

2001-03-12
Pretty quiet for a while, so of course it's time to shake things up. It's time to take the server down to add some more disk space. While we're at it some drives are going to get shuffled from the external canisters they are in currently--should be pretty straightforward. The box will be going down at 6PM US/Eastern (23:00 GMT) and with luck will be down less than an hour.

2001-02-21
A few reboots, and this box is finally running a 2.2 kernel. It's been 2.0.x forever because hey, if it ain't broke don't fix it. But, it was mildly broken in various ways. More importantly, there's a 3Ware card and pair of drives ready to go into this machine, but 3Ware needs 2.2.x and above AFAIK, so it was time. This should also give a performance and stability boost, though I can't complain about how it's been so far...

2001-02-19
OK, I've culled the 5450 messages that didn't get inserted, and re-inserted them. There's still the ~200 which lost their message bodies--will fix those another day.

2001-02-14
We had our first MySQL-related problems in nearly two years on the 12th. A table index got munged, such that writes to that table hung indefinitely, causing (among other things) threads to pile up until all MySQL connections were in use, effectively shutting things down. Of course this *would* have to happen when I am halfway across the country and completely net-dead. A quick isamchk made things happy again. But, nearly six thousand emails that came in over the past two days did not get posted, or the headers were inserted and the bodies were not (we still have copies of them, they just didn't hit the DB).

2001-02-01
Some stats, since I've just been asked: MARC runs on a dual P2-400 donated by the AIMS group. The box has 512MB of RAM and a pair of 18GB Seagate Barracuda SCSI drives. On them live (currently) 4.6 million messages, with 7.3 GB of message-body text, 1.2 GB of header text, about 2 GB of various indexes and query-speeding denormalized tables, and 9.7 GB of full-text indexes. (We, um, optimize for runtime speed, not compactness.) We took in about 215,000 new emails, over 800MB of raw mailspools, last month. The website, between marc.theaimsgroup.com and lists.kde.org, got nearly two million hits in Jan 2001 (more usage stats, which show things like the "popularity" of different lists and the most common search terms, are available here).

It's not hard to see that those disks won't last much longer; a couple of big EIDE (sigh) disks and a 3Ware card are waiting for me to be ready to take the box down so they can be installed.

2000-12-09
Ugh. A nice, neat comedy of errors resulted in an outage this morning. Power failure from about 7 AM to 8:30 AM. No problem, most of the servers have 2+ hours of UPS power. Only... the new alarm system (a boat alarm, huge beastie) was plugged into the same UPS as the firewall. So that UPS drained and died quickly, before power came back. No problem, only... the (recently upgraded) firewall's ATX BIOS wasn't set to turn back on after a power failure. No problem, other boxes monitor everything and have analog phone lines to dial up the paging tower and alert people. Only... the phone lines were re-run the other day, and one of the paging modems had its phone line disconnected. The other was plugged into the UPS that died. After an hour or so of the power failure, the paging software quit trying to talk to the dead modem, so it never noticed when the modem came back and never flushed its queue.

Sigh.

2000-12-04
I just had some fun hacking up Webalizer. Check out the MARC usage stats here. I've run stats for lists.kde.org since it was created, for marc.theaimsgroup.com since the main MARC site was renamed to that, and for www.progressive-comp.com from before MARC existed through the company's name change and MARC's move to marc.theaimsgroup.com (after which every hit to progressive-comp has been redirected to the new site).

2000-11-17
The AIMS Group's data center, and therefore the MARC server, is now multihomed. Of course, as the second link was being turned up, UUnet had an outage, and... stuff was unreachable for some time Thursday and/or Friday. Doh. Should be better now.

2000-11-something
The reindexing is complete, w00p. Just in time for me to realize that I've got attachment-decoding logic in the CGI code, and it'd really make sense to use the same code to decode base64'ed attachments and index the contents of them. Doh... maybe some day ;)

2000-10-31
I'm curious about something. Long ago I added Expires: headers to nearly all pages that MARC generates, to make local caches more efficient when hitting the back button, getting to the same message two different ways, or otherwise skipping around one's history. I'd like to know exactly what this "efficiency" is "costing" us in terms of hits-per-month. No, I don't want to go start selling banner ads, but it'd be nice to know. So I'm considering the silly one-pixel-transparent-GIF-with-nocache-pragma trick that people often use to measure such things. Would people take this particularly badly? Please email us and tell us how this would be a first step down the path to evil bannerad-hood, how you will laugh because lynx ignores images, how you will load the four or so GIFs the site does use and then disable image-loading and laugh, etc. FWIW I only intend to turn something like this on for a week or two, to see how the rates/percentages change so I can do some statistical analysis on cache/proxy usage, and then turn it off.
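
To be concrete about the mechanics (this is just an illustrative sketch, not the site's CGI; the pixel path and the one-day lifetime are made-up values): archive pages would keep getting a far-future Expires: header, while the measurement GIF would go out with no-cache headers so every page view has to re-fetch it.

  import sys
  import time
  from email.utils import formatdate

  def page_headers():
      # cache-friendly: tell browsers/proxies the page is good for a day
      sys.stdout.write("Content-Type: text/html\r\n")
      sys.stdout.write("Expires: %s\r\n\r\n"
                       % formatdate(time.time() + 86400, usegmt=True))

  def send_pixel():
      # cache-hostile: the 1x1 GIF is re-fetched (and thus logged) every view
      with open("/var/www/marc/1x1.gif", "rb") as fh:   # hypothetical pixel file
          gif = fh.read()
      sys.stdout.write("Content-Type: image/gif\r\n")
      sys.stdout.write("Pragma: no-cache\r\n")
      sys.stdout.write("Expires: %s\r\n\r\n" % formatdate(0, usegmt=True))  # already expired
      sys.stdout.flush()
      sys.stdout.buffer.write(gif)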

2000-10-25
The added memory has definitely helped; namely, I've been keeping the box pegged doing index-table rebuilds for days now and it's hardly noticeable (most of the time, anyway--we got /.'ed once or twice in the meantime, which was fun). The rebuild is still going to take weeks, but/because I'm not disabling anything else while I'm doing it (the new indexes are built one list at a time, and swapped into place once they're completed).
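
The swap is the interesting bit, so here is a rough sketch of the pattern (not the real code; the table names, columns, and the MySQLdb driver are all assumptions): build the replacement index table off to the side, then use MySQL's RENAME TABLE, which swaps several tables in one atomic step, so readers never see a half-built index.

  import MySQLdb

  conn = MySQLdb.connect(db="marc")        # connection details assumed
  cur = conn.cursor()

  def rebuild_index(listname):
      # assumes listname is a valid table-name fragment
      old = "idx_" + listname
      new = old + "_new"
      cur.execute("DROP TABLE IF EXISTS %s" % new)
      cur.execute("CREATE TABLE %s (msgid INT NOT NULL, word VARCHAR(64), "
                  "INDEX (word))" % new)            # made-up schema
      # ...slowly repopulate `new` here, one list at a time, at low priority...
      # Both renames below happen as a single atomic operation:
      cur.execute("RENAME TABLE %s TO %s_prev, %s TO %s" % (old, old, new, old))
      cur.execute("DROP TABLE %s_prev" % old)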

2000-10-20
Woah, finally started datestamping these things. ph33r. Anyway, the AIMS folks just added more memory to this box for me (from 256 to 512 MB). Nice. The bad news is, this gives me enough breathing room to start doing some database rearranging that's been sorely needed. So depending on how much I abuse the server in the next days/weeks, performance may actually go down ;)

2000-07-17
Thread and Author searches are back. They're now done using about the same algorithm that body-searches always have, so should be much more scalable than before. We'll see...

I also just added a bunch of code to (try to) handle MIME attachments. The glue isn't dry yet, so beware. Anyway, the code tries to pick out binary attachments and make them downloadable (and not part of the standard message view). It also tries to expand in-line text attachments that have been Base64'ed or such (but also with a downloadable link). HTML attachments are deliberately _not_ preserved. With all of the issues surrounding malicious HTML content, I'd just rather go ahead and break embedded HTML, especially in lists like BugTraq. If you _really_ want to see unadulterated HTML, download the message using the RAW link, and have fun. (Now, someone is going to discover a way to sneak something through that a broken browser will react to stupidly, and post it ;)

The code for attachments is a big hack. I can't believe it, but it seems to work. And so it goes.
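
For the curious, the rough shape of the thing (this is a fresh standard-library sketch, not the actual MARC code) looks something like: walk the MIME parts, inline the decoded text parts, hand binaries over as downloads, and quietly drop text/html.

  from email import policy
  from email.parser import BytesParser

  def split_message(raw_bytes):
      """Return (inline_texts, attachments) for one raw message."""
      msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
      inline, attachments = [], []
      for part in msg.walk():
          if part.is_multipart():
              continue
          ctype = part.get_content_type()
          if ctype == "text/html":
              continue                      # malicious-HTML paranoia: drop it
          if ctype.startswith("text/"):
              inline.append(part.get_content())     # base64/QP already decoded
          else:
              name = part.get_filename() or "attachment"
              attachments.append((name, part.get_payload(decode=True)))
      return inline, attachments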

[Undated changelog entries before that...]

The site has moved! Or not. Really, it's just changing names. The company that was Progressive Computer Concepts, Inc. was bought by its biggest client some time ago. Since then the ex-PCC guys have been developing intranet and extranet sites and large back-end databases for various segments of the United States railroad industry. Their unit is semi-autonomous within the parent company. www.progressive-comp.com is now being permanently redirected to their group's site, http://www.theaimsgroup.com/, and MARC now lives at http://marc.theaimsgroup.com/. It's still sitting on the same server, runs out of the same data center, etc. Only the name has changed. www.progressive-comp.com/Lists/foo will keep redirecting to the new site indefinitely, so no old links, bookmarks, etc. should break.

I've reworked the view-messages-by-date code to stop carrying around 'e=XXXXXX' (the end-of-range year/month) in the URL, and to just assume that one is always browsing one month at a time. While I was at it I removed an extraneous &d=1 from most URLs -- that was something the code used to care about, but hasn't for quite some time, so boom, it's gone now. The old URLs will still work, of course, but they'll be redirected to the new (shorter) ones. This (and the change below) makes URLs shorter; from:

  http://www.progressive-comp.com/Lists/?l=openssl-users&r=1#openssl-users
  http://www.progressive-comp.com/Lists/?l=openssl-users&d=1&r=1&b=199911&e=199912
to:
  http://www.progressive-comp.com/Lists/?l=openssl-users&r=1
  http://www.progressive-comp.com/Lists/?l=openssl-users&r=1&b=199911
Minimalism Is Good[TM].
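
The redirect for the old-style URLs is about as dull as it sounds; a sketch of the idea (not the actual CGI, with the /Lists/ path simply mirrored from the examples above) would be:

  import os
  from urllib.parse import parse_qsl, urlencode

  def maybe_redirect():
      """If the query string carries stale d=/e= params, 301 to the short form."""
      params = dict(parse_qsl(os.environ.get("QUERY_STRING", "")))
      if "d" in params or "e" in params:
          params.pop("d", None)
          params.pop("e", None)
          print("Status: 301 Moved Permanently")
          print("Location: /Lists/?" + urlencode(params))
          print()
          return True
      return False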

A couple of days ago I changed the behavior of top-level views of a specific list (when you follow a link to a list off the main page, and/or go to the list of messages-by-month for a given list). It used to be that you'd get a page listing all the lists in the same list-group as the one you were interested in, with the links for that one's messages by month expanded; a name anchor would make the browser drop down to the appropriate list. This meant that the search box, at the top of the page, was lots of scrolling away. Non-obvious, and also not conducive to laziness. Therefore... if a specific list is selected, it will now appear at the _top_ of the pile of lists for that group, with no #anchor and no useless scrolling necessary.

I've just enabled the login & environment-configuration code (the way to change your color scheme, yay). I ripped off a color picker from the Netscape web-devel site, hope it works ;) Hope the new code works in general -- I sort of expect people to trip over stuff in the next couple of days; this will force me to clean it up/finish/fix it... Please send any comments (good or bad) about it to webguy@marc.info. Enjoy ;)

Hardware stuff seems under control -- Tulip Ethernet cards rule. Also, I got off my butt and learned something about MySQL. Delayed inserts and low_priority insert/update own me. Since 99% of the interactive stuff w/MARC is read-only on the databases, and the only writes are non-timing-critical, I've changed all MARC-related code to use low_priority and delayed writes wherever possible. This avoids the situation where MARC feels nice and fast, and then as a few mails get delivered and processed, message viewing could wait... and wait... and wait until all the pending writes were done. Now it's the other way around -- inserts and updates wait for idle time. I like. To quote a friend of a friend, "Somebody should kick my a** for not learning this sooner."
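
In code terms the change is just a couple of keywords. A minimal sketch, assuming the MySQLdb driver and made-up headers/bodies tables (MARC's real schema isn't shown here):

  import MySQLdb

  conn = MySQLdb.connect(db="marc")          # connection details assumed
  cur = conn.cursor()

  def store_message(msgid, subject, body):
      # DELAYED: the insert returns immediately and the server applies it
      # whenever the (ISAM/MyISAM) table isn't busy serving readers.
      cur.execute("INSERT DELAYED INTO bodies (msgid, body) VALUES (%s, %s)",
                  (msgid, body))
      # LOW_PRIORITY: the write waits until no SELECTs are pending.
      cur.execute("UPDATE LOW_PRIORITY headers SET subject = %s WHERE msgid = %s",
                  (subject, msgid))

The trade-off is that a DELAYED row sits in server memory until it gets written, so a crash can lose it -- acceptable here, since the raw mail is still in the spools anyway.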

Sigh. We've been having some hardware fun again lately. Don't ever run EEPros in a busy box without disabling multicast... or you will be woken up in the middle of the night by pagers telling you stuff is dead ;) Hopefully we've gotten all the issues sorted out.

I've turned on some new code. It's got several new features, and probably some new bugs (er, unintended features) as well. Namely: an experiment with a Yahoo-like grouping of lists, so a list-of-lists isn't quite so large; searching by subjects and authors (across all lists) in addition to per-list message body searching. We'd appreciate any comments about the new stuff, especially bug reports.

We've moved the databases to a new and otherwise idle system; it seems much happier now.


