11:11 SeanDaly lurking
11:12 bernie * regular meeting times (mchua)
11:12 * posting meeting logs - this one should be short, just "we'll do it, here is where" (mchua)
11:12 * setting goals and milestones for this release (mchua)
11:12 * status update on aslo cluster (dfarning)
11:12 * long-term infrastructure plans (bernie)
11:12 #topic regular meeting times (mchua)
11:12 Mel: (Bernie and i are using the same laptop right now, so we'll be swapping back and forth.)
11:12 dfarning_ Ahh I was confused
11:13 Friday ten o'clock works fine
11:13 SeanDaly nick should be "berniemel"
11:13 bernie mel: The purpose of my 3 agenda items is to answer the question "if I haven't been involved in the day-to-day of Infrastructure but want to find out what's going on and what I can do to help in <10min, what do I do (and what do we need to set up, as a team, for me to be able to do that?)"
11:14 mel: weekly meetings, posted logs, and a release calendar are the fastest way I know to do that
11:14 mel: "read last week's log, go to next week's meeting, and we're working through the milestones on this calendar"
11:14 dfarning_ yes, we have grown to the point where meeting as needed
11:14 bernie SeanDaly: can't change nick without upsetting the meetbot
11:15 SeanDaly learns something every day
11:15 dfarning_ another very important thing is to shift bernie from wizard behind the curtain to teacher
11:15 bernie dfarning_: +1
11:16 dfarning_: I'm having a hard time finding trusted volunteers who are both capable and willing to take over entire pieces of infrastructure
11:16 dfarning_ every project has that trouble .. we need to start growing them
11:16 bernie dfarning_: adam has suggested someone who seems to be the perfect candidate to maintain trac
11:17 I guess we should switch topic
11:18 mel: before we move on - is it decided that this is going to be our weekly meeting time (until we need to switch it)
11:18 and that everything will be logged via meetbot?
11:18 bernie: +1
11:18 dfarning_ sounds good
11:18 bernie mel: ok, moving on now :) I figure we can lay out the issues on the rest of the agenda and then if we have time circle back and figure out what we want to do for this release cycle
11:19 ok, next topic
11:19 #topic status update on aslo cluster (dfarning)
11:20 dfarning_ I think the majority of the hard work is now done
11:20 The abstraction of components is done and works.
11:20 bernie dfarning_: I agree
11:20 dfarning_ next phase
11:20 testing and tuning
11:20 bernie dfarning_: I'm a little worried for the stability of treehouse as a virtualization host: I have seen VMs hang in the past
11:21 dfarning_ Yes, would like to move it as soon as we have somewhere to put it.
11:21 bernie dfarning_: we also need to bolt down network communication between the many VMs to make sure there's no way to eavesdrop and/or hijack the cluster at any level
11:22 dfarning_ Yes, proxy and db are still not on interconnect
11:23 bernie dfarning_: also, distributed systems are inherently more fragile. I feel like we have insufficient HA monitoring and failover at this time.
11:23 dfarning_: soon after we transition the aslo cluster into production, we should start making at least  the db redundant.
11:24 dfarning_ Depending on your available time, I would like to set up aslo-proxy2 somewhere and connect the proxies with heartbeat
11:25 bernie dfarning_: is it even possible to make the proxy redundant without some kind of IP takeover?
11:25 dfarning_ heartbeat can handle ip takeover
11:26 one machine is the master
11:26 bernie mel: interjection from a non-sysadmin - since I can't help directly with infra team tasks, but am basically here to help drive a beat and grow the team's capacity, I'd like to get us in the habit of blogging non-sysadmin-readable summaries of the work that's being done.
11:26 dfarning_ if it goes down the other can take over
11:26 bernie mel: in other words, "what infrastructure's done for you today, and why you should care." this will help with recruiting.
11:27 mel: I'll take the first blogpost and will hit up mailing lists for explanations of something cool I see passing by, just a fyi.
11:27 #action mchua take first infra blogpost
11:27 mel: +1000
11:28 dfarning_: then the two machines need to run on different physical hosts, but share the same LAN so they can take over the IP.
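A minimal sketch of the master/standby rule discussed above: the master holds the shared IP while its heartbeats keep arriving, and the standby takes the IP over once the master has been silent for too long. This is a plain simulation, not the heartbeat software's own API; the function name `vip_owner`, the `deadtime` parameter and the 30-second default are illustrative assumptions.

```python
def vip_owner(master: str, standby: str,
              last_heartbeat: float, now: float,
              deadtime: float = 30.0) -> str:
    """Return the name of the node that should hold the shared IP.

    last_heartbeat and now are timestamps in seconds. deadtime is a
    hypothetical threshold: how long the master may stay silent
    before the standby takes over the IP.
    """
    silent_for = now - last_heartbeat
    # Master keeps the IP while its heartbeats are recent enough.
    return master if silent_for <= deadtime else standby


# Master was heard 10 s ago: it keeps the IP.
print(vip_owner("aslo-proxy1", "aslo-proxy2", last_heartbeat=100.0, now=110.0))
# Master silent for 40 s: the standby takes over.
print(vip_owner("aslo-proxy1", "aslo-proxy2", last_heartbeat=100.0, now=140.0))
```

The takeover only helps if both nodes sit on the same LAN, so the standby can answer on the shared address, and on different physical hosts, so one hardware failure does not take out both.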
11:29 dfarning_ yes, that is why I am pushing for a cluster at rit or the EU
11:30 bernie dfarning_: treehouse and sunjammer would be ideal for the task
11:30 dfarning_: but we can't create new VMs besides sunjammer on the fsf machine. can we setup the proxy slave right on sunjammer?
11:31 dfarning_: let's hope for the best
11:31 dfarning_ Is it possible to first set up a second vm on treehouse for me to use to test before doing something this invasive to sunjammer?
11:31 bernie dfarning_: meanwhile, yesterday I had a quick talk with ivan on the phone. we might want to replace solarsail with a new machine.
11:32 dfarning_ What is going to happen to solarsail?
11:32 bernie dfarning_: me, ivan and SL may share the cost. hosting could be done at rit, the eu thing or transworldix, from which we already have an official free hosting offer.
11:33 dfarning_ Can we host multiple machines as transworldix
11:33 bernie dfarning_: we see that solarsail is aging: slow processors, fancy architecture that is hardly maintained upstream....
11:33 dfarning_ s/as/at/
11:34 bernie dfarning_: I think we were offered 2U of rackspace, but perhaps we could ask for 4U and stick 4 pizza boxes in it.
11:34 dfarning_ not sure how big a u is in comparison to machines
11:35 cjb most rackmount machines are 2u
11:35 your average desktop box is like 6u
11:35 bernie dfarning_: treehouse is 2U
11:35 dfarning_: solarsail is also 2U
11:35 cjb very small rackmount machines without much expansion space can be 1u
11:35 dfarning_ thanks
11:36 bernie cjb: these days, it wouldn't bother anyone: a 1U can happily house 3 1.5TB drives
11:36 cjb bernie: well, that doesn't bother anyone who doesn't want to have more than 3 drives :)
11:36 bernie cjb: for web services, we always saturate the cpu before disk and memory
11:37 cjb perhaps you can't use a super-fast CPU on 1u because of the limited cooling possibilities
11:37 (but I'm just guessing.)
11:37 dfarning_ I must apologize, I don't know much about hardware
11:38 before tossing out solarsail-- I would like to test it as a web node for aslo
11:40 proposed action: bernie or dogi create aslo-proxy1 on treehouse to test and debug heartbeat
11:43 bernie dfarning_: ok
11:44 dfarning_ Do you think dogi would be willing to give me root on housetree?
11:44 bernie dfarning_: we should ask him. he told me not to give it to anyone, and I can understand...
11:45 dfarning_ now, you do all sorts of wizardly things to set up and maintain vms. I would like to see about simplifying that process
11:45 bernie dfarning_: (housetree is the spare machine hosted at pika)
11:46 dfarning_: I documented all the procedures I use in the wiki, but frankly many of those are to be tuned case by case
11:46 dfarning_: unfortunately, the libvirt management tools are still very rudimentary
11:46 dfarning_ yah, better to have me learning on housetree than a production treehouse
11:47 bernie dfarning_: for example, moving a machine from a file to a partition cannot be automated in any way and required me to mount the partition with a fancy command
11:47 mount foo.img /mnt -o loop,offset=32256
11:47 or something like that
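A minimal sketch of where an offset such as 32256 can come from, assuming the first partition in the image starts at sector 63 and sectors are 512 bytes (63 × 512 = 32256). Both values are assumptions about the image, not something stated in the log, and the helper names are illustrative.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def loop_offset(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of a partition inside a raw disk image."""
    return start_sector * sector_size


def mount_command(image: str, mountpoint: str, start_sector: int) -> str:
    """Build the loop-mount command line for one partition of the image."""
    offset = loop_offset(start_sector)
    return f"mount {image} {mountpoint} -o loop,offset={offset}"


# Assumed layout: first partition starts at sector 63.
print(mount_command("foo.img", "/mnt", 63))
```

For a real image, the start sector would have to be read from its partition table rather than assumed.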
11:49 dfarning_ we can revisit this later
11:49 is there more on the agenda?
11:49 bernie dfarning_: I'll try to consolidate a little bit and document a little more, but really there's not much we can do at this time to simplify VM management beyond a certain point
11:50 the best we could do is NOT create any more VMs. they are a huge cost in terms of maintenance
11:50 dfarning_: ok. let's wrap up for today
11:51 dfarning_: we need to talk with dogi, ivan on one side and with rit, eu and transworldix on the other
11:51 I'm confident we can get one new box for our core services pretty easily at this point.
11:51 dfarning_ bernie, I agree -- but I see no alternative for testing and demonstrating scaling proofs of concept.
11:52 bernie I just hope we can make this happen before I leave the country in january.
11:52 dfarning_ can I give you two small action items?
11:53 bernie dfarning_: test machines are ok. we can have dozens of them, as they don't need backups, monitoring, updates and real security anyway.
11:53 dfarning_: I'd tend to limit the number of machines we have in production, just that.
11:53 dfarning_: I still have to create 2 more: pootle and dns/ldap
11:54 dfarning_: sure go on
11:54 dfarning_ ok, can you give aslo more memory? it started to swap
11:54 bernie dfarning_: ok, but I'm riding the bus now and abusing mel's connection
11:54 dfarning_: can it wait until tonight or tomorrow?
11:55 dfarning_: we have enough memory for now.
11:55 dfarning_: which aslo machine do you need to expand? aslo-web?
11:55 dfarning_ and aslo-web is down
11:55 bernie dfarning_: uh-oh
11:55 dfarning_: damn it's dog slow
11:56 dfarning_ it ran out of memory and crashed this morning -- or got really slow
11:56 bernie dfarning_: aslo-web is eating up cpu time. weird
11:57 dfarning_: btw aslo-web should not swap because I didn't give it any swap :)
11:57 dfarning_: it seems to be frozen the same way I've seen beamrider freeze
11:57 dfarning_: beamrider certainly did not run out of memory, so it may be a different issue
11:58 dfarning_ Anyway - that is it for me for now.
12:03 bernie, need to run.
12:03 bye
12:04 bernie dfarning_: rebooted aslo-web
12:04 btw, resource usage on treehouse is still low:
12:04 CPU: 66.6%  Mem: 19456 MB (19456 MB by guests)
12:04 CPU: 14.0%  Mem: 19456 MB (19456 MB by guests)
12:04   24 R    0    2            4.9  0.0  67:57:00 trixbox
12:04   35 R    1    6            2.1  3.0  20:38:59 aslo-proxy
12:04   48 R    0    0  861  844  1.6  6.0   0:39.93 aslo-web
12:04    8 R    0    0            0.3  0.0 359:59.42 vig
12:04   44 R    0    1  180    0  0.2  6.0 214:35.03 aslo-db
12:04   42 R    0    1  180    0  0.2 12.0  29:45:14 beamrider
12:04    5 R    0    0            0.0  0.0 573:52.21 meeting
12:04    -                                           (template-karmic)
12:05 going offline
12:05 over
12:05 dfarning_ bernie, do you have to #MEETING OVER
15:07 bernie #endmeeting

