Time |
Nick |
Message |
11:11 |
SeanDaly |
lurking |
11:12 |
bernie |
* regular meeting times (mchua) |
11:12 |
|
* posting meeting logs - this one should be short, just "we'll do it, here is where" (mchua) |
11:12 |
|
* setting goals and milestones for this release (mchua) |
11:12 |
|
* status update on aslo cluster (dfarning) |
11:12 |
|
* long-term infrastructure plans (bernie) |
11:12 |
|
#topic regular meeting times (mchua) |
11:12 |
|
Mel: (Bernie and i are using the same laptop right now, so we'll be swapping back and forth.) |
11:12 |
dfarning_ |
Ahh I was confused |
11:13 |
|
Friday ten o'clock works fine |
11:13 |
SeanDaly |
nick should be "berniemel" |
11:13 |
bernie |
mel: The purpose of my 3 agenda items is to answer the question "if I haven't been involved in the day-to-day of Infrastructure but want to find out what's going on and what I can do to help in <10min, what do I do (and what do we need to set up, as a team, for me to be able to do that?)" |
11:14 |
|
mel: weekly meetings, posted logs, and a release calendar are the fastest way I know to do that |
11:14 |
|
mel: "read last week's log, go to next week's meeting, and we're working through the milestones on this calendar" |
11:14 |
dfarning_ |
yes, we have grown to the point where meeting as needed |
11:14 |
bernie |
SeanDaly: can't change nick without upsetting the meetbot |
11:15 |
SeanDaly |
learns something every day |
11:15 |
dfarning_ |
another very important thing is to shift bernie from wizard behind the curtain to teacher |
11:15 |
bernie |
dfarning_: +1 |
11:16 |
|
dfarning_: I'm having a hard time finding trusted volunteers who are both capable and willing to take over entire pieces of infrastructure |
11:16 |
dfarning_ |
every project has that trouble... we need to start growing them |
11:16 |
bernie |
dfarning_: adam has suggested someone who seems to be the perfect candidate to maintain trac |
11:17 |
|
I guess we should switch topic |
11:18 |
|
mel: before we move on - is it decided that this is going to be our weekly meeting time (until we need to switch it) |
11:18 |
|
and that everything will be logged via meetbot? |
11:18 |
|
bernie: +1 |
11:18 |
dfarning_ |
sounds good |
11:18 |
bernie |
mel: ok, moving on now :) I figure we can lay out the issues on the rest of the agenda and then if we have time circle back and figure out what we want to do for this release cycle |
11:19 |
|
ok, next topic |
11:19 |
|
#topic status update on aslo cluster (dfarning) |
11:20 |
dfarning_ |
I think the majority of the hard work is now done |
11:20 |
|
The abstraction of components is done and works. |
11:20 |
bernie |
dfarning_: I agree |
11:20 |
dfarning_ |
next phase |
11:20 |
|
testing and tuning |
11:20 |
bernie |
dfarning_: I'm a little worried for the stability of treehouse as a virtualization host: I have seen VMs hang in the past |
11:21 |
dfarning_ |
Yes, would like to move it as soon as we have somewhere to put it. |
11:21 |
bernie |
dfarning_: we also need to bolt down network communication between the many VMs to make sure there's no way to eavesdrop and/or hijack the cluster at any level |
11:22 |
dfarning_ |
Yes, proxy and db are still not on interconnect |
11:23 |
bernie |
dfarning_: also, distributed systems are inherently more fragile. I feel like we have insufficient HA monitoring and failover at this time. |
11:23 |
|
dfarning_: soon after we transition the aslo cluster into production, we should start making at least the db redundant. |
11:24 |
dfarning_ |
Depending on your available time, I would like to set up aslo-proxy2 somewhere and connect the proxies with heartbeat |
11:25 |
bernie |
dfarning_: is it even possible to make the proxy redundant without some kind of IP takeover? |
11:25 |
dfarning_ |
heartbeat can handle ip takeover |
11:26 |
|
one machine is the master |
11:26 |
bernie |
mel: interjection from a non-sysadmin - since I can't help directly with infra team tasks, but am basically here to help drive a beat and grow the team's capacity, I'd like to get us in the habit of blogging non-sysadmin-readable summaries of the work that's being done. |
11:26 |
dfarning_ |
if it goes down the other can take over |
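[Editor's sketch] The master/standby idea dfarning_ describes can be illustrated roughly as below. This is not Heartbeat's actual configuration or API, only a minimal model of the decision it makes. The `Proxy` class, the `ip_holder` function, and the `dead_time` threshold are hypothetical names chosen for illustration; the node names come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    name: str
    last_heartbeat: float  # time (seconds) the peer last heard from this node


def ip_holder(master: Proxy, standby: Proxy, now: float, dead_time: float = 10.0) -> str:
    """Return the name of the node that should hold the shared service IP.

    The master keeps the IP while its heartbeats are recent; once it has been
    silent for longer than dead_time (an assumed threshold), the standby takes over.
    """
    if now - master.last_heartbeat <= dead_time:
        return master.name
    return standby.name


master = Proxy("aslo-proxy1", last_heartbeat=100.0)
standby = Proxy("aslo-proxy2", last_heartbeat=100.0)
print(ip_holder(master, standby, now=105.0))  # master still healthy
print(ip_holder(master, standby, now=120.0))  # master silent too long, standby takes over
```

In a real deployment the takeover also requires both nodes to sit on the same LAN segment so the standby can claim the address, which is the constraint bernie raises later in the log.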
11:26 |
bernie |
mel: in other words, "what infrastructure's done for you today, and why you should care." this will help with recruiting. |
11:27 |
|
mel: I'll take the first blogpost and will hit up mailing lists for explanations of something cool I see passing by, just a fyi. |
11:27 |
|
#action mchua take first infra blogpost |
11:27 |
|
mel: +1000 |
11:28 |
|
dfarning_: then the two machines need to run on different physical hosts, but share the same LAN so they can take over the IP. |
11:29 |
dfarning_ |
yes, that is why I am pushing for a cluster at rit or the EU |
11:30 |
bernie |
dfarning_: treehouse and sunjammer would be ideal for the task |
11:30 |
|
dfarning_: but we can't create new VMs besides sunjammer on the fsf machine. can we setup the proxy slave right on sunjammer? |
11:31 |
|
dfarning_: let's hope for the best |
11:31 |
dfarning_ |
Is it possible to first set up a second vm on treehouse for me to use to test before doing something this invasive to sunjammer? |
11:31 |
bernie |
dfarning_: meanwhile, yesterday I had a quick talk with ivan on the phone. we might want to replace solarsail with a new machine. |
11:32 |
dfarning_ |
What is going to happen to solarsail? |
11:32 |
bernie |
dfarning_: me, ivan and SL may share the cost. hosting could be done at rit, the eu thing or transworldix, from which we already have an official free hosting offer. |
11:33 |
dfarning_ |
Can we host multiple machines as transworldix |
11:33 |
bernie |
dfarning_: we see that solarsail is aging: slow processors, fancy architecture that is hardly maintained upstream.... |
11:33 |
dfarning_ |
s/as/at/ |
11:34 |
bernie |
dfarning_: I think we were offered 2U of rackspace, but perhaps we could ask for 4U and stick 4 pizza boxes in it. |
11:34 |
dfarning_ |
not sure how big a u is in comparison to machines |
11:35 |
cjb |
most rackmount machines are 2u |
11:35 |
|
your average desktop box is like 6u |
11:35 |
bernie |
dfarning_: treehouse is 2U |
11:35 |
|
dfarning_: solarsail is also 2U |
11:35 |
cjb |
very small rackmount machines without much expansion space can be 1u |
11:35 |
dfarning_ |
thanks |
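[Editor's sketch] For readers with the same question: a rack unit (1U) is 1.75 inches of vertical rack space, and machine heights are quoted in whole units. The arithmetic behind "ask for 4U and stick 4 pizza boxes in it" can be sketched as below; the function name is hypothetical and the sketch ignores power, cooling, and cabling constraints.

```python
RACK_UNIT_INCHES = 1.75  # height of 1U in a standard 19-inch rack


def machines_that_fit(rack_space_u: int, machine_height_u: int) -> int:
    """Number of machines of a given height (in U) that fit in the offered space."""
    if machine_height_u <= 0:
        raise ValueError("machine height must be a positive number of rack units")
    return rack_space_u // machine_height_u


print(machines_that_fit(4, 1))   # four 1U "pizza boxes" in 4U
print(machines_that_fit(2, 2))   # one 2U machine (like treehouse or solarsail) in 2U
print(4 * RACK_UNIT_INCHES)      # physical height of 4U, in inches
```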
11:36 |
bernie |
cjb: these days, it wouldn't bother anyone: a 1U can happily house 3 1.5TB drives |
11:36 |
cjb |
bernie: well, that doesn't bother anyone who doesn't want to have more than 3 drives :) |
11:36 |
bernie |
cjb: for web services, we always saturate the cpu before disk and memory |
11:37 |
cjb |
perhaps you can't use a super-fast CPU on 1u because of the limited cooling possibilities |
11:37 |
|
(but I'm just guessing.) |
11:37 |
dfarning_ |
I must apologize, I don't know much about hardware |
11:38 |
|
before tossing out solarsail-- I would like to test it as a web node for aslo |
11:40 |
|
proposed action: bernie or dogi create aslo-proxy1 on treehouse to test and debug heartbeat |
11:43 |
bernie |
dfarning_: ok |
11:44 |
dfarning_ |
Do you think dogi would be willing to give me root on housetree? |
11:44 |
bernie |
dfarning_: we should ask him. he told me not to give it to anyone, and I can understand... |
11:45 |
dfarning_ |
now, you do all sorts of wizardly things to set up and maintain VMs. I would like to see about simplifying that process |
11:45 |
bernie |
dfarning_: (housetree is the spare machine hosted at pika) |
11:46 |
|
dfarning_: I documented all the procedures I use in the wiki, but frankly many of those are to be tuned case by case |
11:46 |
|
dfarning_: unfortunately, the libvirt management tools are still very rudimentary |
11:46 |
dfarning_ |
yah, better to have me learning on housetree than a production treehouse |
11:47 |
bernie |
dfarning_: for example, moving a machine from a file to a partition cannot be automated in any way and required me to mount the partition with a fancy command |
11:47 |
|
mount foo.img /mnt -o loop,offset=32256 |
11:47 |
|
or something like that |
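[Editor's sketch] The `offset=32256` in the command above is most likely the byte offset of the first partition inside the disk image: 63 sectors of 512 bytes each, the traditional start of the first partition on MBR-partitioned disks. Assuming that layout, the offset for any partition can be computed from its start sector. The helper names below are hypothetical, and the start sector should be read from the image's partition table rather than guessed.

```python
SECTOR_SIZE = 512  # bytes per sector, assumed for this image


def partition_offset(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of a partition that begins at start_sector."""
    return start_sector * sector_size


def mount_command(image: str, mountpoint: str, start_sector: int) -> str:
    """Build the loop-mount command line for a partition inside a disk image."""
    offset = partition_offset(start_sector)
    return f"mount {image} {mountpoint} -o loop,offset={offset}"


print(partition_offset(63))                   # 32256
print(mount_command("foo.img", "/mnt", 63))   # mount foo.img /mnt -o loop,offset=32256
```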
11:49 |
dfarning_ |
we can revisit this later |
11:49 |
|
is there more on the agenda? |
11:49 |
bernie |
dfarning_: I'll try to consolidate a little bit and document a little more, but really there's not much we can do at this time to simplify VM management beyond a certain point |
11:50 |
|
the best we could do is NOT create any more VMs. they are a huge cost in terms of maintenance |
11:50 |
|
dfarning_: ok. let's wrap up for today |
11:51 |
|
dfarning_: we need to talk with dogi, ivan on one side and with rit, eu and transworldix on the other |
11:51 |
|
I'm confident we can get one new box for our core services pretty easily at this point. |
11:51 |
dfarning_ |
bernie, I agree -- but I see no alternative for testing and demonstrating scaling proofs of concept. |
11:52 |
bernie |
I just hope we can make this happen before I leave the country in january. |
11:52 |
dfarning_ |
can I give you two small action items? |
11:53 |
bernie |
dfarning_: test machines are ok. we can have dozens of them, as they don't need backups, monitoring, updates and real security anyway. |
11:53 |
|
dfarning_: I'd tend to limit the number of machines we have in production, just that. |
11:53 |
|
dfarning_: I still have to create 2 more: pootle and dns/ldap |
11:54 |
|
dfarning_: sure go on |
11:54 |
dfarning_ |
ok, can you give aslo more memory? it started to swap |
11:54 |
bernie |
dfarning_: ok, but I'm riding the bus now and abusing mel's connection |
11:54 |
|
dfarning_: can it wait until tonight or tomorrow? |
11:55 |
|
dfarning_: we have enough memory for now. |
11:55 |
|
dfarning_: which aslo machine do you need to expand? aslo-web? |
11:55 |
dfarning_ |
and aslo-web is down |
11:55 |
bernie |
dfarning_: uh-oh |
11:55 |
|
dfarning_: damn it's dog slow |
11:56 |
dfarning_ |
it ran out of memory and crashed this morning -- or got really slow |
11:56 |
bernie |
dfarning_: aslo-web is eating up cpu time. weird |
11:57 |
|
dfarning_: btw aslo-web should not swap because I didn't give it any swap :) |
11:57 |
|
dfarning_: it seems to be frozen the same way I've seen beamrider freeze |
11:57 |
|
dfarning_: beamrider certainly did not run out of memory, so it may be a different issue |
11:58 |
dfarning_ |
Anyway - that is it for me for now. |
12:03 |
|
bernie, need to run. |
12:03 |
|
bye |
12:04 |
bernie |
dfarning_: rebooted aslo-web |
12:04 |
|
btw, resource usage on treehouse is still low: |
12:04 |
|
CPU: 66.6% Mem: 19456 MB (19456 MB by guests) |
12:04 |
|
CPU: 14.0% Mem: 19456 MB (19456 MB by guests) |
12:04 |
|
24 R 0 2 4.9 0.0 67:57:00 trixbox |
12:04 |
|
35 R 1 6 2.1 3.0 20:38:59 aslo-proxy |
12:04 |
|
48 R 0 0 861 844 1.6 6.0 0:39.93 aslo-web |
12:04 |
|
8 R 0 0 0.3 0.0 359:59.42 vig |
12:04 |
|
44 R 0 1 180 0 0.2 6.0 214:35.03 aslo-db |
12:04 |
|
42 R 0 1 180 0 0.2 12.0 29:45:14 beamrider |
12:04 |
|
5 R 0 0 0.0 0.0 573:52.21 meeting |
12:04 |
|
- (template-karmic) |
12:04 |
|
ID S RDRQ WRRQ RXBY TXBY %CPU %MEM TIME NAME |
12:05 |
|
going offline |
12:05 |
|
over |
12:05 |
dfarning_ |
bernie, do you have to #MEETING OVER? |
15:07 |
bernie |
#endmeeting |