#olpc-meeting, 2011-11-14

21:06 meeting Meeting started Mon Nov 14 21:06:56 2011 UTC. The chair is Cerlyn. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:07 Cerlyn #meetingtopic OLPC 1.75 triage meeting 2011-11-14
21:07 #chair martin_xsa
21:07 meeting Current chairs: Cerlyn martin_xsa
21:08 martin_xsa don't sit on me
21:08 martin_xsa ok - agenda is to go quickly over dev.laptop.org/1.75
21:09 martin_xsa blockers and high pri items, and figure out if we need to file additional tasks
21:09 it's our chance to make sure we don't forget about stuff
21:10 so #10835 Implement power management code for MMP2
21:10 Quozl there's 115 tickets there, it will be hard not to forget about stuff.
21:10 martin_xsa from smithbone's email, it sounds like I could stop using wmb's branch
21:11 Cerlyn #topic blockers and high pri items
21:11 martin_xsa dilinger: does arm-3.0-wip s/r reasonably well?
21:11 smithbone martin_xsa: I hope so. I defer to dilinger's decision.
21:11 dilinger martin_xsa: i hope so. :)
21:11 smithbone The kernel he gave me ran on our bench on a 201 for 3+ days.
21:12 dilinger i tested a few variations on that kernel, but not that kernel exactly
21:12 martin_xsa ok, then I'll build from there.
21:12 dilinger regardless, i'm missing the point of testing the wmb branch
21:12 martin_xsa any further questions for marvell? or code we want released?
21:13 dilinger there are pending questions for marvell that they haven't answered yet (about the tlb flushes)
21:13 martin_xsa agreed - any questions not asked yet? :-)
21:13 question_queue_flush()
21:14 dilinger not at the moment
21:14 i tested their exact sram code over the weekend
21:14 martin_xsa #10893 XO-1.75 - runin tests
21:14 dilinger on 2 machines, i reached 10,000 cycles
21:14 the other 2 machines crashed early on
21:14 martin_xsa dilinger: hmm, interesting, and that's what's in -wip, so we'll see more reports then.
21:15 right - moving on to #10893 XO-1.75 - runin tests
21:15 dilinger martin_xsa: nah, what's in -wip is different
21:15 -wip adds their TLB stuff, ignores some of their other changes, and keeps mitch's delays
21:16 martin_xsa ok - I'll trust you to pick the right mix
21:16 :-)
21:16 dilinger it's a.. how do you say.. work in progress? ;)
21:16 martin_xsa Quozl: for the runin tests, you have 0.16, does it include thermal?
21:17 Quozl martin_xsa: cforth sets up thermal watchdog.  openfirmware and linux do not dislodge the configuration.  olpc-runin-tests 0.16.0 provides support for CPU temperature logging and (thermal) excursion halt.
21:19 martin_xsa Quozl: right, so does 0.16 include the needed bits to talk to OFW (sdkit) and invoke the thermal tests?
21:19 Quozl martin_xsa: runin has no part in thermal watchdog behaviour.  it is monitoring, and forcing a fail if temperature is above a characterised limit, but we have not characterised that limit yet, so it may be insensitive.
21:20 martin_xsa: olpc-runin-tests 0.16.0 provides support for CPU temperature logging and (thermal) excursion halt.  you asked how it is implemented.  it uses sdkit, which is a userland build of ofw, to read the temperature sensor, because the kernel hasn't got a driver for it yet.
21:20 martin_xsa: there is no invocation of thermal tests in ofw.
21:20 (there are no thermal tests in ofw).
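Editor's note: Quozl describes runin's role as monitoring only: it logs CPU temperature and forces a fail on an excursion above a limit that had not yet been characterised. The following is a minimal, hypothetical Python sketch of that "log and halt on excursion" policy. It is not code from olpc-runin-tests. The reading format, function names and limit value are all assumptions, and reading the sensor (which 0.16.0 does through sdkit) is left out.

```python
"""Hypothetical sketch of a runin-style thermal excursion check.

Assumptions (not taken from olpc-runin-tests): readings arrive as
(timestamp_s, celsius) pairs, and limit_c is a placeholder value, since
the real limit had not been characterised at the time of this meeting.
"""
from typing import Iterable, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, temperature in deg C)


def first_excursion(readings: Iterable[Reading], limit_c: float) -> Optional[Reading]:
    """Return the first reading strictly above limit_c, or None if none exceed it."""
    for ts, temp in readings:
        if temp > limit_c:
            return (ts, temp)
    return None


def runin_thermal_verdict(readings: Iterable[Reading], limit_c: float) -> str:
    """Monitoring-only policy: 'PASS' unless some reading exceeds the limit."""
    hit = first_excursion(readings, limit_c)
    if hit is None:
        return "PASS"
    ts, temp = hit
    return f"FAIL: {temp:.1f} C at t={ts:.0f}s exceeds limit {limit_c:.1f} C"
```

As Quozl notes, a check like this is only as sensitive as limit_c. An uncharacterised, conservative limit lets real excursions through.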
21:21 martin_xsa Quozl: it's ok, I'm asking whether the task is done. Of course it is better to get the description right :-))
21:21 so it's reasonable to set this to "add to build" ?
21:21 Quozl martin_xsa: i lack a specification to check against, sorry.  all i'm going on is task list from quanta.
21:22 martin_xsa: no, it is not reasonable to set #10893 to add-to-build, because #10893 is a metaticket, but #10954 is add-to-build, and that's where temperature monitoring for kernel is tracked.
21:22 martin_xsa great, I hadn't seen 10954
21:24 right, and it is add to build. great.
21:25 thanks. I'll add it to the build ;-)
21:25 #11369 XO-1.75 USB stops working after suspend/resume --
21:25 I guess nobody is working on this one yet -- probably a good candidate after the core S/R is considered stable
21:25 dilinger i haven't looked at that in the past week
21:26 right
21:26 martin_xsa #11379 PS/2 mouse, keyboard wedged after suspend/resume cycle
pgf: I've downgraded this one to high since we have your workaround
21:27 are there good reasons to block on having the root-cause fix?
21:27 or a better fix?
21:27 pgf not currently.
21:27 Quozl does this prevent a user from typing ahead while the system is suspended?
21:28 pgf ?  you can't do that now.
21:28 in any case, there's a separate bug for first-keystroke-lost
21:29 martin_xsa #11401 ^
21:29 #11395 - Implement Linux Suspend/Resume on MMP2
21:30 already discussed
21:30 #11397 DCON freeze not working under X
21:30 Quozl really?  but this ticket covers X, VT, libertas as well.
21:31 martin_xsa jnettlet has pushed a patch to -wip, and a new dovefb rpm, they'll get into the next build
21:31 sets it to add-to-build
21:32 #11400 Date resets to 1970 each boot
21:32 that looks like a regression over earlier kernels
21:32 pgf it broke when we enabled the 2nd rtc, i think
21:33 there's a config option to tell the kernel which rtc to prefer.  we should try using that to select the external rtc.
21:33 martin_xsa cjb: do you think silbe's fix is the right one? we can add it to the defconfig
21:33 pgf: is rtc1 the external one?
21:33 pgf the internal rtc doesn't actually need to have the correct time.
21:33 it must be, since rtc0 is internal.  :-)
21:33 Quozl is the discovery order predictable?
21:33 dilinger pgf: hm.  is there a runtime option, or is that compile-time only?
21:33 pgf all i know about is the config option.
21:34 cjb martin_xsa: if it works, it's a good idea; it involves having one of the two RTCs not knowing what the time is
21:34 Quozl #10899, is the only design we have at the moment.
21:35 martin_xsa cjb: looks like the ext one remembers, so if rtc1 is reliably the external one, we're go.
21:35 cjb yes
21:36 Quozl test that it doesn't break rtcwake.
21:36 pgf it will, indeed, break rtcwake.
21:36 Quozl we need rtcwake, don't we?
21:36 martin_xsa pgf: what?
21:36 pgf we'll have to select the correct device on the rtcwake commandline.
21:36 Quozl then we'll break runin.
21:36 pgf oh -- not if the alarming one is rtc0
21:37 Quozl is the discovery order predictable?
21:37 martin_xsa that's very interesting,  the kernel cmdline says CONFIG_RTC_HCTOSYS_DEVICE
21:37 pgf we may break hwclock.
21:37 dilinger can we have an init script determine which rtc has the correct time (assuming it'll never be 1970 again) and set the remaining RTC devices accordingly?
21:37 martin_xsa pgf: I think we _fix_ hwclock with this :-)
21:37 pgf okay, maybe.
21:38 martin_xsa dilinger: from everything I hear here, rtc0 has zero chance of being right
21:38 it gets powered off with the SoC
21:38 Quozl dilinger: thus trusting openfirmware never to init rtc0.
21:39 openfirmware does have code that uses rtc0 for alarms, but i don't know when it is called.
21:39 martin_xsa ok - sounds like the right fix, I'll apply it, and we'll see if anything breaks. if it does, we'll handle the breakage.
21:39 dilinger martin_xsa: that assumes rtc0 is always rtc0
21:40 martin_xsa but the CONFIG_ variable seems to imply that it'll DTRT and not mess with the alarm clock
21:40 Quozl it merely needs to be tested before being committed.  ;-}
21:40 martin_xsa dilinger: that's true -- Quozl, yes, I'll get that done. I'll test it over a few reboots, removing power along the way.
21:41 and testing rtcwake, etc
21:41 any interesting testing idea, add to ticket :-)
21:41 #11401 keystroke which causes resume is lost
21:42 dilinger i don't think the device ordering is in any ways guaranteed.  as it happens now, the platform device for rtc-mmp gets created early, but that won't necessarily happen when that code is upstream
21:42 pgf rtcwake has precious few clients, so if it needs a "-d rtc1" to work, that'd be okay.
21:42 martin_xsa dilinger: is that viable to name via udev? or am I on drugs?
21:43 Quozl i propose "rtc_volatile" for the internal one.  ;-)
21:43 martin_xsa can we fit spawn_from_hell in there too?
21:43 dilinger martin_xsa: yeah, i think udev should be able to control the name
21:44 pgf they have different names in the sysfs tree already
21:44 Quozl pgf: you said in #10899 that the one outside doesn't have alarms ... that's changed?
21:44 pgf no
21:44 not that i know of
21:44 Quozl pgf: does rtcwake use alarms?
21:44 pgf certainly
21:45 martin_xsa dilinger: tjem we have to sync the defconfig values with the udev values...
21:45 then
21:45 this will be a lot of fun
21:45 pgf this doesn't all have to be solved now.   if the order is consistent today, run with it.
21:46 if it changes when we upstream, we can deal then.
21:46 martin_xsa good point
21:46 martin_xsa in any case, I'll explore it a bit
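Editor's note: dilinger's suggestion at 21:37 was an init script that decides which RTC holds the correct time, assuming it will never legitimately be 1970 again, and resyncs the other RTC devices from it. The following is a minimal, hypothetical Python sketch of that decision step only. Because it keys on whether each clock's time is plausible rather than on the rtcN index, it sidesteps dilinger's point that device ordering is not guaranteed. The function name, the 2011-01-01 floor and the "latest plausible time wins" tie-break are assumptions. Actually reading and writing the clocks (for example with hwclock) is out of scope.

```python
from typing import Dict, List, Optional, Tuple

# Assumed sanity floor: any RTC reading before 2011-01-01 00:00:00 UTC is treated
# as "reset" (e.g. back at 1970). 1293840000 is that instant in Unix seconds.
PLAUSIBLE_FLOOR = 1293840000


def plan_rtc_sync(readings: Dict[str, int],
                  floor: int = PLAUSIBLE_FLOOR) -> Tuple[Optional[str], List[str]]:
    """Pick a trusted RTC and list the ones that should be overwritten from it.

    readings maps an RTC device name (e.g. "rtc0") to its current time in
    Unix seconds. Returns (source, targets); if no RTC looks plausible,
    source is None and targets is empty, so nothing gets overwritten.
    """
    plausible = [name for name, t in readings.items() if t >= floor]
    if not plausible:
        return None, []
    # If several clocks look plausible, trust the latest one (an assumption).
    source = max(plausible, key=lambda name: readings[name])
    targets = sorted(name for name in readings if name != source)
    return source, targets
```

For example, if rtc0 (powered off with the SoC) reads near 1970 and rtc1 holds a November 2011 time, the sketch picks rtc1 as the source and lists rtc0 as the clock to resync.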
21:46 Next! #11401 keystroke which causes resume is lost
21:46 pgf no progress
21:47 we have some ideas, but don't know if any of them will fly.
21:47 martin_xsa is this a blocker for this stage?
21:48 Quozl i don't know the blocker criteria.
21:48 pgf it's not a great user experience, as you'll find out when aggressive s/r comes to your laptop.
21:48 what's "this stage"?
21:48 martin_xsa current blocker criteria is: it blocks C2 runin
21:48 Quozl i don't know if quanta will whinge about it.
21:48 it does not block C2 runin.
21:48 pgf right
21:48 martin_xsa exactly -- Quanta, runin, tests on mfg line
21:49 Quozl it may cause Quanta's QA team to raise it because it is different, but not if we don't tell them.  ;-)
21:49 martin_xsa I'll demote it to high -- naturally it's a blocker to a shipping OS
21:49 pgf: we don't think it's got a hw component, right?
21:50 Quozl #11369 is not a blocker for C2 runin, since USB is not tested by runin.
21:50 pgf not specifically, no.   not any more than every other s/r related bug.
21:51 Quozl #11397 is not a blocker for C2 runin, since nobody is meant to watch the screen.
21:51 #11400 would impact C2 runin file datestamps, needs fixing, but the C2 runin tests would pass.
21:52 #11409 is not a blocker for C2 runin, since runin does not test headphone jack.
21:53 martin_xsa Quozl: quanta does look at the units during runin, and will report oddities #11397
21:55 martin_xsa and Quanta does test things and performs basic tests on machines after runin, so #11369 is certain to hold things up if we enable s/r in powerd
21:55 Quozl Of the "high" tickets, the following may impact C2 runin ... #11237, #11357, #11408, #11413, #11416, #11430 (if Quanta don't like it), #11438.
21:55 martin_xsa ok - we are going over them :-)
it's not the only thing we want to know
21:57 martin_xsa pgf: thanks
21:57 I'll skip audio for now, saadia couldn't join the meeting
21:58 Cerlyn: you reported that #11196 some 1.75 touchpads "wander" in ebook mode
21:58 still happens a bit?
21:58 Cerlyn a bit but not much with local testing
21:58 and that's if you are moving something behind the touchpad in ebook mode
21:59 or moving the XO to effectively do the same
21:59 pgf that's significantly different than originally, right?  it used to wander as soon as you closed the screen against the keyboard.
21:59 martin_xsa ok - anyway, that one's on Quanta's side, and we have olpc-kbdshim masking it
22:00 pgf it's really a vendor acceptance issue.
22:00 Cerlyn pgf: I think that's environmental conditions or potentially sample dependent
22:00 martin_xsa I am not too worried about it, -- agreed with pgf
22:00 Next: #11237 Video driver hang on XO-1.75 os4
22:00 I agree with Quozl that's potentially a blocker
22:00 pgf martin_xsa: i just don't want quanta to say "not a problem" because we've worked around the problem in s/w.
22:01 martin_xsa pgf: absolutely -- we want the tp to be quiet so we don't wake up
22:01 pgf: if the vendor gives us a userland command to shut it up...
22:02 pgf martin_xsa: the already suggested flow-controlling the touchpad in h/w.
22:02 s/the/they
22:03 it's not out of the question, i guess.
22:03 martin_xsa pgf: but the window on hw changes is closed, unless we have things wired up so that we can do it?
22:03 pgf ps2 flow control is part of normal operation.  no physical h/w changes needed.
22:03 (i shouldn't have said "in h/w" above.)
22:04 (i meant "at the h/w level")
22:04 martin_xsa pgf: ah, so that'd be on the SP
22:04 Quozl: #11237 not a blocker, it triggers with an accelerated video option we aren't using now
22:04 pgf martin_xsa: or in the EC on the other laptops.  remember that they'll use that pad everywhere.
22:05 martin_xsa pgf: yep
22:05 cjb martin_xsa: are we shipping without any gpu acceleration, then?
22:06 martin_xsa cjb: you're parachuting into it :-)
22:06 cjb hm?
22:06 pgf the flow-control sol'n isn't as easy on the 1 and 1.5, because the EC doesn't have the "i'm an ebook" signal.
21:07 martin_xsa cjb: current criteria is for C2 build & runin, and #11237 just doesn't meet the blocker crit
22:07 jon promises a fix soon, though
22:07 cjb: any progress on #11353 SD Card cannot detected in OS881? I did some basic tests...
22:08 cjb martin_xsa: does suspend/resume crash if UNSAFE_RESUME's off?
22:08 martin_xsa cjb: I didn't test /that/ :-)
22:09 Quozl pgf: i've just checked the current hardware call issues list, and there is no current entry for CL2 that covers #11196 wander, so please raise it with wad?
22:09 cjb martin_xsa: that would decide whether it's blocker.
22:10 martin_xsa cjb: it crashes
22:10 cjb ok
22:10 pgf Quozl: martin_xsa was driving that process with quanta and the vendor.  they were going to provide new firmware or somesuch.  i assume martin's still on top of it, and won't accept the new h/w until it passes our criteria.
22:10 martin_xsa cjb: last week you said you'd investigate -- any progress on that?
22:10 cjb martin_xsa: no, been too busy with XO-3
22:10 Quozl pgf: 'k.
22:11 martin_xsa pgf, Quozl - yep, it's in the quanta confcall topic
22:11 topic/task list
22:11 Quozl martin_xsa: i must have missed it, i just went through it.
22:12 cjb martin_xsa: what's your deadline for getting the blockers fixed?
22:12 Quozl (several closed items).
22:12 i used the tracking list dated 2011-11-10, you might have another one.
22:12 martin_xsa cjb: right now we're working towards C2 build, so blockers are defined as to whether they block C2
22:12 cjb right, I know
22:13 martin_xsa somethings are high pri just because they just are, and affect us beyond C2
22:13 Quozl now that i know the blocking criteria, things are clearer to me.  it's good.
22:13 martin_xsa I want to have the OS img for C2 1st of Dec, so closing date is Nov 30th
that's laid out in an email to techteam and devel, titled "11.3.1 timeline" or similar
22:15 cjb: I assume xo-3 is gonna keep you busy++
22:15 cjb Okay, thanks.  I'm ambivalent on whether to commit to this because Scott's currently blocked on me figuring webgl out.  At least there are two weeks rather than a couple of days for it, though.
22:15 (to fix the SD bug, whatever it is)
22:15 martin_xsa cjb: ok
22:16 understood
22:16 cjb thanks.  let's revisit next week, things should be less crazy then.
22:16 martin_xsa ok
22:16 next #11357 Boot freezing on third clock dot
as Quozl said, this is a candidate blocker
22:17 Quozl i regularly see boot freezes on b1 and c1 units here, when i get to try linux.
22:17 martin_xsa yep- I see it lots too
22:18 Quozl they are either third-clock-dot or scsi_wait_scan.
22:18 pgf are you distinguishing between the two, or saying they're the same thing?
22:18 (i think the latter)
22:18 dilinger huh.  i don't think i've ever seen that
22:19 Cerlyn pgf: I believe they are the same thing; and technically the modprobe is blocked on something else and it may not be the module's fault
22:19 Quozl i don't know which is which, sometimes i'm looking at the screen, sometimes i'm not, sometimes i've booted from ok, sometimes i've got a USB to ethernet dongle involved, i just haven't fully diagnosed and separated out those symptoms, i was usually busy on some ofw errand.  ;-}
22:19 martin_xsa dilinger: do you build your psmouse into the kernel?
22:19 cjb I see it a lot on my C1, feel like I hadn't ever seen it pre-C1
22:19 dilinger martin_xsa: nope, i'm using the defconfig
22:20 martin_xsa dilinger: that must be a pain :-/
22:20 pgf dilinger: do you have modules present?
22:20 dilinger though i was using a non-modular kernel for the majority of my s/r testing, up until a few weeks ago.  so maybe that's why
22:21 martin_xsa: it's not too bad; since i've been working on the sram code non-stop, every rebuild only affects vmlinuz
22:21 martin_xsa maybe built-in psmouse changes the order of things just enough that we sidestep this one :-P
22:23 martin_xsa I'll try to see if dsd has any more debugging ideas
22:23 Next #11359 Psmouse kernel panic on shutdown
22:25 This doesn't break runin, but is damn weird - maybe we're trying to talk to AVC touchpads too fast.
22:25 Quozl #11359, no impact to C2 runin phase, since the power button is not pressed twice.
22:25 Cerlyn pgf and further testing suggests it may also happen with S/R cycles
22:26 Quozl #11359, if it happens with suspend only with keyboard or mouse input, then it shouldn't be a concern for runin.
22:26 Cerlyn does not seem to be the most common S/R failure here though
22:27 martin_xsa 11359 is a symptom -- at shutdown time -- of failure to init AVC touchpads
22:28 pgf i wouldn't be surprised if it's gone from s/r, since i disabled mouse resets across s/r.
22:28 (pushed today)
22:28 martin_xsa there's ~4 bugs reporting the symptoms, root cause seems to be squarely in psmouse init process
22:28 ah, good point, maybe now we wiggle it less
22:29 Cerlyn 2 test kernels used for S/R testing have it; once with os10 on the first S/R cycle
22:29 at least in my S/R log library
22:29 martin_xsa let's run quickly over the remaining high bugs -- to see if we have candidates for blockers. Quozl identified a few
22:30 #11396 Implement Linux support for wake-on-lan on MMP2 -- this is waiting for S/R stability
22:31 #11399 [CL2]Press enter many times in write activity and system will hang up -- not a blocker but important -- I hope jnettlet tackles it soon
22:33 #11406 powerd needs to be ported to new "wakeup events queue" kernel backend -- not a blocker -- but important, can we separate out the kernel tasks?
22:33 pgf it's done.  modulo bug fixes.
22:34 martin_xsa pgf: oh! great
22:34 pgf and modulo missing parts, like libertas.
22:34 martin_xsa that's the recent string of commits on -wip?
21:34 pgf partly.  half were that, half were the OLS support.
22:34 martin_xsa yeah, that's what I meant
22:35 pgf also modulo getting it into a build, which i've failed miserably at today.   see mail.
22:35 martin_xsa for powerd?
22:36 ok - I'll look into it, also tmw morning, if you're not around on irc, I'll give you a step-by-step of my steps and rationale
22:36 and the img build will happen tmw anyway :-)
22:37 pgf i'm sure it's easy if one does it frequently.   i'd appreciate a step-by-step.
22:37 my officemates would appreciate me getting that help as well, i'm sure.
22:37 martin_xsa I don't do it frequently, but I've done it a few times... so it'll be very step by step.
22:37 :-)
22:40 updated 11408 to block S/R, and be a blocker too
22:40 same with #11413 XO-1.75 libertas crash on resume
22:42 actually, #11413 doesn't quite make it to a blocker
22:43 dilinger the libertas-crash-on-resume is high on my todo list, post-ram stabilization
22:43 martin_xsa excellent!
22:43 it is odd -- it only crashes if associated
22:44 dilinger if you leave NM running, even if you're not telling it to associate, it'll trigger the bug quickly
22:45 martin_xsa #11416 C1 XO 1.75s may not wake up due to RTC from suspend -- marked as blocker
#11430 mmp-camera output corrupt after suspend/resume -- hesitant about this one, it's important but won't delay C2 on it
22:46 jon corbet is working on it
22:47 dilinger has the not-wake-up-from-suspend issue happened since pgf's keyboard workaround went in?
22:49 Cerlyn when was that committed?
22:49 martin_xsa it's in os10
22:49 and any kernels we tested on top of os10
22:50 Cerlyn I don't think it's happened since
22:50 but I may have woken some up this morning
22:51 dilinger it may have been a timing problem that was fixed by pgf's commits, then.  the rtc alarm was firing, but due to keyboard/touchpad bugs the system was delayed in getting to sleep.. by the time the system had gone to sleep, the alarm had already fired
22:52 if anyone sees it with os10+, please update the bug w/ dmesg output or something
22:52 martin_xsa dilinger: yeah -- though that can still happen with very short rtcwake calls
22:53 pgf dilinger: good thought.  i can't say for sure that that's not what i've seen.
21:53 martin_xsa methinks we should have a guard around rtcwake invocations that ignores requests to sleep shorter than a minimum time (that being a safe amount of time to go down cleanly)
22:54 pgf the new kernel PM infra supposedly makes it possible to prevent the wakeup-happens-while-suspending race.
22:54 i haven't tried implementing it yet, but i'm working toward it.
22:54 martin_xsa: that won't help, if the system takes longer to get to sleep than just >< this much.
22:55 martin_xsa pgf: you mean new kernel pm infra we are not currently using?
22:56 dilinger martin_xsa: i know mitch was a fan of 2s alarms; i tried to get people to use 5s wakeup alarms.  i think 5s is a good amount of time; if we delay that long on the way down, something is definitely wrong
22:56 martin_xsa linux 3.1 or 3.2?
22:56 dilinger: sure, and as we debug the process, we'll get it shorter
22:56 pgf martin_xsa: i think our current kernel has the necessary support.
22:57 martin_xsa pgf: but we hit this issue quite a bit. if the kernel infra "prevents" this from happening...
22:57 maybe that word doesn't mean what the implementor of this infra thinks it means...
anyway, later on we'll know how long it takes us to quiesce our system
22:58 pgf martin_xsa: i just said "i haven't tried implementing it yet, but i'm working toward it."
22:58 martin_xsa ah, sorry, there's a way to wire it up. oops, sorry.
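Editor's note: the exchange above combines dilinger's explanation of the missed-wakeup race (the RTC alarm fires before the system has finished going to sleep) with martin_xsa's proposed guard on short rtcwake requests and pgf's objection to it. The following is a minimal, hypothetical Python model of both. The function names are invented for illustration. The 5 s floor is dilinger's suggested alarm length, used here as an assumed default. Treating a tie as a miss is a conservative assumption. Nothing here touches rtcwake or the kernel.

```python
from typing import Optional

MIN_WAKE_DELAY_S = 5.0  # assumed default, after dilinger's suggested 5 s alarms


def alarm_would_be_missed(delay_s: float, time_to_sleep_s: float) -> bool:
    """Model of the race: the alarm fires delay_s after the request, but the
    system only reaches suspend time_to_sleep_s after it. If the alarm has
    already fired (a tie is treated as a miss), nothing wakes the system."""
    return time_to_sleep_s >= delay_s


def guarded_delay(requested_s: float, min_s: float = MIN_WAKE_DELAY_S) -> Optional[float]:
    """martin_xsa's proposed guard: ignore (return None for) requests
    shorter than min_s; otherwise pass the request through unchanged."""
    if requested_s < min_s:
        return None
    return requested_s
```

The model also shows pgf's point. A 5 s request passes the guard, yet alarm_would_be_missed(5.0, 6.0) is still True, so a fixed floor only helps while the real quiesce time stays below it.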
23:01 ok - reading down the trac list. I don't see anything that looks likely to block us towards C2
23:03 pgf, dilinger, smithbone, jnettlet, do you have items you'd like to make blockers or discuss?
23:03 and Quozl :-)
23:03 jnettlet nothing new nope.
23:04 smithbone martin_xsa: nope
23:04 pgf nope
23:05 martin_xsa great -- now all we have to do is fix all the bugs discussed before next monday.
23:05 meeting adjourned ;-)
23:05 Cerlyn #endmeeting
