
Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?


abramson
I’m looking to get performance counters (i.e. perf/PAPI, etc.) working on my Overo Fire.  This *should* be relatively simple, but there are a few kernel hacks that need to happen, and I haven’t been able to get the right things working on Yocto.  I *had* one Overo that came with a Sakoman kernel (2.6.34) in the NAND from factory (that had kernel counters enabled), but for whatever reason that one won’t boot anymore (don’t ask).  The other 4 COMs that I have have a stock Angstrom 2.6.36 install with the counters disabled.  I’ve tried a few Yocto kernels and a few sakoman kernels on SD cards, but nothing seems to be working.

My ideal scenario would be a console (or even X) NAND image with the counters enabled and the kernel modded so I can use the appropriate assembly instructions in user space.  But at this point, I’ll take anything.

I’ve seen posts like this: http://gumstix.8.x6.nabble.com/Overo-access-to-quot-nonsecure-privileged-quot-mode-td654078.html, and tried to follow these instructions: http://www.cosic.esat.kuleuven.be/publications/article-2166.pdf (which are step by step, but BitBake runs into some errors).

Can anyone give me a few pointers?  I’ve been banging my head against this for a few weeks and I really need to make some progress here.


Thanks SO much!


Jeremy

Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

Scott Ellis
I added perf and oprofile to a console image for the Overo.

I've never used either tool before, maybe you could try it out
and let me know if they are behaving the way you expect.

Some binaries for a 3.5 kernel system are here

http://www.pansenti.com/overo/

The yocto meta layer to build yourself is here

https://github.com/Pansenti/meta-pansenti

From the oprofile tools, iperf isn't supported for the platform,
but opcontrol, opreport, etc...  seem to be okay.

Some oprofile output
====================

root@overo:~# opcontrol --no-vmlinux

root@overo:~# opcontrol --start
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.

root@overo:~# opcontrol --stop
Stopping profiling.

root@overo:~# opreport
Using /var/lib/oprofile/samples/ for samples directory.
CPU: CPU with timer interrupt, speed 1e+06 MHz (estimated)
Profiling through timer interrupt
          TIMER:0|
  samples|      %|
------------------
     9824 94.2169 no-vmlinux
      174  1.6687 libglib-2.0.so.0.3400.3
      161  1.5441 libpthread-2.17.so
       95  0.9111 libcrypto.so.1.0.0
       94  0.9015 libQtCoreE.so.4.8.4
       28  0.2685 libc-2.17.so
       16  0.1534 bash
       14  0.1343 ld-2.17.so
       14  0.1343 libSyntroLib.so.0.9.2
        2  0.0192 SyntroLCam
        1  0.0096 libproc-3.2.8.so
        1  0.0096 libtinfo.so.5.9
        1  0.0096 oprofiled
        1  0.0096 libQtNetworkE.so.4.8.4
        1  0.0096 sshd


perf seems to have some trouble keeping up with a C++
project compile (about 5 minutes' worth)
====================
root@overo:~/SyntroCore/SyntroLib# perf record -f -g -e kmem:mm_page_alloc -c 1 make

... around 5 minutes of g++ build output

[ perf record: Woken up 69 times to write data ]
[ perf record: Captured and wrote 17.151 MB perf.data (~749338 samples) ]
Warning:
Processed 204564 events and lost 2 chunks!

Check IO/CPU overload!

root@overo:~/SyntroCore/SyntroLib# perf report -g

Samples: 203K of event 'kmem:mm_page_alloc', Event count (approx.): 203810
   86.10%   cc1plus  [unknown]  [.] 00000000
    8.77%        as  [unknown]  [.] 00000000
    2.58%        ld  [unknown]  [.] 00000000
    2.03%       moc  [unknown]  [.] 00000000
    0.33%       g++  [unknown]  [.] 00000000
    0.08%      make  [unknown]  [.] 00000000
    0.05%  collect2  [unknown]  [.] 00000000
    0.04%        ln  [unknown]  [.] 00000000
    0.01%        rm  [unknown]  [.] 00000000


Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

abramson
Scott,
     First off, let me say THANK YOU SO MUCH for taking the time to respond, and build those images.  Unfortunately, they're not exactly what I had in mind.  I should have been more specific; the issue isn't *installing* perf or oprofile, but rather enabling user-space access to the hardware performance counters in the PMU.

Dmesg reports:
[    3.250610] oprofile: hardware counters not available

Which means the counters have not been enabled in userspace.  These are the hardware specific profiling counters that can count things like branch prediction misses, L2 cache misses, TLB hits, etc.

It's my understanding it's a minor tweak to allow this to happen; it's just a matter of issuing a command like the following in kernel mode (either at init, or as a kernel module):

asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

(There may be one or two other tweaks -- for example, to perf_event_paranoid -- but I'm not entirely sure).

My problem is I'm not exactly sure how to do the above command in kernel space.  

There are similar instructions here: http://stackoverflow.com/questions/3247373/how-to-measure-program-execution-time-in-arm-cortex-a8-processor/3250835#3250835 but, as I said, I couldn't really get that to work.  I know this is possible -- even simple! -- but I haven't had any luck so far.

At any rate, I'm going to see about digging around with a Yocto build to do this from scratch, I'm just not entirely sure where to stick the assembly code above so it gets executed in kernel space at boot.
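
For the question of where that MCR instruction can live, one option is a small loadable module, which is roughly the approach of the enable_arm_pmu driver linked later in this thread. What follows is a minimal sketch and not that driver's actual source: the file name and the pmu_user_* function names are made up for illustration, and it assumes an ARMv7 core such as the Cortex-A8 plus a configured kernel tree to build against.

```c
/* pmu_user.c: hypothetical file name; a sketch for ARMv7 (e.g. Cortex-A8). */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/smp.h>

/* Write PMUSERENR (c9, c14, 0): bit 0 = 1 allows user-mode PMU access. */
static void pmu_user_set(void *enable)
{
	unsigned long val = enable ? 1 : 0;

	asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(val));
}

static int __init pmu_user_init(void)
{
	/* Runs on every online CPU and waits for completion. */
	on_each_cpu(pmu_user_set, (void *)1, 1);
	pr_info("pmu_user: user-mode PMU access enabled\n");
	return 0;
}

static void __exit pmu_user_exit(void)
{
	/* Revoke user-mode access again when the module is removed. */
	on_each_cpu(pmu_user_set, NULL, 1);
	pr_info("pmu_user: user-mode PMU access disabled\n");
}

module_init(pmu_user_init);
module_exit(pmu_user_exit);
MODULE_LICENSE("GPL");
```

Built as an out-of-tree module against the running kernel's source, something like this would be loaded with insmod. The same register write could presumably also be placed in built-in kernel code if it needs to happen earlier in boot.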

Any [more] help would be greatly appreciated.  As it stands, I appreciate your time!

Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

Scott Ellis
Okay.

Do you just want a kernel module as described here?

http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html

Code here

https://github.com/thoughtpolice/enable_arm_pmu

That's easy enough to try out if you want me to build it for you.

BTW, I'm pulling perf and oprofile from those images I posted.

I wasn't thinking when I did it on my [master] branch.

I had to patch a line in the overo 3.5 kernel source to get the perf recipe
to build. And then I added the perf package to my console image.
But that went and broke all the other platforms where I use the same
console image but with different unpatched kernels.

It's trivial to add back on another branch if needed.

oprofile can be loaded as a package later anytime. But if it doesn't
work without a kernel change, there is no sense keeping it in those
default images.

Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

Scott Ellis
I copied a build of Austin Seipp's driver here

http://www.pansenti.com/overo/enable_arm_pmu.ko

It loads at runtime with insmod. I don't know if it does what
you want. Can you test it?

It was built against the same 3.5 kernel posted earlier.

Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

abramson
Scott,


THANK YOU SO MUCH.

I've been incommunicado for a bit because I've been working with the module you created.  Again, thank you.

It works on a "standard" (whatever that means) Yocto install I had lying around on an SD card.  I don't know if it works on the NAND-based Angstrom distribution that the Overos ship with, but that's probably not a huge issue right now.  

A couple of things:

1).  Is there any way of getting this to run at a lower run level, so it's instantiated early enough so oprofile (and seemingly PAPI) work?  I've tried running papi_avail (which reports which H/W counters are available) and it doesn't report any, even with the module loaded at the login prompt.  Presumably if it was loaded early, it might work?

(This isn't a huge issue, as I've taken to just writing the assembly to access the registers directly, but it might be nice to have a library at some point, especially because of #2)

2).  Everything seems to be running how I expect with the exception of getting the user-defined counters to count anything.  My guess is that I am somehow not loading the appropriate event code.  I can read the cycle counter, set up the configuration (i.e. enable the user counters), change things like the Div/64 flag, and even write to the user-configurable counters (which is too bad, my life would be a lot easier if they were read only), but whenever I assign an event to them, they always report 0.  This holds even for events that should be nonzero, like cycle counts or PC changes or writes to memory.

Perhaps you or someone else can help me sort out what's going wrong?

Here are the relevant parts of my code:

***

        int value = 1;
       
        // perform reset:
        if (do_reset) {
                value |= 2;     // reset all counters to zero.  
                value |= 4;     // reset cycle counter to zero.
        }
       
        if (enable_divider)
                value |= 8;     // enable "by 64" divider for CCNT.

        value |= 16;
        // program the performance-counter control-register with mask constructed above
        asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));  
        // enable all counters:  
        asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));  

        // clear overflows:
        asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x80000001));

        // Select individual counter (0)
        asm volatile ("MCR p15,   0,    %0,  c9  ,   c12 ,   5\t\n":: "r"(0x00));

        // Write event (0x11 = Cycle count)
        asm volatile ("MCR p15,   0,    %0,  c9  ,   c13 ,   1\t\n":: "r"(0x11));

...
...
...

        unsigned int output;

        // Read current event counter
        asm volatile ("MRC p15,   0,    %0,  c9  ,   c13 ,   2\t\n": "=r"(output));
        printf("Event count 0: %u\n", output);


***

It still reports 0.  I'm completely baffled by this.  I have a number of debug statements where I read appropriate things from the registers (for example, it reads back 17 or 11 in hex when I read back the event number from the appropriate register).  The counters read as enabled, and as I said, I can write an explicit value to them and then read that back, but they just don't...count.
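
One way to narrow this kind of problem down is to read the PMU control registers back and decode them, rather than trusting the values that were written. The sketch below is only a diagnostic aid under stated assumptions: the helper names (pmcr_build, pmcr_num_counters, pmu_dump) are invented here, the bit positions are those of the ARMv7 PMCR, and the register reads compile only for 32-bit ARM and need user-mode PMU access to be enabled already. It does not by itself explain why the event counters stay at zero.

```c
#include <stdint.h>
#include <stdio.h>

/* PMCR (c9, c12, 0) bit fields as defined by the ARMv7 architecture. */
#define PMCR_E (1u << 0) /* enable all counters */
#define PMCR_P (1u << 1) /* reset event counters */
#define PMCR_C (1u << 2) /* reset cycle counter */
#define PMCR_D (1u << 3) /* cycle counter ticks once every 64 cycles */
#define PMCR_X (1u << 4) /* export events */

/* Build the same control value as the snippet above (hypothetical helper). */
static inline uint32_t pmcr_build(int do_reset, int enable_divider)
{
    uint32_t value = PMCR_E | PMCR_X;

    if (do_reset)
        value |= PMCR_P | PMCR_C;
    if (enable_divider)
        value |= PMCR_D;
    return value;
}

/* N field, bits [15:11]: number of event counters implemented. */
static inline unsigned pmcr_num_counters(uint32_t pmcr)
{
    return (pmcr >> 11) & 0x1fu;
}

#if defined(__arm__)
/* Read back and print the registers that gate counting (ARM builds only). */
static inline void pmu_dump(void)
{
    uint32_t pmcr, enset, useren;

    asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));   /* PMCR */
    asm volatile("mrc p15, 0, %0, c9, c12, 1" : "=r"(enset));  /* PMCNTENSET */
    asm volatile("mrc p15, 0, %0, c9, c14, 0" : "=r"(useren)); /* PMUSERENR */

    printf("PMCR=0x%08x (E=%u, %u event counters)\n",
           (unsigned)pmcr, (unsigned)(pmcr & PMCR_E), pmcr_num_counters(pmcr));
    printf("PMCNTENSET=0x%08x PMUSERENR=0x%08x\n",
           (unsigned)enset, (unsigned)useren);
}
#endif
```

Calling pmu_dump() before and after the setup sequence shows whether the enable bit and the per-counter enable mask actually stick. On a Cortex-A8 the counter field should read 4, since that core implements four event counters in addition to the cycle counter.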

I hope someone can provide some assistance here, but either way, you've been instrumental in my progress, and I thank you so much.  Eventually, it might be nice to have the appropriate kernel sources and be able to compile the kernel mod on my own (or have a pre-baked kernel that had these enabled) but for now, this will do, presuming I can get those user-configurable counters to work.


Thanks again!

 

Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

Scott Ellis
1) I can try putting the code from that kernel module into the kernel board file so it gets loaded as early as possible.
Busy today, but tomorrow I can do it.

Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

Scott Ellis
Jeremy,

Try the binaries I posted here

http://pansenti.com/overo-perf

I enabled some PMU options in the kernel config and added that code
to enable PMU access from userland in the board file.

A boot log is here

https://gist.github.com/scottellis/8523696

Significant differences

root@overo:~# dmesg | grep perf
[    0.040252] Initializing cgroup subsys perf_event
[    0.040618] hw perfevents: enabled with ARMv7 Cortex-A8 PMU driver, 5 counters available
root@overo:~# dmesg | grep profile
[    3.262268] oprofile: using arm/armv7
root@overo:~# dmesg | grep PMU
[    0.040618] hw perfevents: enabled with ARMv7 Cortex-A8 PMU driver, 5 counters available
[    0.053894] Enabling user-mode PMU access

The 'Enabling user-mode PMU access' is the function I added to the board file.
The other changes are the result of kernel config options. I'll post those if
it works.
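
With the kernel PMU driver reporting counters, one quick check that hardware events really count is to go through the kernel's perf_event_open interface instead of touching the coprocessor registers directly. This is a sketch, not something tested on the Overo: count_cycles and spin are hypothetical names, and it assumes the program runs as root or with a permissive perf_event_paranoid setting. No exclude_* bits are set because, as far as I understand it, the Cortex-A8 PMU cannot filter by privilege mode and the kernel may refuse such requests.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count CPU cycles consumed while work() runs in the calling thread.
 * Returns 0 and stores the count in *cycles, or -1 if the event could
 * not be opened or read (no PMU support, insufficient permission, ...).
 */
static inline int count_cycles(void (*work)(void), uint64_t *cycles)
{
    struct perf_event_attr attr;
    ssize_t n;
    long fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1; /* start stopped, enable explicitly below */

    /* pid = 0, cpu = -1: this thread on any CPU; no group, no flags. */
    fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl((int)fd, PERF_EVENT_IOC_RESET, 0);
    ioctl((int)fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl((int)fd, PERF_EVENT_IOC_DISABLE, 0);

    n = read((int)fd, cycles, sizeof(*cycles));
    close((int)fd);
    return n == (ssize_t)sizeof(*cycles) ? 0 : -1;
}

/* Hypothetical workload: a busy loop the compiler cannot remove. */
static inline void spin(void)
{
    volatile unsigned i;

    for (i = 0; i < 1000000u; i++)
        ;
}
```

A caller would pass spin (or any function of interest) to count_cycles and print the result. A return value of -1 suggests the kernel side is still not exposing the hardware event, which is a different problem from user-mode register access.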

The console image above does include the perf and oprofile packages.

root@overo:~# opkg list-installed | grep perf
iperf - 2.0.4-r0
perf - 3.5.7-r8
root@overo:~# opkg list-installed | grep oprofile
oprofile - 0.9.8-r2.3

I did not test anything.  (I don't even know how at this point ;-)

So let me know if it works.

Scott



Re: Hardware performance counters (oprofile / perf / PAPI) on Gumstix Overo Fire + Tobi COM (ARM Cortex A8)?

abramson
Scott,
     I just realized I don't think I ever responded to your [amazingly helpful, considerate, wonderful] post, and for that I apologize!  

I wanted to say that the image worked perfectly!  My work actually evolved into u-boot space, which is sort of ironic, as PMUs are trivially enabled there, but I spent a lot of valuable time in userspace testing/debugging things, and your image was a HUGE benefit there.  So thank you!

I *think* I have the image lying around in a tarball someplace -- it's certainly on an SD card -- but I don't exactly remember what it's called.  The link you posted no longer works.  Do you still have the image (or perhaps remember what you named it)?  I have a colleague who's debugging some checksums in user space, and he could really use it!

But thanks again.  I banged my head against the wall of trying to get those active in user space, and I couldn't have done it without you!

****

EDIT: I think I just found the image.  It was called pansenti-console-image-overo.tar, with the corresponding u-boot, uImage and MLO.  Hopefully that's right!