jump to navigation

Measuring Cache miss August 25, 2008

Posted by Prabakaran Thirumalai in cache.
Tags: , ,
1 comment so far

cachegrind tool helps programmers to quantify and understand the cache behavior of programs and algorithms. we shall modify our code to be more cache-friendly and thereby make it run faster.

The following example demonstrates how to measure cache misses for L1 and L2 cache on Linux platform. Consider the following program which initializes two dimensional array (test1.c)

#include<stdio.h>
int array[10000][10000];
int setValue1()
{
int i,j;
for (i=0;i<10000;i++)
for (j=0;j<10000;j++)
array[j][i]=0;
}
int main()
{
setValue1();
return 0;
}

Compile the program using gcc compiler

$gcc -O2 -o test test1.c

Run the executable generated using the valgrind tool

$valgrind –tool=cachegrind ./test
==2605== Cachegrind, an I1/D1/L2 cache profiler.
==2605== Copyright (C) 2002-2007, and GNU GPL’d, by Nicholas Nethercote et al.
==2605== Using LibVEX rev 1732, a library for dynamic binary translation.
==2605== Copyright (C) 2004-2007, and GNU GPL’d, by OpenWorks LLP.
==2605== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==2605== Copyright (C) 2000-2007, and GNU GPL’d, by Julian Seward et al.
==2605== For more details, rerun with: -v
==2605==
==2605==
==2605== I refs: 500,153,442
==2605== I1 misses: 547
==2605== L2i misses: 545
==2605== I1 miss rate: 0.00%
==2605== L2i miss rate: 0.00%
==2605==
==2605== D refs: 100,048,658 (33,484 rd + 100,015,174 wr)
==2605== D1 misses: 100,000,803 ( 629 rd + 100,000,174 wr)
==2605== L2d misses: 6,260,759 ( 589 rd + 6,260,170 wr)
==2605== D1 miss rate: 99.9% ( 1.8% + 99.9% )
==2605== L2d miss rate: 6.2% ( 1.7% + 6.2% )
==2605==
==2605== L2 refs: 100,001,350 ( 1,176 rd + 100,000,174 wr)
==2605== L2 misses: 6,261,304 ( 1,134 rd + 6,260,170 wr)
==2605== L2 miss rate: 1.0% ( 0.0% + 6.2% )

It displays the access patterns with respect to Level 1 and Level 2 caches which includes the total instruction references(I) and total data references(D), Level 1 instruction misses(I1) and data misses(D1) and L2 instruction(L2i) and data misses(L2d).
From the above output, we could see D1 miss rate is 99.9% and D2 miss rate is 6.2%. There were about 100,048,658 references to L1 cache out of which 100,000,803 references are misses in L1 cache.

The above output is summary for the whole program, When we run the cachegrind tool, it also generates a file named cachegrind.out with suffix as pid of the process. This file contains L1 and L2 cache information for each and every function in your program. Using this we can figure out which function is the candidate for optimization.
For the above run, cachegrind.out.2605 file contains the following information

desc: I1 cache: 32768 B, 64 B, 8-way associative
desc: D1 cache: 32768 B, 64 B, 8-way associative
desc: L2 cache: 2097152 B, 64 B, 8-way associative
cmd: ./a.out
events: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
fl=???
fn=(below main)
0 64 4 4 19 0 0 26 0 0
fn=???
0 1037 14 12 956 3 1 34 1 1
…….
fn=sbrk
0 25 2 2 9 0 0 5 0 0
fn=setValue1
0 500060005 0 0 2 1 1 100000001 100000000 6260000
fn=strcmp
0 9647 1 1 3161 14 11 0 0 0
fn=strlen
0 657 3 3 103 2 2 0 0 0
fn=strsep
0 711 2 2 241 0 0 7 0 0
fn=uname
0 8 2 2 2 0 0 0 0 0
fn=version_check_doit
0 19 2 2 7 0 0 5 0 0
summary: 500153442 547 545 33484 629 589 100015174 100000174 6260170


To understand the output, we should know what each of these values represent.
fn=sbrk means the function is sbrk() and values below that are in the format “
events Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw”

Ir Instructions Executed
I1mr L1 Instruction Cache read misses
I2mr L2 Instruction Cache read misses
Dr Total Memory Reads
D1mr L1 Data Cache read misses
D2mr L2 Data Cache read misses
Dw Total Memory Writes
D1mw L1 Data Cache write misses
D2mw L2 Data Cache write misses

For setValue1() function, 500060005 Instructions were executed and there were no instruction misses both in L1 and L2 cache. 100000001 memory writes and out of which 100000000 were L1 data misses and 6260000 were L2 data misses.

Further Reading

http://www.valgrind.org