Instruction analysis program
|
Instruction analysis program
|
|
|
This application reads in a binary list of instructions, and analyses it with a
|
This application reads in a binary list of instructions, and analyses it with a
|
set of functions looking at various parameters in each instruction.
|
set of functions looking at various parameters in each instruction.
|
|
|
Right now it's not so user friendly. Everything is hardcoded, and only support
|
Right now it's not so user friendly. Everything is hardcoded, and only support
|
for the OR1K instruction set exists.
|
for the OR1K instruction set exists.
|
|
|
It has been written in a way that should allow other instructinos to be added
|
It has been written in a way that should allow other instructinos to be added
|
easily. It remains to be seen how much would be reusable between the sets but
|
easily. It remains to be seen how much would be reusable between the sets but
|
for now, at least it would be easy enough to take the OR1K instruction
|
for now, at least it would be easy enough to take the OR1K instruction
|
analysis functions and drop in a different instruction set.
|
analysis functions and drop in a different instruction set.
|
|
|
The types of information given for OR1K instruction analysis is instruction
|
The types of information given for OR1K instruction analysis is instruction
|
frequency, immediate frequency for each instruction, branch distance value
|
frequency, immediate frequency for each instruction, branch distance value
|
frequency, and register usage frequency. For each instruction, the most common
|
frequency, and register usage frequency. For each instruction, the most common
|
n-tuple sets of instructions, finishing with that instruction, are presented,
|
n-tuple sets of instructions, finishing with that instruction, are presented,
|
for pairs, triples and quadruples. Additionally output is the most common
|
for pairs, triples and quadruples. Additionally output is the most common
|
overall n-tuples.
|
overall n-tuples.
|
|
|
Compile the program with:
|
Compile the program with:
|
|
|
$ make all
|
$ make all
|
|
|
And run a test (it needs the or32-elf- toolchain) with:
|
And run a test (it needs the or32-elf- toolchain) with:
|
|
|
$ make test
|
$ make test
|
|
|
To run the program itself, just give it a binary blob of instructions (usually
|
|
the output of objcopy -O binary).
|
|
|
|
Static analysis:
|
Static analysis:
|
|
|
For instance the Linux kernel ELF for OR1K can be prepared with the following
|
To generate a raw binary representation of the instructions that end up in
|
command:
|
something like the Linux kernel, take the ELF file that results from compilation
|
|
and pass it to or32-elf-objcopy like the following:
|
|
|
$ or32-elf-objcopy -O binary -j .text -S vmlinux vmlinux.text.bin
|
$ or32-elf-objcopy -O binary -j .text -S vmlinux vmlinux.text.bin
|
|
|
It is passed to the program like so, and the output is captured by redirecting
|
Use the -f flag to indicate the input file, and -o to indicate the output file.
|
stdout.
|
|
|
|
$ ./insnanalysis vmlinux.text.bin > vmlinux.insnanalysis
|
$ ./insnanalysis -f vmlinux.text.bin o vmlinux.insnanalysis
|
|
|
Dynamic analysis with binary execution log from or1ksim:
|
Dynamic analysis with binary execution log from or1ksim:
|
|
|
As of revision 202 of the OpenRISC repository, or1ksim is capable of generating
|
As of revision 202 of the OpenRISC repository, or1ksim is capable of generating
|
an execution trace log in binary format, logging each instruction executed.
|
an execution trace log in binary format, logging each instruction executed.
|
This log file can be given to insnanalysis.
|
This log file can be given to insnanalysis.
|
|
|
In the or1ksim config file ensure the line "exe_bin_insn_log = 1" is in the
|
In the or1ksim config file ensure the line "exe_bin_insn_log = 1" is in the
|
sim section. This will enable the binary instruction logging. The resulting
|
sim section. This will enable the binary instruction logging. The resulting
|
output file is then given to insnanalysis in the same manner as above.
|
output file is then given to insnanalysis in the same manner as above.
|
|
|
Output:
|
Output:
|
|
|
Currently there are only two output formats, human readable string and CSV.
|
Currently there are only two output formats, human readable string and CSV.
|
|
|
The output can be switched between human readable strings and CSV format (ready
|
The output can be switched between human readable strings and CSV format (ready
|
to be imported into a spreadsheet application) by uncommenting one of the
|
to be imported into a spreadsheet application) by uncommenting one of the
|
"#define DISPLAY_" defines in the instruction set header. The program must be
|
"#define DISPLAY_" defines in the instruction set header. The program must be
|
recompiled if this is changed.
|
recompiled if this is changed.
|
|
|
|
Individual instruction analysis:
|
|
|
|
Instead of only breaking the instruction up and recording statistics on an
|
|
opcode basis - the instructions can be tracked in their entireity and statistics
|
|
on the most frequently seen entire instruction presented. Use the -u flag when
|
|
running the program. Note that this will make execution time longer. For a
|
|
binary trace of Linux booting 1.7GB in size, a 2.5GHz Intel Core 2 Duo machine
|
|
took 30 minutes to parse with the -u option.
|
|
|
|
|
TODO:
|
TODO:
|
o Collect and display information about l.j and l.jal instruction immediates
|
o Collect and display information about l.j and l.jal instruction immediates
|
o Add an easy way to switch between human readable and CSV output
|
o Add an easy way to switch between human readable and CSV output
|
o Figure out how to tack this thing onto a simulator (or1ksim maybe) to give
|
o Figure out how to tack this thing onto a simulator (or1ksim for now) to give
|
results of execution when that finishes executing, or just how to get the
|
results of execution when that finishes executing, or just how to get the
|
simulator to output a binary dump of executed instructions to be fed through
|
simulator to output a binary dump of executed instructions to be fed through
|
this
|
this
|
o Add support for a list of binary files to be specified at the command line
|
o Add support for a list of binary files to be specified at the command line
|
o Allow statistics to be collated over different files - this would allow each
|
o Allow statistics to be collated over different files - this would allow each
|
function to be broken out of a library, or application, and in that regard
|
function to be broken out of a library, or application, and in that regard
|
the instruction sequence data would then be accurate for static analysis.
|
the instruction sequence data would then be accurate for static analysis.
|
|
|
|
|
July 24, 2010 - Julius Baxter
|
July 25, 2010 - Julius Baxter
|
|
|