Date  : 2004-Nov-26
Author: Gerald Schaefer (geraldsc@de.ibm.com)


Linux API for read access to z/VM Monitor Records
=================================================


Description
===========
This item delivers a new Linux API in the form of a misc char device that is
usable from user space and allows read access to the z/VM Monitor Records
collected by the *MONITOR System Service of z/VM.


User Requirements
=================
The z/VM guest on which you want to access this API needs to be configured to
allow IUCV connections to the *MONITOR service, i.e. it needs the
IUCV *MONITOR statement in its user entry. If the monitor DCSS to be used is
restricted (likely), you also need the NAMESAVE statement.
This item will use the IUCV device driver to access the z/VM services, so you
need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.

There are two options for loading the monitor DCSS (examples assume
that the monitor DCSS begins at 144 MB and ends at 152 MB). You can query the
location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
(the values BEGPAG and ENDPAG are given in units of 4K pages).
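
As a rough illustration of the page arithmetic, the following Python sketch
converts BEGPAG and ENDPAG values into byte addresses. It assumes that the
values are hexadecimal page numbers and that ENDPAG names the last page of the
segment; the function name and the sample values are made up for this example.

```python
PAGE_SIZE = 4096  # BEGPAG and ENDPAG count 4K pages
MB = 1024 * 1024

def dcss_range(begpag_hex, endpag_hex):
    """Return (start, end) byte addresses; end is one past the last byte."""
    first_page = int(begpag_hex, 16)
    last_page = int(endpag_hex, 16)
    return first_page * PAGE_SIZE, (last_page + 1) * PAGE_SIZE

# Hypothetical values matching the 144 MB to 152 MB example in the text.
start, end = dcss_range("09000", "097FF")
print(start // MB, end // MB)  # prints: 144 152
```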

See also "CP Command and Utility Reference" (SC24-6081-00) for more information
on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
and Administration" (SC24-6116-00) for more information on DCSSes.

1st option:
-----------
You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
guest virtual storage around the address range of the DCSS.

Example: DEF STOR CONFIG 0.140M 200M.200M

This defines two blocks of storage: the first is 140MB in size and begins at
address 0MB, the second is 200MB in size and begins at address 200MB,
resulting in a total storage of 340MB. Note that the first block should
always start at 0 and be at least 64MB in size.
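
The arithmetic of this example can be checked with a small Python sketch. It
parses the "start.size" pairs of the example, adds up the total storage and
verifies that the assumed DCSS range (144 MB to 152 MB) falls into the hole.
The helper names are hypothetical and only the "M" suffix used in the example
is handled.

```python
MB = 1024 * 1024

def parse_config(config):
    """Parse 'start.size' pairs like '0.140M 200M.200M' into (start, end) ranges."""
    def to_bytes(value):
        # Only the 'M' suffix from the example is supported in this sketch.
        return int(value.rstrip("M")) * MB
    blocks = []
    for pair in config.split():
        start, size = pair.split(".")
        blocks.append((to_bytes(start), to_bytes(start) + to_bytes(size)))
    return blocks

def overlaps(blocks, lo, hi):
    """True if the byte range [lo, hi) intersects any storage block."""
    return any(start < hi and lo < end for start, end in blocks)

blocks = parse_config("0.140M 200M.200M")
total_mb = sum(end - start for start, end in blocks) // MB
print(total_mb, overlaps(blocks, 144 * MB, 152 * MB))  # prints: 340 False
```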

2nd option:
-----------
Your guest virtual storage has to end below the starting address of the DCSS
and you have to specify the "mem=" kernel parameter in your parmfile with a
value greater than the ending address of the DCSS.

Example: DEF STOR 140M

This defines 140MB storage size for your guest; the parameter "mem=160M" is
added to the parmfile.
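
The two conditions of this option can be written down as a simple check. This
is only a sketch with a hypothetical function name, using the 144 MB to 152 MB
DCSS range assumed throughout the examples.

```python
MB = 1024 * 1024

def option2_ok(guest_storage, mem_param, dcss_start, dcss_end):
    """Guest storage must end below the DCSS and mem= must lie above its end."""
    return guest_storage <= dcss_start and mem_param > dcss_end

# DEF STOR 140M together with mem=160M satisfies both conditions.
print(option2_ok(140 * MB, 160 * MB, 144 * MB, 152 * MB))  # prints: True
```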


User Interface
==============
The char device is implemented as a kernel module named "monreader",
which can be loaded via the modprobe command, or it can be compiled into the
kernel instead. There is one optional module (or kernel) parameter, "mondcss",
to specify the name of the monitor DCSS. If the module is compiled into the
kernel, the kernel parameter "monreader.mondcss=" can be specified
in the parmfile.

The default name for the DCSS is "MONDCSS" if none is specified. If there are
other users already connected to the *MONITOR service (e.g. Performance
Toolkit), the monitor DCSS is already defined and you have to use the same
DCSS. The CP command Q MONITOR (Class E privileged) shows the name of the
monitor DCSS, if already defined, and the users connected to the *MONITOR
service.
Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor
DCSS if your z/VM doesn't have one already. You need Class E privileges to
define and save a DCSS.

Example:
--------
modprobe monreader mondcss=MYDCSS

This loads the module and sets the DCSS name to "MYDCSS".

NOTE:
-----
This API provides no interface to control the *MONITOR service, e.g. to specify
which data should be collected. This can be done with the CP command MONITOR
(Class E privileged); see "CP Command and Utility Reference".

Device nodes with udev:
-----------------------
After loading the module, a char device will be created along with the device
node /<udev directory>/monreader.

Device nodes without udev:
--------------------------
If your distribution does not support udev, a device node will not be created
automatically and you have to create it manually after loading the module.
To do so, you need to know the major and minor numbers of the device. These
numbers can be found in /sys/class/misc/monreader/dev.
Typing cat /sys/class/misc/monreader/dev will give an output of the form
<major>:<minor>. The device node can be created via the mknod command, enter
mknod <name> c <major> <minor>, where <name> is the name of the device node
to be created.

Example:
--------
# modprobe monreader
# cat /sys/class/misc/monreader/dev
10:63
# mknod /dev/monreader c 10 63

This loads the module with the default monitor DCSS (MONDCSS) and creates a
device node.
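
The same steps can be scripted. The following Python sketch parses the
<major>:<minor> text from sysfs and creates the node with os.mknod. The
function names, the default node path and the 0600 permissions are
assumptions made for this example; creating the node requires root and a
loaded monreader module, so create_node() is defined but not called here.

```python
import os
import stat

def parse_dev(text):
    """Split the 'major:minor' text of the sysfs dev attribute into integers."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)

def create_node(path="/dev/monreader",
                sysfs_dev="/sys/class/misc/monreader/dev"):
    """Create the char device node (needs root and a loaded module)."""
    with open(sysfs_dev) as f:
        major, minor = parse_dev(f.read())
    # 0o600 is an assumed permission choice, not mandated by the driver.
    os.mknod(path, 0o600 | stat.S_IFCHR, os.makedev(major, minor))

print(parse_dev("10:63\n"))  # prints: (10, 63)
```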

File operations:
----------------
The following file operations are supported: open, release, read, poll.
There are two alternative methods for reading: either non-blocking read in
conjunction with polling, or blocking read without polling. IOCTLs are not
supported.
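
The first method (non-blocking read in conjunction with polling) might look
like the following Python sketch. The function name, buffer size and timeout
are arbitrary choices for illustration. The sketch is demonstrated on a pipe
so that it runs without the device; for the real device, the file descriptor
would come from os.open() with os.O_RDONLY | os.O_NONBLOCK.

```python
import os
import select

def wait_and_read(fd, bufsize, timeout_ms=1000):
    """Poll fd for readable data, then do one non-blocking read.

    Returns the bytes read, or None if nothing became readable in time.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None
    return os.read(fd, bufsize)

# Stand-in for the device: a non-blocking pipe with some data in it.
r, w = os.pipe()
os.set_blocking(r, False)
os.write(w, b"demo")
print(wait_and_read(r, 4096))  # prints: b'demo'
```

Returning None on a timeout keeps that case distinct from an empty bytes
object, which corresponds to a read with a return value of 0.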

Read:
-----
Reading from the device provides a 12 Byte monitor control element (MCE),
followed by a set of one or more contiguous monitor records (similar to the
output of the CMS utility MONWRITE without the 4K control blocks). The MCE
contains information on the type of the following record set (sample/event
data), the monitor domains contained within it and the start and end address
of the record set in the monitor DCSS. The start and end address can be used
to determine the size of the record set; the end address is the address of the
last byte of data. The start address is needed to handle "end-of-frame" records
correctly (domain 1, record 13), i.e. it can be used to determine the record
start offset relative to a 4K page (frame) boundary.
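
The size and offset calculation described above can be sketched in Python as
follows. The field positions used here (four bytes of type and domain
information, followed by the start and end address as big-endian 32-bit
values) are an assumption for illustration and should be checked against the
MCE layout in "z/VM Performance"; the function name and sample addresses are
made up.

```python
import struct

MCE_SIZE = 12
PAGE_SIZE = 4096

def parse_mce(mce):
    """Return (start, end, size, page_offset) for one 12 byte MCE.

    Assumed layout: 4 bytes of type/domain information, then the start
    and end addresses as big-endian 32-bit integers.
    """
    if len(mce) != MCE_SIZE:
        raise ValueError("an MCE is exactly 12 bytes long")
    start, end = struct.unpack(">4xII", mce)
    size = end - start + 1  # end is the address of the last byte of data
    return start, end, size, start % PAGE_SIZE

# Hypothetical MCE: records from 0x09000100 up to and including 0x090004FF.
mce = bytes(4) + struct.pack(">II", 0x09000100, 0x090004FF)
print(parse_mce(mce)[2:])  # prints: (1024, 256)
```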

See "Appendix A: *MONITOR" in the "z/VM Performance" document for a description
of the monitor control element layout. The layout of the monitor records can
be found here (z/VM 5.1): http://www.vm.ibm.com/pubs/mon510/index.html

The layout of the data stream provided by the monreader device is as follows:
...
<0 byte read>
<first MCE>              \
<first set of records>    |
...                       |- data set
<last MCE>                |
<last set of records>    /
<0 byte read>
...

There may be more than one combination of MCE and corresponding record set
within one data set and the end of each data set is indicated by a successful
read with a return value of 0 (0 byte read).
Any received data must be considered invalid until a complete set has been
read successfully, including the closing 0 byte read. Therefore you should
always read the complete set into a buffer before processing the data.
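
A buffering loop along these lines could look like the following Python
sketch. The function name is hypothetical; the read function is passed in as
a callable so that the sketch can be tried without the device. With a real,
blocking file descriptor fd the callable could be
lambda: os.read(fd, 65536).

```python
def read_data_set(read_chunk):
    """Collect chunks until a 0 byte read marks the end of the data set.

    read_chunk is any callable that returns bytes. If it raises an
    exception, the exception propagates and the partial buffer is
    dropped, so no data is handed out before the closing 0 byte read.
    """
    buf = bytearray()
    while True:
        chunk = read_chunk()
        if not chunk:  # 0 byte read: the data set is complete
            return bytes(buf)
        buf += chunk

# Stand-in for the device: two chunks followed by the closing 0 byte read.
chunks = iter([b"MCE+records", b"more", b""])
print(read_data_set(lambda: next(chunks)))  # prints: b'MCE+recordsmore'
```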

The maximum size of a data set can be as large as the size of the
monitor DCSS, so design the buffer adequately or use dynamic memory allocation.
The size of the monitor DCSS will be printed into syslog after loading the
module. You can also use the (Class E privileged) CP command Q NSS MAP to
list all available segments and information about them.

As with most char devices, error conditions are indicated by returning a
negative value for the number of bytes read. In this case, the errno variable
indicates the error condition:

EIO:       reply failed, read data is invalid and the application should
           discard the data read since the last successful read with 0 size.
EFAULT:    copy_to_user failed, read data is invalid and the application should
           discard the data read since the last successful read with 0 size.
EAGAIN:    occurs on a non-blocking read if there is no data available at the
           moment. There is no data missing or corrupted, just try again or
           rather use polling for non-blocking reads.
EOVERFLOW: message limit reached, the data read since the last successful
           read with 0 size is valid but subsequent records may be missing.

In the last case (EOVERFLOW) there may be missing data, in the first two cases
(EIO, EFAULT) there will be missing data. It's up to the application whether to
continue reading subsequent data or to exit.
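
One possible way for an application to act on these error codes is sketched
below in Python, where a failed read raises an OSError carrying the errno
value. The function name and the three action labels are invented for this
example; what the application does after each action remains its own decision.

```python
import errno

# Actions for the data read since the last successful read with 0 size.
DISCARD = "discard"   # EIO, EFAULT: buffered data is invalid
RETRY = "retry"       # EAGAIN: nothing lost, poll and read again
KEEP_BUT_GAP = "gap"  # EOVERFLOW: data valid, later records may be missing

def classify_read_error(exc):
    """Map an OSError from a failed read to the action described above."""
    if exc.errno in (errno.EIO, errno.EFAULT):
        return DISCARD
    if exc.errno == errno.EAGAIN:
        return RETRY
    if exc.errno == errno.EOVERFLOW:
        return KEEP_BUT_GAP
    raise exc  # any other error is not covered by this sketch

print(classify_read_error(OSError(errno.EOVERFLOW, "message limit reached")))
# prints: gap
```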

Open:
-----
Only one user is allowed to open the char device. If it is already in use, the
open function will fail (return a negative value) and set errno to EBUSY.
The open function may also fail if an IUCV connection to the *MONITOR service
cannot be established. In this case errno will be set to EIO and an error
message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
codes are described in the "z/VM Performance" book, Appendix A.
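
The two failure cases described above could be handled as in the following
Python sketch. The function name, the default device path and the message
texts are assumptions for illustration; the function is not called here
because the device is not available everywhere.

```python
import errno
import os

def open_monreader(path="/dev/monreader", nonblocking=False):
    """Open the device read-only; return (fd, None) or (None, reason)."""
    flags = os.O_RDONLY | (os.O_NONBLOCK if nonblocking else 0)
    try:
        return os.open(path, flags), None
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return None, "device is already in use by another user"
        if exc.errno == errno.EIO:
            return None, ("no IUCV connection to *MONITOR, "
                          "see syslog for the IPUSER SEVER code")
        # Anything else, e.g. a missing device node.
        return None, os.strerror(exc.errno)
```

A caller would check the first element of the returned pair and, if it is
None, report the reason instead of attempting to read.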

NOTE:
-----
As soon as the device is opened, incoming messages will be accepted and they
will count toward the message limit, i.e. opening the device without reading
from it will provoke the "message limit reached" error (EOVERFLOW error code)
eventually.