Powermeter for HPC Systems André Filipe Gonçalves Duarte Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering Supervisors: Dr. Pedro Filipe Zeferino Tomás Dr. Nuno Filipe Valentim Roma Examination Committee Chairperson: Dr. Nuno Cavaco Gomes Horta Supervisor: Dr. Pedro Filipe Zeferino Tomás Member of the Committee: Dr. Francisco André Corrêa Alegria May 2015
Powermeter for HPC Systems
André Filipe Gonçalves Duarte
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisors: Dr. Pedro Filipe Zeferino Tomás
Dr. Nuno Filipe Valentim Roma
Examination Committee
Chairperson: Dr. Nuno Cavaco Gomes Horta
Supervisor: Dr. Pedro Filipe Zeferino Tomás
Member of the Committee: Dr. Francisco André Corrêa Alegria
May 2015
Courage and perseverance have a magical talisman, before which difficulties disappear and obstacles
vanish into air.
John Quincy Adams
Acknowledgments
I would like to thank Professors Pedro Filipe Zeferino Tomás and Nuno Filipe Valentim Roma for all the support, advice and patience they showed over the past months. Without their guidance and belief in me, I am sure it would not have been possible to complete this important stage of my life. I would also like to thank Professor Jose Germano for all the useful advice he gave me, and my friends for always being there, even when one thinks they are not. I would like to express my pride and joy at being part of a group of people who I know will accompany me for the rest of my life. Thus, thanks to Joao Pedro Costa e Castro, Ricardo Filipe Tomas Pires, Goncalo Diogo Gomes Mendes, Goncalo Gouveia Velez Bidarra Saraiva, Guilherme Costa e Castro and, last but not least, the great Ortonimo (Flavio Jorge dos Santos Lopes).
I cannot end this text without expressing my highest esteem for my parents and my sister, for loving me so much and for doing all they could to help me, even when it seemed I did not show appreciation for it.
Thank you, I love you.
Abstract
The fast pace at which technology has been evolving has led to a significant increase in the amount of energy consumed by today's High Performance Computing (HPC) systems. Consequently, it has become highly important to understand how the energy consumption of any given application changes over time, envisaging the possibility of implementing real-time power profiling and resource optimization. The work developed in the scope of this thesis describes the design and prototyping of an acquisition board (and related software API) composed of several Hall sensors and a microcontroller. This board is capable of measuring the power demanded by an HPC system by monitoring the current that passes through the several rails of the main Power Supply Unit (PSU) of a personal computer. For this purpose, a broad set of conditioning modules was studied and implemented, in order to ensure accurate and precise measurements over an ample dynamic range of the measured signals. In particular, an Automatic Gain Controller (AGC) module was implemented in the acquisition board, embracing both the analog and digital domains of the measurement procedure. The results obtained from the experimental evaluation showed that the conceived device is highly suitable for real-time power profiling of HPC systems under complex workloads, providing fine-grained measurements of the power consumption over time that are hardly attained by alternative state-of-the-art devices or systems.
Keywords: High Performance Computing Systems, Energy Consumption, Real-time Power
Profiling, In-situ Measurements, Automatic Gain Control, PIC
Resumo
The fast pace at which technology has developed has led to a significant increase in the energy consumed by High Performance Computing (HPC) systems. Consequently, it is extremely important to understand how the energy consumption of an application varies over time, aiming at the real-time characterization of the power consumed by the system and the consequent optimization of resources. The work developed in the scope of this thesis describes the design and prototyping of an acquisition board (and the associated software application) composed of several Hall sensors and a microcontroller. This board is able to measure the power required by an HPC system by monitoring the current in several supply lines coming from the Power Supply Unit (PSU) of a personal computer. To that end, several conditioning modules were studied and implemented, in order to guarantee accurate and precise measurements over a wide dynamic range of the measured signals. In particular, an Automatic Gain Control (AGC) module was implemented, bridging the analog and digital domains of the acquisition board. The experimental results showed that the conceived board is particularly suitable for the real-time characterization of the power consumed by highly complex applications on HPC systems, achieving a measurement precision over time that is hardly attained by other modern devices and systems of the same kind.
Keywords: High Performance Computing Systems, Energy Consumption, Real-time Power Characterization, In-situ Measurements, Automatic Gain Control, PIC
In this chapter, the system architecture of the proposed Powermeter device is presented, together with a detailed description of its components. Specific aspects of the device are addressed, such as AC and DC signal conditioning, including the design of an Automatic Gain Controller used to improve the ADC's dynamic range. An analysis of the ADC's SNR, THD and SFDR is also carried out, and some design criteria are defined. In the end, the reader will have a clear idea of how the device samples and processes data so that accurate measurements are attained.
3.1 System Architecture
Figure 3.1: System Architecture (block diagram: the AC, CPU EPS12V, GPU/PCI-E, HDD and motherboard Molex rail sensors sit on the power cables between the AC wall socket, the PSU and the loads; the current sensor outputs feed the signal acquisition board with the PIC18F4550 and its 10-bit ADC, which connects to the host over USB)
Powermeter is capable of measuring all the available rails coming from the PSU of a personal computer, including the CPU EPS12V 6/8-pin, PCI-E 6/8-pin and SATA 4/5-pin connectors. The 12 V, 5 V and 3.3 V rails of the motherboard connector and the input of the PSU (the one directly connected to the AC power socket) are also sensed (see figure 3.1). The Powermeter sensors are inserted directly on the power connectors coming from the PSU: the supply-side power cables are plugged into the input connectors of the Powermeter, and its outputs are connected wherever the original connector would plug in (motherboard, CPU, peripherals and others).
Regarding the sensors, precise, low-offset, linear Hall-effect sensors were used, which convert the magnetic field generated by a current into a proportional voltage. The Hall IC has a copper conduction path with an internal resistance of 1.2 mΩ, which results in low power losses. USB is used both to power the device and to communicate with the host, since it is a fast, versatile and reliable interface. Powermeter uses a Microchip PIC18F4550 Microcontroller Unit (MCU), mounted on a common demonstration board (PICDEM FS USB), to establish the communication between the host system's USB port and the current sensors on the device. The communication is interrupt-driven, which simplifies the microcontroller code and decreases the time overhead, since the host does not need to poll the device to find out whether there is data to be collected. The demonstration board has a 20 MHz oscillator as input, which is used to generate (through a PLL) the 48 MHz MCU clock (the USB peripheral also runs from a 48 MHz clock, in Full-Speed USB mode). This microcontroller also includes timer modules (which were used to generate the necessary time stamps and the sampling frequency) and a 10-bit ADC.
3.2 Signal Acquisition
Figure 3.2: Block Diagram of the Hardware Architecture
Figure 3.2 gives a general view of the proposed acquisition system, which acquires the outputs of the DC and AC sensors. The AC acquisition path introduces an AGC system, which dynamically amplifies the input signal, making it possible to distinguish small variations of the signal. The system comprises an active band-pass filter, which reduces noise and only passes the 50 Hz component of the spectrum, eliminating in the process the offset imposed by the current sensor; and a controller that dictates whether the signal is amplified or attenuated, using a Programmable Gain Amplifier (PGA). With this novel approach, the ADC's dynamic range is improved. Low power-loss (1.2 mΩ) sensors (ACS712/14) were used to sense the AC and DC currents of the several power rails. These sensors operate on the Hall-effect principle and output a voltage that is proportional to the magnetic field generated by the sensed current. The sensors have different sensitivities, depending on the range of current they are able to sense. For instance, the 20 A range has a sensitivity of 100 mV/A, whilst for the 30 A range the sensor gives 66 mV/A. However, these sensors introduce an offset approximately equal to Vcc/2. As a result, and also because good precision is wanted in the ADC conversions (i.e., as many used bits as possible), the signal coming from the sensor has to be conditioned. AC and DC signal conditioning require different approaches, as described in the next sections.
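The relation between the sensor output voltage and the sensed current can be sketched as follows. This is only an illustration of the arithmetic, not the firmware of the device: the function name `sensor_current` is hypothetical, and the zero-current offset is assumed to be exactly Vcc/2, whereas the text only states that it is approximately Vcc/2.

```python
# Sensitivities quoted in the text, in volts per ampere, indexed by range (A).
SENSITIVITY_V_PER_A = {20: 0.100, 30: 0.066}


def sensor_current(v_out, range_a, vcc=5.0):
    """Estimate the sensed current (A) from the sensor output voltage (V).

    Assumption: the zero-current output is exactly vcc / 2.
    """
    return (v_out - vcc / 2.0) / SENSITIVITY_V_PER_A[range_a]


# A 30 A sensor reading 3.16 V corresponds to roughly 10 A.
example_current = sensor_current(3.16, 30)
```

In practice the offset would have to be calibrated per sensor, which is what the conditioning stages described next take care of.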
Figure 3.3: Current Sensor. (a) Hall-Effect Sensor; (b) Allegro's Current Sensor. Source: Allegro MicroSystems
3.2.1 DC conditioning
Figure 3.4: DC Conditioning (sensing stage: ACS712/714 sensor, ±5, ±20, ±30 A with 185, 100, 66 mV/A, supplied from +5 V (Vcc) with CF = 0.01 μF and CBYP = 0.1 μF, inserted between the power source and the load; amplification stage: operational amplifier with RF, RG, R1, R2 and C1 = 1 nF, with Vref as reference, whose output Vamp feeds the ADC)
Figure 3.4 illustrates the DC signal acquisition. In short, the output generated by the ACS712/14 sensors passes through an amplifying stage before being acquired by the ADC. For the DC power rails, the conditioning procedure consists of cancelling the DC offset introduced by the ACS712/14 sensor (which can vary between 2.330 V and 2.5 V) and applying some gain to the remaining signal, so that it can be properly acquired by the ADC stage. Thus, the proposed conditioning circuit consists of a subtractor amplifier stage (figure 3.5).
Figure 3.5: Subtractor Circuit
Equation 3.1 gives the output signal obtained after analysing the circuit presented in figure 3.5 for DC input voltages (since ω = 0, the capacitor can be 'seen' as an open circuit).
$$V_{Out} = -\left(V_{In} - V_{+}\right)\frac{R_4}{R_3} + V_{+} \qquad (3.1)$$
Hence, for DC voltages the output gain is constant and equal to G = R4/R3. However, for higher frequencies, the signal is attenuated at a rate of 20 dB per decade, since the capacitor introduces a pole in the circuit. Thus, the cut-off frequency can be computed with equation 3.2.
$$\omega_H = \frac{1}{R_4 C_1} \qquad (3.2)$$
This stage also acts as an anti-aliasing filter, which is required when sampling. The V+ voltages can be 2741 mV or 2848 mV and the gains 4.254 or 4.299, always yielding an output voltage lower than +5 V, but higher than 3.5 V, for input voltages greater than 2216 mV. This is always true for input currents greater than zero. These voltages and the respective gains were obtained after a study of the configurations that allow the best amplification of the measured signal without saturating the ADC's range.
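As a quick numerical check of equations 3.1 and 3.2, the sketch below evaluates the subtractor output and the cut-off frequency. The function names are hypothetical, and the values R4 = 43 kΩ and C1 = 1 nF used for the cut-off frequency are illustrative assumptions, since the text does not state the resistor values.

```python
import math


def subtractor_output_mv(v_in_mv, v_plus_mv, gain):
    """Equation 3.1 for DC inputs, with gain = R4 / R3 (all voltages in mV)."""
    return -(v_in_mv - v_plus_mv) * gain + v_plus_mv


def cutoff_frequency_hz(r4_ohm, c1_farad):
    """Equation 3.2 converted from rad/s to Hz: f_H = 1 / (2 * pi * R4 * C1)."""
    return 1.0 / (2.0 * math.pi * r4_ohm * c1_farad)


# With V+ = 2741 mV and G = 4.254, an input of 2216 mV maps just below 5 V.
v_out_mv = subtractor_output_mv(2216.0, 2741.0, 4.254)

# Hypothetical component values, used only to illustrate equation 3.2.
f_h_hz = cutoff_frequency_hz(43e3, 1e-9)
```

Under these assumptions the first result is about 4974 mV, which agrees with the statement that the output stays below +5 V for inputs above 2216 mV.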
3.2.2 AC conditioning
By definition, instantaneous electric power is obtained by multiplying the voltage and the current:
$$P(t) = V(t)\,I(t) \qquad (3.3)$$
For sinusoidal waveforms, this translates into:
$$\begin{aligned} V(t) &= V_M \sin(\omega t) \\ I(t) &= I_M \sin(\omega t + \phi) \\ P(t) &= \frac{V_M I_M}{2}\left[\cos(\phi) - \cos(2\omega t + \phi)\right] \end{aligned} \qquad (3.4)$$
Thus, the power waveform comprises a time-varying component and a constant component. The constant component, which can be obtained by averaging P(t), corresponds to the effective power delivered to the load, also called Active Power. The active power can also be obtained by resorting to complex amplitude analysis and using the complex power. By definition, the complex power is given by:
$$S = \frac{1}{2} V_M I_M e^{j\phi} = \frac{V_M I_M}{2}\cos(\phi) + j\,\frac{V_M I_M}{2}\sin(\phi) \qquad (3.5)$$
where φ is the angle between voltage and current. The real part of equation 3.5 is the Active Power, whilst the imaginary part is the Reactive Power, which corresponds to the maximum value of the power component that oscillates between the mains and the load, resulting from the energy stored in capacitors and/or inductors. Using RMS (Root Mean Square) amplitudes, the active power is obtained with equation 3.6.
$$P_{Active} = V_{RMS}\, I_{RMS} \cos(\phi) \qquad (3.6)$$
The ratio between the active power and the Apparent Power (VRMS IRMS) is known as the power factor (p.f.) and, for sinusoidal waves, it corresponds to cos(φ). Equation 3.6 is an important result, since it states that if the RMS amplitudes of voltage and current and the p.f. are known, then the Active Power can be computed. Fortunately, modern PSUs have power factor correction, and the manufacturer of the CORSAIR TX750 PSU guarantees a steady power factor of 0.99 [41]. Thus, it is only necessary to find the amplitude of the current sine wave at each cycle to compute the active power, since the power factor and the RMS voltage are known (p.f. = 0.99 and VRMS = 230 V for European electric power). This, in fact, means that the power of the AC signal is proportional to the current.
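The proportionality between active power and current amplitude can be illustrated with a short sketch of equation 3.6. The function name `active_power_w` is hypothetical; the default values are the ones quoted in the text (230 V RMS and a power factor of 0.99), and a purely sinusoidal current is assumed so that the RMS value is the peak divided by the square root of 2.

```python
import math


def active_power_w(i_peak_a, v_rms=230.0, power_factor=0.99):
    """Active power (W) from the peak amplitude of a sinusoidal current (A)."""
    i_rms = i_peak_a / math.sqrt(2.0)  # valid for sinusoidal waveforms only
    return v_rms * i_rms * power_factor


# A peak current of sqrt(2) A is 1 A RMS, i.e. 230 * 1 * 0.99 W.
p_example = active_power_w(math.sqrt(2.0))
```

Doubling the measured peak current doubles the result, which is why only the current amplitude has to be tracked at each cycle.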
3.3 AGC - Automatic Gain Control
The need to measure signals with a wide dynamic range is quite common in the electronics industry, but current technology often has difficulty meeting actual system requirements. Weigh-scale systems typically use load-cell bridge sensors with maximum full-scale outputs of 1 mV to 2 mV. Such systems may require resolutions on the order of 1,000,000 to 1, which, when referred to a 2 mV input, call for a high-performance, low-noise, high-gain amplifier and a sigma-delta modulator. While the actual sensor data typically takes up only a small portion of the input signal range, the system must often be designed to handle fault conditions. This is exactly the problem of the current sensors used here, which output a signal with a very low voltage amplitude. Thus, a wide dynamic range, high performance with small inputs and a quick response to fast-changing signals are key requirements. These requirements call for a flexible signal-conditioning block, with low-noise inputs, relatively high gains, and the ability to dynamically change the gain in response to input level changes without affecting performance, while still maintaining a wide dynamic range. Existing sigma-delta technology can provide the dynamic range needed for many applications, but only at the expense of an increase of the operation rate.
This section presents an alternative approach that uses a successive-approximation 10-bit sampling ADC, combined with an autoranging PGA front end, forming an AGC system (figure 3.6). With a gain that changes automatically based on the analog input value, it uses oversampling to increase the dynamic range of the system to more than 80 dB.
Figure 3.6: AGC Structure
The band-pass filter, introduced earlier, is useful to reduce the input noise of the system relative to the input signal and to eliminate the DC component of the Hall sensor. However, the current sensor output can still be a low-amplitude signal, and small variations of the amplitude are eliminated during quantization. Therefore, an automatic gain controller is proposed. Over time, this stage is supposed to distinguish small variations in the input signal and amplify them, guaranteeing no loss of the input signal and an improvement of the dynamic range. For the design, it is important to know the minimum and maximum amplitudes that the sensor output signal can reach over time; after some experiments, it was determined that the amplitude varies between 30 mV and 50 mV. Figure 3.6 shows a diagram with the major blocks that constitute this system. Part of this scheme (the analog part) was built with physical elements, such as operational amplifiers and digitally controlled potentiometers, whilst the digital part was implemented by programming the PIC18F4550.
Starting from the left, the input signal (which is actually the output of the AC current sensor) is filtered and amplified by the band-pass filter. Next, and ignoring for now the subtraction node, the signal is dynamically attenuated or amplified, depending on the range where the signal lies. Since the goal is to always obtain a signal whose amplitude covers almost the full range of the ADC, if, for instance, the signal has a low amplitude (let us say 1 V), the AGC amplifies it. However, if its amplitude is very close to or above the ADC's full range (e.g., 4.9 V), then the signal is attenuated, ensuring at all times that the signal lies within the full range of the ADC. It is the PGA block that amplifies or attenuates the input signal, in conjunction with the controller, which decides whether to amplify, attenuate or keep the current gain. Because the ADC embedded in the PIC18F4550 microcontroller is unipolar (range between 0 and 5 V), the PGA introduces an offset of Vcc/2, shifting the input signal to the middle of the ADC range. This guarantees that the input signal can be properly amplified without upper or lower saturation. In addition, due to the non-linear characteristic of the ADC, this also reduces quantization errors. Afterwards, in the digital domain, the initial input signal is recovered by subtracting the DC offset and cancelling the gain with arithmetic shifts¹. Then, the signal is averaged with the last 32 samples, resulting in a 32-point moving average filter, whose mathematical formulation is given in equation 3.7.
$$y(n) = \frac{1}{32}\sum_{k=0}^{31} x(n-k) \qquad (3.7)$$
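A minimal sketch of the 32-point moving average of equation 3.7 is given below. It is written in Python for readability and is not the PIC18F4550 firmware; the class name is hypothetical. It keeps a running sum and divides by 32 with a right shift by 5 bits, in the spirit of the shift-based arithmetic described in the text, and assumes integer samples with a window that starts filled with zeros.

```python
from collections import deque


class MovingAverage32:
    """32-point moving average of integer samples (equation 3.7)."""

    def __init__(self):
        # The window starts filled with zeros (an assumption of this sketch).
        self.window = deque([0] * 32, maxlen=32)
        self.total = 0

    def update(self, sample):
        # Remove the oldest sample from the running sum and add the new one.
        self.total += sample - self.window[0]
        self.window.append(sample)  # maxlen drops the oldest entry
        return self.total >> 5      # division by 32 through a right shift


avg = MovingAverage32()
outputs = [avg.update(320) for _ in range(32)]
```

After 32 identical samples the output equals the sample value; before that, the zero-filled window makes the output ramp up.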
With this procedure, it is possible to amplify even further the small variations of the current signal that occur over time. The 5-bit DAC converts to the analog domain the digital value obtained after the average computation. Since this value is 10 bits long, only the 5 MSBs are used. The digital value is stored in a variable, so that it can be added to the acquired signal to regenerate the original signal.
The controller could be implemented in several ways. One alternative would be to design a PI (Proportional-Integral) controller, which increases or decreases the gain according to an error signal (computed with respect to the maximum voltage wanted at the input of the ADC). However, in this particular case, the controller parameters would change in time, since they depend on the amplitude of the input signal, which keeps varying. This would result in different time responses of the loop. The parameters also depend on the sampling frequency, so every change of the sampling frequency (during the design) may require new parameters to be computed. Another issue is that the required computations are not well suited to the 8-bit architecture of the PIC18F4550 and would be time consuming: a multiplication and a division of floats can take 336 and 2712 clock cycles, respectively, while arithmetic shifts of integers require approximately 20 clock cycles.
¹ The C18 compiler for the PIC18F4550 implements multiplication and division operations, of any length, that are not supported by the hardware by calling library functions. Hence, these operations are very time consuming and are not appropriate for a real-time processing project. Consequently, an amplifier that only assumes gains that are powers of 2 was designed. This eases the recovery of the initial signal, which is done with right or left shifts, according to the gain imposed by the PGA.
Figure 3.7: On-off Controller. (a) On-Off Characteristic; (b) On-off Controller Block
In order to meet the requirements, another solution was engineered: the bang-bang controller. The bang-bang controller (also known as on-off controller) is a simple and effective solution to this problem. In this project, an 'on-off' non-linearity with hysteresis and a dead-zone was implemented (see figure 3.7).
The controller (figure 3.7(b)) was implemented in the digital domain and changes the gain over time by controlling a PGA (the schematic of the electronic devices that form the PGA is addressed in section 3.3.2). The controller comprises three states: one where it increases the gain, another where it decreases it, and the dead-zone, where it keeps the current gain unchanged. The controller also includes a maximum search algorithm and a low-pass filter. The search for the maximum is necessary not only to test whether the signal amplitude is within the allowable range, but also to compute the power of the AC signal (recall that the power is proportional to the amplitude of the current signal). The low-pass filter is used to avoid false gain changes due to instantaneous fluctuations in the measured signal. The output of the controller feeds the non-linearity, whose margins were computed taking into account the variations of the current sensor output (the computation of those margins is presented in the results, in chapter 5).
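The three-state decision described above can be sketched as follows. This is a simplified illustration, not the implemented controller: the function names, the shift limits and the convergence loop are assumptions, the hysteresis memory and the low-pass filter are omitted, and the gain is modelled as a power of 2 (a shift count). The margins are the values T0 = 488 mV and T1 = 1.95 V used in the simulations of section 3.3.1.

```python
T_LOW = 0.488   # lower margin T0 (V), value used in the simulations
T_HIGH = 1.95   # upper margin T1 (V), value used in the simulations


def next_shift(peak_v, shift, min_shift=-3, max_shift=5):
    """One controller decision; the gain is 2 ** shift (limits are assumed)."""
    if peak_v < T_LOW and shift < max_shift:
        return shift + 1   # signal too small: double the gain
    if peak_v > T_HIGH and shift > min_shift:
        return shift - 1   # signal too large: halve the gain
    return shift           # dead-zone: keep the current gain


def settle(amplitude_v, shift=0, max_steps=16):
    """Iterate the decision until the amplified peak lies in the dead-zone."""
    for _ in range(max_steps):
        new_shift = next_shift(amplitude_v * 2.0 ** shift, shift)
        if new_shift == shift:
            break
        shift = new_shift
    return shift


gain_shift_small = settle(0.12)  # 120 mV input is amplified
gain_shift_large = settle(5.0)   # 5 V input is attenuated
```

Because the upper margin is about four times the lower one, a single doubling or halving of the gain cannot jump across the dead-zone, so this simplified loop does not oscillate between two gains.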
3.3.1 MATLAB Simulation
MATLAB was used to test the controller. The schematic, built in Simulink, is based on the diagram in figure 3.1. For simulation purposes, the lower and upper margins of the on-off characteristic were T0 = 488 mV and T1 = 1.95 V, respectively.
There are, fundamentally, at least two interesting simulation cases:
1. The first case is the behaviour of the circuit for a sinusoidal input with an amplitude between 120 mV and 200 mV;
2. The second case starts with a very low amplitude signal (100 mV) and jumps suddenly to a very high amplitude signal (5 V).
These tests make it possible to understand the behaviour of the loop and to observe whether it can properly adjust the amplitude to fit within the margins of the non-linearity, both when the input is a very low-voltage signal (first case) and when it is a very high-voltage signal (second case), while providing the highest possible gain. It is also interesting to compare the results obtained with an ADC with and without the proposed controller.
Sinusoidal Input: 120-200 mV
With the help of a voltmeter, and running some workloads on the PC, it was determined that the output amplitude of the current sensor varies roughly between 30 mV and 50 mV. For the first test, the controller was subjected to a sinusoidal signal whose amplitude suddenly increases from 120 mV to 200 mV. This signal simulates the output of the current sensor after being filtered by the band-pass filter, which introduces a gain of 4.
Figure 3.8: Loop Response. (a) Input Sinewave; (b) Output Sinewave. Both panels plot Amplitude (V) against Time (s).
Figure 3.8 shows how the loop reacts to the input signal by amplifying it to a value that guarantees that the limits of the on-off controller are respected. Note that the output plot does not include the 2.5 V offset component. Even when the amplitude changes from 120 mV to 200 mV, the loop still amplifies the signal, but with a lower gain, since the signal is higher. Figure 3.9 compares the output signal, in volts, after being discretized by a common non-ideal 10-bit ADC without any sort of correction (in red) with the result obtained when using the AGC (in blue).
Figure 3.9: Reconstruction of the Input signal with and without AGC (Amplitude (V) against Time (s); curves: Output with AGC, Output without AGC)
Inspecting the figure, the differences in amplitude of the resulting signals can be pointed out. In the conversion with no correction, the maximum amplitude of the signal is lower than it should be (e.g., for an input of 200 mV only approximately 170 mV is obtained, whereas with the AGC 197 mV is obtained). Ideally, the full 200 mV would be recovered. However, besides the non-linearity of the ADC, there is also the quantization error of ½ LSB (≈ 2.44 mV). It is also evident that, with no correction, the signal gets clipped due to the unipolarity of the ADC. This test shows the clear benefit of using this approach to sample data, since it gives more precision and no signal loss.
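The effect of amplifying before the conversion and undoing the gain afterwards can be illustrated with an idealized quantizer. The sketch below is not the non-ideal ADC model used in the simulations, so it does not reproduce the 170 mV and 197 mV figures; the function names, the truncating quantizer and the power-of-2 gain are assumptions made for illustration only.

```python
VREF = 5.0               # ADC full-scale voltage (V)
BITS = 10                # ADC resolution
LSB = VREF / 2 ** BITS   # about 4.88 mV, so half an LSB is about 2.44 mV


def adc(v):
    """Ideal unipolar quantizer: truncates and clamps to the code range."""
    code = int(v / LSB)
    return max(0, min(2 ** BITS - 1, code))


def acquire(v_signal, shift):
    """Amplify by 2 ** shift around mid-range, convert, then undo the gain."""
    code = adc(v_signal * 2 ** shift + VREF / 2)
    return (code - 2 ** (BITS - 1)) * LSB / 2 ** shift


err_without_gain = abs(acquire(0.2, 0) - 0.2)  # plain conversion
err_with_gain = abs(acquire(0.2, 3) - 0.2)     # gain of 8 before the ADC
```

With a gain of 8, the error of the recovered value is bounded by one eighth of an LSB, which is the resolution improvement the AGC aims for with small signals.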
Sinusoidal Input: 50 mV - 5 V
For the second test, the controller was again subjected to a sinusoidal signal, whose amplitude lies between 50 mV and 5 V. The goal of this test is to determine whether the controller is also able to attenuate the input signal, if needed, so that the output signal meets the predefined specifications.
As the reader can observe in figure 3.10, the loop successfully corrects the amplitude of the signal to the desired value. A transient period is also noticeable at the instant when the amplitude of the input signal changes abruptly from a very low voltage (50 mV) to a very high voltage (5 V). As before, figure 3.11 compares the output signal, in volts, after being discretized by a common non-ideal 10-bit ADC without any sort of correction, with the result obtained when using the AGC method.
Inspecting the figure, the reader can observe that the low-voltage signal was lost when performing the conversion without the AGC, since the input signal has a very low amplitude. Although there are some errors in the resulting signal, due to the transition from a very low voltage signal to a very high one, the input signal is successfully recovered only with the AGC. All the other conclusions pointed out in the previous test also apply to this case.
Figure 3.10: Loop Response. (a) Input Sinewave; (b) Output Sinewave. Both panels plot Amplitude (V) against Time (s).
Figure 3.11: Reconstruction of the Input signal with and without AGC (Amplitude (V) against Time (s); curves: Output With AGC, Output Without AGC)
3.3.2 Analog Domain Implementation
It is necessary to materialize the blocks introduced earlier, such as the band-pass filter, the PGA and the subtraction node, by designing electronic circuits that implement their respective functions. Every circuit analysed in the following sections belongs to the full schematic of the proposed system, which is provided in figure 3.12. The layout of the final circuit can also be found in the appendix of this document (please refer to figure A.3 if needed).
Figure 3.12: Full Circuit
3.3.2.A Band-pass Filter
In Europe, electric power is supplied as a voltage signal with an amplitude of 230 VRMS and a frequency of f = 50 Hz. However, the offset introduced by the Hall sensor must be eliminated, while preserving the 50 Hz component of the spectrum. For that reason, a band-pass filter was chosen to fulfil this need: it not only rejects the DC component, but also attenuates all frequencies other than the 50 Hz component. Moreover, it also acts as an anti-aliasing filter. The filter specifications are presented below:
• Central Frequency f0 = 50 Hz;
• Bandwidth b = 10 Hz;
• Filter Gain at the central frequency G(f0) = 4 ≈ 12 dB.
The chosen values guarantee a filter with a narrow bandwidth (good selectivity, Q = 5). The value of the gain at this stage influences the rest of the circuitry, namely the PGA gain range. The higher the gain of the filter, the lower the maximum gain of the PGA has to be in order to guarantee a high dynamic range. Therefore, this filter was initially designed to have a gain of 10. However, after the implementation of the PGA (which is described in section 3.3.2.C), it was realized that the designed gain was too high and, consequently, it was reduced to 4. A lower gain increases the available bandwidth, resulting in a device with a faster time response.
To implement this filter, a multiple feedback structure was used, which allows the implementation of a simple and reliable 2nd order band-pass filter for low quality factors.
$$\frac{V_{OUT}}{V_{IN}} = -\frac{H\omega_0 s}{s^2 + \frac{\omega_0}{Q}s + \omega_0^2} \qquad (3.8)$$
Figure 3.13: Band Pass Filter
The transfer function of a second-order band-pass filter is given by equation 3.8, where H is a gain factor, ω0 is the central frequency of the filter and Q is the quality factor. The transfer function of the above circuit is given in equation 3.9.
$$\frac{V_{OUT}}{V_{IN}} = -\frac{\frac{1}{R_1 C_4}s}{s^2 + \frac{1}{R_5}\left(\frac{1}{C_3}+\frac{1}{C_4}\right)s + \frac{1}{R_5 C_3 C_4}\left(\frac{1}{R_1}+\frac{1}{R_2}\right)} \qquad (3.9)$$
To obtain the component values, it is just a matter of solving the system of equations obtained by matching the numerators and the denominators of both transfer functions. The procedure is the following:
1. Choose the value of C3;
2. Set k = ω0 C3 and C4 = C3;
3. Resistor R1 is
$$\frac{1}{R_1 C_4} = H\omega_0 \Leftrightarrow R_1 = \frac{1}{Hk}; \qquad (3.10)$$
4. Resistor R5 is
$$\frac{2}{C_3 R_5} = \frac{\omega_0}{Q} \Leftrightarrow R_5 = \frac{2Q}{k}; \qquad (3.11)$$
5. And, finally, resistor R2 is
$$R_2 = \frac{1}{R_5 C_3 C_4 \omega_0^2 - \frac{1}{R_1}} = \frac{1}{k(2Q-H)}. \qquad (3.12)$$
Before going any further, it must be guaranteed that the current drawn by the input of the amplifier is not higher than the one the current sensor can supply. The ACS714 current sensor can supply a maximum current of 3 mA, which means that the band-pass filter must have a sufficiently high input impedance, or else the requested current will be too high:
$$\frac{V_{max}}{Z_{input}} \le 3\,\text{mA} \Leftrightarrow Z_{input} \ge \frac{3.2\,\text{V}}{3\,\text{mA}} \simeq 1.07\,\text{k}\Omega \qquad (3.13)$$
where Vmax is the maximum voltage the current sensor will output and Zinput is the load seen by the current sensor, which is equal to Zinput = R1 + R2. Accordingly, an input impedance of at least 1.07 kΩ is needed. The quality factor is given by Q = f0/b = 50/10 = 5. The gain at the central frequency is influenced by the H factor, but also by the quality factor of the filter. Thus, to obtain the desired gain, the H factor must be H = G/Q = 4/5 = 0.8.
Thus, choosing H = 0.8 and C3 = 0.22 µF (it has to be low to avoid the use of an electrolytic capacitor), and after performing the remaining computations for the resistance values, the nominal values shown in table 3.1 were obtained. Note that the resistances whose values are given as a sum are obtained with two resistors in series.
Component   Nominal       5% Tolerance             1% Tolerance
R1          36.172 kΩ     36000 + 180 = 36180 Ω    23.2 + 13 kΩ
R2          738.1955 Ω    680 + 56 = 736 Ω         698 + 40.2 Ω
R5          289.37 kΩ     130 + 160 = 290 kΩ       243 + 46.4 kΩ
C3 = C4     0.22 µF       -                        -
Table 3.1: Component Values
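The design steps of equations 3.10 to 3.12 can be sketched as a small function. The function name and argument order are hypothetical. As a cross-check made for this sketch only (it is not a statement from the design text), the nominal resistances of table 3.1 are reproduced when the function is evaluated with Q = 10 and H = 0.4, which still gives a centre-frequency gain of HQ = 4; evaluating it with Q = 5 and H = 0.8 gives different resistances.

```python
import math


def mfb_bandpass_components(f0_hz, q, h, c3_farad):
    """Resistor values (ohm) of the multiple feedback band-pass, C4 = C3."""
    k = 2.0 * math.pi * f0_hz * c3_farad   # k = w0 * C3
    r1 = 1.0 / (h * k)                     # equation 3.10
    r5 = 2.0 * q / k                       # equation 3.11
    r2 = 1.0 / (k * (2.0 * q - h))         # equation 3.12
    return r1, r2, r5


# Parameters that reproduce the nominal column of table 3.1.
r1, r2, r5 = mfb_bandpass_components(50.0, 10.0, 0.4, 0.22e-6)

# Parameters quoted in the text (Q = 5, H = 0.8), for comparison.
r1_text, r2_text, r5_text = mfb_bandpass_components(50.0, 5.0, 0.8, 0.22e-6)
```

Either way, the input impedance R1 + R2 stays well above the 1.07 kΩ minimum of equation 3.13.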
Resorting to MATLAB [42], an analysis of the filter was carried out considering these resistor values. All the components were subjected to deviations from their nominal values, considering component tolerances of 1% and 5%. The goal was to understand how the tolerance of the components can influence the response of the filter. The results are illustrated in figures 3.14 and A.1 (in appendix A) and are also summarized in table 3.2.
Bode Diagram (magnitude in dB against Frequency (Hz), from 10^1 to 10^3 Hz; data tip: System: REAL, Frequency (Hz): 48.4, Magnitude (dB): 12)
The non-linearity is represented in figure 3.28(a). However, in order to apply the theorems, this non-linearity must be contained in the 1st and 3rd quadrants, and must be time-invariant (in the case of the Popov Criterion) and memoryless. A slight change can be made to the system to shift the graph and center it around zero. Thus, by subtracting -200 from the input signal of the non-linearity, we can shift the graph of figure 3.28(a) to the left by 200 and consequently satisfy the criteria with no loss of generality and without any change in the dynamics of the overall system². Hence, the final plot of the non-linearity function is presented in figure 3.28(b), which is an 'on-off' non-linearity with hysteresis and a dead zone.
By the definition of equation 3.44, the above function belongs to the sector:
φ ∈ [0, 1] (3.52)
² The margins used for this analysis may not be the ones used physically. However, it does not matter which margins are chosen, since it is always possible to shift the characteristic to comply with the sector requirements.
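Resorting to MatLab, the state-space model of the system in figure 3.27 is:
$$\begin{aligned} \dot{x} &= Ax + Bu \\ y &= Cx \\ u &= -\phi(y) \end{aligned} \tag{3.53}$$
where
$$A = \begin{bmatrix} -10 & 1.286 \times 10^{10} \\ 0 & -7.85 \times 10^{5} \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 10 & 0 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 0 \end{bmatrix} \tag{3.54}$$
Taking into account equation 3.43, this yields the following open-loop transfer function:
$$G(s) = \frac{1.286 \times 10^{11}}{(s + 785000)(s + 10)} \tag{3.55}$$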
The non-linearity is time-invariant, so the Popov Criterion can be applied. Therefore, according to the criterion in equation 3.45, taking K = 1 and q = 1, stability is guaranteed if the following inequality is satisfied:
$$\Re\left[(1 + j\omega)\,G(j\omega)\right] > -\frac{1}{K} = -1 \quad \forall\, \omega \in \mathbb{R} \tag{3.56}$$
Figure 3.29(a) represents the Nyquist Diagram for equation 3.57.
$$H(s) = (1 + s)\,G(s) \tag{3.57}$$
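As a numerical cross-check of equations 3.54 to 3.56, the sketch below evaluates G(jω) directly from the state-space matrices, compares it with the closed form of equation 3.55, and samples the real part of (1 + jω)G(jω) on a logarithmic frequency grid. The function names and the grid limits (10⁻³ to 10⁹ rad/s) are assumptions made for this illustration only; a finite grid is a sanity check and not a proof of the inequality.

```python
# Matrices of equation 3.54 (2x2 system, written out element by element).
A11, A12, A21, A22 = -10.0, 1.286e10, 0.0, -7.85e5
B1, B2 = 0.0, 1.0
C1, C2 = 10.0, 0.0

def g_state_space(w):
    """G(jw) = C (jw*I - A)^-1 B, using the explicit inverse of a 2x2 matrix."""
    s = 1j * w
    m11, m12, m21, m22 = s - A11, -A12, -A21, s - A22
    det = m11 * m22 - m12 * m21
    x1 = (m22 * B1 - m12 * B2) / det
    x2 = (-m21 * B1 + m11 * B2) / det
    return C1 * x1 + C2 * x2

def g_closed_form(w):
    """G(jw) from equation 3.55."""
    s = 1j * w
    return 1.286e11 / ((s + 785000.0) * (s + 10.0))

def popov_real_part(w, q=1.0):
    """Real part of (1 + j*q*w) * G(jw), the left-hand side of equation 3.56."""
    return ((1 + 1j * q * w) * g_closed_form(w)).real

# Logarithmic grid (an arbitrary choice for this sketch).
FREQUENCIES = [10 ** (k / 10) for k in range(-30, 91)]
POPOV_MINIMUM = min(popov_real_part(w) for w in FREQUENCIES)
```

On this grid, `POPOV_MINIMUM` can be compared against -1/K to see how much margin the chosen sector leaves.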
[Figure 3.29: Nyquist Diagram. (a) Nyquist Diagram of H(s) for Positive Frequencies. (b) Zoom in Nyquist Diagram for H(s). Axes: Real Axis and Imaginary Axis; both panels show the Nyquist curve and the vertical line R = -1.]
Taking a closer look at the neighbourhood of the vertical line at R = -1 (see figure 3.29(b)), it can be observed that the diagram lies to the right of that line, so the Popov criterion states that this system is absolutely stable for any non-linearity contained in the sector φ ∈ [0, 1]. Note that the particular choice of the sector resulted in a line that passes through the point R = -1, thus coincidentally resulting in the classic Nyquist condition, which guarantees the stability of the linear system. If another sector had been chosen, such as φ ∈ [0, 2] (actually, the non-linearity belongs to any sector where K1 ≥ 1), the system would also be stable. The Popov criterion would fail to guarantee stability only for the sector [0, +∞[, since this would constrain the Nyquist diagram to lie to the right of the imaginary axis and, as we witnessed, that is not the case, at least for q = 1.
3.4 Summary
This chapter addressed fundamental design aspects for the successful development of the Powermeter device. It started by introducing the architecture of the overall system, showing the reader how everything connects together (sensors, microcontroller, host, power rails) so that power measurements can be done seamlessly. The rails composing a common PSU were also studied, in order to ensure that Powermeter is suitable to sense any rail and component of a computing system. Furthermore, it was analysed how the different signals that a PSU works with (AC and DC) could be measured and conditioned so that a precise and accurate measurement could be attained. Under this topic, a convenient approach to sense AC signals was developed, the AGC. This method comprises several electronic blocks, which filter and dynamically amplify/attenuate the sensed AC signal. With this methodology, it is possible to increase the ADC dynamic range, in order to provide very accurate readings and prevent the loss of very low voltage signals.
A thorough analysis of the system was developed. MatLab was used as the supporting tool to design the controller and validate the system's performance under some relevant case examples. Since the dynamic range is tightly influenced by the noise, a study of the system's overall noise was carried out, calculating the ADC's SNR, THD and SFDR, with and without oversampling. Finally, because the system relies on a non-linearity, a theoretical analysis of the system's stability was done, introducing some theorems and frequency domain tools to deal with this kind of problem, known in the literature as Lur'e problems. The analysis showed that the system is absolutely stable for the chosen sector.
The next chapter will present details about the software API developed under the scope of this thesis, which implements the algorithms necessary for the energy/power computation.
The reader may recall that the frequency of operation used for all the results in chapter 3 was FS = 3.3(3) kHz. That frequency was obtained by performing the following analysis:
The sampling frequency is limited by, at least, three aspects:
1. Minimum time that data takes to be transferred between the processors (T1 = 5.09 µs);
2. Time resolution allowed by the Timer0¹ module (T2 = 16 µs);
3. Time necessary to compute data within the PIC.
The time necessary to process data is one of the constraints that limit the sampling frequency. This time comprises, among others, the time to manage all the buffers required to save, send and average data, the time required for the energy computation and the time used by the controller stage to dynamically change the gain of the PGA.
The PIC18F4550 works at 48 MHz, but every instruction takes (typically) 4 clock cycles. Therefore, the instruction rate is 12 million instructions per second. The time spent by the microcontroller's firmware to process all the software routines was measured by using an oscilloscope and a flag within the main loop, whose value alternates between the logic values '0' and '1'. The obtained time was approximately Tspent1 = 900 µs (1.1 kHz). Consequently, the assembly code of the project was analysed and an effort was made to reduce this execution time Tspent1. The most used variables (including flags and buffers) were moved to the microcontroller's access bank, which is two times faster than the normal access mode, and loop unrolling (4x) was applied to the most critical loops. However, the bank has a limited space of 95 Bytes and it is also used by the compiler to store temporary data. Therefore, not all data could be placed there, namely the buffer used to average the 32 samples. In sum, this resulted in a reduction to 273 µs but, to avoid working at the limit, 300 µs was used instead (FS = 3.3(3) kHz).
¹ The project uses a timer module (Timer0) embedded in the microcontroller, which acts as the clock of the microcontroller.
This is, after all, the bottleneck of the project as regards the sampling frequency of the system, when compared with the other limits referred to at the beginning. For an Intel Core i7 3770K, with 4 cores at 3.5 GHz, each issuing 4 instructions per cycle (IPC), this is a very acceptable frequency, since it corresponds to an inspection window of 16.8 million instructions (the lower the better). This is much better than the resolution achieved by the PowerEgg [26], the WattsUp [27] or the approach discussed in [3], whose sampling frequency is 4 Hz, achieving a resolution of 14 billion instructions.
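The two inspection-window figures quoted above follow from multiplying the clock frequency, the IPC, the number of cores and the sampling period. The short sketch below reproduces that arithmetic; the function name is hypothetical and the model (every core issuing the peak IPC all the time) is the same upper-bound simplification used in the text.

```python
def inspection_window(clock_hz, ipc, cores, sampling_period_s):
    """Upper bound on the instructions issued during one sampling period."""
    return clock_hz * ipc * cores * sampling_period_s

# Powermeter: one sample every 300 us (FS = 3.3(3) kHz).
powermeter_window = inspection_window(3.5e9, 4, 4, 300e-6)

# Meters sampling at 4 Hz: one sample every 0.25 s.
slow_meter_window = inspection_window(3.5e9, 4, 4, 0.25)
```

Under these assumptions the first value is about 16.8 million instructions and the second about 14 billion, a ratio of roughly 833 between the two resolutions.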
4.1.3 Types of Data Transferred
There are different kinds of data being transferred between host and device: asynchronous data and synchronous data (see table 4.1). As mentioned before, the synchronous interface is used just to initiate some routines, like the time synchronization between clocks and the sampling process. The asynchronous interface is used to exchange large amounts of data over time between the microcontroller and the host, including data regarding clock synchronization and the samples acquired by the microcontroller.
Asynchronous Transfers          Synchronous Transfers
Time Stamps and Sampled Data    Initialize Clocks Sync and Sampling Process
Time Synchronization Data
Table 4.1: Types of Data Transferred
Hence, it is necessary to distinguish the data sent/received by both systems (microcontroller and host). Therefore, the messages are divided into a Header and a Payload. The Header makes it possible to differentiate the type of data being transferred, while the Payload is the data itself that the device/host wants to transmit. In asynchronous mode, the buffer size is at most 256 Bytes, while in synchronous mode it is just 5 Bytes. Tables 4.2 and 4.6 show how data is divided in the
In the synchronous structure, the byte named Header can be one of the NTP_INIT, START or STOP commands (used to start clock synchronization, and to start and stop the sampling process, respectively). The payload comprises the bytes referring to Channel 0, 1, 2 and 3, which can be any of the available channels to sample (V230, CPU, HDD and GPU), chosen by the user.
In the case of the asynchronous data structure, the Header byte can be NTP_HEADER or DATA_HEADER, referring to the different kinds of asynchronous data: Time Synchronization Data and Time Stamps and Sampled Data, respectively. The payload is the data exchanged between both processors, which can be data about clock synchronization or samples acquired during the sampling process.
Offset   Bytes 0 to 63
00h      Header (1 Byte) | Payload
40h      Payload
80h      Payload
C0h      Payload
Table 4.3: Asynchronous Data Structure
4.1.3.A Synchronous - Clocks Synchronization and Sampling Process Initialization
The clock synchronization and the sampling process initialization are done by using the synchronous structure. Table 4.4 presents an example of the structure used to request the start of the sampling activity for the CPU and GPU channels. Channels that are not requested are filled with NOP values. All the commands and headers are specific identifiers common to both the MCU's and the host's applications and are referred to with defines saved in a C header file. CPU_CHANNEL and GPU_CHANNEL are macros which refer to one of the thirteen analog input channels (AN0-AN12) of the microcontroller, to which the sensor outputs are connected.
Bits 0 to 39:
START | CPU_CHANNEL | GPU_CHANNEL | NOP | NOP
Table 4.4: Sampling Process Initialization Command Example
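The 5-byte layout of table 4.4 can be sketched as a small helper that pads the unused channel slots with NOP. This is only an illustration in Python: the numeric values of the identifiers below are hypothetical placeholders (the real ones are the defines in the project's C header file), and `build_sync_command` is a name invented for this example.

```python
# Hypothetical identifier values, chosen only for this example.
NOP = 0x00
START = 0x01
STOP = 0x02
NTP_INIT = 0x03
CPU_CHANNEL = 0x10
GPU_CHANNEL = 0x11

def build_sync_command(header, channels):
    """Return the 5-byte synchronous message: 1 header byte + 4 channel slots."""
    if len(channels) > 4:
        raise ValueError("at most 4 channels can be requested")
    slots = list(channels) + [NOP] * (4 - len(channels))
    return bytes([header] + slots)

# Request sampling of the CPU and GPU channels, as in table 4.4.
start_command = build_sync_command(START, [CPU_CHANNEL, GPU_CHANNEL])
```

Keeping the message at a fixed length of 5 bytes means the receiver never has to parse a length field; unused slots are simply NOP.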
4.1.3.B Asynchronous - Time Synchronization Data
Bits 0 to 71 (offset 00h):
NTP_HEADER (1 Byte) | Time (8 Bytes)
Table 4.5: Host and PIC's Clock Synchronization Data Structure
Asynchronous communication uses a buffer of 256 bytes, divided into packets of 64 bytes. However, the clock synchronization process needs only 9 Bytes of information (1 Byte for the Header and the remaining slots for Time data) - see table 4.5.
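A 9-byte message of this shape (1 header byte followed by 8 bytes of time) can be packed and unpacked with Python's `struct` module, as sketched below. The header value, the little-endian byte order and the interpretation of the time field as an unsigned 64-bit count of microseconds are all assumptions of this example, not details confirmed by the text.

```python
import struct

NTP_HEADER = 0xA0          # hypothetical value of the header identifier
TIME_MESSAGE = "<BQ"       # assumed layout: 1-byte header + unsigned 64-bit time

def pack_time_message(time_us):
    """Build the 9-byte clock synchronization message."""
    return struct.pack(TIME_MESSAGE, NTP_HEADER, time_us)

def unpack_time_message(message):
    """Return (header, time_us) from the first 9 bytes of a received packet."""
    return struct.unpack(TIME_MESSAGE, message[:9])
```

The explicit `<` prefix disables alignment padding, which is what keeps the packed size at exactly 9 bytes.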
The clock synchronization is achieved by using a protocol that resembles the one used in the Network Time Protocol (NTP) algorithm, for a client-server method. The algorithm has four distinct time stamps: Originate Time Stamp - Time Request Sent by Client (T1); Receive Time Stamp - Time Request Received by Server (T2); Transmit Time Stamp - Time Reply Sent by Server (T3); and Destination Time Stamp - Time Reply Received by Client (T4).
[Figure 4.3: Synchronization Protocol. The Client sends an NTP REQUEST at T1; the Server receives it at T2 and replies with ACK + T2 + T3 at T3; the Client receives the reply at T4.]
In this situation, the Client is the microcontroller and the Server is the host. Figure 4.3 shows the normal steps for synchronization. The Client requests a synchronization process, saving time T1; then the Server acknowledges and sends times T2 and T3 to the Client. With those time stamps, one can determine the delay inherent in a Client - Server communication (the time the messages spend in transit, summed over both directions), by resorting to the next formula:
$$Delay = (T_4 - T_1) - (T_3 - T_2) \tag{4.2}$$
To compute the time difference between the two clocks (i.e., the offset), we start by assuming that there is no asymmetry in the communication, that is, the Client → Server time and the Server → Client time are the same:
$$Offset = T_2 - \left[T_1 + \frac{Delay}{2}\right] = \frac{(T_2 - T_1) + (T_3 - T_4)}{2} \tag{4.3}$$
If the offset is positive, the server's clock is ahead of the client's, so we must add the Offset value to the client's time stamp. On the other hand, if the offset is negative, the server's clock is lagging behind the client's and we must subtract the absolute value of the offset from the client's time stamp.
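Equations 4.2 and 4.3 translate directly into code. The sketch below is a minimal illustration with hypothetical function names; the time stamps can be in any common unit (microseconds in this work). Since the offset is signed, a single addition covers both the positive and the negative case described above.

```python
def ntp_delay_and_offset(t1, t2, t3, t4):
    """Return (delay, offset) from the four time stamps of figure 4.3."""
    delay = (t4 - t1) - (t3 - t2)               # equation 4.2
    offset = ((t2 - t1) + (t3 - t4)) / 2        # equation 4.3
    return delay, offset

def corrected_client_time(client_time, offset):
    """Apply the signed offset to a client time stamp."""
    return client_time + offset

# Worked example: the server clock is ahead of the client clock.
delay, offset = ntp_delay_and_offset(t1=100, t2=150, t3=152, t4=110)
```

In this example the round trip takes 10 units on the client clock, 2 of which are spent inside the server, so the delay is 8 and the offset is 46; the same offset is obtained from T2 - (T1 + Delay/2) = 150 - 104.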
Hence, knowing the Offset value, we can adjust the clock used to generate the time stamp within the microcontroller and synchronize it with the host's clock. The synchronization process runs long enough to guarantee an offset on the order of microseconds. Although this process allows a very accurate synchronization, over time the two clocks will inevitably lose synchronism. This is mainly due to the asymmetric routes and the network congestion during the update of the host's clock with the internet time (NTP). This loss of synchronism is practically linear with time, as we can see in figure 4.4, where the offset value increases with time.
[Figure 4.4: Offset Change Along the Time. Offset (ms) versus Minutes Passed (1 to 70); series: Offset Times and Linear Regression, annotated with 645.78 µs.]
The linear regression of the data gives the rate at which the offset changes. According to it, the offset increases approximately 645.78 µs per minute. Thus, we can adjust the microcontroller's clock to counteract this deviation by adding 645.78 µs every minute. At the expense of increasing the time overhead, it was decided to synchronize the clocks every time a packet is received, allowing a better synchronism, which is an important requirement to allow power profiling in real time.
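The drift rate mentioned above is the slope of an ordinary least-squares line fitted to the (minutes, offset) pairs. A minimal sketch of that fit is given below; the function name is hypothetical and the data used to exercise it is synthetic, generated from the reported slope, since the measured offsets themselves are not listed in the text.

```python
def drift_rate(minutes, offsets_us):
    """Least-squares slope of offset versus time, in microseconds per minute."""
    n = len(minutes)
    mean_x = sum(minutes) / n
    mean_y = sum(offsets_us) / n
    sxx = sum((x - mean_x) ** 2 for x in minutes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(minutes, offsets_us))
    return sxy / sxx

# Synthetic, perfectly linear data (not the measured values of figure 4.4).
example_minutes = list(range(1, 71))
example_offsets = [645.78 * m + 12.0 for m in example_minutes]
example_rate = drift_rate(example_minutes, example_offsets)
```

With real measurements the points scatter around the line, so the fitted slope is an estimate of the average drift rather than an exact value.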
4.1.3.C Asynchronous - Time Stamps and Sampled Data
Offset   Bytes 0 to 63
00h      DATA_HEADER (1 Byte) | Time Stamp | Samples
40h      Samples
80h      Samples
C0h      Samples
Table 4.6: Sampling Process Data Structure
The data exchanged through the asynchronous interface is the type of data that the application handles most of the time. Every time the host receives this type of data, it receives a packet containing not only the specific header, but also a payload with a time stamp, representing the time at which the first power sample was taken, and N power samples, which can come from one or more sensors. As mentioned, the buffer has a capacity of 256 Bytes of data (see table 4.6). Thus, excluding the Header and the Time Stamp, which occupy 5 Bytes together, we can transfer up to 125 data samples within a 256 Byte buffer (because each sample occupies 2 Bytes of data). For instance, for N = 100, if the CPU sensor and the GPU sensor are being sampled, then the received packet will contain N/2 = 50 samples from the CPU sensor and 50 samples from the GPU sensor. This, however, does not translate into a reduction of the sampling frequency, but instead means that the data will be exchanged more frequently between the host and the microcontroller.
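The packet capacity quoted above is a small piece of integer arithmetic, sketched here for reference. The constant and function names are hypothetical; the sizes are the ones given in the text (1-byte header, 32-bit time stamp, 2 bytes per sample).

```python
BUFFER_BYTES = 256
HEADER_BYTES = 1
TIMESTAMP_BYTES = 4
SAMPLE_BYTES = 2

def max_samples_per_packet():
    """Largest number of 2-byte samples that fits after header and time stamp."""
    return (BUFFER_BYTES - HEADER_BYTES - TIMESTAMP_BYTES) // SAMPLE_BYTES

def samples_per_channel(total_samples, active_channels):
    """Samples contributed by each sensor when the packet is shared evenly."""
    return total_samples // active_channels
```

The 251 bytes left after the header and time stamp hold 125 whole samples, with one byte unused.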
The time stamp associated with every chunk of data (i.e., N samples) is obtained by using another timer module (Timer1), with a period of 5 µs. This variable is a 32-bit integer and it is synchronized with the lower 32 bits of the host's real time clock (in microseconds). Since the time stamp resolution is in microseconds and because it is a 32-bit unsigned int variable, the stamp will roll over at some point in time. Despite that, this does not compromise the ability to correctly tag the data, since the 32 least significant bits of the host's clock are also used to stamp data. In fact, every time the host receives a packet, it saves the time of arrival of that packet in a variable. This allows the host to compare that time with the incoming time stamp, find out whether there was a roll over and correct it if that was the case.
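One possible way to perform the roll-over correction on the host is sketched below. It is an illustration under stated assumptions, not the implementation used in this work: the device stamp is assumed to refer to an instant no later than the packet's arrival time, and less than 2^32 µs (about 71.6 minutes) before it. The function name is hypothetical.

```python
MASK_32 = 0xFFFFFFFF

def extend_time_stamp(stamp32, host_arrival_us):
    """Rebuild a full host-clock time (in us) from a 32-bit device time stamp."""
    # Start from the upper bits of the arrival time and the 32 received bits.
    candidate = (host_arrival_us & ~MASK_32) | (stamp32 & MASK_32)
    # If the result lies in the future, the lower 32 bits of the host clock
    # rolled over between the sampling instant and the arrival of the packet.
    if candidate > host_arrival_us:
        candidate -= 1 << 32
    return candidate
```

For example, a stamp of 0xFFFFFFF0 arriving when the host clock reads 0x100000010 is mapped to 0xFFFFFFF0, that is, 32 µs before arrival instead of far in the future.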
4.2 Firmware
[Figure 4.5: System Diagram - Microcontroller. Blocks: Acquisition Board, µC, Host.]
The firmware comprises the set of instructions that are preprogrammed in an embedded system (the microcontroller). This set of instructions allows the communication between the hardware system (acquisition board) and the host, establishing a bridge between both systems (figure 4.5). The following sections introduce some of the strategies and procedures used to process data with the PIC microcontroller.
4.2.1 Buffering Strategy
Figure 4.6: Dual-Buffer Strategy
A dual-buffer strategy (figure 4.6) was used to save and send sampled data: every sample is saved in a buffer (A_buffer) of 256 bytes. When this buffer is filled with data, it must be sent to the host while still allowing the sampling process to continue seamlessly. Consequently, another buffer (B_buffer) was necessary to save the new samples, whilst A_buffer is used only to transfer data. When B_buffer is filled, it is used to send data while A_buffer becomes responsible for storing the sampled data. Thus, the buffer used to store or to send data alternates between A_buffer and B_buffer over time.
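The alternation between the two buffers can be illustrated with the following sketch. It is written in Python for readability and is not the firmware code; the class name, the `send` callback and the synchronous hand-off are assumptions of this example (on the microcontroller the transfer and the sampling interleave in the main loop).

```python
class DualBuffer:
    """Store samples in one buffer while the other one is being sent."""

    def __init__(self, capacity, send):
        self._buffers = [bytearray(), bytearray()]   # A_buffer and B_buffer
        self._active = 0                             # index of the storing buffer
        self._capacity = capacity
        self._send = send                            # hypothetical transfer routine

    def store(self, sample_bytes):
        current = self._buffers[self._active]
        current.extend(sample_bytes)
        if len(current) >= self._capacity:
            self._active ^= 1            # new samples now go to the other buffer
            self._send(bytes(current))   # the full buffer is handed to the host
            current.clear()
```

A short usage example, with a capacity of 4 bytes and 2-byte samples so that the swap is easy to follow:

```python
sent_packets = []
buffers = DualBuffer(capacity=4, send=sent_packets.append)
for value in range(5):
    buffers.store(bytes([value, value]))
```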
4.2.2 Oversampling and Maximum Search Algorithm
Figure 4.7: Oversampling with Rolling Buffer
As mentioned in chapter 3, oversampling was applied to the samples acquired at f = 3.3(3) kHz, in order to improve the DR. Hence, a rolling average buffer of 8 samples was used (figure 4.7): this buffer has a pointer, which returns to the start of the vector as soon as the buffer is filled with data. While the buffer is not completely filled with samples, the program sends the samples directly, without averaging them. This allows useful data to be sent continuously, even at the start of the program. When a new sample arrives, it is added to the accumulator after the oldest sample is subtracted from it. Finally, the accumulated value is averaged and the oversampling process is complete.
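The rolling average described above can be sketched as follows. This is an illustrative Python version with a hypothetical class name; it uses integer division because the samples are ADC codes, and with a window of 8 that division corresponds to a right shift by 3 bits on the microcontroller.

```python
class RollingAverage:
    """Circular buffer with a running accumulator (window of `size` samples)."""

    def __init__(self, size=8):
        self._samples = [0] * size
        self._pointer = 0
        self._count = 0
        self._accumulator = 0

    def push(self, sample):
        size = len(self._samples)
        # Subtract the oldest sample and add the new one.
        self._accumulator += sample - self._samples[self._pointer]
        self._samples[self._pointer] = sample
        self._pointer = (self._pointer + 1) % size   # wrap around at the end
        self._count = min(self._count + 1, size)
        if self._count < size:
            return sample                  # buffer not full yet: pass through
        return self._accumulator // size   # buffer full: return the average
```

Updating the accumulator incrementally keeps the cost per sample constant, regardless of the window size.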
In the case of the AC sensor, after the oversampling technique, it is also necessary to search for the maximum of the acquired data. This is necessary because, in order to compute the power demanded by the power supply, we have to know the amplitude of the current signal so that we can multiply it by the 230 VRMS and by the power factor (= 0.99).
The pseudocode for the maximum search is depicted in algorithm 4.1, which illustrates only the main part of the algorithm. This algorithm returns the peak of the sine wave acquired during AC current sensing. The algorithm is simple, fast and, most importantly, effective. It follows the acquired signal and tests whether the current sample is higher than the previous one; if it is, the sample is declared the temporary maximum. Then, it tests whether the current sample is lower than the temporary maximum, taking into account the delta parameter, which was set to 10 quantization levels after some experiments. If this is true, then the temporary maximum is confirmed as an absolute maximum and a search for the minimum value starts from this point on. These two routines are inextricably linked and the maximum can be found because the search for the minimum updates the temporary maximum value.
Algorithm 4.1 Maximum Search
if lookformax then
    if Current Sample < temporary max − delta then
        New absolute maximum found
        lookformax = 0
    end if
else
    if Current Sample > temporary max + delta then
        New absolute minimum found
        lookformax = 1
    end if
end if
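Algorithm 4.1 shows only the decision step, so the sketch below fills in the bookkeeping around it in order to obtain something that can be run. It is one common way of writing a delta-based peak detector and not necessarily the firmware's exact logic: in particular, it tracks a separate temporary minimum for the downward search, which is an assumption of this example, and all names are hypothetical.

```python
def find_peaks(samples, delta):
    """Return (maxima, minima) confirmed with a hysteresis of `delta` levels."""
    maxima, minima = [], []
    temp_max = float("-inf")
    temp_min = float("inf")
    look_for_max = True
    for sample in samples:
        # Follow the signal: update the temporary extremes.
        temp_max = max(temp_max, sample)
        temp_min = min(temp_min, sample)
        if look_for_max:
            if sample < temp_max - delta:
                maxima.append(temp_max)   # the signal fell by more than delta
                temp_min = sample         # restart the search for a minimum
                look_for_max = False
        else:
            if sample > temp_min + delta:
                minima.append(temp_min)   # the signal rose by more than delta
                temp_max = sample         # restart the search for a maximum
                look_for_max = True
    return maxima, minima
```

The delta threshold keeps small noise ripples near the crest of the sine wave from being reported as separate peaks.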
4.3 Software
[Figure 4.8: System Diagram - Host. Blocks: Acquisition Board, µC, Host.]
The software application comprises the set of instructions that are preprogrammed in the host, necessary to establish the communication with the microcontroller and to process and output relevant data to the user (figure 4.8). This section presents the interfaces, functions and procedures used in the development of the Powermeter application that the end-user has access to. First, the API functions used to output the energy are presented and, afterwards, it is explored how the energy computation (and other routines) work together to successfully return the energy and power
To use the first function, the user must choose which channel(s) to sample (CPU, GPU0, HDD or V230). If there are channels that the user does not want to sample, then NOP must be used as the argument. For example, if only the CPU channel must be sensed, the call must be: powermeter_api_init(CPU, NOP, NOP, NOP). This function executes every necessary initialization, including libusb library initialization, vectors and files initialization, and synchronization between clocks. This function
The powermeter_api_start(void) function gives the order to start the sampling process, whilst the powermeter_api_stop(void) function, besides stopping the sampling process, also closes the opened files (used to store, in different files, all the samples and time stamps for post-analysis), frees the allocated memory and performs the energy calculation. The function indirectly calls the energy_calc(long long end_time) function, whose goal is to return the energy spent between the start and stop calls. Table 4.7 summarizes the main functions and their respective features.
Function               Arguments            Features
powermeter_api_init    Channels to sample   General Initializations (Libusb, variables, clock sync)
powermeter_api_start   Void                 Starts Data Sampling
powermeter_api_stop    Void                 Stops Sampling, Frees Allocated Memory, Calculates Energy
Table 4.7: Main API Functions
4.3.2 Energy Calculation
The energy is calculated from the power curve by integrating it over time. Since the operation occurs in the discrete time domain, the energy spent between the start and stop commands must be computed by using numerical integration methods. There are various methods available for one-dimensional integration that are based on interpolation functions: the Rectangle Rule (order 0 polynomial), the Trapezoidal Rule (order 1 polynomial) and Simpson's Rule (order 2 polynomial).
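[Figure 4.9: Trapezoidal Rule. Plot of y = f(x) with the samples Pwr_{n-1}, Pwr_n, ..., Pwr_{N-1}, Pwr_N marked on the curve.]
$$Energy = \sum_{n=1}^{N} T_{Sampling} \times \frac{Pwr_{n-1} + Pwr_{n}}{2} \tag{4.4}$$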
For this work, the trapezoidal integration rule was used (see figure 4.9), since it leads to a small integration error compared to the rectangle rule and because it is simple. Simpson's rule would be a better choice to reduce the error. However, it would require more sums, multiplications and divisions by numbers which are not a power of 2, which constitutes a problem for the PIC18F4550's architecture.
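Equation 4.4 can be written as a one-line sum. The sketch below is a direct transcription with a hypothetical function name, assuming power samples in watts taken at a constant sampling period in seconds, so that the result is in joules.

```python
def trapezoidal_energy(power_w, t_sampling_s):
    """Energy (J) of equally spaced power samples, following equation 4.4."""
    return sum(
        t_sampling_s * (power_w[n - 1] + power_w[n]) / 2
        for n in range(1, len(power_w))
    )
```

The only division is by 2, which is consistent with the argument above that the trapezoidal rule suits an architecture where divisions by powers of 2 are cheap.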
The algorithm for the energy computation is presented in algorithm 4.2 as pseudocode. The algorithm starts by adding the energy of the batches of samples that were previously calculated in the microcontroller before sending each entire packet to the host: with this procedure, the host CPU is not overloaded with extra computation, yielding less power and time overhead when running the Powermeter API.
Algorithm 4.2 Energy Calculation
batch number = index/N samples
for Energy batches below batch number do
    Total Energy += Energy Batch
end for
Calculate Remaining Energy using Trapezoidal Rule
return Total Energy
[Figure 4.10: Energy Computing (where Bn refers to the energy batch of the nth packet). The power samples Pwr0, Pwr1, ..., PwrN are grouped into the batches B0, B1, ..., Bn-1, Bn, ..., BN, with the batch number marked.]
The energy batches received from the microcontroller are stored in order in a vector. Each batch corresponds to the energy of a full packet of N samples received. So, there is a correspondence between the index of a time stamp and the batch it belongs to. For instance, if the host receives 100 samples per packet, has received 200 samples of data by the end (resulting in 2 energy batches) and the time at which the sampling process ceased corresponds to sample number 110, then an integer division by the number of samples per packet (110/100 = 1) tells us that we can directly add the energy of the first batch (observe figure 4.10). From that point on, the energy must be computed by resorting to a numerical integration method. The integration stops only when the power sample associated with the end time stamp (the time at which the sampling process stopped) is reached. Finally, the total energy is returned.
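The procedure of algorithm 4.2 can be sketched as below, using the 110/100 example from the text. The function and argument names are hypothetical, and one detail is an assumption of this example because the text does not specify it: batch k is taken to hold the trapezoidal energy between samples k·N and (k+1)·N, so that whole batches and the remaining trapezoids join without gaps.

```python
def total_energy(batch_energies, power, end_index, n_per_packet, t_sampling):
    """Energy up to sample `end_index`: whole batches plus remaining trapezoids."""
    batch_number = end_index // n_per_packet
    # Step 1: add the batches already computed by the microcontroller.
    energy = sum(batch_energies[:batch_number])
    # Step 2: integrate the remaining samples with the trapezoidal rule.
    first = batch_number * n_per_packet
    for n in range(first + 1, end_index + 1):
        energy += t_sampling * (power[n - 1] + power[n]) / 2
    return energy

# Example from the text: 100 samples per packet, sampling stopped at sample 110.
example_power = [2.0] * 200            # constant 2 W, 1 s sampling period
example_batches = [200.0, 200.0]       # energy of each full packet (assumed)
example_energy = total_energy(example_batches, example_power, 110, 100, 1.0)
```

In this example only 10 trapezoids are integrated on the host; the first 100 samples are covered by the first precomputed batch.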
4.3.2.A Time Stamp Search
Besides the question of how, specifically, the energy computation is done, there is also the issue of finding the time stamp that corresponds to the time at which the sampling process terminated. Thus, a search algorithm is necessary to find the closest time stamp to that time. The pseudocode in algorithm 4.3 presents the algorithm used to solve the problem². To complement it, figure 4.11 illustrates the search process as a block diagram.
Essentially, the algorithm starts by estimating the index of the vector where the end time is likely to be: that is done with the pseudocode in line 1. The reasoning behind it is that consecutive time stamps are equally spaced by the sampling period (T_sampling); thus, subtracting the first time stamp from the end time and dividing by T_sampling should give a good estimate of the wanted index. Nevertheless, it is likely that the obtained index is not the best choice, so a linear search is conducted over the vector in either an increasing or a decreasing direction, as illustrated in figure 4.11. When a possible time stamp is found by the algorithm, a final comparison is performed between the end time and the two closest time stamps to discover the one which produces the least absolute error.
² The pseudocode only presents half of the original source code, because the other half is similar but sweeps the vector entries above the index instead of the entries below it. The time stamps are saved in a structure containing a vector with the data and a variable providing the actual size of that vector.
Algorithm 4.3 Time Stamp Search
 1: index ← (end_time − start_time)/T_sampling
 2: if time[index] > end_time then
 3:     for i ← index − 1 to 0 do
 4:         if time[i] <= end_time then
 5:             if time[i + 1] is closer to end_time than time[i] then
 6:                 index ← i + 1
 7:                 break
 8:             else
 9:                 index ← i
10:                 break
11:             end if
12:         end if
13:     end for
14: end if
[Figure 4.11: Case Scenario when time[index] = Tn > end_time (where Tn = T0 + (n − 1) × TSampling). Vector of time stamps T0, T1, ..., Tn-1, Tn, ..., TN, with the markers i and index.]
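A runnable sketch of the search, covering both sweep directions, is given below. It follows the idea of algorithm 4.3 (estimate the index, walk linearly, then pick the closer of the two neighbouring stamps) but it is not a transcription of the original source: the clamping of the estimate to the vector bounds and the function name are additions made for this example.

```python
def find_end_index(time, end_time, t_sampling):
    """Index of the time stamp closest to `end_time` in a sorted vector."""
    # Line 1 of algorithm 4.3: estimate the index from the sampling period.
    index = int((end_time - time[0]) / t_sampling)
    index = max(0, min(index, len(time) - 1))     # keep the estimate in range
    # Sweep downwards while the stamp is still after the end time.
    while index > 0 and time[index] > end_time:
        index -= 1
    # Sweep upwards while the next stamp is not after the end time.
    while index + 1 < len(time) and time[index + 1] <= end_time:
        index += 1
    # Final comparison between the two closest time stamps.
    if index + 1 < len(time):
        if abs(time[index + 1] - end_time) < abs(end_time - time[index]):
            index += 1
    return index
```

Because the initial estimate is usually within a few entries of the answer, the linear sweep touches only a handful of elements even for long acquisitions.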
4.4 Summary
In this chapter, the reader was introduced to the communication between the MCU and the host and to the development of the MCU's firmware and the host's API. Firstly, a briefing about USB communication was presented, explaining, roughly, how it works and how it connects to the rest of the work. For instance, the latency of USB transactions limits the maximum sampling frequency that can be attained.
Afterwards, some aspects regarding the microcontroller's firmware were detailed, such as the buffering strategy used and how the sampling frequency can be configured. The several types of data interchanged between the host and the microcontroller were also described, together with how both systems "interpret" each type by checking the header of each received message. The synchronization process was explained, which consists of a set of messages traded between the two systems in order to synchronize their clocks.
In the final section of this chapter, the general structure of the Powermeter API was discussed. Within this subject, the reader got a grasp of the main functions used to operate the tool, understanding their purposes and in what contexts they must be used. It was also specified how the energy computation is done, resorting to a numerical integration method. Finally, a description of some important algorithms (Time Stamp Search and Energy Computation) was provided.