abstract = "With the increasing interest in deploying embedded
sensors in a range of applications, there is also
interest in deploying embedded inference capabilities.
Doing so under the strict and often variable energy
constraints of the embedded platforms requires
algorithmic, in addition to circuit and architectural,
approaches to reducing energy. A broad approach that
has recently received considerable attention in the
context of inference systems is approximate computing.
This stems from the observation that many inference
systems exhibit various forms of tolerance to data
noise. While some systems have demonstrated significant
approximation-versus-energy knobs to exploit this, they
have been applicable to specific kernels and
architectures; the more generally available knobs have
been relatively weak, resulting in large data noise for
relatively modest energy savings (e.g., voltage
overscaling, bit-precision scaling). In this work, we
explore the use of genetic programming (GP) to compute
approximate features. Further, we leverage a method
that enhances tolerance to feature-data noise through
directed retraining of the inference stage. Previous
work in GP has shown that it generalises well to enable
approximation of a broad range of computations, raising
the potential for broad applicability of the proposed
approach. The focus on feature extraction is deliberate
because feature extraction involves diverse, often highly
nonlinear, operations, which challenges the general
applicability of energy-reducing approaches. We evaluate the proposed
methodologies through two case studies, based on energy
modelling of a custom low-power microprocessor with a
classification accelerator. The first case study is on
electroencephalogram-based seizure detection. We find
that the choice of two primitive functions (square
root, subtraction) out of seven possible primitive
functions (addition, subtraction, multiplication,
logarithm, exponential, square root, and square)
enables us to approximate features in 0.41 mJ per feature
vector (FV), as compared to 4.79 mJ per FV required for
baseline feature extraction. This represents a feature
extraction energy reduction of 11.68 times. The
important system-level performance metrics for seizure
detection are sensitivity, latency, and number of false
alarms per hour. Our set of GP models achieves 100
percent sensitivity, 4.37-second latency, and 0.15
false alarms per hour. The baseline performance is 100
percent sensitivity, 3.84-second latency, and 0.06
false alarms per hour. The second case study is on
electrocardiogram-based arrhythmia detection. In this
case, just one primitive function (multiplication)
suffices to approximate features in 1.13 microjoules
per FV, as compared to 11.69 microjoules per FV required
for baseline feature extraction. This represents a
feature extraction energy reduction of 10.35 times. The
important system-level metrics in this case are
sensitivity, specificity, and accuracy. Our set of GP
models achieves 81.17 percent sensitivity, 80.63
percent specificity, and 81.86 percent accuracy,
whereas the baseline a",