Support Hardware FPU on Cortex-M4 with Linux

The Cortex-M4 core features a floating-point unit (FPU) that provides hardware support for single-precision IEEE-754 floating-point arithmetic. Using it can improve the performance of user-space applications that perform floating-point operations.

To enable floating-point support, set the following kernel configuration option: CONFIG_VFPM=y. Without this option, a process that executes floating-point instructions will get a SIGSEGV.

Floating-point instructions for Cortex-M4 are enabled with the -mfpu=fpv4-sp-d16 compiler option. This option is enabled in the toolchain by default.

The GCC flag -mfloat-abi= specifies which floating-point ABI to use. The permissible values are soft, softfp and hard.

  • soft causes GCC to generate output containing library calls for all floating-point operations.

  • softfp allows generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions: floating-point function arguments are passed in integer registers, so soft and softfp code can be intermixed. The drawback is that copying data from integer to floating-point registers incurs a pipeline stall for each register passed, or a memory read for arguments passed on the stack. This has noticeable performance implications, because a lot of time is spent in function prologues and epilogues moving data back and forth between the integer and FPU registers.

  • hard allows generation of floating-point instructions and uses FPU-specific calling conventions: floating-point arguments are passed directly in FPU registers.

Currently, the hard option cannot be used, since the toolchain is built for softfp, and the hard-float and soft-float ABIs are not link-compatible.

Here is an example of using the hardware FPU in a user-space application.

  1. On the host, activate the development environment (refer to Installing and Activating Cross Development Environment):

    $ . ./ACTIVATE.sh
  2. Create a simple fpu C application on the host:

    $ cd /tmp
    $ vi check_float.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <err.h>

    int main(int argc, char **argv)
    {
        register int n;
        register int i;
        struct timeval start, end;
        float x = 0.3f, y = 2.0f, z = 0.0f;

        /* Loop count comes from the first argument; default to 50000000 */
        n = atoi(argc > 1 ? argv[1] : "");
        if (n <= 0)
            n = 50000000;

        if (gettimeofday(&start, NULL))
            err(1, "gettimeofday");

        /* Single-precision multiply-add loop: runs on the FPU, or through
           soft-float library calls when built with -mfloat-abi=soft */
        for (i = 0; i < n; i++) {
            z += x * y;
            x += 0.00001f;
            y -= 0.00001f;
        }

        if (gettimeofday(&end, NULL))
            err(1, "gettimeofday");

        /* Elapsed wall-clock time in microseconds */
        long usecs = (end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec;
        printf("%i loops: x=%f, y=%f, z=%f time spent: %ld.%06ld\n",
               n, x, y, z, usecs / 1000000, usecs % 1000000);
        return 0;
    }
  3. Build the application for the target with the -mfloat-abi=soft GCC flag:

    $ ${CROSS_COMPILE_APPS}gcc -o check_float_soft check_float.c -Wall -mfloat-abi=soft -Os
  4. Build the application for the target with the -mfloat-abi=softfp GCC flag:

    $ ${CROSS_COMPILE_APPS}gcc -o check_float_softfp check_float.c -Wall -mfloat-abi=softfp -Os
  5. Copy the binaries to the directory exported via NFS:

  6. Run the application binaries on the target. One easy way to do that is to have an NFS share mounted on the target, which immediately provides access to the host development directory:
