G.5   Architecture Considerations

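This section describes the following characteristics of the ARM processors that you may need to keep in mind as you write a VxWorks application: processor mode and byte order, ARM/Thumb state, interrupts and exceptions, floating-point support, caches, the memory management unit, and memory layout.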

For comprehensive documentation of the ARM architecture and for specific processors, you may wish to refer to the ARM Architecture Reference Manual and the appropriate data sheets of the processors.

Processor Mode and Byte Order

VxWorks for ARM executes mainly in 32-bit supervisor mode (SVC32). When an exception causes the CPU to enter another mode, the kernel generally switches back to SVC32 mode for most of the processing. No code should execute in user mode. No support is included for the 26-bit modes, which are obsolete.

ARM CPUs include some support for both little-endian and big-endian byte orders. This release includes only support for little-endian byte order, but network applications must convert some data to a standard network order, which is big-endian. In particular, in network applications, be sure to convert the port number to network byte order using htons( ).

For more information about macros and routines to convert byte order from little-endian to big-endian or vice-versa, see the VxWorks Network Programmer's Guide: TCP/IP Under VxWorks.
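
As a minimal sketch of the port conversion described above, the following helper stores a host-order port number into a two-byte wire field. It assumes a POSIX-style host where htons( ) is declared in arpa/inet.h; the helper name portToWire( ) is hypothetical and is not part of VxWorks.

```c
#include <arpa/inet.h>   /* htons(); header location assumed (POSIX-style host) */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a host-order port number into a 2-byte
 * wire field in network (big-endian) byte order.
 */
void portToWire(uint16_t hostPort, unsigned char wire[2])
    {
    uint16_t netPort = htons(hostPort);   /* swaps bytes on little-endian CPUs */

    memcpy(wire, &netPort, sizeof(netPort));
    }
```

For port 5001 (0x1389), the wire field holds 0x13 followed by 0x89 whatever the byte order of the CPU.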

ARM/Thumb State

This release of Tornado for ARM supports both 32-bit instructions (ARM state) and 16-bit instructions (Thumb state).

Thumb Limitation

When running a Thumb kernel and using either the host or target shell, passing a function name as a parameter to a function does not pass an address suitable for calling. The failure occurs because addresses in Thumb state must have bit zero set, but the addresses in the symbol table have bit zero clear.

Example: At the shell prompt, type the following:

-> sp func1,func2

where func1 and func2 are names of functions. Function func1 is spawned as a task and passed the address of func2 as a parameter. Unfortunately, that address is not suitable for use as a Thumb function pointer by func1 because, when the shell looks up func2 in the symbol table, it gets back an address with bit zero clear. Calling that address causes it to be entered in ARM state, not Thumb state.

The simplest workaround is to type the following:

-> sp func1,func2 | 1

An alternative is to write func1 as follows:

extern int func2(void); 
int func1(void) 
    { 
    return func2(); 
    }

In this case, the loader provides the correct address for func2 when the object file is loaded. Thus func2 is entered in Thumb state as required when you type the following:

-> sp func1

A more flexible alternative is to write func1 as follows:

#include <vxWorks.h>    /* FUNCPTR, UINT32 */

int func1(FUNCPTR f)
    {
    f = (FUNCPTR)((UINT32)f | 1);    /* set bit zero so f is entered in Thumb state */
    return f();
    }

This allows you to call the function successfully as follows:

-> sp func1,func2

Interrupts and Exceptions

When an ARM interrupt or exception occurs, the CPU switches to one of several exception modes, each of which has a number of dedicated registers. In order to make the handlers reentrant, the stub routines that VxWorks installs to trap interrupts and exceptions switch from the exception mode to SVC mode for further processing; the handler cannot be reentrant while executing in an exception mode because reentry would destroy the link register. When an exception or base-level interrupt handler is installed by a call to VxWorks, the address of the handler is stored for use by the stub when the mode switching is complete. The handler returns to the stub routine to restore the processor state to what it was before the exception occurred. Exception handlers (excluding interrupt handlers) can modify the state to be restored by changing the contents of the register set passed to the handler.

ARM processors do not, in general, have on-chip interrupt controllers. All interrupts are multiplexed on the IRQ pin except for FIQs (see Fast Interrupt (FIQ)). Therefore routines must be provided within the BSP to enable and disable specific device interrupts, to install handlers for specific device interrupts, and to determine the cause of the interrupt and dispatch the correct handler when an interrupt occurs. These routines are installed by setting function pointers. For examples, see the interrupt control modules in installDir/target/src/drv/intrCtl. A device driver then installs an interrupt handler by calling intConnect( ). For more information, see Wind Technical Note #46.
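
The demultiplexing that a BSP must provide can be sketched as a handler table indexed by interrupt level. This is a host-runnable model only: the names demoIntConnect( ), demoIntDispatch( ), and demoSampleIsr( ), and the assumption of a 32-bit pending-status word with one bit per source, are hypothetical and do not describe any particular interrupt controller or the VxWorks routines themselves.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_LEVELS 32      /* assumption: one status bit per source */

typedef void (*DEMO_ISR)(int arg);

static DEMO_ISR demoIsrTable[DEMO_NUM_LEVELS];  /* installed handlers */
static int      demoIsrArg[DEMO_NUM_LEVELS];    /* argument for each handler */
int             demoLastArg = -1;               /* set by demoSampleIsr() */

/* Sample device ISR: records its argument so the call can be observed. */
void demoSampleIsr(int arg)
    {
    demoLastArg = arg;
    }

/* Install a handler for one level; returns 0, or -1 for a bad level. */
int demoIntConnect(int level, DEMO_ISR isr, int arg)
    {
    if (level < 0 || level >= DEMO_NUM_LEVELS)
        return -1;
    demoIsrTable[level] = isr;
    demoIsrArg[level]   = arg;
    return 0;
    }

/*
 * Determine the cause from the pending-status word and call the handler
 * of the lowest-numbered pending level. Returns that level, or -1 if
 * nothing is pending or no handler is installed for the pending level.
 */
int demoIntDispatch(uint32_t pending)
    {
    int level;

    for (level = 0; level < DEMO_NUM_LEVELS; level++)
        {
        if ((pending & ((uint32_t)1 << level)) != 0)
            {
            if (demoIsrTable[level] == NULL)
                return -1;      /* spurious: no handler installed */
            demoIsrTable[level](demoIsrArg[level]);
            return level;
            }
        }
    return -1;                  /* nothing pending */
    }
```

In this model a driver calls demoIntConnect( ) once, and the IRQ stub passes the controller status word to demoIntDispatch( ) on each interrupt.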

Exceptions other than interrupts are handled in a similar fashion: the exception stub switches to SVC mode and then calls any installed handler. Handlers are installed by calls to excVecSet( ) and the addresses of installed handlers can be read by calls to excVecGet( ).

Thumb State Interrupt Handling

When an interrupt occurs in a Thumb kernel (in other words, a kernel built with CPU=ARM7TDMI_T) the CPU switches to ARM state. The kernel code then saves appropriate state information and calls the interrupt demultiplexing code. This code can, in theory, be ARM or Thumb code but only Thumb code is supported and tested.

The interrupt demultiplexing code then calls the device-specific ISR (the routine installed by a call to intConnect( )). Again, in theory, that code could be ARM or Thumb code but only Thumb code is supported and tested.


CAUTION: In non-Thumb kernels (kernels built with CPU=ARM7TDMI rather than CPU=ARM7TDMI_T) only ARM code ISRs will be entered correctly.

Interrupt Stacks

VxWorks for ARM uses a separate interrupt stack to avoid having to make task stacks big enough to accommodate the needs of interrupt handlers. The ARM architecture has a dedicated stack pointer for its IRQ interrupt mode. However, because the low-level interrupt handling code must be reentrant, IRQ mode is only used on entry and exit from the handler; an interrupt destroys the IRQ mode link register. The majority of interrupt handling code runs in SVC mode on a dedicated SVC-mode interrupt stack.

Fast Interrupt (FIQ)

Fast Interrupt (FIQ) is not handled by VxWorks. BSPs can use FIQ as they wish, but VxWorks code should not be called from FIQ handlers. If this functionality is required, the preferred mechanism is to downgrade the FIQ to an IRQ by software access to appropriately designed hardware which generates an IRQ. The IRQ handler can then make the call to VxWorks.

Floating-Point Support

In this release, no support is included for floating-point coprocessors. Support for floating-point arithmetic is provided as part of the GNU ARM distribution from the Free Software Foundation, in libgcc.a. The GNU implementation utilizes call-outs rather than emulation of floating-point instructions.


WARNING: On ARM processors, double variables have a different format from IEEE double on most other processors. The bit pattern used by the ARM hardware and software floating point implementations follows the IEEE standard; however, the byte order is different from standard practice leading to a cross-endian implementation. Be careful when sharing double values in memory between ARM and other processors.

On ARM, while each word of a double is stored little-endian, the most significant word (the word containing the sign bit) is at the lower address. This is neither pure big-endian (as implemented in 68k processors) nor pure little-endian (as implemented in x86 processors). Cygnus added a new binary floating point format, littlebyte_bigword, to GNU libiberty and changed GDB to use this format for ARM; this implementation is adopted for WRS ARM BSPs and for CrossWind.

This format is chosen to be consistent with the ARM hardware floating point implementation, the ARM7500FE FPA Coprocessor Macrocell. However, since most IEEE floating point implementations are pure big- or little-endian, shared memory exchange of double variables is complicated by the unique cross-endianness of ARM double variables.
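
When a double written by an ARM target must be read elsewhere, the word order described above can be undone explicitly. The following is a minimal sketch: the helper names are hypothetical, and it assumes that the reading host represents double as IEEE 754 binary64 with the same byte order as its 64-bit integers.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t demoLe32(const unsigned char *p)
    {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

/*
 * Hypothetical helper: decode 8 bytes stored in the ARM
 * littlebyte_bigword order (most significant word at the lower
 * address, each word little-endian) into a host double.
 */
double demoArmDoubleDecode(const unsigned char bytes[8])
    {
    uint64_t hi   = demoLe32(bytes);        /* sign, exponent, top of mantissa */
    uint64_t lo   = demoLe32(bytes + 4);    /* low 32 bits of mantissa */
    uint64_t bits = (hi << 32) | lo;
    double   value;

    memcpy(&value, &bits, sizeof(value));   /* reinterpret the bit pattern */
    return value;
    }
```

For example, the value 1.0 (bit pattern 0x3FF0000000000000) appears in ARM memory as the bytes 00 00 F0 3F 00 00 00 00, whereas a pure little-endian processor stores it as 00 00 00 00 00 00 F0 3F.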

Caches

The ARM-7TDMI processor does not have a cache or an MMU. The ARM SA-110 processor has a 16 KB instruction cache, a 16 KB data cache, a write buffer, and an MMU on the chip. The ARM SA-1100 has a 16 KB instruction cache, an 8 KB data cache, and a 512-byte minicache. The ARM-710A and ARM-810 each have an 8 KB mixed instruction and data cache, with a write buffer and an MMU on chip. The following subsections augment the information in 7. Virtual Memory Interface.

For all of the ARM caches, the cache capabilities must be used with the MMU to resolve cache coherency problems. When the MMU is enabled, the page descriptor for each page selects the cache mode, which can be cacheable or non-cacheable. This page descriptor is configured by filling in the sysPhysMemDesc[ ] structure defined in the BSP installDir/target/config/bspname/sysLib.c file. (For more information about cache coherency, see the cacheLib reference entry. For information on VxWorks's MMU support, see 7. Virtual Memory Interface. For MMU information specific to the ARM family, see Memory Management Unit.)

VxWorks for ARM does not support locking and unlocking of ARM caches. Not all ARM caches support cache locking and unlocking. Thus the cacheLock( ) and cacheUnlock( ) routines have no effect on ARM targets and always return ERROR.

The effects of the cacheClear( ) and cacheInvalidate( ) routines depend on the CPU type and on which cache is specified.

SA-110 and SA-1100 Caches

The SA-110 and SA-1100 processors contain an instruction cache and a data cache. By default, VxWorks uses both caches; that is, both are enabled. To disable the instruction cache, highlight the USER_I_CACHE_ENABLE macro in the Params tab under INCLUDE_CACHE_ENABLE and remove the TRUE; to disable the data cache, highlight the USER_D_CACHE_ENABLE macro and remove the TRUE.

The data cache, if enabled, must be set to cacheable copyback mode. Although the cache appears to support a write-through mode, the effect of the write-buffer is to make this effectively a copyback mode, as all writes from the cache are buffered. The USER_D_CACHE_MODE parameter in the Params tab under INCLUDE_CACHE_MODE should not, therefore, be changed from the default setting of CACHE_COPYBACK.

It is not appropriate to think of the mode of the instruction cache. The instruction cache is a read cache that is not coherent with stores to memory, so code that writes to cacheable instruction locations must ensure instruction cache validity. You should set the USER_I_CACHE_MODE parameter in the Params tab under INCLUDE_CACHE_MODE to CACHE_WRITETHROUGH and not change it.

With the data cache specified, cacheClear( ) first pushes dirty data to memory and then invalidates the cache lines, while cacheInvalidate( ) just invalidates the lines (in which case any dirty data contained in the lines is lost).
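
The difference between the two routines on a copyback data cache can be illustrated with a toy model of a single cache line. This is a sketch for illustration only: the DEMO_CACHE type and the demo routines are hypothetical and do not reflect the cacheLib implementation.

```c
#include <stdint.h>

typedef struct
    {
    uint32_t memory;    /* backing memory word */
    uint32_t line;      /* cached copy of that word */
    int      valid;     /* line holds a copy */
    int      dirty;     /* copy is newer than memory */
    } DEMO_CACHE;

/* Copyback write: only the cache line is updated. */
void demoWrite(DEMO_CACHE *c, uint32_t value)
    {
    c->line  = value;
    c->valid = 1;
    c->dirty = 1;
    }

/* Read through the cache, filling the line from memory on a miss. */
uint32_t demoRead(DEMO_CACHE *c)
    {
    if (!c->valid)
        {
        c->line  = c->memory;
        c->valid = 1;
        c->dirty = 0;
        }
    return c->line;
    }

/* Model of a clear: push dirty data to memory, then invalidate. */
void demoClear(DEMO_CACHE *c)
    {
    if (c->valid && c->dirty)
        c->memory = c->line;
    c->valid = 0;
    c->dirty = 0;
    }

/* Model of an invalidate: drop the line; dirty data is lost. */
void demoInvalidate(DEMO_CACHE *c)
    {
    c->valid = 0;
    c->dirty = 0;
    }
```

After a write followed by demoClear( ), a later read returns the new value; after a write followed by demoInvalidate( ), a later read returns the old contents of memory.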

With the instruction cache specified, both routines have the same result: they invalidate all of the instruction cache. As the instruction cache is a separate cache from the data cache, there can be no dirty entries in the instruction cache, so no dirty data can be lost.

ARM-710A Caches

The ARM-710A has a combined instruction and data cache. The cache is actually a write-through cache, but the separate write-buffer makes this a copyback cache if the write-buffer is enabled. VxWorks uses the USER_D_CACHE_MODE parameter in the Params tab under INCLUDE_CACHE_MODE (which must be the same as USER_I_CACHE_MODE) to determine whether to enable the write buffer.

With either cache specified, cacheClear( ) flushes the write-buffer and invalidates all the ID-cache, while cacheInvalidate( ) just invalidates all the ID-cache. As the cache is a combined, writethrough cache, no dirty data can be lost.

ARM-810 Caches

The ARM-810 has a combined instruction and data cache. Although the ARM-810 cache appears to support a write-through mode, the effect of the write-buffer is to make this effectively a copyback mode, as all writes from the cache are buffered. The USER_D_CACHE_MODE parameter in the Params tab under INCLUDE_CACHE_MODE should not, therefore, be changed from the default setting of CACHE_COPYBACK. The ARM-810 also has a Branch Prediction capability, but this feature is not supported in this release.

Because the ARM-810 has a combined copyback instruction and data cache, part of the cache cannot be invalidated without also invalidating other, unintended parts that might contain dirty data. Before a cache line is invalidated, other lines may therefore need to be pushed to memory.

With the data cache specified, cacheClear( ) has the same effect as for the SA-110: it first pushes dirty data to memory and then invalidates the cache lines. For cacheInvalidate( ), unless ENTIRE_CACHE is specified, the behavior is the same as cacheClear( ). If ENTIRE_CACHE is specified, the entire ID-cache is invalidated.

With the instruction cache specified, the behavior of the cacheClear( ) and cacheInvalidate( ) routines is identical: both just flush the Prefetch Unit, so no dirty data is lost from the ID-cache.

Memory Management Unit

VxWorks provides two levels of virtual memory support. The basic level is bundled with VxWorks. The full level requires the optional product VxVMI. Both are supported by the ARM SA-110, SA-1100, ARM-710A and ARM-810 processors; the ARM-7TDMI supports neither since it does not have an MMU.

For detailed information on VxWorks's MMU support, see 7. Virtual Memory Interface. The following subsections augment the information in that chapter.

ARM Cache/MMU

The caching and memory management functions on the ARM are both provided on-chip and are very closely interlinked. In general, caching functions on the ARM require the MMU to be enabled. Consequently, if cache support is configured into VxWorks, MMU support is also included by default. On the SA-110, the instruction cache can be enabled without enabling the MMU, but no specific support for this mode of operation is included in this release.

Only certain combinations of MMU and cache enabling are valid, and there are no hardware interlocks to enforce this. In particular, enabling the data cache without enabling the MMU can lead to undefined results. Consequently, if an attempt is made to enable the data cache via cacheEnable( ) before the MMU has been enabled, the data cache is not enabled immediately. Instead, flags are set internally so that if the MMU is enabled later, the data cache is enabled with it. Similarly, if the MMU is disabled, the data cache is also disabled, until the MMU is reenabled.
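
The deferred enabling described above behaves like a small state machine. The following sketch models it on the host; the DEMO_MMU_CACHE structure and both routines are hypothetical names, not the internal flags or routines used by VxWorks.

```c
typedef struct
    {
    int mmuOn;          /* MMU currently enabled */
    int dCacheOn;       /* data cache actually enabled */
    int dCacheWanted;   /* enable was requested (possibly deferred) */
    } DEMO_MMU_CACHE;

/* Request the data cache: takes effect now only if the MMU is on. */
void demoDCacheEnable(DEMO_MMU_CACHE *s)
    {
    s->dCacheWanted = 1;
    if (s->mmuOn)
        s->dCacheOn = 1;
    }

/* Enable or disable the MMU; the data cache follows the MMU state. */
void demoMmuEnable(DEMO_MMU_CACHE *s, int enable)
    {
    s->mmuOn    = enable;
    s->dCacheOn = (enable && s->dCacheWanted) ? 1 : 0;
    }
```

In this model, a request made while the MMU is off leaves the cache off; the cache comes on when the MMU is enabled, goes off when the MMU is disabled, and comes back when the MMU is reenabled.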

All memory management is performed on "small pages" which are 4 KB in size. No use is made of the ARM concepts of "sections" or "large pages."
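
As a rough illustration of what 4 KB small pages imply, the following sketch splits a 32-bit virtual address according to the two-level translation scheme of the ARM architecture (a first-level index from bits 31 to 20, a second-level index from bits 19 to 12, and a 12-bit page offset) and counts the pages needed for a region. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u    /* small page: 4 KB */

/* First-level table index: one entry per 1 MB of address space. */
uint32_t demoL1Index(uint32_t va)
    {
    return va >> 20;
    }

/* Second-level table index: one entry per 4 KB small page. */
uint32_t demoL2Index(uint32_t va)
    {
    return (va >> 12) & 0xFFu;
    }

/* Byte offset within the 4 KB page. */
uint32_t demoPageOffset(uint32_t va)
    {
    return va & (DEMO_PAGE_SIZE - 1u);
    }

/* Number of small pages needed to cover len bytes, rounding up. */
uint32_t demoPagesNeeded(uint32_t len)
    {
    return len / DEMO_PAGE_SIZE + ((len % DEMO_PAGE_SIZE) != 0 ? 1u : 0u);
    }
```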

Support is provided for BSPs that include separate static RAM for the MMU translation tables. This support requires the ability to specify an alternate source of memory other than the system memory partition. A global function pointer, _func_armPageSource, should be set by the BSP to point to a routine that returns a memory partition identifier describing memory to be used as the source for translation table memory. If this function pointer is NULL, the system memory partition is used. The BSP must modify the function pointer before calling mmuLibInit( ). The initial memory partition must be large enough for all requirements; it does not expand dynamically or overflow into the system memory partition if it fills.

Support is also provided for those SA-110/SA-1100 BSPs that provide a special area in the address space to be read, to flush the data cache. All SA-110/SA-1100 BSPs must declare a pointer (sysCacheFlushReadArea) to a readable, cached block of address space, used for nothing else. If the BSP has an area of the address space that does not actually contain memory, but is readable, it may set the pointer to point to that area. If it does not, it should allocate some RAM for this area. In either case, the area must be marked as readable and cacheable in the page tables. The declaration can be in the BSP sysLib.c file, for example:

UINT32 sysCacheFlushReadArea[D_CACHE_SIZE/sizeof(UINT32)];

or in the BSP romInit.s and sysALib.s files, for example:

.globl  _sysCacheFlushReadArea 
.equ    _sysCacheFlushReadArea, 0x50000000

Note that a declaration in sysLib.c of the form:

UINT32 * sysCacheFlushReadArea = (UINT32 *) 0x50000000;

cannot be used as this introduces another level of indirection, causing the wrong address to be used for the cache flush buffer.

During certain cache/MMU operations (for example, cache flushing), interrupts must be disabled and BSPs may wish to have control over this. The contents of the variable cacheArchIntMask determine which interrupts are disabled. This has the default value I_BIT | F_BIT, indicating that both IRQs and FIQs are disabled during these operations. If a BSP requires leaving FIQs enabled, the contents of cacheArchIntMask should be changed to I_BIT. Use extreme caution when changing the contents of this variable from its default.

Some systems cannot provide an environment where virtual and physical addresses are the same. (The SA-1100 CPU is an example of this.) This is particularly important for those areas containing page tables. In order to support these systems, the BSP must provide mapping functions to convert between virtual and physical addresses: the global function pointers _func_armVirtToPhys and _func_armPhysToVirt should be set to point to those functions. If these function pointers are NULL, it is assumed that virtual addresses are equal to physical addresses in the initial mapping. The BSP must set the function pointers before either mmuLibInit( ) or cacheLibInit( ) is called.
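
For a board whose RAM is mapped at a fixed offset, the two mapping functions can be as simple as the following sketch. The base addresses, the size, and the function names are illustrative assumptions, and plain 32-bit integers are used here; the exact signatures expected by _func_armVirtToPhys and _func_armPhysToVirt are not shown in this section and should be taken from the BSP headers.

```c
#include <stdint.h>

/* Illustrative values only: RAM that lives at one physical base
 * address and is mapped at virtual address zero. */
#define DEMO_RAM_PHYS_BASE  0xC0000000u
#define DEMO_RAM_VIRT_BASE  0x00000000u

/* Convert a virtual RAM address to the corresponding physical address. */
uint32_t demoVirtToPhys(uint32_t virt)
    {
    return virt - DEMO_RAM_VIRT_BASE + DEMO_RAM_PHYS_BASE;
    }

/* Convert a physical RAM address to the corresponding virtual address. */
uint32_t demoPhysToVirt(uint32_t phys)
    {
    return phys - DEMO_RAM_PHYS_BASE + DEMO_RAM_VIRT_BASE;
    }
```

The two functions must be exact inverses of each other over the mapped region, because the architecture code converts page table addresses in both directions.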

ARM Memory Management Units

On those ARM CPUs with MMUs, you can set a specific configuration for each memory page. The entire physical memory is described by sysPhysMemDesc[ ], which is defined in installDir/target/config/bspname/sysLib.c. This data structure is made up of state flags for each page or group of pages. All the state flags defined in Page States are available for virtual memory pages.


NOTE: The VM_STATE_CACHEABLE flag listed in Table 7-2 sets the cache to copyback mode for each page or group of pages by setting the B and C bits in the page tables. On the ARM-710A only, set the cache to writethrough mode using VM_STATE_CACHEABLE_WRITETHROUGH which sets only the C bit in the page tables.


NOTE: The VM_STATE_BUFFERABLE flag is also available on the ARM. Setting pages to this state using vmStateSet( ) results in those pages being bufferable but not cacheable (only the B bit in the page tables is set). Thus writes go through the write buffer, but not the cache. If VM_STATE_CACHEABLE_NOT is used, pages are set to neither cacheable nor bufferable (both the B and C bits are clear).
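
The relationship between the page states in the two notes above and the C and B bits can be summarized in a short sketch. The enumeration and function names are hypothetical stand-ins for the VM_STATE_ flags; the bit positions (C in bit 3, B in bit 2) are those of the ARM page table descriptors.

```c
#include <stdint.h>

#define DEMO_PTE_B 0x04u    /* bit 2: bufferable */
#define DEMO_PTE_C 0x08u    /* bit 3: cacheable */

/* Hypothetical stand-ins for the page cache states discussed above. */
typedef enum
    {
    DEMO_CACHEABLE_NOT,             /* neither cacheable nor bufferable */
    DEMO_BUFFERABLE,                /* write buffer only */
    DEMO_CACHEABLE_WRITETHROUGH,    /* ARM-710A only */
    DEMO_CACHEABLE                  /* copyback */
    } DEMO_STATE;

/* Return the C and B bits that correspond to a page cache state. */
uint32_t demoCbBits(DEMO_STATE state)
    {
    switch (state)
        {
        case DEMO_CACHEABLE:
            return DEMO_PTE_C | DEMO_PTE_B;
        case DEMO_CACHEABLE_WRITETHROUGH:
            return DEMO_PTE_C;
        case DEMO_BUFFERABLE:
            return DEMO_PTE_B;
        case DEMO_CACHEABLE_NOT:
        default:
            return 0u;
        }
    }
```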

ARM-710A

The ARM-710A has an MMU Control Register that is not readable. In order to have access to the information, a soft copy is kept in the architecture code. This soft copy is initialized to the symbolic constant MMU_INIT_VALUE. In all WRS ARM-710A BSPs, the initialization code sets the MMU Control Register to this value, so that the register and soft copy are in step.

Writers of other 710A-based BSPs must ensure that the register is set to the initial value of the soft copy, and that (assuming they use the VxWorks MMU/cache) no discrepancy between the soft copy and the register is allowed to happen.
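
The soft-copy technique can be sketched as follows. This is a host-runnable model under stated assumptions: the variable demoHwReg stands in for the write-only register, DEMO_MMU_INIT_VALUE is a placeholder and not the real value of MMU_INIT_VALUE, and all routine names are hypothetical.

```c
#include <stdint.h>

#define DEMO_MMU_INIT_VALUE 0x00000070u /* placeholder, not the real constant */

static uint32_t demoHwReg;                              /* write-only register model */
static uint32_t demoSoftCopy = DEMO_MMU_INIT_VALUE;     /* readable shadow */

/* Initialization: bring the register into step with the soft copy. */
void demoMmuCrInit(void)
    {
    demoHwReg = demoSoftCopy;
    }

/* Read-modify-write through the soft copy; the register is never read. */
uint32_t demoMmuCrModify(uint32_t clearBits, uint32_t setBits)
    {
    demoSoftCopy = (demoSoftCopy & ~clearBits) | setBits;
    demoHwReg    = demoSoftCopy;    /* every change is written to both */
    return demoSoftCopy;
    }

/* Current register contents, as recorded by the soft copy. */
uint32_t demoMmuCrGet(void)
    {
    return demoSoftCopy;
    }

/* Model-only accessor; a real write-only register cannot be read back. */
uint32_t demoMmuCrHwValue(void)
    {
    return demoHwReg;
    }
```

Any code that writes the register without going through the equivalent of demoMmuCrModify( ) creates exactly the discrepancy that the text warns against.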

SA-1100

The SA-1100 CPU has elements of its physical address map fixed such that it is not possible to run VxWorks on it without enabling the MMU to produce a virtual address map of the standard form (in other words, RAM mapped over the exception vectors). BSPs for this CPU (such as Brutus) select INCLUDE_MMU_BASIC for inclusion by default, and use the MMU to implement a standard VxWorks virtual address map.

The SA-1100 contains extensions to the SA-110 MMU, including a read buffer, process ID mapping, and a minicache. No support is provided for the read buffer or process ID mapping in this release. However, the extra state VM_STATE_CACHEABLE_MINICACHE is available on the SA-1100, which is not available on other ARM CPUs. Setting pages to this state using vmStateSet( ) results in those pages being cached in the minicache and not in the main data cache. Calling cacheInvalidate( ) with the parameters DATA_CACHE, ENTIRE_CACHE invalidates the minicache and the main data cache.


WARNING: In all other respects, no support is provided for the minicache and the user is entirely responsible for ensuring cache coherency between the minicache, the main cache, and main memory. If no pages are marked with the flag VM_STATE_CACHEABLE_MINICACHE, then cache coherency is handled in the normal fashion, using the standard cacheLib routines.

Memory Layout

The VxWorks memory layout is the same for all the ARM processors. Figure G-2 shows memory layout, labeled as follows:


Vectors

Table of exception/interrupt vectors.

FIQ Code

Reserved for FIQ handling code.

Exception pointers

Pointers to exception routines, which are used by the vectors.

Boot Line

ASCII string of boot parameters.

Exception Message

ASCII string of fatal exception message.

Initial Stack

Initial stack for usrInit( ), until usrRoot( ) gets allocated stack.

System Image

VxWorks itself (three sections: text, data, bss). The entry point for VxWorks is at the start of this region.

WDB Memory Pool

Size depends on the macro WDB_POOL_SIZE which defaults to one-sixteenth of the system memory pool. This space is used by the target server to support host-based tools. Modify WDB_POOL_SIZE under INCLUDE_WDB.

System Memory Pool

Size depends on the size of the system image. The sysMemTop( ) routine returns the end of the free memory pool.


   

All addresses shown in Figure G-2 are relative to the start of memory for a particular target board. The start of memory (corresponding to 0x0 in the memory-layout diagram) is defined as LOCAL_MEM_LOCAL_ADRS under INCLUDE_MEMORY_CONFIG for each target.


NOTE: The initial stack and system image addresses are configured within the BSP.