18.5.4 Bus Master Common Buffer Operations

Bus master common buffer operations are more complex to manage than bus master read and write operations because both the bus master and the processor may simultaneously access a single region of system memory. The memory ordering of PCI transactions generated by the PCI bus master is defined in the PCI Specification. However, different processors may use different memory ordering models. As a result, common buffer operations should be used only when they are absolutely required.

If the common buffer memory region can be accessed in a single atomic processor transaction, no hazards are present. If the processor has deep write buffers, a write transaction may be delayed. The EDK II library BaseLib provides the MemoryFence() function to force completion of all processor transactions. If a memory region that the processor needs to read or write requires multiple atomic processor transactions, hazards may exist if the operations are reordered. If the order in which the processor transactions occur is important, insert a MemoryFence() call between the processor transactions. Use MemoryFence() sparingly, though, because inserting too many calls may degrade system performance. For strongly ordered processors, the MemoryFence() function is a no-op.

A good example of MemoryFence() use is a mailbox data structure used to communicate between the processor and a bus master. The mailbox typically contains a valid bit that the processor must set only after it has filled in the rest of the mailbox. The bus master scans the mailbox to see if the valid bit is set. When the bus master sees that the valid bit is set, it reads the rest of the mailbox contents and uses them to perform an I/O operation. If the processor is weakly ordered, there is a chance that the valid bit becomes visible before the processor has written all of the other fields in the data structure. To resolve this issue, a MemoryFence() call is inserted just before and just after the valid bit is set.
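
The mailbox sequence described above can be sketched as follows. This is a minimal illustration, not code from EDK II: the MAILBOX layout, the MAILBOX_VALID bit, and the PostMailbox() function are hypothetical names chosen for this example, and MemoryFence() is replaced by a local stand-in built on the C11 atomic_thread_fence() so that the sketch builds outside of an EDK II environment. A real UEFI driver would include Library/BaseLib.h and call the library's MemoryFence() instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

//
// Stand-in for the BaseLib MemoryFence() function (assumption: used here only
// so that the sketch compiles without the EDK II headers).
//
static void MemoryFence(void)
{
  atomic_thread_fence(memory_order_seq_cst);
}

#define MAILBOX_VALID  0x1u   // hypothetical valid bit

//
// Hypothetical mailbox layout that lives in a common buffer shared with a
// bus master.
//
typedef struct {
  uint64_t  BufferAddress;    // device-visible address of the data buffer
  uint32_t  Length;           // number of bytes to transfer
  uint32_t  Command;          // operation requested of the bus master
  uint32_t  Valid;            // MAILBOX_VALID once the other fields are ready
} MAILBOX;

static void PostMailbox(MAILBOX *Mailbox, uint64_t BufferAddress,
                        uint32_t Length, uint32_t Command)
{
  assert(Mailbox != NULL);

  // Fill in every field except the valid bit.
  Mailbox->BufferAddress = BufferAddress;
  Mailbox->Length        = Length;
  Mailbox->Command       = Command;

  // Make sure the fields above are written before the valid bit is set.
  MemoryFence();

  Mailbox->Valid = MAILBOX_VALID;

  // Make sure the valid bit is written before any later processor accesses.
  MemoryFence();
}
```

Only two fences are used, both around the write of the valid bit. The writes to the other fields may complete in any order relative to each other, which keeps the number of MemoryFence() calls small.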

Another mechanism used to resolve these memory-ordering issues is that of the volatile keyword in C sources. If the data structure used as a mailbox is declared in C as volatile, the C compiler guarantees that all transactions to the volatile data structure are strongly ordered. It is recommended that the MemoryFence() call be used instead of volatile data structures.

18.5.5 4 GB Memory Boundary

32-bit platforms may support more than 4 GB of system memory, but UEFI drivers for 32-bit platforms may only access memory below 4 GB. The 4 GB memory boundary becomes more complex on 64-bit platforms. Also, some 64-bit platforms may not map any system memory in the memory region below 4 GB. For more information about the 4 GB memory boundary on various architectures, see Section 4.2 of this guide.

A UEFI driver should not allocate buffers from, or below, specific addresses. These types of allocations may fail on different system architectures. Likewise, the buffers used for DMA should not be allocated from, or below, a specific address. Also, UEFI drivers should always use the services of the PCI I/O Protocol to set up and complete DMA transactions.


Caution: It is not legal to program a system memory address into a DMA bus master. Such programming may function correctly on platforms having a one-to-one mapping between system memory addresses and PCI DMA addresses, but it will not work on platforms that remap DMA transactions, nor on platforms using a virtual addressing mode for system memory addresses not one-to-one mapped to the PCI DMA addresses.


The following sections contain code examples for the different types of PCI DMA transactions supported by the UEFI Specification. They show how best to use the PCI I/O Protocol services to maximize the platform compatibility of UEFI drivers.

EDK II contains an implementation of the PCI Root Bridge I/O Protocol for a PC-AT-compatible chipset that assumes a one-to-one mapping between system memory and PCI DMA addresses. It also assumes that DMA operations are not supported above 4 GB. The implementation of the Map() and Unmap() services in the PCI Root Bridge I/O Protocol handles DMA requests above 4 GB by allocating a buffer below 4 GB and copying the data to that buffer.


Note: It is important to realize that these functions are implemented differently on platforms that do not assume a one-to-one mapping between system memory addresses and PCI DMA addresses, or that can perform DMA only in specific ranges of system memory.
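
The decision described above for the PC-AT-compatible implementation, whether a DMA request can use the host buffer directly or needs a buffer below 4 GB, can be sketched as follows. This is a simplified model under stated assumptions, not the EDK II source: NeedsBounceBuffer(), SimulateMap(), and DMA_MAPPING are hypothetical names, addresses are modeled as plain 64-bit integers, and the allocation of the low buffer and the data copy are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4GB  0x100000000ULL

//
// Returns true if any byte of the region [HostAddress, HostAddress +
// NumberOfBytes) lies at or above 4 GB. The comparison is arranged so that
// the addition cannot overflow.
//
static bool NeedsBounceBuffer(uint64_t HostAddress, uint64_t NumberOfBytes)
{
  if (NumberOfBytes == 0) {
    return false;
  }
  if (HostAddress >= BOUNDARY_4GB) {
    return true;
  }
  return NumberOfBytes > BOUNDARY_4GB - HostAddress;
}

// Hypothetical result of a simulated mapping.
typedef struct {
  uint64_t  DeviceAddress;   // address the bus master would be programmed with
  bool      Bounced;         // true if a buffer below 4 GB was substituted
} DMA_MAPPING;

//
// Models only the address selection. BounceAddress stands for a buffer the
// platform has already allocated below 4 GB (assumption for this sketch).
//
static DMA_MAPPING SimulateMap(uint64_t HostAddress, uint64_t NumberOfBytes,
                               uint64_t BounceAddress)
{
  DMA_MAPPING Mapping;

  Mapping.Bounced = NeedsBounceBuffer(HostAddress, NumberOfBytes);
  if (Mapping.Bounced) {
    // The substitute buffer must itself lie entirely below 4 GB.
    assert(!NeedsBounceBuffer(BounceAddress, NumberOfBytes));
    Mapping.DeviceAddress = BounceAddress;
  } else {
    // One-to-one mapping: the device address equals the host address.
    Mapping.DeviceAddress = HostAddress;
  }
  return Mapping;
}
```

The sketch also shows why a driver should always program the bus master with the device address returned by Map() rather than with the host address: the two may differ even on a platform with a one-to-one mapping.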