Implement new cache functions for or1k and create new bspstart function
for or1ksim to initialize instruction and data caches. Also, sim.cfg
is modified to enable/confiure cache units.
Adds functions that allows the user to specify which cores that should
perform the cache operation. SMP messages are sent to all the specified
cores and the caller waits until all cores have acknowledged that they
have flushed their cache. If CPU_CACHE_NO_INSTRUCTION_CACHE_SNOOPING is
defined the instruction cache invalidation function will perform the
operation on all cores using the previous method.
Also cleanup:
* Remove un-needed interrupt disables.
* Address errata "e989: FLASH: Disable Prefetch during programming and erase"
* Use RTEMS_ARRAY_SIZE() macro instead of own macro.
Fix context switch on SMP for ARM, PowerPC and SPARC.
Atomically test and set the is executing indicator of the heir context
to ensure that at most one processor uses the heir context. Break the
busy wait loop also due to heir updates.
This reverts commit d9ff8b3e68.
It is not that simple:
https://sourceware.org/ml/binutils/2014-06/msg00062.html
On Fri, Jun 06, 2014 at 01:31:48PM +0200, Sebastian Huber wrote:
> On 2014-06-06 13:23, Sebastian Huber wrote:
> >Ok, so this "cmplwi cr0, rX, ppc_exc_lock_std@sdarel" is illegal,
> >since
> >ppc_exc_lock_std@sdarel is signed and the immediate is unsigned
> >16-bit? The
> >assembler doesn't issue a warning about this.
> >
> >Exists there a way to rescue this cmplwi hack without relaxing the
> >overflow
> >checks?
>
> Hm, sorry, it was surprisingly simple. This works:
>
> "cmplwi cr0, rX, ppc_exc_lock_std@sdarel@l"
>
> I was not aware that you can add several @ in a row.
That is the wrong thing to use here. sdarel@l translates to a VLE
reloc which applies to a split 16-bit field in VLE insns.
You want
cmpwi cr0, rX, ppc_exc_lock_std@sdarel
to properly compare a 16-bit signed number from sym@sdarel.
Note that the assembler does error if you write something like
cmplwi 3,-30000
or
cmpwi 3,40000
so what the linker is now doing is extending this behaviour to link
time.
See also
https://sourceware.org/ml/binutils/2014-06/msg00059.html
On Fri, Jun 06, 2014 at 11:01:10AM +0200, Sebastian Huber wrote:
> I performed a git bisect and found this:
>
> 93d1b056cb396d6468781fe0e40dd769891bed32 is the first bad commit
> commit 93d1b056cb396d6468781fe0e40dd769891bed32
> Author: Alan Modra <amodra@gmail.com>
> Date: Tue May 20 11:42:42 2014 +0930
>
> Rewrite ppc32 backend .sdata and .sdata2 handling
Hmm, I'm surprised that your git bisect found this patch. Was
_SDA_BASE_ set differently before this?
> 0x00000000000dfc00 _SDA_BASE_
> 0x00000000000d7f78 ppc_exc_lock_std
> 4b8: 28 05 00 00 cmplwi r5,0
> 4ba: R_PPC_SDAREL16 ppc_exc_lock_std
ppc_exc_lock_std@sdarel will be calculating 0xd7f78 - 0xdfc00
which is 0xf...fff8378, and that falls foul of
commit 86c9573369616e7437481b6e5533aef3a435cdcf
Author: Alan Modra <amodra@gmail.com>
Date: Sat Mar 8 13:05:06 2014 +1030
Better overflow checking for powerpc32 relocations
cmplwi has an *unsigned* 16-bit field, and we now check the overflow
properly.
I wonder how many more of these we'll hit, and whether the uproar will
be enough that I'll be forced to relax the checks?
Guest systems in paravirtualization environments run usually in user
mode. Thus it is not possible to directly access the PSR and TBR
registers. Use functions instead of inline assembler to access these
registers if RTEMS_PARAVIRT is defined.
The last optimization missed was incorrect in regards to
PSR write instruction delay must be 3 instructions.
New optimizations:
* align to 32-byte cache line.
* rearrange code into three "blocks" of 4 instructions that
is executed by syscall 2 and 3. This is to optimize for
16/32 byte cache lines.
* use delay-slot instruction in trap table to reduce by one
instruction.
* use the fact that "wr %PSR" implements XOR to reduce by
one instruction.
The exit SPARC system call doesn't have a function entry
point like the others do. This is probably why people use
TA 0x0 instruction directly for shutting down the system.
We must not alter the is executing indicator in
_CPU_Context_Initialize() since this would cause an invalid state during
a self restart.
The is executing indicator must be valid at creation time since
otherwise _Thread_Kill_zombies() uses an undefined value for not started
threads. This could result in a system life lock.
The current implementation of task migration in RTEMS has some
implications with respect to the interrupt latency. It is crucial to
preserve the system invariant that a task can execute on at most one
processor in the system at a time. This is accomplished with a boolean
indicator in the task context. The processor architecture specific
low-level task context switch code will mark that a task context is no
longer executing and waits that the heir context stopped execution
before it restores the heir context and resumes execution of the heir
task. So there is one point in time in which a processor is without a
task. This is essential to avoid cyclic dependencies in case multiple
tasks migrate at once. Otherwise some supervising entity is necessary to
prevent life-locks. Such a global supervisor would lead to scalability
problems so this approach is not used. Currently the thread dispatch is
performed with interrupts disabled. So in case the heir task is
currently executing on another processor then this prolongs the time of
disabled interrupts since one processor has to wait for another
processor to make progress.
It is difficult to avoid this issue with the interrupt latency since
interrupts normally store the context of the interrupted task on its
stack. In case a task is marked as not executing we must not use its
task stack to store such an interrupt context. We cannot use the heir
stack before it stopped execution on another processor. So if we enable
interrupts during this transition we have to provide an alternative task
independent stack for this time frame. This issue needs further
investigation.
This partially reverts commit 1215fd4d94.
In order to support profiling of SMP locks and provide a future
compatible SMP locks API it is necessary to add an SMP lock destroy
function. Since the commit above adds an SMP lock to each chain control
we would have to add a rtems_chain_destroy() function as well. This
complicates the chain usage dramatically. Thus revert the patch above.
A global SMP lock for all chains is used to implement the protected
chain operations.
Advantages:
* The SAPI chain API is now identical on SMP and non-SMP
configurations.
* The size of the chain control is reduced and is then equal to the
Score chains.
* The protected chain operations work correctly on SMP.
Disadvantage:
* Applications using many different chains and the protected operations
may notice lock contention.
The chain control size drop is a huge benefit (SAPI chain controls are
66% larger than the Score chain controls). The only disadvantage is not
really a problem since these applications can use specific interrupt
locks and unprotected chain operations to avoid this issue.
Merge RTEMS_FATAL_SOURCE_BSP_GENERIC and RTEMS_FATAL_SOURCE_BSP_SPECIFIC
into new fatal source RTEMS_FATAL_SOURCE_BSP. This makes it easier to
figure out the code position given a fatal source and code.
Instead of SPRG0 (= special purpose register 272) use the new global
symbol _PPC_INTERRUPT_DISABLE_MASK to store the interrupt disable mask.
The benefit is that it is now possible to disable interrupts without
further run-time initialization in boot_card().
At least on Freescale e500 cores this leads also to a faster execution
since the mfmsr and mfspr instruction require four cycles to complete.
The instructions to load the mask value can execute while the mfmsr is
in progress.
According with comment in
rtems_cache_invalidate_multiple_instruction_lines(), final_address
indicates the last address which needs to be invalidated. But if in
while loop we got final_address == i_addr condition then loop breaks and
final_address will not be invalidated.
The new ARM_CP15_CTRL_XP is necessary to share ARMv6 and ARMv7
page-table formats and definitions.
It enables the extended page tables (introduced in ARMv6)
to be configured for the hardware page translation mechanism. This way
we can share ARMv6 and ARMv7 page tables entry formats.
Other Fault Status Register Definitions can be useful for debugging or
excpetion handlers.