USART, ISRRX, Overrun and latest compiler

tcdev
Posts: 15
Joined: Thu Jan 15, 2015 12:31 am

USART, ISRRX, Overrun and latest compiler

Post by tcdev » Fri Jan 30, 2015 7:13 am

You'll have to bear with me as I'm still a little up-in-the-air about exactly what is going on, but sometimes posting to a forum brings a 'eureka' moment. At this stage though, I'm going to go out on a limb and declare that either the code I have inherited is doing something very wrong, or there's some sort of bug in one of the compiler-supplied modules.

Some background: the code I inherited was built using an earlier (unknown) version of Swordfish and appears to run without issues. It uses the ISRRX(OnData) callback. ISRRX is configured as HIGH priority; the code reads ISRRX.DataChar explicitly and never calls any of the ISRRX functions. FWIW, the only USART function it calls is USART.Write.

I rebuilt it for the latest compiler, and had a few issues. It would lock up after 20s or so. Studying the ISRRX routine I concluded it was getting buffer overruns and fixed it - a few weeks ago - by setting ISRRX.ProcessByte = false in the callback, thus preventing the ISRRX module from buffering the data. That appeared to make sense and so I forgot about it.

In the last few days I have been chasing a "freeze" which I thought I'd introduced; I've introduced a 2nd ISR (LOW priority). However, reverting to the original code and building under the new compiler, the same "freeze" occurs. I've deduced that this bug only appears to happen when the USART gets an overrun condition. And because the code I inherited is rather awful, this tends to happen at specific points in the operation of the system, which I can reproduce at will with the same result. This "freeze" actually stops my ISR from running!!! I can work around the "freeze" by disabling RX (RCSTA1.4=0) whenever an overrun is likely to occur; further evidence of the cause.

Whilst I have a work-around, like any good engineer I'm not happy until I understand the cause. And this is where I'm at...

I've hooked up an ICD3 debugger and had a quick prod around. Note that it's Friday afternoon and I'm running out of time. What I have seen is that when the code "freezes", it's actually running the main background code just fine; my LOW priority ISR is suddenly disabled and the ISRRX interrupt IS FIRING despite the fact that there is no traffic to receive! The ISR then tests USART.Overrun and it's TRUE, so it exits! Looks like there's no mechanism to clear the overrun?!?

My questions then:

- How are USART overruns cleared? Since the OnData callback is not called in this case, the user can't be responsible?
- Why is my low priority interrupt not being triggered? I could understand if the ISRRX interrupt was perpetually firing but I can see my background (main loop) code being executed so it's not being starved of CPU.
- Why is the ISRRX interrupt firing when no data is received?

In theory the ADC in my ISR should be interrupting continually (timed from CCP2) so is it possible that it's actually - somehow - causing ISRRX to be firing? I could answer that if I could measure the frequency of ISRRX interrupts, but since it's not calling OnData, I can't wiggle pins.

That's all I have time for, need to dash off. Most likely I'll continue investigations, but maybe someone here can chime in with some suggestions?

Jerry Messina
Swordfish Developer
Posts: 1486
Joined: Fri Jan 30, 2009 6:27 pm
Location: US

Re: USART, ISRRX, Overrun and latest compiler

Post by Jerry Messina » Fri Jan 30, 2015 2:51 pm

>>How are USART overruns cleared? Since the OnData callback is not called in this case, the user can't be responsible?
>>Why is the ISRRX interrupt firing when no data is received?

The default ISRRX interrupt routine doesn't automatically clear a UART Overrun error.

To clear the error you would need to:
a) detect that the error occurred (presumably in your main loop/serial handling routines)
b) call USART.ClearOverrun()
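
Steps (a) and (b) could look roughly like the sketch below when done from a main loop. The device, clock and baud rate are assumptions picked only to make the fragment complete, and OverrunCount is a hypothetical diagnostic counter. USART.Overrun and USART.ClearOverrun() are the module members already named in this thread. ISRRX setup is left out; it would stay as it is in the existing project.

```basic
Device = 18F4520             // assumption: any PIC18 with a hardware USART
Clock = 20                   // assumption: 20 MHz

Include "USART.bas"

Dim OverrunCount As Word     // hypothetical counter, handy while debugging

OverrunCount = 0
USART.SetBaudrate(br19200)   // assumption: 19200 baud

While true
   // a) detect that the overrun happened
   If USART.Overrun Then
      // b) clear it so the receiver accepts bytes again
      USART.ClearOverrun()
      Inc(OverrunCount)
   EndIf

   // ... rest of the main loop / serial handling ...
Wend
```

A non-zero OverrunCount after a test run would confirm that bytes are being lost somewhere upstream.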

On the PIC, if the usart OERR bit ever gets set (RCSTA.bit(1)), then the uart will stop receiving bytes until you clear the error condition, either by clearing the CREN bit of the RCSTA register, or by resetting the usart by clearing the SPEN bit of the RCSTA register. The characters already in the FIFO buffer can be read, but no additional characters will be received until the error is handled.

If you don't do this, then the ISRRX isr handler won't ever read the RCREG fifo, the RX IF will never be cleared (reading the RCREG is what clears IF), and you'll get interrupts over and over again. This would also explain why the low priority isr never gets a chance to run (assuming it's enabled).

If you're getting OERR, then I would expect that perhaps somewhere in the code the uart interrupt is being disabled, and there's no check for the uart status before it's re-enabled. That's an almost sure way to get a lockup.


In my code I usually check for OERR (and FERR) in the ISR, handle them as appropriate, and set some flags so my main code knows these things happened. FERR isn't as bad since it doesn't lock the usart, but it's queued along with the rx fifo.
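
For illustration, handling OERR and FERR inside the ISR might look something like the sketch below. This is not the ISRRX module's code or anyone's production handler, just a minimal example written straight against the PIC18 registers. It assumes a single-USART part (so the registers are named RCSTA, RCREG, PIR1, PIE1), and the names OverrunSeen, FramingSeen and LastByte are hypothetical. Interrupt priority setup (IPEN, IPR1) is assumed to be whatever the project already uses.

```basic
Device = 18F4520             // assumption: single-USART PIC18
Clock = 20                   // assumption: 20 MHz

Include "USART.bas"

// register bit aliases, named after the PIC18 datasheet bits
Dim OERR As RCSTA.1          // overrun error
Dim FERR As RCSTA.2          // framing error for the next byte in the fifo
Dim CREN As RCSTA.4          // continuous receive enable
Dim RCIF As PIR1.5           // receive interrupt flag
Dim RCIE As PIE1.5           // receive interrupt enable

// hypothetical flags for the main code to poll and clear
Dim OverrunSeen As Boolean
Dim FramingSeen As Boolean
Dim LastByte As Byte

Interrupt OnRX(ipHigh)
   If (RCIE = 1) And (RCIF = 1) Then
      // FERR belongs to the byte about to be read, so test it first
      If FERR = 1 Then
         FramingSeen = true
      EndIf
      LastByte = RCREG       // reading RCREG is what clears RCIF
      // OERR stays set until CREN is toggled
      If OERR = 1 Then
         CREN = 0
         CREN = 1
         OverrunSeen = true
      EndIf
   EndIf
End Interrupt

OverrunSeen = false
FramingSeen = false
USART.SetBaudrate(br19200)   // assumption: 19200 baud
RCIE = 1
Enable(OnRX)

While true
   If OverrunSeen Then
      OverrunSeen = false
      // bytes were lost: resynchronise the protocol here
   EndIf
   If FramingSeen Then
      FramingSeen = false
      // LastByte may be corrupt: discard or request a resend
   EndIf
Wend
```

Because the ISR always reads RCREG and always clears OERR, RCIF cannot stay asserted and the receiver cannot stay locked.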

tcdev
Posts: 15
Joined: Thu Jan 15, 2015 12:31 am

Re: USART, ISRRX, Overrun and latest compiler

Post by tcdev » Fri Jan 30, 2015 10:13 pm

Jerry Messina wrote: The default ISRRX interrupt routine doesn't automatically clear a UART Overrun error.

Noted, as I suspected. Given the horrible architecture of this code, it's a major issue, hence my work-around by disabling RX during potential overrun code paths.

Jerry Messina wrote: If you don't do this, then the ISRRX isr handler won't ever read the RCREG fifo, the RX IF will never be cleared (reading the RCREG is what clears IF), and you'll get interrupts over and over again. This would also explain why the low priority isr never gets a chance to run (assuming it's enabled).

What I don't understand is why my background (main loop) code runs but the low priority ISR doesn't get triggered? A perpetual RXIF - which I assumed at first - should starve the main loop of CPU, but that's definitely NOT happening. I can see the code execute dozens of instructions (I'm yet to investigate exactly how long it runs) before vectoring back to ISRRX(OnRX). This made me question the perpetual RXIF theory.

Jerry Messina wrote: If you're getting OERR, then I would expect that perhaps somewhere in the code the uart interrupt is being disabled, and there's no check for the uart status before it's re-enabled. That's an almost sure way to get a lockup.

There's definitely no code outside the compiler-supplied modules that controls the uart. ISRRX(OnData) doesn't touch any uart controls or flags at all. The only other code touching interrupts is my EEPROM (emulation) write routine which only runs after touch screen calibration (if required) on start up and does it "by the book".

My investigations are admittedly half-baked as I posted late Friday afternoon immediately before walking out the door. Unfortunately this phase of the project is about fighting fires and stabilising the product as it is already being rolled-out across multiple sites. So further investigations may have to wait until the next phase of the project.

The previous "developer" (cowboy/hack) really let them down and there was also clearly no formal acceptance testing process so they've really shot themselves in the foot and come to us in absolute desperation. One option was to can the project immediately (and it was my call) and revert to customisation of an existing product but I believe the new hardware and chosen system architecture is solid - it's the execution and the software that really lets it down. And the PIC code is only half the problem... the 'protocol' - and it's a stretch to call it that - is worse than you can possibly imagine and the PC-based controller software, written in VB6 (yes, 6!) is so horribly bad it would be funny if the system wasn't in production. Next phase is to re-write the software from scratch, including the PIC code. At least my modifications to the code can be lifted straight into a new code base.

Jerry Messina
Swordfish Developer
Posts: 1486
Joined: Fri Jan 30, 2009 6:27 pm
Location: US

Re: USART, ISRRX, Overrun and latest compiler

Post by Jerry Messina » Sat Jan 31, 2015 12:26 am

Sounds like lots of fun!
tcdev wrote: What I don't understand is why my background (main loop) code runs but the low priority ISR doesn't get triggered? A perpetual RXIF - which I assumed at first - should starve the main loop of CPU, but that's definitely NOT happening. I can see the code execute dozens of instructions...

That is odd. Off the top of my head, I can't explain that. A perpetual interrupt situation would let the main loop run, but only one asm instruction at a time, so it'd take forever to do anything. It doesn't sound like that's happening.

It sounds like maybe there's more going on somewhere. Some things I'd look for:
- any enable()/disable() statements. Disabling the high priority intr on the PIC18 automatically disables the low priority one too.
- make sure the correct interrupt priority is assigned to match the vector (the IPRx bits)
- see if anyone's mucking around directly with INTCON, or the peripheral IE/IF bits
- make sure only one peripheral is using each priority... otherwise you need to add additional code to the isr's to check for both the IF and IE bits being set.
That means there's a max of two interrupts unless you radically change most isr code.
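
As an illustration of the "check both the IF and IE bits" point, a low priority isr shared by two peripherals might be sketched as below. The ADC and Timer0 are picked only as examples, the device is an assumption, and AdcCount and TickCount are hypothetical placeholders for the real work. Setup is not shown; on a PIC18 both sources would also need their priority bits (ADIP in IPR1, TMR0IP in INTCON2) cleared to route them to the low vector.

```basic
Device = 18F4520             // assumption: bit positions below match this family
Clock = 20                   // assumption: 20 MHz

Dim ADIF As PIR1.6           // ADC conversion complete flag
Dim ADIE As PIE1.6           // ADC interrupt enable
Dim TMR0IF As INTCON.2       // Timer0 overflow flag
Dim TMR0IE As INTCON.5       // Timer0 interrupt enable

Dim AdcCount As Word         // hypothetical: stands in for the ADC handling
Dim TickCount As Word        // hypothetical: stands in for the timer handling

Interrupt OnLow(ipLow)
   // service a source only if it is both enabled and flagged
   If (ADIE = 1) And (ADIF = 1) Then
      ADIF = 0
      Inc(AdcCount)
   EndIf
   If (TMR0IE = 1) And (TMR0IF = 1) Then
      TMR0IF = 0
      Inc(TickCount)
   EndIf
End Interrupt
```

Testing IE as well as IF matters because a flag can be set while its interrupt is disabled; without the IE test the isr would "handle" an event that never actually requested the interrupt.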

tcdev
Posts: 15
Joined: Thu Jan 15, 2015 12:31 am

Re: USART, ISRRX, Overrun and latest compiler

Post by tcdev » Sat Jan 31, 2015 2:03 am

Jerry Messina wrote: Sounds like lots of fun!

For some definitions of the word 'fun'! :wink:

Jerry Messina wrote: - make sure only one peripheral is using each priority... otherwise you need to add additional code to the isr's to check for both the IF and IE bits being set.

My low priority ISR handles both ADC and Timer0 interrupts, but I'm handling those as you'd expect - not the 1st time I've written ISRs for multiple interrupt sources.

Noted your other suggestions and can't think where anything untoward is being done in the code with interrupts. It all comes back to the overrun somehow. I'll do some more investigation when time allows because in my books, bugs aren't fixed until you can explain both the cause and the fix! And I can't explain the former atm. :oops:

Thanks for your input!

Jerry Messina
Swordfish Developer
Posts: 1486
Joined: Fri Jan 30, 2009 6:27 pm
Location: US

Re: USART, ISRRX, Overrun and latest compiler

Post by Jerry Messina » Sat Jan 31, 2015 12:08 pm

Here's something you could try that might help...

Code: Select all

// nop instruction
public inline sub nop()
   asm
      nop
   end asm
end sub

// software breakpoint instruction
// this is undocumented... it assembles to an opcode of 0x00E0
// it works with PICKit, ICD, etc
public inline sub _trap()
    asm
        trap
    end asm
    // add a nop for brkpt skidding
    nop()
end sub


interrupt isr_high(ipHigh)

    // periph_IE and periph_IF are placeholders for the enable and flag
    // bits of the peripheral this isr services (e.g. RCIE and RCIF)
    if ((periph_IE = 1) and (periph_IF = 1)) then
        // do normal stuff
    else    // unhandled interrupt...
        _trap()
        nop()
    endif

end interrupt

That should generate an ICD breakpoint if you ever have an unexpected ipHigh interrupt. I don't know of any way to directly see the cause of an interrupt request, but try the following:

From the MPLAB menu 'Debugger | Settings... | Freeze On Halt' tab check 'Freeze Peripherals on Halt', and that will stop everything when you get a breakpoint.

Examine all the INTCONx, PIRx, PIEx, and IPRx registers. You should be able to see who is enabled, and for what priority.

You might even single step from the halt and see where you go back to. That should give you a clue if it's something in the program doing it, or if it's just the peripheral.

TonyR
Posts: 75
Joined: Fri Jan 14, 2011 11:49 pm
Location: Sydney

Re: USART, ISRRX, Overrun and latest compiler

Post by TonyR » Sat Sep 12, 2015 11:18 pm

There's always a danger in making comments without knowing the facts; perhaps you should have contacted the cowboy for help.

I would guess the cowboy, being the code's author, already knew about the issues and how to fix them but hadn't due to lack of time or other reasons.

Your project sounds similar to one I knew of. The customer had actually sold the system the cowboy had to design BEFORE it existed! The customer was in so much trouble that the cowboy was given 6 working weeks to design and test an RS485 networked system with multiple touch screens, hardware and software, single-handed!! The cowboy succeeded, which was a REMARKABLE feat considering that system design, PCB layouts, schematics, component selection, software for the terminals, control software for the controller AND manufacturing of three systems all had to be completed in 6 weeks. In the ridiculously short project time some aspects weren't perfect, and choices were made based on expedience, not technical perfection.

When the system was at first base and working the customer immediately breached payment agreements with the cowboy and refused to proceed with proper system testing, tuition of forthcoming contractors and quality control. Rather the customer dashed out the system to its customers much to the horror of the cowboy.

I would have expected that customer to pay for a version II of the system with bugs fixed (above) and documentation including fully commented code that would have helped future working contractors, but the customer ignored the cowboy's advice, believing it knew better. So the customer was never at any time "let down" in the case I know of; I think that particular customer tried to rip off the cowboy and the cowboy quit.

It's a pity we electronics design contractors can accidentally get involved with customers who have made false promises and act in an unethical way. As it turned out, the cowboy's predecessor had the same problems with that customer, which is why he quit too. It would be much simpler to take one's time and design pristine products. I'm sure many of us in this business have had the same issues as that cowboy did.
