I've got a c++ program recording various telemetry data read from serial ports, etc. The program records data at approximately 12KB/s, but is buffered for about 4 seconds and written to file in chunks. So about 48KB at a time.
I noticed that around every 30 seconds, the program would 'freeze', causing a loss of data being read in. I switched from a Class 4 to a Class 10 microSD card and this reduced the frequency of the 'freeze'. I multi-threaded the program to double-buffer the write data, which reduced the frequency even more, but it still hangs about once every 2.5 minutes and loses a good number of samples in the process. FYI, the code was tested on a desktop Windows machine and a desktop Ubuntu machine and works exactly as expected: no errors, great data.
I'm using the pre-built image from Dec. 2011 (the most recent pre-built image). I'm wondering if there are any I/O settings or something that would cause this kind of behavior. The data size and speed are well within the SD card's capability and the code has been thoroughly debugged. Any suggestions?
In case it helps anyone, initial testing shows that moving to an ext4 filesystem further reduced the frequency of the error (although it did not eliminate it completely).
Testing the identical process on a temporary RAMFS completely eliminated the issue, but that is unacceptable since the data only lives in volatile memory; however, it does help point to the data-writing process as the culprit and not the code routine itself.
If anyone has any tips to make data writing safer/more reliable it'd be appreciated!
Also, has anyone worked with the industrial grade microSD cards, such as these: http://www.atpinc.com/p2-4a.php?sn=00000391
Maybe you could run the OS from the NAND flash and just store the data on the microSD card? I haven't used SLC microSD cards before, but I have used SLC USB drives and the write speeds are significantly faster. It is very noticeable when writing large amounts of data or cloning entire drives.
In reply to this post by qmay123
Some code or pseudo code from you might help.
Are you dropping data because the serial reader thread is blocked by the writer thread? Or is your whole system somehow freezing while disk writes are going on, causing you to miss data?
You could test this second case by running your current program writing to a RAMFS and at the same time running another program doing disk writes to try and freeze the system.
Hopefully it's the first case. If so, it's not clear to me what your double-buffering step entails or how adding an extra copy helps things.
Our typical multi-threaded data routines look like this:

Collection Thread (queue writer)
    alloc buffer (or more typically grab from a pre-allocated pool)
    read data from device
    add buffer to queue

Disk Thread (queue reader)
    wait while queue is empty
    remove buffer from queue
    write data to disk
    free buffer (or return to pool)
With only one queue writer and reader, moving only the head
or tail respectively, you don't even need locking.
The only way the collection thread gets blocked with this algorithm is
if we run out of memory for buffers. But if that's happening then
we are truly exceeding the throughput limits of the system.
Your throughput requirements are not that big.
Thanks Ryan, I might try that; I'd need to flash the NAND with an updated kernel first, though.
Here's my pseudo code:

Collection Thread
    read data from device
    if data is new
        store data in shared global vars (lock and unlock var mutex before & after)

Buffer Thread
    if time_elapsed > x
        grab data from shared global vars (lock and unlock var mutex before & after)
        place data in preallocated buffer
        if data has been grabbed 100 times, trip the write flag

Write Thread
    if write flag
        copy buffer locally, write it to disk, clear the flag
    sleep for a little bit before checking if the write flag is tripped again
The purpose of the write thread is to let the OS decide the CPU priority given to writing data, with the hope that it can schedule the write smarter than if it were in the buffer thread, and to keep the write from blocking the buffer thread.
The 'hang' that I see is that the collection loop stops updating the shared data while the buffer thread continues to grab it at the same rate, so I see the same values repeated for about 0.5 seconds' worth of samples, i.e. several lines. So the whole system is not frozen, since the buffer thread keeps going. I thought for a while that the collection thread had a bug, but the experiments and fixes described above all point to data writing as the bottleneck, if I'm not mistaken. Writing to RAMFS and running flawlessly, in my mind, confirms that suspicion. What do you think?
On Thu, Jan 3, 2013 at 10:17 AM, jumpnowdev <[hidden email]> wrote:
I guess I can't tell from your pseudo code where the problem is.
I agree that disk writing latency is probably your issue, but latencies in the
disk writing thread should not be impacting the collection thread.
If it does, then you've negated a primary reason to multi-thread.
As long as the system can sustain your total write throughput requirements, then your algorithm should be buffering any speed differences between the two threads.
You could test your algorithm by putting a sleep (maybe random) into your current disk writer code when you are using a RAM disk. See if you get the same freezes.
Putting a timer around your disk writing code could get you some idea of the
variability and worst case times when using the disk.
You could also copy some files from the shell and measure the O/S disk write speeds on your system. There's no reason your app shouldn't be able to match them.
Just some ideas.
FWIW, I know we've done systems with higher throughput than this on the Gumstix.
Thanks jump, I appreciate the thorough answers. I might go a little timer crazy on some of the processes and see if I notice anything.
I think I'll also run the process in ramfs and have a shell script copy over the write file at a certain rate and see if the behavior persists. Will report results.