

READING FILES IN MATLAB PLUS
I typically see 90% plus improvement in speed over single line readers.

Your bottleneck will always be file I/O if you are reading these files line by line. I understand that sometimes you cannot fit a whole file into memory. I typically read in a large batch of characters (1e5, 1e6 or thereabouts, depending on the memory of your system). Then I either read additional single characters (or back off single characters) to get a round number of lines, and then run the string parsing on the whole batch (e.g. with sscanf). Then, if you want, you can process the resulting large matrix one row at a time, before repeating the process until you reach the end of the file. It's a little bit tedious, but not that hard.

Fragments of the test script, grouped by variation:

    ... %d check \n', t, CHECK);
    fprintf(1,'Using sscanf, once per line. ...

    scannedData = reshape(fscanf(fid, '%d, %d', bufferSize),2,[])';
    fprintf(1,'Using fscanf in large batches. ...

    scannedData = textscan(fid, '%d, %d \n', bufferSize);
    nums = ...
    fprintf(1,'Using textscan in large batches. ...

    %% Reading in large batches into memory, incrementing to end-of-line, sscanf
    dataBatch = fread(fid,bufferSize,'uint8=>char')';
    dataIncrement = fread(fid,1,'uint8=>char');
    while ~isempty(dataIncrement) && (dataIncrement(end) ~= eol) && ~feof(fid)
        dataIncrement(end+1) = fread(fid,1,'uint8=>char');  %This can be slightly optimized
    ...
    scannedData = reshape(sscanf(data,'%d, %d'),2,[])';
    fprintf(1,'Reading large batches into memory, then sscanf. ...

    %% Using Java single line readers + sscanf
    reader = java.io.LineNumberReader(java.io.FileReader('demo_file.txt'),bufferSize );
    fprintf(1,'Using java single line file reader and sscanf on single lines. ...

    %% Using Java scanner for file reading and string conversion
    fprintf(1,'Using java single item token scanner. ...

    %% Reading in large batches into memory, vectorized operations (non-compliant solution)
    CHECK = round((CHECK + mean(scannedData(:)) ) /2);
    fprintf(1,'Fully batched operations. ...
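The batch approach described above can be sketched roughly as follows. This is a minimal illustration, not the original test script: the file name demo_batch.txt, the helper name readPairsInBatches, and the tiny batch size are all assumptions made for the example, and it assumes a MATLAB release that allows local functions in scripts and provides newline (R2016b or later).

```matlab
% Sketch: read a "%d, %d" text file in character batches, extend each batch
% to the next end-of-line, then parse the whole batch with one sscanf call.
% Assumed names (not from the original post): demo_batch.txt, readPairsInBatches.

% Write a small demo file so the sketch is self-contained.
fname = 'demo_batch.txt';
fid = fopen(fname, 'w');
fprintf(fid, '%d, %d\n', [1:1000; 1001:2000]);   % one "a, b" pair per line
fclose(fid);

% A deliberately tiny batch size forces many refills; the post suggests 1e5 to 1e6.
pairs = readPairsInBatches(fname, 64);
disp(size(pairs));

function pairs = readPairsInBatches(fname, bufferSize)
    eol = newline;                                % line feed, char(10)
    fid = fopen(fname, 'r');
    cleanup = onCleanup(@() fclose(fid));         % close the file on exit or error
    chunks = {};
    while ~feof(fid)
        % Read one large batch of characters as a row vector.
        batch = fread(fid, bufferSize, 'uint8=>char')';
        % Read single characters until the batch ends on a complete line.
        while ~isempty(batch) && batch(end) ~= eol && ~feof(fid)
            c = fread(fid, 1, 'uint8=>char');
            if isempty(c)
                break;                            % hit end of file mid-line
            end
            batch(end+1) = c; %#ok<AGROW>
        end
        if isempty(batch)
            break;                                % nothing left to parse
        end
        % One vectorized conversion per batch instead of one per line.
        chunks{end+1} = reshape(sscanf(batch, '%d, %d'), 2, [])'; %#ok<AGROW>
    end
    pairs = vertcat(chunks{:});
end
```

Each element of chunks is an N-by-2 matrix for one batch, so per-row processing can happen inside the loop if the full result should not be held in memory.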
READING FILES IN MATLAB CODE
More than half of the original time (68 -> 27 sec) was consumed by inefficiencies in the str2num call, which can be removed by switching to sscanf.

About another 2/3 of the remaining time (27 -> 8 sec) can be saved by using larger batches for both file reading and string to number conversions.

If we are willing to violate rule number three in the original post, another 7/8 of the time can be saved by switching to fully numeric processing. However, some algorithms do not lend themselves to this, so we leave it alone. (Note that the "check" value does not match for the last entry.)

Finally, in direct contradiction to a previous edit of mine within this response, no savings are available by switching to the available cached Java single line readers. In fact, that solution is 2 - 3 times slower than the comparable single line result using native readers.

Sample code for all of the solutions described here is included in this answer.
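To make the str2num versus sscanf point concrete, here is a small hedged sketch that converts the same lines both ways and times each loop. The variable names and the number of lines are illustrative assumptions; the timings will vary by machine and are not meant to reproduce the figures quoted above.

```matlab
% Sketch: per-line conversion with str2num versus sscanf on identical input.
% All names here are illustrative and not taken from the original script.

% Build some "a, b" lines in memory so the comparison excludes file I/O.
n = 2000;
lines = arrayfun(@(k) sprintf('%d, %d', k, 2*k), 1:n, 'UniformOutput', false);

viaStr2num = zeros(n, 2);
viaSscanf  = zeros(n, 2);

% str2num evaluates the text as a MATLAB expression, which is flexible but slow.
tic;
for k = 1:n
    viaStr2num(k, :) = str2num(lines{k}); %#ok<ST2NM>
end
tStr2num = toc;

% sscanf parses against a fixed format and returns a column vector.
tic;
for k = 1:n
    viaSscanf(k, :) = sscanf(lines{k}, '%d, %d')';
end
tSscanf = toc;

fprintf(1, 'str2num: %3.3f sec, sscanf: %3.3f sec\n', tStr2num, tSscanf);
```

Both loops should produce the same matrix, which makes sscanf a drop-in replacement whenever the line format is known in advance.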

I put together a quick script to test out the ingestion speed (and consistency of result) of 6 variations on these themes, including: reading large batches into memory, then sscanf; using a Java single line file reader and sscanf on single lines; and fully batched operations (non-compliant).

