Matlab - Parsing Data Files With Header Content
25 Oct 2007 Daniel Sutoyo
Part 2: Breaking Down the Code
The next piece of code deals with the header information.
for i=1:3
    readin = fgetl(id);
    para(i) = str2num(readin(6:length(readin)));
end
fgetl(id);
If all of your data files have the same number of rows in their header content, you can use a for loop to read the header in. In this case, I have three parameters, n, d, and k, that I need to read in, so the for loop runs 3 times to read in 3 lines. Let’s discuss what happens within this for loop. When i=1 (the first time through the loop), we have:
readin = fgetl(id);
Once this line of code is executed, we get
readin = 'n := 9' (note that readin is a character string, not a number)
We then move to the next line of code:
para(i) = str2num(readin(6:length(readin)));
Once this line of code is executed, we get
para(i) = 9 (note that para(i) is now a numeric value)
Anything you read in with the fgetl function is returned as a string, but what we need in this case is the numerical value. One way to get it is to grab the trailing characters of the string in readin. In this case, the numerical value starts at the 6th character of the string ('n := ' accounts for the first 5). However, your parameter values could have more than one digit. Indexing with 6:length(readin) instead of just 6 ensures that all the digits of each parameter are read, which makes the code more flexible. Finally, the str2num function converts the extracted substring from a string into a number.
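A fixed start index of 6 only works while every parameter name is a single character. As a minimal sketch of a more tolerant variant, assuming each header line contains an '=' followed by the number, you could locate the '=' instead. The sample strings and the variable names headerLines, pos, and vals are made up for illustration.

```matlab
% Hypothetical header lines; only the 'name := value' layout comes from the post.
headerLines = {'n := 9', 'dim := 128'};
vals = zeros(1, numel(headerLines));
for i = 1:numel(headerLines)
    readin = headerLines{i};
    pos = strfind(readin, '=');                     % index of each '=' in the string
    vals(i) = str2double(readin(pos(end)+1:end));   % convert the text after the last '='
end
disp(vals)
```

str2double ignores the leading space after the '=' and returns NaN rather than raising an error when the text is not a number, which makes malformed header lines easy to detect.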
We are ready to read in the numerical data now! The line param p:1 2 3 4 5 := from the input file does not serve much purpose, so we just call fgetl once more to read it in and discard the result. This effectively skips a line of the file. Now we can read in the numerical data with a simple for loop.
for i=1:para(1)
    data(i,:) = str2num(fgetl(id));
end
Recall that para(1) is where I stored the value n, so this for loop runs n times to read in all the data points I measured. This is convenient if your data sets have different numbers of data points. Just like before, I use the fgetl function to read in each line. Since each of these lines contains only numbers, str2num converts the output of fgetl(id) into a numerical row vector, which is stored in data(i,:).
fclose('all')
Lastly, use fclose to release Matlab’s access to your file (fclose('all') closes every file Matlab currently has open). If you don’t do this, Matlab keeps the file ‘open’ and you may not be able to open, move, or delete it in your Windows directory.
I hope this helps people who need to read data into Matlab. I realize your files may come in many different formats, but this should give you a basic foundation for reading more complicated ones.
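Putting the pieces together, here is a self-contained sketch of the whole procedure. The sample file contents are an assumption modeled on the header described above (three 'x := value' lines, one 'param' line, then n rows of numbers), and the temporary file name is generated only for this example. It closes the file with fclose(id) rather than fclose('all'), so that only this one file is affected.

```matlab
% Write a small sample file (hypothetical contents, for illustration only).
fname = [tempname '.txt'];
id = fopen(fname, 'w');
fprintf(id, 'n := 3\nd := 5\nk := 2\n');
fprintf(id, 'param p:1 2 3 4 5 :=\n');
fprintf(id, '1 2 3 4 5\n6 7 8 9 10\n11 12 13 14 15\n');
fclose(id);

% Read it back: header first, then the numerical data.
id = fopen(fname, 'r');
if id == -1
    error('Could not open %s', fname);
end
para = zeros(1, 3);
for i = 1:3
    readin = fgetl(id);
    para(i) = str2double(readin(6:end));   % same substring as 6:length(readin)
end
fgetl(id);                                 % skip the 'param p:...' line

first = str2num(fgetl(id));                % first row tells us the number of columns
data = zeros(para(1), numel(first));       % preallocate n rows
data(1,:) = first;
for i = 2:para(1)
    data(i,:) = str2num(fgetl(id));
end

fclose(id);                                % close only this file
delete(fname);                             % remove the temporary sample file
```

Preallocating data is optional, but it avoids growing the matrix one row at a time when n is large.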
10 Responses to “Matlab - Parsing Data Files With Header Content”
Also, if you don’t know how many rows your numerical data set consists of, you can use a while loop instead of a for loop.
r = 1;
while 1
    readin = fgetl(id);
    if ~ischar(readin), break; end   % fgetl returns -1 at the end of the file
    data(r,:) = str2num(readin);
    r = r+1;
end
I can only say
You Made My day
thanx a lot
Hi,
your matlab help is super useful for me. thank you.
Q: i have a file that has a row of text every 1000 lines of numeric data. looks like that:
text
int1
int2
…
int1000
text
int1
int2
…
int1000
text
etc etc.
is there a way to read the last line of text (it has useful information) and get the numeric data? well, i’m sure there is a way, i’m wondering if you know how to do it.
thank you
for the previous post-
i forgot to mention a few things.
1. the last line of text is not at the very end of the file
2. the original file is a .txt file
Misha,
the examples in the tutorial should lay out the framework you need… Since you already know there are 1000 numeric lines, this makes it straightforward
nblocks = the number of blocks in the file, where each block is one line of text followed by 1000 numeric lines (if you don’t know this, change the outer loop to a while loop that keeps running until no more lines can be read in)
for j=1:nblocks
1. call fgetl to read in text ( I don’t know how many lines or how your text are)
2. Then call 1000 times to read in numeric data
end
You can switch the order around if necessary. If you want your code to be more dynamic, you can always add ‘if’ statements to check whether each line is a text header or numeric data.
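The outline above could be turned into something like the following sketch. It assumes one line of text before each block and uses a block size of 3 instead of 1000 to keep the sample file short. The file contents and the names blockSize, lastText, and values are hypothetical.

```matlab
% Create a tiny sample file: one text line, then blockSize numeric lines, repeated.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'run A\n1\n2\n3\nrun B\n4\n5\n6\n');
fclose(fid);

blockSize = 3;            % 1000 in the question above
lastText = '';            % most recent text line seen
values = [];              % all numeric data, stacked in one column

fid = fopen(fname, 'r');
while true
    txt = fgetl(fid);
    if ~ischar(txt), break; end          % fgetl returns -1 at the end of the file
    lastText = txt;                      % remember this text line
    block = zeros(blockSize, 1);
    for k = 1:blockSize
        block(k) = str2double(fgetl(fid));
    end
    values = [values; block];            % append this block to the data read so far
end
fclose(fid);
delete(fname);
```

When the loop finishes, lastText holds the final text line in the file, even if numeric data follows it.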
hi
thanks for the previous postings…very helpful.
however, i have some data in a .txt file. i don’t know when the data row finish (i.e. i don’t know which row is the last row!) and i have some lines of text in between every (for instance) 10 or 20 rows of data. could you please help me with that?
cheerZ
behzad
behzad–
I’m doing something very similar, where the # of lines of numbers between headers is variable. What I do is use str2double to check if the current line is data or a string. I also put in state variables to keep track of where to put the data, ignore whitespace and empty lines, etc., but this is the core of it.
while ~feof(fid) % go until EOF
    line = fgetl(fid);
    if isnan(str2double(line)) % it’s a text header
        disp(line); % do something with it
    else % else, it’s data
        disp(str2double(line)); % do something
    end
end
Or if you need to input matrices, not just doubles and ints, use str2num:
while ~feof(fid) % go until EOF
    line = fgetl(fid);
    [x, status] = str2num(line);
    if ~status % it’s the text header
        disp(line); % do something with it
    else % else, it’s data
        disp(x); % do something
    end
end
Cheers!
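For readers who want the bookkeeping as well, here is one possible sketch of the state tracking described in the comment above: it skips blank lines, starts a new block at every text header, and appends numeric rows to the current block. The sample file and the names headers and blocks are assumptions made for this example. Note that str2num evaluates its input, so a header that happens to be a valid MATLAB expression (for example pi) would be mistaken for data.

```matlab
% Sample file with two headers, a blank line, and a variable number of data rows.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header one\n1 2\n3 4\n\nheader two\n5 6\n');
fclose(fid);

headers = {};             % text of each header line
blocks = {};              % blocks{k} holds the numeric rows under headers{k}

fid = fopen(fname, 'r');
while true
    txt = fgetl(fid);
    if ~ischar(txt), break; end          % end of file
    txt = strtrim(txt);
    if isempty(txt), continue; end       % ignore blank lines
    [x, ok] = str2num(txt);
    if ok && ~isempty(x)                 % numeric row
        if isempty(blocks)               % data before any header: open an unnamed block
            headers{end+1} = '';
            blocks{end+1} = [];
        end
        blocks{end} = [blocks{end}; x];
    else                                 % text header: start a new block
        headers{end+1} = txt;
        blocks{end+1} = [];
    end
end
fclose(fid);
delete(fname);
```

Appending rows with [blocks{end}; x] requires every row within a block to have the same number of columns.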
Another Parsing Question: I have a text file where the first column of data is a string of the date ('2008-08-01 12:00:00'). I can import the data no problem but then I want to make this array into 6 different numerical arrays. I can’t seem to be able to figure out how to do this. How do I split a text array of time stamps into a year, month, day, hour, min, and sec numerical arrays? Any help will be greatly appreciated.
Hi Ellen
It sounds like you have quite a few dates in that format.
one quick way is to simply use indices
Now if your date format changes for some reason, or not all the values have the same length, you would probably need a regexp, which is a little bit more complicated.
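The index-based approach mentioned above might look like the sketch below. It assumes the time stamps are held in a cell array of strings that all follow the fixed 'yyyy-mm-dd HH:MM:SS' layout. The sample values and variable names are made up for illustration.

```matlab
% Made-up sample data in the format from the question.
stamps = {'2008-08-01 12:00:00'; '2008-08-02 06:30:15'};

c = char(stamps);                          % N-by-19 character matrix (equal-length stamps)
yr = str2double(cellstr(c(:, 1:4)));       % characters 1-4:   year
mo = str2double(cellstr(c(:, 6:7)));       % characters 6-7:   month
dy = str2double(cellstr(c(:, 9:10)));      % characters 9-10:  day
hr = str2double(cellstr(c(:, 12:13)));     % characters 12-13: hour
mi = str2double(cellstr(c(:, 15:16)));     % characters 15-16: minute
sc = str2double(cellstr(c(:, 18:19)));     % characters 18-19: second
```

Each result is an N-by-1 numeric column. MATLAB’s datevec function can also return these six fields from a date string, which may be simpler if the format is one it recognizes.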
Thanks for the tutorial.
Is there any tutorial describing the fileopen-fileclose procedures in basic level?