Raw data (also called text data or similar) is stored in a format that is completely independent from any software and can be edited using a simple text editor. Normally a raw data file contains only data, with no information about the data such as variable names or descriptive labels. Some additional information is therefore needed to create a correct data matrix, namely the format of the data, or more precisely where each value is located in the data file.
This document only covers simple data structures, i.e. rectangular data matrices (variables by observations). For more complex structures see
Data values for all variables for each observation are found in exactly the same column positions (in this example there are obviously two lines for each observation).
CANADA 1CNDA 15 14312 2140633.6 2.06.3 BAHAMAS 1BHMS 35999999 17099.099.09.0 CUBA 1CUBA 27 5033 856599.0 6.17.5
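As a minimal sketch of how fixed column positions translate into a reading step, the following Python snippet slices each line at known positions. The column layout (`LAYOUT`), the variable names and the spacing of the sample lines are assumptions made for illustration only; a real file needs the positions documented for that file.

```python
# Hypothetical layout: (variable name, start, end), 0-based and end-exclusive.
LAYOUT = [
    ("cname", 0, 8),
    ("continent", 8, 9),
    ("cntry", 9, 13),
    ("infmort", 13, 16),
]

def parse_fixed(line, layout=LAYOUT):
    """Slice one fixed-format line into a dict of stripped string values."""
    return {name: line[start:end].strip() for name, start, end in layout}

# Sample lines padded to match the assumed layout above.
lines = [
    "CANADA  1CNDA 15",
    "BAHAMAS 1BHMS 35",
    "CUBA    1CUBA 27",
]

records = [parse_fixed(line) for line in lines]
for record in records:
    print(record)
```

Note that adjacent values such as the continent code and the country code need no separator at all: only the positions tell the reader where one variable ends and the next begins.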
Data values appear on a single line for each observation as a sequence of values (variable sequence), separated by a delimiter such as a comma or a space. Character values may be enclosed in quotes.
CANADA,1,CNDA,15,14312,2140633.6, 1,2.0,6.3
BAHAMAS, 1,BHMS, 35,999999, 17099.0, 99.0,9.0
CUBA,1,CUBA,27, 5033, 856599.0,6.1, 7.5

"CANADA",1,"CNDA",15,14312,2140633.6, 1,2.0,6.3
"BAHAMAS", 1,"BHMS", 35,999999, 17099.0, 99.0,9.0
"CUBA",1,"CUBA",27, 5033, 856599.0,6.1, 7.5

"CANADA" 1 "CNDA" 15 14312 2140633.6 1 2.0 6.3
"BAHAMAS" 1 "BHMS" 35 999999 17099.0 99.0 9.0
"CUBA" 1 "CUBA" 27 5033 856599.0 6.1 7.5
Some software can read a first line that contains a title for each data column (variable names). For spreadsheets this is simply a header that appears at the top of every column; in statistical software it is usually a variable name that, as in the example below, follows strict naming conventions (no spaces, only letters and numbers and possibly some special symbols).
"Cname","Continent","Cntry", "Infmort","Adultpop","TotalPop","ExpGov","ExpMil","ExpEduc"
"CANADA",1,"CNDA",15,14312,2140633.6, 1,2.0,6.3
"BAHAMAS", 1,"BHMS", 35,999999, 17099.0, 99.0,9.0
"CUBA",1,"CUBA",27, 5033, 856599.0,6.1, 7.5
Depending on the software installation, if you choose the .csv extension for the file name, an application such as Excel might be registered to handle this file type automatically, i.e. a double-click on the file name will launch the program. You can tell whether a particular application handles this kind of file by looking at the file icon.
Note that this format is sometimes also used to store tables meant for printing, with titles, totals and subtotals, i.e. not a simple data matrix that you can analyze as such.
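One rough way to spot such a file before analysis is to check whether every line has the same number of values. The Python sketch below does this; the function name `is_rectangular` and both sample texts are made up for illustration, and the check is only a heuristic, since a printed table can happen to be rectangular too.

```python
import csv
import io

def is_rectangular(text, delimiter=","):
    """Return True if all non-empty lines have the same number of fields."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    widths = {len(row) for row in rows}
    return len(widths) <= 1

# Hypothetical samples: a plain data matrix and a table laid out for printing.
matrix_text = "a,b,c\n1,2,3\n4,5,6\n"
report_text = "Sales report\nregion,amount\nnorth,10\nTotal,,10\n"

matrix_ok = is_rectangular(matrix_text)
report_ok = is_rectangular(report_text)
print(matrix_ok, report_ok)
```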
Language-specific conventions might cause trouble, namely