Excel is evil | Advice to doctoral students

Excel is evil

Excel has caused more trouble for more doctoral students than I care to think about. Doctoral students can hurt themselves with Stata in at least two ways (there may be more).

  • Using it to clean, combine and otherwise manage data
  • Cutting and pasting results into Excel (or worse yet, Word) and then formatting them for presentation

Both of these a very inefficient uses of time. The first is a disaster for data integrity, because it is hard to document, almost impossible to revise, and very easy to mess up (sort only have the variables, be one row off when pasting, etc.) I briefly touched on data management in another posting and will probably write more in the future.

The second use of Excel is also prone to mistakes, although they are probably more easily corrected than butchering your data in Excel. Fortunately, there are many better approaches.

Descriptive statistics

There are several packages available to produce correlation matrices and descriptive statistics for publication. I wrote mkcorr several years ago (like most user-written additions to Stata, it can be installed with the command "ssc install xxxxx, replace"). mkcorr produces a correlation table in a format that is easy to import into a spreadsheet or word processing document. So, there is still formatting to do, perhaps even in, shudder, Excel. But, by writing the output directly to a logfile, it avoids two problems with cutting and pasting correlation tables from the results window. First, it allows an effectively unlimited number of variables without wrapping around. Second, it requires less post-processing in a spreadsheet or word-processor, particularly for more involved tables. It also offers a number of small advantages such as allowing the use of labels, controlling the number of decimal places used, and other formatting options. corrtex extends this and allows creation of tables directly in LaTeX.

(A brief aside--LaTeX is a document layout language. It allows for very sophisticated and elegant tables and excels in formatting mathematical formulae. It is also a bit of a throwback to pre-GUI days. If you don’t already use LaTeX, I wouldn’t learn it just as a complement to Stata, particularly in you have co-authors who don’t use it. Worse yet, several journals will only accept submissions in Word. There is a midway point: using LaTex to produce tables, rendering them as PDFs and inserting those into a Word document. See http://www.ats.ucla.edu/stat/stata/latex/default.htm for more information.)

Producing tables of results

Here is where the different between Excel and the right tools is largest. There are several approaches available within Stata.

outreg was the original add-on for producing tables more easily. It produces a tab delimited file with lots of formatting options. This can then be imported easily into a word processing or spreadsheet program, where additional formatting (cell-borders, bolding or italicizing) may be performed if desired. Because of the control it offers and the fact that it is programmable, it beats cutting and pasting.

estout is part of the “second generation” of table producing add-ons for Stata and gives the user tremendous control. I’ve yet to meet a table of results that it can’t produce. It can be used to produce a correlation table, too, but I find the more specialized mkcorr better for this purpose (of course, I wrote mkcorr, so I may not be totally unbaised). One of the great things about estout is that it produce its tables in comma delimited text, fixed format text, rich text format (rtf), html, or LaTeX. Just a little bit of code makes it easy (at least on a Mac) to have a beautiful table open automatically as either a PDF (from LaTex) or Word (from an rtf file). estout’s syntax can be a bit overwhelming, given all its options, so there is also esttab, which is just a “wrapper” for estout, providing a simpler syntax. It’s worth the effort to learn estout/esttab, because once you’ve mastered it’s basics, you’ll use the repeatedly.

Roger Newson has provided a suite of commands that offers a slightly different approach. One saves summary statistics, the results of regressions, etc. in “resultsets” (which are just Stata datafiles) and then uses one or more commands to produce tables from those. My personal preference is for estout, but Newson’s approach has a large and enthusiastic following.

xml_tab takes yet a different approach, producing an xml file that Excel can read directly (as opposed to importing as a comma- or tab-delimited text file). The advantage is that one can programmatically control fonts, cell borders, shading, etc. xml_tab doesn’t seem to work with the Mac version of Excel, so I’ve not had the chance to try it out.