Stata | Advice to doctoral students

Some useful utility programs for Stata

You’ll probably spend a lot of your time doing data management and statistical analysis (which you are doing in Stata, right?). So, small efficiencies in data-related tasks can really pay off in the long run. One way to get those efficiencies is to create small utility programs that automate tasks you perform many, many times.
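
As a rough illustration of what such a utility program might look like (this is a sketch, not one of the programs from the full posting), the block below defines a small command that counts missing values for a list of variables. The name `nmiss` and the returned scalar `r(total)` are hypothetical choices made for this example; the `auto` dataset that ships with Stata is used only so the example is self-contained.

```stata
* Hypothetical utility: count missing values for each variable in a list.
* The program name (nmiss) and the returned r(total) are illustrative only.
capture program drop nmiss
program define nmiss, rclass
    version 13
    syntax [varlist]              // defaults to all variables if none are given
    local total = 0
    foreach v of local varlist {
        quietly count if missing(`v')
        display as text "`v'" _col(20) as result r(N)
        local total = `total' + r(N)
    }
    return scalar total = `total' // total missing values across the list
end

* Example use on a dataset shipped with Stata
sysuse auto, clear
nmiss rep78 price
```

Saving a program like this in an ado-file on your personal ado-path would make it available in every session, which is where the time savings come from.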

Naming variables (particularly in Stata)

A consistent scheme for naming your variables is very helpful. It makes coming back to a project after it’s been under review for 3 months much easier and is especially valuable when collaborating with someone else. This is one of those points where there are bad practices and good practices, but no "right" practice. More important is consistency within (ideally across) projects. So, as a starting point for your consideration, here is what I have developed over time, through lots of trial and error. I think this approach makes it easy to find variables and understand their provenance.

Projects feature of Stata 13

One of the features of Stata 13 is “Projects”, which are meant to provide easier access to multiple files related to a, well, project you are working on. The files can be do files, data, logs, graphs, etc. In fact, they don’t even need to be Stata files. One advantage I have found is that they make it possible to maintain a strict organization of certain types of files going in certain directories, while still having access to all of those files from one pane within Stata.

UPDATE OF "What statistical package should I use?"

Technological progress continues. In an older posting, I mentioned the role of specialized packages that addressed models not available in the general purpose software, such as LISREL for structural equation modeling (SEM). That example is now somewhat moot, as Stata 12 has an extensive SEM capability and new add-ons for R allow modeling of SEMs. I suspect that if I were a power user, I would find limitations in Stata/R relative to the dedicated packages, but at my level, I haven’t found them.

Excel is evil

Excel has caused more trouble for more doctoral students than I care to think about. Doctoral students can hurt themselves with Excel in at least two ways (there may be more).

  • Using it to clean, combine and otherwise manage data
  • Cutting and pasting results into Excel (or worse yet, Word) and then formatting them for presentation

Both of these are very inefficient uses of time. The first is a disaster for data integrity, because it is hard to document, almost impossible to revise, and very easy to mess up (sort only half the variables, be one row off when pasting, etc.). I briefly touched on data management in another posting and will probably write more in the future.

The second use of Excel is also prone to mistakes, although they are probably more easily corrected than butchering your data in Excel. Fortunately, there are many better approaches.

A template for Stata .do files

There are many different approaches to writing and documenting the many steps that go into an empirical project. J. Scott Long has a great book, The Workflow of Data Analysis Using Stata, which I strongly recommend. He recommends developing a series of small, highly focused do files, which are run in sequence as needed. I take a different approach, which is to keep all of a project’s code in one honking large do file, which is divided into sections. This posting provides more details about my approach.
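
To make the single-file idea concrete, here is a minimal skeleton of a do file divided into sections. It is only a sketch under stated assumptions, not the template from the full posting: the file name, the log name, the section switches (`do_import`, `do_clean`, `do_analyze`), and the variable `price_k` are all hypothetical, and the `auto` dataset stands in for real project data.

```stata
* project_master.do  (hypothetical file name)
* One large do file, divided into sections that can be switched on or off.
version 13
clear all
set more off
capture log close
log using "project_master.log", replace text

* Switches (illustrative): 1 = run the section, 0 = skip it
local do_import  = 1
local do_clean   = 1
local do_analyze = 1

*--------------------------------------------------
* Section 1: Import raw data
*--------------------------------------------------
if `do_import' {
    sysuse auto, clear            // stand-in for the project's raw data
}

*--------------------------------------------------
* Section 2: Clean and construct variables
*--------------------------------------------------
if `do_clean' {
    generate price_k = price / 1000
    label variable price_k "Price (thousands of USD)"
}

*--------------------------------------------------
* Section 3: Analysis
*--------------------------------------------------
if `do_analyze' {
    regress price_k mpg weight
}

log close
```

Because the switches are local macros, they only exist while the file is run as a whole; if you prefer to highlight and run one section at a time, global macros or simply commenting sections out may work better.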

What statistical package should I use?

This is an amazingly contentious question. My first answer is "If you are comfortable with a package and it is serving your needs, keep using it." That can be complicated, of course, if you have a co-author dedicated to a given statistics package. If your only need is to pass data back and forth with that co-author, I strongly recommend Stat/Transfer, which can convert from pretty much any statistical format to any other. Another consideration is the package most frequently used in your field.