How to read research papers

Photo by ailatan on Flickr

These last couple of weeks I've been taking my groups of final year project students through the process of starting their literature reviews. There is a separate post on literature reviews on this site here and a post on why you should read academic literature in Computer Science here. This post isn't to do with those topics, this post is about how to read research papers. We often find that if students haven't done much of this sort of reading before their get to their final year getting started can be a bit of a shock. So, this post is designed to help you get started with academic literature and, just as importantly, to help you get the most out of the papers you read in the short space of time you have available (and it is a short space of time, believe me).

Remember the structure of a paper is just like the structure of your thesis

Other posts on this site have discussed the overall structure of your thesis, but in outline this is the sort of structure you should be expecting to produce:

  • Introduction should introduce the reader to your research question and the broad context of the research.
  • Literature review should describe the work that other people have carried out to answer your (or similar) research questions.
  • Method should describe what you did to answer your research question (or to support your thesis, if you think of it that way), and how you went about it. 
  • Results should evaluate what you have done, and say what answer (to your research question) you have arrived at. 
  • Conclusions should summarise what you have done and how you answered the research question.  

Academic writing (in the sciences) of all sorts follows something like this structure, including all of the papers that you will be reading for your project. There are a couple of exceptions to this rule. One is theoretical papers which sometimes put their "related work" (or literature review) somewhere towards the end of the paper rather than after the introduction. The second exception is survey papers. Surveys are extended literature reviews and, as such, are a good place to start in your own literature reviews. ACM Computing Surveys is a journal that publishes survey papers or you can sometimes find them in reputable journals.

Briefly review each paper for relevance

You don't have time to read everything, so it's important to make sure that what you do read is really relevant to your thesis. So, to check whether a paper is likely to be relevant to you first read the Abstract. This should give you a brief summary of the whole paper. So, at the very least the abstract should give you a good idea of what research question the authors were trying to answer. Next, read the Conclusions. This is also likely to be a summary and may well give you a better idea of what results the authors obtained and what work they did not finish but left for "future work". If that doesn't give you a good enough idea of the relevance of the paper to your own work, try reading the last part of the Introduction. This is usually where the authors summarise what is written in each of the following sections of the paper, so that should give you a much more detailed view of what the rest of the paper contains. 

If, after all of that, you think the paper is irrelevant to you, then discard it and move on to something else. Otherwise, you are ready to move on with your reading...

Focus your reading on specific questions

If you just go ahead and read a paper from start to finish the chances are that you won't get very much out of your efforts. You are likely to ramble around the paper, not taking very detailed notes and at the end of your efforts you may not have learned much. A much better way to go about your reading is to keep in mind a number of clear, focussed questions and read the paper with the intention of writing down answers to these questions in your notes. That way you will finish with a clear set of notes that you can be confident will be useful to you when you start writing up.

I would recommend you use this set of questions to guide your reading:

  1. What research question were the authors asking?
  2. Why did the authors believe that their research question was important?
  3. How did the authors go about answering their research question?
  4. What results did the authors obtain or, what did the authors learn from answering their research question?

You can find a template for some notes here.

Making use of your notes

When you have finished reading you should have a stack of notes on all the papers you have read. This should be a much more concise way to start writing up than having a much bigger stack of papers and (most likely) not much memory of what was in them! So, the next thing to do is decide on the structure of your literature review chapter.

The first paragraph of your chapter should introduce the rest of the chapter. This is a good place to remind the reader of your research question and explain how the current chapter relates to it. 

The last paragraph of your chapter should summarise what you have reviewed. This is a good chance to help the reader naviagte around your thesis. Briefly review what you have said in the chapter and refer the reader to the next chapter, explaining how the next chapter follows on from the current one.

The middle part of the chapter is more difficult and, since your writing will depend on your particular research question and the literature you have read, there isn't much generic advice to be given here. However, you can start by reading through your notes and looking for common themes. Think about how best to present the ideas to a reader who has not read the same literature. Do you want to take the reader chronologically through the literature, from the earliest point to the present day? Would it be easier to understand if you split the reading into particular topics that are related? When you have what you think is a good structure, write some section headings into your thesis and think about which papers go in which sections (of course, some papers may well go into several sections). Write the citations into each section using something like EndNote, Mendeley or BibTeX to format them for you. Play around with the structure until you are convinced that it will make sense then write in the details of each section. Make sure you check out this post to help you with your writing.

Why read from primary sources? Or: why reading blog posts is harder, not easier than reading papers

I've been meaning to write this post for a long, long time. Now that I have an enormous pile of marking to get through in double-quick time, I have the perfect excuse for a bit of structured procrastination.

What is a primary source?

primary source, is an original piece of writing, describing some research and written by the person or team who performed that research. A secondary source, is a description or discussion of a piece of research by someone who has read about the research, but did not carry it out themselves. So, if an academic performs an experiment and writes it up as a journal paper, that paper is a primary source. If another researcher then quotes the paper and cites it in one of their papers, then that is a secondary source. Newspaper articles, magazine articles, wikipedia, and most websites and blog pages are secondary sources. When it comes to scientific research, only writing published in peer-reviewed conferences, journals, books and magazines constitutes a primary source.

What is peer-review and why does it matter?

Even if a paper is a primary source describing some research, that doesn't guarantee that the research is rigorous, reliable and high-quality. To ensure that all academic writing meets basic standards of quality assurance, scientists use peer-review. This means that a number of professional scientists (usually two or more) will read through the work carefully, and critique it before it is published. If the work is of very poor quality, or very badly written, it will be rejected and the authors will have to re-write their papers and try to publish them elsewhere. If the work is of a high enough standard to publish, the authors will be given a list of improvements they must make before the paper goes to print. This way, we ensure that inaccurate, incorrect, or incomprehensible work doesn't get published in high quality conferences and journals.

Why read primary sources?

Students often complain about making the leap from reading textbook-style prose to formal, academic research literature. Part of the problem is that the style of writing is different, and takes some getting used to. More deeply, though, students today have likely grown up with the web and with reading informal, secondary sources, making the change is hard work, and nerve-wrecking for some. Why waste hours wading through pages and pages of long-winded, complicated, weirdly-written prose, when you can read a quick, accessible summary on Wikipedia? Well, of course Wikipedia is a good place to start to get a basic overview of an area and help your understanding of the primary sources you are reading. However, it is absolutely essential to read the primary sources themselves. Why? 

Reason 1: secondary sources editorialise

A secondary source will describe some parts of the primary source, but not others. The secondary source will take a particular point of view (i.e. the author will voice their own opinion) and will pick the parts of the primary source that are useful for that discussion. This doesn't necessarily mean that the secondary source is particularly biased (although it might be), it's more that secondary sources are selective in what they discuss. For example, if a paper on Web2.0 discusses the implementation, performance and usability of Web2.0 sites, a secondary source on the subject of usability is likely to leave out any mention of implementation and performance. So, by reading secondary sources you miss out on a lot of the detail of the original work and much of that detail may be very important to you and your work.

It is probably worth saying that there is an important exception to this: survey papers. A good survey paper should be like an extended literature review that discusses, in some detail, the literature available in a broad area of Comptuer Science. These survey papers are a good place to start when writing your own literature review. You can usually find survey papers in well established journals, or specialist survey journals such as ACM Computing Surveys.

Reason 2: secondary sources are sometimes wrong

Every academic field has a number of ideas which are passed on from one generation to the next with little reference back to the original research that generated those ideas. Be somewhat skeptical about this, most of the time there are good reasons to feel assured that this knowledge is sound, especially in fields where mathematical proof is the main way of advancing the field. However, in more subjective or experimental fields (such as Software Engineering or Usability) results can sometimes be misunderstood or misinterpreted over the years.

An example of this is Winston Royce's "Waterfall Method" which (as you probably already know) is a method for organising and planning large programming projects. The central idea in Royce (1970) is very simple and easy to understand: you split the work into a number of different "phases" (requirements gathering, analysis, design, coding, testing, maintenance) and your team performs each phase in turn. There's even a nice image to go with the idea, just to make it nice and easy to understand:

200px-waterfall_model_1

Image source: Wikipedia

For many people, this is where their understanding of the waterfall model stops. But in Royce's original paper there is a long discussion of the drawbacks of organising a project in this manner In fact Royce says that it is "risky and invites failure" (pp. 329). Moving on, the majority of Royce's paper is a list of changes to the sequential model which make it more workable. Some of these are of particular interest, for example "plan testing" is a step that Royce advocates should go with program design. In modern, more "agile" development methods we would advocate writing unit tests around this time, so Royce is presenting a very modern approach. The last modification Royce makes is to "involve the customer" at several points in the process. Again, a much more modern approach that many authors would say goes with agile or "eXtreme" development methods.

The picture Royce paints is not a simple sequential model at all, it's much more complicated than that. Tarmo Toikkanen has written an interesting blog post on this subject. He speculates that the reason people advocate for the basic waterfall method is that the diagram and analogy make it very easy to understand, so people don't delve any deeper into the details. In fact, Toikkanen points out that NATO even have a military standard (DOD-STD-2167) based on Royce's work. [Aside: If you wanted to test Toikkanen's idea that it's the diagram in Royce's paper that leads to the misunderstanding, what experiment would you devise to test that idea?]

More parochially, we often see University students writing something like "in my project I will use the Waterfall Method" sometimes even with a citation. DON'T DO THIS! Read Royce (1970) in full, understand what he's really arguing for, then use a more modern method, or at least use Royce's iterative method found at the end of the paper.

Reason 3: different primary sources may disagree

Research is all about creating and discovering new ideas. Very often primary sources disagree on how best to do that, or they have competing ideas and only through years of research and discussion does a consensus evolve. There are examples of this throughout the history of science. Whether it's the flat Earth debate, Big Bang vs. the Steady State theory, structured programming vs. object oriented programming, through debate, reason, mathematical arguments, prototype systems, models, simulation and all sorts of other techniques, the history of science is full of arguments and competing ideas.

When you read a secondary source, very often whatever "debate" has taken place is already in the past and the author of the secondary source will simply describe the consensus that has since been reached. For example, there were many good reasons for cosmologists to believe in the steady state theory before evidence for the Big Bang became overwhelming. Only by going back to this litereature can we see how the debate unfolded and why the evidence that supported the Big Bang (to do with background microwave radiation in the cosmos which was discovered in the 1960s) was so convincing.

In Computer Science there are also many of these debates. For example, most programming languages do not have a "goto" statement. In fact, Java has a keyword called "goto", but it is not used. In the late 1960s and 70s there was a heated debate about whether "goto" was a safe and useful construct and you can read through that debate in Dijkstra (1968), Knuth (1974) and plenty of other sources. Without going back to these papers, which were written well before the debate was settled, you cannot fully understand the arguments that, eventually, banished the "goto" statement from most modern languages.

Conclusions: reading blog posts is harder than reading papers

So, why did I say in the title of this post that reading blog posts is "harder" than reading papers? Actually reading blog posts may be easier, but in terms of getting a good grade in your project you are unlikely to produce a high quality literature review based on blog posts. Blogs will tend to be selective and biased in their nature. This isn't a criticism of blogs, far from it, blogs are a great place for lively debates. They aren't such a great place to descibe careful, peer-reviewed research in great detail -- that's best left for conferences and journals.

References

Edsger W. Dijkstra (1968) Letters to the editor: Goto statement considered harmful. Communications of the ACM 11, 3 (March 1968), 147-148. DOI=10.1145/362929.362947 http://doi.acm.org/10.1145/362929.362947

Donald E. Knuth (1974). Structured Programming with goto Statements. ACM Computing Surveys 6, 4 (December 1974), 261-301. DOI=10.1145/356635.356640 http://doi.acm.org/10.1145/356635.356640

Royce, W. Winston (1970), Managing the development of large software systems: concepts and techniques In proceedings of IEEE WESTCON, Los Angeles , 1--9 .

How to write a literature review for your final year thesis project



A long time ago I wrote an article on how to pass your final year thesis project, which several students found helpful. In the same vein, this post deals with a particular aspect of the final year project: the literature review.

Every year I supervise projects I find students tend to ask the same questions about their literature review. Most common are:

  • How many papers should I read?
  • How long should the literature review be?
  • Should I read books, articles, or ...?
  • Is it OK to reference websites such as Wikipedia?
  • Who will read my literature review and what can I assume about their knowledge of the area?
  • When should I start the literature review and when should it be finished?


These questions crop up frequently and will be familiar to any readers who are starting their own project. However, when you fully understand the purpose of the literature and how to go about writing one, you begin to realise that these questions are actually not that important. This post is designed to help students make that transition, from not yet understanding what the literature review is for, to having a thorough understanding of its purpose and a clear idea of how to write it up.

The meaning of a final year project

A lot of students starting off their projects talk about writing a "report" at the end of their "project" and doing some "research" as part of their literature review. This is where the subtleties of the English language tend to cause a lot of confusion. "Research" can have many different meanings, and to complete a really successful final year project the first thing to do it to understand fully what is expected of you in an academic context. Even if your final year at University is the last experience you have of academic work, remember that your work will be marked according to academic criteria, and academia has quite different aims to industry.

So, to be clear: research, in an academic context, means adding something new to the body of knowledge that humans have gathered in your area of interest. Your project as a whole will be a piece of research because you will be creating something new that has not been created before. What that is exactly will depend on the field you are studying. It might be a new perspective on a piece of literature, a new proof of a theorem, a new application of a particular technology, or something else. Since you are still an undergraduate it is likely (although not necessary) that your work will be a small step forward. It is unlikely that you will produce something completely ground breaking, so don't be intimidated by fact that your work has to be novel. That said, it may be that you produce an excellent piece of work and your supervisor may want to turn that into a technical report or conference paper with you, which would be great for your CV (or resume).

thesis is a statement of belief that is central to your research. Your dissertation will be a piece of writing that defends your thesis, based on your research. So, for example, if your thesis is regular, online tests help University students to learn new material then you will need to implement some sort of online tests for new material, design and run an experiment to test your thesis, and write it up in your dissertation. Equally, if your thesis is water causes cancer in mice then you will need to plan and run an experiment to determine whether or not this is true and write it up. Notice that you may disprove your thesis in your work. It may be that online tests do not help students learn, or that water doesn't cause cancer in mice. This is absolutely fine, so long as your experiments give a clear answer to the question and you can show that your experiments were performed fairly it doesn't matter whether your thesis turns out to be incorrect or correct (in so far as you have tested it). It may also be that your evaluation is inconclusive, which is also acceptable, so long as your experimental method is good and you can say exactly what further work is necessary to produce a definite result, you will be fine.

Alternatively, you might phrase your thesis as a research question. In which case, instead of having a thesis such as water causes cancer in mice you would ask the question does water cause cancer in mice and your dissertation would describe your efforts to answer that question.

The shape of your dissertation and where the literature fits in

Every dissertation is slightly different, but good dissertations will all contain the same elements. I should say that the advice given in this sections is likely only relevant to science based projects. If you are working in the arts or some areas in the humanities then the expectations of you may well be very different. Still, a good dissertation in the sciences will contain roughly the elements listed below. I say "roughly" because, depending on the exact nature of your work, it may be sensible to expand some sections into two chapters rather than one, or to coalesce some elements into a single chapter. Your supervisor can give you more specific advice on this.

  • Introduction: should introduce the reader to the broad context of the research and explain why this is an interesting area to work in. So, if your thesis is something to do with mobile computing, you might say something here about why mobile phones are important, why mobile computing is an interesting and important area, and broadly what other researchers are working on. At the end of the chapter you will want to introduce your specific research question, having said why the area you are working in (and therefore your question) is important.
  • Literature review: Now you have introduced the reader (who will likely not be an expert in your exact area) to the broad research agenda in the field, and your research question, you can start writing more specifically about your own project. In this chapter you will survey the work that other researchers have done to answer your research question, or related questions. At the end of the chapter you should briefly explain how your own work builds on and differs from the work that has gone before it.
  • Method: this chapter should describe what you did to answer your research question (or to support your thesis, if you think of it that way), and how you went about it. You should describe your work in sufficient detail that another researcher could recreate your work to check your results.
  • Evaluation: here, you should evaluate what you have done, and say what answer (to your research question) you have arrived at. It may be that in your method you describe some experiments, and this section records your results and analysis of those results. This is an important section -- most students gain or lose marks in either their literature review or evaluation. Key to producing a convincing evaluation is to plan very early in the project what information you will need to write this section. More on that in another blog post.
  • Conclusions: should summarise what you have done and how you answered the research question. It may be that your work produced a very clear answer to the question, or it may be that your work points to a need for further research to clarify or confirm your answer. You should refer back to the literature review and summarise how your research differs from (hopefully improves on) the work described in the literature. Make sure you also say what research you would do if you were to continue working on your project.
  • References: a list of publications cited in the main text, in Harvard style or similar format.

It is likely that most chapters will be roughly the same size, although the introductory chapter and conclusions are usually slightly shorter than the others. Try to let the lengths of each chapter be guided by the amount of useful and important information you have to convey to the reader, don't impose artificial word limits on yourself.

Summaries and synthesis: what should go in your literature review

Poor literature reviews often take the same form -- they tend to be a (usually short) list of papers that the student has read, briefly summarised. This is not really what is expected and will not gain high marks. Another common mistake is to review literature that has been used to inform some part of the practical work of the project, rather than to review work that has answered the same or related research questions. To do better, your writing needs to not only summarise the prior art in your area, but also synthesise what is in the literature. In fact, synthesis is one of the key skills that we expect to see from final year students.

So, what is synthesis? The main idea is that you should have understood the literature you have read and, more importantly, you should show that you understand the relationships between items of literature. That means what came first in your field, how it influenced later work, how each step forward in the research improved upon what came before it, and so on. Ideally, you will present your own view of the work you are describing. This partly means that you should be critical of the literature you read, and say where the shortcomings of the work are, and how the work could be improved upon (in particular how your work will improve on the prior art). Also, you might have your own view on where your area of research is likely to go in the future.

Critiquing the work of others is something that is often new to students in their final year. One of the most frequent mistakes I see from students is to criticise the style of the papers they read, rather than the research that those papers describe. Avoid writing things like "this paper is not well written" or "this paper is hard to understand". A literature review should really be a review of the research work that has gone before you, not a literary criticism of the style that other authors adopt.

Practical matters: how to start, how to finish and how to do the bit in the middle

Reading and understanding the work of others is a lifetimes work for professional researchers, it is not something that starts and stops on particular dates, according to a Gantt chart. Your final year project will have a hand in deadline, so you need to be a little more circumscribed about how work. 

Ideally, you should be reading some literature in your field very early on in your project, to help you choose a good topic and write an initial research proposal. I would suggest that you consider this to be the starting point for your literature review and keep on reading and adding to your writing all the way through your project until you hand in your final dissertation. To do this, I suggest you do two things. Firstly, keep a careful log of what you read in a format you find easy to work with. This might be a a log book on paper, or it might be a file on your computer, whatever works best for you. Each entry should include the title of the work you have read and enough information about the authors and so on that you can find the publication again. You should then summarise the work in the paper, including the research question answered by the work, the nature of the answer and the methodology of the research (i.e. what the authors actually did). This will give you a couple of advantages -- you won't forget anything you read because you have a record of it, your literature review can be written from your notes and if you are asked any awkward questions in a viva you can refer back to your log. Secondly, as soon as you feel you have read enough to understand the broad context of the literature, start writing it up formally as your second dissertation chapter. This should really be within two or three months of your starting date. Then, as the project progresses and you read more, you can integrate your new reading into your already drafted chapter. This might seem like a lot of effort early on, but when you come to write up your work, you will be incredibly grateful that your most time consuming chapter has already been written, when it was fresh in your mind, leaving you free to write up the later sections of your work.

An example of good writing

Now you know what a literature review is for, how it fits into your dissertation and how to go about writing your own, it would probably be useful to see an example of (part of) an example review. The paragraphs below give such an example and the text [in italics] is some commentary to explain how each part of the writing contributes towards the start of a good thesis chapter. When you read this, don't worry too much about the subject matter, just try to concentrate on the style of writing and the structure of the text. 

The area of pervasive, or ubiquitous, computing was founded by Wieser (1991) [ referenced] who predicted that computers would one day be integrated into everyday objects and interact with people seamlessly. Although few such products are available today Weiser’s work has led to the creation of a number of research areas, including ambient intelligence (Eli and Epstein 1998), smart dust (Khan et al, 1999) and the Internet of Things (Brickley et al, 2001). [Sets the historical context of the area and defines related areas.]

An early application of pervasive computing was the active badge location system, described by Want et al (1992), in which users and objects were tagged with an "active" badge which could locate and identify them. This system was based on ultrasound locationing, whereas later systems might use RFID technology to achieve the same effect. [describes how the field has changed over time] Uses of the active badge system included routing phone calls, email alerts and so on to the physical location of the receiver. [contextualises the fundamental research]
 
...