- Posts tagged hypothesis
- Explore hypothesis on posterous
How to evaluate your project
This post is all about how to evaluate your project. This is something you will do ideally about half way through your work, but more realistically towards the end of your work, just before you write up. However, it's imporant that you plan your evaluation early on, otherwise you run the risk of getting almost to the end of your work and finding that your evaluation process isn't going to give you any good results. So, the main thing you need to understand when you are planning is how evaluation works in the sciences and how to apply that understanding to your own projects.
Proof
Sometimes we see student dissertations which make claims like "this experiment has proved that ...". Almost always the student has used the word "proof" incorrectly, and will lose marks because they have given the impression that they have not understood the contribution that their work has made to the field they are working in. Very often students in this situation simply haven't understood how scientists use the word "proof".
Proof, in a scientific context, is a mathematical argument that is used to convince other mathematicians or scientists that a theorem (or a mathematical idea) is true. Proofs must never involve evidence or experiments, only arguments. There's an example proof of a simple theorem at the end of this post.
Once mathematicians are convinced that a proof is correct (and sometimes that is difficult in itself, if the proof is several hundred pages long) then it is irrifutable. This is very different to the sort of science that is advanced by experiments, where another scientist can find new data or eveidence that shows that an old idea was wrong.
So, we generally say that a theorem can be proved correct whereas a hypothesis (or guess!) can only be tested via experiments. A hypothesis might turn out to be wrong if experimental data cannot be found to support the hypothesis, or contradictary evidence is found. If a lot of evidence is found to support a hypothesis we might call it a theory. Even so, a theory cannot be proved correct in all cases. For example, if you came up with a theory that said all atoms have a particular shape, you might invent a special microscope to look at atoms and find out if they have your shape. This would provide some evidence to support your theory. You couldn't, however, test every single atom in the Universe, so your hypothesis might well become a theory, but it can never be "proved" correct.
[Aside: there's a long literature in this sort of philosophy of science. If you are really interested, read Karl Popper on Falsification, AJ Ayer on Verification and Paul Feyerabend on scientific revolutions and Imre Lakatos on Proof.]
Scientific method
Scientific method is the way that scientists decide whether a particular hypothesis (or guess) is likely to be a good model for the way the world works. If most scientists accept that the hypothesis is likely to be true, then we call it a theory. Of course, even theories have limitations, and it may be that as more experiments are carried out we find that a different theory fits the evidence better, or that the theory only works in certain circumstances. This is exactly what happened in physics to Newton's laws of motion. It turns out that Newton's laws describe the world pretty well in most cases, they can certainly tell you when your train is likely to arrive at its destination. For other circumstances, for example when you are travelling very fast, close to the speed of light, or for very small particles like quarks, other theories (like Einstein's theories or quantum mechanics) better fit the data we have gathered. Of course, much of this work in areas like physics is driven by what we can measure and observe. Better telescopes mean better theories of cosmology, and so on.
In computer science we also have hypotheses that we can test. For example "functional programming languages can run just as efficiently as imperative languages", "online learning increases student engagement", "objects and inheritance improve code reuse in software companies", and so on.
To be a true hypothesis, and not just the opinion of the author, a statement must be refutable, that is, it must be possible for experiments to determine that the hypothesis is incorrect. The opposite statement to a hypothesis is called an alternate hypothesis. Examples for the hypotheses listed above would be "functional languages are necessarily slower (or faster!) than imperative ones", "online learning has no effect on student engagement" and "objects and inheritance have no effect on code reuse in software companies".
So, to evaluate your own research questions, you need to do the following:
- Devise a hypothesis.
- Form your alternative hypothesis.
- Plan an experiment that tests whether the hypothesis or the alternative hypothesis is true.
- Conduct your experiment.
- Analyse the results of your experiments.
- If the results are conclusive, STOP. Else, re-run the experiments, or devise a better experiment and repeat.
In a student project, you may not have time to repeat your experiments, especially if they involve people, but you should design your evaluation in such a way that this would be possible, were you to continue the work.
About experiments
A good experiment should test one variable and one variable only. So, if your hypothesis is "neural network algorithms run faster in C than C++" then you will probably want to implement some neural network algorithms in both languages. You should make sure that the programs are as similar as possible, except for the language you are using. If you implement slightly different algorithms, it may be the algorithm and not the language which is causing any change in performance you observe. In this case, the programming language is called the independent variable and the algorithms are called the controlled variable and the speed is the dependent variable which is being measured.
Bad science? The case of Usability tests
Many students undertaking projects who have developed a web application, web site or content management system (often in response to a client brief) ask about the most suitable evaluation method for their project. In these cases precisely what to evaluate is often less clear cut than an experiment with an independent variable and a controlled variable (as in the "neural network algorithms run faster in C than C++" example above).
In order to settle on an evaluation method for such cases it is often necessary to return to the client and question them about their goals for the application you have built for them. The first question you need to ask is “Can my application be evaluated by automatic means?” i.e. for applications that are evaluated for technical performance (network performance, speed of execution, resource efficiency, memory performance, robustness against attack, etc) the answer is usually 'yes it can' and your evaluation may consist of running a number of automated performance tests and collating the results into comparison tables.
However, in cases where the client might be interested how humans (users) interact with the application and 'perform better' because of it, the evaluation solution is usually some form of 'user based test'. Your client may be interested in the performance of the application you have built in supporting users to achieve their goals. There are a number of parameters that could be tested:
Effectiveness: Can users actually perform and complete the (desired) specified task?
Efficiency: Can users do it quickly, without getting bored or frustrated?
Satisfaction: Is it fun, or at least pleasant to use?
Learnability: Can users 'pick up' the application without reaching for a manual or asking for help? Does it support learning? (see ISO 9241 section 11)
Different applications will have different emphases in terms of what you need to evaluate. Games, for example, need to be satisfying or challenging most of all. Terminal applications for call centres need to be efficient most of all. Most kinds of 'stand-alone' technology (Car park ticket machines, vending machines, ATM machines) need to be learnable most of all. All applications need to be effective.
At this point it is important to stress that working with real users introduces a large number of uncontrolled variables that cannot easily be 'designed out' of any user test you may undertake. The fundamental question about the 'scientific validity' of usability tests (i.e. whether you are 'really' evaluating user performance rather than the application's performance in the test) is very difficult (but not impossible) to answer.
There are two specific limitations with usability test data.
The reliability limitation: Is the data reliable?
No, if you are testing users who are not typical of the true intended user group, or if there are significant individual variations within the test group (this is made worse by small sample sizes*)
The validity limitation: Are the conditions under which the data was recorded reproducible?
No, if the test is run differently each time (users briefed differently, different equipment used, not quite the same test questions asked)
As Jeffrey Rubin points out [usability] “testing is always artificial” (Rubin, 1994, p.27). [Aside: There is a fascinating and long running debate on the scientific status of usability between Jakob Nielsen and Rolf Molich, particularly in relation to the small sample sizes championed by Nielsen. See here or here]
There is some good news about usability testing, however. Often it is entirely unnecessary to worry about the number of uncontrolled variables in a usability test because you are looking forindicators rather than proofs. In fact, if you think of a usability test as a 'design tool' rather than an 'experimental tool' you are closer to the way tests are used in the commercial world. This does not absolve you of the responsibility for designing and recording a test in which you have addressed the reliability and validity limitations OR that you have set appropriate benchmark metrics for judging the success (or not) of the application in supporting effectiveness, efficiency, satisfaction or Learnability. It does mean, however, that usability tests can, if designed properly, be a very good evaluation method for your project.
Mind your language: 'user friendly' and 'significant'.
There are two phrases you need to be very careful about using in your test reports.
User-friendly. Never use this phrase! Software cannot be friendly, it is not (yet) sentient. This is called ANTHROPOMORPHISM and should be avoided. To claim that an interface is 'user friendly' is also subjective and not testable. To claim, however, that an interface is USABLE is more sustainable as long as we measure against some predetermined metrics.
Significant. In normal English, "significant" means 'important', while in Statistics "significant" means not due to chance. For example, if I answer all questions in a multiple-choice quiz randomly, sometimes I will still pass the test. However, the number of times that I pass the test as a percentage of the number of tries I have in total should show that passing the test a few times just happened "by chance" and was therefore "not significant" in statistical terms. A research finding may be true without being important. When statisticians say a result is "highly significant" they mean it is very unlikely that the result happened by chance. They do not (necessarily) mean it is highly important. Be careful when you claim to have found 'significant' results.
(http://www.surveysystem.com/signif.htm)
Interpreting your results: correlation does not imply causation
Correlation by xkcd
When you perform an experiment, you are hoping that the outcome will lend some evidence to either your hypothesis or your alternate hypothesis. Going back to the example above, they hypothesis "neural network algorithms run faster in C than C++" has an alternate hypothesis "neural network algorithms run no faster in C than in C++". If we run an experiment to test this, and assume it's a fair experiment, and the results are that all our algorithms run faster in C, what has this told us? A naive answer would be that the experiments have confirmed the hypothesis that C is the faster language for this sort of algorithm. A more subtle answer would be that efficient neural networks are correlated with neural networks written in C. That means that when the algorithm is written in C it's likely to run quickly, which is what the experiment reported. This does not necessarily mean that the algorithms implemented in C ran quickly because they were written in C, it may be that there was some other factor involved that the experiment didn't effectively control.
In experimental work it is very important to understand this subtle distinction, otherwise you can easily fool yourself into believing that your experiments have discovered something far more conclusive than is actually possible.
To give you a better idea of how this distinction between correlation and causation works, below are some examples of incorrect conclusions drawn from perfectly reasonable correlations. See if you can work out why the conclusions are unreasonable:
- Children with bigger feet have higher reading ages. Therefore, people with bigger feet are more intelligent.
- Teenagers who text late at night have poor motivation in class (see news reports here). Therefore, using mobile phones leads to poor performance in class (see also a more skeptical analysis here).
- In the last 150 years there has been a dramatic increase in the number of people who report being abducted by aliens. There has also been a trend towards global warming. Therefore, alien abductions cause global warming.
In your own work, just be honest and straight forward about your results. If they aren't conclusive then say so and demonstrate your understanding by describing what future work could be done to gather more data.
Some basic dos and don'ts
This is some more specific advice, based on good and bad practice we have seen from students over the years:
- DO be clear and honest about what results your evaluation has obtained.
- DON'T claim to have "prooven" anything if you haven't written a formal, mathematical proof.
- DO use an appropriate experiment for your hypothesis. For example, if your work is about evaluating the performance or security of a technique, there is no need to involve real users in your evaluation. If your hypothesis is about usability you really must involve real users.
- DON'T use questionnaires unless you can guarentee to get a large sample size of answers (always well above thirty) and you understand the statistics needed to analyse the results. If you are in any doubt at all about this then seek the advice of a qualified statistician before you start your project. If you can't do that, think about using an alternative evaluation method such as semi-structured interviews.
Appendix
Example proof: The square root of 2 cannot be written as a fraction of whole numbers
Theorem
The square root of 2 cannot be written as a fraction of two whole numbers. (This is sometimes called the Theorem of Theaetetus)
Proof (by contradiction)
Imagine we could write the square root of 2 as a fraction of two whole numbers, say x/y where x and y are integers.
Let's say that x and y don't have any factors in common, so x/y is already written in its simplest form and no numbers can be "cancelled out" of the fraction.
So, we can also say that (x/y)*(x/y)=2
Therefore (x*x)/(y*y)=2
Therefore (x*x)=2*(y*y)
So we now know that x*x is even, since x is 2 times another number.
Since x*x is even, we also know that x is even (by the "Lemma" or little theorem that squares of odd numbers are never even).
Therefore, there must be a number, which we'll call z such that x=2*z
So, (2*z)*(2*z)=2*(y*y)
Or, more simply, 2*z*z=y*y
y must also be even, by the same argument that we used to say that x is even.
If y is also even, there must be some number, which we'll call w such that y=2*w
But if x/y=2z/2w then the fraction x/y was not in its simplest form like we assumed above.
This contradicts our initial assumptions, which must have been wrong.
So, the square root of 2 cannot be written as a fraction of whole numbers.
How to pass your final year thesis project
I've been thinking for a while that it would be good to distill some of the advice that colleagues and I give to students doing final year thesis project, so here it is -- enjoy!Think of yourself as an academic.Whatever you want to do when you leave University, your work will be written for academics and marked by them. A thesis project is like a small research project and to get the best marks available you should run your project with that in mind. So, you need to perform a thorough literature survey, follow a sound methodology, critically analyse your results and so on.Have a hypothesis (or a thesis statement or a research question).A lot of students approach their projects by saying "I want to do this", like "I want to invent a Foo, written in VB". This is not the best way to get good marks. Your project needs to have some sort of clear purpose and in the academic world it's best to state that purpose as a hypothesis, thesis statement or research question. These three are really just different ways of stating the same thing and which you choose is just a matter of taste. So, if your project is a study on the ratio of smokers which develop cancer (e.g. "I want to do some statistical analysis on smoking") then you might phrase that in one of these ways:
- Smoking is correlated with cancer -- hypothesis
- Smoking is correlated with cancer -- thesis statement
- Is smoking correlated with cancer? -- research question
Or, perhaps you want to design a new GUI widget to replace list boxes:
- Users will find WidgetFoo easier to use than ListBoxes -- hypothesis
- Users will find WidgetFoo easier to use than ListBoxes -- thesis statement
- Is WidgetFoo easier to use than a ListBox? -- research question
Part of the point of phrasing your work like this is that it should change the way you think about the work. You now have a very clear goal to reach -- to prove your hypothesis / thesis statement or to answer your research question. Everything you do in your project should now be directed towards this one goal.
Also, you have reduced your risk of failure. What if WidgetFoo turns out to be rubbish? Well, then you have still answered the research question or disproved the hypothesis -- you've got a result. In your write-up you can probably say a lot about why WidgetFoo wasn't as good as you thought, how you achieved your results and how other researchers can use your work to invent even better widgets.Produce something useful to others.So, you've got a hypothesis to answer. Great. But your teachers will still want to be convinced that the hypothesis you've chosen is worth six months of your time and lots of hours of their time. How do you know if your work is useful? Firstly, make sure you've read the relevant literature. Use Google, go to the library, talk to potential users. There should be a group of people in the world who have a clear reason to be interested in what you're doing. That might be a section of the research community, a user group, a company, whoever -- but there must be someone who wants your hypothesis validated or disproved. You should convince whoever's marking your thesis that this group of people exists by explaining in your thesis what the current literature says about the area you are working in and how your work contributes to this area.Do something (a bit) novel.
You're not expected to win a Nobel Prize on the back of your undergraduate work. However, there's no point in doing something that everyone has done a million times before. The worst example of this in Computer Science is a project which is basically "I'm going to write a website with a database". Usually the website is for a friend or relative and the database keeps track of users. There's nothing wrong with that if there's some real novelty in it (e.g. you've just invented a new sort of database and this is a demonstrator for it). But often this is something that most first years could complete for a coursework (so, it isn't stretching you) there are simple, free tools that can do the job for you (it's not novel) and the user is bogus (noone's interested in it). Choose wisely!
Have clear success / failure criteria.This should be taken care of when you write your hypothesis (or thesis statement or research question). However, it's important to know what will constitute a "successful" project for you. What results do you want to end up with? Once you know that, you can choose the appropriate methods to use in your work (e.g. will you be running a focus group? Writing a questionnaire? Writing some programs -- and testing them somehow?). You can also think about reducing the risks in your project. What if you don't finish part of your work on time? Is there some catch-up time in your schedule? What if early tests go badly? Is there time to repeat them?
Don't change the world.
Don't choose a project which is just far too big. If you're going to invent a new computer, writing an OS from scratch and a windowing environment for it is not a one-person, six-month, part-time job. Keep it small and give yourself some scope for extending the project if it goes better than you thought.Even if you've chosen something small with a clear hypothesis, once you've written out a project schedule it will still feel like far too much to do. You will feel overwhelmed. The trick to reducing your fear early on (and making sure you work to schedule) is to break down your work into small tasks. Each task should have it's own goal and deliverables with it's own success criteria and a deadline. So, if your hypothesis is "WidgetFoo is easier to use than ListBoxes", your sub-goals might be:
- Produce a literature survey on list widgets.
- Write comprehensive unit tests for WidgetFoo in the Gtk widget set.
- Implement a WidgetFoo in the Gtk widget set.
- Debug (goal is to pass all unit tests).
- Devise usability experiments (deliverable: methodology document).
- Write application software for testing in the experiments.
- Perform experiments.
- Analyse data.
- Write-up thesis.
Note that these tasks are dependent on one another. WidgetFoo cannot be written before you know how to test it. The experiments cannot be run until you have designed them and decided how to analyse the data they will produce.
Your new list will make life a lot easier, but you will probably still feel that it's too much to handle. To feel better about your work and motivate yourself, it's a good idea to take each of your sub-goals and write out the next physical action you need to perform to carry out that task. So, for the literature survey the next action might be "Google for 'list widget'". For the unit testing your next action might be "Find documentation about the UnitFoo testing framework". And so on. Most people find action lists much more motivating than task lists. Scroll down to see more stuff on productivity and where to keep your lists!Choose a project that will maintain your interest.
Six months is a long time. If you choose a boring project (maybe you think it'll be "easy") then you'll quickly lose interest, get bored, stop working and possibly fail. In some ways, it's good to choose a project with a lot of scope (so you can change direction a bit and still address your hypothesis) in an active area of research (so there's lots of work to build on). On the other hand, some people would find that sort of project unfocused and confusing.When you plan your project, think hard about what motivates you. Is it reaching towards a really interesting goal, or the fear of failing really badly? Either way (or both) you need to keep that motivation in mind whenever you're working. So, choose something interesting to do, give yourself at least a little scope for changing the details of the project over the year and keep in mind why you want to succeed (or not fail!).
Don't be too dependent on clients.If you have a client to work for, especially one from industry, be careful about how you plan your project. If you are expecting resources from your client what will you do if they don't turn up on time? If you want the client to test your work, what do you do if they get a major order in when you're ready to test and noone has the time to help you out? What if the client goes out of business? What if you have to sign a non-disclosure agreement -- can the University still mark your work?Having a "real" industrial client can be a big bonus. For one thing it's very easy to say that someone is really interested in what you're doing for the project. However, you need to make sure that if the client pulls out or doesn't cooperate you'll still have a viable project. Plan well and you won't have any problems.
Understand that research is not reading and a thesis is not a report.Most undergraduates (at least in Computer Science) seem to be pretty confused about what research really is. It certainly isn'tabout using Google or reading in the library. Research means adding something new to the body of knowledge on a particular subject. This is why it's so important to know what work has already been done (so you know your work is novel), to have a clear hypothesis (so you know what new understanding you're adding) and to write up your work well (so other researchers can use it).Also, understand the place of your thesis. You are not writing a report which tells people what you did. You are writing a thesiswhich tells people about the research you have done. This can be structured in whatever sensible way you prefer, but it needs to have the following parts:
- An introduction. What's your hypothesis? Why is your work interesting? What are your trying to achieve?
- A literature survey. What have other people done? What new knowledge will your work add? What is the current state of the art missing and how are you going to address that?
- Your methodology. How did you go about validating / disproving your hypothesis? Why is your method sound? Why should anyone trust your results?
- Your results. What did you do? How?
- Your analysis of your results. What do your results mean? Why are they interesting? Did you validate your hypothesis or disprove it?
- Conclusions. What did your work contribute and how could it be continued by others?
Eat well, sleep well, get some exercise and take a day off every week.Basically, look after yourself. To work productively you need to be in good physical and mental condition. If you feel ill or your not coping well with life, slow down a bit and take a break. Don't eat junk food all the time or you'll feel sleepy and miserable. Cut down on caffeine and alcohol or you'll get stressed and sleep badly. Sleep a sensible number of hours every night -- consistency is important. Get some exercise because you need endorphins to keep you happy and some oxygen getting to your brain. Omega-3 and the sorts of minerals that aren't found in hot dogs will keep your brain working sensibly.Most of all, take a whole day off University work every single week. It doesn't matter what you do with that day (have fun, earn money, write a novel, whatever) but working every day will limit your creativity massively. Most very creative (and productive) people find that their best ideas come after a day off. This gives your mind a chance to consolidate all the material you have learned, synthesise it and solve some of the problems you've been considering. In fact, to revise for an exam or solve an interesting problem, it's a good idea to spend a few days working really hard at reading everything you need to know and taking notes, then take a day off directly before the exam or the day before you're going to write the solution to your problem. This will give you the best chance to properly understand everything you're working and be creative about it.Be productive but don't spend time on productivity.You need to organise yourself well, which is a difficult problem in itself. However, if you spend even five per cent of your time on productivity management (e.g. using Microsoft Project!) then that is far, far too much and a massive waste of your most important resource -- your time.Good productivity strategies are effortless, effective and fun to use -- and take almost no time at all. I can recommend David Allen's Getting Things Done strategy and RememberTheMilk for managing lists of action items.
One thing you can do to really give yourself a head-start in the workplace is to estimate how much time it'll take you to do every single task in your project. You'll start out finding that your estimates are stupidly far out, but as your project progresses you'll get better and better at correctly estimating your tasks. Joel Spolsky has a nice essay on how to do this simply, which you can use with RememberTheMilk
