Homework Review. In groups, discuss:

What are the hypotheses?

How are they tested?

Are they testable? How?

Are they empirical? Why or why not?


Reminders:

Books with project ideas on reserve in library.

Look at the example annotated bibliography.

Homework assignment due on Thursday.





Statistics

Why do we need statistics?

1-Test hypothesis (What is the null hypothesis?)

2-Find relationships

3-Describe data

A-average

B-quartile (ACT tests)

C-percentile


Statistics gives an accepted method for determining whether there are differences or a lack of differences (i.e., testing a hypothesis). A difference is significant when the probability that it happened by chance is less than one in twenty (p < .05). A difference that is not significant may be nothing more than random chance.


What if a test had only 4 multiple-choice questions, and only one person took it and rolled dice to determine each answer? How many times could the person take the test with dice and get 80% or better? The probability that it will happen is high (over 1/20). If 100 people took the test, the chance of getting an average of 80% or better by rolling dice goes way down (less than 1/20). If the test has 100 questions, the probability also goes way down.
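The dice example can be checked with the binomial distribution. The sketch below is only an illustration: the notes do not say how many answer choices each question has, so it assumes two equally likely choices (`p_guess = 0.5` is a hypothetical value; with more choices every probability gets smaller). The function name `prob_at_least` is also made up for this example.

```python
from math import comb

def prob_at_least(n_questions, percent, p_guess):
    """Exact probability of scoring at least `percent` on a test by guessing."""
    # Smallest number of correct answers that reaches the target percentage.
    need = (percent * n_questions + 99) // 100
    return sum(
        comb(n_questions, k) * p_guess**k * (1 - p_guess)**(n_questions - k)
        for k in range(need, n_questions + 1)
    )

# Assumption: two answer choices per question, so a guess is right half the time.
p_guess = 0.5
short_test = prob_at_least(4, 80, p_guess)    # 4 questions: all 4 must be right
long_test = prob_at_least(100, 80, p_guess)   # 100 questions: 80 or more right

print(f"4 questions:   {short_test:.4f}")   # 0.0625, more than 1 in 20
print(f"100 questions: {long_test:.2e}")    # far less than 1 in 20
```

Under this assumption, the only change between the two calls is the length of the test, and that alone moves the result from "could easily be chance" to "almost certainly not chance".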


Consider a commercial that claims that four out of five dentists recommend toothpaste X. If only five dentists were actually consulted would you be impressed? Would you not be more motivated to buy it if 4,000 out of 5,000 dentists recommended toothpaste X, in spite of the fact that 4/5 and 4000/5000 are both 80%? In like manner, statistical formulas take into consideration factors such as the number of subjects, responses, and test items when calculating the statistical significance.
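One way to see why the number of dentists matters is to put a margin of error around each 80%. The sketch below uses the Wilson score interval, a standard formula that these notes do not otherwise cover, so treat it as an illustration and not as the method used in this class. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

small = wilson_interval(4, 5)        # 4 of 5 dentists
large = wilson_interval(4000, 5000)  # 4,000 of 5,000 dentists

print(f"4/5:       {small[0]:.2f} to {small[1]:.2f}")
print(f"4000/5000: {large[0]:.2f} to {large[1]:.2f}")
```

With five dentists the plausible range is very wide (it even includes values below 50%), while with 5,000 dentists it stays within about one percentage point of 80%.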


In other words, an 80% vs. 85% score may not be significant if there are few test takers and few items, but an 80% vs. 81% may be significant if the test is long and many people took the test. Statistics takes this into consideration.
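The comparison of score differences can be sketched with a two-proportion z test. This is a simplified illustration, not necessarily the test used later in the class: it treats every answered item as an independent trial, and the group sizes (100 answers and 100,000 answers per group) are hypothetical numbers chosen to stand for "few" and "many". The function name `two_proportion_p` is also made up.

```python
from math import sqrt, erfc

def two_proportion_p(p1, p2, n1, n2):
    """Two-sided p value for the difference between two proportions (z test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return erfc(abs(z) / sqrt(2))  # area in both tails of the normal curve

# Hypothetical sizes: 10 people x 10 items = 100 answers per group.
few = two_proportion_p(0.85, 0.80, 100, 100)
# Hypothetical sizes: 1,000 people x 100 items = 100,000 answers per group.
many = two_proportion_p(0.81, 0.80, 100_000, 100_000)

print(f"85% vs. 80%, 100 answers each:     p = {few:.3f}")
print(f"81% vs. 80%, 100,000 answers each: p = {many:.2e}")
```

Under these assumptions the five-point difference is not significant (p is above .05), while the one-point difference is (p is far below .05).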


Types of Data

The type of statistic depends on the type of data.

Nominal: characteristics (sex, race, national origin, native speaker? yes/no)

Ordinal: order, no fixed interval (first, second, third place in a race)

Ratio: seconds elapsed, $, age, weight (fixed interval, absolute zero makes sense)

Interval: Celsius, IQ, judgment on a scale of 1-5 (fixed interval, absolute zero makes no sense)


Variables

Whatever you are measuring or manipulating

Dependent variable- what you are measuring

Independent variable- things that may affect what you are measuring



Statistics you will learn to use in this class (This is an overview. We'll get into specifics later.)


Correlation

Are two ratio or interval measurements related to each other? Are two ordinal measurements related to each other? How related are they?















(Chart: income and years of education. Correlation +.79)

















(Chart: GPA and hours of TV. Correlation -.63)


(Charts taken from http://www.nvcc.edu/home/elanthier/methods/correlation.htm)




What does the correlation coefficient “R” mean?


R indicates the strength and direction of the correlation. Look at the above graph of “income and years of education”. The coefficient is +.79. Since it is positive, it means that higher incomes are related to more education, and lower incomes to less education. The correlation is positive because both variables move in the same direction, which results in an upward-sloping line. If there were a perfect +1 correlation, all of the dots would fall exactly on the line. This would mean that every year of education would be related to exactly a certain dollar amount in salary. Exact correlations are not often found in behavioral studies, but they are found in nature. For example, if you filled a container with different amounts of water and weighed it each time, the volume of the water would be perfectly correlated with the weight of the water; all of the volume and weight dots would be on the line.


A negative correlation occurs on the “GPA and hours of TV graph” (-.63). It is negative because more TV is correlated with a lower GPA, and a higher GPA is correlated with less TV; the two move in opposite directions. A perfect negative correlation of -1 would have all of the dots on the line and slope downward.


Coefficients close to zero indicate that there is no (or very little) relationship between the variables. The line would have no slope and the dots would be scattered all over the chart.
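The three cases just described (perfect positive, negative, and near zero) can be reproduced with a few lines of code. This is a minimal sketch of the usual Pearson formula; all of the numbers are made up for illustration and are not the data behind the charts, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up numbers for illustration only.
liters = [1, 2, 3, 4, 5]
kilograms = [1.0, 2.0, 3.0, 4.0, 5.0]  # water: weight rises exactly with volume
hours_tv = [1, 2, 3, 4, 5]
gpa = [3.9, 3.6, 3.7, 3.0, 2.8]        # tends to fall as TV hours rise
no_trend = [3, 1, 2, 1, 3]             # goes neither up nor down overall

print(round(pearson_r(liters, kilograms), 2))  # 1.0, every dot on the line
print(round(pearson_r(hours_tv, gpa), 2))      # negative
print(round(pearson_r(hours_tv, no_trend), 2)) # 0.0, no slope
```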


Click on this chart to look at an interactive graphic representation of what the coefficients represent. Click on the chart to add data points and see how the correlation changes.



What does “p” mean?


P is not specific to correlation; it is used in statistics in general. It represents the probability that the data that were analyzed could have occurred by random chance. You are interested in results that are NOT random. Imagine that you teach English to two classes using two different methods. You then give the students a test to see if one method leads to higher test scores than the other. You want to be able to show that the test scores in one class are significantly better than those in the other. If the difference in scores could be obtained by chance, then you can't conclude that one method is better than the other.


Significance is defined as a smaller than 1 in 20 probability of occurring by chance. The statistics programs make this calculation. In sum, if p is SMALLER than 0.05, the results are significant; that is, there is only a small probability of getting the results by chance. The SMALLER the p value, the BETTER (more significant, less likely due to chance) the results.
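To make "probability of getting the results by chance" concrete, here is a sketch of an exact permutation test for the two-classes example. It is one simple way to get a p value and is not necessarily what a statistics program does. The scores are made up, and the function name `permutation_p` is hypothetical. The idea: if the teaching method made no difference, any way of dealing the same ten scores into two classes would be equally likely, so we count how many of those deals give a difference as large as the one observed.

```python
from itertools import combinations

def permutation_p(group_a, group_b):
    """Exact two-sided permutation p value for a difference in mean scores."""
    scores = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(scores)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    count = 0
    # Try every way of dealing the same scores into two classes of these sizes.
    for idx in combinations(range(len(scores)), n_a):
        sum_a = sum(scores[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-9:  # small tolerance for rounding error
            extreme += 1
        count += 1
    return extreme / count

# Made-up test scores for two small classes (illustration only).
method_1 = [85, 90, 88, 92, 95]
method_2 = [70, 75, 72, 78, 80]
p = permutation_p(method_1, method_2)
print(f"p = {p:.4f}")  # 2 of the 252 possible splits are this extreme
```

Here p is well under .05, so a difference this large would rarely come from chance alone.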




Some possible linguistic correlations:


What is the past tense of spling?

Computer: splung 35%

People: splung 22%

What is the past tense of creeze?

Computer: croze 12%

People: croze 6%




Correlation and Causation




Practice with correlations

Find how many different words begin with these prefixes in the BNC academic and spoken sections:

poly-

super-

hyper-

retro-

mono-

mega-

contra-

Now do a correlation by putting the number of different words (tokens) for each prefix in both registers side by side. Put the number of tokens from the academic search in the left column and the number of tokens from the spoken search for the same prefix next to it in the right column. Use this Correlation Calculator. What are the R and p values that are calculated? Is the correlation significant?


Search Instructions

In order to find the number of tokens of words beginning with poly- in the academic register of the BNC make sure the left panel is filled out this way:





The number of hits is set at an arbitrarily high number (e.g. 33333) so that it shows all of the tokens. If it's set at the default of 100, it will just show the first 100. You have to click on “Hide Options” in order for the bottom half of the screen to appear.