CO2 and Temperature: What is the Correlation?

(c) 2009 by Barton Paul Levenson



Some global warming deniers claim that the correlation between temperature and carbon dioxide is vague and coincidental. I'll check that here with what's called a "linear regression." This means essentially that I take two columns of numbers and graph them against each other (in mathematical space if not actually on a screen or paper), and then find the straight line that comes closest to all the points. I can then compute how "significant" the regression is, which means how likely the relationship was to come about by chance. In general statisticians look for a probability of 5% or less that the relationship is due to chance, or "p < 0.05". If you achieve that, chances are you have something real going on.
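For readers who want to see what "finding the straight line that comes closest to all the points" involves, here is a minimal sketch in plain Python. It is not the spreadsheet procedure used for this page, just the standard least-squares formulas for one predictor. The function name linear_regression and the five made-up points in the demo are assumptions for illustration only.

```python
import math

def linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    x and y are equal-length lists of numbers (at least 3 points,
    not all on one exact line, or the t-statistic is undefined).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: what the line fails to explain.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - sse / syy
    # Standard error of the slope, with n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "t_slope": slope / se_slope,
    }

# Made-up illustrative points, NOT the temperature data from this page.
demo_x = [1.0, 2.0, 3.0, 4.0, 5.0]
demo_y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = linear_regression(demo_x, demo_y)
print(fit)
```

The t-statistic for the slope is what gets converted into the "p" value: the larger its absolute value, the less likely a slope that steep would turn up by chance.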

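The raw data is in a table below. The columns are:

Year   = calendar year
Anom   = temperature anomaly, in hundredths of a kelvin (cK)
lnCO2  = natural logarithm of the CO2 column
CO2    = carbon dioxide concentration, in parts per million

Here's the actual data: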

Year  Anom   lnCO2    CO2
1880   -12  5.6723  290.7
1881   -12  5.6740  291.2
1882    -1  5.6757  291.7
1883    -5  5.6771  292.1
1884   -43  5.6788  292.6
1885   -24  5.6802  293.0
1886   -26  5.6812  293.3
1887   -46  5.6822  293.6
1888   -24  5.6829  293.8
1889     6  5.6836  294.0
1890   -21  5.6843  294.2
1891   -55  5.6846  294.3
1892   -39  5.6853  294.5
1893   -40  5.6856  294.6
1894   -33  5.6860  294.7
1895   -33  5.6863  294.8
1896   -27  5.6866  294.9
1897   -16  5.6870  295.0
1898   -20  5.6877  295.2
1899   -25  5.6887  295.5
1900    -6  5.6897  295.8
1901    -5  5.6907  296.1
1902   -30  5.6920  296.5
1903   -36  5.6931  296.8
1904   -42  5.6944  297.2
1905   -26  5.6958  297.6
1906   -15  5.6974  298.1
1907   -40  5.6988  298.5
1908   -30  5.7001  298.9
1909   -31  5.7014  299.3
1910   -21  5.7028  299.7
1911   -25  5.7041  300.1
1912   -33  5.7051  300.4
1913   -29  5.7064  300.8
1914    -3  5.7074  301.1
1915     5  5.7084  301.4
1916   -20  5.7094  301.7
1917   -47  5.7108  302.1
1918   -35  5.7118  302.4
1919    -9  5.7127  302.7
1920   -17  5.7137  303.0
1921    -5  5.7151  303.4
1922   -10  5.7164  303.8
1923   -17  5.7174  304.1
1924   -12  5.7187  304.5
1925   -16  5.7203  305.0
1926     4  5.7216  305.4
1927    -6  5.7229  305.8
1928    -1  5.7246  306.3
1929   -23  5.7262  306.8
1930    -4  5.7275  307.2
1931     2  5.7291  307.7
1932     4  5.7307  308.2
1933   -11  5.7320  308.6
1934     5  5.7333  309.0
1935    -8  5.7346  309.4
1936     2  5.7359  309.8
1937    12  5.7366  310.0
1938    15  5.7372  310.2
1939    -2  5.7375  310.3
1940    14  5.7379  310.4
1941    11  5.7379  310.4
1942    10  5.7375  310.3
1943     6  5.7372  310.2
1944    10  5.7369  310.1
1945    -1  5.7369  310.1
1946     0  5.7369  310.1
1947    12  5.7372  310.2
1948    -3  5.7375  310.3
1949    -9  5.7382  310.5
1950   -17  5.7388  310.7
1951    -2  5.7401  311.1
1952     4  5.7414  311.5
1953    12  5.7427  311.9
1954    -9  5.7443  312.4
1955    -8  5.7462  313.0
1956   -18  5.7481  313.6
1957     8  5.7500  314.2
1958     9  5.7523  314.9
1959     5  5.7557  316.0
1960    -2  5.7586  316.9
1961    10  5.7609  317.6
1962     5  5.7635  318.5
1963     2  5.7652  319.0
1964   -25  5.7668  319.5
1965   -15  5.7684  320.0
1966    -8  5.7726  321.4
1967    -2  5.7750  322.2
1968    -9  5.7778  323.1
1969     0  5.7827  324.6
1970     4  5.7859  325.7
1971   -10  5.7879  326.3
1972    -5  5.7913  327.5
1973    18  5.7981  329.7
1974    -6  5.7998  330.3
1975    -2  5.8026  331.2
1976   -21  5.8056  332.2
1977    16  5.8108  333.9
1978     7  5.8156  335.5
1979    14  5.8196  336.9
1980    28  5.8251  338.7
1981    40  5.8287  339.9
1982     9  5.8323  341.1
1983    34  5.8371  342.8
1984    15  5.8419  344.4
1985    12  5.8462  345.9
1986    19  5.8498  347.2
1987    35  5.8549  348.9
1988    42  5.8622  351.5
1989    28  5.8662  352.9
1990    48  5.8698  354.2
1991    44  5.8738  355.6
1992    15  5.8760  356.4
1993    19  5.8778  357.0
1994    32  5.8830  358.9
1995    46  5.8885  360.9
1996    39  5.8934  362.6
1997    41  5.8965  363.8
1998    72  5.9044  366.6
1999    46  5.9089  368.3
2000    42  5.9121  369.5
2001    57  5.9163  371.0
2002    68  5.9218  373.1
2003    67  5.9286  375.6
2004    60  5.9333  377.4
2005    76  5.9393  379.7
2006    66  5.9450  381.8
2007    74  5.9495  383.6

Here's the data in a graph:

chart image

The units for temperature anomaly should be cK, of course, not K. Sorry about that. Excuse the fuzziness; it's a .gif file.

I used Microsoft Excel to run the regression. The data points covered the period from 1880 to 2007 inclusive, so there were N = 128 data points. The regression line I found was:

Anom = -1876.715416   + 325.8718284 ln CO2
        (-20.19118332)  (20.20270777)

The numbers in parentheses are "t-statistics," and they measure how significant the numbers above them are. The coefficient of the ln CO2 term is significant at p < 2.4483 x 10^-41. That means the chances of the relationship being coincidental are less than 1 in about 4 x 10^40.

The correlation coefficient is about 0.874, which means 76.4% of the variance is accounted for. Every other factor that affected temperature during this time span, then, accounted for 23.6%.
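The correlation coefficient can be cross-checked from the t-statistic alone. For a regression with a single predictor, r^2 = t^2 / (t^2 + N - 2), where t is the t-statistic of the slope. The short sketch below applies that identity to the figures quoted above; the function name r_squared_from_t is just an illustrative choice.

```python
import math

def r_squared_from_t(t_slope, n_points):
    """Fraction of variance explained in a one-predictor regression,
    recovered from the slope's t-statistic and the sample size."""
    degrees_of_freedom = n_points - 2
    return t_slope ** 2 / (t_slope ** 2 + degrees_of_freedom)

# Figures quoted in the text: t = 20.20270777 for ln CO2, N = 128.
r_squared = r_squared_from_t(20.20270777, 128)
r = math.sqrt(r_squared)
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # prints: r = 0.874, r^2 = 0.764
```

The two numbers agree with the 0.874 and 76.4% given above, so the t-statistic and the correlation coefficient are consistent with each other.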

More than coincidence? You be the judge!




Note added 15 March 2009:

Some half-educated global warming deniers pointed out that I didn't account for "autocorrelation in the residuals" in my regression, and said that my analysis was therefore invalid. There is something called the "spurious correlation problem" when dealing with two series increasing over time--they may seem to be correlated just because both are rising. There are ways to compensate for this. I performed a technique called "Cochrane-Orcutt iteration" in which I adjusted the regression coefficients using a factor called the "estimated autocorrelation coefficient of the residuals."

This factor, designated ρ with a circumflex over it or "rho-hat," measures the tendency of positive errors in the estimate to be followed by positive errors again, or negative errors by negative errors, and so on. Rho-hat for the regression above is about 0.4, so I transformed the two data series using an equation in rho-hat and ran the regression again, which gave a different, smaller figure for rho-hat. I repeated until the results stabilized. Rho-hat was down to insignificant levels (less than 0.1). The regression was no longer optimal by the simple least-squares criterion, but it still accounted for 60% of the variance.
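The iteration described above can be sketched in plain Python as follows. This is the textbook form of Cochrane-Orcutt for one predictor and may differ in detail from the spreadsheet steps actually used for this page. The function names and the synthetic demo series (a known line plus artificially autocorrelated noise) are assumptions for illustration; the demo does not use the temperature data.

```python
import random

def ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def lag1_rho(resid):
    """Rho-hat: regression of each residual on the one before it."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den

def cochrane_orcutt(x, y, tol=1e-8, max_iter=200):
    """Iterate until rho-hat stops changing; returns (a, b, rho)."""
    a, b = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        # Residuals of the current line on the ORIGINAL series.
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        new_rho = lag1_rho(resid)
        # Quasi-difference both series (the first point is dropped).
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols(xs, ys)
        # Undo the scaling of the intercept caused by the transform.
        a = a_star / (1.0 - new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho

# Synthetic demo: true line y = 2 + 3x with AR(1) noise (rho = 0.6).
rng = random.Random(0)
demo_x = [0.1 * t for t in range(200)]
noise, err = [], 0.0
for _ in demo_x:
    err = 0.6 * err + rng.gauss(0.0, 1.0)
    noise.append(err)
demo_y = [2.0 + 3.0 * xi + ei for xi, ei in zip(demo_x, noise)]
a_hat, b_hat, rho_hat = cochrane_orcutt(demo_x, demo_y)
print(a_hat, b_hat, rho_hat)
```

In this form rho-hat settles on an estimate of the autocorrelation in the original residuals, while the residuals of the transformed regression are the ones that should end up with little autocorrelation left.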

So changes in carbon dioxide are still heavily linked to changes in temperature, and the correlation is not vague and coincidental. That was all I was trying to prove, and that point stands.

Thanks to David A. Benson for letting me know what was being said in the blogs.



Page created:   01/23/2009
Last modified:  02/05/2011
Author:         BPL