Saturday, November 07, 2009

Second attempt at Logit Regression for Hot Water Demand in Cairo

Four tables for building multiple regression models for the Culhane Thesis.

 

For H1: the following would be the mathematical expression of the hypothesis, with the focus on whether infrastructure investment and household preferences matter, and how, controlling for household attributes (such as income and education).

Demand for hot water (presence of hot water appliances or ranking of hot water heater) = f(household attributes, investment in household infrastructure, household preferences, presence of conventional heaters)
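Both dependent variables used below are 0/1 indicators, so each f(...) in this post is estimated as a logit model. In standard notation (x_i is household i's vector of regressors and β is the vector of coefficients reported in the tables), the model and its log-odds form are:

```latex
\Pr(y_i = 1 \mid x_i) = \Lambda(x_i'\beta) = \frac{1}{1 + e^{-x_i'\beta}},
\qquad
\ln\frac{\Pr(y_i = 1 \mid x_i)}{1 - \Pr(y_i = 1 \mid x_i)} = x_i'\beta
```

Each coefficient is therefore a change in the log-odds of the outcome, and exp(β_j) is the corresponding odds ratio.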

 

Independent variables (with value range, category, and what each is supposed to measure or explain):

VTEFS: Total expenses divided by family size (proxy for income)
  Value range: LE 31.18 to LE 2,622.33 (mean LE 315, median LE 235)
  Category: Household attribute
  Rationale: Expenses reflect income, and income is often correlated with the adoption of, or desire for, "modern conveniences".

VEdu: Education (literacy)
  Value range: 0 = illiterate, 1 = some education or more
  Category: Household attribute
  Rationale: Formal education is often correlated with the adoption of, or desire for, "modern conveniences".

VHwp: Presence of hot water pipes
  Value range: 0 = not present, 1 = present
  Category: Infrastructure
  Rationale: If the required infrastructure has not been invested in, "modern conveniences" cease to be convenient. Conventional heaters would then be absent and ranked lower.

VWa: Water availability
  Value range: 0 = cut frequently or unavailable, 1 = always available
  Category: Infrastructure
  Rationale: If the required infrastructure has not been invested in, "modern conveniences" cease to be convenient. Conventional heaters would then be ranked lower.

VCb: Ceramics in bathroom
  Value range: 0 = no ceramics, 1 = ceramics
  Category: Infrastructure
  Rationale: Ceramics indicate investment in a "finished bathroom". In incremental housing, people often will not invest in bathroom appliances until the bathroom walls and floors have been tiled with ceramics.

VSu: Seasonal use of hot water
  Value range: 0 = winter use only, 1 = all year
  Category: Household preferences
  Rationale: If families do not use hot water all year round, it may not be worth their while to invest in dedicated hot water appliances when they can use stoves, which have other uses, for the same purpose. They would tend to rank conventional heaters lower.

VDf: "Does the way in which you get hot water make a difference to you?"
  Value range: 0 = no, 1 = yes
  Category: Household preferences
  Rationale: If respondents say it makes no difference how they get their hot water, then hot water appliances and stove heating lie on the same indifference curve: either yields the same level of utility (satisfaction). We should therefore expect more of those who answer "no" to rely on traditional heating and to rank conventional heaters lower.

VWaterHeater: Presence of a conventional heater (correlated with hot water pipes in the bathroom at .6, so the two are not used together as regressors)
  Value range: 0 = no conventional water heater, 1 = conventional gas or electric heater
  Category: Household preferences
  Rationale: If a family values such appliances and is not indifferent to them, the absence of a dedicated hot water heater would make it more likely to rank one first if discretionary income were available; its presence would make the family more likely to want something else.

 

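Most of these variables are only weakly correlated with one another. According to the correlation matrix at the end of this post, hot water pipes and total expenses divided by family size correlate at .138, and the only pair above the .4 threshold used here as a multicollinearity screen is hot water pipes and ceramics in bathroom (.459).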
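The .4 screen can be checked mechanically. The Python sketch below (standard library only) copies the 21 pairwise correlations among the seven H1 regressors from the correlation matrix at the end of this post and lists every pair whose absolute value exceeds the threshold. The dictionary and the function name flag_collinear_pairs are illustrative and not part of the original gretl workflow.

```python
# Pairwise Pearson correlations among the seven H1 regressors, copied from the
# correlation matrix at the end of this post (observations 1-463).
H1_CORRELATIONS = {
    ("VTEFS", "VEdu"): 0.1219, ("VTEFS", "VHwp"): 0.1379, ("VEdu", "VHwp"): 0.0882,
    ("VWa", "VTEFS"): -0.0467, ("VWa", "VEdu"): -0.1182, ("VWa", "VHwp"): 0.0430,
    ("VSu", "VTEFS"): -0.0224, ("VSu", "VEdu"): 0.0334, ("VSu", "VHwp"): -0.0360,
    ("VSu", "VWa"): 0.0472,
    ("VDf", "VTEFS"): -0.0505, ("VDf", "VEdu"): -0.0267, ("VDf", "VHwp"): 0.0748,
    ("VDf", "VWa"): 0.1045, ("VDf", "VSu"): 0.2864,
    ("VCb", "VTEFS"): 0.0987, ("VCb", "VEdu"): 0.0532, ("VCb", "VHwp"): 0.4590,
    ("VCb", "VWa"): 0.0369, ("VCb", "VSu"): 0.1787, ("VCb", "VDf"): 0.1922,
}


def flag_collinear_pairs(correlations, threshold=0.4):
    """Return ((a, b), r) for every pair with |r| above the threshold, largest first."""
    flagged = [(pair, r) for pair, r in correlations.items() if abs(r) > threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


for (a, b), r in flag_collinear_pairs(H1_CORRELATIONS):
    print(f"{a} / {b}: r = {r:.4f}")
```

On these values it flags a single pair: hot water pipes and ceramics in bathroom (r = .4590).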

 

Table 1: Dependent = Presence of hot water appliances (VWaterHeater)

 

Presence of hot water appliance (VWaterHeater) = f(VTEFS + VEdu + VHwp + VWa + VCb + VSu + VDf)

 

 

Model 1:Logit, using observations 1-463

Dependent variable: VWaterHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    p-value
  const         -2.54113       0.434805      -5.8443   <0.00001  ***
  VTEFS          0.000430134   0.000470928    0.9134   0.36104
  VEdu           0.340018      0.244153       1.3926   0.16373
  VHwp           3.1683        0.353803       8.9550   <0.00001  ***
  VWa            0.0533421     0.246522       0.2164   0.82869
  VCb            0.675961      0.291117       2.3220   0.02024   **
  VSu           -1.0108        0.298142      -3.3903   0.00070   ***
  VDf            0.36334       0.272831       1.3317   0.18295

  Mean dependent var   0.542117    S.D. dependent var   0.249997
  McFadden R-squared   0.321120    Adjusted R-squared   0.296064
  Log-likelihood      -216.7546    Akaike criterion     449.5093
  Schwarz criterion    482.6111    Hannan-Quinn         462.5405

 

 

Number of cases 'correctly predicted' = 370 (79.9%)

f(beta'x) at mean of independent vars = 0.250

Likelihood ratio test: Chi-square(7) = 205.056 [0.0000]
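The fit statistics above can all be recomputed from the log-likelihood, which makes their definitions explicit. The Python sketch below assumes the standard formulas: the constant-only log-likelihood computed from the sample share of 1s, McFadden's R-squared 1 - LL/LL0, an adjusted version 1 - (LL - k)/LL0, and the usual Akaike, Schwarz and Hannan-Quinn penalties. The function name logit_fit_stats is hypothetical. With Model 1's inputs (463 households, of which the mean of 0.542117 implies 251 have a heater, LL = -216.7546, and k = 8 parameters including the constant), these formulas reproduce the printed likelihood ratio statistic (about 205.06), both R-squared values and the three information criteria to the reported precision.

```python
import math


def logit_fit_stats(n, n_ones, loglik, k):
    """Recompute gretl-style fit statistics for a binary logit model.

    n: number of observations; n_ones: number of observations with y == 1;
    loglik: maximized log-likelihood; k: estimated parameters incl. the constant.
    """
    share = n_ones / n
    # Log-likelihood of the constant-only model, which predicts the sample share for everyone.
    loglik0 = n_ones * math.log(share) + (n - n_ones) * math.log(1 - share)
    return {
        "loglik0": loglik0,
        "lr_chi2": 2 * (loglik - loglik0),
        "mcfadden_r2": 1 - loglik / loglik0,
        "adjusted_r2": 1 - (loglik - k) / loglik0,
        "aic": -2 * loglik + 2 * k,
        "bic": -2 * loglik + k * math.log(n),
        "hqc": -2 * loglik + 2 * k * math.log(math.log(n)),
    }


# Model 1: 463 households, 251 with a conventional heater, 8 parameters.
stats = logit_fit_stats(n=463, n_ones=251, loglik=-216.7546, k=8)
for name, value in stats.items():
    print(f"{name:>12}: {value:.4f}")
```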

 

 Predicted

              0     1

  Actual 0  131    81

         1   12   239

 

Excluding the constant, p-value was highest for variable 9 (VWa)

 

Sequential elimination using two-sided alpha = 0.10

 

 Dropping VWa              (p-value 0.829)

 Dropping VTEFS            (p-value 0.365)

 Dropping VDf              (p-value 0.210)

 Dropping VEdu             (p-value 0.152)

Convergence achieved after 6 iterations

 

Comparison of Model 1 and Model 5:

 

  Null hypothesis: the regression parameters are zero for the variables

    VTEFS, VEdu, VWa, VDf

 

  Test statistic: Robust F(4, 455) = 1.09554, with p-value = 0.358149

  Of the 3 model selection statistics, 3 have improved.
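The comparison above uses a robust (QML) F test. A plain likelihood-ratio test is a useful cross-check, although it assumes the model is correctly specified, so its p-value need not match exactly. The Python sketch below computes LR = 2(LL_full - LL_restricted) from the Model 1 and Model 5 log-likelihoods and evaluates the chi-square(4) tail with the closed form that holds for an even number of degrees of freedom; the helper names are hypothetical. It gives an LR statistic of about 4.62 with a p-value of about 0.33, which, like the robust test (p = 0.358), does not reject dropping VTEFS, VEdu, VWa and VDf.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive even number of df.

    Closed form: exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def lr_test(loglik_full, loglik_restricted, df):
    """Likelihood-ratio statistic and its chi-square(df) p-value."""
    stat = 2 * (loglik_full - loglik_restricted)
    return stat, chi2_sf_even_df(stat, df)


# Model 1 (seven regressors) against Model 5 (VTEFS, VEdu, VWa, VDf dropped).
stat, p = lr_test(-216.7546, -219.0652, df=4)
print(f"LR = {stat:.4f}, p = {p:.4f}")
```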

 

 

Model 5:Logit, using observations 1-463

Dependent variable: VWaterHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    Slope*
  const         -2.06686       0.340551      -6.0692
  VHwp           3.18641       0.347136       9.1791    0.627197
  VCb            0.741604      0.284855       2.6034    0.182905
  VSu           -0.902722      0.28868       -3.1271   -0.220217

  Mean dependent var   0.542117    S.D. dependent var   0.249994
  McFadden R-squared   0.313883    Adjusted R-squared   0.301355
  Log-likelihood      -219.0652    Akaike criterion     446.1305
  Schwarz criterion    462.6814    Hannan-Quinn         452.6461

 

 

*Evaluated at the mean

Number of cases 'correctly predicted' = 367 (79.3%)

f(beta'x) at mean of independent vars = 0.250

Likelihood ratio test: Chi-square(3) = 200.435 [0.0000]

 

Equation:

^VWaterHeater = -2.07 + 3.19*VHwp + 0.742*VCb - 0.903*VSu

               (0.341) (0.347)     (0.285)     (0.289)

 

n = 463, R-squared = 0.314

(standard errors in parentheses)
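To read the Model 5 equation in probability terms, pass the linear index through the logistic function. The Python sketch below uses the printed Model 5 coefficients; the three household profiles are hypothetical illustrations rather than observed cases, and the function name is illustrative. A household with no hot water pipes, no ceramics and winter-only use has a predicted probability of about 0.11 of owning a conventional heater; adding pipes and ceramics raises it to about 0.87, and switching the same household to year-round use lowers it to about 0.72.

```python
import math

# Model 5 coefficients as printed above.
MODEL5 = {"const": -2.06686, "VHwp": 3.18641, "VCb": 0.741604, "VSu": -0.902722}


def predicted_probability(coefs, **x):
    """Logistic CDF of the linear index; regressors not passed in are set to 0."""
    index = coefs["const"] + sum(
        beta * x.get(name, 0) for name, beta in coefs.items() if name != "const"
    )
    return 1 / (1 + math.exp(-index))


# Hypothetical household profiles, for illustration only.
profiles = {
    "no pipes, no ceramics, winter-only use": {},
    "pipes and ceramics, winter-only use": {"VHwp": 1, "VCb": 1},
    "pipes and ceramics, year-round use": {"VHwp": 1, "VCb": 1, "VSu": 1},
}
for label, x in profiles.items():
    print(f"{label}: {predicted_probability(MODEL5, **x):.3f}")
```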

 

 

The greatest predictor of the presence of a hot water heater is the presence of hot water pipes, suggesting that infrastructure is of paramount importance in explaining the absence of conventional hot water heaters. The second largest predictor is seasonal use of hot water, which paradoxically has a negative coefficient: those who do not use hot water all year are more likely to have hot water heaters. Its effect is much smaller than that of hot water pipes, however (a slope of -0.22 at the mean, against 0.63 for pipes).

This paradox might be explained by noting that the community with the most hot water heaters (Darb Al Ahmar) also reports the greatest seasonality in hot water use, while the community with the fewest (the Zabaleen) reports more year-round use. Considered alone, the Darb Al Ahmar data show a positive coefficient for seasonality (those who use hot water all year round are more likely to have heaters). Considered alone, the Zabaleen data still show a small negative coefficient: those using hot water all year round are less likely to have heaters. This underscores the differences between the two communities. In Darb Al Ahmar, conventional heaters are part of the norm, and those who use a lot of hot water would be expected to have them. In the Zabaleen community, by contrast, where infrastructure is deficient and people are used to boiling water on the stove, those who use a lot of hot water tend to rely on their stoves rather than incur the extra costs and trouble of conventional heaters that may fail for a number of reasons. The equation might be improved by using ethnicity rather than seasonality as the variable.

The third strongest predictor is the presence of ceramics in the bathroom (VCb), another indication that infrastructure is of paramount importance. Expenditures, the proxy for income, seem to have no significant effect on consumer choice of hot water heating and were dropped from the model, along with education, water availability and indifference to the source of hot water (VEdu, VWa, VDf).
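Odds ratios offer another way to compare the Model 5 predictors: exp(β) is the factor by which the odds of owning a heater change when a dummy switches from 0 to 1, with the other regressors held fixed. The short Python computation below uses the printed coefficients. Hot water pipes multiply the odds by roughly 24, ceramics roughly double them, and year-round use cuts them to about 0.41 of their former value.

```python
import math

# Model 5 slope coefficients (logit scale), copied from the table above.
coefficients = {"VHwp": 3.18641, "VCb": 0.741604, "VSu": -0.902722}

# exp(beta): factor by which the odds of owning a heater change when the
# dummy goes from 0 to 1, holding the other regressors fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```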

 

 

 

Table 2: Dependent = Ranking of hot water heater first if discretionary income is available

 

Rank hot water heater first (VWouldBuyHeater) = f(VTEFS + VEdu + VHwp + VWa + VCb + VSu + VDf)

 

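Among these regressors, the only pair correlated above the .4 threshold is hot water pipes and ceramics in bathroom (.459 in the correlation matrix at the end of this post). Ceramics and the presence of a hot water appliance correlate at .334, but the appliance variable is not a regressor in this table.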

 

 

Model 2:Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    p-value
  const         -2.20432       0.479684      -4.5954   <0.00001  ***
  VTEFS         -0.000400473   0.000592473   -0.6759   0.49908
  VEdu           0.0135344     0.270104       0.0501   0.96004
  VHwp          -1.37505       0.322708      -4.2610   0.00002   ***
  VWa            1.05978       0.277369       3.8208   0.00013   ***
  VCb           -0.0595799     0.332188      -0.1794   0.85766
  VSu            1.24174       0.378021       3.2849   0.00102   ***
  VDf            0.255568      0.331383       0.7712   0.44058

  Mean dependent var   0.172786    S.D. dependent var   0.113479
  McFadden R-squared   0.143848    Adjusted R-squared   0.106308
  Log-likelihood      -182.4530    Akaike criterion     380.9060
  Schwarz criterion    414.0078    Hannan-Quinn         393.9373

 

 

Number of cases 'correctly predicted' = 394 (85.1%)

f(beta'x) at mean of independent vars = 0.113

Likelihood ratio test: Chi-square(7) = 61.3102 [0.0000]

 

Predicted

              0     1

  Actual 0  369    14

         1   56    24
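Because only about 17% of households rank a heater first (the mean of 0.172786 is 80 of 463), the "correctly predicted" rate should be compared with what a model with no regressors would achieve by predicting 0 for everyone. The Python sketch below summarizes the Model 2 table above; the function name is hypothetical. The cells give 393 correct predictions (84.9%; the summary line above reports 394), against a majority-class baseline of 383 (82.7%), and only 24 of the 80 households that would buy a heater first (30%) are identified.

```python
def classification_summary(tn, fp, fn, tp):
    """Summarize a 2x2 'Actual x Predicted' table from a logit (cutoff 0.5)."""
    n = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / n,
        # Share correct if every household were predicted to be in the larger class.
        "baseline": max(tn + fp, fn + tp) / n,
        "sensitivity": tp / (fn + tp),  # actual 1s predicted as 1
        "specificity": tn / (tn + fp),  # actual 0s predicted as 0
    }


# Model 2 cells as printed above: Actual 0 -> (369, 14), Actual 1 -> (56, 24).
summary = classification_summary(tn=369, fp=14, fn=56, tp=24)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```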

 


 

Comparison of Model 2 and Model 6:

 

  Null hypothesis: the regression parameters are zero for the variables

    VTEFS, VEdu, VCb, VDf

 

  Test statistic: Robust F(4, 455) = 0.271741, with p-value = 0.896164

  Of the 3 model selection statistics, 3 have improved.

 

Sequential elimination using two-sided alpha = 0.10

 

 Dropping VEdu             (p-value 0.960)

 Dropping VCb              (p-value 0.858)

 Dropping VTEFS            (p-value 0.497)

 Dropping VDf              (p-value 0.437)

Convergence achieved after 6 iterations

 

 

Model 6:Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    Slope*
  const         -2.19212       0.383377      -5.7179
  VHwp          -1.40557       0.27176       -5.1721   -0.197864
  VWa            1.08351       0.272178       3.9809    0.139057
  VSu            1.29181       0.362077       3.5678    0.12627

  Mean dependent var   0.172786    S.D. dependent var   0.114456
  McFadden R-squared   0.140990    Adjusted R-squared   0.122220
  Log-likelihood      -183.0620    Akaike criterion     374.1241
  Schwarz criterion    390.6750    Hannan-Quinn         380.6397

 

 

*Evaluated at the mean

Number of cases 'correctly predicted' = 393 (84.9%)

f(beta'x) at mean of independent vars = 0.114

Likelihood ratio test: Chi-square(3) = 60.0921 [0.0000]

 

Equation:

 

^VWouldBuyHeater = -2.19 - 1.41*VHwp + 1.08*VWa + 1.29*VSu

                  (0.383) (0.272)     (0.272)    (0.362)

 

n = 463, R-squared = 0.141

(standard errors in parentheses)

 

The presence of a water heater is the strongest predictor of whether a family ranks a hot water heater first if discretionary income is available, but because it is highly correlated with the presence of hot water pipes (Pearson's r = 0.6031), we excluded it from this analysis and used VHwp instead. The negative coefficient on VHwp suggests that households with hot water pipes are unlikely to want to buy a new heater, but that is because those with hot water pipes tend to have a hot water heater already. (Hot water heater presence also has a negative coefficient.) The second greatest predictor is WATER AVAILABILITY: households that always have water are more likely to want to buy a water heater. Seasonality is the next strongest predictor: households that use hot water all year round are more likely to want to buy a water heater if they have discretionary income.

 

 

 

 

For H2: the following would be the mathematical expression of the hypothesis, with the focus on ethnicity, length of time lived in the community, and source of hot water (municipal vs. self-provisioning).

Demand for hot water (presence of hot water appliances or ranking of hot water heater) = f(ethnic dummy, cultural factors, municipal vs self-provisioning, expenditures as income proxy, employment, other household attributes)

 

 

Independent variables (with value range, category, and what each is supposed to measure or explain):

VEth: Ethnic community
  Value range: 0 = Darb Al Ahmar, 1 = Zabaleen
  Category: Ethnic dummy
  Rationale: Historical legacy issues and cultural norms may have more explanatory power in determining the indifference curves of different groups that otherwise share similar income and educational characteristics.

VTlt: Type of toilet (replaces the "hot water pipes" variable, with which it is correlated at .504)
  Value range: 0 = Balady (squat), 1 = European (throne)
  Category: Cultural factor
  Rationale: This could be a proxy for modernist thinking in bathroom provisions. It could be argued that those who maintain the squat toilet tradition also maintain the stove heating tradition, while those who adopt European toilets also tend to adopt European water heating technologies.

VTth: Time to heat water, in minutes (Pearson's r values for this variable are all low)
  Value range: 1 to 180 minutes (mean 32 minutes, SD 32.213, N = 463)
  Category: Infrastructure
  Rationale: The assumption would be that families who must spend longer heating water would rank a modern heating appliance higher. In these communities, however, given family size and the tendency to unplug heaters, electric heaters can take as long as stoves or longer.

VLt: Length of time in community
  Value range: 1 = less than a year, 2 = 1 to 5 years, 3 = 6 to 10 years, 4 = 11 to 20 years, 5 = 21 to 40 years, 6 = more than 40 years
  Category: Cultural factor
  Rationale: In incremental housing, the more time a household has spent in a community, the more likely it is to have accumulated consumer goods, if they are valued.

VGft: "What water heating system would you choose as a gift?" (correlated with VEth, ethnicity, at -.599; V1Gft in the correlation matrix at the end of this post)
  Value range: 0 = unconventional (Babur, Hamil, stove, solar), 1 = conventional (gas or electric appliance)
  Category: Cultural factor
  Rationale: Darb Al Ahmar answers were 18.6% unconventional and 81.4% conventional. The Zabaleen answers were the reverse: 78.4% unconventional and 21.6% conventional. This underscores the cultural differences between the communities. When solar is considered separately, Darb Al Ahmar answers are 5.6% traditional, 81.4% conventional and 13% solar; Zabaleen answers are 5.6% traditional, 21.6% conventional and 72.8% solar.

VTEFS: Total monthly household expenses divided by family size (proxy for income)
  Value range: LE 31.18 to LE 2,622.33 (mean LE 315, median LE 235)
  Category: Household attribute: income
  Rationale: See Table 1.

VWrk: Employment (work of head of household)
  Value range: 0 = uncertain, 1 = certain
  Category: Household attribute: employment
  Rationale: It is often assumed that those with uncertain employment are reluctant to tie themselves to "conveniences" that incur running costs they cannot maintain. Workers with uncertain income would tend to favor the traditional multi-purposing of simple tools like stoves, which permit self-provisioning without locking them into payment schedules that could get them into trouble.

VWaterHeater: Presence of a conventional heater (replaces type of toilet, with which it is correlated at .52)
  Value range: 0 = no conventional water heater, 1 = conventional gas or electric heater
  Category: Household preferences
  Rationale: If a family values such appliances and is not indifferent to them, the absence of a dedicated hot water heater would make it more likely to rank one first if discretionary income were available; its presence would make the family more likely to want something else. Source of hot water may also be viewed as a cultural variable: self-provisioning versus dependence on municipal provisions.

 

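Among the regressors used in Tables 3 and 4, all pairwise Pearson correlations are under .4 except ethnicity and the gift preference (VEth and V1Gft, r = -.599 in the correlation matrix at the end of this post).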

 

 

Table 3: Dependent = Presence of hot water appliances

 

Presence of hot water appliance (VWaterHeater) = f(VEth + VTlt + VTth + VLt + VGft + VTEFS + VWrk)

 

 

 

 

 

 

 

 

 

 

Model 3:Logit, using observations 1-463

Dependent variable: VWaterHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    p-value
  const         -2.78649       0.605502      -4.6019   <0.00001  ***
  VEth          -1.52945       0.321767      -4.7533   <0.00001  ***
  VTlt           2.53628       0.359063       7.0636   <0.00001  ***
  VTth           0.0175271     0.00419164     4.1814   0.00003   ***
  VLt            0.0398523     0.0957539      0.4162   0.67727
  V1Gft          0.227628      0.301025       0.7562   0.44954
  VTEFS          0.0015666     0.000545688    2.8709   0.00409   ***
  VWrk           0.948621      0.251322       3.7745   0.00016   ***

  Mean dependent var   0.542117    S.D. dependent var   0.249336
  McFadden R-squared   0.332493    Adjusted R-squared   0.307437
  Log-likelihood      -213.1234    Akaike criterion     442.2468
  Schwarz criterion    475.3486    Hannan-Quinn         455.2781

 

 

Number of cases 'correctly predicted' = 360 (77.8%)

f(beta'x) at mean of independent vars = 0.249

Likelihood ratio test: Chi-square(7) = 212.318 [0.0000]

 

 Predicted

              0     1

  Actual 0  145    67

         1   36   215

 

Excluding the constant, p-value was highest for variable 3 (VLt)

 

Sequential elimination using two-sided alpha = 0.10

 

 Dropping VLt              (p-value 0.677)

 Dropping V1Gft            (p-value 0.443)

Convergence achieved after 6 iterations

 

  Predicted

              0     1

  Actual 0  144    68

         1   36   215

 

Comparison of Model 3 and Model 7:

 

  Null hypothesis: the regression parameters are zero for the variables

    VLt, V1Gft

 

  Test statistic: Robust F(2, 455) = 0.384424, with p-value = 0.681063

  Of the 3 model selection statistics, 3 have improved.

 

 

Model 7:Logit, using observations 1-463

Dependent variable: VWaterHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    Slope*
  const         -2.40752       0.425023      -5.6644
  VEth          -1.68338       0.266822      -6.3090   -0.396673
  VTlt           2.52551       0.356426       7.0856    0.533237
  VTth           0.0173514     0.00417427     4.1567    0.00432572
  VTEFS          0.00159791    0.000550712    2.9015    0.00039836
  VWrk           0.938408      0.250985       3.7389    0.228675

  Mean dependent var   0.542117    S.D. dependent var   0.249301
  McFadden R-squared   0.331262    Adjusted R-squared   0.312469
  Log-likelihood      -213.5166    Akaike criterion     439.0331
  Schwarz criterion    463.8595    Hannan-Quinn         448.8066

 

 

*Evaluated at the mean

Number of cases 'correctly predicted' = 359 (77.5%)

f(beta'x) at mean of independent vars = 0.249

Likelihood ratio test: Chi-square(5) = 211.532 [0.0000]

 

 

^VWaterHeater = -2.41 - 1.68*VEth + 2.53*VTlt + 0.0174*VTth + 0.00160*VTEFS + 0.938*VWrk

               (0.425) (0.267)     (0.356)     (0.00417)     (0.000551)      (0.251)

 

n = 463, R-squared = 0.331

(standard errors in parentheses)

 

 

The strongest predictor of the presence of a water heater appears to be the presence of a European-style toilet. The second strongest predictor is ETHNICITY: the negative coefficient suggests that if you are a Zabaleen you are UNLIKELY to have a hot water heater. VWrk (steady = 1, unsteady = 0) plays a smaller but statistically significant role: households whose head has steady work are more likely to have a heater. Time to heat water has a small positive coefficient, suggesting that those who take a long time to heat water are those with water heaters (electric heaters take the longest when you turn them on and off). VTEFS (expenses divided by family size, the income proxy) has a very small coefficient per Egyptian pound, although it is statistically significant in this model.
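The small VTth and VTEFS coefficients partly reflect their units (minutes and Egyptian pounds). For a continuous regressor, the logit marginal effect at the mean is the logistic density at the mean index, f(beta'x), times the coefficient. The Python sketch below applies this to Model 7 using the rounded density of 0.249 printed above, so it matches the Slope* column for VTth and VTEFS to about three significant figures. (The reported slopes for the 0/1 dummies differ from this product, consistent with gretl reporting a discrete 0-to-1 change for dummy variables.) The constant and function names are illustrative. Scaled to more meaningful changes, 10 extra minutes of heating time raises the predicted probability of owning a heater by about 0.043, and LE 100 more monthly expenditure per family member raises it by about 0.040.

```python
# Model 7 values as printed above.
DENSITY_AT_MEAN = 0.249  # "f(beta'x) at mean of independent vars", rounded to 3 decimals
COEFFICIENTS = {"VTth": 0.0173514, "VTEFS": 0.00159791}


def marginal_effect(beta, density=DENSITY_AT_MEAN):
    """dP/dx for a continuous logit regressor: f(beta'x) * beta, evaluated at the mean."""
    return density * beta


for name, beta in COEFFICIENTS.items():
    print(f"{name}: {marginal_effect(beta):.6f} per unit")

# Rescaled to more meaningful changes.
print(f"+10 minutes of heating time: {10 * marginal_effect(COEFFICIENTS['VTth']):+.3f}")
print(f"+LE 100 per family member:   {100 * marginal_effect(COEFFICIENTS['VTEFS']):+.3f}")
```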

 

 

Table 4: Dependent = Rank hot water appliance first if discretionary income is available (VWouldBuyHeater).

 

Ranking of hot water heater (VWouldBuyHeater) = f(VEth + VTlt + VTth + VLt + VGft + VTEFS + VWrk)

 

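All pairwise Pearson correlations among these regressors are under .4 except ethnicity and the gift preference (VEth and V1Gft, r = -.599 in the correlation matrix at the end of this post).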

 

 

 

 

 

 

 

Model 4:Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    p-value
  const         -0.485788      0.659361      -0.7368   0.46127
  VEth           1.45443       0.428228       3.3964   0.00068   ***
  VTlt          -0.684986      0.296183      -2.3127   0.02074   **
  VTth          -0.011735      0.00413363    -2.8389   0.00453   ***
  VLt           -0.221356      0.0983295     -2.2512   0.02438   **
  V1Gft         -0.850322      0.368478      -2.3077   0.02102   **
  VTEFS         -0.00080205    0.000678694   -1.1818   0.23730
  VWrk           0.551315      0.287883       1.9151   0.05548   *

  Mean dependent var   0.172786    S.D. dependent var   0.099221
  McFadden R-squared   0.203548    Adjusted R-squared   0.166008
  Log-likelihood      -169.7303    Akaike criterion     355.4607
  Schwarz criterion    388.5625    Hannan-Quinn         368.4920

 

 

Number of cases 'correctly predicted' = 390 (84.2%)

f(beta'x) at mean of independent vars = 0.099

Likelihood ratio test: Chi-square(7) = 86.7555 [0.0000]

 

 Predicted

              0     1

  Actual 0  374     9

         1   64    16

 

Excluding the constant, p-value was highest for variable 5 (VTEFS)

 

Sequential elimination using two-sided alpha = 0.10

 

 Dropping VTEFS            (p-value 0.237)

Convergence achieved after 6 iterations

 

 Predicted

              0     1

  Actual 0  375     8

         1   63    17

 

Comparison of Model 4 and Model 8:

 

  Null hypothesis: the regression parameter is zero for VTEFS

  Test statistic: Robust F(1, 455) = 1.39654, with p-value = 0.237921

  Of the 3 model selection statistics, 2 have improved.

 

 

Model 8: Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    Slope*
  const         -0.653934      0.659205      -0.9920
  VEth           1.41303       0.427881       3.3024    0.145381
  VTlt          -0.753585      0.292868      -2.5731   -0.0873367
  VTth          -0.0116416     0.00420082    -2.7713   -0.00116226
  VLt           -0.226955      0.0988579     -2.2958   -0.0226585
  V1Gft         -0.870901      0.367965      -2.3668   -0.0888497
  VWrk           0.61605       0.285801       2.1555    0.0641614

  Mean dependent var   0.172786    S.D. dependent var   0.099837
  McFadden R-squared   0.198706    Adjusted R-squared   0.165859
  Log-likelihood      -170.7622    Akaike criterion     355.5243
  Schwarz criterion    384.4884    Hannan-Quinn         366.9267

 

 

*Evaluated at the mean

Number of cases 'correctly predicted' = 392 (84.7%)

f(beta'x) at mean of independent vars = 0.100

Likelihood ratio test: Chi-square(6) = 84.6919 [0.0000]

 

This run of the model would not produce an equation.

To produce an equation we had to eliminate the constant, which was not statistically significant (p = 0.4613 in Model 4; z = -0.9920 in Model 8).
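The p-values in these tables are two-sided tail probabilities of the z-statistic under the standard normal distribution. The Python check below (function name hypothetical) recovers Model 4's constant p-value of 0.4613 from its z of -0.7368 and gives about 0.32 for the constant in Model 8 (z = -0.9920).

```python
import math


def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


# z-statistics of the constant term in Model 4 and Model 8.
for model, z in {"Model 4": -0.7368, "Model 8": -0.9920}.items():
    print(f"{model}: z = {z}, p = {two_sided_p(z):.4f}")
```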

 

 

Model 9: Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

QML standard errors

 

                Coefficient    Std. Error    z-stat    Slope*
  VEth           1.11926       0.288673       3.8773    0.116878
  VTlt          -0.869528      0.268956      -3.2330   -0.105663
  VTth          -0.0122597     0.00427295    -2.8691   -0.00125695
  VLt           -0.291359      0.0733533     -3.9720   -0.0298721
  V1Gft         -1.00727       0.328476      -3.0665   -0.106021
  VWrk           0.582418      0.287206       2.0279    0.0620986

  Mean dependent var   0.172786    S.D. dependent var   0.102527
  McFadden R-squared   0.196296    Adjusted R-squared   0.168141
  Log-likelihood      -171.2758    Akaike criterion     354.5517
  Schwarz criterion    379.3780    Hannan-Quinn         364.3251

 

 

*Evaluated at the mean

Number of cases 'correctly predicted' = 388 (83.8%)

f(beta'x) at mean of independent vars = 0.103

Likelihood ratio test: Chi-square(6) = 83.6645 [0.0000]

 

 Null hypothesis: the regression parameters are zero for the variables

    const, VTEFS

 

  Test statistic: Robust F(2, 455) = 1.14366, with p-value = 0.319564

  Of the 3 model selection statistics, 3 have improved.

^VWouldBuyHeater =  + 1.12*VEth - 0.870*VTlt - 0.0123*VTth - 0.291*VLt - 1.01*V1Gft + 0.582*VWrk

                  (0.289)        (0.269)      (0.00427)     (0.0734)    (0.328)      (0.287)

 

n = 463, R-squared = 0.196

(standard errors in parentheses)

 

 

 

Being Zabaleen is positively associated with the desire for a hot water heater. The strongest predictor of whether a family ranks a conventional hot water heater first (most preferred, and would buy if discretionary income were available) seems to be ethnicity. The next best predictor is "would choose a hot water heater as a gift"; the negative coefficient suggests that those who would like one as a gift are unlikely to be able to purchase one. Type of toilet has a negative coefficient, consistent with the idea that households with a European toilet are likely to have a water heater already and are thus uninterested in buying a new one. Work, whether steady (1) or unsteady (0), is the next strongest predictor: those with steady work seem more willing to buy a new heater. Length of time lived in the community (VLt) has a negative coefficient, suggesting that recent immigrants are more likely to want to buy a new heater than those who have been present for some time. Time to heat water (VTth) has a negative coefficient, paradoxically suggesting that the less time you spend heating water, the more likely you are to want to purchase a new heater. This may reflect the long heating times associated with electric heaters in these communities; those who spend the most time heating water may already have electric heating appliances.

Looking at VWouldBuyHeater as a function of VWaterHeater:

 

Model 10: Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

 

 

                Coefficient    Std. Error    z-stat    p-value
  const         -0.68608       0.145522      -4.7146   <0.00001  ***
  VWaterHeater  -2.60563       0.369351      -7.0546   <0.00001  ***

  Mean dependent var   0.172786    S.D. dependent var   0.097298
  McFadden R-squared   0.183693    Adjusted R-squared   0.174308
  Log-likelihood      -173.9616    Akaike criterion     351.9232
  Schwarz criterion    360.1987    Hannan-Quinn         355.1810

 

 

Number of cases 'correctly predicted' = 383 (82.7%)

f(beta'x) at mean of independent vars = 0.097

Likelihood ratio test: Chi-square(1) = 78.293 [0.0000]

 

 

 

^VWouldBuyHeater = -0.686 - 2.61*VWaterHeater

                  (0.146)  (0.369)

 

n = 463, R-squared = 0.184

(standard errors in parentheses)

 

We see a large negative coefficient suggesting, as expected, that when one already possesses a water heater, one is less likely to want to purchase one.
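Converting the Model 10 coefficients to probabilities makes the size of this effect concrete. With a single 0/1 regressor plus a constant, the logit fit reproduces the share of "would buy first" answers in each group, so the Python sketch below recovers about 33.5% among households without a conventional heater and about 3.6% among those with one. Multiplying by the group sizes implied by the mean of VWaterHeater (212 households without a heater and 251 with one) gives roughly 71 and 9 households, which add up to the 80 implied by the mean of VWouldBuyHeater. The helper name is illustrative.

```python
import math


def logistic(t):
    return 1 / (1 + math.exp(-t))


CONST, B_HEATER = -0.68608, -2.60563  # Model 10 coefficients

p_without = logistic(CONST)          # households without a conventional heater
p_with = logistic(CONST + B_HEATER)  # households with one
print(f"P(rank heater first | no heater) = {p_without:.3f}")
print(f"P(rank heater first | heater)    = {p_with:.3f}")

# Group sizes implied by the mean of VWaterHeater (0.542117 of 463 = 251 with a heater).
print(round(p_without * 212), round(p_with * 251))
```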

 

 

Adding VEth (Ethnicity) Variable to Model:

 

 

Model 11: Logit, using observations 1-463

Dependent variable: VWouldBuyHeater

 

 

                Coefficient    Std. Error    z-stat    Slope*
  const         -1.86014       0.338635      -5.4931
  VWaterHeater  -2.15316       0.381332      -5.6464   -0.211609
  VEth           1.50491       0.359419       4.1871    0.132892

  Mean dependent var   0.172786    S.D. dependent var   0.084641
  McFadden R-squared   0.233129    Adjusted R-squared   0.219052
  Log-likelihood      -163.4263    Akaike criterion     332.8526
  Schwarz criterion    345.2658    Hannan-Quinn         337.7394

 

 

*Evaluated at the mean

Number of cases 'correctly predicted' = 383 (82.7%)

f(beta'x) at mean of independent vars = 0.085

 

  Predicted

              0     1

  Actual 0  383     0

         1   80     0

 

Comparison of Model 10 and Model 11:

 

  Null hypothesis: the regression parameter is zero for VEth

 

  Asymptotic test statistic:

    Wald chi-square(1) = 17.5315, with p-value = 2.82594e-005

    F-form: F(1, 460) = 17.5315, with p-value = 3.38748e-005

 

 

^VWouldBuyHeater = -1.86 - 2.15*VWaterHeater + 1.50*VEth

                  (0.339) (0.381)             (0.359)

 

n = 463, R-squared = 0.233

(standard errors in parentheses)
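With a single restriction, the Wald statistic in the comparison of Model 10 and Model 11 is simply the square of VEth's z-statistic (4.1871^2 is about 17.53), and its chi-square(1) p-value equals the two-sided normal p-value of that z. A minimal Python check:

```python
import math

z = 4.1871  # z-statistic of VEth in Model 11
wald = z**2
# For one restriction, the chi-square(1) upper tail equals the two-sided normal tail of z.
p = math.erfc(abs(z) / math.sqrt(2))
print(f"Wald chi-square(1) = {wald:.4f}, p = {p:.3e}")
```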

 

 

Correlation Matrix:

 

Correlation Coefficients, using the observations 1 - 463

5% critical value (two-tailed) = 0.0911 for n = 463

            VEth            VTlt             VLt           V1Gft

          1.0000         -0.3875         -0.2750         -0.5986 VEth

                          1.0000          0.1307          0.2057 VTlt

                                          1.0000          0.1991 VLt

                                                          1.0000 V1Gft

 

           VTEFS            VWrk            VEdu            VHwp

          0.0553         -0.0543         -0.0503         -0.1491 VEth

          0.0888          0.1625          0.0591          0.5201 VTlt

          0.0004          0.0584         -0.1170          0.1223 VLt

          0.0058         -0.0328          0.0423          0.0561 V1Gft

          1.0000         -0.0383          0.1219          0.1379 VTEFS

                          1.0000          0.0208          0.2119 VWrk

                                          1.0000          0.0882 VEdu

                                                          1.0000 VHwp

 

             VWa             VSu             VDf             VCb

          0.1328          0.2814          0.0203          0.0827 VEth

         -0.0663         -0.0870          0.0700          0.3888 VTlt

         -0.0123         -0.1630         -0.0633          0.0311 VLt

         -0.1469         -0.2705         -0.1895         -0.0691 V1Gft

         -0.0467         -0.0224         -0.0505          0.0987 VTEFS

          0.0457          0.0374          0.1282          0.0936 VWrk

         -0.1182          0.0334         -0.0267          0.0532 VEdu

          0.0430         -0.0360          0.0748          0.4590 VHwp

          1.0000          0.0472          0.1045          0.0369 VWa

                          1.0000          0.2864          0.1787 VSu

                                          1.0000          0.1922 VDf

                                                          1.0000 VCb

 

            VTth             VTE             VFS    VWaterHeater

          0.1954          0.2677          0.3231         -0.4055 VEth

          0.0226         -0.0502         -0.1792          0.5191 VTlt

         -0.0168         -0.0588          0.0249          0.1488 VLt

         -0.2068         -0.1083         -0.1963          0.2340 V1Gft

         -0.0398          0.7205         -0.3163          0.1288 VTEFS

          0.1192         -0.1059         -0.0216          0.2347 VWrk

         -0.0898          0.1451         -0.1057          0.1012 VEdu

          0.0965          0.0669         -0.0425          0.6031 VHwp

          0.0088         -0.0708          0.0794          0.0255 VWa

          0.1517          0.0411          0.0879         -0.1211 VSu

          0.1816         -0.0276          0.0762          0.0646 VDf

          0.1420          0.1034          0.0686          0.3344 VCb

          1.0000         -0.0292          0.1182          0.1370 VTth

                          1.0000          0.1228         -0.0261 VTE

                                          1.0000         -0.1553 VFS

                                                          1.0000 VWaterHeater

 

 VWouldBuyHeater V10_17_1Whatsys           INDEX

          0.3304          0.5906         -0.8660 VEth

         -0.2499         -0.1499          0.3139 VTlt

         -0.2050         -0.1679          0.2382 VLt

         -0.2872         -0.8494          0.4737 V1Gft

         -0.0738          0.0038         -0.1655 VTEFS

          0.0478          0.0635          0.1313 VWrk

         -0.0432          0.0051         -0.0321 VEdu

         -0.2464          0.0241          0.0674 VHwp

          0.1852          0.1931         -0.0184 VWa

          0.1837          0.2879         -0.2425 VSu

          0.0730          0.2121          0.0683 VDf

         -0.0872          0.1287         -0.1044 VCb

         -0.0491          0.2150         -0.1359 VTth

          0.0048          0.1015         -0.3222 VTE

          0.0810          0.1778         -0.2090 VFS

         -0.3941         -0.1477          0.3132 VWaterHeater

          1.0000          0.3304         -0.1368 VWouldBuyHeater

                          1.0000         -0.4631 V10_17_1Whatsys

                                          1.0000 INDEX
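The "5% critical value (two-tailed) = 0.0911" line follows from the t-test for a correlation coefficient: solving t = r * sqrt((n - 2)/(1 - r^2)) for r gives r_crit = t / sqrt(n - 2 + t^2). The Python sketch below plugs in n = 463 and an assumed two-sided 5% t critical value of about 1.9651 for 461 degrees of freedom (hard-coded, because the standard library has no t quantile function); the function name is illustrative. Correlations whose absolute value exceeds about 0.091 are therefore significantly different from zero at the 5% level, which is a different question from the .4 rule of thumb used to screen regressors for multicollinearity.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| that is significant, given a two-sided t critical value with n - 2 df.

    Solves t = r * sqrt((n - 2) / (1 - r**2)) for r.
    """
    df = n - 2
    return t_crit / math.sqrt(df + t_crit**2)


# Assumed two-sided 5% t critical value for 461 df (about 1.9651).
print(f"{critical_r(1.9651, 463):.4f}")
```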


First trial of Logit Regressions for Hot Water Demand in Cairo


4 Tables for creating multiple regression models for the Culhane Thesis.

For H1: the following would be the mathematical expression of the hypothesis, with the focus on whether infrastructure investment and household preferences matter, and how, controlling for household attributes (such as income and education).

Demand for hot water (presence of hot water appliances or ranking of hot water heater) = f(household attributes, investment in household infrastructure, household preferences, presence of conventional heaters)

Table 1: Dependent = Presence of hot water appliances (VWaterHeater)

Presence of hot water appliance (VWaterHeater) = f(VTEFS + VEdu + VHwp + VWa + VSu + VDf)
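In the logit models below, the f(...) above is estimated as a logistic function of a linear index. Written out for Table 1, with the β's as generic placeholders for the coefficients reported in the output:

```latex
\begin{aligned}
x'\beta &= \beta_0 + \beta_1\,\mathrm{VTEFS} + \beta_2\,\mathrm{VEdu} + \beta_3\,\mathrm{VHwp}
          + \beta_4\,\mathrm{VWa} + \beta_5\,\mathrm{VSu} + \beta_6\,\mathrm{VDf} \\
\Pr(\mathrm{VWaterHeater} = 1 \mid x) &= \Lambda(x'\beta) = \frac{1}{1 + e^{-x'\beta}} \\
\ln\frac{\Pr(\mathrm{VWaterHeater} = 1 \mid x)}{1 - \Pr(\mathrm{VWaterHeater} = 1 \mid x)} &= x'\beta
\end{aligned}
```

Read this way, each coefficient is a change in the log-odds of having a heater, and the "equations" printed for each model are estimates of x'β rather than of the probability itself.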

Independent variables, with value range and what each variable is supposed to measure or explain:

VTEFS: Total Expenses Divided by Family Size (proxy for income)
  Value range: LE 31.18 - LE 2,622.33 (Mean 315 LE, Median 235 LE)
  Measures/explains: Household attribute. Expenses reflect income, and income is often correlated with adoption of, or desire for, "modern conveniences".

VEdu: Education (Literacy)
  Value range: 0 = Illiterate, 1 = Some education or more
  Measures/explains: Household attribute. Formal education is often correlated with adoption of, or desire for, "modern conveniences".

VHwp: Presence of hot water pipes
  Value range: 0 = not present, 1 = present
  Measures/explains: Infrastructure. If the required infrastructure has not been invested in, "modern conveniences" cease to be convenient. Conventional heaters would be absent and ranked lower.

VWa: Water availability
  Value range: 0 = Cut frequently or unavailable; 1 = always available
  Measures/explains: Infrastructure. If the required infrastructure has not been invested in, "modern conveniences" cease to be convenient. Conventional heaters would be ranked lower.

VSu: Seasonal Use of Hot Water
  Value range: 0 = Winter use only; 1 = all year
  Measures/explains: Household preferences. If families don't use hot water all year round, it might not be worth their while to invest in dedicated hot water appliances when they can use stoves, which have other uses, for the same purpose. They would tend to rank conventional heaters lower.

VDf: Does the way in which you get hot water make a difference to you?
  Value range: 0 = No; 1 = Yes
  Measures/explains: Household preferences. If respondents claim that it doesn't make a difference how they get their hot water, then hot water appliances and use of stoves lie on the same indifference curve, each yielding the same level of utility (satisfaction) for the consumer. We should expect more people who respond "no" to have traditional heaters and to rank conventional heaters lower.




No pair of these variables is correlated more strongly than .131 (hot water pipes and total expenses divided by family size); all fall below the .4 threshold used here to screen for multicollinearity.

Results:
Model 1: Logit, using observations 1-463
Dependent variable: VWaterHeater
QML standard errors

             coefficient    std. error     z-stat    p-value
  -------------------------------------------------------------
  const      -2.44537       0.425788      -5.7432   <0.00001  ***
  VTEFS       0.000465587   0.000462365    1.0070    0.31395
  VEdu        0.345653      0.243068       1.4220    0.15501
  VHwp        3.41211       0.341556       9.9899   <0.00001  ***
  VWa         0.052209      0.247367       0.2111    0.83284
  VSu        -0.869722      0.283494      -3.0679    0.00216  ***
  VDf         0.457343      0.27147        1.6847    0.09205  *

Mean dependent var   0.542117   S.D. dependent var   0.249984
McFadden R-squared   0.312669   Adjusted R-squared   0.290745
Log-likelihood      -219.4528   Akaike criterion     452.9057
Schwarz criterion    481.8698   Hannan-Quinn         464.3081

Number of cases 'correctly predicted' = 367 (79.3%)
f(beta'x) at mean of independent vars = 0.250
Likelihood ratio test: Chi-square(6) = 199.66 [0.0000]

           Predicted
             0     1
  Actual 0  128    84
         1   12   239
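The summary statistics in the Model 1 output hang together: from the log-likelihood and the likelihood ratio statistic alone, the standard formulas reproduce the McFadden R-squared, the adjusted R-squared and the three information criteria to within rounding. A minimal sketch; the formula used for the adjusted R-squared, 1 - (LL - k)/LL0, is inferred from matching the printed numbers rather than taken from gretl's documentation, and `logit_fit_stats` is a hypothetical helper name.

```python
import math

def logit_fit_stats(loglik, lr_chi2, k, n):
    """Recompute logit summary statistics from the printed output.

    loglik  -- log-likelihood of the fitted model
    lr_chi2 -- likelihood ratio chi-square against the constant-only model
    k       -- number of estimated parameters, including the constant
    n       -- number of observations
    """
    # LR = 2 * (loglik - loglik0), so the constant-only log-likelihood is:
    loglik0 = loglik - lr_chi2 / 2.0
    return {
        "loglik0": loglik0,
        "mcfadden_r2": 1.0 - loglik / loglik0,
        # Assumed adjustment: penalize the k estimated parameters.
        "adj_mcfadden_r2": 1.0 - (loglik - k) / loglik0,
        "aic": -2.0 * loglik + 2.0 * k,
        "bic": -2.0 * loglik + k * math.log(n),
        "hqc": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
    }

# Model 1: six regressors plus a constant, 463 observations.
stats = logit_fit_stats(loglik=-219.4528, lr_chi2=199.66, k=7, n=463)
for name, value in stats.items():
    print(f"{name:16s} {value:10.4f}")
```

The same function applied to any of the other models' log-likelihoods and LR statistics gives a quick consistency check on pasted output.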

Excluding the constant, the p-value was highest for variable 9 (VWa = Water Availability).



Sequential elimination using two-sided alpha = 0.10

 Dropping VWa              (p-value 0.833)
 Dropping VTEFS            (p-value 0.318)
 Dropping VEdu             (p-value 0.124)
 Dropping VDf              (p-value 0.130)
Convergence achieved after 6 iterations

Model 8 (Model 1 Omit Variables): Logit, using observations 1-463
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error    z-stat     slope*
  ----------------------------------------------------------
  const      -1.90636      0.331194     -5.7560
  VHwp        3.45858      0.336652     10.2734    0.659945
  VSu        -0.707627     0.262593     -2.6948   -0.174099

Mean dependent var   0.542117   S.D. dependent var   0.249978
McFadden R-squared   0.303468   Adjusted R-squared   0.294072
Log-likelihood      -222.3905   Akaike criterion     450.7810
Schwarz criterion    463.1942   Hannan-Quinn         455.6678

*Evaluated at the mean
Number of cases 'correctly predicted' = 367 (79.3%)
f(beta'x) at mean of independent vars = 0.250
Likelihood ratio test: Chi-square(2) = 193.784 [0.0000]

Comparison of Model 1 and Model 8 (Model 1 Omit Variables):

  Null hypothesis: the regression parameters are zero for the variables
    VTEFS, VEdu, VWa, VDf

  Test statistic: Robust F(4, 456) = 1.46139, with p-value = 0.212912
  Of the 3 model selection statistics, 3 have improved.

NO CHANGE IN NUMBER OF CASES CORRECTLY PREDICTED WHEN MODEL SIMPLIFIED.
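The sequential elimination logs in this post follow a simple loop: drop the regressor with the largest p-value, re-estimate, and repeat until everything left clears alpha = 0.10. Re-estimating matters because p-values shift once a variable is removed (VTEFS has p = 0.314 in Model 1 but 0.318 when it is dropped). A minimal sketch of that logic, not gretl's own code; `fit` is a hypothetical callback standing in for re-estimating the logit, and the toy p-values below are made up purely to show the control flow.

```python
def sequential_elimination(fit, regressors, alpha=0.10):
    """Backward elimination by largest p-value (a sketch).

    fit(regressors) must return a dict mapping each regressor name to its
    p-value in a model that also contains a constant; the constant itself
    is never a candidate for removal.
    """
    kept = list(regressors)
    dropped = []
    while kept:
        pvalues = fit(kept)                       # re-estimate with the current set
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:                # everything left is significant
            break
        dropped.append((worst, pvalues[worst]))
        kept.remove(worst)
    return kept, dropped

# Toy fit function with fixed, made-up p-values (hypothetical, for illustration only).
TOY_PVALUES = {"x1": 0.80, "x2": 0.30, "x3": 0.001, "x4": 0.05}

def toy_fit(regressors):
    return {name: TOY_PVALUES[name] for name in regressors}

kept, dropped = sequential_elimination(toy_fit, ["x1", "x2", "x3", "x4"])
print("kept:", kept)
print("dropped:", dropped)
```

With a real estimator in place of `toy_fit`, the `dropped` list corresponds to the "Dropping ... (p-value ...)" lines in the output.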

Equation for Model 8 (Model 1 Omit Variables); the left-hand side is the fitted log-odds of having a water heater, not a probability:
^VWaterHeater = -1.91 + 3.46*VHwp - 0.708*VSu
               (0.331) (0.337)     (0.263)

n = 463, McFadden R-squared = 0.303
(standard errors in parentheses)
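Because the Model 8 equation is on the log-odds scale, its coefficients are easiest to read after passing them through the logistic function. A minimal sketch using the printed coefficients; `p_heater` is a hypothetical helper name.

```python
import math

def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Model 8 coefficients as printed above.
CONST, B_HWP, B_SU = -1.90636, 3.45858, -0.707627

def p_heater(hwp, su):
    """Predicted probability of having a conventional water heater."""
    return logistic(CONST + B_HWP * hwp + B_SU * su)

for hwp in (0, 1):
    for su in (0, 1):
        print(f"VHwp={hwp} VSu={su}  P(heater) = {p_heater(hwp, su):.3f}")
```

For winter-only users, the predicted probability is about 0.13 without hot water pipes and about 0.83 with them, which is how the large VHwp coefficient translates into probabilities.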

When we look at the Zabaleen data alone, the equation is:

^VWaterHeater = -4.07 + 4.64*VHwp - 0.509*VSu
               (1.01)  (1.04)      (0.451)

n = 232, McFadden R-squared = 0.290
(standard errors in parentheses)

Model 1: Logit, using observations 1-232
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error   t-ratio   p-value
  --------------------------------------------------------
  const       -4.06991      1.01049     -4.028    5.63e-05 ***
  VHwp         4.63792      1.03571      4.478    7.53e-06 ***
  VSu         -0.508829     0.450633    -1.129    0.2588

Mean dependent var   0.340517   S.D. dependent var   0.142751
McFadden R-squared   0.289805   Adjusted R-squared   0.269643
Log-likelihood      -105.6768   Akaike criterion     217.3537
Schwarz criterion    227.6939   Hannan-Quinn         221.5238

Number of cases 'correctly predicted' = 163 (70.3%)
f(beta'x) at mean of independent vars = 0.143
Likelihood ratio test: Chi-square(2) = 86.2457 [0.0000]

           Predicted
             0    1
  Actual 0  85   68
         1   1   78

And when we look at the Darb Al Ahmar data alone, the equation is:

^VWaterHeater = -1.58 + 3.72*VHwp + 0.346*VSu
               (0.449) (0.437)     (0.426)

n = 231, McFadden R-squared = 0.385
(standard errors in parentheses)
Model 1: Logit, using observations 1-231
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error   t-ratio    p-value
  ---------------------------------------------------------
  const       -1.58314      0.448793    -3.528    0.0004    ***
  VHwp         3.71652      0.436895     8.507    1.79e-017 ***
  VSu          0.345957     0.426098     0.8119   0.4168  

Mean dependent var   0.744589   S.D. dependent var   0.153158
McFadden R-squared   0.385389   Adjusted R-squared   0.362533
Log-likelihood      -80.67054   Akaike criterion     167.3411
Schwarz criterion    177.6683   Hannan-Quinn         171.5064

Number of cases 'correctly predicted' = 204 (88.3%)
f(beta'x) at mean of independent vars = 0.153
Likelihood ratio test: Chi-square(2) = 101.168 [0.0000]

           Predicted
              0     1
  Actual 0   43    16
         1   11   161


The greatest predictor of the presence of a hot water heater is the presence of hot water pipes, suggesting that infrastructure is of paramount importance in explaining the absence of conventional hot water heaters. The second largest predictor is seasonal use of hot water, which paradoxically has a negative coefficient, suggesting that those who don't use hot water all year are more likely to have hot water heaters. This coefficient is, however, much smaller in magnitude than the one for hot water pipes.

This paradox might be explained by noting that the community with the most hot water heaters (Darb Al Ahmar) also reports the greatest seasonality in hot water use, while the community with the fewest hot water heaters (the Zabaleen) reports more year-round use. When considered alone, the Darb Al Ahmar data shows a positive coefficient for seasonality (those who use hot water all year round are more likely to have heaters). When considered alone, the Zabaleen data still shows a small negative coefficient: those using hot water all year round are less likely to have heaters. (Neither community-level seasonality coefficient is statistically significant, however: p = 0.4168 in Darb Al Ahmar and p = 0.2588 among the Zabaleen.) The sign reversal underscores the differences between these two communities. In Darb Al Ahmar, conventional heaters are part of the norm, and those who use a lot of hot water would be expected to have them; by contrast, among the Zabaleen, where infrastructure is deficient and people are used to boiling water on the stove, those who use a lot of hot water would tend to rely on their stoves rather than incur the extra costs and troubles of using conventional heaters that may fail for a number of reasons. The equation might be improved by using ethnicity rather than seasonality as a variable.
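The sign flip for seasonality is easier to see in probability terms. This sketch takes the three fitted equations printed above (pooled Model 8, Zabaleen only, Darb Al Ahmar only) and, for households that already have hot water pipes, compares the predicted probability of having a heater for winter-only versus year-round users. It only restates the printed point estimates; as noted above, the community-level VSu coefficients are not statistically significant.

```python
import math

def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# (const, VHwp, VSu) coefficients as printed above.
MODELS = {
    "Pooled (Model 8)": (-1.90636, 3.45858, -0.707627),
    "Zabaleen only": (-4.06991, 4.63792, -0.508829),
    "Darb Al Ahmar only": (-1.58314, 3.71652, 0.345957),
}

changes = {}
for name, (b0, b_hwp, b_su) in MODELS.items():
    # Households with hot water pipes (VHwp = 1).
    winter_only = logistic(b0 + b_hwp)
    all_year = logistic(b0 + b_hwp + b_su)
    changes[name] = all_year - winter_only
    print(f"{name:20s} winter-only {winter_only:.3f}  "
          f"all-year {all_year:.3f}  change {changes[name]:+.3f}")
```

Moving from winter-only to year-round use lowers the predicted probability by roughly 0.12 to 0.13 in the pooled and Zabaleen equations but raises it by about 0.03 in the Darb Al Ahmar equation.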

The new equation would look like this:

^VWaterHeater = -1.48 + 3.86*VHwp - 2.27*VEth
               (0.319) (0.366)     (0.292)

n = 463, McFadden R-squared = 0.412
(standard errors in parentheses)

Here we see that ethnicity is the second strongest predictor of water heater presence after hot water pipes, and the negative coefficient suggests that Zabaleen households (coded 1) are less likely to have a hot water heater than Darb Al Ahmar households (coded 0). Replacing seasonality with ethnicity also raises the McFadden R-squared from 0.303 (Model 8) to 0.412.

The model run looks like this:


Model 13: Logit, using observations 1-463
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error    z-stat     slope*
  ----------------------------------------------------------
  const      -1.47931      0.319287     -4.6332
  VHwp        3.8575       0.365567     10.5521    0.707737
  VEth       -2.26797      0.292268     -7.7599   -0.512576

Mean dependent var   0.542117   S.D. dependent var   0.249646
McFadden R-squared   0.412053   Adjusted R-squared   0.402657
Log-likelihood      -187.7214   Akaike criterion     381.4428
Schwarz criterion    393.8559   Hannan-Quinn         386.3295

*Evaluated at the mean
Number of cases 'correctly predicted' = 367 (79.3%)
f(beta'x) at mean of independent vars = 0.250
Likelihood ratio test: Chi-square(2) = 263.123 [0.0000]
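Logit coefficients can also be read as odds ratios by exponentiating them. A minimal sketch for Model 13 using the printed coefficients and QML standard errors; the 95% interval uses the usual normal approximation (coefficient plus or minus 1.96 standard errors, then exponentiated), which is an assumption about how one would summarize these estimates, not something reported in the output.

```python
import math

# Model 13 estimates as printed above: (coefficient, QML standard error).
MODEL_13 = {
    "VHwp": (3.8575, 0.365567),
    "VEth": (-2.26797, 0.292268),
}

odds = {}
for name, (coef, se) in MODEL_13.items():
    odds[name] = math.exp(coef)
    # Normal-approximation 95% interval on the log-odds scale, exponentiated.
    low = math.exp(coef - 1.96 * se)
    high = math.exp(coef + 1.96 * se)
    print(f"{name}: odds ratio {odds[name]:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Holding ethnicity fixed, having hot water pipes multiplies the odds of having a heater by roughly 47; holding pipes fixed, being Zabaleen multiplies them by roughly 0.10.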




Table 2: Dependent = Ranking of hot water heater first if discretionary income were available

Rank hot water heater first (VwouldBuyHeaterFirst) = f(VTEFS + VEdu + VWa + VCb + VWaterHeater + VSu + VDf)

Independent variables, with value range and what each variable is supposed to measure or explain:

VTEFS: Total Expenses Divided by Family Size (proxy for income)
  Value range: LE 31.18 - LE 2,622.33 (Mean 315 LE, Median 235 LE)
  Measures/explains: Household attribute. See Table 1.

VEdu: Education
  Value range: 0 = Illiterate, 1 = Some education or more
  Measures/explains: Household attribute. See Table 1.

VWa: Water Availability
  Value range: 0 = Cut frequently or unavailable; 1 = always available
  Measures/explains: Infrastructure. See Table 1.

VCb: Ceramics in bathroom
  Value range: 0 = no ceramics, 1 = ceramics
  Measures/explains: Infrastructure. Ceramics are an indicator of investment in a "finished bathroom". In incremental housing situations, people often will not invest in bathroom appliances until the bathroom walls and floors have been tiled with ceramics.

VWaterHeater: Presence of conventional heater (replaces hot water pipes in bathroom, with which it is correlated at .6)
  Value range: 0 = No conventional water heater; 1 = Conventional gas or electric heater
  Measures/explains: Household preferences. If a family values such appliances and is not indifferent to them, the absence of a dedicated hot water heater would make it more likely to rank one first if discretionary income were available. Presence would make the family more likely to desire something else.

VSu: Seasonal Use of Hot Water
  Value range: 0 = Winter use only; 1 = all year
  Measures/explains: Household preferences. See Table 1.

VDf: Does the way in which you get hot water make a difference to you?
  Value range: 0 = No; 1 = Yes
  Measures/explains: Household preferences. See Table 1.




No pair of these variables is correlated more strongly than .337 (ceramics and hot water appliance); all fall below the .4 threshold used here to screen for multicollinearity.

Results:

Model 2: Logit, using observations 1-463
Dependent variable: VWouldBuyHeater
QML standard errors

                 coefficient    std. error     z-stat    p-value
  -----------------------------------------------------------------
  const          -2.39812       0.526998      -4.5505   <0.00001  ***
  VTEFS          -0.000136254   0.000485963   -0.2804    0.77919
  VEdu            0.0478653     0.279346       0.1713    0.86395
  VWa             1.20743       0.296298       4.0751    0.00005  ***
  VCb             0.135672      0.294565       0.4606    0.64510
  VWaterHeater   -2.81766       0.440996      -6.3893   <0.00001  ***
  VSu             1.08045       0.397691       2.7168    0.00659  ***
  VDf             0.465233      0.346598       1.3423    0.17950

Mean dependent var   0.172786   S.D. dependent var   0.080113
McFadden R-squared   0.263320   Adjusted R-squared   0.225780
Log-likelihood      -156.9925   Akaike criterion     329.9849
Schwarz criterion    363.0868   Hannan-Quinn         343.0162

Number of cases 'correctly predicted' = 402 (86.8%)
f(beta'x) at mean of independent vars = 0.080
Likelihood ratio test: Chi-square(7) = 112.231 [0.0000]

           Predicted
             0     1
  Actual 0  367    16
         1   45    35
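The "correctly predicted" rate is easier to judge next to the base rate. Only 80 of the 463 households (17.3%) ranked a heater first, so always predicting 0 would already be right 82.7% of the time. A minimal sketch that summarizes the Model 2 table this way; `classification_summary` is a hypothetical helper name, and sensitivity/specificity are the standard definitions, not statistics gretl prints here.

```python
def classification_summary(tn, fp, fn, tp):
    """Summarize a 2x2 'correctly predicted' table (rows = actual, columns = predicted)."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    sensitivity = tp / (tp + fn)   # share of actual 1s predicted as 1
    specificity = tn / (tn + fp)   # share of actual 0s predicted as 0
    # Accuracy of always predicting whichever outcome is more common.
    baseline = max(tn + fp, fn + tp) / total
    return accuracy, sensitivity, specificity, baseline

# Model 2: actual 0 -> (367 predicted 0, 16 predicted 1); actual 1 -> (45, 35).
acc, sens, spec, base = classification_summary(tn=367, fp=16, fn=45, tp=35)
print(f"accuracy {acc:.1%}, baseline {base:.1%}, "
      f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```

For Model 2 this puts the 86.8% accuracy against an 82.7% baseline, with 35 of the 80 households that did rank a heater first identified correctly.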

Excluding the constant, the p-value was highest for variable 7 (VEdu).


Model 7 (Model 2 Omit Variables): Logit, using observations 1-463
Dependent variable: VWouldBuyHeater

Sequential elimination using two-sided alpha = 0.10

 Dropping VEdu             (p-value 0.864)
 Dropping VTEFS            (p-value 0.798)
 Dropping VCb              (p-value 0.651)
 Dropping VDf              (p-value 0.157)
Convergence achieved after 7 iterations

QML standard errors

                 coefficient   std. error    z-stat     slope*
  --------------------------------------------------------------
  const          -2.09703      0.390981     -5.3635
  VWa             1.23884      0.295798      4.1881    0.119241
  VWaterHeater   -2.70842      0.41104      -6.5892   -0.273088
  VSu             1.16046      0.391831      2.9616    0.0818763

Mean dependent var   0.172786   S.D. dependent var   0.082083
McFadden R-squared   0.257589   Adjusted R-squared   0.238819
Log-likelihood      -158.2139   Akaike criterion     324.4277
Schwarz criterion    340.9786   Hannan-Quinn         330.9434

*Evaluated at the mean
Number of cases 'correctly predicted' = 401 (86.6%)
f(beta'x) at mean of independent vars = 0.082
Likelihood ratio test: Chi-square(3) = 109.788 [0.0000]

           Predicted
             0     1
  Actual 0  362    21
         1   41    39

Comparison of Model 2 and Model 7 (Model 2 Omit Variables):

  Null hypothesis: the regression parameters are zero for the variables
    VTEFS, VEdu, VCb, VDf

  Test statistic: Robust F(4, 455) = 0.600323, with p-value = 0.662585
  Of the 3 model selection statistics, 3 have improved.

Equation for Model 7 (Model 2 Omit Variables):

^VWouldBuyHeater = -2.10 + 1.24*VWa - 2.71*VWaterHeater + 1.16*VSu
                  (0.391) (0.296)    (0.411)             (0.392)

n = 463, McFadden R-squared = 0.258
(standard errors in parentheses)

The presence of a water heater is the strongest predictor of families ranking a hot water heater first in preference if discretionary income were available. The negative coefficient suggests that if you have a hot water heater you are unlikely to want to buy a new one. The second greatest predictor is WATER AVAILABILITY: if water is always available, you are more likely to want to buy a water heater. Seasonality is the next strongest predictor: if you use hot water all year round, you are more likely to want to buy a water heater if you have discretionary income.

SLIGHT CHANGE IN NUMBER OF CASES CORRECTLY PREDICTED WHEN MODEL SIMPLIFIED (REDUCED FROM 86.8% TO 86.6%)

When we put VDf ("Does the way in which you get hot water make a difference to you?") back into the model, we recover the original 86.8% of cases correctly predicted (Model 9):

QML standard errors

                 coefficient   std. error   t-ratio     slope  
  --------------------------------------------------------------
  const           -2.36903      0.482696    -4.908            
  VWa              1.21041      0.294624     4.108     0.113726
  VWaterHeater    -2.77594      0.425021    -6.531    -0.276790
  VSu              1.09477      0.393539     2.782     0.0760947
  VDf              0.481683     0.340275     1.416     0.0358809

Mean dependent var   0.172786   S.D. dependent var   0.080335
McFadden R-squared   0.262685   Adjusted R-squared   0.239222
Log-likelihood      -157.1279   Akaike criterion     324.2558
Schwarz criterion    344.9444   Hannan-Quinn         332.4003

Number of cases 'correctly predicted' = 402 (86.8%)
f(beta'x) at mean of independent vars = 0.080

           Predicted
              0     1
  Actual 0  370    13
         1   48    32

Comparison of Model 2 and Model 9:

  Null hypothesis: the regression parameters are zero for the variables
    VTEFS, VEdu, VCb

  Test statistic: Robust F(3, 455) = 0.108245, with p-value = 0.955265
  Of the 3 model selection statistics, 3 have improved.



For H2: the following would be the mathematical expression of the hypothesis, with the focus on ethnicity, length of time lived in the community, and source of hot water (municipal vs. self-provisioning).

Demand for hot water (presence of hot water appliances or ranking of hot water heater) = f(ethnic dummy, cultural factors, municipal vs self-provisioning, expenditures as income proxy, employment, other household attributes)

Table 3: Dependent = Presence of hot water appliances

Presence of hot water appliance (VWaterHeater) = f(VEth + VTlt + VLt + VGft + VTEFS + VWrk)

Independent variables, with value range and what each variable is supposed to measure or explain:

VEth: Ethnic Community
  Value range: 0 = Darb Al Ahmar, 1 = Zabaleen
  Measures/explains: Ethnic dummy. Historical legacy issues and cultural norms may have more explanatory power for determining the indifference curves of different groups that otherwise share similar income and educational characteristics.

VTlt: Type of toilet (replaces the "hot water pipes" variable, with which it is correlated at .504)
  Value range: 0 = Balady (squat), 1 = European (throne)
  Measures/explains: Cultural factor. This could be a proxy for modernist thinking in bathroom provisions. It could be argued that those who maintain the squat toilet tradition might also maintain the stove heating tradition, while those who adopt European toilets also tend to adopt European water heating technologies.

VLt: Length of Time in Community
  Value range: 1 = less than a year, 2 = 1 to 5 years, 3 = 6 to 10 years, 4 = 11 to 20 years, 5 = 21 to 40 years, 6 = more than 40 years
  Measures/explains: Cultural factor. In incremental housing, the more time you have spent in a community, the more likely you are to have accumulated consumer goods if they are valued.

VGft: What water heating system would you choose as a gift?
  Value range: 0 = Unconventional (Babur, Hamil, Stove, Solar), 1 = Conventional (Gas or Electric Appliance)
  Measures/explains: Cultural factor. Darb Al Ahmar answers were 18.6% Unconventional and 81.4% Conventional. By contrast, the Zabaleen answers were the reverse: 78.4% Unconventional, 21.6% Conventional. This underscores the cultural differences between the communities. When solar is considered separately, Darb Al Ahmar values are 5.6% Traditional, 81.4% Conventional and 13% Solar; Zabaleen values are 5.6% Traditional, 21.6% Conventional and 72.8% Solar.

VTEFS: Total Monthly Household Expenses Divided by Family Size (proxy for income)
  Value range: LE 31.18 - LE 2,622.33 (Mean 315 LE, Median 235 LE)
  Measures/explains: Household attribute (income). See Table 1.

VWrk: Employment (Work of Head of Household)
  Value range: 0 = Uncertain, 1 = Certain
  Measures/explains: Household attribute (employment). It is often assumed that those with uncertain employment are reluctant to tie themselves to "conveniences" that incur running costs they cannot maintain. Workers with uncertain income would tend to favor the traditional multi-purposing of simple tools like stoves, which permits self-provisioning without locking them into payment schedules that could get them into trouble.




All Pearson's r values for these variables are under .4.



Model 3: Logit, using observations 1-463
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error     z-stat    p-value
  ------------------------------------------------------------
  const      -2.26923      0.597179      -3.7999    0.00014  ***
  VEth       -1.30386      0.309976      -4.2063    0.00003  ***
  VTlt        2.5124       0.355897       7.0593   <0.00001  ***
  VLt         0.0563612    0.0932659      0.6043    0.54564
  V1Gft       0.0847256    0.295292       0.2869    0.77417
  VTEFS       0.00132005   0.000472412    2.7943    0.00520  ***
  VWrk        1.02782      0.246741       4.1656    0.00003  ***

Mean dependent var   0.542117   S.D. dependent var   0.249363
McFadden R-squared   0.300341   Adjusted R-squared   0.278417
Log-likelihood      -223.3889   Akaike criterion     460.7777
Schwarz criterion    489.7418   Hannan-Quinn         472.1801

Number of cases 'correctly predicted' = 358 (77.3%)
f(beta'x) at mean of independent vars = 0.249
Likelihood ratio test: Chi-square(6) = 191.788 [0.0000]

           Predicted
             0     1
  Actual 0  139    73
         1   32   219

Excluding the constant, p-value was highest for variable 4 (V1Gft)



Model 6 (Model 3 Omit Variables): Logit, using observations 1-463
Dependent variable: VWaterHeater

Sequential elimination using two-sided alpha = 0.10

 Dropping V1Gft            (p-value 0.774)
 Dropping VLt              (p-value 0.541)
Convergence achieved after 6 iterations

QML standard errors

             coefficient   std. error     z-stat     slope*
  -----------------------------------------------------------
  const      -1.93557      0.396568      -4.8808
  VEth       -1.39028      0.242098      -5.7427   -0.333437
  VTlt        2.50793      0.35539        7.0569    0.530208
  VTEFS       0.00133329   0.00047127     2.8291    0.00033247
  VWrk        1.02415      0.246359       4.1571    0.248696

Mean dependent var   0.542117   S.D. dependent var   0.249360
McFadden R-squared   0.299584   Adjusted R-squared   0.283924
Log-likelihood      -223.6306   Akaike criterion     457.2612
Schwarz criterion    477.9498   Hannan-Quinn         465.4057

*Evaluated at the mean
Number of cases 'correctly predicted' = 356 (76.9%)
f(beta'x) at mean of independent vars = 0.249
Likelihood ratio test: Chi-square(4) = 191.304 [0.0000]

           Predicted
             0     1
  Actual 0  138    74
         1   33   218

Comparison of Model 3 and Model 6 (Model 3 Omit Variables):

  Null hypothesis: the regression parameters are zero for the variables
    VLt, V1Gft

  Test statistic: Robust F(2, 456) = 0.230405, with p-value = 0.794304
  Of the 3 model selection statistics, 3 have improved.

Equation for Model 6 (Model 3 Omit Variables):

^VWaterHeater = -1.94 - 1.39*VEth + 2.51*VTlt + 0.00133*VTEFS + 1.02*VWrk
               (0.397) (0.242)     (0.355)     (0.000471)      (0.246)

n = 463, McFadden R-squared = 0.300
(standard errors in parentheses)
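The "slope" printed for the continuous regressor VTEFS is consistent with the usual logit marginal effect at the means: the logistic density at the mean index, reported as "f(beta'x) at mean of independent vars", times the coefficient. For the 0/1 dummies the printed slopes do not equal that product (for VEth, -1.39028 x 0.249 is about -0.346, against the printed -0.333), which suggests gretl reports a discrete change in probability for dummies; this sketch covers only VTEFS.

```python
import math

def logistic_density(z):
    """Derivative of the logistic CDF, Lambda(z) * (1 - Lambda(z)); its maximum is 0.25 at z = 0."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

# Values printed for Model 6.
F_AT_MEAN = 0.249      # "f(beta'x) at mean of independent vars"
B_VTEFS = 0.00133329   # VTEFS coefficient (log-odds per LE of monthly expenses per family member)

slope = F_AT_MEAN * B_VTEFS
print(f"dP/dVTEFS at the means: {slope:.6f}")
print(f"per 100 LE of monthly expenses per family member: {100 * slope:.4f}")
print(f"largest possible logistic density: {logistic_density(0.0)}")
```

The result, about 0.00033, matches the printed slope: evaluated at the means, an extra 100 LE of monthly expenses per family member goes with a predicted probability of having a heater roughly 3.3 percentage points higher.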

The strongest predictor of the presence of a water heater appears to be the presence of a European-style toilet. The second strongest predictor is ETHNICITY: the negative coefficient suggests that if you are Zabaleen you are LESS LIKELY to have a hot water heater.

SLIGHT CHANGE IN NUMBER OF CASES CORRECTLY PREDICTED WHEN MODEL SIMPLIFIED (77.3% TO 76.9%)

When we eliminate only V1Gft, we get 77.1% correctly predicted:

Convergence achieved after 6 iterations

Model 10: Logit, using observations 1-463
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error    t-ratio      slope  
  -------------------------------------------------------------
  const      -2.20132      0.538554      -4.087              
  VEth       -1.35381      0.251582      -5.381    -0.325332  
  VTlt        2.50888      0.354037       7.086     0.530324  
  VLt         0.0570199    0.0933458      0.6108    0.0142187
  VTEFS       0.00132924   0.000473718    2.806     0.000331464
  VWrk        1.02189      0.246666       4.143     0.248179  

Mean dependent var   0.542117   S.D. dependent var   0.249363
McFadden R-squared   0.300210   Adjusted R-squared   0.281418
Log-likelihood      -223.4309   Akaike criterion     458.8618
Schwarz criterion    483.6882   Hannan-Quinn         468.6353

Number of cases 'correctly predicted' = 357 (77.1%)
f(beta'x) at mean of independent vars = 0.249

           Predicted
              0     1
  Actual 0  139    73
         1   33   218

Comparison of Model 3 and Model 10:

  Null hypothesis: the regression parameter is zero for V1Gft
  Test statistic: Robust F(1, 456) = 0.0823241, with p-value = 0.774303
  Of the 3 model selection statistics, 3 have improved.

Test for addition of variables -
  Null hypothesis: parameters are zero for the variables
    V1Gft
  Asymptotic test statistic: Chi-square(1) = 0.0823241
  with p-value = 0.774303

When we eliminate only VLt and leave V1Gft in, we also get 77.1%:

Convergence achieved after 6 iterations

Model 12: Logit, using observations 1-463
Dependent variable: VWaterHeater
QML standard errors

             coefficient   std. error    t-ratio      slope  
  -------------------------------------------------------------
  const      -2.01037      0.480475      -4.184              
  VEth       -1.33712      0.303182      -4.410    -0.321600  
  VTlt        2.51162      0.357235       7.031     0.530757  
  VTEFS       0.00132366   0.000470020    2.816     0.000330067
  VWrk        1.03026      0.246369       4.182     0.250108  
  V1Gft       0.0893134    0.294369       0.3034    0.0222689

Mean dependent var   0.542117   S.D. dependent var   0.249360
McFadden R-squared   0.299731   Adjusted R-squared   0.280939
Log-likelihood      -223.5838   Akaike criterion     459.1676
Schwarz criterion    483.9940   Hannan-Quinn         468.9411

Number of cases 'correctly predicted' = 357 (77.1%)
f(beta'x) at mean of independent vars = 0.249

           Predicted
              0     1
  Actual 0  138    74
         1   32   219

Comparison of Model 3 and Model 12:

  Null hypothesis: the regression parameter is zero for VLt
  Test statistic: Robust F(1, 456) = 0.365185, with p-value = 0.545941
  Of the 3 model selection statistics, 3 have improved.

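IT WOULD APPEAR THAT THE ORIGINAL MODEL HAD GREATER PREDICTABILITY (77.3% correctly predicted, versus 77.1% and 76.9%), although all three model selection statistics favor the simpler models.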


Table 4: Dependent = Rank hot water appliance first if discretionary income were available (VwouldBuyHeaterFirst).

Ranking of hot water heater (VwouldBuyHeaterFirst) = f(VEth + VTlt + VWh + VTth + VLt + VTE + VWrk + VFS)

Independent variables, with value range and what each variable is supposed to measure or explain:

VEth: Ethnic Community
  Value range: 0 = Darb Al Ahmar, 1 = Zabaleen
  Measures/explains: Ethnic dummy. See Table 3.

VTlt: Type of toilet
  Value range: 0 = Balady (squat), 1 = European (throne)
  Measures/explains: Cultural factor. See Table 3.

VWh: Presence of Hot Water Appliance
  Value range: 0 = No conventional water heater; 1 = Conventional gas or electric heater
  Measures/explains: Possible cultural factor (municipal versus self-provisioning). On the one hand, if income is the constraint, those without hot water appliances could be more interested in buying one if they had more income. On the other hand, if people are satisfied with heating water in the traditional way and do not highly value "convenience", or if the infrastructure makes conventional heaters dangerous or unsuitable, they might prefer other consumer goods. Those with appliances may want to upgrade or, having experienced problems, may want to abandon them.

VTth: Time to Heat Water in Minutes (Pearson's r values for this variable are all low)
  Value range: 1 minute to 180 minutes, Mean: 32 minutes, SD 32.213, N = 463
  Measures/explains: Infrastructure. The assumption would be that families who must spend longer heating water would tend to rank a modern heating appliance higher; however, in this community, given family size and the tendency to unplug heaters, electric heaters can take as long as or longer than stoves.

VLt: Length of Time in Community
  Value range: 1 = less than a year, 2 = 1 to 5 years, 3 = 6 to 10 years, 4 = 11 to 20 years, 5 = 21 to 40 years, 6 = more than 40 years
  Measures/explains: Cultural factor.

VTE: Total Expenses (NOT divided by Family Size) (proxy for income)
  Value range: LE 215.00 - LE 13,649 (Mean 1,410 LE, Median 1,149 LE)
  Measures/explains: Household attribute. Here we use only Total Expenses as the proxy for income, so that we can also look at Family Size independently, and because the value range might have more explanatory power than dividing by family size (under the assumption that in many consumption bundles each additional child may not actually cost the same amount).

VFS: Family Size
  Value range: 1 to 30 (Mean = 5.41, Median 5.0)
  Measures/explains: Household attribute. Larger families place an extra burden on the person who has to prepare bath water in the traditional way, because the size of the vessel, and thus the quantity of water that can be heated at once, is limited. Also, in large families there would be competing uses for a stove: cooking and bathing times would have to be negotiated and some sacrifices made. Larger families might wish for a more convenient, dedicated hot water system.

VWrk: Employment (Work of Head of Household)
  Value range: 0 = Uncertain, 1 = Certain
  Measures/explains: Household attribute. See Table 3.




All Pearson's r values for these variables are under .4.


Model 4: Logit, using observations 1-463
Dependent variable: VWouldBuyHeater

                 coefficient    std. error     z-stat    p-value
  -----------------------------------------------------------------
  const          -0.9421        0.645743      -1.4589    0.14458
  VEth            1.46679       0.409637       3.5807    0.00034  ***
  VTlt           -0.0862385     0.315847      -0.2730    0.78482
  VWaterHeater   -2.31947       0.434744      -5.3352   <0.00001  ***
  VTth           -0.00394684    0.00546999    -0.7215    0.47058
  VLt            -0.190747      0.102782      -1.8558    0.06348  *
  VTE            -8.46595e-05   0.000120716   -0.7013    0.48311
  VWrk            0.995266      0.31101        3.2001    0.00137  ***
  VFS            -0.0296812     0.0472782     -0.6278    0.53013

Mean dependent var   0.172786   S.D. dependent var   0.079628
McFadden R-squared   0.271849   Adjusted R-squared   0.229617
Log-likelihood      -155.1748   Akaike criterion     328.3496
Schwarz criterion    365.5891   Hannan-Quinn         343.0097

Number of cases 'correctly predicted' = 394 (85.1%)
f(beta'x) at mean of independent vars = 0.080
Likelihood ratio test: Chi-square(8) = 115.867 [0.0000]

           Predicted
             0     1
  Actual 0  370    13
         1   56    24

Excluding the constant, the p-value was highest for variable 2 (VTlt).


Model 5 (Model 4OmitVariables):Logit, using observations 1-463
Dependent variable: VwouldBuyHeater

Sequential elimination using two-sided alpha = 0.10

 Dropping VTlt             (p-value 0.785)
 Dropping VFS              (p-value 0.544)
 Dropping VTth             (p-value 0.462)
 Dropping VTE              (p-value 0.477)
Convergence achieved after 7 iterations


Coefficient
Std. Error
z-stat
Slope*
const
-1.20702
0.579368
-2.0833


VEth
1.31349
0.375868
3.4945
0.107907

VWaterHeater
-2.43458
0.404369
-6.0207
-0.232344

VLt
-0.207054
0.100413
-2.0620
-0.0164267

VWrk
0.991547
0.302857
3.2740
0.0853889


Mean dependent var      0.172786    S.D. dependent var     0.079335
McFadden R-squared      0.268150    Adjusted R-squared     0.244688
Log-likelihood       -155.9632      Akaike criterion     321.9264
Schwarz criterion     342.6150      Hannan-Quinn         330.0709


*Evaluated at the mean
Number of cases 'correctly predicted' = 394 (85.1%)
f(beta'x) at mean of independent vars = 0.079
Likelihood ratio test: Chi-square(4) = 114.29 [0.0000]

              Predicted
                 0     1
Actual   0     367    16
         1      53    27
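The Slope* column can be partly reproduced by hand. For a continuous regressor in a logit, the marginal effect at the means is the coefficient times the logistic density f(β'x̄). Multiplying the VLt coefficient by 0.079335 (the density value printed in the output; it sits in the "S.D. dependent var" slot and rounds to the reported f(beta'x) = 0.079) gives the printed slope. The three dummy slopes are not equal to β times this density (for VEth it would give about 0.104 rather than 0.108), which suggests gretl computes them as the change in predicted probability when the dummy switches from 0 to 1; that is my reading of the output, not something confirmed here. A minimal sketch for the continuous case:

```python
import math


def logistic_density(z: float) -> float:
    """Logistic density f(z) = F(z) * (1 - F(z)), where F is the logistic CDF."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)


def slope_at_mean(beta: float, density_at_mean: float) -> float:
    """Marginal effect dP/dx of a continuous regressor, evaluated at the means."""
    return beta * density_at_mean


BETA_VLT = -0.207054        # VLt coefficient, Model 5
DENSITY_AT_MEAN = 0.079335  # f(beta'x) at the means, as printed in the output

vlt_slope = slope_at_mean(BETA_VLT, DENSITY_AT_MEAN)
print(f"VLt slope at mean: {vlt_slope:.7f}")
```

The density peaks at 0.25 when the predicted probability is 0.5, so a density near 0.08 simply reflects that the "average" household has a low predicted probability of ranking the heater first.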

Comparison of Model 4 Original and Model 5 (Model 4 Omit Variables):

  Null hypothesis: the regression parameters are zero for the variables
    VTlt, VTth, VTE, VFS

  Likelihood ratio test:
    Chi-square(4) = 1.57684, with p-value = 0.812949
  Of the 3 model selection statistics, 3 have improved.

NO CHANGE IN NUMBER OF CASES CORRECTLY PREDICTED WHEN MODEL SIMPLIFIED.
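The Chi-square(4) comparison can be reproduced from the two log-likelihoods printed above. A small stdlib-only sketch; the chi-square tail uses the closed form that holds only for even degrees of freedom (df = 4 here), so it is not a general replacement for a statistics library:

```python
import math


def lr_statistic(loglik_full: float, loglik_restricted: float) -> float:
    """Likelihood-ratio statistic 2 * (lnL_full - lnL_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only covers positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))


# Model 4 (9 parameters) vs Model 5 (5 parameters): 4 restrictions
stat = lr_statistic(-155.1748, -155.9632)
p_value = chi2_sf_even_df(stat, df=4)
print(f"LR = {stat:.4f}, p = {p_value:.3f}")
```

A p-value of about 0.81 means the four dropped variables add nothing detectable, which is consistent with the unchanged count of correct predictions.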

Equation for Model 5 (Model 4 Omit Variables):

^VWouldBuyHeater = -1.21 + 1.31*VEth - 2.43*VWaterHeater - 0.207*VLt + 0.992*VWrk
                  (0.579) (0.376)     (0.404)             (0.100)     (0.303)

n = 463, McFadden R-squared = 0.268
(standard errors in parentheses)

The strongest predictor of whether a family ranks a conventional hot water heater first (most preferred if discretionary income were available) seems to be the presence of a conventional hot water heater. The negative coefficient suggests that a household that already has a hot water heater is unlikely to want another one. The second strongest predictor is ETHNICITY: being Zabaleen is positively associated with the desire for a hot water heater.
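Because this is a logit, the fitted equation gives the log-odds of ranking the heater first, not the probability itself. A minimal sketch that turns the Model 5 coefficients into odds ratios and a predicted probability. The household profile (including VLt = 0) is purely illustrative, and I am assuming VEth = 1 codes the Zabaleen, as the sign discussion above implies:

```python
import math

# Model 5 coefficients (index = log-odds that VWouldBuyHeater = 1)
MODEL5 = {
    "const": -1.20702,
    "VEth": 1.31349,
    "VWaterHeater": -2.43458,
    "VLt": -0.207054,
    "VWrk": 0.991547,
}


def predicted_probability(coefs: dict, household: dict) -> float:
    """P(VWouldBuyHeater = 1) = 1 / (1 + exp(-index)) for one household."""
    index = coefs["const"] + sum(
        beta * household[name] for name, beta in coefs.items() if name != "const"
    )
    return 1.0 / (1.0 + math.exp(-index))


def odds_ratios(coefs: dict) -> dict:
    """exp(beta): multiplicative change in the odds for a one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefs.items() if name != "const"}


# Hypothetical household: VEth = 1, no heater, VLt = 0, certain employment
example = {"VEth": 1, "VWaterHeater": 0, "VLt": 0, "VWrk": 1}
print(round(predicted_probability(MODEL5, example), 3))
print({k: round(v, 3) for k, v in odds_ratios(MODEL5).items()})
```

On the odds scale, having a conventional heater multiplies the odds of ranking a heater first by about 0.09 (a drop of more than 90%), while the ethnicity dummy multiplies them by about 3.7, holding the other variables constant.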


Tuesday, September 15, 2009

Thinking out loud

I don't know what it is, but I have trouble thinking when I'm not thinking out loud. I suppose it is the evanescent nature of thought. Wikipedia (my democratic friend for definitions of the people, by the people, for the people) states: "An evanescent wave is a nearfield standing wave with an intensity that exhibits exponential decay with distance from the boundary at which the wave was formed." I guess I feel that evanescent brain waves are fleeting to the point of existential terror. These days, as I organize my bookshelf and flip through my library, reading notes and insights I made in the margins years and years ago, I think, "Wow, did I really write that? Did I really think that? Was that me? Thank God I wrote it down!" Today the internet and the blogosphere give me a chance to increase the distance from the boundary at which the brain wave was formed beyond the margins of books or the pages of a notebook. I can "think out louder" than ever before, and I can search my previous thoughts using Google. In fact, by posting my meandering thoughts on the blogosphere I can find the thread of my ideas faster than I could by flipping through my notebooks or file cards. So it is very useful. And perhaps some unfinished thought or process of mine can inform somebody else's evolving puzzle and we can share collective intelligence...

Today's thought is how to create (per my adviser's advice)

"the mathematical model specifications that layout the operationalization of my hypotheses (i.e., they should be in the form:

dep variable = f(independent variables)"

My task:

* make 2 tables for each hypothesis/model specification that includes
Table 1: dependent variable correlations (should only include those dependent variables that are measures for each hypothesis)
column 1 dependent variable list (if you have only 1 dependent variable then you don't need this table)
column 2 variables that are highly correlated with each of those variables (at r>=0.4)

Table 2: independent variable correlations (should only include those independent variables that are included in your independent variable list from your mathematical model specification)
column 1 independent variable list
column 2 independent variables that are highly correlated with each of those variables (at r>=0.4)"
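Both tables boil down to the same operation: compute Pearson's r for every pair of variables and list the pairs at or above 0.4. A stdlib-only sketch; it flags |r| >= 0.4 so that strong negative correlations are caught too (my assumption, since the note above only says r >= 0.4), and it assumes complete, non-constant numeric columns. The toy data are invented:

```python
import math
from itertools import combinations


def pearson_r(a: list, b: list) -> float:
    """Pearson correlation between two equal-length numeric lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


def high_correlations(variables: dict, threshold: float = 0.4) -> list:
    """Return (var1, var2, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy data, not survey values
toy = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "w": [1, 3, 2, 3, 1],
}
for name_a, name_b, r in high_correlations(toy):
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```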


Where to begin?


I had started my journey into this field looking at Jaffe and Stavins equations for the energy paradox in America and hoping I could apply it to the Egyptian case. Several years later I realized I didn't have enough time in this lifetime and this advanced age and with all my responsibilities to master the mode of thinking that would enable me to think mathematically. Oh well, maybe in another life. But I still have to come up with an equation for my hypotheses.

Jaffe and Stavins (1994) developed formal equations to help solve "the energy paradox and the diffusion of conservation technology" and modified them for retrofit situations:

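$$\min_{T} PV(T) = \int_{0}^{T} g(k_{ij}, \mu_{ijt})\, e^{-rt}\, dt + w \int_{T}^{\infty} g(k_{ij}, \mu_{ijt})\, e^{-rt}\, dt + \left[ L(C_{iT}, V_{iT}) - X_{iT} \right] e^{-rT} + \int D_{it}\, e^{-rt}\, dt$$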

where min PV(T) expresses the homeowner's desire to minimize costs for three elements: the present discounted value (PV) of annual energy costs from the present to the time of adoption of the energy-saving technology,
the PV of annual energy costs after the adoption,
and the PV of the one-time cost of adoption of the energy saving technology.

The other variables are:

T = the time of adoption (installation);
g(.) = a function that relates the elements of k_ij to annual fuel expenditures;
k_ij = a vector of current and expected future values of observable characteristics of the home (size, type of heating appliance) and region (price of fuel, climate, average income and education);
μ_ijt = an unobserved factor affecting energy use;
e = the base of natural logarithms;
r = the real market rate of interest (discount rate);
w = an index of the average quantity of energy used with the technology relative to energy consumption if the technology were not used (0 ≤ w ≤ 1);
L(.) = a function that generates the 'effective cost' of installation from the engineering cost and the prevalence of use of the technology;
C_iT = the engineering estimate of the purchase and installation cost of adopting the technology;
V_iT = the fraction of retrofit candidates in jurisdiction i that have adopted the technology by time T;
X_iT = the subsidy or tax credit in jurisdiction i for adopting the technology;
D_it = a dummy variable set to unity if jurisdiction i has a regulation in year t requiring that the technology be installed.

"By formulating the problem this way, we are assuming that if the homeowner is not risk-neutral, her attitude toward risk is such that the riskiness of the investment can be captured by appropriate adjustment of the interest rate. Because of the possibility that the technology may be significantly cheaper in the future (either because of a technological change or 'epidemic learning'), this is not a 'yes/no' decision like that of the builder; the homeowner must decide at what time (if any) to perform the retrofit installation." (p. 106) (Jaffe and Stavins 1994: 104; 106; 107).
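To see what the minimization over T does, here is a toy numerical version of the retrofit equation. It drops the unobserved term μ, holds annual energy costs g constant, assumes no regulation (D = 0), and takes the first integral from 0 to T and the second from T onward. All numbers are invented; the point is only the trade-off: waiting keeps paying the full energy bill, while adopting later can pay off if the effective cost falls over time (the "epidemic learning" case in the quote).

```python
import math
from typing import Callable


def present_value(T: float, g: float, w: float, r: float,
                  net_cost: Callable[[float], float]) -> float:
    """PV of adopting at time T, with constant annual energy cost g.

    before: integral of g*exp(-r t) from 0 to T (full energy use)
    after:  w * integral of g*exp(-r t) from T to infinity (reduced use)
    once:   net one-time cost L - X, paid at T and discounted by exp(-r T)
    """
    before = g * (1.0 - math.exp(-r * T)) / r
    after = w * g * math.exp(-r * T) / r
    once = net_cost(T) * math.exp(-r * T)
    return before + after + once


def best_adoption_time(g, w, r, net_cost, horizon=50.0, step=0.1):
    """Grid search for the T in [0, horizon] that minimizes present_value."""
    grid = [i * step for i in range(int(round(horizon / step)) + 1)]
    return min(grid, key=lambda T: present_value(T, g, w, r, net_cost))


# Illustrative values only: 100 LE/year, technology cuts energy use by 70%, r = 10%
g, w, r = 100.0, 0.3, 0.10
print(round(best_adoption_time(g, w, r, lambda T: 200.0), 1))  # cheap: adopt now
print(round(best_adoption_time(g, w, r, lambda T: 1000.0 * math.exp(-0.2 * T)), 1))
```

With a constant net cost, dPV/dT = e^{-rT}[g(1 - w) - r(L - X)], so the homeowner adopts immediately if the annual saving g(1 - w) exceeds the annualized cost r(L - X) and otherwise never (on a finite grid, the search then returns the end of the horizon). Only a falling cost produces an interior adoption date, here about T = 7.3.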




Clearly in the case of the Cairo poor (and in the low income neighborhood in California where we are performing our Green Job Training with Frank DiMassa Utility Consulting) we are in a "retrofit situation" in which the homeowner (or renter) must decide at what time (if any) to perform the retrofit installation.


Thus the dependent variables we are modelling, namely 1) presence of a conventional, market-purchased hot water heater that uses market-supplied gas or electricity and 2) willingness to pay for an improved hot water system that uses sunlight as its principal fuel, in effect represent minPV(T), because in either case a decision was made (1) or will be made (2) to satisfy "the desire on the part of homeowner to minimize costs for three elements:
  1. the PV of the one-time cost of adoption of the energy saving technology.
  2. the PV of annual energy costs after the adoption
  3. the present discounted value of annual energy costs from the present to the time of adoption of the energy saving technology
In the WTP section of my survey of the Cairo poor, I disaggregated these by asking for a reasonable down payment (what respondents felt the one-time cost of adoption should be), a monthly payment (an estimate of reasonable annual energy costs after adoption) and the duration of the payments (in a way giving respondents a chance to estimate their annual energy costs from the present to the time of adoption). Unsurprisingly, the average figure, after zeros and non-responses were taken out, was quite close to the cost of installing and using a conventional system. Hot water is apparently worth, on average, what one can buy it for.
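The three WTP answers can be collapsed into one lump-sum figure for comparison with the price of a conventional heater. A sketch, assuming equal monthly payments and an optional monthly discount rate; neither the discount rate nor the example numbers come from the survey:

```python
def total_wtp(downpayment: float, monthly_payment: float, months: int,
              monthly_rate: float = 0.0) -> float:
    """Down payment plus the present value of equal monthly payments.

    With monthly_rate = 0 this is simply downpayment + monthly_payment * months.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    if monthly_rate == 0.0:
        return downpayment + monthly_payment * months
    annuity_factor = (1.0 - (1.0 + monthly_rate) ** -months) / monthly_rate
    return downpayment + monthly_payment * annuity_factor


# Hypothetical respondent: 100 LE down, 10 LE/month for 12 months
print(total_wtp(100, 10, 12))                   # undiscounted
print(round(total_wtp(100, 10, 12, 0.01), 2))   # discounted at 1% per month
```

Discounting matters more the longer the payment period, so comparing respondents who chose very different durations is cleaner on the discounted figure.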

But there are other dimensions to this, as an average figure does not reflect households for whom improved hot water is worth quite a bit more and those for whom it is worth much less. Still, economics doesn't deal well with individual preferences (this is not its domain), and policy seeks to address the average.

An equation is sought that can model the determinants of hot water system choice. My data set does not have the independent variables that Jaffe and Stavins consider; I certainly do not have interest rates. What I have are demographic measures of income and income proxies, such as various appliances and the presence of ceramics in a bathroom (an indicator of how much people have invested in the bathroom and how 'finished' it is), and information about infrastructure, such as the presence of hot and cold water pipes. I also have some information about behavior (seasonal use of hot water) and cultural attributes (such as whether a family has installed a European or a "balady" style toilet). I also have cultural/regional information that seems to play a large role in hot water system choice: whether the respondent comes from the recent migrant Zabaleen sub-culture of Manshiyet Nasser or from the historical urban crafts sub-culture of Darb Al Ahmar. Finally, I have information about risk perception of various technologies.

In my original hypotheses I sought to demonstrate that residents of both communities would show a higher WTP for hot water services with attributes they can control (pay-as-you-go, flexible use, ability to match consumption to income uncertainty) because, as an entrepreneurial class adept at incremental self-provisioning, members of the poor community may be motivated less by climbing the putative “energy ladder” than by obtaining the optimal end-good, which is currently under-provided. This hypothesis would be partially fulfilled if it could be demonstrated that income (which we determined to be similar in the two communities) was not a strong determinant of either the use of a conventional system or WTP for an improved system; other factors, infrastructural and cultural, should weigh more heavily. What I discovered is that one community, with stronger 'incremental' and 'self-provisioning' characteristics, seemed to differ significantly from the other.

Equations that describe my hypotheses would be of the linear form y = mx + b:

For the existence of a conventional heater:

ECH = m(a(Inc) + b(Inf) + c(Pref) + d(Comm) ) + C

where ECH is "Existence of Conventional Heater"
m is the slope of the line
a is the weight (predictive power) of income (Inc)
b is the weight of infrastructure (Inf)
c is the weight of preferences (Pref)
d is the weight of the particular type of community (Comm)
C is the y-intercept.
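Written as a standard multiple-regression specification, the single slope m can be folded into the individual weights, since ma, mb, mc and md are just four coefficients (the β's below are my notation for those products). Because ECH is a yes/no outcome, a logit, like the ones estimated for VWouldBuyHeater, wraps the same linear index in the logistic function. A sketch of both forms:

$$
\text{ECH} = C + (ma)\,\text{Inc} + (mb)\,\text{Inf} + (mc)\,\text{Pref} + (md)\,\text{Comm}
= \beta_0 + \beta_1\,\text{Inc} + \beta_2\,\text{Inf} + \beta_3\,\text{Pref} + \beta_4\,\text{Comm}
$$

$$
\Pr(\text{ECH} = 1) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1\,\text{Inc} + \beta_2\,\text{Inf} + \beta_3\,\text{Pref} + \beta_4\,\text{Comm}\right)\right]}
$$

The WTP equation takes the same linear form with its own set of β's, which is exactly the "same equation, different weights" idea.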

Similarly for Willingness to Pay:

WTP = m(a(Inc) + b(Inf) + c(Pref) + d(Comm)) + C

The equation should be the same but the variables should have different weights.

Hypothesis 1 was

H1: Despite their current decision not to use consumer surplus for hot water heaters, residents with no service DO value hot water and are willing to pay for it, but use of their consumer surplus may be constrained by cross-price elasticities and market failures or, relative to their income ceilings, they may find conventional heaters to be inferior goods.

This hypothesis would be supported if it could be shown that while income per se might have little effect, infrastructure and other household attribute sets and preferences seem to be key determinants with higher predictive power. It would also be supported if we can show that there are residents with no conventional heaters who are still willing to pay for improved hot water. The data from the Winter and Spring surveys combined can do this. The spring survey, which contains income information and information about hot water pipes (infrastructure) and ceramics in bathroom (investment in bathroom as an important space) would also be used to test this hypothesis. This hypothesis is more general.

Hypothesis 2 was

H2: Ceteris paribus, residents of Manshiyat Nasser/Darb El Ahmar are more likely to adopt hot water service technologies (i.e. will rank them higher) that can hedge the uncertainty of their income streams, i.e. that offer the possibility of voluntary service suspension when income is low and that can be controlled by heads of households, so that other family members yielding to the temptation of uncontrolled use does not put the family into financial difficulty.

But since the data from the Winter Survey showed no correlation between the household's type of work (certain/uncertain) and heater type or WTP, and the survey couldn't capture all the nuances of how people estimate their income streams relative to their purchases, the hypothesis should be revised to something that can be supported by the data. What emerged from the study are significant differences between the two communities with regard to hot water choices, attitudes and WTP. As this is a conclusion of the study, it should be reflected in the hypotheses.

A better second hypothesis that is testable with the data is:
H2: Ceteris paribus, amongst adjacent poor populations with similar income constraints, job uncertainties and market choices, hot water system choices and willingness to pay for improvements are more likely to be influenced by cultural factors related to previous patterns of municipal dependence or self-provisioning. It is hypothesized that the Zabaleen, as recent minority immigrants to the city who experience greater marginalization (exit) from municipal services, are less likely to adopt conventional sources of hot water that tie them into a relationship with uncertain municipal structures, but should nonetheless have the same WTP as Darb Al Ahmar residents with a long association with city-provided amenities (loyalty).

As the data show a higher WTP among the Zabaleen yet a weak correlation with presence or absence of a conventional heater (which is significantly different between the two communities), the predictive power of community in this model becomes more manifest.

Inc would be the monthly reported income in the spring survey in one model and, in another, the presence of a black-and-white or color TV (BW TV owners show negative correlations).
Inf would be proxied by "hpb" (hot water pipes in bathroom)
Pref would be proxied by ceramics in bathroom and "presence of a European style toilet" (an indicator of cultural assimilation to modern urban norms).
Comm is the community affiliation.


And here is how things are shaping up in the next step, thanks to my adviser's advice:

For H1: the following would be the mathematical expression of the hypothesis, with the focus on whether infrastructure investment and household preferences matter, and how, controlling for household attributes (such as income and education)

Demand for hot water = f(household attributes, investment in household infrastructure, household preferences, presence of conventional heaters)

For H2: the following would be the mathematical expression of the hypothesis, with the focus on ethnicity, length of time lived in the community, and source of hot water (municipal vs. self-provisioning)

Demand for hot water = f(ethnic dummy, cultural factors, municipal vs self-provisioning, income, employment, other household attributes)

In both empirical analyses, we are aiming to explain the variation in demand for hot water.

The next step is to devise a table that identifies the measures for:
H1:
* dependent variables that measure demand for hot water (include construct - are the variable responses dummies?, categorical?, numerical?)
* independent variables that measure household attributes, investment in household infrastructure, household preferences, presence of conventional heaters (include construct - are the variable responses dummies?, categorical?, numerical?)

For H1, it is recommended that we run the spring and winter survey data separately, since we're focused more on infrastructure and preferences.

H2:
* dependent variables that measure demand for hot water (should be the same as above; include construct - are the variable responses dummies?, categorical?, numerical?)
* independent variables that measure ethnicity (dummy variable), cultural factors, municipal vs self-provisioning, income, employment, other household attributes (include construct - are the variable responses dummies?, categorical?, numerical?)

For H2, it is recommended that we combine the two survey data sets as we're interested in ethnicity, length of time in settlement, and source of hot water (only include variables that are in both surveys).

Once we have a list of variables for both H1 and H2, then we go back to the correlation tables and see what's going on there in terms of high correlations within the dependent variable list and the independent variable lists.

We need a list of measures/variables that match the categories listed in the specifications without including any low response variables.


Since reported income values had very low Pearson's r values, does that mean we drop income from the model?

No, not necessarily - just because income isn't highly correlated doesn't mean it should be thrown out.

We should, however, throw out highly correlated independent variables (we need to choose one and throw out the other highly correlated variables from the model, but then substitute them back in to check the robustness of the models). These other models should be put in an Appendix.


Same with employment? Or do we keep them in despite the poor correlation? We eliminate anything that had an r less than 0.4, right?

As above, we need to be sure our independent variables are not highly correlated before developing our final list of independent variables.

Also, are low-response variables ones that have a small N? What constitutes low? For example, if the sample in one community with winter and spring combined was 460 households but only 219 responded to the question (as with the WTP question), does that mean we don't look at it?

Yes, this would be a candidate for dropping out that variable - the reason for this is that with a multivariate model, having only 219 responses to that question means that 219 is the most responses that would be included in the model (if there are missing responses, those respondents are taken out of the list for the multivariate model even though they may have responded to other variables included in the analysis).
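The effect described above is listwise deletion: a respondent enters the model only if every included variable is answered. As a rule, regression software (gretl included) drops incomplete observations this way by default. A toy sketch, with None standing in for a missing answer (my representation, not the survey's coding):

```python
def complete_cases(rows: list, variables: list) -> list:
    """Keep only respondents who answered every variable in the model.

    A missing answer is represented as None.
    """
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


# Toy respondents (not survey data): WTP is missing for two of them
rows = [
    {"VEth": 1, "VWrk": 1, "WTP": 50},
    {"VEth": 0, "VWrk": 1, "WTP": None},
    {"VEth": 1, "VWrk": 0, "WTP": 30},
    {"VEth": 0, "VWrk": None, "WTP": None},
]
print(len(complete_cases(rows, ["VEth", "VWrk"])))         # WTP left out
print(len(complete_cases(rows, ["VEth", "VWrk", "WTP"])))  # WTP included
```

Adding one sparsely answered variable shrinks the usable sample to at most that variable's own response count, which is why a 219-response question caps the model at 219 cases.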

And what happens when the r was .4 or above for one community but not for the other?

For the models where each community survey is analyzed separately, you would leave out the highly correlated variables for that survey data.

Also, for H2, if we are going to combine both surveys we can't include income because it wasn't in both (only in the spring) and one of the key cultural indicators (presence or absence of a European toilet) was only in the Winter survey. Yet there is more statistical strength in combining both. Perhaps we don't need to have those variables in as we can show that the community itself is the strongest predictor of WTP?

We can't include any variables that are not included in both surveys, so unfortunately, those variables only in one survey would need to be left out.








I was aware as I started doing the surveys that I couldn't really capture demand for hot water per se. I lost a lot of time working with a few households trying to figure out ways to quantify the amount of hot water used, thinking that I needed a quantitative metric. I moved into an apartment building in the slums that had no hot water, where the landlady was boiling water on the stove. I knew that the landlady was using approximately 20 liters of hot water each time she bathed, because that was the size of the bastila (heating bucket) on her stove. But I couldn't get from her how many times she filled it, because she would say "yanee, kaza marra", which means "you know, as often as I need to". And as bathing habits are very personal, I couldn't get better answers from anyone else. I went out and bought a 10 liter bastila, a 20 liter bastila and a small portable gas heating stove (one-eye) from the local vendor and experimented for a month with my wife to see how much we used (to get a sense of personal hot water demand), but found it varied depending on how much time we had in a given day, how dirty we felt, and what the temperature or humidity outside was. If it felt like too much of a hassle to wait for the water to heat, we used a smaller quantity. When we had more time we heated more. The perceived convenience or inconvenience on any given day determined the quantity consumed, and hence our "demand for hot water", and this is almost unknowable when dealing with the dynamics of entire families. Since there is no submetering of hot water in Egypt the way there is in Germany, there is no way to know how much hot water is being used.

Thinking myself clever, I built a solar hot water system on the roof, plumbed it to our apartment and to the landlady's, and installed a hot water meter for both of our apartments. I then knew we had a fixed quantity of hot water each day: 200 liters, stored on the roof. My wife and I decided to keep using the portable stove so that we could measure only the hot water used by the landlady. I could simply go on the roof and read the meter each day. But what I quickly learned was that once she had convenient hot water at the turn of a tap, she tended to use it all. I would come home and all 200 liters would be used, 10 times what she reported using when I asked her. That is when we discovered through observation that she was washing the floors with it, inviting relatives over to bathe, etc. So hot water demand had a certain elasticity depending on its convenience. But we still couldn't say "those who have appliances use more than those who have to heat on the stove", because we found that many families unplugged their electric heaters to keep their electricity bills low, and that electric heaters of the normal 40 to 50 liter size take half an hour to an hour to heat up and are considered very inconvenient. Gas appliance heaters are the most convenient (instant on-demand hot water) and the least expensive, but we couldn't find much hard evidence that families were using more hot water as a result, although the data was suggestive. One of the problems is that, due to perceptions of gas heaters being dangerous, they weren't very popular and the sample size was low. (I understand the perception: I was nearly blinded by a gas heater steam explosion in one of the apartments we lived in; the heater might have been safe when originally installed, but the diaphragm and pipes had gone bad with time.)

So I realized we can't really get at "hot water demand" at all. Thus, I concluded that all the survey data could really show was "demand for hot water appliances" and willingness to pay for "more convenient, safer hot water". I can't even really say that we are looking at demand for more convenient hot water per se, because respondents reported that electric heaters and gas heaters had their own problems. We verified that by installing an electric heater and testing it for a year (the danger of electrocution in our ungrounded building made us abandon it after repairing it for the second time, when the bad water quality made the heating element explode), and by living in an apartment with a gas heater and testing that for a year. I began to agree with some of our neighbors that "since you heat water on the stove to cook anyway, might as well just heat a little more to bathe with." I wrote in my conclusion chapter:

"Our goal in chapters \ref{Chapter:Lenses} and \ref{Chapter:WTP} was to provide preliminary evidence on the relative importance of income, infrastructure (a question of both prices and information) , and preferences in explaining the fact that some households have no convenient hot water source. We began by showing that even within quite narrow income categories, the fraction of households purchasing hot water heaters is seldom close to either one or zero. We concluded that income cannot be the whole story for why some households are lacking in hot water heaters."

Now, as I work on the model, I get confused about what it means to say "demand for hot water" and even "demand for convenient hot water". So once again the original intent and wording of my hypotheses is hanging me up, because I see all the complexities and they overwhelm me.

I should be saying that my dependent variables are "demand for conventional market supplied dedicated water heaters" and "WTP for a hypothetical safe, convenient and reliable dedicated water heater" since that is exactly what the data shows, right?



It would seem that a dummy variable could be constructed that puts together both the existing demand for a conventional hot water heater AND the WTP for an improved hypothetical system because in both surveys we see a higher demand for conventional systems and a lower WTP among Darb Al Ahmar families and a lower demand for conventional systems yet a higher average WTP for the hypothetical among Zabaleen families.

In this case, presence of a conventional heater is not an independent variable but a dependent variable. It is more akin to the hypothetical system in that its presence indicates an existing WTP for an existing market good with perceived costs and benefits. Both electric and gas heaters CAN be made safe and reliable if one is willing to pay all the ancillary and infrastructural costs (better electrical wiring and grounding, leak-proof pipes, a safe installation area, well-functioning gas regulators and diaphragms made from high-quality parts, finished walls and floors with leak-proof ceramics, safe and hygienic water storage and on-demand pumps for the times when water is cut in the community, UPS or other forms of electrical power back-up for the times when electricity is cut in the community, routine servicing and maintenance, understanding of how to set thermostats and replace them, etc.), and it is assumed that when families forgo these appliances it is because they have made the rational decision that the costs of making them function in a convenient, reliable and safe manner outweigh the benefits.



In my spring survey I do have a question where I ask the maximum WTP for a conventional system and a question about the WTP for the hypothetical system, because I had thought that through.

If I can restate my hypotheses to reflect these realities, the functions would be:

For H1:



Demand for conventional hot water appliance  = f(household attributes, investment in household infrastructure, household preferences)

Demand for hypothetical hot water appliance = f(household attributes, investment in household infrastructure, household preferences)




They could conceivably be combined into:

Demand for dedicated hot water appliance with contingent attributes = f(household attributes, investment in household infrastructure, household preferences)

For H2: the following would be the mathematical expression of the hypothesis, with the focus on ethnicity, length of time lived in the community, and source of hot water (municipal vs. self-provisioning)

Demand for conventional hot water appliance = f(ethnic dummy, cultural factors, municipal vs self-provisioning, income, employment, other household attributes)
Demand for hypothetical hot water appliance =  f(ethnic dummy, cultural factors, municipal vs self-provisioning, income, employment, other household attributes)

They could conceivably be combined into:

Demand for dedicated hot water appliance with contingent attributes = f(ethnic dummy, cultural factors, municipal vs self-provisioning, income, employment, other household attributes)

Update:

After much debate we've decided on dumping the spring survey for now - it looks like we have good data for the winter survey (though it may overestimate the demand for hot water?).

For the ranking of appliances/extras given more money, this is what we will try to do. Hopefully, each of the appliances was coded as a separate variable (giving a rank per respondent). If not (i.e., we coded a variable in a different way to accommodate all rankings in one variable somehow), then we need to recode/redefine the "hot water heater" variable to be the respondent's ranking of the hot water heater, recoded as a dummy so that 1 is the 1st ranking and 0 is the 2nd, 3rd or lower ranking (blank answers to this question stay blank in the recode).
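The recode rule is small enough to state as code. A sketch, assuming the heater's rank is stored as an integer (1 = first choice) and a blank answer as None; the function name is hypothetical:

```python
from typing import Optional


def heater_first_dummy(rank: Optional[int]) -> Optional[int]:
    """Recode the hot water heater's rank into a dummy.

    1 -> ranked first (1); 2, 3, ... -> 0; blank (None) stays blank.
    """
    if rank is None:
        return None
    if rank < 1:
        raise ValueError(f"unexpected rank {rank}")
    return 1 if rank == 1 else 0


ranks = [1, 3, None, 2, 1]  # hypothetical responses
print([heater_first_dummy(r) for r in ranks])
```

Keeping blanks as None rather than 0 matters: coding them 0 would silently count non-respondents as households that did not rank the heater first.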
 
So we need 4 tables (2 for each research question, one for each of the two dependent variables) using non-correlated independent variables (remember to follow the mathematical specification for the list of independent variables to include):

For H1: the following would be the mathematical expression of the hypothesis, with the focus on whether infrastructure investment and household preferences matter, and how, controlling for household attributes (such as income and education)

Demand for hot water (presence of hot water appliances or ranking of hot water heater) = f(household attributes, investment in household infrastructure, household preferences, presence of conventional heaters)

For H2: the following would be the mathematical expression of the hypothesis, with the focus on ethnicity, length of time lived in the community, and source of hot water (municipal vs. self-provisioning)

Demand for hot water (presence of hot water appliances or ranking of hot water heater) = f(ethnic dummy, cultural factors, municipal vs self-provisioning, expenditures as income proxy, employment, other household attributes)

So now we need to construct the 4 tables.

How to include construct?

"Our next step is to develop an appropriate multiple regression model in order to study the relationship of WTP for an improved water heating system with infrastructure and cultural attributes such that we capture both the separate influence of each of these explanatory variables net of the other and their interaction effect on WTP. To be able to use the regression model we need to construct dummy variables since our explanatory variables are categorical in nature.

Let DI and DC be the dummy variables for Infrastructure and Cultural Preferences.

DI = 0 if infrastructure is ceramic in bathroom = false (not present)
= 1 if infrastructure is ceramic in bathroom = true (present)
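With two 0/1 dummies, the interaction effect mentioned in the plan is simply their product, which is 1 only when both are present. A sketch; DI follows the ceramics definition given here, while the DC coding (European-style toilet = 1) is my hypothetical placeholder, since DC's definition is not spelled out:

```python
def build_regressors(has_ceramics: bool, has_european_toilet: bool) -> dict:
    """Return DI, DC and their interaction DI*DC for one household.

    DI = 1 if the bathroom has ceramics (as defined in the text).
    DC = 1 if a European-style toilet is present (hypothetical coding).
    """
    di = 1 if has_ceramics else 0
    dc = 1 if has_european_toilet else 0
    return {"DI": di, "DC": dc, "DI_x_DC": di * dc}


for ceramics in (False, True):
    for toilet in (False, True):
        print(ceramics, toilet, build_regressors(ceramics, toilet))
```

In a model WTP = b0 + b1·DI + b2·DC + b3·DI·DC, b1 is the ceramics effect when DC = 0, and b3 is how much that effect changes when DC = 1.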

Monday, September 14, 2009

Tests of Electricity Consumption of Hot Water Heaters in Egypt

Moon Beach Electric hot water heater tests.

1:05 p.m. Wednesday after recovering from food poisoning.
Kilowatt-hour meter read 3.188 kWh
at 25 piastres per kWh cost per day = 1 LE
Month: 29.70 LE

1:10 p.m. reset meter to 0

10 minute shower uses all 40 liters of hot water
1.130 kWh gets water to 52 degrees C in 1 hr.
5 minutes of water use drops the temperature from 52 degrees C to 33 degrees C.
Reset meter at 5 p.m.

5:30 p.m. Wednesday
30 minutes of heating consumes 602 Wh (0.602 kWh)
Gets water to 47 degrees C.
After 3 minutes of shower it is 31 degrees C.

7:00 p.m. Wednesday
After 90 minutes of heating, shut off at 1.575 kWh

Experiment at 7:35 p.m.: 19:35 = 1.710 kWh
19:39 = 1.784 kWh
19:51 = 2.034
21:30 = 3.218
12:45 a.m. = 3.460
9:40 a.m. Thursday = 3.914

Consumed 2.204 kWh maintaining temperature overnight in 14 hours,
i.e. an average draw of about 157 W (157 Wh per hour). This means in 24 hours it would consume about 3.77 kWh just on standby. That would cost 94 piastres.
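The standby arithmetic can be reproduced from the two overnight meter readings. A small sketch; the function and field names are mine:

```python
def standby_summary(start_kwh: float, end_kwh: float, hours: float,
                    price_piastres_per_kwh: float) -> dict:
    """Average standby draw and projected daily energy/cost from two meter readings."""
    used_kwh = end_kwh - start_kwh
    avg_kw = used_kwh / hours
    daily_kwh = avg_kw * 24
    return {
        "used_kwh": used_kwh,
        "avg_watts": avg_kw * 1000,
        "daily_kwh": daily_kwh,
        "daily_cost_piastres": daily_kwh * price_piastres_per_kwh,
    }


# Overnight test: 1.710 kWh at 19:35, 3.914 kWh at 9:40 (taken as 14 h), 25 piastres/kWh
s = standby_summary(1.710, 3.914, 14, 25)
print({k: round(v, 2) for k, v in s.items()})
```

Using the exact elapsed time of 14 h 05 min instead gives about 156 W, so the conclusion (roughly 94 piastres a day lost just keeping the tank hot) does not depend on the rounding.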

Thursday morning at 10:01 a.m. the meter reads 3.914
We take shower
At 10:05 a.m. it reads 3.995 kWh.
Rises to 4.118 kWh by 10:15.
By Thursday at 1 p.m. it reads 6.169 kWh.

Thursday:
Hotel Water at Moon Beach
Hot water temp is 67 degrees C at 1:55 p.m.
Cold water temp is 22 degrees C
At 2 p.m. hot water temperature is 54 degrees C.
Drain-down test: unplug the hot water heater and let the shower run.
1:55 p.m.: start at 67 degrees
2:00 p.m. (after 5 minutes): 54 degrees
2:01 p.m.: 44 degrees
2:02 p.m.: 35 degrees
2:03 p.m.: 32 degrees
2:04 p.m.: 29 degrees
2:05 p.m.: 26 degrees
2:13 p.m.: 23.5 degrees
2:16 p.m.: 22 degrees, the cold water starting temperature, reached after 21 minutes of use.

2:35 p.m.: Plug in the hot water heater (I had to use aluminum foil because the outlet was loose. God provided it outside the door of the neighbors' hotel room, on a plate of used food!)

1097 W (initially 1137 W during the start-up spike)
2:41: drawing 1120 W
2:56: drawing 1106 W
3:06: still heating, drawing 1115 W
3:28: still heating, drawing 1103 W
3:43: still heating, drawing 1113 W (total consumed 1.294 kWh after about an hour, so the heater uses roughly 1.3 kWh per hour of heating)
3:45: drawing 1330 W, still heating
3:52: 1.460 kWh consumed, still heating
I leave the room.
I return at 6:05. The heater is off (the thermostat switched it off, so it must be set very high).
The meter shows a total consumption of 2.345 kWh. It must have turned itself off after about 2 hours.
The maximum draw recorded was 1218 W.

At 6:10 p.m. I shower for five minutes.
This turns on the heating element.
6:15: after 5 minutes the meter reads 90 Wh, and the element stays on.
6:20: the meter reads 192 Wh and the heater is still on. It seems to be using about 1 Wh every 3 seconds, that is, 20 Wh every 60 seconds (a draw of roughly 1.2 kW).
At 7:00 p.m. the heating element is still on, and it has used 1 kWh.
At 8:00 it turns off, with the meter reading 1.635 kWh.

At 9:30 p.m., with no additional use at all, the meter reads 1.748 kWh.
At 11:50 p.m., still without any additional use of the hot water, it reads 1.856 kWh.
At 12:30 a.m. it reads 1.969 kWh.
At 2:40 a.m. it reads 2.040 kWh.

It appears that, because of thermal losses through what must be very poor insulation in the Olympic Water heater tank, combined with a very high thermostat setting, the heater in standby mode keeps cycling on and off all night just to maintain its high temperature. That temperature is far too hot for a shower, but as we saw, the water cools down quickly, giving at best a 5-minute shower from the 40 or 50 liter tank, the normal size for most of Cairo.

At 9:30 a.m., when we wake up, the heater is on and the meter reads 2.543 kWh! It stays on for 10 minutes and then shuts itself off. That means that since 8:00 p.m. the previous night, in a 13.5-hour period, it consumed 0.908 kWh (almost a kilowatt-hour) just on standby overnight, an average draw of about 67 W. So if you go away for 24 hours and don't unplug your water heater, it will consume at least 1.6 kWh just maintaining its thermostat setting. The previous night, however, it averaged 157 W; the standby draw must depend on the starting temperature. No wonder the majority of Egyptians in my study area unplug their heaters.

Of course they could open them up and change the thermostat setting to 40 degrees, but very few people know how or have the confidence to do this.

The total cost of the electricity used for taking one shower a day and keeping the heater on so it is always ready (2.543 kWh) is 58 piastres at 25 piastres per kWh when the standby draw is 67 W, and 93 piastres when it is 157 W. This is why the average on the meter read about 1 LE per day. For the bourgeoisie, at subsidized electricity rates, this is tolerable: half a pound to a pound a day, about 10 to 20 cents a day, is nothing. But if you are earning a dollar or two a day, about a tenth of your salary is consumed by a shower. This is intolerable.
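The affordability argument comes down to two ratios: kWh times tariff gives the daily cost, and cost divided by income gives the share. This is a minimal sketch, assuming the log's 25 piastres per kWh, 100 piastres to the pound, and about 5 LE per US dollar (the rate implied by the text's conversion of a pound a day to about 20 cents); the function names are hypothetical.

```python
# Sketch: share of a daily income spent on water-heater electricity.
# Assumptions: tariff of 25 piastres per kWh (from the log); 100 piastres = 1 LE;
# about 5 LE per US dollar, the rate implied by "a pound a day ... about 20 cents".

PIASTRES_PER_LE = 100


def daily_cost_le(kwh_per_day: float, piastres_per_kwh: float = 25) -> float:
    """Electricity cost in Egyptian pounds for kwh_per_day kilowatt-hours."""
    return kwh_per_day * piastres_per_kwh / PIASTRES_PER_LE


def share_of_income(cost_le: float, income_usd_per_day: float, le_per_usd: float = 5.0) -> float:
    """Fraction of a daily income (in US dollars) consumed by cost_le pounds."""
    return cost_le / (income_usd_per_day * le_per_usd)


if __name__ == "__main__":
    # the 3.7 kWh/day standby extrapolation at the assumed tariff
    print(f"3.7 kWh/day costs {daily_cost_le(3.7):.3f} LE")
    for cost in (0.5, 1.0):          # "half a pound to a pound a day"
        for income in (1.0, 2.0):    # "a dollar or two a day"
            print(f"{cost} LE/day on ${income:.0f}/day: {share_of_income(cost, income):.0%}")
```

For half a pound to a pound a day against an income of one or two dollars, the share runs from 5% to 20%, a range that brackets the one-tenth figure.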