26 August 2011

THEOMETRIKA 2: CANADIAN CATHOLIC MSM WORD/PHRASE SEARCH STATISTICS, INCLUDING APPLICATION OF THE MANN-WHITNEY U-TEST

I. Previously, in Theometrika 1, searches were performed at the CCCB site to ascertain the frequency usage of commonly used Catholic-related words/phrases for whatever text issued from Star Chamber headquarters. As it turned out, not unexpectedly, the linguistics were skewed to more leftist/bureaucratic sprechen. Orthodox-inclined words/phrases, be they related to Church teaching or moral/life matters, came in a clear second place. Remember people, Conciliarism is existent therein, of which SV2 lefty language is part and parcel, and things aren't changing anytime soon. Excluding a pan-Canadian episcopal dismissal ordered by Cardinal Ouellet, or a rapid 100 km shift in the North American continental plate, it's business as usual in Ottawa. Accordingly, the miters must still be rattled so as to wake up the boys from their apostatic slumber. Judging by the recent past (e.g. CCODP abortion scandal), rattling would seem a futile undertaking. Still, it's worth a try. Anyhow, I like to annoy people. It's a lot of fun.

II. The same type of analysis is effected in this post except that the spotlight is on the Canadian Catholic MSM. Before we get going, let's assign some acronyms for simplicity's sake: WCR=Western Catholic Reporter, BCC=B.C. Catholic, CR=Catholic Register, PM=Prairie Messenger, SL=Salt and Light Media, CI=Catholic Insight.[1] Regarding the latter, Fr. de Valk and his team at CI are not members of the club because of the periodical's straightforward orthodoxy. Yet it is included for comparison purposes. It wouldn't even be much of a stretch to assume CI to be a benchmark or "standard" as it is Catholic orthodoxy per se which we are attempting to gauge for this exercise in phraseological metrics. Professional Catholics likely would disagree with this assumption, and quite vehemently I might add. Too bad. The big guys all regularly mention one another or will in some way or another link to each other's websites, but to the exclusion of CI, which speaks voluminously. Nonetheless, facts are facts and there's nothing like the cold, harsh jolt of numerical reality to again demonstrate that the aggiornamento commencing circa 1965 has worked against authentic Catholicity, let alone being a multi-decadal fiasco. Justifiably, it could now be argued that with the last few sentences judgment has been pronounced prior to exploratory data analysis. Granted. However, the first two paragraphs of this post were written after the numbers were crunched so yours truly already knows the results.
III. The internal search engine at the CCCB was employed for Theometrika 1. This time a Google "exact phrase" search was used to scan particular urls of BCC, WCR, PM, CR, SL and CI. Both heterodox-inclined (left, liberal, bureaucratic) and orthodox-inclined (traditional Church teaching, moral/life issues) words/phrases are respectively grouped in notes [2] and [3] for the reader's perusal. The words/phrases selected are mostly the same as in Theometrika 1, save the removal of some generic words, so to speak, which could go "either way" as they are often evoked by both sides (these words are God, Jesus, Mary, family, love, hate, mercy). The first part of this analysis involves frequency counts, subdivided into the following five categories: (i) Greater Than 1000 Mentions, (ii) 501 to 1000 Mentions, (iii) 101 to 500 Mentions, (iv) 51 to 100 Mentions, and (v) Less Than or Equal To 50 Mentions. The websites for BCC, WCR, PM, CR, SL and CI differ in the amount of textual content, so total numbers will vary widely from website to website. This is why it is also important to consider percentages of one word/phrase group relative to another. Moreover, the Mann-Whitney U-Test (described below) is another tool that can be utilized to check if differences between word/phrase group frequencies are statistically significant. As before, word/phrase context and website logistics will influence results. However, this is somewhat compensated for as six separate sites are analyzed. More on this later. The searches were performed on August 20.
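For the numerically inclined, the five-category binning can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this post: the names `CATEGORIES`, `category` and `bin_counts` are made up for the example, and the four sample counts are borrowed from the BCC listing further down.

```python
# Lower bound of each category, checked from the top down.
# Labels follow the five categories named in section III.
CATEGORIES = [
    (1001, "Greater Than 1000 Mentions"),
    (501, "501 to 1000 Mentions"),
    (101, "101 to 500 Mentions"),
    (51, "51 to 100 Mentions"),
    (0, "Less Than or Equal To 50 Mentions"),
]


def category(count):
    """Return the category label for a single frequency count."""
    for lower_bound, label in CATEGORIES:
        if count >= lower_bound:
            return label
    raise ValueError("counts cannot be negative")


def bin_counts(counts):
    """Group {phrase: count} into categories, most-mentioned first."""
    bins = {label: [] for _, label in CATEGORIES}
    for phrase, count in sorted(counts.items(), key=lambda item: -item[1]):
        bins[category(count)].append((phrase, count))
    return bins


# Hypothetical usage with a handful of counts taken from the BCC listing.
sample = {"government": 229, "abortion": 97, "crime": 44, "cloning": 0}
binned = bin_counts(sample)
for label, entries in binned.items():
    print(label, entries if entries else "None apply.")
```

Checking the lower bounds from the top down keeps the boundary cases unambiguous: 1000 lands in "501 to 1000" and 50 in "Less Than or Equal To 50".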
IV. Right. Let's begin. We start on the West Coast...


Greater Than 1000 Mentions: None apply.
501 to 1000 Mentions: None apply.
101 to 500 Mentions: government (229), schools (181), education (176), rights (175), poor (153), pro-life (127), Rosary (125), meeting (120), poverty (103).
51 to 100 Mentions: abortion (97), commission (95), committee (94), euthanasia (88), document (82), health care (81), social justice (77), environment (70), dialogue (68), economic (64), report (62), pro-abortion (59), same sex (52).

Less Than or Equal To 50 Mentions:
crime (44), refugee (41), session (40), immigration (37), aboriginal (36), interfaith (29), administration (26), United Nations (25), morality (25), economy (23), slavery (23), structures (22), ecumenism (21), catechism (21), publication (20), morals (20), chastity (18), life issues (16), Magisterium (14), homosexual (9), right to life (9), motherhood (9), contraception (8), unions (6), gay (6), climate change (5), devotions (5), fatherhood (4), racism (3), NFP (2), fetal (2), global warming (1), stem cell (1), papal authority (1), heretic (1), cloning (0), RU-486 (0), Winnipeg Statement (0).

It is refreshing to see the inclusion of "pro-life" and "Rosary" in the most mentioned category. Still, they didn't beat "rights" and "education". In the 51-to-100 category, life/moral words do okay, but so do "committee" and "social justice". The Less-Than-50 category seems a hodgepodge, yet leftoid words are still the most mentioned therein. I've always been undecided about the loyalty of BCC. It's a long-running periodical (established 1931) but the overall low frequency count suggests that not much content has been uploaded. Note well that the Archdiocese of Vancouver insignia is plastered on the website's header for all to see, so we can make a reasonable guess as to what's happening over there in terms of chancery office influence. Working with the content we do have, the total mentions for all the words/phrases used here is equal to 2951. Of these, 2232 words/phrases are left/bureaucratic-inclined with 719 being traditional/moral-inclined, corresponding to a 76% : 24% split in favour of the former. Interesting.
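Anyone wishing to check the BCC arithmetic can do so with the short sketch below. The counts are copied from the BCC listing above and the grouping follows notes [2] and [3]; the variable names (`LEFT_BCC`, `TRAD_BCC` and so on) are mine, and rounding to whole percentages is an assumption about how the split was reported.

```python
# Left/bureaucratic-inclined counts at BCC (grouping per note [2]).
LEFT_BCC = {
    "commission": 95, "document": 82, "dialogue": 68, "session": 40,
    "structures": 22, "immigration": 37, "education": 176,
    "publication": 20, "unions": 6, "administration": 26, "report": 62,
    "committee": 94, "crime": 44, "rights": 175, "government": 229,
    "interfaith": 29, "refugee": 41, "ecumenism": 21, "poverty": 103,
    "aboriginal": 36, "meeting": 120, "economic": 64, "economy": 23,
    "social justice": 77, "environment": 70, "poor": 153,
    "United Nations": 25, "health care": 81, "schools": 181, "racism": 3,
    "climate change": 5, "slavery": 23, "global warming": 1,
}

# Traditional/moral-inclined counts at BCC (grouping per note [3]).
TRAD_BCC = {
    "abortion": 97, "catechism": 21, "homosexual": 9, "euthanasia": 88,
    "life issues": 16, "stem cell": 1, "right to life": 9, "Rosary": 125,
    "cloning": 0, "same sex": 52, "morality": 25, "contraception": 8,
    "morals": 20, "Magisterium": 14, "fatherhood": 4, "gay": 6,
    "pro-life": 127, "papal authority": 1, "RU-486": 0, "pro-abortion": 59,
    "chastity": 18, "devotions": 5, "motherhood": 9, "NFP": 2, "fetal": 2,
    "heretic": 1, "Winnipeg Statement": 0,
}

left_total = sum(LEFT_BCC.values())
trad_total = sum(TRAD_BCC.values())
grand_total = left_total + trad_total

# Percentages rounded to the nearest whole number (an assumption).
left_pct = round(100 * left_total / grand_total)
trad_pct = round(100 * trad_total / grand_total)

print(f"left: {left_total}, traditional: {trad_total}, total: {grand_total}")
print(f"split: {left_pct}% : {trad_pct}%")
```

The same two dictionaries can be refilled with the counts for any of the other sites to repeat the check.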
Eastward to Alberta...

Greater Than 1000 Mentions: catechism (9100), government (2060), poor (1860), education (1830), schools (1690), rights (1310), meeting (1270), poverty (1040).
501 to 1000 Mentions: abortion (946), committee (917), economic (914), environment (749), report (696), commission (668), health care (654), social justice (647), dialogue (628), pro-life (537).
101 to 500 Mentions: Rosary (492), document (460), same sex (412), aboriginal (366), economy (359), morality (358), session (317), administration (296), crime (292), euthanasia (292), United Nations (257), gay (233), structures (230), publication (229), homosexual (204), chastity (204), interfaith (190), unions (183), ecumenism (178), immigration (174), right to life (162), slavery (152), refugee (145), contraception (133), pro-abortion (132), climate change (119).
51 to 100 Mentions: morals (100), global warming (95), life issues (92), stem cell (92), Magisterium (75), cloning (71), devotions (69), racism (65), motherhood (61), fetal (53).
Less Than or Equal To 50 Mentions: fatherhood (36), RU-486 (12), heretic (11), NFP (10), papal authority (9), Winnipeg Statement (5).
The greatest number of mentions (more than 1000) are clearly left/bureaucratic. Like the BCC results, "rights" and "education" frequencies rank high. The anomaly here is "catechism", ringing in at a spectacular 9100 mentions (on August 20). If you check the Google site search for this word, there seems to be an inordinate number of links to a series of articles on the catechism by WCR's editor. Also, frequency counts include links to WCR's old site (recently given a facelift). The word "abortion" is mentioned 946 times and "pro-life" 537 times for the 501-to-1000 category, yet the greater remainder of words/phrases lean left-bureaucratic. The total mentions for all the words/phrases used here is equal to 34941. Of these, 21040 words/phrases are left/bureaucratic-inclined with 13901 being traditional/moral-inclined, corresponding to a 60% to 40% split, respectively. If the anomalous "catechism" is removed, the split becomes 81% : 19%. Interesting.
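The effect of dropping the "catechism" outlier is easy to reproduce from the WCR totals quoted above. A minimal sketch, with a hypothetical helper `split` and whole-number rounding assumed:

```python
def split(left, trad):
    """Return the (left %, traditional %) split, rounded to whole numbers."""
    total = left + trad
    return round(100 * left / total), round(100 * trad / total)


# WCR totals as quoted in the text.
left_wcr = 21040
trad_wcr = 13901
catechism = 9100  # the anomalous count

with_outlier = split(left_wcr, trad_wcr)
without_outlier = split(left_wcr, trad_wcr - catechism)

print("with catechism:   ", with_outlier)
print("without catechism:", without_outlier)
```

One word accounts for roughly two thirds of WCR's traditional/moral tally, which is why removing it moves the split so far.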
Time now to set course for the Flatlands...

Greater Than 1000 Mentions: None apply.
501 to 1000 Mentions: None apply.
101 to 500 Mentions: government (349), rights (347), education (268), poor (232), schools (214), meeting (209), dialogue (193), committee (167), report (163), poverty (154), abortion (154), environment (125), commission (121), economic (116), health care (102), document (101).
51 to 100 Mentions: crime (99), pro-life (86), aboriginal (80), social justice (80), session (79), gay (71), ecumenism (62), euthanasia (59), same sex (58), morality (58), interfaith (56), United Nations (54), publication (45).

Less Than or Equal To 50 Mentions:
Rosary (44), immigration (41), administration (38), economy (37), refugee (33), catechism (33), pro-abortion (32), structures (31), racism (29), homosexual (29), chastity (24), life issues (23), climate change (21), slavery (21), contraception (21), unions (19), morals (15), Magisterium (12), right to life (11), stem cell (10), motherhood (10), global warming (8), devotions (8), cloning (4), fetal (3), fatherhood (2), heretic (1), papal authority (0), RU-486 (0), NFP (0), Winnipeg Statement (0).

No surprise here. Remember: Novocaine Pete is in charge of this operation. He's really into dissent n'stuff. Most of the most mentions are left/bureaucratic-inclined. The least mentioned category is dominated by moral/life issue words. The total mentions for all the words/phrases is equivalent to 4462. Of these, 3694 words/phrases are left/bureaucratic-inclined with 768 being traditional/moral-inclined, corresponding to an 83% : 17% split. Like I said, no surprise.
Our final destination is Toronto...


Greater Than 1000 Mentions: None apply.
501 to 1000 Mentions: report (860), poor (808), meeting (806), government (496).

101 to 500 Mentions:
dialogue (463), education (453), pro-life (434), schools (388), abortion (379), poverty (335), rights (332), economic (326), document (304), chastity (267), Rosary (266), commission (248), immigration (234), committee (229), life issues (227), euthanasia (217), catechism (179), environment (175), session (171), structures (171), ecumenism (153), health care (139), crime (125), United Nations (121), stem cell (121), interfaith (113), fatherhood (110), economy (106).

51 to 100 Mentions: social justice (99), pro-abortion (99), morals (96), slavery (93), morality (88), devotions (88), publication (85), administration (80), refugee (66), right to life (61), same sex (60), motherhood (54).

Less Than or Equal To 50 Mentions:
climate change (46), aboriginal (41), homosexual (41), contraception (41), Magisterium (35), gay (23), unions (22), global warming (19), NFP (14), racism (8), cloning (5), fetal (5), heretic (5), papal authority (1), Winnipeg Statement (1), RU-486 (0).

The highest mention category (501-to-1000) has no traditional wordings, just bureaucracy stuff. There seems to be a particular fascination with the word "report". The 101-to-500 category has life issue words, though note that, one category down, "social justice" ranks equally with "pro-abortion" (99 mentions each). The total mentions for all the words/phrases is 2951. Of these, 2232 words/phrases are left/bureaucratic-inclined with 768 being traditional/moral-inclined, corresponding to a 76% : 24% split. Very interesting. Even more interesting is a separate search analysis conducted by the author. The word "Rosica" was found to occur only 40, 10 and 5 times at WCR, PM and CI, respectively. However, it popped up a significantly higher 485 times at BCC and 286 times at CR. The explanation for this discrepancy? Likely it relates to the fact that, unlike WCR, PM and CI, only BCC and CR have Salt and Light video streams embedded at their websites. So there is a special friendship with those two. The data also show that, at the SL site, the word "Rosica" was mentioned 5840 times. "God" and "Jesus" scored lower, at 4310 and 3250 times, respectively. Sorry, couldn't help bringing that scrumptious factoid to light.

Greater Than 1000 Mentions: education (5590), rights (4870), government (3500), schools (2950), poor (2490), meeting (2080), abortion (1690), poverty (1640), pro-life (1500), report (1460), economic (1400), dialogue (1260), committee (1160), environment (1080), commission (1070).
501 to 1000 Mentions: social justice (850), euthanasia (805), health care (762), refugee (716), economy (679), United Nations (678), document (640), interfaith (624), same sex (590), Rosary (518), gay (508).

101 to 500 Mentions:
crime (478), immigration (458), aboriginal (453), session (361), morality (333), life issues (327), administration (314), publication (283), unions (282), climate change (278), homosexual (265), pro-abortion (262), catechism (255), ecumenism (236), structures (222), contraception (213), stem cell (211), right to life (203), chastity (193), morals (188), slavery (165), racism (151), motherhood (113).

51 to 100 Mentions: global warming (96), Magisterium (84), devotions (77), cloning (53).
Less Than or Equal To 50 Mentions: fetal (36), fatherhood (25), heretic (20), NFP (7), papal authority (6), Winnipeg Statement (5), RU-486 (0).
"Education" tops the list at a whopping 5590 (predictably), with "rights" and "government" not far behind. Most mentions, greater than 1000, are leftist-inclined. Out of 15 words/phrases, only two are traditionally-inclined, i.e. "pro-life" and "abortion". Like WCR, "global warming" at CR came in significantly at 96 (95 times at the former). The total mentions for all the words/phrases is 47763. Of these, 39276 word/phrases are left/bureaucratic-inclined with 8487 being traditional/moral-inclined, corresponding to a eyebrow raising 82% : 18% split at "Canada's Catholic News Source". This, too, is unsurprising considering the editorial board at CR.

Greater Than 1000 Mentions: abortion (1390), rights (1300), publication (1170), education (1150), government (1030).
501 to 1000 Mentions: homosexual (895), gay (801), schools (711), euthanasia (689), same sex (645), ecumenism (635), pro-life (632), report (510).
101 to 500 Mentions: document (475), pro-abortion (472), meeting (460), morality (458), contraception (417), commission (392), committee (352), poor (272), unions (264), dialogue (259), catechism (243), economic (223), United Nations (212), Magisterium (211), crime (204), environment (181), health care (160), morals (157), administration (143), poverty (142), social justice (142), chastity (115), right to life (114), structures (102), Winnipeg Statement (102).
51 to 100 Mentions: stem cell (97), economy (95), immigration (76), racism (71), slavery (63), Rosary (54).
Less Than or Equal To 50 Mentions: cloning (50), NFP (45), life issues (44), fetal (40), motherhood (37), refugee (29), fatherhood (25), interfaith (22), session (20), aboriginal (8), papal authority (8), devotions (6), heretic (5), climate change (4), RU-486 (4), global warming (2).
"Abortion" is the most mentioned for the assigned "standard" periodical in this analysis, but so are "rights", "education" and "government", as with the MSM remainder. Given this it could be argued that the similarity between CI's most mentioned and those at BCC, WCR, PM, SL and CR makes this analysis null and void. There may be some bearded baby boomers out there in cyberland rapturously gnashing their teeth at this result. Perhaps one of them is now yelling: "TH2, you smarmy schmuck, you condescending jackass, you... you... you irreverent rascal, you are massaging the data and manipulating results to fit a preconceived orthodox ideology". Not so fast, my hippie heretics. If we mine the data even deeper it is discovered that surface appearances are deceptive. Note that the total mentions for all the words/phrases is 18365. Of these, 10879 word/phrases are left/bureaucratic-inclined with 7486 being traditional/moral-inclined, corresponding to a 58% : 42% split. Note further that, unlike the Catholic MSM data, these percentages are more evenly distributed, i.e. approaching an equal division of 50% to 50%. One might even call them "balanced". Further substantiation of this point necessitates the production of some pretty pictures or eye candy or whatever the kids these days call graphs.
V. DATA VISUALIZATION. Below are a series of histograms evidencing the left-liberal bias in word/phrase usage by the Canadian Catholic MSM. For a specific website, histograms with red bars (on left side) denote left-bureaucratic words/phrases and those with blue bars (on right side) denote traditional-moral words/phrases. The vertical axes are frequency counts and numbers on the horizontal axes correspond to the individual words/phrases discussed above (the so-called "bin"). For the BCC, WCR, PM, SL and CR histograms it is visually obvious that left-bureaucratic words/phrases are most frequently used. Yet it is also evident that the frequency counts between left-bureaucratic and traditional-moral words/phrases are more or less evenly distributed at CI. [Click on images to enlarge/clarify]





VI. MANN-WHITNEY U-TEST. Also known as the "Wilcoxon Rank Sum Test", this is a simple but effective statistical method to determine whether or not there is a significant difference between the medians of two data samples, in this case between the frequency counts of left-bureaucratic and traditional-moral words/phrases (the median is the "middle value" of a data set when numbers are ranked in order of magnitude). The Mann-Whitney U-Test is a nonparametric test, meaning that it does not require the sampled population to follow a particular distribution, such as the normal distribution (think "bell curve" shape in plotted data). The test can also be applied to groups with differing sample sizes. The Null Hypothesis (Ho) is that the samples are extracted from a common population and thus there should be no consistent difference in the medians between two sets of values (i.e. no bias). There are three possible Alternative Hypotheses (Ha) for this test (i.e. bias): (i) both samples come from populations with different mean ranks, (ii) both samples come from populations with different mean ranks, where the mean rank of X is greater than the mean rank of Y, or (iii) both samples come from populations with different mean ranks, where the mean rank of Y is greater than the mean rank of X. Here we are interested, as the above analysis attests, in whether left-bureaucratic words/phrases are more frequently used than traditional-moral phrases, so one of the latter two Ha's is open to us. Procedures: (i) assign an identifier (e.g. X, Y) to each value according to its group, (ii) pool the values from both groups and rank them, (iii) put the ranked values back in their original groups, and (iv) sum the ranks for each group. Two equations are then used to calculate the U-statistic:

$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

where, say, n1 is the number of traditional-moral words/phrases and n2 is the number of left-bureaucratic words/phrases.[5] R1 and R2 are the corresponding sums of the ranks. The calculated value (Ucalc), conventionally the smaller of the two U values, is then compared with the corresponding critical value (Ucrit) in a table. If Ucalc > Ucrit, then Ho is retained. Otherwise, if Ucalc < Ucrit, then Ho is rejected and Ha is accepted.
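For those who would rather let a machine do the ranking, here is a minimal Python sketch of the procedure just described. It is not the calculation used for this post: it assumes the standard textbook form of the U-statistic (U = n1*n2 + n(n+1)/2 - R for each group), gives tied values their average rank, and uses made-up function names and a tiny toy data set. The critical value still has to be looked up in a table.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2) for samples x and y (standard form, assumed)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # rank the pooled values
    r1 = sum(ranks[:n1])                      # rank sum for group x
    r2 = sum(ranks[n1:])                      # rank sum for group y
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


def reject_null(u_calc, u_crit):
    """Decision rule from the text: reject Ho when Ucalc < Ucrit."""
    return u_calc < u_crit


# Toy data (not from this post), including one tie between the groups.
u1, u2 = mann_whitney_u([3, 5, 8], [5, 9, 10, 12])
u_calc = min(u1, u2)
print(u1, u2, u_calc)
```

As a sanity check, U1 + U2 always equals n1 times n2, so a mistake in the ranking step tends to show up immediately.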
Here are the results:
BC Catholic: Ucalc = 140, which is less than Ucrit = 282 at the 5% level of significance (P=0.05), meaning that, with a 95% certainty, there is a statistically significant difference in the frequency usage between the two groups. Inference: biased toward left-bureaucratic words/phrases.

Western Catholic Reporter: Ucalc = 110.5, which is less than Ucrit = 270 at the 5% level of significance (P=0.05), meaning that, with a 95% certainty, there is a statistically significant difference in the frequency usage between the two groups. Inference: biased toward left-bureaucratic words/phrases.

Prairie Messenger: Ucalc = 96.5, which is less than Ucrit = 282 at the 5% level of significance (P=0.05), meaning that, with a 95% certainty, there is a statistically significant difference in the frequency usage between the two groups. Inference: biased toward left-bureaucratic words/phrases.

Salt + Light: Ucalc = 184, which is less than Ucrit = 282 at the 5% level of significance (P=0.05), meaning that, with a 95% certainty, there is a statistically significant difference in the frequency usage between the two groups. Inference: biased toward left-bureaucratic words/phrases.

Catholic Register: Ucalc = 119, which is less than Ucrit = 282 at the 5% level of significance (P=0.05), meaning that, with a 95% certainty, there is a statistically significant difference in the frequency usage between the two groups. Inference: biased toward left-bureaucratic words/phrases.

Catholic Insight: Ucalc = 319, which is greater than Ucrit = 282 at the 5% level of significance (P=0.05), meaning that, with a 95% certainty, there is not a statistically significant difference in the frequency usage between the two groups. Inference: not biased toward either left-bureaucratic or traditional-moral words/phrases.

VII. PRINCIPAL FINDINGS.
The statistical analysis above evidences that, for frequency counts of sampled words/phrases at the five main Canadian Catholic media outlets (BCC, WCR, PM, SL, CR), 76% to 83% of these were left/bureaucratic-inclined, leaving only 17% to 24% traditional/moral-inclined. In contrast, Catholic Insight showed a more balanced distribution, with 58% left/bureaucratic-inclined and 42% traditional/moral-inclined. The reliability of these percentages was bolstered after application of the Mann-Whitney U-Test. It was also found that, regardless of website, the most mentioned words were "government", "education" and "rights". The least mentioned (if at all) were "cloning", "RU-486", "NFP", "papal authority" and "Winnipeg Statement", excepting at Catholic Insight which yielded a count of 102 for the latter (CI's highest overall mention was "abortion").
VIII. Like any such study, there are caveats. Legitimate questions would be: To what degree do the words/phrases reflect a specific subject of concern or a writer's bent to either orthodoxy or heterodoxy? To what degree are the word/phrases employed in this analysis representative of the contemporary Catholic lexicon? How does the Google search engine and related IT matters affect results? The search data are available above for anyone to use to explore these issues. That's what exploratory data analysis is all about... Right? But I think the results herein speak for themselves. Incidentally, click here to subscribe to Catholic Insight ;)

NOTES / REFERENCES

1. The Interim and LifeSite News were purposely excluded from this analysis since their prime focus is on life/family issues and they do not necessarily define themselves as "Catholic". Recall: as an adjective the word "catholic" means universal, all-encompassing, wide variety, etc.

2. Heterodox-inclined words/phrases (left, liberal, bureaucratic): commission, document, dialogue, session, structures, immigration, education, publication, unions, administration, report, committee, crime, rights, government, interfaith, refugee, ecumenism, poverty, aboriginal, meeting, economic, economy, social justice, environment, poor, United Nations, health care, schools, racism, climate change, slavery, global warming. Sample size: n=33.

3. Orthodox-inclined words/phrases (traditional Church teaching, moral/life issues): abortion, catechism, homosexual, euthanasia, life issues, stem cell, right to life, Rosary, cloning, same sex, morality, contraception, morals, Magisterium, fatherhood, gay, pro-life, papal authority, RU-486, pro-abortion, chastity, devotions, motherhood, NFP, fetal, heretic, Winnipeg Statement. Sample size: n=27.

4. H.B. Mann and D.R. Whitney, "On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other", The Annals of Mathematical Statistics, March 1947, vol. 18, no. 1, pp. 50-60. A simpler description of the Mann-Whitney U-Test is given in D. Ebdon, Statistics in Geography (Oxford: Basil Blackwell, 1985), pp. 58-61.

5. A Ucrit table with sample sizes greater than 30 could not be found by the author. Thus n=33 for left-bureaucratic words/phrases was reduced to n=30. The 3 least-mentioned words per website were excluded from calculations; depending on the website, these turned out to be drawn from "global warming", "climate change", "racism", "unions" and "aboriginal". Also, the anomalous "catechism" was excluded when Mann-Whitney was applied to WCR (thus n=27 becomes n=26).
I like numbers. They're super duper !


6 comments:

AllenT said...

1 of my questions is to what extent are the "left-bureaucratic words/phrases" that pop up on Catholic Insight there because CI's writers are critiquing the left's take on them?

TH2 said...

To a very large extent. That you picked up on that once again proves to me that Al is one of the most discerning people I know in the blogosphere. As I once said at your blog some time ago, you are an "eagle eye". The average 80:20 slant for the remaining outlets evidenced here indicates a "preoccupation with" and/or "assent to" what I generalize as "left-bureaucratic".

Fluffy Kerpuffle said...

What - no calculus?  You fail me, TH2. I really had derived that calculus was integral to this post. 

In my spare time, I'm going to do these same searches for Vogue, Cosmopolitan, Mother Earth News, and Time.  And maybe Al Jazeera.

TH2 said...

No calculus, though I was initially thinking of performing a Markov Chain analysis, with probability matrices etc. Decided against it. Mann-Whitney U was deemed satisfactory.

Lola Really said...

I remember as a kid watching Carl Sagan's 'Cosmos'. He said that Mathematics was the language of the Universe. He was an Atheist, but I figured God must be pretty good with numbers.

TH2 said...

Carl Sagan: Really good at numbers, really really really bad at theology.
