DRAFT

Extraverted Women Prefer Java (and Intraverted Women Do Not): A Preliminary Study of Myers Briggs Types and Programming Language Preference

Ronald P. Loui, Ph.D.
University of Illinois-Springfield
rloui2@uis.edu

Abstract

In a study of 200 Master’s students in CS, several conclusions can be drawn with respect to programming language preference and Myers-Briggs personality type. The most significant, and robust, is the preference for Java among extraverted females and dislike of Java among intraverted females. Men display a much weaker difference. One implication is that a curriculum that starts in one language or that requires one programming paradigm is more likely to lose women than men.

Introduction

It is well known that many programmers have a strong preference for one programming language, or a strong dislike. Many programmers will defend one paradigm, such as object oriented programming, while others will attack it. A common way of investigating strong personal preference is to look for frequency distinctions among personality types in a population.

Perhaps the best-known way of “sorting temperaments” is based on Jung’s theory, as adapted by Katharine Briggs and Isabel Briggs Myers, over a half century ago. It remains one of the longest-lasting theories of personality, and is especially pertinent because it aims at differing styles of information processing and decision making. Myers Briggs Type Indicators (MBTI) have been correlated with all sorts of professional competence, including a long tradition in software engineering (pair programming / Katira et al. 2004, code review / Cunha and Greathead 2007, team compatibility / Gorla and Lam 2004) and in student learning and retention (e.g., Koubek et al. 1985, Felder and Silverman 1988). MBTI was used in the design of systems even before it became popular as a framework for understanding software engineers and programming students. L. F.
Capretz and his collaborators have been studying these issues for decades (e.g., Capretz 2003). MBTI is so ubiquitous that it is not reviewed here. The interested reader can find numerous references in the popular literature, or can consult any of the references included (e.g., Westbrook 1988). It is important to note that the main conclusion of this paper has to do with gender and extraversion, and the extraversion/intraversion trait is part of several different personality theories, not just the Myers Briggs/Jung theory, e.g., Eysenck, Big Five and HEXACO. Hence, some of the conclusions are specific to MBTI personality theory, but the most important conclusion is not. (There is actually more agreement on the existence of an E/I personality difference than there is agreement on whether to use the terms extraversion/intraversion, or the terms extroversion/introversion; the author defers to the original authors and uses the former, though the latter are more popular.) For those unfamiliar with the MBTI E/I dichotomy, the essential question is whether a person expends "mental energy" with large groups of people, and whether a person is outwardly directed toward social gatherings or constructs. For example, a President might be very good at public speaking, and greeting large crowds of strangers, but may still take greater pleasure in solitary activities, or have a need to "recharge" with close family after large events. A doctor might possess a coolness toward people, and might tend toward professional reserve in social gatherings, but he could still be extraverted if standing in the community were extremely important to him. The other MBTI dimensions refer even more explicitly to information-processing styles in their characterizations. 
This Study

This study looks at 200 students enrolled in CSC570, an on-campus graduate readings seminar (alternately on cyberwarfare and computer surveillance) populated mainly with Computer Science Master’s students from India, and CSC368, an online systems programming language course, populated with upper-class majors in computer science. Significantly, 52 students were female, which is fairly large for a homogeneous group of computer scientists. For these courses, over a third of the Indian Master's students were female (44 females, 122 males). Among non-Indian students, almost a third were female (7 females, 26 males). The Indian student population was mostly uniform in age (almost all were about 23 years old), background (all were CS or EE undergraduate majors who chose the CS Master's program over the MIS), and origin (almost all were from Hyderabad and recently arrived in this country).

Polling was conducted over three semesters (nine months). The study had two different styles of MBTI determination. Using our first method (2014 spring and summer semesters), students were asked questions which were scored, and scores were thresholded to make determinations. Using our second method (2014 fall semester), students were instructed in the MBTI dimensions and asked directly which dimensions best fit their preferences. This difference in method is used to study robustness with respect to method of determination. Other useful partitions of the population included by-class, by-country, and by-gender, in addition to by-academic-year.

In all cases, a binary preference was sought (E vs. I, N vs. S, T vs. F, J vs. P), instead of an identification with one of the four main groups (NF Idealist, NT Rational, SP Artisan, SJ Guardian), or with one of the sixteen fully defined (four-dimension) types. When scores did not display a strong bias, the dimension was recorded as borderline, e.g., E/I.
This is an important point, since borderline personality dimensions appear in measurement systems despite being disclaimed by theory. In our cohort, 37 students had at least one borderline dimension, 13 had at least two, and 4 had three borderline dimensions. One student had all four dimensions recorded without preference, and was excluded as non-contributory. This left 163 students with completely sorted MB types (all four dimensions determined). The Java vs. Scripting preference was never recorded as borderline (many students expressed ability and enjoyment in both styles, but ties were recorded as a Scripting preference).

The study was originally designed to discover if there might be a preference for scripting among P-types. The data at first suggested that NT types preferred scripting, though the significance of the difference disappeared with more data. With no significant personality-based preference for scripting established (though see observations below), the main determination related to gender and intraversion/extraversion emerged with no prior expectation. In fact, this data set was not even intended for gender-based study, and many Indian genders had to be determined by searching Google for the first names (the surveys did not explicitly ask for gender).

Main Determination

Among female extraverts, 25 preferred Java, while 7 preferred a scripting language (awk, perl, php, javascript, tcl; ruby and python were not offered as options). Among female intraverts, 8 liked Java while 10 preferred a scripting language (2 women were E/I borderline). With Fisher’s exact test, the two-tailed p value is less than .028. One-tailed is p<.018. Chi-squared with Yates correction gives two-tailed p<.036, one-tailed p<.018, and without Yates correction, two-tailed p<.016, one-tailed p<.008.
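For readers who want to re-derive these figures, the following is a minimal sketch of both tests on the female 2x2 table (E: 25 Java, 7 scripting; I: 8 Java, 10 scripting). It is not the author's own analysis code. The function names are hypothetical, only the Python standard library is used, and two conventions are assumed: the two-tailed Fisher value sums every table no more probable than the observed one, and the one-tailed value covers the direction of more extraverts preferring Java. Under those assumptions the results should land close to the values reported above.

```python
from math import comb, erfc, sqrt


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (one_tailed, two_tailed). The one-tailed value is the
    probability of a top-left cell at least as large as `a` (an assumed
    direction); the two-tailed value sums all tables whose probability
    does not exceed that of the observed table (an assumed convention).
    """
    row1, col1, col2 = a + b, a + c, b + d
    n = a + b + c + d
    total = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(col1, k) * comb(col2, row1 - k) / total

    lo, hi = max(0, row1 - col2), min(row1, col1)
    probs = {k: prob(k) for k in range(lo, hi + 1)}
    p_obs = probs[a]
    one_tailed = sum(p for k, p in probs.items() if k >= a)
    # Small relative tolerance guards against floating-point ties.
    two_tailed = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return one_tailed, two_tailed


def chi_squared_2x2(a, b, c, d, yates=False):
    """Chi-squared test (1 degree of freedom) for [[a, b], [c, d]].

    Returns (statistic, two_tailed_p). With yates=True the continuity
    correction subtracts n/2 from |ad - bc| before squaring.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Female counts from the text: rows E, I; columns Java, scripting.
    table = (25, 7, 8, 10)
    print("Fisher (one-tailed, two-tailed):", fisher_exact_2x2(*table))
    print("Chi-squared, uncorrected:", chi_squared_2x2(*table))
    print("Chi-squared, Yates:", chi_squared_2x2(*table, yates=True))
```

Passing the male counts (55, 23, 39, 20) to the same functions gives the corresponding values for the weaker male difference discussed in the text.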
The proportions are clearer:

AMONG FEMALE EXTRAVERTS: PREFER JAVA: 25/32 = 78%
AMONG FEMALE INTRAVERTS: PREFER JAVA: 8/18 = 44%

Also reversing the conditionalization,

AMONG FEMALES who PREFER JAVA: EXTRAVERTS: 25/33 = 76%
AMONG FEMALES who PREFER SCRIPTING: INTRAVERTS: 10/17 = 59%

Among men, a weak preference for Java is there, but the E/I difference is not significant. 11 men were E/I borderline. Male extraverts preferred Java 55 to 23. Male intraverts preferred Java 39 to 20. So both male E and I groups preferred Java. Hence:

AMONG MALE EXTRAVERTS: PREFER JAVA: 55/78 = 71%
AMONG MALE INTRAVERTS: PREFER JAVA: 39/59 = 66%

Also,

AMONG MALES who PREFER JAVA: EXTRAVERTS: 55/94 = 59%
AMONG MALES who PREFER SCRIPTING: INTRAVERTS: 20/43 = 47%

In raw frequency, contingency table form:

FEMALES    JAVA    SCRIPTING
E          25      7
I          8       10

MALES      JAVA    SCRIPTING
E          55      23
I          39      20

And as proportions of each gender's E/I-determined total:

FEMALES    JAVA    SCRIPTING
E          .50     .14
I          .16     .20

MALES      JAVA    SCRIPTING
E          .40     .17
I          .28     .15

Robustness to Personality Subcategories

The categories with the sizable differences for females show some consistency, whether restricting personality types by one, two, three, or four dimensions. * denotes a trivial sample size of 1 or 2 below:

One dimension (as reported above):
FEMALE E: 78% PREFER JAVA (25/32)
FEMALE I: 44% PREFER JAVA (8/18)

Two dimensions:
FEMALE EP: 92% PREFER JAVA (11/12)
FEMALE FP: 92% PREFER JAVA (11/12)
FEMALE EF: 87% PREFER JAVA (13/15)
FEMALE ES: 85% PREFER JAVA (17/20)
...
FEMALE IJ: 62% PREFER SCRIPTING (5/8)
FEMALE IN: 60% PREFER SCRIPTING (3/5)

Three dimensions:
FEMALE ESP: 100% PREFER JAVA (7/7)
FEMALE EFP: 100% PREFER JAVA (7/7)
FEMALE ENF: 100% PREFER JAVA (3/3)
FEMALE NFP: 100% PREFER JAVA (3/3)
FEMALE ITJ: 100% PREFER JAVA (2/2)*
...
FEMALE ITP: 100% PREFER SCRIPTING (1/1)*
FEMALE IFJ: 60% PREFER SCRIPTING (5/5)

Four dimensions:
FEMALE ISTJ: 100% PREFER JAVA (1/1)*
FEMALE INTJ: 100% PREFER JAVA (1/1)*
FEMALE INFP: 100% PREFER JAVA (1/1)*
FEMALE ESFP: 100% PREFER JAVA (5/5)
FEMALE ESTP: 100% PREFER JAVA (2/2)*
FEMALE ENFP: 100% PREFER JAVA (2/2)*
FEMALE ENFJ: 100% PREFER JAVA (1/1)*
...
FEMALE ISTP: 100% PREFER SCRIPTING (1/1)*
FEMALE INFJ: 100% PREFER SCRIPTING (1/1)*
FEMALE ISFJ: 75% PREFER SCRIPTING (3/4)

Except for the trivial sample sizes of 1 and 2, and the FP/NFP groups (which do not speak to the E/I dichotomy), all of the sizable preferences for Java are among extravert subcategories, and all of the sizable preferences for Scripting are among intravert subcategories. Consider that E, EP, EF, ES, ESP, EFP, and ENF are among the top subgroups that express a preference for Java, with nontrivial sample sizes. Meanwhile, I, IJ, IN, ITP, IFJ, and ISFJ all prefer scripting, with nontrivial sample sizes.

In fact, all of the eight fully described extraverted types show a preference for Java among females:

ESTP: 2/2 PREFERS JAVA: 100%*
ESFP: 5/5 PREFERS JAVA: 100%
ENFP: 2/2 PREFERS JAVA: 100%*
ENFJ: 1/1 PREFERS JAVA: 100%*
ESTJ: 4/5 PREFERS JAVA: 80%
ESFJ: 5/7 PREFERS JAVA: 71%
ENTP: 2/3 PREFERS JAVA: 67%
ENTJ: 2/3 PREFERS JAVA: 67%

Robustness To Subsamples

How robust is the E/I FEMALE difference for JAVA PREFERENCE with respect to discernible subsamples?

For Indian students, 79% (22/28) of female extraverts preferred Java compared to female intraverts who were indifferent at 50% (7/14). For non-Indian students, 67% (2/3) of female extraverts preferred Java, compared to female intraverts, only 25% (1/4) of whom preferred Java.

For the first half of the study, 67% (8/12) of female extraverts preferred Java compared to female intraverts, only 29% (2/7) of whom preferred Java.
For the second half of the study, 85% (17/20) of female extraverts preferred Java, compared to female intraverts, only 55% (6/11) of whom preferred Java.

Although the samples are trivial, even the females in systems programming (as opposed to our large cyberwarfare and computer surveillance seminars) hinted at bias: 67% (2/3) of female extraverts preferred Java while only 33% (1/3) of female intraverts preferred Java.

The discovered preference is remarkably consistent across subsamples.

Other Observations

Combining males and females, there are some other reportable results in addition to the main finding. Taking each type, compatible pairs of types, trios, and fully specified four-dimensional types, we can list the varying rates of preference for scripting (or subtract each from 100% to get preference for Java):

Preferred Scripting: 39% of I (Intraverts), 35% of N (Intuitives), 34% of P (Perceivers), 32% of T (Thinkers), 31% of J (Judgers), 30% of F (Feelers), 29% of S (Sensors), 27% of E (Extraverts). (All 8 unit-types shown.)

Among pairs of dimensions, preferred scripting: 47% of IN, 42% of IT, 40% of NP, 39% of TP, 39% of IJ, 38% of IP, 37% of FJ, 37% of NT, 36% of IS, ... 27% of EF, 25% of ET, 25% of TJ, 25% of FP, 25% of EJ, 24% of ES, 23% of NF. (16 of 24 pairs shown, illustrating highest and lowest.)

Among trios of dimensions, preferred scripting: 55% of INP, 55% of NTP, 43% of INT, 43% of NFJ, 42% of ITP, 42% of ISJ, 41% of IFJ, 38% of ETP, 37% of ENT, 36% of ITJ, 33% of INF, 33% of INJ, ... 21% of ESJ, 20% of ETJ, 19% of EST, 19% of IFP, 17% of ENF, 14% of NFP. (18 of 32 trios shown, illustrating highest and lowest.)

Among fully specified types, preferred scripting: 67% of INTP, 67% of INFJ, 50% of ENTP, 50% of ISTJ, 38% of ISTP, 36% of ISFJ, 31% of ENTJ, 31% of ESFJ, 30% of ESTP, 29% of ESFP, 25% of ENFJ, 20% of INFP, 18% of ISFP, 14% of ESTJ, 12% of ENFP, and 0% of INTJ. (All 16 quads shown.)
Some of the interesting observations had nothing to do with the preference for one programming paradigm or another, but simply with the personality distribution among a population of technical specialists. Many papers have reported on different rates of personality types for computer scientists relative to the general population (e.g., Westbrook 1988, Hardiman 1997, Capretz 2003). Here is the result for our sample:

IDEALIST    infj 0.02    infp 0.03    enfj 0.02    enfp 0.05
RATIONAL    intj 0.02    intp 0.02    entj 0.08    entp 0.04
ARTISAN     isfp 0.07    istp 0.10    esfp 0.13    estp 0.06
GUARDIAN    istj 0.05    isfj 0.09    estj 0.13    esfj 0.10

So 12% of our sampled students were NF (Idealists), 16% were NT (Rationals), 36% were SP (Artisans), and 37% were SJ (Guardians).

Expressed as a ratio to one set of published rates among the general population (which can vary quite a bit depending on author; here the rates are from Myers et al., 1998):

IDEALIST    infj 1.23    infp 0.70    enfj 0.98    enfp 0.61
RATIONAL    intj 0.88    intp 0.56    entj 4.43    entp 1.15
ARTISAN     isfp 0.77    istp 1.82    esfp 1.52    estp 1.43
GUARDIAN    istj 0.42    isfj 0.62    estj 1.55    esfj 0.80

The study had 4.43x as many ENTJ students as the general population and fewer than half as many ISTJ. Some of this disproportion could be the result of so many immigrant Indians, not just attributable to being Master's students in Computer Science. But among the small non-Indian cohort, we see some of the same biases (beware small sample sizes here, especially for the 0.00 ratios).

IDEALIST    infj 2.90    infp 1.98    enfj 0.00    enfp 0.00
RATIONAL    intj 4.14    intp 0.00    entj 4.83    entp 0.00
ARTISAN     isfp 0.49    istp 2.42    esfp 0.51    estp 1.01
GUARDIAN    istj 0.75    isfj 0.95    estj 2.00    esfj 0.35

Some have claimed that IS students have difficulty with programming (A. B. Woszczynski et al.). MBTI theory would seem to suggest that ISFP types would have difficulty, but not ISTP types.
Our data appears to give empirical support to this finer conclusion: ISTP is over-represented among our students by 82% (1.82x), while ISFP is under-represented at 0.77x, and the ratios are more compelling among non-Indian students. It would be especially important not to suggest that a whole group of people would have trouble programming, when in fact only a specific subgroup has a disinclination.

Discussion and Further Study

Why should extraverted and intraverted people disagree over their preference for Java over scripting, or vice versa?

My colleague who carries large advanced systems teaching loads, Lucinda Caughey, suggests that it is the object oriented paradigm, naming regimen, and expectation of code reuse that might be at work. The study deliberately excluded python and ruby from the list of scripting languages in our surveys. But we included javascript, php, and perl, all of which can make extensive use of objects, or almost none. There was also a question about C++, which was used as a prototypical non-scripting language. Perhaps by asking specifically about each programming language, one would find a progression of preference rates that had explanatory value.

The author had originally thought that automatic initialization and implicit coercion would appeal to P types, while strong typing would appeal to J types. But the P/J difference (34-to-31% among combined males and females) was one of the smallest observed: only T/F (32-to-30%) was smaller, while I/E (39-to-27%) and N/S (35-to-29%) were both larger.

Greater insight might be had by asking more explicitly about the desirability of individual language features, such as associative arrays, or pragmatic considerations, such as integrated development environments and run-time performance. Our surveys included two questions asking how well the student knew the language, and how confident the student was in that language.
It would be important to make sure that every student who expresses a preference understands the programming language issue. If the source of the E/I difference is based on libraries, user-identifiers/naming, and reuse, it would appear in accord with MBTI theory. Using a short, personal, polysemous name, "$x", instead of a more globally meaningful, proper, embeddable and acceptable name, "Fall2014_CSC570_testScores", fits the intravert who does not want anyone else looking at the code, while the latter fits the extravert, who codes with the expectation of external review. Anecdotally, the author's best early awk and perl programmers proclaim their intraversion on facebook, while the memorable best Java programmers made excellent teaching assistants. Why should a more pronounced preference be observed among females? The author does not have a hypothesis here for a cause, though the implications of its effect are eye opening. Simply put, any insistence that all students (or employees, or contractors) program in a single paradigm will drive out as much as half of the female population. This would happen regardless of which paradigm was forcefully imposed. The men would be less sensitive to a monoculture, but a full utilization of women would require multiple programming paradigms. Further study is recommended. Although our 52 women among 200 CS Master's students is a healthy sample size, it would be possible to do a crowd study, use social networks for volunteer self-reporting, or simply circulate a memo around a large company like Google to indicate concordance or discordance. Testing for gender-based differences in the data was motivated by seeing a first year female student who was frustrated with Java and wanted to drop out of computer science. The author advised the student to take javascript, then decide whether she liked or disliked programming. Instead, she changed majors. Perhaps she was simply an intravert. 
Acknowledgements

The author would like to thank Naga Durga Jakka and Lakshmisoumya Chillamcherla for help with background reading, and Swathi Nandhibatla for some data interpretation. Lucinda Caughey has been especially helpful with her hypotheses.

References

C. Bishop-Clark and D. D. Wheeler, The Myers Briggs personality type and its relation to computer programming, Journal of Research on Computing in Education 26:3, 1994.
L. F. Capretz, Personality types in software engineering, Intl Journal of Man Machine Studies 58:2, 2003.
L. F. Capretz and F. Ahmed, Making sense of software development and personality types, IT Professional 12:1, 2010.
L. F. Capretz and F. Ahmed, Why do we need diversity in software engineering? ACM SIGSOFT Software Engineering Notes 35:2, 2010.
H. G. Chen and R. P. Vecchio, Nested IF-THEN-ELSE constructs in end-user computing: personality and aptitude as predictors of programming ability, Intl Journal of Man Machine Studies 36:6, 1992.
L. S. Corman, Cognitive style, personality type, and learning ability as factors in predicting the success of the beginning programming student, ACM SIGCSE Bulletin, 1986.
A. D. Cunha and D. Greathead, Does personality matter? An analysis of code review ability, CACM 50:5, 2007.
W. R. Feeney and J. Hood, Adaptive man/computer interfaces: information systems which take account of user style, ACM SIGCPR, 1977.
R. M. Felder and L. K. Silverman, Learning and teaching styles in engineering education, Engineering Education 78:7, 1988.
N. Gorla and Y. W. Lam, Who should work with whom? Building effective software project teams, CACM 47:6, 2004.
W. Halliburton, M. Thweatt, N. J. Wahl, Gender differences in personality components of computer science students: a test of Holland’s congruence hypothesis, SIGCSE Bulletin, 1998.
L. T. Hardiman, Personality types and software engineers, IEEE Computer 30:10, 1997.
P. E. Jones and R. E. Wall, Computer experience and computer anxiety: two pilot studies, Technical Report ED 275 315, US Dept of Education, 1985.
N. Katira, et al., On understanding compatibility of student pair programmers, ACM SIGCSE Bulletin 36:1, 2004.
R. J. Koubek, W. K. LeBold, G. Salvendy, Predicting performance in computer programming courses, Behaviour and Information Technology 4:2, 1985.
L. E. Meunier-Cinko, Gender differences in cooperative computer-based foreign language tasks, UAZ Dissertation, 1993.
I. B. Myers, M. H. McCaulley, N. L. Quenk, A. L. Hammer, MBTI Manual, Palo Alto: Consulting Psychologists Press, 1998.
M. L. Pope, A comparison of personality traits of computer programmers and computer technicians using the CPI, MBTI, and Strong, UCSF Dissertation, 1988.
J. Teague, Personality type, career preference, and implications for computer science recruitment and teaching, Proceedings of the 3rd Australasian Conference on Computer Science Education, 1998.
D. Varona, L. F. Capretz, Y. Pinero, A. Raza, Evolution of software engineers’ personality profile, ACM SIGSOFT 37:1, 2012.
L. H. Werth, Predicting student performance in a beginning computer science class, CACM 18:1, 1986.
P. Westbrook, Frequencies of MBTI types among computer technicians, Journal of Psychological Types 15:49, 1988.
K. L. Whipkey, Identifying predictors of programming skill, ACM SIGCSE Bulletin, 1984.
A. B. Woszczynski, H. M. Haddad, A. Zgambo, An IS student’s worst nightmare: programming courses, Proceedings of the Southern Association of Information Systems Conference, 2005.

Author Bio

Ronald P. Loui, Ph.D. has taught computer science at Washington University in St. Louis for twenty years, and at the University of Illinois in Springfield for three years. He is a graduate of Harvard and the University of Rochester.