Published: 2020-09-18

Violence Prediction on Somatization and Emotional Self Awareness with Machine Learning Methods

Haliç Üniversitesi, Psikoloji Bölümü, İstanbul
Haliç Üniversitesi, Matematik Bölümü, İstanbul
Keywords: violence, somatization, emotional self-awareness, machine learning


This study aims to predict individuals' violence victimization from the concepts of somatization and emotional self-awareness using the classification algorithms of supervised learning, one of the methods of machine learning. The sample consists of a total of 552 participants, of whom 149 (27%) are men and 403 (73%) are women. The Personal Information Form, the Somatization Scale and An Emotional Self-Awareness Scale-10 (A-DÖFÖ-10) were used as data collection tools. K-Nearest Neighbor, Support Vector Machines, Naive Bayes and Logistic Regression, classification algorithms frequently used in machine learning, were applied, and the performance of the classifiers was compared according to model performance criteria. In terms of accuracy and f1-score, the best classification performance was obtained from Logistic Regression, with an accuracy of 0.74 and an f1-score of 0.82. Accordingly, it is possible to say that individuals' violence victimization can be estimated with a certain degree of accuracy by machine learning methods using the concepts of somatization and emotional self-awareness.

Introduction and Objective

Violence, which is as old as human history, remains an important issue for individuals and societies today. The Turkish Language Association (Türk Dil Kurumu) defines violence as "the degree of power, intensity, stiffness", "speed", "power arising from a movement" and "brute force"; etymologically, the word is related to the Arabic “şidda(t)”, which means toughness, rigidness and hardship (1,2).

The World Health Organization (WHO) defines violence as the intentional use of physical force or power, threatened or actual, against oneself, another person, or against a group or community that either results in or has a high likelihood of resulting in injury, death, psychological harm, maldevelopment, or deprivation (3,4). Different types of violence are described in the literature. The WHO divides violence into three categories: self-directed violence, interpersonal violence and collective violence. Self-directed violence involves self-harm and suicidal behavior; interpersonal violence is seen as family and spouse/partner violence and group violence; and collective violence is seen in social, economic and political forms (5,6).

Acts of violence are classified as physical violence, sexual violence, psychological/emotional violence and neglect/deprivation (4). Physical violence is defined as any kind of attack that harms the integrity of another person and makes them suffer; it includes acts such as slapping, punching, kicking, pushing, biting, twisting the arm, squeezing the throat, injuring with a cutting or piercing tool, torturing, and burning with fire or boiling water (7). All aggressive behaviors with sexual content are evaluated under the heading "sexual violence" (8), which includes behaviors such as forcing sexual intercourse through physical strength, threats and intimidation, performing degrading sexual acts, and taking away the right to take measures to protect against sexually transmitted diseases (9). Psychological/emotional violence has been defined as repeated or multiple forms of acts such as verbal humiliation, following, controlling, limiting one's communication with others and threats (10). Neglect can be defined as depriving a person of basic needs such as food, clothing and heating (11). In addition to this classification, it is also necessary to mention economic violence. Economic violence may involve behaviors such as knowingly stealing money or resources, harming the economic well-being of a partner, controlling the financial situation, pleading for money, not buying basic needs and sabotaging work performance (12).

Interpersonal violence has many negative effects, both physical and mental. Physically, its consequences range from minor injuries to brain damage or even death. Mentally, it may lead to decreased self-confidence, depression, anxiety disorder, post-traumatic stress disorder, sleep disorders, eating disorders, alcohol and substance abuse, self-injury and suicidal behaviors, attention problems and learning difficulties (3,13). Somatization also appears as a mental result of interpersonal violence.

Somatization can be defined as the expression of mental distress and psychosocial stress with physical symptoms rather than being emotional and cognitive (14). Many theories have been proposed to explain the emergence of somatization. According to one of these, negative childhood experiences contribute to the development of somatization behavior (15). The presence of physical illness or illness behavior in family members and the experiences of the individual about attracting interest and love from the environment through physical complaints and the presence of secondary gains are also factors that support somatization. Especially due to the traumatic experiences of childhood, emotions that cannot be verbalized due to the individuals’ limited ability to express their emotional lives are expressed through physical symptoms (14). At this point, it is necessary to consider the concept of emotional self-awareness. Emotional self-awareness requires focusing attention on emotions, thinking on emotional experiences, and making general evaluations about emotions. Individuals who lack the ability to recognize and make sense of their emotions experience difficulties in managing and coping with negative emotions because they cannot evaluate their emotions correctly (16).

In recent years, machine learning methods, a branch of artificial intelligence, have been used in research in many fields, including Psychology, Psychiatry and Forensic Sciences. Among the studies using machine learning methods, Oh, Yun, Hwang & Chae (2017) predicted suicide attempts in a sample of 573 participants. Their findings showed that the overall accuracy of the model was 93.7% for suicide attempts within one month, 90.8% within one year and 87.4% over the lifetime (17). In the study of Chekroud et al. (2016), the prediction rate was 59.6% in one of the models used for the treatment of depression and 59.7% in the other (18). In the study of Yöntem and Adem (2019), the polynomial distribution findings of the Support Vector Machines (SVM) model indicate that automatic thoughts can predict the level of alexithymia to a great extent. This finding is considered useful for the treatment of alexithymia within the scope of cognitive-behavioral therapies (19).

Machine learning is defined as programming computers to increase their performance using sample data or past experience (20). Methods based on machine learning consider the interaction between data units and are also used in classification, diagnosis and protective measures by making statistical inferences (21,22). Supervised learning, one of the methods based on machine learning, is used to predict a feature. The feature to be predicted can be a category or a numerical value. For this, the relationship between the other features and the target value is investigated using a previously observed and known data set (23).

This study aims to estimate individuals' violence victimization from the concepts of somatization and emotional self-awareness using the classification algorithms of supervised learning, one of the methods of machine learning. In this context, it is believed that the approach could serve as a guide during applications to primary healthcare institutions: it makes it easier to obtain responses about individuals' physical complaints and emotional self-awareness, and it can accelerate the processes of recognizing violence victimization, identifying it and informing the judicial authorities, and starting treatment. This study also aims to provide a resource for future work.

Findings and Methods

The participants of this study were contacted after the approval of the Haliç University Ethics Committee dated 31.01.2020 and numbered 8. The participants were informed about this research before the scales were applied and it was stated that the participation was on a voluntary basis.

According to 2018 Turkish Statistical Institute (TÜİK) data (24), the study population consisted of 12,823,598 young adults between the ages of 18 and 30. For simple random sampling from this population, a sample of at least 385 participants was determined to be sufficient for analysis at a 95% confidence level and a 5% margin of error. The final research was conducted with 552 people.
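The minimum sample size quoted above can be reproduced with a short calculation. The sketch below assumes Cochran's formula with a finite population correction, z = 1.96 for 95% confidence and maximum variability p = 0.5; the paper does not state which formula was used, so the function name and its defaults are illustrative only.

```python
import math

def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction (assumed formula)."""
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Correction for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up, since a fraction of a participant cannot be sampled.
    return math.ceil(n)

minimum_n = required_sample_size(12_823_598)
print(minimum_n)
```

With a population this large the correction term is negligible, so the result is driven almost entirely by the confidence level and the margin of error.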

Personal Information Form, Somatization Scale and An Emotional Self-Awareness Scale-10 (A-DÖFÖ-10) were used as data collection tools in this study.

Personal Information Form: Personal Information Form was prepared by the researchers to determine the demographic characteristics of the participants. This form includes questions about some variables, such as gender, age, marital status and education level.

Somatization Scale: The Somatization Scale was derived from the items of the Minnesota Multiphasic Personality Inventory (MMPI) related to somatization disorder. The scale consists of 33 items in total, and its validity and reliability were evaluated by Dülgerler (2000). The internal consistency reliability coefficient of the scale (Kuder-Richardson 20) was found to be 0.83, the test-retest reliability coefficient 0.996, the split-half alpha values 0.8810 for the first half and 0.8439 for the second half, and the correlation of the scale with the SCL-90-R (Pearson product-moment correlation coefficient) 0.80 (25). Each item in the scale has two choices: “right” or “wrong.” Items 1, 4, 5, 6, 7, 10, 11, 19, 20, 21, 22, 23, 26, 27, 32 and 33 are given 1 point when the answer is “right” and 0 points when the answer is “wrong”; items 2, 3, 8, 9, 12, 13, 14, 15, 16, 17, 18, 24, 25, 28, 29, 30 and 31 are given 1 point when the answer is “wrong” and 0 points when the answer is “right.” The total score is obtained by summing the item scores. The scores obtained from the scale vary between 0 and 33. A higher total score indicates a higher level of somatization symptoms. According to these data, the Somatization Scale was determined to be a valid and reliable scale (25).
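The scoring rule described above can be sketched as a small function. This is an illustration, not the authors' code: the function and variable names are hypothetical, and the second item list is taken to include item 18, which is an assumption (it is the only item number from 1 to 33 that would otherwise have no scoring key).

```python
# Items that score 1 point when answered "right".
RIGHT_KEYED = {1, 4, 5, 6, 7, 10, 11, 19, 20, 21, 22, 23, 26, 27, 32, 33}
# Items that score 1 point when answered "wrong" (item 18 is assumed, see lead-in).
WRONG_KEYED = {2, 3, 8, 9, 12, 13, 14, 15, 16, 17, 18, 24, 25, 28, 29, 30, 31}

def somatization_score(answers):
    """Total score (0-33); answers maps item number to True ("right") or False ("wrong")."""
    if set(answers) != RIGHT_KEYED | WRONG_KEYED:
        raise ValueError("expected answers for all 33 items")
    score = sum(1 for item in RIGHT_KEYED if answers[item])
    score += sum(1 for item in WRONG_KEYED if not answers[item])
    return score

# A respondent who answers "right" to every item scores only on the first list.
all_right = {item: True for item in range(1, 34)}
print(somatization_score(all_right))
```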

An Emotional Self-Awareness Scale-10 (A-DÖFÖ-10): According to the reliability and item analysis results for the scale, which consists of 10 items, the internal consistency reliability coefficient was calculated as 0.85 in both the female and the male group. The scale is of the 5-point Likert type; scores range from 10 to 50, and a high score indicates a high level of skill in reading and noticing emotions (16).

In this study, supervised learning, one of the machine learning methods, was used. Using the input values, violence victimization was estimated with the classification algorithms of the supervised learning method. In addition, the accuracy, precision and recall rates were calculated from the confusion matrix and used to draw conclusions. The programming language used for machine learning was Python, and the code was written in the Spyder environment of Anaconda.

Classification Algorithms used in this study are as follows:

i. K-Nearest Neighbor (KNN)

The K-Nearest Neighbor rule is a non-parametric classifier. In the K-Nearest Neighbor algorithm, the class of a new sample is determined from its distances to the samples in the existing sample set, based on a specified k value. The algorithm is expressed as follows:

First, the distance between the new sample whose class is to be determined and each of the other samples is calculated. Then the calculated distances are sorted. The k smallest distances are selected. Finally, voting is done among the corresponding samples to determine the class of the new sample.

Let $X \subseteq \mathbb{R}^n$ be the sample space. An arbitrary sample $x \in X$ is written as $x = (a_1(x), a_2(x), \dots, a_n(x))$, where $a_r(x)$ denotes the value of the $r$-th attribute of $x$.

The function

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$$

is used as the distance function (Euclidean distance) for the algorithm in this study. In addition, majority voting was used, in which the new sample is assigned the most frequent class among its k nearest neighbors (k = 19 was chosen) (26).
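The four steps of the K-Nearest Neighbor rule (compute distances, sort, keep the k nearest, vote) can be sketched in plain Python. The function names and the toy data are hypothetical and serve only as an illustration; the study itself used k = 19 on 52 features, whereas the example uses k = 3 on two features.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Classify query by majority vote among its k nearest training samples."""
    # Step 1: distance from the query to every training sample.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Steps 2 and 3: sort by distance and keep the k smallest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: the most frequent class among the neighbors wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups labelled 0 and 1.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))
```

An odd k, such as the k = 19 chosen in the study, avoids tied votes when there are two classes.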

ii. Naive Bayes

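The Naive Bayes algorithm is a statistical method based on calculating the probability of the effect of each attribute on the result. This algorithm is expressed as follows:

Let $X$ be the observation matrix of a sample set consisting of $m$ samples and $n$ attributes, let $y = (y_1, \dots, y_m)$ be the class values of the $m$ samples, and let an unclassified sample be given as

$$x = (x_1, x_2, \dots, x_n).$$

The class of the unclassified sample is calculated by the equation

$$\hat{y} = \arg\max_{c} \; P(c) \prod_{j=1}^{n} P(x_j \mid c).$$

The Gaussian method was used in this study:

$$P(x_j \mid c) = \frac{1}{\sqrt{2\pi\sigma_{jc}^2}} \exp\left(-\frac{(x_j - \mu_{jc})^2}{2\sigma_{jc}^2}\right).$$

In this equation, $\mu_{jc}$ is the mean and $\sigma_{jc}$ is the standard deviation of attribute $j$ in class $c$.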
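A Gaussian Naive Bayes classifier of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names and toy data are hypothetical, log-probabilities are summed instead of multiplying probabilities (which gives the same argmax), and a small constant is added to each variance to avoid division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class priors and the per-class mean and variance of each attribute."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, variance + 1e-9))  # smoothing term is an assumption
        model[label] = (prior, stats)
    return model

def log_gaussian(value, mean, variance):
    """Logarithm of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * variance) - (value - mean) ** 2 / (2 * variance)

def predict_gaussian_nb(model, x):
    """Return the class with the largest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, variance) in zip(x, stats):
            score += log_gaussian(value, mean, variance)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data with two attributes and the class labels "N" and "P".
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 5.1), (3.8, 4.9)]
y = ["N", "N", "N", "P", "P", "P"]
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, (1.1, 2.0)))
print(predict_gaussian_nb(nb_model, (4.1, 5.0)))
```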

iii. Logistic Regression

Regression analysis is used to measure relationships between two or more variables, providing descriptive and inferential statistics. The main purpose of logistic regression analysis is to establish an acceptable model that defines the relationship between the dependent and independent variables, achieving the best fit with the fewest variables.

In this study, binary logistic regression, in which the dependent variable has two categories, was used. The Logistic Regression Model is expressed as follows:

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \sum_{k=0}^{K} x_{ik}\beta_k, \qquad i = 1, \dots, N,$$

where $\pi_i$ is the probability of success in the $i$-th population. Here $x_{ik}$ are the elements of the design matrix, $y_i$ are the successes observed in each population, $N$ is the total number of populations, $M$ is the total number of observations, $n_i$ is the number of observations in the $i$-th population and $\beta$ is the parameter vector. The log-likelihood function is expressed as

$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \sum_{k=0}^{K} x_{ik}\beta_k - n_i \log\left(1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}\right) \right].$$

The $\beta_k$ are found by setting the first-order derivative of this function with respect to each $\beta_k$ equal to zero.
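The first-order condition mentioned above can be written out explicitly. The derivation below is a sketch that assumes the standard grouped binomial log-likelihood, with $x_{ik}$ the design-matrix elements, $y_i$ the observed successes, $n_i$ the number of observations in population $i$, and $\pi_i$ the success probability; $\pi_i$ and the index bound $K$ are symbols introduced here for illustration.

```latex
% Assumed log-likelihood (constant terms omitted)
\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \sum_{k=0}^{K} x_{ik}\beta_k
  - n_i \log\left(1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}\right) \right]

% Success probability implied by the logit link
\pi_i = \frac{e^{\sum_{k=0}^{K} x_{ik}\beta_k}}{1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}}

% First-order condition for each parameter \beta_k
\frac{\partial \ell}{\partial \beta_k}
  = \sum_{i=1}^{N} \left( y_i - n_i \pi_i \right) x_{ik} = 0,
  \qquad k = 0, 1, \dots, K
```

Because $\pi_i$ depends nonlinearly on $\beta$, these equations have no closed-form solution and are solved numerically, for example with Newton-Raphson iterations.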

iv. Support Vector Machines (SVM)

Support Vector Machines use the margin as a criterion. Model parameters are written as the weighted sum of the effects of a subset of the training examples, and these effects are defined by an application-specific similarity kernel.

The Support Vector Machine model is expressed as follows:

Let the sample be $X = \{x^t, r^t\}$, where $r^t = +1$ if $x^t \in C_1$ and $r^t = -1$ if $x^t \in C_2$. Then the parameters $w$ and $w_0$ are sought such that

$$r^t\left(w^T x^t + w_0\right) \ge +1$$

holds for every $t$. In the case of linear separation, these two classes of data can be separated by a hyperplane. The aim is to choose the hyperplane that will make the classification error the smallest. For this, valid $w$ and $w_0$ values should be determined:

$$\min \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad r^t\left(w^T x^t + w_0\right) \ge +1, \ \forall t \qquad (8)$$

The solution of problem (8) under its constraint gives the best values of $w$ and $w_0$ (20).

In this study, the linear kernel function was used with this algorithm.
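To make the role of the margin concrete, the sketch below checks whether a candidate hyperplane satisfies the separation constraints and reports its margin width. It assumes the standard hard-margin formulation, in which every sample with label r in {+1, -1} must satisfy r(w·x + w_0) >= 1 and the margin width is 2/||w||. It does not solve the optimization problem; the function names and toy values are hypothetical.

```python
import math

def satisfies_margin_constraints(samples, labels, w, w0):
    """True if r * (w . x + w0) >= 1 holds for every sample."""
    return all(
        r * (sum(wi * xi for wi, xi in zip(w, x)) + w0) >= 1
        for x, r in zip(samples, labels)
    )

def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2 / math.sqrt(sum(wi ** 2 for wi in w))

# Toy data: two points per class, labelled +1 and -1.
samples = [(2, 2), (3, 3), (0, 0), (-1, 0)]
labels = [1, 1, -1, -1]

feasible = satisfies_margin_constraints(samples, labels, w=(0.5, 0.5), w0=-1)
too_flat = satisfies_margin_constraints(samples, labels, w=(0.25, 0.25), w0=-0.5)
print(feasible, too_flat, margin_width((0.5, 0.5)))
```

Among all feasible pairs (w, w_0), the training procedure looks for the one with the smallest ||w||, which is the one with the widest margin.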

Model Success

In this study, the Confusion Matrix was used to measure the success of classification algorithms. The definitions and formulas of the confusion matrix are as follows:

The confusion matrix is, roughly, a matrix with predicted values on one axis and actual values on the other.

The confusion matrix consists of the entries True Positive, True Negative, False Positive and False Negative.

If the model classifies a sample that is actually positive as positive, this is called a true positive; if it classifies it as negative, this is called a false negative.

If the model classifies a sample that is actually negative as negative, this is called a true negative; if it classifies it as positive, this is called a false positive.

Table 1. Confusion matrix
                   Predicted positive     Predicted negative
Actual positive    True Positive (TP)     False Negative (FN)
Actual negative    False Positive (FP)    True Negative (TN)

Accuracy: The ratio of the number of correctly classified samples to the total number of samples, (TP + TN) / (TP + TN + FP + FN).

Precision: The proportion of the observations that the model classifies in the positive group that are actually positive, TP / (TP + FP).

Sensitivity (Recall): The proportion of the observations actually in the positive group that the model classifies correctly, TP / (TP + FN).

f1-score: The f1-score, calculated as the harmonic mean of precision and sensitivity, is also used as a model performance indicator (23).

f1-score = 2 · (Precision · Sensitivity) / (Precision + Sensitivity) (13)


The research group in this study consisted of 552 participants, of whom 149 (27%) were men and 403 (73%) were women. The ages of the participants ranged from 18 to 30 (x̄ = 20.69, SD = 2.68). The average age of male participants was 21.87 ± 2.90, and the average age of female participants was 20.25 ± 2.46.

The findings obtained in this study showed that 309 (77%) of the female participants and 82 (55%) of the male participants were exposed to some type of violence.

I. Data Set

The data set of this study consisted of 552 records. Each record has 52 features consisting of the answers to the questions of the Personal Information Form and the Somatization and Emotional Self-Awareness Scales. After cleaning, the inputs to the machine learning algorithms had size (552, 52), and the outputs, which indicate violence victimization, had size (552, 1) (DataFrame).

II. Evaluation

The data set used in this study consisted of 552 records. 33% of this data set (183 records) was reserved for testing and 67% (369 records) for training. Then, the input data in the training and test sets was standardized. KNN, Naive Bayes, SVM and Logistic Regression models were built with the standardized training data. Confusion matrices for each model were obtained from the test data, with P denoting the number of people who had been subjected to violence and N the number of people who had not.
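The split sizes and the standardization step can be illustrated with a small plain-Python sketch. Two points are assumptions, since the paper does not state them: the test size is rounded up (which reproduces the 183/369 split for a 0.33 test fraction), and the mean and standard deviation are estimated on the training set only and then applied to both sets. The function names, the seed and the toy rows are hypothetical.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train and test parts."""
    n_test = math.ceil(test_fraction * n_samples)  # rounding up is an assumption
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]

def standardize(train_rows, test_rows):
    """Z-score both sets using the training means and standard deviations."""
    columns = list(zip(*train_rows))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stds.append(std if std > 0 else 1.0)  # constant columns are left unscaled

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return scale(train_rows), scale(test_rows)

train_idx, test_idx = train_test_split_indices(552, 0.33)
print(len(train_idx), len(test_idx))

scaled_train, scaled_test = standardize([[1.0, 10.0], [3.0, 30.0]], [[2.0, 20.0]])
print(scaled_train, scaled_test)
```

Fitting the scaling parameters on the training set alone keeps information from the test set out of the model-building step.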

Table 2. Confusion matrices from models
Model             Actual    Predicted P    Predicted N
KNN               P         113            16
                  N         44             10
SVM               P         106            23
                  N         30             24
Naive Bayes       P         83             46
                  N         21             33
Logistic Regr.    P         109            20
                  N         27             27

Accuracy, precision, sensitivity and f1-score values were obtained for each model with data in confusion matrices. Considering the accuracy and f1-score values, the best classification performance was obtained from Logistic Regression with 0.74 accuracy and 0.82 f1-score value.

Table 3. Model success values
Method            Accuracy    Precision    Sensitivity    f1-score
KNN               0.67        0.72         0.88           0.79
SVM               0.71        0.78         0.82           0.80
Naive Bayes       0.63        0.80         0.64           0.71
Logistic Regr.    0.74        0.80         0.84           0.82
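The model success values can be recomputed from the confusion matrices in Table 2. The sketch below reads each matrix as (TP, FN) in the first row and (FP, TN) in the second, with P as the positive class, and takes the f1-score as the harmonic mean of precision and sensitivity; the function and variable names are illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, sensitivity and f1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, f1

# Counts from Table 2, in the order (TP, FN, FP, TN).
confusion = {
    "KNN": (113, 16, 44, 10),
    "SVM": (106, 23, 30, 24),
    "Naive Bayes": (83, 46, 21, 33),
    "Logistic Regr.": (109, 20, 27, 27),
}

results = {
    name: tuple(round(value, 2) for value in classification_metrics(*cells))
    for name, cells in confusion.items()
}
for name, values in results.items():
    print(name, values)
```

Logistic Regression comes out best on both accuracy and f1-score, in line with the text. For Naive Bayes, the counts give an f1-score of about 0.71.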

Discussion and Conclusion

This research was conducted to estimate violence victimization from the concepts of somatization and emotional self-awareness using the classification algorithms of supervised learning, one of the methods of machine learning. K-Nearest Neighbor, Support Vector Machines, Naive Bayes and Logistic Regression, which are frequently used in machine learning, were applied. The performance of the classifiers was compared according to model performance criteria. On these criteria, the highest values were obtained from Logistic Regression.

A meta-analysis (Singh, Grann & Fazel, 2011) measured an average accuracy rate between 70% and 74% (27). Blair, Blattman and Hartman (2015) concluded that their models using 2008 risk factors predicted 88% of the violence in 2012 (28). Menger, Scheepers and Spruit (2018) described the 78% accuracy rate they reached in assessing the risk of violence as promising (29). In the present study, violence victimization was estimated with an accuracy rate of 74%.

In line with the findings of this research and in accordance with the literature, it is possible to say that violence victimization can be estimated at a certain accuracy rate with the classification algorithms of supervised learning, one of the methods of machine learning, through somatization, which is a psychological result of violence, and emotional self-awareness, that is, the skills to recognize and make sense of emotions. To our knowledge, there is no study in the literature on the prediction of violence through the concepts of emotional self-awareness and somatization; nevertheless, the accuracy rate is comparable to that of studies with similar purposes. In this context, when people apply to health institutions with physical complaints, and given that answers about their emotional self-awareness can be obtained, the method could be used to help identify whether they are victims of violence. Once victimization is noticed, it may be possible to hold more detailed interviews about the violence experience, to make a diagnosis and to start the treatment process as soon as possible. If there is a need to report to judicial authorities, the legal process can be accelerated. Furthermore, it may also be possible to identify those who are at risk of victimization and to prevent it. In this respect, this research emphasizes the importance of prevention strategies.

Considering the frequency of violent events both in the world and in our country, and the barriers that prevent individuals from verbalizing and/or reporting that they are victims of violence, it is important to recognize victimization and make the necessary interventions. Violence can only be prevented through active awareness, determination of needs and application of the necessary procedures.

In this study, violence is evaluated only on the dimensions of emotional self-awareness and somatization. Future research on other observable results of violence would be beneficial. Furthermore, increasing the amount of data and the sample size may help increase the accuracy rate.

The present study aims to identify victims of violence, to contribute to both future studies and prevention activities, and to serve as a guide. However, 73% of the participants were female and 27% were male, which makes comparison between genders difficult. In future research on this subject, it would be appropriate to have similar numbers of male and female participants and to include different age groups. Additionally, while this research focuses only on whether a person has been exposed to any type of violence, future work on different types of violence would also be useful.

Finally, we should note that machine learning, together with technological developments, will contribute to many areas, including Psychology, Psychiatry and Forensic Sciences.


1. Türk Dil Kurumu. Şiddet. Accessed: 13.12.2019.
2. Accessed: 13.12.2019.
3. Güleç H, Topaloğlu M, Ünsal D, Altıntaş M.(2012) Bir kısır döngü olarak şiddet. Psikiyatride Güncel Yaklaşımlar, 4(1):112-137.
4. World Health Organization (2002). World report on violence and health. Geneva: WHO. World Health Organization
5. Mil, H.İ. ve Şanlı, S. (2015). Sporda Şiddet ve Medya Etkisi: Bir Maçın Analizi. Elektronik Sosyal Bilimler Dergisi, Güz-2015 Cilt:14 Sayı:55 ss:231-247.
6. Karslı, N.(2016). Psiko-sosyal Açıdan Şiddet ve Çözüm Yolları. Dinbilimleri Akademik Araştırma Dergisi. 16(3):63-89
7. Özgentürk, İ. , Karğın , V. ve Baltacı , H (2012). Aile İçi Şiddet ve Şiddetin Nesilden Nesile İletilmesi. Polis Bilimleri Dergisi Cilt:14(4):55-77.
8. Kayı, Z., Yavuz, M. F., & Arıcan, N. (2000). Kadın Üniversite Gençliği ve Mezunlarına Yönelik Cinsel Saldırı Mağdur Araştırması. Adli Tıp Bülteni, 5(3), 157-163.
9. Krantz, G.& Garcia-Moreno, C. (2005). Violence against women. J Epidemiol Community Health. 59 (10): 818-821. 10.1136/jech.2004.022756.
10. Leithner, K., Assem-Hilger, E., Naderer, A., Umek, W., Springer-Kremser, M. (2009). Physical, sexual, and psychological violence in a gynaecological-psychosomatic outpatient sample: prevalence and implications for mental health. Eur J Obstet Gynec Reprod Biol; 144: 168–72.
11. Akdemir, P ., Görgülü, A., Çınar, Y . (2008). Yaşlı İstismarı ve İhmali. Hacettepe Üniversitesi Hemşirelik Fakültesi Dergisi , 15 (1) , 68-75 . Retrieved from
12. Davis, M. (2018) The Intersection of Intimate Partner Violence Perpetration, Intervention and Faith. Arts & Sciences Electronic Theses and Dissertations. 1524.
13. Okan İbiloğlu, A. (2012) Aile İçi Şiddet. Psikiyatride Güncel Yaklaşımlar-Current Approaches in Psychiatry;4(2):204-222.
14. Kesebir S (2004) Depresyon ve Somatizasyon. Klinik Psikiyatri, Ek 1:14-9.
15. Stuart, S. & Noyes, R. Jr.(1999) Attachment and interpersonal communication in somatization. Psychosomatics;40:34-43.
16. Tatar, A., Özdemir, H., Çelikbaş, B., & Özmen H. E. (2018). A Duygusal Öz Farkındalık Ölçeği’nin Geliştirilmesi ve Klinik Olmayan Örneklemde Duygusal Öz Farkındalığın Kaygı ve Depresyondaki Rolünün İncelenmesi. Social, Mentality and Researcher Thinkers Journal, 4(13), 793-806.
17. Oh, J., Yun, K., Hwang, J-H. and Chae, J-H. (2017) Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales. Front. Psychiatry 8:192.
18. Chekroud, A.M., Zotti, R.J., Shehzad, Z., Gueorguieva, R., Johnson, M.K., Trivedi,M.H., Cannon, T.D., Krystal, J.H. & Corlett, P.R. (2016) Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 3, 243–250.
19. Yöntem, M. ve Adem, K. (2019). Otomatik Düşüncelere Makine Öğrenme Yöntemlerinin Uygulanması ile Aleksitimi Düzeyinin Tahmini. Psikiyatride Güncel Yaklaşımlar , 11 () , 64-78 .
20. Alpaydın, E. (2018). Yapay Öğrenme (4.Baskı). Boğaziçi Üniversitesi Yayınevi.
21. Uyulan, Ç., Tekin Ergüzel, T. ve Tarhan, N. (2019) Elektroensefalografi Tabanlı Sinyallerin Analizinde Derin Öğrenme Algoritmalarının Kullanılması. The Journal of Neurobehavioral Sciences: 6(2): 108-124.
22. Yılmaz Akşehirli, Ö., Ankaralı H, Aydın D, Saraçlı Ö. (2013) Tıbbi Tahminde Alternatif Bir Yaklaşım: Destek Vektör Makineleri. Türkiye Klinikleri Biyoistatistik Dergisi;5(1):19-28.
23. Arslan, İ. (2019). Python ile Veri Bilimi (1. Baskı). Pusula 20 Teknoloji ve Yayıncılık.
24. TÜİK (2018) Türkiye İstatistik Kurumu İstatistikleri. Accessed: 17.02.2020.
25. Dülgerler, Ş. (2000). İlköğretim okulu öğretmenlerinde somatizasyon ölçeğinin geçerlik ve güvenirliği. Ege Üniversitesi Sağlık Bilimleri Enstitüsü, Yüksek Lisans Tezi, İzmir.
26. Balaban, M. E., Kartal E., (2015). Veri Madenciliği ve Makine Öğrenmesi (1.Baskı). İstanbul: Çağlayan Kitabevi