FILE TYPE IDENTIFICATION IN DIGITAL ARTIFACTS USING THE K-NEAREST NEIGHBOR ALGORITHM

Ihsan Fawzan
Ahmad Luthfi


DOI: https://doi.org/10.29100/jipi.v10i2.6263

Abstract


This study addresses the problems that damaged digital files cause in legal proceedings. Viruses, system malfunctions, and malware are among the causes of file corruption, which blocks access to data that matters in the legal process. A content-based technique is suited to this problem because it analyzes the content of a file and identifies patterns algorithmically rather than relying on metadata. This study applies the K-Nearest Neighbor machine learning algorithm to detect the file type of corrupted files. File type identification has been studied before, but earlier work relied on GovDocs, an older dataset released in 2009, so research on a newer dataset is needed. This study replaces GovDocs with NapierOne, which improves access to data relevant for analysis. Machine learning is used to classify the data and succeeded in improving document readability even when header or footer information is missing. Identifying ambiguous file types in digital artifacts with K-Nearest Neighbor yielded strong results, with accuracy reaching 86%. Overall, the study contributes to the accessibility and reliability of digital evidence in legal contexts, particularly for files that have been damaged.
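
The content-based idea can be illustrated with a minimal sketch. It assumes byte-frequency histograms as features and Euclidean distance as the metric; the abstract specifies neither, and the toy fragments, the labels `text` and `high-entropy`, and the names `byte_histogram`, `knn_predict`, and `TRAIN` are hypothetical stand-ins, not the authors' pipeline or the NapierOne data.

```python
from collections import Counter
from typing import List, Tuple
import math


def byte_histogram(data: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values.

    Only the raw content is used, so no header or footer is required.
    """
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def knn_predict(train: List[Tuple[List[float], str]], sample: bytes, k: int = 3) -> str:
    """Label a sample by majority vote among its k nearest training items."""
    query = byte_histogram(sample)
    # Sort training items by Euclidean distance to the query histogram.
    nearest = sorted(
        (math.dist(query, features), label) for features, label in train
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy fragments: plain English text versus uniformly
# distributed bytes (a rough stand-in for compressed or encrypted data).
_TEXT = [
    b"Digital evidence must remain readable for the court to examine it. " * 10,
    b"A damaged header does not change the bytes stored in the body. " * 10,
    b"Investigators often recover fragments without any file extension. " * 10,
]
_HIGH_ENTROPY = [
    bytes(range(256)) * 4,
    bytes((i * 7) % 256 for i in range(1024)),
    bytes((i * 13 + 5) % 256 for i in range(768)),
]

TRAIN = [(byte_histogram(b), "text") for b in _TEXT] + [
    (byte_histogram(b), "high-entropy") for b in _HIGH_ENTROPY
]

if __name__ == "__main__":
    unknown = b"File type identification relies on the content of the file itself. " * 10
    print(knn_predict(TRAIN, unknown))
```

In practice the training set would hold many labeled files per type, and the value of `k` and the distance metric would be tuned on held-out data.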

Keywords


Digital Artifacts; Corrupt Files; Digital Forensics; File Type Identification; K-Nearest Neighbor
