
We have software that's about 4 years old and it can only store files to 1 partition. The software runs on Windows Server 2012 R2, call it SERVER1, and the files are stored on a different server, SERVER2, packaged in a 60TB VHDX file. I believe it was originally a 40TB VHDX file (in 2018), but we've had to expand it to fit more files over the years. All drives are NTFS, and the VHDX is mapped as the D: drive on SERVER1 through iSCSI to the target on SERVER2.

We've noticed lately that some of the files saved to the D: drive have started getting corrupted. It doesn't seem to matter when they were uploaded - we've noticed files go bad from 2018 and files from just a few weeks ago. Primarily large MP4 files, but also PDFs and perhaps other file types. You can copy them off the server, but they fail to open. I've tried PDF & MP4 repair programs, but the files are essentially unreadable. Sometimes the backed-up files are readable, but sometimes they're also corrupted.

What could be causing the file corruptions? Thousands of files are created or accessed on the D: drive nearly every day. I have run chkdsk on each partition on SERVER1 and only had 1 small error (on the C: drive, for a directory we don't even use). The D: drive is 46% fragmented, but after running defrag on it over the weekend, it didn't even get past 3%, so I've cancelled the job for now.

Any suggestions to determine how these files are getting corrupted? We reboot the server monthly and have realtime Symantec Endpoint Protection scanning all drives.

Thanks!



7 Replies

· · ·
MCEStaff
Datil
OP
MCEStaff This person is a Verified Professional

McMurray Computer Experts is an IT service provider.

60TB! Are you sure? What is the underlying storage?

That is, on SERVER2 how many drives, how are they arranged, what controls them?

Instead of performing a CHKDSK against drive D: from SERVER1, you probably should disconnect SERVER1 and run integrity checks directly on SERVER2.

0
· · ·
jjdiubaldi
Sonora
OP
jjdiubaldi

Yep, 60TB. SERVER2 has 12x10TB drives in a RAID5. PERC card. There are several servers connected to SERVER2, each with their own VHDX drive.

What kind of integrity checks should I run, specifically?

0
· · ·
Fessor
Datil
OP
Fessor

Those files should probably be stored on a SAN storage device with some form of backup. I can't imagine that you have viable backups if you can't even defrag the file system. Having everything in a single VHDX drive is a little bit insane. Just an opinion. And if you say that you can't afford to provide something better, that is the insane part. You probably have bigger problems than the ones you are currently experiencing. Your data must not be that important.

0
· · ·
RogueRabbit
Jalapeno
OP
RogueRabbit This person is a Verified Professional

The hard limit for VHDX files is 64TB and the current one you are using is sitting at 60TB. I'm already seeing a looming crisis for you on the horizon here...

Defragmenting virtual drives is a bad idea for (at least) two reasons. First, it messes with the file allocation tables on both the physical disks' logical volumes and the virtual disks. Second, the RAID controller is going to go absolutely crazy with the amount of data that needs to be rewritten. (Keep in mind that it also has to generate new CRC checksums for each file that is moved twice, and the file allocation tables for two file systems need to be updated at the same time to ensure consistency throughout.)

Having multiple virtual disks open and connected via iSCSI also appears to be aggravating the situation, even more so if the virtual disks are dynamically expanding. Your data corruption is most likely caused by out-of-sync file allocation tables, which results in the physical data of one file being overwritten by another file's data. One way to confirm this is to take a corrupt file and a working backup copy of it and compare the raw contents of the two files.
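The raw comparison suggested above could be done with a short script. This is only a minimal sketch: the function name `diff_ranges` and the 4096-byte block size are assumptions made for illustration (4096 is a common NTFS cluster size, but the actual cluster size on this volume is not stated in the thread). It reports the byte ranges where a corrupt file and its known-good backup differ, which may help show whether the damage falls on cluster-aligned blocks.

```python
from typing import List, Tuple


def diff_ranges(path_a: str, path_b: str, block_size: int = 4096) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges where the two files differ.

    Comparison is done block by block (block_size is an assumed value);
    adjacent differing blocks are merged into a single range.
    """
    ranges: List[Tuple[int, int]] = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            step = max(len(a), len(b))
            if a != b:
                end = offset + step
                if ranges and ranges[-1][1] == offset:
                    # Extend the previous range instead of starting a new one
                    ranges[-1] = (ranges[-1][0], end)
                else:
                    ranges.append((offset, end))
            offset += step
    return ranges
```

Calling `diff_ranges(r"D:\path\to\corrupt.mp4", r"E:\restore\good.mp4")` (both paths are placeholders) returns a list of offsets. If the differing ranges start and end on multiples of the cluster size and contain recognizable data from some other file, that would be consistent with the overwrite theory; scattered single-byte differences would point elsewhere.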

To be honest, the only solution for this scenario and the huge amount of data involved is a SAN with fabrics and HBAs. A single partition will still be presented to SERVER1, but no virtual disks will be involved. This will also give you further expansion capabilities beyond the 64TB limit and simplify the setup, which will result in little or no data corruption.

1
· · ·
Supaplex
Ghost Chili
OP
Supaplex This person is a Verified Professional
Data Storage expert
27 Best Answers
485 Helpful Votes

I have a couple of ideas and suggestions regarding your setup:

First of all, make sure the iSCSI targets are not connected to more than one machine at the same time. If they are, the corruption you describe is to be expected.

Microsoft iSCSI Target Server is old, legacy, and slow technology that should be avoided in production. If you have to use iSCSI on top of Windows Server, replace the built-in feature with a free 3rd party product: https://www.starwindsoftware.com/starwind-virtual-san-free. An even better option would be switching to network shares and the SMB protocol for that purpose.

Consider using ReFS instead of NTFS. It has some issues but would allow you to enable integrity checks to detect and potentially prevent data corruption.
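Whatever file system is used, silent corruption can also be detected at the file level by recording checksums and re-checking them later. The sketch below is not ReFS integrity streams and is not something proposed in this thread; it is a separate, application-level illustration. The names `sha256_of`, `build_manifest`, and `changed_files` are hypothetical, and it assumes the stored files are written once and never legitimately modified (otherwise ordinary edits would be flagged too).

```python
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large MP4s are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Files present in both manifests whose digest changed between runs."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

Building a manifest on a schedule and comparing consecutive runs would show roughly when a given file went bad, which can then be lined up against events such as reboots, VHDX expansions, or backup jobs. Hashing tens of terabytes takes a long time, so in practice it may make sense to start with a sample of directories.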

Running RAID5 on top of 10TB disks is a horrible idea. As soon as your disks start to fail and the first one dies, initiating a rebuild, the chances of a second disk failure during the rebuild, and with it complete data loss, become extremely high.
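A rough back-of-the-envelope calculation illustrates why large RAID5 rebuilds are considered risky. It models a related failure mode, an unrecoverable read error (URE) on a surviving disk during the rebuild, rather than a whole second disk dying. Everything numeric here apart from the 12x10TB layout is an assumption: the URE rates of 1 per 10^14 and 1 per 10^15 bits are typical datasheet figures, not values for the drives in this thread, and the model assumes independent errors, which is generally regarded as pessimistic. Rebuilding one failed disk means reading the other 11 in full, about 110 TB or 8.8 x 10^14 bits.

```python
import math


def p_ure_during_rebuild(surviving_drives: int, drive_tb: float, ure_per_bit: float) -> float:
    """Probability of at least one URE while reading every surviving drive in full.

    Simplified model (assumption): each bit read fails independently
    with probability ure_per_bit.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid floating-point underflow
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# 12-disk RAID5 with one disk failed: 11 surviving 10 TB disks must be read
for rate in (1e-14, 1e-15):  # assumed datasheet URE rates, per bit
    print(f"URE rate {rate:g}/bit -> P(URE during rebuild) = "
          f"{p_ure_during_rebuild(11, 10, rate):.3f}")
```

Real-world arrays often do better than this simple model predicts, and some controllers can continue past a single bad sector, so treat the result as an indication of scale rather than a forecast.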

0
· · ·
jjdiubaldi
Sonora
OP
jjdiubaldi

Thank you for the feedback. I'm trying to weigh my options.

0
· · ·
Adom (Aryson Technologies)
Cayenne
OP
Adom (Aryson Technologies) This person is a Verified Professional

Brand Representative for Aryson Technologies

You can use Virtual Machine Data Recovery Software to repair corrupted NTFS VHDX Storage Drives. The Aryson Virtual Machine Data Recovery is a perfect solution that repairs VHD, VDI, VMDK, and VHDX files successfully.

https://www.arysontechnologies.com/virtual-machine-data-recovery.html

0