10 Replies

  • Look at the network traffic with Wireshark, filtering out everything except what is going to and from the tape drive.

    I'm not suggesting Wireshark to look into the data itself, but to let it color-code the network layers so you can see the problems it flags in black and red.
    The point is to see what the traffic tells you about errors, IP fragmentation, maybe competing MAC addresses in ARP, etc.
    Also, how are your port statistics for errors, retransmits, and dropped packets on the switch or in netstat -s output?
    These are the kinds of things I think could be in play. They won't generate an error or complaint in a log, because the network is still delivering your data: it's TCP, so delivery will happen, but the network may be working a lot harder to do it.
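    To put a number on the retransmit question, here is a minimal Python sketch. It assumes you read the "segments sent" and "segments retransmitted" TCP counters off netstat -s by hand (the exact label wording differs between operating systems, so nothing is parsed here). The function name and the sample counters are made up for illustration.

```python
def retransmit_percent(segments_sent: int, segments_retransmitted: int) -> float:
    """Return retransmitted TCP segments as a percentage of segments sent."""
    if segments_sent <= 0:
        raise ValueError("segments_sent must be a positive counter value")
    return 100.0 * segments_retransmitted / segments_sent


# Made-up counters, purely to show the calculation.
sample_sent = 1_000_000
sample_retransmitted = 25_000
print(f"Retransmit rate: {retransmit_percent(sample_sent, sample_retransmitted):.2f}%")
```

    Taking the counters before and after a slow backup and comparing the two percentages is probably more telling than any single reading.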
  • Kindly correct me if I am wrong...

    If you are using BE with a tape drive (or tape library) via an HBA card (eSATA or eSAS), the issue is likely not between the server and the tape drive but between the BE server and the source, or in the storage cache on the BE server.
    - Check the resource used on the BE server during one of the larger jobs (CPU, RAM, Storage) to determine if the bottleneck is on the BE server
    - Check the details of each job to determine if steps in the jobs are the issue (reads, writes, verification etc)

    Then you need to describe the BE jobs: have you created one or a few huge jobs with several sources, or many jobs where each job has only 1 or 2 sources? For example:
    - I have 3 Domain Controllers, so 1 job that has 2 DCs and 1 job for the 3rd DC
    - For ERP servers that have an application server & SQL server, I would configure 2 jobs instead of lumping them into 1 job
    - File servers would also have 1 job per server

    If you suspect that it is your network, then with only 3 switches (I also assume that you are not using PoE, else you need to check the total power usage of the PoE ports):
    a. how many servers do you have (how many ports are used; from the above, I am assuming all are physical servers and not VMs)
    b. how many users are there (how many ports are used for users)
    c. how are the switches arranged or used?
    - do you use a core switch (for servers, edge network appliances like routers & firewalls, & to cascade to distribution switches)*
    - do you use LACP cascading cables (e.g. 1 or 2 SFP cables from the core switch to each distribution switch)*
    - do you use multiple NICs per server and how are they teamed (especially for the BE server)
    *Please do not cascade from switch 1 to switch 2 to switch 3

    I cannot confirm whether the N1500 series are 24/48 ports plus 4 SFP, or whether the last 4 ports are combo ports. If they are combo ports & you use SFP, you cannot use ports 21-24 or 45-48 of the respective switches, as they would conflict with the combo SFP ports.

  • Such a significant change in speed points at a specific cause. The most obvious is a LAN connection issue where a link has dropped to a hard limit of 100Mbps.

    Have you done a simple backup throughput test? Create a one-off job that backs up a largish amount of data, so you expect it to take approx 1 hour, and run it. Does it back up at 645MB/min or nearer the original 2000+MB/min? Run this test backing up data from the local backup server (no LAN involved) and from a remote server. If you get high throughput locally and slow throughput over the LAN, then it is a network issue.
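    If you want a quick check outside BE, the same local-versus-LAN comparison can be roughed out with a timed file copy. This is only a sketch in Python: the function name is made up, it takes 1 MB as 10**6 bytes (BE may count differently), and it measures a plain file copy rather than a backup job, so treat the numbers as a comparison between paths and not as a BE figure.

```python
import shutil
import time
from pathlib import Path


def copy_rate_mb_per_min(source: Path, destination: Path) -> float:
    """Copy one file and return the observed rate in MB/min (1 MB = 10**6 bytes)."""
    size_mb = source.stat().st_size / 1_000_000
    start = time.perf_counter()
    shutil.copyfile(source, destination)
    # Guard against a zero reading on a very small, very fast copy.
    elapsed_min = max(time.perf_counter() - start, 1e-9) / 60
    return size_mb / elapsed_min
```

    Run it once with both paths on the backup server and once with the source on a remote share, using the same large file each time. A small file mostly measures the OS cache, so use something in the multi-GB range.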

    2200MB/min is approx 300Mbps, which is a bit average if you have gigabit LAN and fast disk, but might be the limit of your tape system (you don't give specs). 645MB/min is approx 100Mbps, so an obvious thing to check is whether a 100Mbps network link is being used. The LAN server-to-server copy seems to indicate this is the limit.
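    The conversion behind those figures is just bytes to bits and minutes to seconds. A small Python sketch (the function name is made up, and 1 MB is taken as 10**6 bytes):

```python
def mb_per_min_to_mbps(mb_per_min: float) -> float:
    """Convert a rate in megabytes per minute to megabits per second."""
    bits_per_byte = 8
    seconds_per_minute = 60
    return mb_per_min * bits_per_byte / seconds_per_minute


for rate in (2200, 645):
    print(f"{rate} MB/min is about {mb_per_min_to_mbps(rate):.0f} Mbps")
# 2200 MB/min is about 293 Mbps
# 645 MB/min is about 86 Mbps
```

    86Mbps of payload is about what you would expect from a 100Mbps link once protocol overhead is taken off, which is why the slow figure looks like a link-speed cap.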

    Check each LAN connection, starting from the backup server to the switch, and make sure it is connected at 1Gbps full duplex. Check all servers. Are they all connected to the same switch?

  • Jim Peters Thanks for the response.  After we did the copy test, the first thing I thought of was testing the switches and speeds.  However, I am not versed in Wireshark at all.  I downloaded it because of other posts and tried to learn to use it, but I was not successful.  Can you give me a good link or video?  I did my own searches, but the results are not really for beginners.  Some start with a file already captured, etc.  I am starting from scratch.

  • Before analysing packets, you should determine whether there is a network limit, and whether there is a basic fix such as a link running at 100Mbps instead of 1000Mbps.

    You could spend days looking at captures, and they won't show you what is wrong if it is something like a 100Mbps link. If it were massive packet loss or retransmits, maybe, but your speed is so close to 100Mbps that it is very likely to be the physical LAN links to the NICs or similar.

  • Trying to answer all questions:

    @ adrian  - I did some testing with Veritas and BE, saving different data from the local server and other servers.  We found the local server had good speed, but when we tested different servers the speed went down to 645 MB/min.  It doesn't seem to matter which server we try to back up.

    I have 10 physical servers and about 45 users. I set up the switches as DELL advised me to do.  I have the SFP cables set up going from the first port to the second port on each, connecting all.  One NIC in use per server.

    @ m@ttshaw  - we did those tests with Veritas.  This is how we got to the network issue.  I do not feel it is BE.  I don't know the limit of my tape drive unit, but I used to get about 2,300 MB/min for the tape backups.

    The servers are all connected to the same switch due to location - but I also did some copy/speed tests to PCs and got the same results.

  • I would start with the links to the servers or between switches, beginning with the link that everything has in common, like m@ttshaw suggested.  You either have a link running at 100Mbps or something running at half-duplex.  If it's the latter, it's probably a bad cable or bad port; it could also be a misconfiguration if each side isn't set to auto-negotiate.

    While Wireshark is a very useful tool, if you haven't used it before I would move it down the list of options for troubleshooting in this case.  Understanding what each packet is doing is not something that will be easy to just Google and get an answer that makes sense to you.  I would verify the ports have proper speeds first, then check the switch logs for any errors or messages that can provide insight.

    If the connection between servers goes Server <-> Switch <-> Switch <-> Server, can you plug a server or laptop into one of the switches and test Server <-> Switch <-> Server/Laptop?  This will potentially tell you whether the link between the switches is the issue.

  • Is your switch smart or managed?  If so, log into it and see what speed the ports are actually connecting at.  If it's a dumb switch, the color of the link LEDs may show the link speed.  If there are any VMs in play here, ensure the network properties within the VMs are showing the correct speed.

  • Maria9537 What is the commonality of the issue here?  When you are testing speeds to PCs, what device are you testing from?  Are all tests run from one server/backup appliance, or are you testing from different servers to the PCs?  Is the switch the servers are connected to the one constant in all your scenarios?

    Check the switch for resource utilization, memory and CPU.

  • ThaneKrios I think you had the right idea.  In the end we found that the port speed for some but not all servers on the switch had changed to 100 instead of 1,000.  How this happened I don't know, but I had to reset each of them to get the Gigabit speed back, and they went down for a minute.  Very odd.  Not sure if the switch is faulty or what.  Time will tell.  Also, only one differential backup has run so far and it was faster, so a full backup will be the test to see if the backup issue is fixed.  I wonder if we will find other things have sped up too.  I will try to do at least one more follow-up just in case this is not the fix.  Thanks for the help.
