Tuesday 2 December 2014

EMC Data Warehouse Fast Track MS SQL 2014


So I started to read the white paper produced by EMC about their "new" VNX5600 storage series and how good it supposedly is for data warehousing with SQL Server 2014. Almost immediately I began to doubt the BS they are trying to sell.

This paper was written in (September) Q4 2014 and talks about next-generation storage, yet the whole thing is based on storage technologies that stopped being next generation in 2013.
  • EMC storage uses 8Gb Fibre Channel - 16Gb FC has already been available for over a year.
  • They use 10K RPM SAS disks - right now the best-performing disks are SSD based, and that is what next-generation enterprise storage actually looks like.
The paper references something that I am unable to locate: Data Warehouse Fast Track for Microsoft SQL Server 2014 (DWFT 2014). As of December 2014, there is only DWFT 2012 on the Microsoft website.


Storage Reference

EMC Storage Cache - 2TB

Disk Configuration:
4 x 300GB RAID ? (Storage OS)
4 x 300GB RAID 10 (Logs) - 600GB usable storage
2 x 300GB RAID ? (Staging area) - 300GB usable storage [600GB max]
60 x 300GB RAID 5 (12 groups) (Data & TempDB) - 14.4TB usable storage

In the real world, a storage array that hosts 14TB of databases will need a lot more than 600GB for logs.


Database configuration

The database is configured to be only 1TB.
There is an inconsistency in the data file configuration: part of Table 5 shows 12 files on a partition, while another part of the same table shows that partition configured with only 8 files. The notes also state 1 file per LUN, then 2, and so on up to 7 files per LUN, which doesn't make sense.


SQL Server Memory Configuration

They quote SQL Server 2014 best practices, but never reference them with a link. They say max server memory should be set to no more than 92% of physical RAM, which sounds right, yet they have allocated only 117GB out of 256GB, roughly 45%. That leaves over 50% of the memory to the OS; in my view this is a waste, as under normal circumstances the OS wouldn't need anywhere near that much.
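For comparison, a minimal sketch of what actually following the 92% guideline would look like on this 256GB box (the 241,172 MB figure is simply 92% of 262,144 MB; the paper itself sets a much lower value):

```sql
-- Hypothetical sketch: cap max server memory at ~92% of 256GB RAM,
-- i.e. 0.92 * 262,144 MB ≈ 241,172 MB, rather than the paper's 117GB.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 241172;
RECONFIGURE;
```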



SQL Server MAXDOP Configuration

Max degree of parallelism (a.k.a. MAXDOP) in SQL Server controls the maximum number of threads a single execution plan operator can use. MAXDOP does not restrict the total number of CPUs/threads used by a single batch or query.
In this scenario there appear to be two CPUs with 10 cores each, so following the Microsoft recommendation the setting should equal the number of cores per CPU, i.e. 10, not 20 as shown, which is the total number of cores.
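A minimal sketch of that recommendation, assuming the 2 x 10-core layout described above:

```sql
-- Hypothetical sketch: set MAXDOP to the number of cores per CPU (10 here),
-- rather than the total core count (20) used in the paper.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 10;
RECONFIGURE;
```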

There is also no recommendation for cost threshold for parallelism, which is the minimum estimated cost a query must have before SQL Server will consider dividing the work across multiple cores. The default of 5 is usually too low (a much higher value is often touted for DW solutions), and only after the workload is understood can this and the MAXDOP settings be set properly.
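To make the point concrete, here is a sketch of raising that threshold; the value 50 is purely illustrative, as the right number depends on the actual workload:

```sql
-- Hypothetical sketch: raise cost threshold for parallelism above the
-- default of 5. 50 is an illustrative starting point for a DW workload,
-- not a recommendation; tune it once the workload is understood.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```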

Their results

Their measured throughput is 2,697 MB/s on a single server connected to dedicated storage, with 2TB of cache and only 1TB of actual data (the whole database fits into cache).


Conclusion

The overall throughput (2,697 MB/s) looks OK, until you realise that the numbers would likely look very different, and much lower, if the database were actually 25TB (the capacity they quote with the compression features).

There is no concurrency: one server, one storage array, one database, 1TB of data, at a cost of roughly £200,000. How much more money will you spend trying to figure out why your storage is nowhere near the quoted speed figures?

Setting the server to waste 50% of memory? Thanks, but no thanks!

Part of a data warehouse is the ETL process, which could run every night and re-populate data, so writes need to be considered as well, not just reads.


Verdict

 White Paper?...NO! More of a Wipe Paper!

:)