Software:DiffEngineX

From HandWiki
DiffEngineX
DiffEngineX Icon.png
Developer(s)DiffEngineX LLC
Initial releaseFebruary 1, 1996
Written inC-sharp
Operating systemWindows
TypeData comparison
LicenseClosed Source
Websiteflorencesoft.com
Screenshot showing Excel workbook row alignment.

DiffEngineX compares two Excel spreadsheets and reports their differences.[1][2][3][4] It compares cells (formulae, constants and calculated values), comments, defined ranges (names) and Visual Basic for Applications (VBA) macros. Before a cell-by-cell comparison, similar rows and columns between two spreadsheets have to be aligned by the insertion of blank rows/columns. Otherwise, identical cells between two workbooks will be missed. It runs on Microsoft Windows.

Excel Spreadsheets

A spreadsheet is essentially a grid made up of rows and columns. New rows and columns can be inserted into a workbook with existing content. Row insertion pushes all the content below it down. Column insertion pushes content after it to the right.

Excel workbooks support both relative and absolute cell references, which refer to other cells. References allow formula to use the content of other cells in their calculations. An absolute reference refers to a specific cell with a fixed row and column reference. This reference never varies when copied into other cells. A relative reference refers to a cell a certain displacement from the current cell. The cell referred to can change when the formula is copied or displaced into other cells.

Compare Excel Spreadsheets / Worksheets: The Problems

One of the problems when comparing Excel workbooks is taking into account new row and column insertion. The same content in two workbooks could have different row and column coordinates. Additionally, relative references may have new cell coordinates if they use A1 (e.g. =A1) rather than R1C1 (e.g. =R[-1]C[-1]) notation. The cell reference A1 means one cell up and one cell to the left when found in cell B2. If this cell reference is displaced into cell C3 it becomes B2. In R1C1 notation, the cell reference R[-1]C[-1] remains the same wherever it is moved to.

Before spreadsheet comparison DiffEngineX aligns rows and columns to ensure no spurious differences are reported. Blank rows and columns are inserted to align similar content. The goal is to ensure similar content has the same row and column numbers in both the workbooks being compared, otherwise not only could you end up comparing the wrong cells with each other, but even if you match up the right cells, they will be difficult to compare properly due to A1 reference notation.

Compare Excel Workbooks - Not Just The Visible Cells

Tools that compare Excel spreadsheets not only have to contend with comparing the worksheet cells, but also cell comments, defined ranges (names) and embedded Visual Basic for Application (VBA) macros. Each cell can either contain a formula or a constant, such as a date, number or string. If the cell contains a formula, DiffEngineX allows the user a choice between comparing the actual text of the formula or its calculated value.

Alignment Of Matched Rows And Separation Of Unmatched Rows

Workbooks can contain rows of database-like data rather than formulae. If the rows are pre-sorted, DiffEngineX can correctly compare them when row alignment is selected. DiffEngineX requires the user to specify columns that act as unique row identifiers. Any pair of rows with the same identifiers will be aligned between the two workbooks being compared. For these cases, DiffEngineX must insert blank rows not just to align similar rows, but to ensure mismatched rows (with different identifiers) end up paired with blanks. This makes it easier to spot the addition of new rows and the deletion of old rows, as their matching row will always be empty.

Excel's Maximum Row Limit and Running Time

The launch of Excel 2007 increased the maximum row limit to 1,048,576. DiffEngineX's row alignment algorithm runs in m log n time, where m and n refer to the number of rows in the two spreadsheets being compared. Typically longest common subsequence problem algorithms run in quadratic time and as such would be ill-suited to comparing spreadsheets with hundreds of thousands of rows.

References

  1. CNET Editors' review [1], 27 February 2012. Retrieved on 11 December 2012.
  2. Chambers, Chris, Martin Erwig, and Markus Luckey. "SheetDiff: A tool for identifying changes in spreadsheets." IEEE Int. Symp. on Visual Languages and Human-Centric Computing.[2], September 2010. Retrieved on 11 December 2012.
  3. Harutyunyan, Anna, et al. "Planted-model evaluation of algorithms for identifying differences between spreadsheets." engr.oregonstate.edu [3], 2012. Retrieved on 11 December 2012.
  4. McKay, S. Kyle. Reducing Spreadsheet Errors. No. ERDC-TN-EMRRP-EBA-03. Corps Of Engineers Washington DC Ecosystem Management And Restoration Research Program [4], September 2009. Retrieved on 11 December 2012.

External links

See also