Online Chat

Use the window below to chat with me (if I'm online ...)

Use the edit nick field above to let me see your name.

cazh1: on Business, Information, and Technology

Thoughts and observations on the intersection of technology and business; searching for better understanding of what's relevant, where's the value, and (always) what's the goal ...

Sunday, March 11, 2007

Excel vs. RDBMS: Choosing the Technology, Winning the Arguments

Excel vs. RDBMS: Choosing the Technology, Winning the Arguments

Businesses large and small, private and public, for-profit and not, commonly control critical business processes using the EIE platform (which means Everything in Excel - always good for a laugh in your next PowerPoint - jpmacl).

Folks in the business get used to the power and control they have with spreadsheets, and who can blame them? Excel is …

  • … fast and flexible
  • … easy to learn
  • Everybody has a copy
  • I don't have to go through IT

Remember, most of these folks have business degrees, powerful home computers, do their personal finances and taxes electronically, and have been exposed to data processing concepts since their undergrad days. Excel is a phenomenally powerful platform for expressing numerical ideas, and finance folks [especially] are quite capable and creative. Remember, you're only a good $25 book away from opening up the guts and creating … (gulp) … code!

It's not that IT doesn't like Excel; in fact, all of these points are why we like to empower the business, putting Excel on every desktop. However, as spreadsheet applications become increasingly critical to the business, the problems begin to appear.

Power brings problems

Spreadsheets can blur the lines between report writer and database.

  • I love to use Excel as a quick-and-dirty ETL tool and list manipulator; for data sets of reasonable size, I can hack together some Excel formulas to clean up dirty data, sort to eliminate dupes and identify missing rows, split addresses into city, state, zip, etc.
  • Spreadsheets provide limitless ability to control the look and feel of the content, with an eminently usable tool (much easier than most report writers). The spreadsheet gives me the added "power" of formulas at the cell level, so my reports come out exactly like I want them to.
    • I'm using air quotes to note that cell-specific formulas can be helpful in spreadsheets, but typically cannot be replicated in a database, as they break 4th normal form.

Excel automation can be problematic as well. Most spreadsheet developers know enough about macros and formulas to be dangerous; the typical complex spreadsheet is rife with hardcoded links, named file paths, ad hoc logic structures, and recorded macros with a characteristic lack of error trapping and documentation. Cell macros can be difficult to read and debug, and VBA is typically recorded macros with tweaks (unreadable spaghetti code)

Sustainable solutions need to be flexible and extensible, facilitate ongoing maintenance by multiple individuals, and stand up to ongoing use in changing environments. Scalable solutions can handle large volumes of data and large numbers of users without a fundamental re-architecture.

Spreadsheets aren't well suited beyond low volume, departmental solutions; Excel …

  • doesn't handle large data: Single spreadsheets are limited to 65,536 rows and 256 columns, and really wants to load the whole thing into memory at once.
  • is disconnected from source data: Establishing an electronic link to a large database, via ODBC or direct connection, is often a bit too technical for most non-IT people to figure out. There are many ways to get data into Excel, from the mundane (applications that provide "open in Excel" buttons, or export to CSV tools) to the ridiculous (rekeying printed reports).
  • has no version control: Most companies don't have effective collaboration environments, and the instant someone forwards a copy through email, there is always a threat of …
  • … [enables] multiple versions of the truth: With no automated link to the data, it's common to see different data sets in use (too much trouble to get an update). With no version control, the printouts don't reflect whose macros and formulas are we looking at.
  • is not geared towards multiple users: You can manually handle 2-3 authors by talking a lot (who's got it? I got it!), maybe a small group of 10-15 by stashing your spreadsheet in a shared folder and letting the file system control who can have it open, but these solutions are inelegant and limiting.

Choosing the Technology

Most of the time, these problems can are solved by manual intervention - dedicating people to manipulate the data feeds, monitor the versions, and massage the look/feel of the output. When this effort becomes a royal pain, the call goes out to IT for "automation". This is when the unstoppable, high-speed, fast-twitch RAD environment on my PC runs into the lumbering wall of Information Architecture, Enterprise Systems design, and Sarbox-driven access controls of Corporate IT.

The tough conversations between business and IT often come when working to systematize, automate, and scale these point solutions; the challenges above create the need, but the solutions can be time consuming, and often introduce organizational change that this more difficult than the technology pieces. Problems in these conversations manifest in a couple of ways:

  • The business team hands IT a spreadsheet, and says "make the report(s) look like this"
  • The IT team develops an estimate (in weeks) for the project, and the spreadsheet guru scoffs; Why is this so hard, I can do this over the weekend in Excel?
  • The requirements call for "business rules", like conditional cell values, that break 4th normal form. IT says no, but the business says yes - "it's a fundamental requirement"

Is Excel evil?

Are these challenges reason enough to ban Excel from the building? Of course not - the technology is empowering for the business and liberating for IT - when used correctly …

  • … for quick solutions where IT resources are constrained
  • … for prototyping and requirements gathering: Project teams should encourage the use of excel for modeling
  • … for final-format reporting: Lots of detailed IT requests can come from end users fiddling with sorts, column order, fonts, etc. If it fits with the report delivery requirements, consider giving final look/feel control to the business by using Excel to create reports
  • … for low-volume, "fast-twitch" analytics: Many of these "solution requirements" are just different data sets - but aggregated from multiple sources and/or multiple years. Use the fancy IT platforms to do the aggregation, but use Excel for the query / analytics tool

Creating win-win situations with Excel

Project managers and systems analysts need to invest a little time into these conversations; we need to solve these problems but must be aware of organizational change (the Best Solution will probably be different than what we currently use) as well as budget limitations (we don't have unlimited resources or time to develop the Perfect Solution).

1. Outside of any specific project, IT should work to empower the business with as much direct, automated access to source data as is practical. Document how to create ODBC connections, provide effective meta-data (think views with master data values, not cryptic foreign keys views, pointing to non-transactional data warehouses / extract that separate potential runaway queries from production databases).

2. Before any project, invest some time discussing Requirements, and the difference between what I have, what I want, and what I need.

  • NB: This should be clearly understood before you move on to …

3. Build buy in to the solution, and give credibility to the work already accomplished, by using the Excel solution as a basis for what the end should look like; a jump start on the requirements, business rules, data sources and flows, etc.

Excel is a local optimization, that can be quite effective until the scope of the solution expands to cover a large part of the business.

<< blog home