Data Clump (Code Smell)

From HandWiki

In object-oriented programming, a "data clump" is any group of variables that is passed around together (in a clump) through various parts of a program. Like other code smells, a data clump can indicate deeper problems with the program's design or implementation. The variables that make up a data clump are usually closely related or interdependent, which is why they tend to be used together. Data clumps are classified as a class-level code smell and may be a symptom of poorly written source code.

Refactoring Data Clumps

In general, data clumps should be refactored. Their presence typically indicates poor software design: it would be more appropriate to group the variables formally into a single object and pass around only that object instead of the numerous primitives. Replacing a data clump with an object can reduce overall code size and helps keep the code organized, easier to read, and easier to debug.
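
As a rough illustration of the grouping described above, the following Java sketch shows three values that recur together in several method signatures, and the same methods after the values are collected into one object. All names here (Address, ShippingBefore, ShippingAfter, depotCity) are hypothetical and chosen only for this illustration.

```java
// Hypothetical value object that replaces the recurring street/city/postalCode trio.
final class Address {
    private final String street;
    private final String city;
    private final String postalCode;

    Address(String street, String city, String postalCode) {
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
    }

    String street() { return street; }
    String city() { return city; }
    String postalCode() { return postalCode; }
}

// Before: the same three values travel together through every signature.
class ShippingBefore {
    static String label(String street, String city, String postalCode) {
        return street + ", " + postalCode + " " + city;
    }

    static boolean isLocal(String street, String city, String postalCode, String depotCity) {
        return city.equalsIgnoreCase(depotCity);
    }
}

// After: each method takes a single Address instead of the clump.
class ShippingAfter {
    static String label(Address to) {
        return to.street() + ", " + to.postalCode() + " " + to.city();
    }

    static boolean isLocal(Address to, String depotCity) {
        return to.city().equalsIgnoreCase(depotCity);
    }
}
```

The behavior is unchanged; only the signatures shrink, and a later addition such as a country field would touch the Address class rather than every caller.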

Removing data clumps runs the risk of creating a different code smell, the data class (a class that only stores data and has no methods that operate on it); however, creating the class encourages the programmer to notice functionality that could be moved into it as well.[1] [2]

In object-oriented programming, the purpose of objects is to encapsulate both relevant data (fields) and operations (methods) that can be performed on this data.[3] The failure to group fields together into a true object can discourage the association of relevant actions.
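
The difference between a bare data class and an object that also carries behavior can be sketched as follows. This is a minimal example under assumed names (BareDateRange, DateRange, RangeChecks, contains, overlaps), none of which come from the cited sources; it uses java.time.LocalDate from the standard library.

```java
import java.time.LocalDate;

// Data class: groups the two dates but offers no operations on them.
final class BareDateRange {
    final LocalDate start;
    final LocalDate end;

    BareDateRange(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }
}

// With a bare data class, logic about the range ends up living elsewhere.
class RangeChecks {
    static boolean contains(BareDateRange range, LocalDate day) {
        return !day.isBefore(range.start) && !day.isAfter(range.end);
    }
}

// Object with behavior: the same fields plus the operations that belong to them.
final class DateRange {
    private final LocalDate start;
    private final LocalDate end;

    DateRange(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        this.start = start;
        this.end = end;
    }

    // True if the day lies within the range, both ends inclusive.
    boolean contains(LocalDate day) {
        return !day.isBefore(start) && !day.isAfter(end);
    }

    // True if the two inclusive ranges share at least one day.
    boolean overlaps(DateRange other) {
        return !other.end.isBefore(start) && !other.start.isAfter(end);
    }
}
```

Once the fields live in DateRange, the validation and the range queries have an obvious home, which is the kind of association of data and actions that a loose pair of date parameters tends to discourage.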

A long list of parameters or variables does not necessarily indicate a data clump; the values form a data clump only when they are closely and logically related. Although such cases are rare, a method can take half a dozen or more completely unrelated parameters that cannot be cleanly turned into a single object. This, however, suggests that the method is trying to do too much and would be better broken into multiple methods, each responsible for a smaller part of the overall task. This is another opportunity to improve the quality of the code through refactoring.
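
The case of a long list of unrelated parameters might look like the following sketch, in which one method with six loosely related parameters is split into three smaller methods. The scenario and all names (ReportBefore, ReportAfter, summarize, total, formatAmount, greet) are assumptions made up for this illustration.

```java
import java.util.Locale;

// Before: one method computes a total, formats it, and builds a greeting.
// The six parameters do not form a natural object; the method simply does too much.
class ReportBefore {
    static String summarize(double unitPrice, int quantity, double taxRate,
                            String currencySymbol, String userName, boolean shout) {
        double total = unitPrice * quantity * (1 + taxRate);
        String amount = currencySymbol + String.format(Locale.ROOT, "%.2f", total);
        String greeting = "Hello " + userName;
        if (shout) {
            greeting = greeting.toUpperCase(Locale.ROOT);
        }
        return greeting + ", you owe " + amount;
    }
}

// After: each method has one responsibility and a short parameter list.
class ReportAfter {
    static double total(double unitPrice, int quantity, double taxRate) {
        return unitPrice * quantity * (1 + taxRate);
    }

    static String formatAmount(double amount, String currencySymbol) {
        return currencySymbol + String.format(Locale.ROOT, "%.2f", amount);
    }

    static String greet(String userName, boolean shout) {
        String greeting = "Hello " + userName;
        return shout ? greeting.toUpperCase(Locale.ROOT) : greeting;
    }
}
```

A caller can recombine the pieces, for example greet(name, false) + ", you owe " + formatAmount(total(price, quantity, rate), "$"), while each piece can also be tested and reused on its own.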

Refactoring to eliminate data clumps does not need to be done by hand. Many modern, fully featured IDEs offer a refactoring (often labeled "Extract Class") that performs this change automatically or nearly so. This lowers the cost and improves the reliability of the refactoring, making it practical even for developers who would otherwise be reluctant to attempt it.

Example

Data clumps can exist in any object-oriented programming language. The example below was chosen for its simplicity in scope and syntax.

In Java

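// The enclosing class name is illustrative; it only makes the listing compilable.
public class WelcomeExample {
    public static void main(String[] args) {
        // Expects six arguments: first name, last name, age, gender, occupation, city.
        String firstName = args[0];
        String lastName = args[1];
        int age = Integer.parseInt(args[2]);
        String gender = args[3];
        String occupation = args[4];
        String city = args[5];
        welcomeNew(firstName, lastName, age, gender, occupation, city);
    }

    // All six values travel together in the signature: this is the data clump.
    public static void welcomeNew(String firstName, String lastName, int age, String gender, String occupation, String city) {
        System.out.printf("Welcome %s %s, a %d-year-old %s from %s who works as a %s%n", firstName, lastName, age, gender, city, occupation);
    }
}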

In the previous example, all of the variables could be encapsulated into a single "Person" object, which could be passed around by itself. Additionally, the programmer may then recognize that the welcomeNew method would be better associated with the Person class, and could then come up with other relevant actions associated with the Person. For instance, the code could be refactored and expanded as follows:

// The enclosing class name is illustrative; it only makes the listing compilable.
public class PersonExample {
    public static void main(String[] args) {
        String firstName = args[0];
        String lastName = args[1];
        int age = Integer.parseInt(args[2]);
        String gender = args[3];
        String occupation = args[4];
        String city = args[5];

        Person joe = new Person(firstName, lastName, age, gender, occupation, city);
        joe.welcomeNew();
        joe.work();
    }

    private static class Person {
        /* All parameters have been moved to the new Person class, where they are properly grouped and encapsulated */
        private final String firstName;
        private final String lastName;
        private final int age;
        private final String gender;
        private final String occupation;
        private final String city;

        public Person(String firstName, String lastName, int age, String gender, String occupation, String city) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
            this.gender = gender;
            this.occupation = occupation;
            this.city = city;
        }

        /* Existing functionality relating to the data can also be incorporated into the new class, reducing the risk of scope collision */
        public void welcomeNew() {
            System.out.printf("Welcome %s %s, a %d-year-old %s from %s who works as a %s%n", firstName, lastName, age, gender, city, occupation);
        }

        /* Additionally, the new class may be an opportunity for new functionality to be added */
        public void work() {
            System.out.printf("This is %s working hard as a %s in %s%n", firstName, occupation, city);
        }
    }
}

Although this has increased the length of the code, now the single Person can easily be passed around as one object, rather than as a variety of (seemingly unrelated) fields. Additionally, this gives the opportunity to move associated methods into the class so that they can easily operate upon individual instances thereof. These methods no longer require passing around a tedious list of parameters, as they are instead stored as instance variables upon the object instances themselves.[4]

References

  1. Fowler, Martin. "DataClump". https://martinfowler.com/bliki/DataClump.html. Retrieved 10 February 2017. 
  2. "Data Clumps". https://sourcemaking.com/refactoring/smells/data-clumps. Retrieved 10 February 2017. 
  3. Kindler, E.; Krivy, I. (2011). Object-Oriented Simulation of systems with sophisticated control. International Journal of General Systems. pp. 313–343. 
  4. "What's the difference between a class variable and an instance variable?". Programmer and Software Interview Questions and Answers. http://www.programmerinterview.com/index.php/c-cplusplus/whats-the-difference-between-a-class-variable-and-an-instance-variable/.