When we talk about refactoring we’re usually talking about small code changes that don’t affect the external behaviour of the code but improve readability and/or maintainability of the code. These two things often go hand in hand and by focusing on improving readability we often end up with code that is more maintainable.
Why should I refactor my code?
When I am refactoring I usually start by making changes to improve the readability of the code. The time and effort spent on it you (or some other developer) get back the next time you have to work on that code. Then you have more readable code that requires less time to figure out how it works and what intentions you had when you wrote that code. The result is a more maintainable code base, easier to bug fix and add functionality to.
When do I find time to refactor?
The best time to refactor is when you have all of the pieces of the code in your head and fully understand how it works. While bug fixing/adding new functionality you have spent the time and effort to understand the code you are working on and all you need is one more iteration where you refactor and then test. Refactoring should be a part of every feature or bug you work on and when you do time estimates (that is, if you and your team do time estimates) you should take time for refactoring into account.
Some developers never seem to find time to refactor their code. Pressured by deadlines and deliveries, simply making it work is enough. They then check in their code and tell themselves that they’ll find time later on to refactor it. I do this sometimes, and in most cases that time never comes. One simple rule that constantly comes up when talking about refactoring is to always check in a piece of code cleaner than when you checked it out. This rule is sometimes refered to as The Boy Scout Rule.
A walk through of some basic refactoring
To demonstrate this I created this overly simplified code sample that I start with and then apply a few code refactorings with the intention of improving readability of the code. But bear in mind that code readability is subjective. What one person thinks improves the readability of the code, others might find make the code more complicated. With that in mind, all of the changes I make here are my attempt to make this code more readable using known refactorings.
public class DataAccess
{
internal static void InsertPost(string title, string content, string type, string username, DateTime createdDate)
{
if (!AccessSystem.CanCreatePost(username))
{
throw new Exception(string.Format("user {0} not allowed to create a new post!"));
}
//validate post
if (string.IsNullOrEmpty(title))
{
throw new Exception("title must be set");
}
else if (string.IsNullOrEmpty(content))
{
throw new Exception("Content must be set");
}
else if (type != "post" && type != "page")
{
throw new Exception("Type must be either post or page");
}
//insert the post in the database
int id = DataAccess.InsertPost(title, content, type, username, createdDate);
if (id > 0)
{
//if this was a post, we want to publish it right away
if (type == "post")
{
DataAccess.PublishPost();
}
else if (type == "page")
{
DataAccess.RebuildPageStructure();
}
}
else
{
throw new Exception("Failed inserting post");
}
}
}
Above we have about 40 lines of C# code. The method starts by performing an access check, then validates the input data, inserts the post into the database and, depending on the post type, either publishes it or rebuilds some page structure. If you have experience with back-end C# code, you have probably seen similiarly structured code before. Lets walk through the refactorings we can apply to this code to make it more readable.
-
Extract method. This is probably my most used refactoring. By taking a few lines of code and extracting them to a suitably named method we have more readable code and have documented our intentions. This techinque also eliminates the need for one-line comments describing what your code does, instead you create a method with a descriptive name and your intentions are clear. Applying this code refactoring on our code we created four new methods.
-
Create a class for the parameter object. The previous
InsertPost
method had five parameters, four of which belonged to the post being created and one not. By doing this it becomes clear which properties belong to the post and that theusername
is only used for the access check. -
Introduced the
PostType
enum, representing possible post types. We no longer rely on “magic” strings for comparison with the string property we previously used. -
Changed the return type of the
InsertToDatabase
method. Before that method returned an int value and we would check if that was greater than zero to indicate if the insert was successful. This assumes our data store only uses positive values for ids and returning a positive int is equal to a successful operation. Now we return anInsertResult
object which has two properties,Success
which is a bool andId
which is an int. We check theSuccess
property to see if the insert was successful and the resulting code becomes cleaner and more readable.
public class DataAccess
{
internal static void InsertPost(Post post, string username)
{
VerifyThatUserIsAllowedToCreatePosts(username);
ValidatePost(post);
bool success = InsertToDatabase(post, username);
if (success)
{
DoPostInsertActions(post);
}
}
private static void VerifyThatUserIsAllowedToCreatePosts(string username)
{
if (!AccessSystem.CanCreatePost(username))
{
throw new Exception(string.Format("user {0} not allowed to create a new post!"));
}
}
private static void ValidatePost(Post post)
{
if (string.IsNullOrEmpty(post.Title))
{
throw new Exception("title must be set");
}
else if (string.IsNullOrEmpty(post.Content))
{
throw new Exception("Content must be set");
}
}
private static void DoPostInsertActions(Post post)
{
if (post.Type == PostType.Post)
{
DataAccess.PublishPost();
}
else if (post.Type == PostType.Page)
{
DataAccess.RebuildPageStructure();
}
}
private static bool InsertToDatabase(Post post, string username)
{
InsertResult result = DataAccess.InsertPost(post, username);
if (!result.Success)
{
throw new Exception("Failed inserting post");
}
else
{
return result.Success;
}
}
}
public class Post
{
public string Title { get; set; }
public string Content { get; set; }
public PostType Type { get; set; }
public DateTime CreatedDate { get; set; }
}
public enum PostType
{
Post,
Page
}
What have we gained?
We now have a few small methods, each with its own responsibility. The main method, InsertPost
, is only seven lines of code and reading through it you can easily understand what it does and the flow. We don’t need any additional comments to indicate what that code does. Reading that method is like reading a (short) story. Compare this method to the code we started with, where you probably need to scan through the code more than once to fully understand what it does. This improvement in readability comes in handy when we revisit this code to do some bug fixing or add some feature, then it is clear what code we need to look at since we have broken the code into smaller methods. Readability and maintainability +1!
Photo credit: Gufudalsá, Iceland. Own photo.