# Sunday, 23 January 2011
Salesforce Administrators and Developers are routinely required to manipulate large amounts of data in a single task.

Examples of batch processes include:
  • Deleting all Leads that are older than 2 years
  • Replacing all occurrences of an old value with a new value
  • Updating user records with a new company name

The tools and options available are:
  1. Browser-based Admin features / Execute anonymous in System Log
  2. Data loader
  3. Generalized Batch Apex (Online documentation)
  4. Specialized Batch Apex
Option 1 (Admin Settings) represents the category of features and tools available when directly logging into Salesforce via a web browser. The transactions are typically synchronous and subject to Governor Limits.

Option 2 (Data Loader) gives Admins an Excel-like workflow: download data with the Apex Data Loader, manipulate it on a local PC, then upload it back to Salesforce. It is slightly more powerful than the browser-based tools, doesn't require programming skills, and is subject to the web service API governor limits (which are more generous). It also requires slightly more manual effort and introduces the possibility of human error when mass updating records.

Option 3 (Generalized Batch Apex) introduces asynchronous batch processes that can manipulate up to 50 million records in a single batch job. It doesn't require programming (if you use the 3 general-purpose utility classes provided at the end of this post) and can be executed directly through the web browser, but it is limited to the general use cases those utility classes support.

Option 4 (Specialized Batch Apex) requires Apex programming and provides the most control over how records are processed in a batch (such as updating several object types within a batch or applying complex data enrichment before updating fields).

Batch Apex Class Structure:

The basic structure of a batch apex class looks like:

global class BatchVerbNoun implements Database.Batchable<sObject>{
    global Database.QueryLocator start(Database.BatchableContext BC){
        return Database.getQueryLocator(query); //query is a SOQL string held by the class; may return up to 50 million records
    }
  
    global void execute(Database.BatchableContext BC, List<sObject> scope){       
        //Batch gets broken down into several smaller chunks
        //This method gets called for each chunk of work, passing in the scope of records to be processed
    }
   
    global void finish(Database.BatchableContext BC){   
        //This method gets called once when the entire batch is finished
    }
}
An Apex Developer simply fills in the blanks. The start() and finish() methods are both executed once, while the execute() method gets called 1-N times, depending on the number of batches.

Batch Apex Lifecycle:

The Database.executeBatch() method is used to start a batch process. This method takes 2 parameters: an instance of the batch class and an optional scope (200 if omitted). It returns the ID of the AsyncApexJob record that tracks the job.

BatchUpdateFoo batch = new BatchUpdateFoo();
Database.executeBatch(batch, 200);
The scope parameter defines the max number of records to be processed in each batch. For example, if the start() method returns 150,000 records and scope is defined as 200, then the overall batch will be broken down into 150,000/200 batches, which is 750. In this scenario, the execute() method would be called 750 times; and each time passed 200 records.
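The arithmetic above only divides evenly when the record count is a multiple of the scope. As a rough sketch, the number of execute() calls can be estimated with a round-up division. BatchMath and batchCount are hypothetical names used for illustration here, not part of the Apex API.

```apex
// Hypothetical helper (not part of the Apex API): estimates how many times
// execute() is called for a given record count and scope.
public class BatchMath {
    public static Integer batchCount(Integer totalRecords, Integer scope) {
        // Integer division truncates, so add (scope - 1) first to round up.
        return (totalRecords + scope - 1) / scope;
    }
}
```

With 150,000 records and a scope of 200 this gives 750. With 150,001 records it gives 751, and the final execute() call receives a single record.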

A note on batch sizes: Even though batch processes have significantly more access to system resources, governor limits still apply. A batch that executes a single DML operation may shoot for a batch scope of 500+. Batch executions that initiate a cascade of trigger operations will need to use a smaller scope. 200 is a good general starting point.

The start() method is called to determine the size of the batch, then the batch is put into a queue. There is no guarantee that the batch process will start as soon as executeBatch() is called, but in practice roughly 90% of batches start processing within 1 minute.

You can log in and go to Setup / Monitoring / Apex Jobs to view batch progress.
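The same progress information can also be read from Apex. The snippet below is a minimal sketch for Execute Anonymous. It assumes a batch class named BatchUpdateField (one of the utility classes in this post) has already been submitted, and looks up its most recent job.

```apex
// Find the most recent batch job for the (assumed) BatchUpdateField class.
List<AsyncApexJob> jobs = [select Id, Status, JobItemsProcessed, TotalJobItems, NumberOfErrors
                           from AsyncApexJob
                           where JobType = 'BatchApex' and ApexClass.Name = 'BatchUpdateField'
                           order by CreatedDate desc limit 1];

if (jobs.isEmpty()) {
    System.debug('No BatchUpdateField job found.');
} else {
    AsyncApexJob job = jobs[0];
    // JobItemsProcessed and TotalJobItems count batches, not individual records.
    System.debug(job.Status + ': ' + job.JobItemsProcessed + ' of ' + job.TotalJobItems
        + ' batches processed, ' + job.NumberOfErrors + ' errors');
}
```

Running it repeatedly while a job is in progress should show JobItemsProcessed climbing toward TotalJobItems.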


Unit Testing Batch Apex:
The asynchronous nature of batch Apex makes it notoriously difficult to unit test and debug. At Facebook, we use a general Logger utility that logs debug info to a custom object (adding to the governor limit footprint). The online documentation for batch Apex provides some unit test examples, but the utility classes in this post use a shorthand approach to achieving test coverage.
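For comparison with the shorthand approach, a fuller test might create its own records and assert on the outcome. The following is a sketch only. It assumes the BatchSearchReplace utility class from this post is saved in the org, and that Leads can be inserted with just LastName and Company (no extra required fields or validation rules). The class name, method name, and 'ZzTestCo' values are made up for illustration.

```apex
@isTest
private class BatchSearchReplaceTest {
    static testMethod void replacesMatchingCompany() {
        // Create a small, known data set. Keeping it smaller than the scope
        // means execute() runs exactly once inside the test.
        List<Lead> leads = new List<Lead>();
        for (Integer i = 0; i < 10; i++) {
            leads.add(new Lead(LastName = 'Test ' + i, Company = 'ZzTestCo Old'));
        }
        insert leads;

        Test.startTest();
        String query = 'select Id, Company from Lead where Company = \'ZzTestCo Old\'';
        Database.executeBatch(
            new BatchSearchReplace(query, 'Company', 'ZzTestCo Old', 'ZzTestCo New'), 100);
        Test.stopTest(); // the asynchronous job completes before this call returns

        // Only count the records created by this test.
        System.assertEquals(10,
            [select count() from Lead where Id in :leads and Company = 'ZzTestCo New']);
        System.assertEquals(0,
            [select count() from Lead where Id in :leads and Company = 'ZzTestCo Old']);
    }
}
```

Because the assertions run after Test.stopTest(), they check the state left behind by the finished batch rather than the state at submission time.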

Batch Apex Best Practices:
  • Use extreme care if you are planning to invoke a batch job from a trigger. You must be able to guarantee that the trigger will not add more batch jobs than the five that are allowed. In particular, consider API bulk updates, import wizards, mass record changes through the user interface, and all cases where more than one record can be updated at a time.
  • When you call Database.executeBatch, Salesforce.com only places the job in the queue at the scheduled time. Actual execution may be delayed based on service availability.
  • When testing your batch Apex, you can test only one execution of the execute method. You can use the scope parameter of the executeBatch method to limit the number of records passed into the execute method to ensure that you aren't running into governor limits.
  • The executeBatch method starts an asynchronous process. This means that when you test batch Apex, you must make certain that the batch job is finished before testing against the results. Use the Test methods startTest and stopTest around the executeBatch method to ensure it finishes before continuing your test.
  • Use Database.Stateful with the class definition if you want to share variables or data across job transactions. Otherwise, all instance variables are reset to their initial state at the start of each transaction.
  • Methods declared as future are not allowed in classes that implement the Database.Batchable interface.
  • Methods declared as future cannot be called from a batch Apex class.
  • You cannot call the Database.executeBatch method from within any batch Apex method.
  • You cannot use the getContent and getContentAsPDF PageReference methods in a batch job.
  • In the event of a catastrophic failure such as a service outage, any operations in progress are marked as Failed. You should run the batch job again to correct any errors.
  • When a batch Apex job is run, email notifications are sent either to the user who submitted the batch job, or, if the code is included in a managed package and the subscribing organization is running the batch job, the email is sent to the recipient listed in the Apex Exception Notification Recipient field.
  • Each method execution uses the same standard governor limits as an anonymous block, Visualforce controller, or WSDL method.
  • Each batch Apex invocation creates an AsyncApexJob record. Use the ID of this record to construct a SOQL query to retrieve the job’s status, number of errors, progress, and submitter. For more information about the AsyncApexJob object, see AsyncApexJob in the Web Services API Developer's Guide.
  • All methods in the class must be defined as global.
  • For a sharing recalculation, Salesforce.com recommends that the execute method delete and then re-create all Apex managed sharing for the records in the batch. This ensures the sharing is accurate and complete.
  • If in the course of developing a batch Apex class you discover a bug during a batch execution, Don't Panic. Simply log in to the admin console to monitor Apex Jobs and abort the running batch.
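To illustrate the Database.Stateful point from the list above, here is a minimal sketch of a batch that keeps a running total across execute() calls. BatchCountRecords and recordsSeen are hypothetical names chosen for this example. Without Database.Stateful, recordsSeen would be reset to 0 at the start of every transaction.

```apex
// Hypothetical example class: counts the records seen across all batches.
global class BatchCountRecords implements Database.Batchable<sObject>, Database.Stateful {
    global final String Query;
    // Preserved between execute() calls only because of Database.Stateful.
    global Integer recordsSeen = 0;

    global BatchCountRecords(String q) {
        Query = q;
    }

    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator(Query);
    }

    global void execute(Database.BatchableContext BC, List<sObject> scope) {
        // Add this chunk's size to the running total.
        recordsSeen += scope.size();
    }

    global void finish(Database.BatchableContext BC) {
        System.debug('Records seen across all batches: ' + recordsSeen);
    }
}
```

It could be started from Execute Anonymous with something like Database.executeBatch(new BatchCountRecords('select Id from Lead'), 200), with the total appearing in the debug log for the finish() call.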


Utility Batch Apex Classes:

The following batch Apex classes can be copied and pasted into any Salesforce org and called from the System Log (or Apex) using the "Execute Anonymous" feature. The general structure of these utility classes is:
  • Accept task-specific input parameters
  • Execute the batch
  • Email the admin with batch results once complete
To execute these utility batch Apex classes:
  1. Open the System Log.
  2. Click on the Execute Anonymous input text field.
  3. Paste the invocation snippet from the comment header of any of the following batch Apex classes (with your own input parameters) into the Execute Anonymous textarea, then click "Execute".


BatchUpdateField.cls
/*
Run this batch from Execute Anonymous tab in Eclipse Force IDE or System Log using the following

string query = 'select Id, CompanyName from User';
BatchUpdateField batch = new BatchUpdateField(query, 'CompanyName', 'Bedrock Quarry');
Database.executeBatch(batch, 100); //Make sure to execute in batch sizes of 100 to avoid DML limit error
*/
global class BatchUpdateField implements Database.Batchable<sObject>{
    global final String Query;
    global final String Field;
    global final String Value;
   
    global BatchUpdateField(String q, String f, String v){
        Query = q;
        Field = f;
        Value = v;
    }
   
    global Database.QueryLocator start(Database.BatchableContext BC){
        return Database.getQueryLocator(query);
    }
   
    global void execute(Database.BatchableContext BC, List<sObject> scope){   
        for(sobject s : scope){
            s.put(Field,Value);
         }
        update scope;
    }
   
    global void finish(Database.BatchableContext BC){   
        AsyncApexJob a = [Select Id, Status, NumberOfErrors, JobItemsProcessed,
            TotalJobItems, CreatedBy.Email
            from AsyncApexJob where Id = :BC.getJobId()];
       
        string message = 'The batch Apex job processed ' + a.TotalJobItems + ' batches with '+ a.NumberOfErrors + ' failures.';
       
        // Send an email to the Apex job's submitter notifying of job completion. 
        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        String[] toAddresses = new String[] {a.CreatedBy.Email};
        mail.setToAddresses(toAddresses);
        mail.setSubject('Salesforce BatchUpdateField ' + a.Status);
        mail.setPlainTextBody(message);
        Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });   
    }
   
    public static testMethod void tests(){
        Test.startTest();
        // Limit the query to one chunk: a test can run only a single execute() call
        string query = 'select Id, CompanyName from User limit 100';
        BatchUpdateField batch = new BatchUpdateField(query, 'CompanyName', 'Bedrock Quarry');
        Database.executeBatch(batch, 100);
        Test.stopTest();
    }
}
BatchSearchReplace.cls
/*
Run this batch from Execute Anonymous tab in Eclipse Force IDE or System Log using the following

string query = 'select Id, Company from Lead';
BatchSearchReplace batch = new BatchSearchReplace(query, 'Company', 'Sun', 'Oracle');
Database.executeBatch(batch, 100); //Make sure to execute in batch sizes of 100 to avoid DML limit error
*/
global class BatchSearchReplace implements Database.Batchable<sObject>{
    global final String Query;
    global final String Field;
    global final String SearchValue;
    global final String ReplaceValue;
   
    global BatchSearchReplace(String q, String f, String sValue, String rValue){
        Query = q;
        Field = f;
        SearchValue = sValue;
        ReplaceValue = rValue;
    }
   
    global Database.QueryLocator start(Database.BatchableContext BC){
        return Database.getQueryLocator(query);
    }
   
    global void execute(Database.BatchableContext BC, List<sObject> scope){   
        for(sobject s : scope){
            string currentValue = String.valueof( s.get(Field) );
            if(currentValue != null && currentValue == SearchValue){
                s.put(Field, ReplaceValue);
            }
         }
        update scope;
    }
   
    global void finish(Database.BatchableContext BC){   
        AsyncApexJob a = [Select Id, Status, NumberOfErrors, JobItemsProcessed,
            TotalJobItems, CreatedBy.Email
            from AsyncApexJob where Id = :BC.getJobId()];
       
        string message = 'The batch Apex job processed ' + a.TotalJobItems + ' batches with '+ a.NumberOfErrors + ' failures.';
       
        // Send an email to the Apex job's submitter notifying of job completion. 
        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        String[] toAddresses = new String[] {a.CreatedBy.Email};
        mail.setToAddresses(toAddresses);
        mail.setSubject('Salesforce BatchSearchReplace ' + a.Status);
        mail.setPlainTextBody(message);
        Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });   
    }
   
    public static testMethod void tests(){
        Test.startTest();
        // Limit the query to one chunk: a test can run only a single execute() call
        string query = 'select Id, Company from Lead limit 100';
        BatchSearchReplace batch = new BatchSearchReplace(query, 'Company', 'Foo', 'Bar');
        Database.executeBatch(batch, 100);
        Test.stopTest();
    }
}
BatchRecordDelete.cls
/*
Run this batch from Execute Anonymous tab in Eclipse Force IDE or System Log using the following

string query = 'select Id from ObjectName where field=criteria';
BatchRecordDelete batch = new BatchRecordDelete(query);
Database.executeBatch(batch, 200); //Make sure to execute in batch sizes of 200 to avoid DML limit error
*/
global class BatchRecordDelete implements Database.Batchable<sObject>{
    global final String Query;
   
    global BatchRecordDelete(String q){
        Query = q;   
    }
   
    global Database.QueryLocator start(Database.BatchableContext BC){
        return Database.getQueryLocator(query);
    }
   
    global void execute(Database.BatchableContext BC, List<sObject> scope){       
        delete scope;
    }
   
    global void finish(Database.BatchableContext BC){   
        AsyncApexJob a = [Select Id, Status, NumberOfErrors, JobItemsProcessed,
            TotalJobItems, CreatedBy.Email
            from AsyncApexJob where Id = :BC.getJobId()];
       
        string message = 'The batch Apex job processed ' + a.TotalJobItems + ' batches with '+ a.NumberOfErrors + ' failures.';
       
        // Send an email to the Apex job's submitter notifying of job completion. 
        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        String[] toAddresses = new String[] {a.CreatedBy.Email};
        mail.setToAddresses(toAddresses);
        mail.setSubject('Salesforce BatchRecordDelete ' + a.Status);
        mail.setPlainTextBody(message);
        Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });   
    }
   
    public static testMethod void tests(){
        Test.startTest();
        // SOQL string literals take single quotes, escaped inside the Apex string
        string query = 'select Id, CompanyName from User where CompanyName = \'foo\'';
        BatchRecordDelete batch = new BatchRecordDelete(query);
        Database.executeBatch(batch, 100);
        Test.stopTest();
    }
}
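
As a usage sketch, the first example from the top of this post (deleting Leads older than 2 years) could be run through BatchRecordDelete from Execute Anonymous roughly as follows. The 730-day cutoff is an assumed approximation of "2 years", and the filter on CreatedDate is one possible reading of "older".

```apex
// Delete Leads created more than roughly two years (730 days) ago.
// LAST_N_DAYS is a SOQL date literal; 730 is an assumed stand-in for 2 years.
String query = 'select Id from Lead where CreatedDate < LAST_N_DAYS:730';
BatchRecordDelete batch = new BatchRecordDelete(query);
Database.executeBatch(batch, 200);
```

Running the same query with a count() first is a cheap way to check how many records will be removed before submitting the job.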