Geolocation: look up a computer's geographic location from its IP address

This article presents a small Java program for beginners to study. It reads lines that start with an IP address (for example from a Web server log file), looks up the locations of the computers associated with these addresses, and prints city, region and country as well as latitude and longitude to standard output. Finding out the (approximate) location of a computer is called geolocation. The program's usage and inner workings are discussed in detail. This article demonstrates the following topics:

Target audience

The emphasis of this page as part of the code examples section is not so much on solving a hard programming problem but on interfacing: (1) using other people's code (via an API) and (2) outputting data in a way which is easy to process by others (in this case: tab-separated text). In order to come up with an efficient solution, it is often important to do research on existing code and to avoid the NIH (“not invented here”) syndrome.

The program is aimed at Java beginners who want to study a simple program that does something interesting. The explanations are deliberately verbose so that newcomers to Java can follow them.

If you're an experienced Java developer and interested in adding geolocation functionality to an application of yours, the program can also serve as a quick code example of how to use MaxMind's geolocation API.

IP addresses

Computers connected to the Internet typically have an IPv4 address, which is written in dotted quad notation: four numbers from 0 to 255 with a dot between each consecutive pair, for example 1.2.3.4. A typical source for such addresses is the log file a Web server creates, storing one file request per line. You can learn more about log file formats in the Apache HTTP server documentation. However, in the context of this little tutorial, it is only important that the first part of each line is an IP address followed by at least one space. Here's an example line from a real access_log file:

1.2.3.4 - - [25/Jan/2008:02:44:35 -0900] "GET /robots.txt HTTP/1.1"
  200 44 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US;
  rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"

Actually, it was one long line and has been split across three lines here to be more readable. The only thing needed for this program is the address 1.2.3.4; everything after it will be ignored by the program. If you don't have such a text file, copy the example line from above and replace 1.2.3.4 with some real IP addresses, e.g. your own to test how accurately you can be located. In a Windows command prompt, run ipconfig /all to find out your address; if your computer sits behind a router, this will show a private address such as 192.168.x.x which cannot be located, so alternatively visit whatismyipaddress.com.
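The following sketch shows one way to isolate the leading address of a log line and to check that it really is a dotted quad. It is not part of the program discussed in this article; the class name LogLineAddress and its method names are made up for this example.

```java
// Sketch: isolate the leading address of a log line and check that it is
// a dotted quad. Class and method names are made up for this example.
class LogLineAddress {

    /** Returns the text before the first space, or null if there is none. */
    static String firstField(String line) {
        if (line == null) {
            return null;
        }
        int index = line.indexOf(' ');
        return index < 1 ? null : line.substring(0, index);
    }

    /** Returns true if s consists of four decimal numbers 0..255 separated by dots. */
    static boolean isDottedQuad(String s) {
        if (s == null) {
            return false;
        }
        // The limit -1 keeps empty parts, so "1..3.4" is rejected below.
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                if (part.charAt(i) < '0' || part.charAt(i) > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String line = "1.2.3.4 - - [25/Jan/2008:02:44:35 -0900] \"GET /robots.txt HTTP/1.1\" 200 44";
        String addr = firstField(line);
        System.out.println(addr + " valid: " + isDottedQuad(addr));
    }
}
```

For the example line, this prints 1.2.3.4 valid: true. A check like this is optional; the program in this article simply passes whatever precedes the first space to the lookup.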

Mapping addresses to locations

Now for the actual mapping of addresses to geographical locations. This is done using a big list of pairs (address block, location). Determining where the computers of a certain address block are located geographically is hard. There are companies doing nothing but maintaining such lists. They sell them to companies that want to know where their visitors are located in order to show information tailored to them, or to exclude people from certain countries (a smaller company may choose to do business only with clients from its own country).
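To make the idea of a list of (address block, location) pairs more concrete, here is a toy model in Java. It is only a sketch: the class BlockTable, the address blocks and the place names are invented for illustration, and a real database file is organized differently. Each block is stored under its first address, and a lookup finds the closest block start at or below the queried address.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of an address-block table. The blocks and place names used in
// main are invented for illustration; they come from no real database.
class BlockTable {

    /** One table entry: the last address of the block and a location label. */
    static final class Block {
        final long last;
        final String location;

        Block(long last, String location) {
            this.last = last;
            this.location = location;
        }
    }

    // Key: first address of a block, stored as a number in a long.
    private final TreeMap<Long, Block> blocks = new TreeMap<Long, Block>();

    /** Converts a dotted quad like "1.2.3.4" to a number (no validation). */
    static long toNumber(String dottedQuad) {
        long value = 0;
        for (String part : dottedQuad.split("\\.")) {
            value = value * 256 + Integer.parseInt(part);
        }
        return value;
    }

    void add(String first, String last, String location) {
        blocks.put(toNumber(first), new Block(toNumber(last), location));
    }

    /** Returns the location of the block containing addr, or null if there is none. */
    String find(String addr) {
        long n = toNumber(addr);
        // floorEntry yields the block with the greatest start address <= n.
        Map.Entry<Long, Block> entry = blocks.floorEntry(n);
        if (entry == null || n > entry.getValue().last) {
            return null;
        }
        return entry.getValue().location;
    }

    public static void main(String[] args) {
        BlockTable table = new BlockTable();
        table.add("10.0.0.0", "10.0.255.255", "Example City A");
        table.add("10.2.0.0", "10.2.0.255", "Example City B");
        System.out.println(table.find("10.0.3.7")); // inside the first block
        System.out.println(table.find("10.1.0.1")); // between blocks: prints null
    }
}
```

The sorted map keeps lookups fast even for large tables, which hints at why such databases are stored in specialized formats rather than as plain lists.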

There are free versions of such address-to-location mapping lists. This program relies on information gathered by MaxMind Inc. You will need the GeoLiteCity.dat file from http://www.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz (around 16 MB). Download the file, and decompress it using some archiver program or gzip -d GeoLiteCity.dat.gz on the command line. After decompression you will have a 50 percent larger file (around 25 MB) called GeoLiteCity.dat.

The second thing you'll need for mapping is Java code which accesses this database file. The GeoLiteCity.dat file uses a binary file format designed to store IP and geo information as compactly as possible while still allowing quick access. The company providing the data also offers an API (application programming interface). It is located at http://www.maxmind.com/download/geoip/api/java/. When I visited, the most recent version was 1.2.1 (62 KB). Download and decompress the file.

Running the program

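It is now assumed that you are in a command prompt / shell and that your current directory contains the GeoLiteCity.dat file and a subdirectory com as the result of unzipping the API code from the previous section. Because the program below declares the package net, its source file belongs in a subdirectory net as net/AccessLogGeo.java and has to be compiled before the first run, for example with javac net/AccessLogGeo.java.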

Let's do a test run:

java net.AccessLogGeo GeoLiteCity.dat < access_log

Output should look like this:

1.2.3.4   United States   PA      Philadelphia    39.996796       -75.1485
1.2.3.5   India   07      New Delhi       28.600006       77.20001

There is a single tab character between neighboring items, so columns will not line up perfectly when items differ a lot in length, as United States and India do. However, this format is easy to process in other programs. Here's a short description of the columns. The first column is the address. The second contains the country name in English. The third is the region code; MaxMind has a list that maps these codes to names in case you need them. The fourth column contains the city name (watch out: in some cases there are several names for the same city, e.g. both Cologne and Koeln). The last two columns are the latitude and longitude values (paste a pair of the example numbers into the search field of a map application, and it should show you the city named in the fourth column). Note that some of the information may be missing, because mapping IP addresses to locations isn't perfect.
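If you want to read this output back into a Java program of your own, splitting each line at the tab characters is enough. The following is a minimal sketch; the class name LocationRow and its field names are made up for this example and simply follow the column description above.

```java
// Sketch: split one line of the program's output into named columns.
// The class name LocationRow is made up for this example.
class LocationRow {
    final String address;
    final String country;
    final String region;
    final String city;
    final String latitude;
    final String longitude;

    private LocationRow(String[] f) {
        address = f[0];
        country = f[1];
        region = f[2];
        city = f[3];
        latitude = f[4];
        longitude = f[5];
    }

    /** Returns the parsed row, or null if the line has fewer than six columns. */
    static LocationRow parse(String line) {
        // The limit -1 keeps empty columns, which stand for missing information.
        String[] fields = line.split("\t", -1);
        return fields.length < 6 ? null : new LocationRow(fields);
    }

    public static void main(String[] args) {
        LocationRow row = parse("1.2.3.5\tIndia\t07\tNew Delhi\t28.600006\t77.20001\t");
        System.out.println(row.city + " (" + row.country + ") at "
                + row.latitude + ", " + row.longitude);
    }
}
```

A missing value shows up as an empty string in the corresponding field, and an empty line yields null.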

Explanation

This section contains a short tour of the code as the program is executed. We start at the beginning of the main method, as does the virtual machine. The program needs exactly one argument, the name of the database file. If the number of arguments differs from one, an explanatory error message is printed to standard error (which is shown on the console by default) and the program terminates. Otherwise, a BufferedReader (a class for conveniently reading text lines) object is created from standard input. The program could be made to read an input file as well, but this way, our little program can be used in a chain of command line programs. The last preparation step is to open the database file. This is done using the MaxMind API, which is accessed via its LookupService class. The main part of the work is done in the following loop. It reads text lines into String objects until the end of the input stream is reached. Each line is given to the convert method, which determines location information for the address given in the input line. The database file is then closed.

The geolocation lookup is done in the convert method. It searches for the first space character in the line. If there is none, the method returns an empty string. Otherwise, the address is isolated and LookupService is used on it to retrieve location information; if no location is found, an empty string is returned as well. That information is then formatted for output. The helper method add appends a String to the end of a StringBuffer, followed by the separator String, a single tab character. StringBuffer objects are used when a complex String is to be assembled. The program does not use simple concatenation of the type loc.countryName + SEP + loc.region + SEP + ... because some of the information may be missing (attributes of the loc object containing a null value). This would lead to the string "null" being copied to the output, which we want to avoid. Finally, convert creates a String from the StringBuffer and returns it to be printed to standard output by the calling main method.
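The difference between plain concatenation and the null-aware helper can be tried out in isolation. The sketch below copies the add method from the program; the class name NullSafeJoin and the two-column methods naive and safe are made up for this demonstration.

```java
// Sketch: why the program uses a helper instead of plain concatenation.
// NullSafeJoin, naive and safe are made-up names for this demonstration.
class NullSafeJoin {
    private static final String SEP = "\t";

    /** Plain concatenation: a null value ends up as the text "null". */
    static String naive(String country, String region) {
        return country + SEP + region + SEP;
    }

    /** Same helper as in the program: null values are skipped. */
    private static void add(StringBuffer buf, String s) {
        if (s != null) {
            buf.append(s);
        }
        buf.append(SEP);
    }

    /** Null-aware variant built on the helper. */
    static String safe(String country, String region) {
        StringBuffer buf = new StringBuffer();
        add(buf, country);
        add(buf, region);
        return buf.toString();
    }

    public static void main(String[] args) {
        // Pretend the database knows the country but not the region.
        // Tabs are shown as | to make the columns visible.
        System.out.println(naive("India", null).replace(SEP, "|")); // India|null|
        System.out.println(safe("India", null).replace(SEP, "|"));  // India||
    }
}
```

With the helper, a missing value simply becomes an empty column, so programs reading the output never mistake the word null for a region or city name.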

Post-processing of location information

This section isn't strictly Java-related, but it explains why the output format using tab-separated columns was chosen. Most Unix systems come with certain command line programs to process textual data: sort, uniq, wc, cut. These programs are also available for Windows and other platforms. Caveat: Windows comes with a sort command line program of its own; it lacks the -n switch, so if the examples below don't work out, you may be using the wrong version of sort.exe. Make sure you run the Unix-based version, e.g. by specifying the complete path (something like c:\gnu\sort.exe instead of just sort).

These utilities are ideally suited to answering some questions you might have. For convenience, let's store the geographical information in a file loc.tsv (tsv = tab-separated values; you can also use .txt or any other extension):

java net.AccessLogGeo GeoLiteCity.dat < access_log > loc.tsv

Here, the less-than character < feeds the contents of a file to our program, which reads it via its standard input stream. The greater-than character > then sends our program's output, the tab-separated text columns, to the file loc.tsv. Note that any existing file loc.tsv is overwritten; to append to the end of an existing file, use >> instead. The pipe symbol | connects one program's output to the next program's input. On some keyboard layouts (for example German), the pipe symbol can be entered by pressing AltGr and <.

Now let's find out something about the site visitors who were logged in access_log:

Hopefully, this shows the power of those small command line utilities. Obviously, you could have programmed all that functionality of sorting, removing duplicates, counting etc. in Java yourself. In fact, it would make for a nice exercise. But keep in mind that these utilities exist on many systems, and you might want to format your output in a way to make use of them instead of doing everything from scratch in your program yourself.
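As a starting point for that exercise, here is a minimal sketch that counts how many lines there are per country. The class name CountryCount is made up, and the sample lines in main are the two output lines shown earlier plus one made-up line; to process real data you would read the lines of loc.tsv instead.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Sketch: count output lines per country (second column).
// The class name CountryCount is made up for this example.
class CountryCount {

    /** Returns a map from country name to number of lines, sorted by name. */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length < 2 || fields[1].isEmpty()) {
                continue; // empty line or unknown country
            }
            Integer old = counts.get(fields[1]);
            counts.put(fields[1], old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data; the third line is made up to produce a count of 2.
        Map<String, Integer> counts = count(Arrays.asList(
                "1.2.3.4\tUnited States\tPA\tPhiladelphia\t39.996796\t-75.1485\t",
                "1.2.3.5\tIndia\t07\tNew Delhi\t28.600006\t77.20001\t",
                "1.2.3.6\tIndia\t07\tNew Delhi\t28.600006\t77.20001\t"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + SEPARATOR + e.getKey());
        }
    }

    private static final String SEPARATOR = "\t";
}
```

The output is again tab-separated (count, then country name), so it can be handed on to other tools in the same way as the output of the main program.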

Source code

Permission is given to use this code in both free and proprietary programs (it would be nice if you could link to this page in exchange). Note that this applies only to the program below, not to the API code it uses.

package net;

import com.maxmind.geoip.*;
import java.io.*;

/**
 * Read IP addresses from standard input, retrieve geographical information
 * associated with them and print the data to standard output.
 *
 * This program is placed into the Public Domain (does not apply to API code
 * used in this program).
 *
 * Check out http://schmidt.devlib.org/java/geolocation.html for background information.
 * 
 * @author Marco Schmidt
 */
public class AccessLogGeo {
	private static final String SEP = "\t";

	public static final void main(String[] args) throws IOException {
		if (args.length != 1) {
			System.err.println("Usage: AccessLogGeo FILENAME");
			System.err.println("  where FILENAME is the exact path to the database file, e.g.");
			System.err.println("  /home/bob/GeoLiteCity.dat");
			System.err.println("  retrieve new version of gzipped database file at the beginning");
			System.err.println("  of each month from http://www.maxmind.com/download/geoip/database/");
			System.exit(1);
		}
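		// Read log lines from standard input so the program can be used in a pipe.
		BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
		LookupService lookup = new LookupService(args[0]);
		try {
			String line;
			while ((line = in.readLine()) != null) {
				System.out.println(convert(lookup, line));
			}
		} finally {
			// Close the database file even if reading the input fails.
			lookup.close();
		}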
	}

	public static final String convert(LookupService lookup, String line) {
		if (line == null || line.length() < 2) {
			return "";
		}
		// The address is everything before the first space character.
		int index = line.indexOf(' ');
		if (index < 1) {
			return "";
		}
		String addr = line.substring(0, index);
		Location loc = lookup.getLocation(addr);
		if (loc == null) {
			// No location is known for this address.
			return "";
		}
		StringBuffer buf = new StringBuffer();
		add(buf, addr);
		add(buf, loc.countryName);
		add(buf, loc.region);
		add(buf, loc.city);
		add(buf, Float.toString(loc.latitude));
		add(buf, Float.toString(loc.longitude));
		return buf.toString();
	}

	private static void add(StringBuffer buf, String s) {
		if (s != null) {
			buf.append(s);
		}
		buf.append(SEP);
	}
}