Quick & Dirty Palo Alto Log Analysis

OK, so I needed to do some quick and dirty traffic analysis on Palo Alto text logs for a project I was working on. The Palo Alto is great and their console tools are nice. Panorama is not too shabby. But, when I need quick and dirty analysis and want to play with data, I dig into the logs. 
That said, for my quick analysis, I needed to analyze a bunch of text logs and model the traffic flows. To do that, I used simple command line text processing in Unix (Mac OS, but with tweaks also works in Linux, etc.)
I am sharing some of my notes and some of the useful command lines to help others who might be facing a similar need.
First, for my project, I made use of the following field #’s in the text analysis, pulled from the log header for sequence:
  • $8 (source IP) 
  • $9 (dest IP)
  • $26 (dest port)
  • $15 (AppID)
  • $32 (bytes)
Once, I knew the fields that corresponded to values I wanted to study, I started using the core power of command line text processing. And in this case, the power I needed was:
  • cat
  • grep
    • Including, the ever useful grep -v (inverse grep, show me the lines that don’t match my pattern)
  • awk
    • particularly: awk ‘BEGIN { FS = “,”} ; {print $x, $y}’ which prints specific columns in CSV files 
  • sort
    • sort -n (numeric sort)
    • sort -r (reverse sort, descending)
  • uniq
    • uniq -c (count the numbers of duplicates, used for determining “hit rates” or frequency, etc.)
Of course, to learn more about these commands, simply man (command name) and read the details. 😃 
OK, so I will get you started, here are a few of the more useful command lines I used for my quick and dirty analysis:
  • cat log.csv | awk ‘BEGIN { FS = “,”} ; {print $8,$9,$26}’ | sort | uniq -c | sort -n -r > hitrate_by_rate.txt
    • this one produces a list of Source IP/Dest IP/Dest Port unique combinations, sorted in descending order by the number of times they appear in the log
  • cat log.csv | awk ‘BEGIN { FS = “,”} ; {print $8,$9}’ | sort -n | uniq -c | sort -n -r > uniqpairs_by_hitrate.txt
    • this one produces a list of the uniq Source & Destination IP addresses, in descending order by how many times they talk to each other in the log (note that their reversed pairings will be separate, if they are present – that is if A talks to B, there will be an entry for that, but if B initiates conversations with A, that will be a separate line in this data set)
  • cat log.csv | awk ‘BEGIN { FS = “,”} ; {print $15}’ | sort | uniq -c | sort -n -r > appID_by_hitrate.txt
    • this one uses the same exact techniques, but now we are looking at what applications have been identified by the firewall, in descending order by number of times that application identifier appears in the log
Again, these are simple examples, but you can tweak and expand as you need. This trivial approach to command line text analysis certainly helps with logs and traffic data. You can use those same commands to do a wondrous amount of textual analysis and processing. Learn them, live them, love them. 😃 
If you have questions, or want to share some of the ways you use those commands, please drop us a line on Twitter (@microsolved) or hit me up personally for other ideas (@lbhuston). As always, thanks for reading and stay safe out there!