Actionable Logging For Smoother Operation and Faster Recovery Mandi Walls AOL, LLC June 23, 2008 Velocity 2008

Jan 19, 2018


Transcript
Page 1

Actionable Logging
For Smoother Operation and Faster Recovery

Mandi Walls
AOL, LLC
June 23, 2008
Velocity 2008

Page 2

Actionable Logging

• What is “Actionable”
• Goals of logging in production
• Logging quality information
• Improving log contents

Page 3

Actionable

• No-nonsense logging
• Concise, easy to understand
• Express symptoms of production issues
• Anything that makes the log needs to be fixed

Page 4

Why It’s Important

• Expending resources on production systems
• The point of logging in production
• Diagnosis of issues
• The 4am Test

Page 5

Logging Goals

• Diagnosis and recovery
• Statistics and monitoring
• Provide insight into the behavior of the application
• Indicate potential issues and areas for improvement
• Not the same goals as development and QA environments!

Page 6

Types of Logs

• Access log
• Server log, e.g., catalina.out
• Application logs
• Special-use logs for recording specific groups of activities

Page 7

Log File Location

• Where logs are located on the system should be predictable and obvious
• It may be helpful to locate logs on different disk partitions but link them back to the app
• Keep older logs in an obvious place

Page 8

Log File Management

• Everyone has their own method
• Roll logs into files with timestamps:
– host-01.log.003 vs host-01.log.06202008
• Roll all the logs at the same time for a given app to make coordination of events easier
• Roll when the app needs logs rolled: hourly, daily, weekly
• Don’t rely on STDOUT or server files that can’t be rolled without a hassle
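As one possible illustration of rolling into timestamped files, here is a minimal sketch using Python's standard `logging.handlers.TimedRotatingFileHandler`. The function name `make_rolling_logger` and the 14-day retention are assumptions for the example, not something from the talk. The handler's default date suffix (year-month-day) is used instead of the `06202008` style on the slide because it sorts chronologically.

```python
import logging
import logging.handlers


def make_rolling_logger(name, path, keep_days=14):
    """Return a logger whose file is rolled at midnight with a date suffix."""
    handler = logging.handlers.TimedRotatingFileHandler(
        path,
        when="midnight",        # roll once a day; "H" would roll hourly
        backupCount=keep_days,  # hypothetical retention, tune per application
    )
    # The handler's default suffix is "%Y-%m-%d", so a rolled file is named
    # like host-01.log.2008-06-20 rather than host-01.log.003.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Giving every handler in an application the same `when` value makes all of its logs roll at the same moment, which keeps events easy to line up across files.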

Page 9

Logging Quality Information

• Logs should be expressive but not overly verbose
• Keys to making logs more actionable:
– Appropriate formats
– Quality messages

Page 10

Quality Information: Format

• Timestamping: what not to do

1213988938:tvdata shows/617/306

1213988939:tvdata shows/618/307

20/130055 err(4) lang-locale “es-us” not found

SEVERE: Error listenerStart

Page 11

Quality Information: Format

• Timestamps that mean something

Jun 19, 2008 4:20:25 PM org.apache.catalina.startup.Catalina start

192.168.1.10 - - [20/Jun/2008:15:15:58 -0400] "GET /monitors HTTP/1.0" 200 230 0.049909

• Good timestamps give context for linking to external events like network outages or traffic anomalies
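To make the contrast concrete, the sketch below converts one of the raw epoch values from the "what not to do" slide into a readable timestamp, and builds a formatter that writes a full date and time in UTC on every line. It uses only Python's standard library. The helper names `epoch_to_iso` and `make_utc_formatter` are illustrative assumptions.

```python
import logging
import time
from datetime import datetime, timezone


def epoch_to_iso(epoch_seconds):
    """Render a Unix epoch value as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def make_utc_formatter():
    """Build a formatter that stamps each entry with date and time in UTC."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    # Use UTC instead of local time so the trailing "Z" is truthful.
    formatter.converter = time.gmtime
    return formatter


# The epoch value from the slide, made readable:
# epoch_to_iso(1213988938) returns "2008-06-20T19:08:58+00:00"
```

An explicit zone or offset is what lets an entry be matched against external events such as a network outage recorded in another system's clock.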

Page 12

Quality Information: Format

• Other considerations in log file format include:
– Creating a common format for multiple products and log types
– Limiting the number of log entries that write to multiple log lines for faster parsing
– Deciding how much is too much information
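One way to keep entries on a single line is to escape embedded line breaks before they are written. The following is a minimal sketch with Python's standard `logging` module. The class name `SingleLineFormatter` and the `COMMON_FORMAT` string are assumptions for illustration, not a format the talk prescribes.

```python
import logging

# Hypothetical shared format that several applications could adopt.
COMMON_FORMAT = "%(asctime)s %(levelname)s %(name)s %(funcName)s: %(message)s"


class SingleLineFormatter(logging.Formatter):
    """Formatter that keeps every log entry on exactly one line."""

    def format(self, record):
        text = super().format(record)
        # Escape line breaks (including those in appended tracebacks) so a
        # line-oriented parser sees one entry per line.
        return text.replace("\r", "\\r").replace("\n", "\\n")
```

A handler would use it as `handler.setFormatter(SingleLineFormatter(COMMON_FORMAT))`. The trade-off is that tracebacks become harder to read by eye, in exchange for simpler parsing.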

Page 13

Quality Info: Good Messages

• Here are some bad messages:

[19/Jun/2008:11:14:03][14960.229405][-conn:thread::6] Error: $$$$$$$$$$$$$$$

[19/Jun/2008:11:58:32][32652.67698738][channels_news] Notice: My gallery : xl

[19/Jun/2008:12:03:29][32652.67010608][channels_games] Notice: 0

[19/Jun/2008:11:58:28][32652.67715090][channels_money] Error: ViewCounter: APP2 returns statusCode=400, statusText=Invalid request

• Other things to avoid: messages with only numerical error codes in them

Page 14

Quality Info: Good Messages

• Here are some messages that are reasonably helpful:

[19/Jun/2008:12:03:30][32652.66764839][channels_money] Notice: INFO_FEED: moduleId(283403) failed with url=http://rss.businessweek.com/bw_rss/bwdaily

[19/Jun/2008:12:09:52][32652.68059183][channels_news] Error: processModule.inc: us.news.story: can't read "useragent": no such variable

• One that needs a little tweaking:

[19/Jun/2008:00:01:48][15446.36831248][channels_games] Error: dom parse timeout doc: error "syntax error" at line 1 character 0
"t <--Error-- imeout"

Page 15

Quality Info: Making Messages Useful

• Misleading severity

[19/Jun/2008:11:55:40][20300.704556][-conn:thread::24] Error: [fatal]: APP1: no Published data for: app1_config3, dirpage.index

• Incorrect severity, particularly of debugging messages left in at production logging levels
• Not logging anything for fatal errors

Page 16

Improving Log Messages

• Log at the first point an error is encountered: don’t log a timeout to a backend as a parse error of data expected from the request
• Include the method name and key variables in messages to speed up fixes
• Change the log level to suppress anything in the log that isn’t actionable, whether debugging information or chronic issues no one will fix
• Make checking the logs part of the QA process
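The first two points can be sketched as follows: the backend call and the parse step are separate `try` blocks, so a timeout is reported as a timeout rather than surfacing later as a parse error. Each message names the function and the variables needed to reproduce the problem. The function `fetch_scores`, the logger name, and the injectable `opener` parameter are hypothetical names chosen for this example.

```python
import json
import logging
import urllib.request

log = logging.getLogger("channels_games")


def fetch_scores(url, timeout_seconds=2.0, opener=urllib.request.urlopen):
    """Fetch JSON from a backend, logging each failure where it happens."""
    try:
        with opener(url, timeout=timeout_seconds) as response:
            body = response.read()
    except OSError as exc:
        # Timeouts and connection errors are OSError subclasses
        # (socket.timeout, urllib.error.URLError). Report them as transport
        # failures here, before any parsing is attempted.
        log.error("fetch_scores: backend request failed url=%s timeout=%.1fs error=%s",
                  url, timeout_seconds, exc)
        return None
    try:
        return json.loads(body)
    except ValueError as exc:
        # Reached only when a response actually arrived but was malformed.
        log.error("fetch_scores: unparseable response url=%s bytes=%d error=%s",
                  url, len(body), exc)
        return None
```

With this split, the "dom parse timeout" style of message from the earlier slide would instead appear as a backend request failure that includes the URL and the timeout in effect.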

Page 17

Log Message Convergence

• Actively managing, parsing, and pruning logs makes new errors more obvious
• Check the logs after every install for new messages that indicate issues or are junk that slipped through into production
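A post-install check of this kind could be sketched as below: each line is reduced to a rough "signature" by dropping the timestamp and process fields and masking numbers, and any signature absent from the pre-install log is reported. The bracketed prefix layout is assumed from the sample messages in earlier slides. The function names are hypothetical.

```python
import re

# Leading "[timestamp][pid.thread]" fields, as in the sample messages.
_PREFIX = re.compile(r"^\[[^\]]*\]\[[^\]]*\]")
_DIGITS = re.compile(r"\d+")


def signature(line):
    """Reduce a log line to a template that ignores times, ids, and counts."""
    body = _PREFIX.sub("", line.strip(), count=1)
    return _DIGITS.sub("N", body)


def new_signatures(before_lines, after_lines):
    """Return signatures seen after an install that never appeared before it."""
    known = {signature(line) for line in before_lines}
    fresh = []  # keeps first-seen order for readable reports
    for line in after_lines:
        sig = signature(line)
        if sig not in known:
            known.add(sig)
            fresh.append(sig)
    return fresh
```

Masking every digit run is deliberately crude: it can merge messages that differ only by a numeric code, so the output is a prompt for a human to look, not a verdict.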

Page 18

Things to Avoid in Logs

• Usernames, passwords, database logins
– Provide crib notes for anyone gaining unauthorized access to the system
– These are hard to avoid in some environments, particularly if the username is part of the URL
– User name can be separated from display name to avoid revealing too much in logs
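Where credentials might reach the log through URLs or query strings, one possible safeguard is a filter that masks them before the entry is written. This is a minimal sketch with Python's standard `logging.Filter`. The class name `RedactSecrets` and the list of key names in the pattern are assumptions and would need to match the parameters an application really uses.

```python
import logging
import re

# Hypothetical set of sensitive key names; extend to fit the application.
_SECRET = re.compile(r"(?i)\b(password|passwd|pwd|token)=([^\s&]+)")


class RedactSecrets(logging.Filter):
    """Mask values of sensitive key=value pairs in log messages."""

    def filter(self, record):
        # Render the message with its arguments first, then mask the result,
        # so secrets passed as format arguments are covered too.
        message = record.getMessage()
        record.msg = _SECRET.sub(r"\1=[REDACTED]", message)
        record.args = None
        return True  # never drop the entry, only rewrite it
```

Attaching the filter to a handler with `handler.addFilter(RedactSecrets())` applies it to every entry that handler writes. It does not address usernames embedded in URL paths, which the slide notes are harder to avoid.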

Page 19

How much is too much?

• A server or application log that has more than 25% of the number of access log entries is a hindrance. Even 10% may be too much in most environments
• If a single application log has more entries than its corresponding access log, it’s time to have a long talk with development about removing log entries or creating multiple log files
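These rules of thumb can be written down as a small check. The sketch below takes entry counts for an application log and its access log over the same period and applies the 10%, 25%, and 100% thresholds from the slide. The function name and the verdict strings are illustrative assumptions.

```python
def log_volume_verdict(app_entries, access_entries):
    """Classify application log volume relative to the access log."""
    if access_entries <= 0:
        raise ValueError("access_entries must be positive")
    ratio = app_entries / access_entries
    if ratio > 1.0:
        # More application entries than requests served.
        return ratio, "exceeds access log: talk to development"
    if ratio > 0.25:
        return ratio, "hindrance"
    if ratio > 0.10:
        return ratio, "probably too much"
    return ratio, "ok"


# Example: 300 application entries against 1000 requests is 30%,
# which falls in the "hindrance" band.
```

Both counts should cover the same time window, which is one more reason to roll all of an application's logs together.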

Page 20

Conclusion

• The log is the first line of information when a problem occurs
• A production log should be focused on providing information to Operations staff, not to developers
• When, where, and how messages are logged can help or hinder recovery after a problem

Page 21

Questions and Comments