A fix is available
APAR status
Closed as program error.
Error description
Machine crashed at v_kmem_hide+0002B4 while executing fvldd.p.1 tests.
Machine is in KDB and available for debugging:
mustfix template continued in the next note

<Note by johniac, 2010/04/13 08:57:16 seq: 8 rel: 0 action: assign>
m_create_mpool is calling kmem_hide. Not sure if this should go to sysuipc, or the pcikngent code underneath that. Can you take a look?

<Note by grivera, 2010/04/13 09:52:59 seq: 9 rel: 0 action: note>
looking ...

<Note by grivera, 2010/04/13 12:08:03 seq: 10 rel: 0 action: note>
Apparently m_create_mpool() is trying to hide the same area of memory twice. I'm still investigating ...

<Note by grivera, 2010/04/13 15:00:07 seq: 11 rel: 0 action: note>
After looking at the code that is calling kmem_hide:

    m_create_mpool() -> kmem_hide_clustpool() -> kmem_hide_cluster() -> kmem_hide()

It is not clear how this code could be calling kmem_hide() for the same area of memory twice. The code of kmem_hide_cluster() even has its own flag that keeps track of whether an mbuf cluster is already hidden:

    void kmem_hide_cluster(struct mbuf *p_mbuf)
    {
        char *extbuf = p_mbuf->m_ext.ext_buf;
        u_long extsize = p_mbuf->m_ext.ext_size;
        struct kmemusage *kup;
        ...
        kup = btokup(extbuf);
        if ((((extsize >= PSIZE_64K) &&
              (PAGEADDR_64K(extbuf) == (long)extbuf)) ||
             (kup->ku_flags & NMD_PAGE_4K)) &&
            !(p_mbuf->m_extdebug->flags & CLUST_HIDDEN)) {
            kmem_hide(extbuf, extsize);
            p_mbuf->m_extdebug->flags |= CLUST_HIDDEN;
        }
        ...
    }

This code has not been changed since mid-2009. So, I don't see how this can be a regression, at least not in the mbuf code. I'm going to pass this defect to the owner of kmem_hide() for further investigation.

<Note by grivera, 2010/04/13 15:01:27 seq: 12 rel: 0 action: assign>

<Note by pavaman, 2010/04/13 23:13:03 seq: 13 rel: 0 action: note>
According to the ARTLab schedule, 53V SP testing starts tomorrow, so we will need the machine for 53V. Please collect all relevant information about the crash and analyse the defect at the earliest.

==== State: Open by: vaslot on 14 April 2010 08:40:29 ====
Looking...

==== State: Working by: riveraj on 23 April 2010 17:21:48 ====
VMM PFT Entry For Page Frame 0000058EAA of 00001FFFFF
pvt = F20080004058EAA0
pft = F200800032157FC0
Not h/w hashed
s/w hashed sid : 0000000012002 pno : 0000008841 psize : 4K
s/w pp/noex/hkey : 0/0/00 wimg : 2
> in use
> on scb list
> on lru list
> hidden by xmdbg <<<<<<<<<
> CMO active
> referenced (pvt/pte): 0/0
> modified (pft/pvt/pte): 1/0/0
base psx.................. 00 ( 4K)
soft psx.................. 00 ( 4K)
owning frameset........... 00000000
source page number............ 8841
dev index in PDT.............. 0000
disk block number......... 00000000
next page sidlist. 000000000008BA6C
prev page sidlist. 0000000000049E11
next page aux......... 000000000000
prev page aux......... 000000000000
waitlist.......... 0000000000000000
logage.................... 00000001
short-term pincnt............. 0001
long-term pincnt.............. 0000
next page LRU......... 000000049E11
prev page LRU......... 000000049E10
teststkfix gencnt......... 00000000
alias list................ FFFFFFFF
s/w hash list..... FFFFFFFFFFFFFFFF
save pp key pgin................ 00
save hw key pgin................ 00
DR reference count........ 00000000

We set pft_hidden() only in kmem_hide() and clear it in kmem_unhide(). Interestingly, v_kmem_hide() and v_kmem_unhide() do not take the SCB lock. I'll have to check with other VMM experts why this is the case. Normally, when changing the state of any page in a segment, we hold the SCB lock.
At any rate, if this is a higher-level problem in net malloc or related stuff (such as hiding it twice or forgetting to unhide it somewhere), we can probably catch it with some component traces in kmem_hide/unhide.

<Note by cde03 (CDE Administration), 2010/04/23 17:53:07, seq: 56 rel: 0 action: note>
<cde:lastmodifiedby>riveraj@us.ibm.com at 2010/04/23 18:47:53, action: Modify.</cde:lastmodifiedby>

==== State: Working by: riveraj on 23 April 2010 17:47:50 ====
Your view and stream information:
--------------------------------
Your integration stream: AIX710_area@/vobs/AIX_pvob
Your integration view: grivera_AIX710_area_twitch1_view
Your development stream: grivera_SW000301_aix710_dev
Your development view: grivera_SW000301_aix710_dev_twitch1_view
Your views are located in: /gsa/ausgsa/home/g/r/grivera/ccviews

<Note by grivera (Rivera, Jose German), 2010/04/23 18:44:21, seq: 57 rel: 0 action: note>
Final Developer Checklist
-------------------------
Created: Fri Apr 23 18:44:21 CDT 2010
Dev Manager: Priya Paul (ppaul@us.ibm.com)
Existing Tracks: bos61D bos61F bos61H bos61J
Changing Behavior? No
Clean Sandbox Build? Yes
Tested All Releases? Yes
FV Testcase Status: * No FVT needed
Reason No FVT Needed: No new functionality was added
End Final Developer Checklist

<Note by cde03 (CDE Administration), 2010/04/23 20:43:06, seq: 58 rel: 0 action: note>
<cde:lastmodifiedby>zdai@zdai.austin.ibm.com at 2010/04/23 21:39:52, action: Modify.</cde:lastmodifiedby>

==== State: Working by: zdai on 23 April 2010 20:39:51 ====
<review type="normal">
  <phase>1</phase>
  <componentList> <component>sysuipc</component> </componentList>
  <releaseList> <release>aix710</release> </releaseList>
  <reviewer>zdai@us.ibm.com</reviewer>
  <result>pass</result>
  <timestamp>2010-04-23T20:39:40-19:00</timestamp>
  <comments></comments>
</review>

<Note by cde03 (CDE Administration), 2010/04/26 13:08:25, seq: 59 rel: 0 action: note>
<cde:lastmodifiedby>aixtools@us.ibm.com at 2010/04/26 14:04:52, action: AddNote.</cde:lastmodifiedby>

==== State: Working by: aixtools on 26 April 2010 13:03:16 ====
<review type="normal">
  <phase>2</phase>
  <componentList> <component>ALL</component> </componentList>
  <releaseList> <release>aix710</release> </releaseList>
  <reviewer>aixtools@us.ibm.com</reviewer>
  <result>pass</result>
  <timestamp>2010-04-26T13:03:04-05:00</timestamp>
  <comments>auto-P2 approval for UCMUtilityActivity AIXOS01209976: (release aix710)</comments>
</review>

==== State: Working by: aixtools on 26 April 2010 13:03:50 ====
<review type="pkg">
  <phase>2</phase>
  <componentList> <component>ALL</component> </componentList>
  <releaseList> <release>aix710</release> </releaseList>
  <reviewer>aixtools@us.ibm.com</reviewer>
  <result>pass</result>
  <timestamp>2010-04-26T13:03:42-05:00</timestamp>
  <comments>auto-P2 approval for UCMUtilityActivity AIXOS01209976: (release aix710)</comments>
</review>

==== State: Working by: aixtools on 26 April 2010 13:04:50 ====
P2 review was auto-approved for UCMUtilityActivity AIXOS01209976: (release aix710)
TestVerify record for release aix710 (type RISC6000) updated with UCM Project and stream.

<Note by jagoodwi (Goodwin, James A.), 2010/04/27 09:38:48, seq: 60 rel: 0 action: note>
APAR_ABSTRACT=Machine crashed at v_kmem_hide+0002B4

<Note by aix (aix CMVC Family id), 2010/04/27 10:04:36, seq: 61 rel: 0 action: note>
Level 745912.zzz in release bos61L has been assigned to selfix61 by user jagoodwi.
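
The notes above observe that kmem_hide_cluster() guards kmem_hide() with a per-cluster CLUST_HIDDEN flag, yet a page still appears to be hidden twice. The record does not state the exact failing path. The following is only a user-space sketch, under the assumption that hidden state is tracked per page while the flag is tracked per cluster, of how two clusters that share one page could each pass their own flag check. Every name prefixed with sketch_ or SKETCH_ is hypothetical and is not an AIX kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4K page, as in the PFT dump */
#define SKETCH_NPAGES    8u     /* size of the simulated arena, in pages */

/* Hypothetical stand-in for the per-page "hidden" state. */
static bool sketch_page_hidden[SKETCH_NPAGES];

/* Hypothetical stand-in for a cluster carrying its own hidden flag. */
struct sketch_cluster {
    uintptr_t addr;   /* byte offset into the simulated arena */
    size_t    size;   /* cluster size in bytes */
    bool      hidden; /* plays the role of CLUST_HIDDEN */
};

/* Hide every page touched by [addr, addr + size).
 * Returns 0 on success, -1 if any of those pages is already hidden. */
static int sketch_hide(uintptr_t addr, size_t size)
{
    assert(size > 0);
    size_t first = addr / SKETCH_PAGE_SIZE;
    size_t last  = (addr + size - 1) / SKETCH_PAGE_SIZE;
    assert(last < SKETCH_NPAGES);

    for (size_t p = first; p <= last; p++)
        if (sketch_page_hidden[p])
            return -1;              /* double hide detected */
    for (size_t p = first; p <= last; p++)
        sketch_page_hidden[p] = true;
    return 0;
}

/* Per-cluster guard: skips clusters already flagged as hidden. */
static int sketch_hide_cluster(struct sketch_cluster *c)
{
    if (c->hidden)
        return 0;                   /* flag short-circuits a repeat call */
    if (sketch_hide(c->addr, c->size) != 0)
        return -1;                  /* page was hidden via another cluster */
    c->hidden = true;
    return 0;
}
```

In this model, hiding a 2K cluster at offset 0 succeeds, and a second call on the same cluster is absorbed by its flag. Hiding a different 2K cluster at offset 2048 fails, because it lies in the same 4K page and carries its own, still-clear flag.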
Local fix
Problem summary
<Note by jagoodwi (Goodwin, James A.), 2010/04/27 09:38:48, seq: 60 rel: 0 action: note>
APAR_ABSTRACT=Machine crashed at v_kmem_hide+0002B4

<Note by aix (aix CMVC Family id), 2010/04/27 10:04:36, seq: 61 rel: 0 action: note>
Level 745912.zzz in release bos61L has been assigned to selfix61 by user jagoodwi.

<Note by selfix61 (Enno, Robert S. (Bob)), 2010/04/27 10:57:49, seq: 62 rel: 0 action: note>
ada generated note : (-c = '') PC -addapar 748941 -apar IZ75175 -product aix -version 61L (rc = 0)
APAR_aix_61L = IZ75175 .
Problem conclusion
Ensure that kmem_hide() is only called if the address of the mbuf cluster is page-aligned.
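
The shipped change is not shown in this record. As a rough illustration of the conclusion above, the sketch below adds an explicit page-alignment test before a cluster is considered eligible for hiding. It is a user-space model only: the sketch_ names, the backed_by_4k parameter (standing in for the NMD_PAGE_4K flag test) and the fixed 4K and 64K sizes are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_4K  ((uintptr_t)4096)
#define SKETCH_PAGE_64K ((uintptr_t)65536)

/* True if addr is a multiple of page_size (page_size must be a power of two). */
static bool sketch_is_aligned(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}

/* Hypothetical eligibility check: a cluster may be hidden only if it is not
 * already hidden and its start address is aligned to the page size that
 * backs it. */
static bool sketch_may_hide(uintptr_t addr, size_t size,
                            bool backed_by_4k, bool already_hidden)
{
    if (already_hidden)
        return false;
    /* Large cluster that starts on a 64K page boundary. */
    if (size >= SKETCH_PAGE_64K && sketch_is_aligned(addr, SKETCH_PAGE_64K))
        return true;
    /* 4K-backed cluster: additionally require 4K alignment (assumed fix). */
    if (backed_by_4k && sketch_is_aligned(addr, SKETCH_PAGE_4K))
        return true;
    return false;
}
```

With this check, a 2K cluster at address 0x10800 on a 4K-backed page is rejected, while a 4K cluster at 0x11000 is accepted. A caller would hide the buffer and set its per-cluster flag only when sketch_may_hide() returns true.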
Temporary fix
Comments
APAR Information
APAR number
IZ78643
Reported component name
AIX 610 STD EDI
Reported component ID
5765G6200
Reported release
610
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Submitted date
2010-07-02
Closed date
2010-07-02
Last modified date
2013-03-28
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
AIX 610 STD EDI
Fixed component ID
5765G6200
Applicable component levels
R610 PSY U834695
UP10/08/20 I 1000
PTF to Fileset Mapping
U834695 bos.mp64 6.1.5.3
Document Information
Modified date:
28 March 2013