IBM Support

IZ78643: MACHINE CRASHED AT V_KMEM_HIDE+0002B4 APPLIES TO AIX 6100-05

A fix is available

APAR status

  • Closed as program error.

Error description

  • =============
    
    Machine crashed at v_kmem_hide+0002B4 while executing fvldd.p.1
    tests.  Machine is in KDB and available for debugging:
    
    (mustfix template continued in the next note)

    <Note by johniac, 2010/04/13 08:56:34 seq: 8 rel: 0 action:
    assign>
    m_create_mpool is calling kmem_hide. Not sure if this should go
    to sysuipc, or the pcikngent code underneath that. Can you take
    a look?
    
    <Note by johniac, 2010/04/13 08:57:16 seq: 8 rel: 0 action:
    assign> m_create_mpool is calling kmem_hide. Not sure if this
    should go to sysuipc, or the pcikngent code underneath that.
    Can you take a look?
    
    <Note by grivera, 2010/04/13 09:52:59 seq: 9 rel: 0 action:
    note> looking ...
    
    <Note by grivera, 2010/04/13 12:08:03 seq: 10 rel: 0 action:
    note> Apparently m_create_mpool() is trying to hide the same
    area of memory twice.  I'm still investigating ...
    
    <Note by grivera, 2010/04/13 15:00:07 seq: 11 rel: 0 action:
    note> After looking at the code that is calling kmem_hide:
    
        m_create_mpool() -> kmem_hide_clustpool() ->
        kmem_hide_cluster() -> kmem_hide()
    
    It is not clear how this code could be calling kmem_hide() for
    the same area of memory twice. The code of kmem_hide_cluster()
    even has its own flag that keeps track of whether an mbuf
    cluster is already hidden:
    
    void kmem_hide_cluster(struct mbuf *p_mbuf)
    {
        char *extbuf = p_mbuf->m_ext.ext_buf;
        u_long extsize = p_mbuf->m_ext.ext_size;
        struct kmemusage *kup;
        ...
        kup = btokup(extbuf);
    
        if ((((extsize >= PSIZE_64K) &&
              (PAGEADDR_64K(extbuf) == (long)extbuf)) ||
             (kup->ku_flags & NMD_PAGE_4K)) &&
            !(p_mbuf->m_extdebug->flags & CLUST_HIDDEN)) {
            kmem_hide(extbuf, extsize);
            p_mbuf->m_extdebug->flags |= CLUST_HIDDEN;
        }
        ...
    }
    
    This code has not been changed since mid-2009, so I don't see
    how this can be a regression, at least not in the mbuf code.
    
    I'm going to pass this defect to the owner of kmem_hide() for
    further investigation.
    
    <Note by grivera, 2010/04/13 15:01:27 seq: 12 rel: 0 action:
    assign>
    
    <Note by pavaman, 2010/04/13 23:13:03 seq: 13 rel: 0 action:
    note> According to the ARTLab schedule, we have 53V SP testing
    starting tomorrow, so we will need the machine for 53V.
    Please collect all relevant information about the crash and
    analyse the defect as early as possible.
    
    ==== State: Open by: vaslot on 14 April 2010 08:40:29 ====
    
    Looking...
    
    ==== State: Working by: riveraj on 23 April 2010 17:21:48 ====
    
    VMM PFT Entry For Page Frame 0000058EAA of 00001FFFFF
    
    pvt = F20080004058EAA0  pft = F200800032157FC0
    Not h/w hashed
    s/w hashed  sid : 0000000012002  pno : 0000008841  psize : 4K
    s/w pp/noex/hkey : 0/0/00  wimg : 2
    
    > in use
    > on scb list
    > on lru list
    > hidden by xmdbg   <<<<<<<<<
    > CMO active
    > referenced (pvt/pte): 0/0
    > modified (pft/pvt/pte): 1/0/0
    base psx.................. 00 ( 4K)
    soft psx.................. 00 ( 4K)
    owning frameset........... 00000000
    source page number............ 8841
    dev index in PDT.............. 0000
    disk block number......... 00000000
    next page sidlist. 000000000008BA6C
    prev page sidlist. 0000000000049E11
    next page aux......... 000000000000
    prev page aux......... 000000000000
    waitlist.......... 0000000000000000
    logage.................... 00000001
    short-term pincnt............. 0001
    long-term pincnt.............. 0000
    next page LRU......... 000000049E11
    prev page LRU......... 000000049E10
    teststkfix gencnt......... 00000000
    alias list................ FFFFFFFF
    s/w hash list..... FFFFFFFFFFFFFFFF
    save pp key pgin................ 00
    save hw key pgin................ 00
    DR reference count........ 00000000
    
    We set pft_hidden() only in kmem_hide() and clear it in
    kmem_unhide(). Interestingly, v_kmem_hide() and v_kmem_unhide()
    do not take the SCB lock. I'll have to check with other VMM
    experts why this is the case. Normally, when changing the state
    of any page in a segment, we hold the SCB lock.
    
    At any rate, if this is a higher level problem in net malloc or
    related stuff (such as hiding it twice or forgetting to unhide
    it somewhere), we can probably catch it with some component
    traces in kmem_hide/unhide.
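
The kind of check such a trace could perform can be sketched in user-space C. This is only an illustration under stated assumptions, not the AIX component trace facility or the real kmem_hide()/kmem_unhide() code: every name below (hide_trace, trace_hide, trace_unhide, TRACE_PAGE_SHIFT, TRACE_NPAGES) is hypothetical, pages are assumed to be 4K, and the tracked region is a small toy address range. The sketch records the hidden state per page and reports a second hide of an already hidden page, or an unhide of a page that was never hidden.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACE_PAGE_SHIFT 12u    /* assumption: 4K pages */
#define TRACE_NPAGES     1024u  /* assumption: toy region of 1024 pages */

enum trace_result {
    TRACE_OK,
    TRACE_DOUBLE_HIDE,        /* page was already hidden */
    TRACE_UNHIDE_NOT_HIDDEN   /* page was not hidden */
};

/* Hypothetical per-page bookkeeping kept next to the hide/unhide calls. */
struct hide_trace {
    uint8_t hidden[TRACE_NPAGES];  /* 1 while the page is hidden */
    unsigned long double_hides;    /* number of hide-twice events seen */
    unsigned long bad_unhides;     /* number of unhide-without-hide events */
};

static void trace_init(struct hide_trace *t)
{
    memset(t, 0, sizeof *t);
}

/* Map an address to the index of the page that contains it. */
static size_t trace_page_index(uintptr_t addr)
{
    size_t idx = (size_t)(addr >> TRACE_PAGE_SHIFT);
    assert(idx < TRACE_NPAGES);
    return idx;
}

/* Record a hide request; flag it if the page is already hidden. */
static enum trace_result trace_hide(struct hide_trace *t, uintptr_t addr)
{
    size_t idx = trace_page_index(addr);
    if (t->hidden[idx]) {
        t->double_hides++;
        return TRACE_DOUBLE_HIDE;
    }
    t->hidden[idx] = 1;
    return TRACE_OK;
}

/* Record an unhide request; flag it if the page is not hidden. */
static enum trace_result trace_unhide(struct hide_trace *t, uintptr_t addr)
{
    size_t idx = trace_page_index(addr);
    if (!t->hidden[idx]) {
        t->bad_unhides++;
        return TRACE_UNHIDE_NOT_HIDDEN;
    }
    t->hidden[idx] = 0;
    return TRACE_OK;
}
```

Because the state is kept per page rather than per buffer, two different buffer addresses that fall in the same page are reported as a double hide, which is the situation a per-buffer flag alone would not reveal.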
    
        <Note by cde03 (CDE Administration), 2010/04/23 17:53:07,
        seq: 56 rel: 0  action: note>
    <cde:lastmodifiedby>riveraj@us.ibm.com at 2010/04/23 18:47:53,
    action: Modify.</cde:lastmodifiedby>
    
    ==== State: Working by: riveraj on 23 April 2010 17:47:50 ====
    
    Your view and stream information:
    --------------------------------
    Your integration stream:    AIX710_area@/vobs/AIX_pvob
    Your integration view:      grivera_AIX710_area_twitch1_view
    Your development stream:    grivera_SW000301_aix710_dev
    Your development view:      grivera_SW000301_aix710_dev_twitch1_view
    Your views are located in:  /gsa/ausgsa/home/g/r/grivera/ccviews
    
        <Note by grivera (Rivera, Jose German), 2010/04/23
        18:44:21, seq: 57 rel: 0  action: note>
    
    Final Developer Checklist
    -------------------------
    Created:                Fri Apr 23 18:44:21 CDT 2010
    Dev Manager:            Priya Paul  (ppaul@us.ibm.com)
    
    Existing Tracks:        bos61D bos61F bos61H bos61J
    
    Changing Behavior?      No
    Clean Sandbox Build?    Yes
    Tested All Releases?    Yes
    
    FV Testcase Status:     * No FVT needed
    Reason No FVT Needed:   No new functionality was added
    
    End Final Developer Checklist
    
        <Note by cde03 (CDE Administration), 2010/04/23 20:43:06,
        seq: 58 rel: 0  action: note>
    <cde:lastmodifiedby>zdai@zdai.austin.ibm.com at 2010/04/23
    21:39:52, action: Modify.</cde:lastmodifiedby>
    
    ==== State: Working by: zdai on 23 April 2010 20:39:51 ====
    
    <review type="normal">
      <phase>1</phase>
      <componentList> <component>sysuipc</component> </componentList>
      <releaseList> <release>aix710</release> </releaseList>
      <reviewer>zdai@us.ibm.com</reviewer>
      <result>pass</result>
      <timestamp>2010-04-23T20:39:40-19:00</timestamp>
      <comments></comments>
    </review>
    
        <Note by cde03 (CDE Administration), 2010/04/26 13:08:25,
        seq: 59 rel: 0  action: note>
    <cde:lastmodifiedby>aixtools@us.ibm.com at 2010/04/26 14:04:52,
    action: AddNote.</cde:lastmodifiedby>
    
    ==== State: Working by: aixtools on 26 April 2010 13:03:16 ====
    
    <review type="normal">
      <phase>2</phase>
      <componentList> <component>ALL</component> </componentList>
      <releaseList> <release>aix710</release> </releaseList>
      <reviewer>aixtools@us.ibm.com</reviewer>
      <result>pass</result>
      <timestamp>2010-04-26T13:03:04-05:00</timestamp>
      <comments>auto-P2 approval for UCMUtilityActivity
      AIXOS01209976: (release aix710)</comments>
    </review>
    
    ==== State: Working by: aixtools on 26 April 2010 13:03:50 ====
    
    <review type="pkg">
      <phase>2</phase>
      <componentList> <component>ALL</component> </componentList>
      <releaseList> <release>aix710</release> </releaseList>
      <reviewer>aixtools@us.ibm.com</reviewer>
      <result>pass</result>
      <timestamp>2010-04-26T13:03:42-05:00</timestamp>
      <comments>auto-P2 approval for UCMUtilityActivity
      AIXOS01209976: (release aix710)</comments>
    </review>
    
    ==== State: Working by: aixtools on 26 April 2010 13:04:50 ====
    
    P2 review was auto-approved for UCMUtilityActivity
    AIXOS01209976: (release aix710)
    
    TestVerify record for release aix710 (type RISC6000) updated
    with UCM Project and stream.
    
        <Note by jagoodwi (Goodwin, James A.), 2010/04/27 09:38:48,
        seq: 60 rel: 0  action: note>
    APAR_ABSTRACT=Machine crashed at v_kmem_hide+0002B4
    
        <Note by aix (aix CMVC Family id), 2010/04/27 10:04:36,
        seq: 61 rel: 0  action: note>
    Level 745912.zzz in release bos61L has been assigned to
    selfix61 by user jagoodwi.
    

Local fix

Problem summary

  • =============
    
    Machine crashed at v_kmem_hide+0002B4 while executing fvldd.p.1
    tests.
    
        <Note by selfix61 (Enno, Robert S. (Bob)), 2010/04/27
        10:57:49, seq: 62 rel: 0  action: note>
    ada generated note : (-c = '')
      PC   -addapar 748941 -apar IZ75175  -product aix -version 61L
      (rc = 0)
    
    
    APAR_aix_61L = IZ75175
    .
    

Problem conclusion

  • Ensure that kmem_hide() is only called if the address of the
    mbuf cluster is page-aligned.
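
The shape of such a guard can be sketched in C. The APAR does not show the actual change, so this is a minimal illustration under stated assumptions rather than the shipped fix: SKETCH_PSIZE_4K, SKETCH_PSIZE_64K, is_page_aligned() and may_hide_cluster() are hypothetical names, not AIX kernel identifiers, and the additional size test in may_hide_cluster() is an assumption that mirrors the 64K branch of the kmem_hide_cluster() condition quoted in the error description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PSIZE_4K  0x1000UL   /* assumption: 4K page size */
#define SKETCH_PSIZE_64K 0x10000UL  /* assumption: 64K page size */

/* True if addr lies on a page_size boundary (page_size: power of two). */
static bool is_page_aligned(uintptr_t addr, unsigned long page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}

/*
 * Hypothetical guard: a cluster buffer is a candidate for hiding only
 * if it starts on a page boundary and covers at least one whole page.
 */
static bool may_hide_cluster(uintptr_t buf, unsigned long size,
                             unsigned long page_size)
{
    return size >= page_size && is_page_aligned(buf, page_size);
}
```

With a guard of this kind, a buffer that starts in the middle of a page is skipped, so a page would not be hidden on behalf of a buffer that covers only part of it.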
    

Temporary fix

Comments

APAR Information

  • APAR number

    IZ78643

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2010-07-02

  • Closed date

    2010-07-02

  • Last modified date

    2013-03-28

  • APAR is sysrouted FROM one or more of the following:

    IZ75175

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U834695

       UP10/08/20 I 1000

PTF to Fileset Mapping

Document Information

Modified date:
28 March 2013