Mage_Weee and why it is important for tax calculation

TL;DR

Install Topological Sort to fix the sorting algorithm.

Problem

Mage_Weee is a module which is needed in Magento to calculate the so called Waste Electrical and Electronic Equipment Directive. It is a tax in Europe for electronic stuff, so that it is already paid, when you throws it away. Don't ask me. We don't sell any kind of electric stuff, so it is a good decision to remove the module.

<!-- app/etc/modules/zzzzzz_DeactivatedModules.xml-->  
<?xml version="1.0"?>  
<config>  
    <modules>
        <Mage_Weee>
            <active>false</active>
        </Mage_Weee>
    </modules>
</config>  

Missing Mage_Weee adds tax on product instead of including it

What happens then was very irritating for me.

Before:

After:

You might sense the miscalculation of the tax. Our tax settings are still correct and everything is fine, but the tax is added to the price, instead of being included.

Easiest fix here is to just NOT deactivate Mage_Weee, but this is unsatisfying. So I dig deeper (and have already an idea, what happens):

Magento total models

Magento is calculating all the stuff in quote and order with total models. You can check what total models are called in

\Mage_Sales_Model_Quote_Address::collectTotals
foreach ($this->getTotalCollector()->getCollectors() as $model) {  
            echo get_class($model) . "\n";
            $model->collect($this);
        }

Billing address with Mage_Weee

  • Mage_Sales_Model_Quote_Address_Total_Nominal
  • Mage_Sales_Model_Quote_Address_Total_Subtotal
  • Mage_Sales_Model_Quote_Address_Total_Msrp
  • Mage_SalesRule_Model_Quote_Freeshipping
  • Mage_Tax_Model_Sales_Total_Quote_Subtotal
  • Mage_Weee_Model_Total_Quote_Weee
  • Mage_Sales_Model_Quote_Address_Total_Shipping
  • Mage_Tax_Model_Sales_Total_Quote_Shipping
  • Mage_SalesRule_Model_Quote_Discount
  • Mage_Tax_Model_Sales_Total_Quote_Tax
  • Mage_Sales_Model_Quote_Address_Total_Grand

Billing address without Mage_Weee

  • Mage_SalesRule_Model_Quote_Freeshipping
  • Mage_Tax_Model_Sales_Total_Quote_Subtotal
  • Mage_Tax_Model_Sales_Total_Quote_Shipping
  • Mage_SalesRule_Model_Quote_Discount
  • Mage_Tax_Model_Sales_Total_Quote_Tax
  • Mage_Sales_Model_Quote_Address_Total_Msrp
  • Mage_Sales_Model_Quote_Address_Total_Nominal
  • Mage_Sales_Model_Quote_Address_Total_Subtotal
  • Mage_Sales_Model_Quote_Address_Total_Shipping
  • Mage_Sales_Model_Quote_Address_Total_Grand

As you can see, the order is totally different. I'm cleaning up the list a little bit:

WITH WeeeWithOUT Weee
Mage_Sales_Model_Quote_Address_Total_Subtotal Mage_Tax_Model_Sales_Total_Quote_Subtotal
Mage_Tax_Model_Sales_Total_Quote_Subtotal Mage_Tax_Model_Sales_Total_Quote_Shipping
Mage_Sales_Model_Quote_Address_Total_Shipping Mage_Tax_Model_Sales_Total_Quote_Tax
Mage_Tax_Model_Sales_Total_Quote_Shipping Mage_Sales_Model_Quote_Address_Total_Subtotal
Mage_Tax_Model_Sales_Total_Quote_Tax Mage_Sales_Model_Quote_Address_Total_Shipping
Mage_Sales_Model_Quote_Address_Total_Grand Mage_Sales_Model_Quote_Address_Total_Grand

I think (I didn't check it yet), that the Mage_Tax totals expect, that the Mage_Sales already ran and collected all the sums, etc. So when you are changing the order from first Mage_Sales, then Mage_Tax, you get wrong results.

Ordering multiple dependend nodes in a graph is a hard problem

The problem lays in the ordering algorithm. All the totals have a before and after in their definition.

<!-- app/code/core/Mage/Tax/etc/config.xml:160 -->
    <sales>
        <quote>
            <totals>
                <tax_subtotal>
                    <class>tax/sales_total_quote_subtotal</class>
                    <after>freeshipping</after>
                    <before>tax,discount</before>
                </tax_subtotal>
                <tax_shipping>
                    <class>tax/sales_total_quote_shipping</class>
                    <after>shipping,tax_subtotal</after>
                    <before>tax,discount</before>
                </tax_shipping>
               [...]

All the ordering is done here:
\Mage_Sales_Model_Config_Ordered and this is buggy.

How to solve it the great way

The hard solution for this is:

  1. Implement a real graph algorithm which solves all the dependencies and orders the total models correct
  2. Play with all the before and after nodes until the order is correct.

I neither implemented a cool algorithm yet (or looked it up) to solve the problem, nor did I play with the values until everything is fine. I'm staying with the solution: Not deactivating Mage_Weee for the moment.

Update: Thanks to @Daniel_Sloof, more informations from @VinaiKopp

Update 2: I found even more on this topic on Stackoverflow from @s3lf

Domains, Domains, Domains

I dumped a shop from my customer and imported it to my local system, as you can guess, the local domain is another one. All my domains are like: <customer>.dev.

Where to change domains

Secure/Unsecure Base URL

From core_config_data:

default 0   web/unsecure/base_url   http://customer.dev/
default 0   web/secure/base_url     http://customer.dev/

Check the skin/media/js url although!

default 0   web/unsecure/base_link_url  {{unsecure_base_url}}
default 0   web/unsecure/base_skin_url  {{unsecure_base_url}}skin/
default 0   web/unsecure/base_media_url {{unsecure_base_url}}media/
default 0   web/unsecure/base_js_url    {{unsecure_base_url}}js/
default 0   web/secure/base_link_url    {{secure_base_url}}
default 0   web/secure/base_skin_url    {{secure_base_url}}skin/
default 0   web/secure/base_media_url   {{secure_base_url}}media/
default 0   web/secure/base_js_url      {{secure_base_url}}js/

Cookies

For cookies you should add the domain and the path, to secure your cookie. In your development environment, you can just use these settings:

default 0   web/cookie/cookie_path      NULL
default 0   web/cookie/cookie_domain    NULL

SSL Everywhere or HSTS

We have to secure all the data of our users, not only registration, checkout and login. We need to secure the session data too.

SSL Everywhere or HTTP Strict Transport Security

(I hope) Everyone knows, that it is important to secure (read as encrypt) our customer’s data. Because of all the evil hackers in the world and the bad ISPs which intercept all our data.

TL;DR

Use HTTP Strict Transport Security!

The first problem: unencrypted personal data

But the real problems are the all-day security problems:
* I’m sitting at Starbucks, surfing over their unencrypted wifi, enjoy my coffee and work. Hopefully, all connections are encrypted: email, jabber, VPN to the office, what’s app... But often this is not the case. What happens, if a connection is not encrypted? Everyone in the wifi can listen.

You can encrypt the wifi, but for example WEP doesn't solve the problem, because it doesn't have user isolation. But WPA helps.

If the connection is not encrypted, one can read your emails, your whats app messages or your email login.

First solution: TLS (formerly known as SSL)

Because of the painted scenario all our login and registration pages are SSL secured. We encrypt every transmission, where important data are sent. You can use TLS for nearly everything:

  • SMTP -> SMTPS
  • IMAP -> IMAPS
  • POP3 -> POP3S
  • HTTP -> HTTPS
  • and so on

Second problem: unencrypted session data

Do you only encrypt the login and registration page? Oh, the checkout too. Great! But what is with all the other pages, like »Home«, »Privacy Policy«, »About Us« and so on?
Do you think there are no important data transmitted? You are wrong. With every request there is the session ID sent in a cookie. This means, you convey (a maybe authorized session id) unencrypted personal data in your HTTP header.

The attack vector might be (in Magento):
* getting personal data: address, order history, wishlist, payment data * order with the cusomer’s account to a different address, billing the account owner * spending the bonus points or the customer’s credit * download already paid virtual content which is found in the account

Second solution: SSL Everywhere

Use SSL everywhere. On every request. If the user comes to your page redirect him directly on HTTPS. Use an official signed certificate, so the user knows, he can trust you.

Third problem: SSL Stripping and ARP Spoofing

There is still at least one problem left. It is called "HTTPS stripping attacks". Moxie Marlinspike implemented a tool called sslstrip and recorded a nice video to demonstrate it.

A hacker can pipe all your traffic (under the correct circumstances) through his own machine and transform all the secured (https) links to insecure links (http).

ARP Spoofing

ARP is a protocol to find the shortest path to another address inside the network, for example between your computer and the router in the network. Alt text visualization made by 0x55534C, thanks for that.

ARP Spoofing means, you flood the network with ARP packets and define the way through your computer as the fastest way to the router. This way, all the traffic is piped through your machine.

Now you have the full control over all the traffic. You can read it, you can change it or you can drop some or all packets.

Encryption helps

If your packets are encrypted, you don't have this problem. Because the man-in-the-middle (MITM) can't do anything without your recognition. SSL checks itself for integrity.

The precise problem: Redirection at the beginning

But you remember? The very first connection to your shop is to HTTP://www.my-shop.example. This means, the connection is unencrypted and an attacker can do his job.

The attacker (let's call him Mallory) ARP spoofs the victims (Alice) laptop, reroutes Alice's traffic through his machine and removes the HTTP Location header. Then Mallory loads the https version of the site Alice wants, changes every https:// to http:// and pipes it to Alice's computer. Now Mallory can do her evil work.

It is important to understand, that most users don't realize or check, wether they are on a https:// site. Or wether their address bar is green or blue. Don't rely on the user.

Third/Final solution: Use HTTP Strict Transport Security (HSTS)

HSTS is a server side HTTP Header, specified in RFC 6796.

It does two things:
1. It ensures, that the connection is secure. If it is not possible to connect to the server on a secure connection, then an error is shown. This is applied in Chrome, Firefox and Opera so, that a connection via http is no longer possible.
2. It transforms every link on the page from http to https.

How HSTS works

HSTS is a HTTP Header:

Strict-Transport-Security "max-age=31536000"
Strict-Transport-Security "max-age=31536000; includeSubDomains"

This means: Don't allow insecure connections for the next 365 days to my domain, with or without subdomains.

There is still one problem: The very first request may still be http. This consideration is correct. But the idea is: Hopefully the user are at home or in any secure network, when the user makes this request. If not, he is doomed. ;-)

However, if this first request is made, he is secure for the next 365 (or whatever the timespan is) days (on this device!).

As you can see: This final solution I showed to you doesn't guarantee complete security but they minimize the risk for a security breach.

Advertisement: Magento Module for HSTS

I implemented a module, which does all this for magento: Ikonoshirt_StrictTransportSecurity

More Information

Add Amin User

Magento Cheat Sheet, how to add a magento admin user, directly thorugh SQL

Thanks to Atwix, i found the SQL which is needed to create a magento admin user.

Cheat sheet for me:

SET @fistname = 'Fabian';
SET @lastname = 'Blechschmidt';
SET @email = 'mailaddress@example.com';
SET @username = 'fabian';
SET @password = 'password';
SET @salt = 'Fl';

INSERT INTO admin_user
SELECT
NULL user_id,
@fistname firstname,
@lastname lastname,
@email email,
@username username,
CONCAT(MD5(CONCAT(@salt, @password)), ':', @salt) password,
NOW( ) created,
NULL modified,
NULL logdate,
0 lognum,
0 reload_acl_flag,
1 is_active,
(SELECT MAX(extra) FROM admin_user WHERE extra IS NOT NULL) extra,
NULL rp_token,
NOW() rp_token_created_at;

INSERT INTO admin_role
SELECT
NULL role_id,
(SELECT role_id FROM admin_role WHERE role_name = 'Administrators') parent_id,
2 tree_level,
0 sort_order,
'U' role_type,
(SELECT user_id FROM admin_user WHERE username = @username) user_id,
@username role_name

You have imported the sample data (which are based on EE)? Use this:

SET @fistname = 'Fabian';  
SET @lastname = 'Blechschmidt';  
SET @email = 'mailaddress@example.com';  
SET @username = 'fabian';  
SET @password = 'password';  
SET @salt = 'Fl';


INSERT INTO admin_user  
SELECT  
NULL user_id,  
@fistname firstname,
@lastname lastname,
@email email,
@username username,
CONCAT(MD5(CONCAT(@salt, @password)), ':', @salt) password,  
NOW( ) created,  
NULL modified,  
NULL logdate,  
0 lognum,  
0 reload_acl_flag,  
1 is_active,  
(SELECT MAX(extra) FROM admin_user WHERE extra IS NOT NULL) extra,
NULL rp_token,  
NOW() rp_token_created_at,  
NULL failures_num,  
NULL first_failure,  
NULL lock_expires;

INSERT INTO admin_role  
SELECT  
NULL role_id,  
(SELECT role_id FROM admin_role WHERE role_name = 'Administrators') parent_id,
2 tree_level,  
0 sort_order,  
'U' role_type,  
(SELECT user_id FROM admin_user WHERE username = @username) user_id,
@username role_name,
1 gws_is_all,  
1 gws_websites,  
1 gws_store_groups;  

NOT LOGGED IN with ID 4

Standard customer group with "wrong" id and the consequences

Yesterday I debugged an interessting behaviour:

All categories were shown in the shop, no products. If you request the product directly /catalog/product/view/id/12345 you found the product and everything was shown.

The standard errors were tested:

  • reindex
  • clean cache
  • activate logging
  • use the default template

Nothing helped. It took me about an hour to find the following bug. And to be honest, it was real luck. If you ask me, with this error you can search for days!

How to find the problem?

I checked the database for the indexes, everything looked great, magento said, all indexes are ready and useable.

I had a look into the backend and everything was fine. Products were actived, had a qty > 0 and were in stock. I tried to add a product to the website but it didn't help.

Then I dig into the ProductList of the Category view. I die($_productCollection->getSelect()) the query and played with it. after removing the priceindex I got results!

Yea! Results!

The price_index filtered for customer_group = 0. I know this is the id for the NOT LOGGED IN, so I had a look into the customer_group table. Don't ask me why but THIS IS WRONG:

customer_group_id  |  customer_group_code  |  tax_class_id
-------------------|-----------------------|--------------
                4  |        NOT LOGGED IN  |             3

I had a look into the price_index and found prices for the customer groups with the IDs 1,2,3 and 4. No prices for 0.

Ok, I built a long query to change this, I tried it in a big transaction but it didn't work - don't ask me why, it threw after the first query a

Cannot add or update a child row: a foreign key constraint fails

if you can tell me why, please send me an email!

So I skipped the foreign key check:

START TRANSACTION;
UPDATE `salesrule_customer_group` SET `customer_group_id` = '0' WHERE `salesrule_customer_group`.`customer_group_id` =4;
UPDATE `salesrule_product_attribute` SET `customer_group_id` = '0' WHERE `salesrule_product_attribute`.`customer_group_id` =4;
UPDATE `customer_group` SET `customer_group_id` = '0' WHERE `customer_group`.`customer_group_id` =4;
UPDATE `catalog_product_bundle_price_index` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalog_product_entity_group_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalog_product_entity_tier_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalog_product_index_group_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalog_product_index_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalog_product_index_tier_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalogindex_minimal_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalogrule_customer_group` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalogrule_group_website` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalogrule_product` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `catalogrule_product_price` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
UPDATE `weee_discount` SET `customer_group_id` = '0' WHERE `customer_group_id` =4;
COMMIT;

But why is the wrong ID a problem?

Magento expects that the NOTLOGGEDIN customer group have the id 0:

Mage_Customer_Model_Group::NOT_LOGGED_IN_ID  = 0

And this const is used in:

// app/code/core/Mage/Catalog/Model/Resource/Product/Collection.php:1340
// Mage_Catalog_Model_Resource_Product_Collection::addPriceData()

$customerGroupId = Mage::getSingleton('customer/session')->getCustomerGroupId();

// app/code/core/Mage/Customer/Model/Session.php:174
// Mage_Customer_Model_Session::getCustomerGroupId()

public function getCustomerGroupId()
{
    if ($this->getData('customer_group_id')) {
        return $this->getData('customer_group_id');
    }
    if ($this->isLoggedIn() && $this->getCustomer()) {
        return $this->getCustomer()->getGroupId();
    }
    return Mage_Customer_Model_Group::NOT_LOGGED_IN_ID;
}

How does this happen?

I have no idea. Mage_Customer has three install scripts, all of them create the customer group NOT LOGGED IN explicit with the ID 0.

Conclusion

The index is created with a big SQL statement (I think, didn't check this) and therefore uses the customer_group_id from the database and magento's product list uses the constat from Mage_Customer_Model_Group if they don't match, you have a problem.

UPDATE

Damian tweeted that you have to place a

SET sql_mode='NO_AUTO_VALUE_ON_ZERO';

at the beginning of your SQL dump. If you don't do this, your admin store, the customer_group, etc. geting new IDs.

Joining a flat table on EAV

How to use the Mage_Eav_Model_Entity_Collection_Abstract::joinTable() method to join a flat table on a magento EAV collection

Joining a "normal" (aka flat table) to a magento EAV table is easy - if you know how to do.

As always there are a lot resources with tipps I don't like about joining Tables, because they all use getSelect()->join() and fall back to Zend_Db to join the tables. This can (but don't have to) lead to a few problems. In my past I got for example problems with getSize(), because the changes are only made to the Select-Statement not the Count-Statement.

joinTable() TL;DR

But magento collections have a method joinTable() it took me 45min to fizzle out how it is used. To avoid this for you, I share it.

$productCollection->joinTable(
    array('bonus' => 'mycompany/bonus'), 'product_id=entity_id',
    array('bonus_id' => 'bonus_id')
);

The parameters are:

public function joinTable($table, $bind, $fields = null, $cond = null, $joinType = 'inner')
  1. Table is easy, it is the magento namespace/entity format, which you use in your configuration, resource models and the collection. You can use an array of the format array('alias' => 'namespace/entity')
  2. Bind means, the ON statement in your SQL. This was the hardest part and I will explain it in details later. It is important to have your field of the flat table BEFORE and your attribute of the EAV table AFTER the equal sign. Don't use main_table. before the attribute. Magento will do this for you. More on this later.
  3. Fields is an array. If you use a string instead, you get this: Warning: Invalid argument supplied for foreach() in /var/www/magento-1.6.2.0/app/code/core/Mage/Eav/Model/Entity/Collection/Abstract.php on line 775. You can use an array of the format array('field1', 'field2', '...') or array('alias' => 'field1', '...')
  4. Condition is a * WHERE ON condition* in the SQL.
  5. joinType. I hope you know what it is. But you have only the choice between LEFT and INNER. app/code/core/Mage/Eav/Model/Entity/Collection/Abstract.php:793

more details

The method can be found here:

app/code/core/Mage/Eav/Model/Entity/Collection/Abstract.php:754

check the tablename and alias

First, the array of the tablename is splitted into alias and tablename, the tablename is resolved and the alias is checked for existance.

$tableAlias = null;
if (is_array($table)) {
    list($tableAlias, $tableName) = each($table);
} else {
    $tableName = $table;
}

// validate table
if (strpos($tableName, '/') !== false) {
    $tableName = Mage::getSingleton('core/resource')->getTableName($tableName);
}
if (empty($tableAlias)) {
    $tableAlias = $tableName;
}

check the fields

Then the fields are checked. Are they already defined? If not, add them.

// validate fields and aliases
if (!$fields) {
    throw Mage::exception('Mage_Eav', Mage::helper('eav')->__('Invalid joint fields'));
}
foreach ($fields as $alias=>$field) {
    if (isset($this->_joinFields[$alias])) {
        throw Mage::exception(
            'Mage_Eav',
            Mage::helper('eav')->__('A joint field with this alias (%s) is already declared', $alias)
        );
    }
    $this->_joinFields[$alias] = array(
        'table' => $tableAlias,
        'field' => $field,
    );
}

the hard part, check the bind

The bind is exploded at the =, and the foreign key is expected after the equality sign. It took me 25min to find this. I didn't think about this. I thought it will be took directly into the SQL, but no, magento processes it a lot. If the column of your flat table is the second "argument", magento can't find the attribute and throws an exception: Invalid attribute name: product_id

I started with 'main_table.entity_id=bonus.product_id' and ended with 'product_id=entity_id' and here is the code:

// validate bind
list($pk, $fk) = explode('=', $bind);
$bindCond = $tableAlias . '.' . $pk . '=' . $this->_getAttributeFieldName($fk);

Then the join method is choosed and the conditions are added. Interessting to see here is, that the whole condition is put into the ON statement. To be honest, I don't know the difference between ON and WHERE, but this looks like ON is processed on the table and WHERE is processed on the result?

And the second gem, I found here is str_replace('{{table}}', $tableAlias, $cond)., so you can use {{table}} if your condition is a string instead of the alias.

// process join type
switch ($joinType) {
    case 'left':
        $joinMethod = 'joinLeft';
        break;

    default:
        $joinMethod = 'join';
}
$condArr = array($bindCond);

// add where condition if needed
if ($cond !== null) {
    if (is_array($cond)) {
        foreach ($cond as $k => $v) {
            $condArr[] = $this->_getConditionSql($tableAlias.'.'.$k, $v);
        }
    } else {
        $condArr[] = str_replace('{{table}}', $tableAlias, $cond);
    }
}
$cond = '('.implode(') AND (', $condArr).')';

// join table
$this->getSelect()->$joinMethod(array($tableAlias => $tableName), $cond, $fields);

return $this;

The end

And in the end, magento uses getSelect->join() wonderful :-) BUT magento makes a lot of checks for you and using this method feels better.